[CSDb] - User Forums - Is there a difference packer? (=diff + patch)

2007-02-16 17:38

tlr

Registered: Sep 2003
Posts: 1814

Is there a difference packer? (=diff + patch)

Is there a difference packer somewhere?
I have two binaries that are similar but not identical.
Code has been inserted in different places, so they don't line up. To handle this, the tool must cope with inserted and removed sequences.

I found Rizla+ V1.4, but it seems to handle only aligned data with a few differing bytes/strings.

Are there better ones?

2007-02-16 18:01

Twoflower

Registered: Jan 2002
Posts: 436

Rizla is the only one I've heard about. Which is a shame, really.

2007-02-16 18:54

tlr

Registered: Sep 2003
Posts: 1814

Yes. It would be useful in the many cases where a program comes in several selectable versions.
I remember stacking the Laser Genius low-mem and high-mem versions together and packing them up.
It wasn't very efficient. I guess the packer's window wasn't big enough to catch the similarities.

A really efficient difference utility could exploit the fact that only certain instructions make absolute references, and pack data that follows that rule more efficiently.

2007-02-17 09:48

MagerValp

Registered: Dec 2001
Posts: 1082

LZSS with 16-bit back references and the old binary as a pre-populated dictionary would do the trick...
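A rough sketch of what that suggestion could look like, for illustration only. The token format, the names `delta_encode` / `delta_decode` and the `MIN_MATCH` threshold are assumptions made for this example, not something taken from an existing packer. The old binary is placed in front of the new one as history, so every match in the new data may point back into it with an offset of up to 16 bits.

```python
MIN_MATCH = 4        # assumed threshold: shorter matches are stored as literals
MAX_OFFSET = 0xFFFF  # 16-bit back reference


def delta_encode(old: bytes, new: bytes) -> list:
    """Greedy LZSS-style tokenizer for `new`, with `old` as pre-populated history."""
    buf = old + new
    index = {}  # MIN_MATCH-byte key -> list of earlier positions in buf

    def remember(p: int) -> None:
        key = buf[p:p + MIN_MATCH]
        if len(key) == MIN_MATCH:
            index.setdefault(key, []).append(p)

    for p in range(len(old)):        # the old binary is the initial dictionary
        remember(p)

    tokens = []
    pos = len(old)
    while pos < len(buf):
        best_len, best_off = 0, 0
        for cand in index.get(buf[pos:pos + MIN_MATCH], ()):
            if pos - cand > MAX_OFFSET:
                continue
            n = 0
            while pos + n < len(buf) and buf[cand + n] == buf[pos + n]:
                n += 1
            if n > best_len:
                best_len, best_off = n, pos - cand
        if best_len >= MIN_MATCH:
            tokens.append(("copy", best_off, best_len))
            step = best_len
        else:
            tokens.append(("lit", buf[pos]))
            step = 1
        for p in range(pos, pos + step):   # newly encoded data joins the history
            remember(p)
        pos += step
    return tokens


def delta_decode(old: bytes, tokens: list) -> bytes:
    """Rebuild the new binary from the old one plus the token list."""
    out = bytearray(old)
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, off, length = tok
            for _ in range(length):        # byte-wise copy allows overlapping matches
                out.append(out[-off])
    return bytes(out[len(old):])
```

The candidate search here is brute force and can get slow on long runs of identical bytes. A real packer would also need a bit-level output format, which this sketch leaves out.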

2007-02-18 14:51

algorithm

Registered: May 2002
Posts: 707

if they are near identical, then modify them manually

2007-02-18 14:54

tlr

Registered: Sep 2003
Posts: 1814

Quote: if they are near identical, then modify them manually

Like I said, the same code isn't in the same place. Code is inserted or removed here and there. Besides, they are ~110 blocks each.
I have some ideas, but as usual it's no fun redoing something that might already exist in a better version. ;)

2007-02-18 15:29

algorithm

Registered: May 2002
Posts: 707

such a thing is trivial to code though. surprised that there is only 1 program that does this

2007-02-18 15:47

chatGPZ

Registered: Dec 2001
Posts: 11523

there is such a program on one of the "coders orgasm" discs as well. and i doubt there aren't more, like algorithm said, it's pretty trivial =)

2007-02-18 16:55

tlr

Registered: Sep 2003
Posts: 1814

Quote: there is such a program on one of the "coders orgasm" discs as well. and i doubt there aren't more, like algorithm said, it's pretty trivial =)

Where can I find it? It's not in the database.
EDIT: found it. It's Rizla+ V1.4, which doesn't handle any moved blocks at all.

Sure, it's not _that_ difficult, but it's not entirely trivial either.
LZSS with 64k back references will probably work to some extent. Exomizer has a 64k window IIRC. However, the results are not very convincing.
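One quick way to try the "old binary as dictionary" idea on a PC, without writing a packer, might be Python's `zlib` module, which accepts a preset dictionary through its `zdict` parameter. This is a substitute technique (deflate, with a window of roughly 32 KiB), not LZSS with a 64k window and not Exomizer. The function names are made up for the example.

```python
import zlib


def pack_with_dictionary(old: bytes, new: bytes) -> bytes:
    """Compress `new` with deflate, using `old` as the preset dictionary."""
    comp = zlib.compressobj(level=9, zdict=old)
    return comp.compress(new) + comp.flush()


def unpack_with_dictionary(old: bytes, packed: bytes) -> bytes:
    """Inverse of pack_with_dictionary; needs the very same `old` bytes."""
    decomp = zlib.decompressobj(zdict=old)
    return decomp.decompress(packed) + decomp.flush()
```

Because of the window limit, matches into the old binary only remain reachable while the distance back to them stays under about 32 KiB. That probably holds for two binaries of ~110 blocks each, but would not for much larger files.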

For it to be at all efficient, it probably has to take into account that a moved chunk of code has a different base address for its absolute references. (The binaries I intend to diff are mostly code.)

What I'm considering is a program that finds absolute-addressing instructions whose target lies within, say, a signed 10-bit offset of the instruction's own location, and replaces them with a relative pseudo-equivalent.
If we can, hypothetically, find a byte value that is not used in our binary, this could be encoded as <code>, <index | offs msb>, <offs lsb>, where <code> is the unused byte and index is a 6-bit index into a table containing the opcodes ($4c, $20, $8d, ...)
(maybe 5 bits is enough)

Code matching does not need to be perfect. It's probably OK to say that, for example, any $4c byte followed by an address within range is transformed, even if it is actually data or misaligned code.

I believe that after such a transformation LZ77/LZSS could be much more efficient.
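The transformation described in this post could be sketched roughly as follows. Several details are assumptions made for the example: the exact bit layout (6-bit index in the upper bits, the two top offset bits below it), the offset being measured from the address of the opcode byte, the short opcode table, and all function names. The load address has to be supplied by the caller.

```python
# Illustrative, incomplete table of 6502 opcodes with absolute addressing:
# JMP, JSR, STA, LDA, STX, LDX, STY, LDY, INC, DEC
ABS_OPCODES = (0x4C, 0x20, 0x8D, 0xAD, 0x8E, 0xAE, 0x8C, 0xAC, 0xEE, 0xCE)


def find_unused_byte(data: bytes) -> int:
    """Return a byte value that never occurs in data, to serve as <code>."""
    unused = set(range(256)) - set(data)
    if not unused:
        raise ValueError("every byte value is used; no escape code available")
    return min(unused)


def to_relative(data: bytes, load_addr: int, esc: int) -> bytes:
    """Replace near absolute references by <esc>, <index | offs msb>, <offs lsb>."""
    if esc in data:
        raise ValueError("escape byte occurs in the data")
    out = bytearray()
    i = 0
    while i < len(data):
        op = data[i]
        if op in ABS_OPCODES and i + 2 < len(data):
            target = data[i + 1] | (data[i + 2] << 8)
            offs = target - (load_addr + i)          # relative to the opcode byte
            if -512 <= offs <= 511:                  # fits in 10 bits signed
                index = ABS_OPCODES.index(op)
                out += bytes([esc, (index << 2) | ((offs >> 8) & 3), offs & 0xFF])
                i += 3
                continue
        out.append(op)                               # anything else is copied as is
        i += 1
    return bytes(out)


def to_absolute(data: bytes, load_addr: int, esc: int) -> bytes:
    """Inverse of to_relative: turn every escape sequence back into opcode + address."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == esc:
            index = data[i + 1] >> 2
            offs = ((data[i + 1] & 3) << 8) | data[i + 2]
            if offs >= 512:                          # sign-extend the 10-bit offset
                offs -= 1024
            target = (load_addr + i + offs) & 0xFFFF
            out += bytes([ABS_OPCODES[index], target & 0xFF, target >> 8])
            i += 3
        else:
            out.append(data[i])
            i += 1
    return bytes(out)
```

Each pseudo-instruction is three bytes, the same as the instruction it replaces, so positions stay aligned between the original and the transformed data. The scan makes no attempt to tell code from data; since the inverse is purely mechanical, a transformed data byte is restored just like a real instruction.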

2007-02-18 20:10

Mace

Registered: May 2002
Posts: 1799

TLR, it's very interesting to see where you're going, but I don't understand very much of it :-)

What do you mean by:
Quote:

absolute instructions that reference targets within, say, a signed 10-bit offset of the original location, and replace those with a relative pseudo-equivalent.

and
Quote:

Code matching does not need to be perfect. It's probably OK to say that, for example, any $4c byte followed by an address within range is transformed, even if it is actually data or misaligned code.

2007-02-18 20:34

tlr

Registered: Sep 2003
Posts: 1814

Quote:

What do you mean by:
Quote:

absolute instructions that reference targets within, say, a signed 10-bit offset of the original location, and replace those with a relative pseudo-equivalent.

In short, replace instructions that use absolute addressing with fake instructions that use relative addressing.

Quote:

and
Quote:

Code matching does not need to be perfect. It's probably OK to say that, for example, any $4c byte followed by an address within range is transformed, even if it is actually data or misaligned code.

...don't care whether it's really an instruction; just look at each byte as we scan from the bottom up. If the byte we are looking at looks like an instruction with absolute addressing, convert it to relative if possible.

This last point doesn't improve anything; it only simplifies the analysis. There's no need to understand what is code and what is data. Matching will be a little less perfect, but probably not by much.
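A small worked example may make the absolute-versus-relative point concrete. It encodes the same "jump 5 bytes ahead" at two made-up locations ($1000 and $1234, chosen only for illustration) and prints the raw bytes next to the offset from the instruction's own address. The helper names are hypothetical.

```python
def jmp_abs(target: int) -> bytes:
    """Encode a 6502 JMP absolute: opcode $4c, then target low byte, high byte."""
    return bytes([0x4C, target & 0xFF, target >> 8])


def relative_offset(location: int, instr: bytes) -> int:
    """Distance from the instruction's own address to its absolute target."""
    target = instr[1] | (instr[2] << 8)
    return target - location


# The same piece of code, assumed to sit at two different addresses.
for location in (0x1000, 0x1234):
    raw = jmp_abs(location + 5)
    print(f"${location:04x}: {raw.hex(' ')}  offset {relative_offset(location, raw):+d}")
```

The operand bytes differ between the two copies (05 10 versus 39 12), so a plain LZ matcher sees a mismatch in the middle of otherwise identical code. The offset is +5 in both cases, which is what the relative pseudo-instruction would store.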
