| |
tlr
Registered: Sep 2003 Posts: 1790 |
Is there a difference packer? (=diff + patch)
Is there a difference packer somewhere?
I have two similar binaries, but not identical.
They have code inserted in different places, so they don't line up. To support this the tool must handle inserted/removed sequences.
I found this Rizla+ V1.4, but it seems to handle only aligned data with a few differing bytes/strings.
Are there better ones? |
|
| |
Twoflower
Registered: Jan 2002 Posts: 434 |
Rizla is the only one I've heard about, which is a shame really.
|
| |
tlr
Registered: Sep 2003 Posts: 1790 |
Yes. It would be useful in many cases where there is a selectable version of things.
I remember stacking the Laser Genius low mem + high mem versions together and packing them up.
Wasn't very efficient. I guess the window of the packer wasn't big enough to catch the similarities.
A really efficient difference utility could exploit the fact that only certain instructions make absolute references, and pack data that adheres to that rule more efficiently. |
| |
MagerValp
Registered: Dec 2001 Posts: 1078 |
LZSS with 16-bit back references and the old binary as a pre-populated dictionary would do the trick... |
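A minimal sketch of that idea in Python, assuming a greedy matcher and a plain list of operations instead of a real bitstream. The names `longest_match`, `delta_encode` and `delta_decode` are made up for illustration and not taken from any existing packer. The window starts out filled with the old binary, and back references are capped at 16 bits:

```python
def longest_match(window, data, i, min_match, max_dist=0xFFFF):
    """Longest run of data[i:] found in the last max_dist bytes of window."""
    if i + min_match > len(data):
        return -1, 0
    lo = max(0, len(window) - max_dist)
    length = min_match
    pos = window.find(data[i:i + length], lo)
    best_pos, best_len = -1, 0
    while pos != -1:
        best_pos, best_len = pos, length
        if i + length >= len(data):
            break
        length += 1
        # A longer match cannot start before the first shorter one.
        pos = window.find(data[i:i + length], pos)
    return best_pos, best_len


def delta_encode(old, new, min_match=4):
    """Encode `new` as literals and back references into old + new-so-far."""
    window = bytearray(old)          # pre-populated dictionary
    ops, lit, i = [], bytearray(), 0
    while i < len(new):
        pos, length = longest_match(window, new, i, min_match)
        if length:
            if lit:
                ops.append(("lit", bytes(lit)))
                lit.clear()
            ops.append(("copy", len(window) - pos, length))  # (distance, length)
            window += new[i:i + length]
            i += length
        else:
            lit.append(new[i])
            window.append(new[i])
            i += 1
    if lit:
        ops.append(("lit", bytes(lit)))
    return ops


def delta_decode(old, ops):
    """Rebuild `new` from `old` and the operation list."""
    window = bytearray(old)
    for op in ops:
        if op[0] == "lit":
            window += op[1]
        else:
            _, dist, length = op
            start = len(window) - dist
            window += window[start:start + length]
    return bytes(window[len(old):])
```

An inserted sequence in the new binary then shows up as one literal run between two copies from the old binary. Two ~110 block files (roughly 28 KB each) together still fit inside a 64k window, which is presumably why 16-bit references would be enough here.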
| |
algorithm
Registered: May 2002 Posts: 705 |
if they are near identical, then modify them manually |
| |
tlr
Registered: Sep 2003 Posts: 1790 |
Quote: if they are near identical, then modify them manually
Like I said. The same code isn't in the same place. Here and there code is inserted or removed. Besides they are ~110 blocks each.
I have some ideas, but as usual it's no fun redoing something that might already exist in a better version. ;)
|
| |
algorithm
Registered: May 2002 Posts: 705 |
such a thing is trivial to code though. surprised that there is only 1 program that does this |
| |
chatGPZ
Registered: Dec 2001 Posts: 11386 |
there is such a program on one of the "coders orgasm" discs as well. and i doubt there aren't more, like algorithm said, it's pretty trivial =) |
| |
tlr
Registered: Sep 2003 Posts: 1790 |
Quote: there is such a program on one of the "coders orgasm" discs as well. and i doubt there aren't more, like algorithm said, it's pretty trivial =)
Where can I find it? It's not in the database.
EDIT: found it. It's Rizla+ V1.4, which doesn't handle any moved blocks at all.
Sure, it's not _that_ difficult, but it's not entirely trivial either.
The LZSS with 64k back references will probably work somewhat. Exomizer has a 64k window IIRC. However the results are not very convincing.
For it to be at all efficient, it probably has to take into account that a moved chunk of code has a different base address for its absolute references. (The binaries I intend to diff are mostly code.)
What I'm considering is making a program that finds absolute instructions that reference addresses within, say, 10 bits signed of the original location and replaces those with a relative pseudo-equivalent.
If we can hypothetically find a byte value that is not used in our binary, this could be <code>, <index | offs msb>, <offs lsb>, where index is a 6-bit index into a table containing the opcodes ($4c, $20, $8d, ...)
(maybe 5-bit is enough)
Code matching does not need to be perfect. It's probably ok to say that for example any $4c-byte followed by an address within the range is transformed even though it was data or misaligned code.
I believe that after such a transformation LZ77/LZSS could be much more efficient.
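A rough Python sketch of this transformation, under a few assumptions: the opcode table holds only the three opcodes named above (the rest of the list is left out), the offset is measured from the address of the opcode byte, and the helper names `find_unused_byte`, `to_relative` and `from_relative` are made up for illustration. Since the pseudo instruction is also 3 bytes, positions stay the same and the transformation can be undone:

```python
ABS_OPCODES = [0x4C, 0x20, 0x8D]   # JMP abs, JSR abs, STA abs (table is incomplete)


def find_unused_byte(data):
    """Pick an escape value that never occurs in the binary."""
    used = set(data)
    for b in range(256):
        if b not in used:
            return b
    raise ValueError("every byte value is used")


def to_relative(data, load_addr, escape):
    """Replace in-range absolute instructions with <escape>, <index|msb>, <lsb>."""
    out = bytearray()
    i = 0
    while i < len(data):
        op = data[i]
        if op in ABS_OPCODES and i + 2 < len(data):
            target = data[i + 1] | (data[i + 2] << 8)
            offs = target - (load_addr + i)
            if -512 <= offs <= 511:              # fits in 10 bits signed
                idx = ABS_OPCODES.index(op)      # 6-bit table index
                out += bytes([escape, (idx << 2) | ((offs >> 8) & 3), offs & 0xFF])
                i += 3
                continue
        out.append(op)
        i += 1
    return bytes(out)


def from_relative(data, load_addr, escape):
    """Inverse of to_relative: restore the absolute instructions."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == escape:
            idx = data[i + 1] >> 2
            offs = ((data[i + 1] & 3) << 8) | data[i + 2]
            if offs >= 512:                      # sign-extend 10 bits
                offs -= 1024
            target = (load_addr + i + offs) & 0xFFFF
            out += bytes([ABS_OPCODES[idx], target & 0xFF, target >> 8])
            i += 3
        else:
            out.append(data[i])
            i += 1
    return bytes(out)
```

With this, the same routine assembled at $1000 and at $2000 turns into identical bytes, as long as the same escape value is used for both binaries. That is the property an LZ77/LZSS matcher needs to find the moved chunk.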
|
| |
Mace
Registered: May 2002 Posts: 1799 |
TLR, it's very interesting to see where you're going, but I don't understand very much of it :-)
What do you mean by:
Quote: absolute instructions that reference addresses within, say, 10 bits signed of the original location and replaces those with a relative pseudo-equivalent.
and
Quote: Code matching does not need to be perfect. It's probably ok to say that for example any $4c-byte followed by an address within the range is transformed even though it was data or misaligned code
?? |
| |
tlr
Registered: Sep 2003 Posts: 1790 |
Quote: What do you mean by:
Quote: absolute instructions that reference addresses within, say, 10 bits signed of the original location and replaces those with a relative pseudo-equivalent.
In short, replace instructions that have absolute addressing mode, with fake instructions that have relative addressing mode.
Quote: and
Quote: Code matching does not need to be perfect. It's probably ok to say that for example any $4c-byte followed by an address within the range is transformed even though it was data or misaligned code
That is, don't care whether it's really an instruction; just look at each byte as we scan from the bottom up. If the byte we are looking at looks like an absolute addressing mode instruction, convert it to relative if possible.
This last point doesn't improve anything other than simplifying the analysis: there is no need to understand what is code and what is data. Matching will be a little less perfect, but probably not by much. |
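To make the "don't care if it's really an instruction" part concrete, here is a small Python sketch of just the scan. The function name `candidates`, the three-entry opcode set and the `reach` parameter are assumptions for illustration only. It walks the binary from the bottom up and flags every byte that looks like an absolute instruction with an in-range operand, whether it is code or data:

```python
ABS_OPCODES = {0x4C, 0x20, 0x8D}   # JMP abs, JSR abs, STA abs (incomplete set)


def candidates(data, load_addr, reach=512):
    """Return (address, offset) for every byte that looks convertible."""
    hits = []
    i = 0
    while i + 2 < len(data):
        if data[i] in ABS_OPCODES:
            target = data[i + 1] | (data[i + 2] << 8)
            offs = target - (load_addr + i)
            if -reach <= offs < reach:
                hits.append((load_addr + i, offs))
                i += 3               # operand bytes are consumed
                continue
        i += 1                       # not convertible, try the next byte
    return hits


# A real JMP $1010, then table data that happens to look like JMP $1005,
# then a JMP $8000 whose target is out of reach and is left alone.
blob = bytes([0x4C, 0x10, 0x10, 0x4C, 0x05, 0x10, 0x4C, 0x00, 0x80])
print(candidates(blob, 0x1000))
```

The data bytes at $1003 get flagged as well. As long as the conversion is reversible that costs nothing in correctness; it only means that such data will match slightly differently from real code.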