| |
chatGPZ
Registered: Dec 2001 Posts: 11386 |
WANTED: Packer for "Solid Archive" Memory Depacking
I was just looking for this and apparently it doesn't exist: a packer that covers the following use case:
- packs many small files into one chunk of packed data
- uses a dictionary and whatever other lookup data shared across all files
- the depacker allows depacking individual files from that chunk into memory, in any order
Think of e.g. a music collection - there you have many relatively small files which are very similar (or even identical) in large parts (the replayer code). With this kind of data you'd get a much better ratio by building a dictionary across all the files instead of packing each file separately.
I know one of you packer dudes is bored at Easter.... :) |
|
| |
Martin Piper
Registered: Nov 2007 Posts: 722 |
I did this for the Hunter's Moon video sequences. The dictionary helped compression where the scene was drawing or replacing areas of the screen across several different frames. |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Would you care to provide an example test corpus?
Also, I assume you just want some kind of "Unpack chunk #n" API, where you load an index register with the chunk number before calling decrunch, and the decruncher determines the destination address from information in the archive? |
| |
Martin Piper
Registered: Nov 2007 Posts: 722 |
Is all the data meant to be in memory? For example:
$400-$7fff is the demo code
$8000-$afff is a fixed address for uncompressed music
$b000-$fff0 is all compressed data and the decompression API.
Where you can do:
lda #[0-xx] ; Choose the "file number"
jsr $b000 ; Decompress the "file number" into $8000 |
| |
chatGPZ
Registered: Dec 2001 Posts: 11386 |
Quote:Would you care to provide an example test corpus?
you could just use ... eg the gt2 example tunes for a test :)
Quote:Also, I assume you just want some kind of "Unpack chunk #n" API, where you load an index register with the chunk number before calling decrunch
exactly
Quote:the decruncher determines the destination address from information in the archive?
that's a detail that should be optional imho - saving the destination address in the file or not, and being able to override it when depacking. I can see use cases for all of that :)
Quote:Is all the data meant to be in memory?
Yes.
But don't get too specific on the "music" use case - it could also be video frames to be played in any order, which are not always depacked to the same location because of double buffering. |
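A minimal sketch in C of what that kind of API could look like (the struct layout and names are made up for illustration, not taken from any existing packer): the archive index carries a stored destination per file, and the caller may pass an override.

#include <stdint.h>
#include <stddef.h>

/* Hypothetical archive index entry -- layout is an assumption. */
typedef struct {
    uint16_t stored_dest;   /* destination address saved in the archive */
    uint32_t data_offset;   /* where this file's packed stream starts   */
} ArchiveEntry;

/* Depack file 'file_num' into 64K of emulated RAM.
   If 'dest_override' is non-zero it wins, otherwise the stored address is used. */
void depack_file(const uint8_t *archive, const ArchiveEntry *index,
                 uint8_t file_num, uint8_t *ram, uint16_t dest_override)
{
    uint16_t dest = dest_override ? dest_override
                                  : index[file_num].stored_dest;
    /* ... run the decoder on archive + index[file_num].data_offset,
           writing into ram + dest ... */
    (void)archive; (void)ram; (void)dest;   /* decoder body elided */
}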
| |
Martin Piper
Registered: Nov 2007 Posts: 722 |
In this case, example logic of the compression algorithm would be: "if there are long strings of literals, or it's more efficient to encode a copy from the global dictionary instead of a copy from the uncompressed data, then include that data in the dictionary."
So there would be:
Literal
Copy range from uncompressed data
Copy range from global dictionary data
Hmm... |
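A rough C sketch of how a decoder could dispatch on those three cases (the token format here is invented purely for illustration; a real packer would bit-pack things much tighter):

#include <stdint.h>
#include <string.h>

/* Invented token tags -- a real format would be bit-packed. */
enum { TOK_LITERAL = 0, TOK_COPY_OUTPUT = 1, TOK_COPY_DICT = 2, TOK_END = 3 };

size_t decode(const uint8_t *in, uint8_t *out, const uint8_t *dict)
{
    size_t op = 0;                                 /* output position */
    for (;;) {
        uint8_t tag = *in++;
        if (tag == TOK_END)
            break;
        uint8_t len = *in++;                       /* run / match length */
        if (tag == TOK_LITERAL) {                  /* raw bytes follow inline */
            memcpy(out + op, in, len);
            in += len;
        } else {
            uint16_t off = in[0] | (in[1] << 8);   /* 16-bit offset */
            in += 2;
            const uint8_t *src = (tag == TOK_COPY_DICT)
                ? dict + off                       /* shared global dictionary  */
                : out + op - off;                  /* earlier decompressed data */
            for (uint8_t i = 0; i < len; i++)      /* byte-wise copy so that    */
                out[op + i] = src[i];              /* overlapping refs work     */
        }
        op += len;
    }
    return op;                                     /* bytes written */
}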
| |
TBH
Registered: Mar 2010 Posts: 21 |
There may be some cases where it would be useful to encode the packed data relationally and provide a relative offset.
So, for strings of character values (a=0, b=1, etc):
abcde, bcde, cde, de, e
and
fghij, ghij, hij, ij, j
Store only the data:
"abcde"
Then access with the index 0-4 plus an offset of either 0 or 5 (perhaps packed).
So to retrieve "hij":
Index = 2, offset = 5
The above is contrived, and would rely on there being many patterns that can be transposed in some manner, but this approach might be useful. |
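A tiny C sketch of that idea, as contrived as the example itself: only "abcde" is stored, and any of the ten strings is rebuilt from a suffix index plus a value added to each byte (offset 5 for the f-j group).

#include <stdio.h>

static const char base[] = { 0, 1, 2, 3, 4 };   /* "abcde" stored as values a=0..e=4 */

/* Rebuild a string from a suffix index (0-4) and a transposition offset. */
static void fetch(int index, int offset, char *out)
{
    int len = 5 - index;
    for (int i = 0; i < len; i++)
        out[i] = (char)('a' + base[index + i] + offset);
    out[len] = '\0';
}

int main(void)
{
    char buf[6];
    fetch(2, 5, buf);        /* suffix "cde" transposed by +5 */
    printf("%s\n", buf);     /* prints "hij" */
    return 0;
}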
| |
Krill
Registered: Apr 2002 Posts: 2980 |
Quoting TBH: There may be some cases where it would be useful to encode the packed data relationally and provide a relative offset.
Compression based on shortest common superstring is implemented in Compactor V2.0. |
| |
Martin Piper
Registered: Nov 2007 Posts: 722 |
Interesting. A short test with four music files, to exercise the sparse common-data dictionary:
Total original data size: 3064 + 2811 + 4228 + 2607 = 12710 bytes
Compressed without dictionary: 2190 + 2278 + 2969 + 2051 = 9488 bytes
Compressed with 1024 byte dictionary: 1581 + 1909 + 2641 + 1733 = 7864 bytes
Bytes saved in compressed data due to dictionary: 9488 - 7864 = 1624 bytes
Minus the 1024 bytes for the dictionary itself: 1624 - 1024 = 600 bytes net saving
The 1024 byte dictionary was filled after the second file, which is why the first file benefits most from the dictionary. However all files do show some benefit from the dictionary.
The benefit of the dictionary increases if more files, sharing at least some common data, are compressed.
I need to write a 6502 decompress now... |
| |
Martin Piper
Registered: Nov 2007 Posts: 722 |
The common data for these music files seems to be the note hi/lo table values and some position independent code. |
| |
chatGPZ
Registered: Dec 2001 Posts: 11386 |
Don't get too focussed on that example - there are many other use cases :) |
| |
Martin Piper
Registered: Nov 2007 Posts: 722 |
I'm just using it as easily accessible data files with some commonality. It's not tweaked for music specifically. It would work with graphics data too. |
| |
Martin Piper
Registered: Nov 2007 Posts: 722 |
6502 code works. I'll put a demo on my github in a bit... |
| |
Krill
Registered: Apr 2002 Posts: 2980 |
How do you decide which sequences to put into the dictionary vs. which to keep as local back-references, ideally with some optimality? |
| |
Martin Piper
Registered: Nov 2007 Posts: 722 |
Frequency of use, combined with how efficiently the offsets can be encoded. |
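One way that heuristic could be expressed (a C sketch with an assumed scoring formula, not Martin's actual code): score each candidate sequence by the bytes its dictionary references would save across all files, minus the bytes it costs to store the sequence itself, then fill the dictionary budget with the best-scoring candidates.

#include <stddef.h>

/* Hypothetical candidate gathered during an analysis pass over all files. */
typedef struct {
    size_t length;     /* bytes the sequence would occupy in the dictionary    */
    size_t uses;       /* how many matches across all files would reference it */
    size_t ref_cost;   /* encoded size of one dictionary-copy token            */
} Candidate;

/* Net saving if the candidate goes into the dictionary, assuming each use
   would otherwise be emitted roughly as 'length' bytes of literals/local matches. */
static long score(const Candidate *c)
{
    long saved_per_use = (long)c->length - (long)c->ref_cost;
    return (long)c->uses * saved_per_use - (long)c->length;
}

Candidates with a non-positive score stay as local back-references or literals; the rest are sorted by score and packed into the configured dictionary size.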
| |
Oswald
Registered: Apr 2002 Posts: 5094 |
"The 1024 byte dictionary was filled after the second file, which is why the first file benefits most from the dictionary"
doesn't sound optimal - the dictionary should contain the most-used strings across all files, regardless of order. |
| |
Martin Piper
Registered: Nov 2007 Posts: 722 |
That was from an earlier test before I added the dictionary optimisation pass.
There is a configurable limit to the dictionary size as well. |
| |
Martin Piper
Registered: Nov 2007 Posts: 722 |
Readme: https://github.com/martinpiper/C64Public/blob/master/Dictionary..
Demo: https://github.com/martinpiper/C64Public/raw/master/DictionaryC..
The decompression source is a readable reference implementation. There is plenty of opportunity for optimisation.
I've got some ideas for more compression optimisations, but it will do for now. |
| |
Martin Piper
Registered: Nov 2007 Posts: 722 |
PS. The compression has been optimised again and the tool/example updated.
Original data size: 12710 bytes
Previous compressed size: 7864 bytes
New compressed size: 7574 bytes
Basically it's trying different compression options (and using multiple threads) to hunt for the best savings. |
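Conceptually something like this (a C sketch using pthreads; pack_with() stands in for the real compressor and the parameter set is invented): run the same input through several parameter combinations in parallel and keep the smallest output.

#include <pthread.h>
#include <stddef.h>

typedef struct { int dict_size; int max_match_len; size_t packed_size; } Trial;

extern size_t pack_with(int dict_size, int max_match_len);  /* placeholder for the real packer */

static void *worker(void *arg)
{
    Trial *t = (Trial *)arg;
    t->packed_size = pack_with(t->dict_size, t->max_match_len);
    return NULL;
}

/* Run all trials in parallel, return the index of the smallest result. */
static size_t best_trial(Trial *trials, size_t n)
{
    pthread_t tid[n];                      /* one worker per parameter set */
    for (size_t i = 0; i < n; i++)
        pthread_create(&tid[i], NULL, worker, &trials[i]);
    for (size_t i = 0; i < n; i++)
        pthread_join(tid[i], NULL);
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (trials[i].packed_size < trials[best].packed_size)
            best = i;
    return best;
}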