| |
chatGPZ
Registered: Dec 2001 Posts: 11386 |
WANTED: Packer for "Solid Archive" Memory Depacking
I was just looking for this and apparently it doesn't exist: a packer that covers the following use case:
- packs many small files into one chunk of packed data
- uses a dictionary and whatever other lookup data shared across all files
- the depacker allows depacking individual files from that chunk into memory, in any order
Think of e.g. a music collection - there you have many relatively small files which are very similar (or even identical) in large parts (the replayer code). With this kind of data you'd get a much better ratio by building a dictionary across all the files instead of packing each file separately.
I know one of you packer dudes is bored at Easter.... :) |
|
| |
Martin Piper
Registered: Nov 2007 Posts: 722 |
I did this for the Hunter's Moon video sequences. The dictionary helped compression where the scene was drawing or replacing areas of the screen across several different frames. |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Would you care to provide an example test corpus?
Also, I assume you just want some kind of "Unpack chunk #n" API, where you load an index register with the chunk number before calling decrunch, and the decruncher determines the destination address from information in the archive? |
| |
Martin Piper
Registered: Nov 2007 Posts: 722 |
Is all the data meant to be in memory? For example:
$400-$7fff is the demo code
$8000-$afff is a fixed address for uncompressed music
$b000-$fff0 is all compressed data and the decompression API.
Where you can do:
lda #[0-xx] ; Choose the "file number"
jsr $b000 ; Decompress the "file number" into $8000 |
| |
chatGPZ
Registered: Dec 2001 Posts: 11386 |
Quote:Would you care to provide an example test corpus?
you could just use ... eg the gt2 example tunes for a test :)
Quote:Also, I assume you just want some kind of "Unpack chunk #n" API, where you load an index register with the chunk number before calling decrunch
exactly
Quote:the decruncher determines the destination address from information in the archive?
that's a detail that should be optional imho - saving the destination address in the file or not, and being able to override it when depacking. I can see use cases for all of that :)
Quote:Is all the data meant to be in memory?
Yes.
But don't get too specific on the "music" use case - it could also be video frames to be played in any order, which are not always depacked to the same location because of double buffering. |
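A minimal sketch in C of what that kind of API could look like (the struct layout and names are made up for illustration, not taken from any existing packer): the archive index carries a stored destination per file, and the caller may pass an override.

#include <stdint.h>
#include <stddef.h>

/* Hypothetical archive index entry -- layout is an assumption. */
typedef struct {
    uint16_t stored_dest;   /* destination address saved in the archive */
    uint32_t data_offset;   /* where this file's packed stream starts   */
} ArchiveEntry;

/* Depack file 'file_num' into 64K of emulated RAM.
   If 'dest_override' is non-zero it wins, otherwise the stored address is used. */
void depack_file(const uint8_t *archive, const ArchiveEntry *index,
                 uint8_t file_num, uint8_t *ram, uint16_t dest_override)
{
    uint16_t dest = dest_override ? dest_override
                                  : index[file_num].stored_dest;
    /* ... run the decoder on archive + index[file_num].data_offset,
           writing into ram + dest ... */
    (void)archive; (void)ram; (void)dest;   /* decoder body elided */
}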
| |
Martin Piper
Registered: Nov 2007 Posts: 722 |
In this case, example logic of the compression algorithm would be: "if there are long strings of literals, or it's more efficient to encode a copy from the global dictionary instead of a copy from the uncompressed data, then include that data in the dictionary."
So there would be:
Literal
Copy range from uncompressed data
Copy range from global dictionary data
Hmm... |
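A rough C sketch of how a decoder could dispatch on those three cases (the token format here is invented purely for illustration; a real packer would bit-pack things much tighter):

#include <stdint.h>
#include <string.h>

/* Invented token tags -- a real format would be bit-packed. */
enum { TOK_LITERAL = 0, TOK_COPY_OUTPUT = 1, TOK_COPY_DICT = 2, TOK_END = 3 };

size_t decode(const uint8_t *in, uint8_t *out, const uint8_t *dict)
{
    size_t op = 0;                                 /* output position */
    for (;;) {
        uint8_t tag = *in++;
        if (tag == TOK_END)
            break;
        uint8_t len = *in++;                       /* run / match length */
        if (tag == TOK_LITERAL) {                  /* raw bytes follow inline */
            memcpy(out + op, in, len);
            in += len;
        } else {
            uint16_t off = in[0] | (in[1] << 8);   /* 16-bit offset */
            in += 2;
            const uint8_t *src = (tag == TOK_COPY_DICT)
                ? dict + off                       /* shared global dictionary  */
                : out + op - off;                  /* earlier decompressed data */
            for (uint8_t i = 0; i < len; i++)      /* byte-wise copy so that    */
                out[op + i] = src[i];              /* overlapping refs work     */
        }
        op += len;
    }
    return op;                                     /* bytes written */
}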
| |
TBH
Registered: Mar 2010 Posts: 21 |
There may be some cases where it would be useful to encode the packed data relationally and provide a relative offset.
So, for strings of character values (a=0, b=1, etc):
abcde, bcde, cde, de, e
and
fghij, ghij, hij, ij, j
Store only the data:
"abcde"
Then access with the index 0-4 plus an offset of either 0 or 5 (perhaps packed).
So to retrieve "hij":
Index = 2, offset = 5
The above is contrived, and would rely on there being many patterns that can be transposed in some manner, but this approach might be useful. |
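A tiny C sketch of that idea, as contrived as the example itself: only "abcde" is stored, and any of the ten strings is rebuilt from a suffix index plus a value added to each byte (offset 5 for the f-j group).

#include <stdio.h>

static const char base[] = { 0, 1, 2, 3, 4 };   /* "abcde" stored as values a=0..e=4 */

/* Rebuild a string from a suffix index (0-4) and a transposition offset. */
static void fetch(int index, int offset, char *out)
{
    int len = 5 - index;
    for (int i = 0; i < len; i++)
        out[i] = (char)('a' + base[index + i] + offset);
    out[len] = '\0';
}

int main(void)
{
    char buf[6];
    fetch(2, 5, buf);        /* suffix "cde" transposed by +5 */
    printf("%s\n", buf);     /* prints "hij" */
    return 0;
}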
| |
Krill
Registered: Apr 2002 Posts: 2980 |
Quoting TBH: There may be some cases where it would be useful to encode the packed data relationally and provide a relative offset.
Compression based on shortest common superstring is implemented in Compactor V2.0. |
| |
Martin Piper
Registered: Nov 2007 Posts: 722 |
Interesting. A short test with four music files, to exercise the sparse common-data dictionary:
Total original data size: 3064 + 2811 + 4228 + 2607 = 12710 bytes
Compressed without dictionary: 2190 + 2278 + 2969 + 2051 = 9488 bytes
Compressed with 1024 byte dictionary: 1581 + 1909 + 2641 + 1733 = 7864 bytes
Bytes saved in compressed data due to dictionary: 9488 - 7864 = 1624 bytes
Minus the 1024 bytes for the dictionary itself: 1624 - 1024 = 600 bytes net saving
The 1024 byte dictionary was filled after the second file, which is why the first file benefits most from the dictionary. However all files do show some benefit from the dictionary.
The benefit of the dictionary increases if more files, sharing at least some common data, are compressed.
I need to write a 6502 decompress now... |
| |
Martin Piper
Registered: Nov 2007 Posts: 722 |
The common data for these music files seems to be the note hi/lo table values and some position independent code. |
| |
chatGPZ
Registered: Dec 2001 Posts: 11386 |
Don't get too focussed on that example - there are many other use cases :) |
| |
Martin Piper
Registered: Nov 2007 Posts: 722 |
I'm just using it as easily accessible data files with some commonality. It's not tweaked for music specifically. It would work with graphics data too. |
| |
Martin Piper
Registered: Nov 2007 Posts: 722 |
6502 code works. I'll put a demo on my github in a bit... |
| |
Krill
Registered: Apr 2002 Posts: 2980 |
How do you decide which sequences to put into the dictionary vs. which to keep as local back-references, ideally with some optimality? |
| |
Martin Piper
Registered: Nov 2007 Posts: 722 |
Frequency of use, combined with how efficiently the offsets can be encoded. |
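One way that heuristic could be expressed (a C sketch with an assumed scoring formula, not Martin's actual code): score each candidate sequence by the bytes its dictionary references would save across all files, minus the bytes it costs to store the sequence itself, then fill the dictionary budget with the best-scoring candidates.

#include <stddef.h>

/* Hypothetical candidate gathered during an analysis pass over all files. */
typedef struct {
    size_t length;     /* bytes the sequence would occupy in the dictionary    */
    size_t uses;       /* how many matches across all files would reference it */
    size_t ref_cost;   /* encoded size of one dictionary-copy token            */
} Candidate;

/* Net saving if the candidate goes into the dictionary, assuming each use
   would otherwise be emitted roughly as 'length' bytes of literals/local matches. */
static long score(const Candidate *c)
{
    long saved_per_use = (long)c->length - (long)c->ref_cost;
    return (long)c->uses * saved_per_use - (long)c->length;
}

Candidates with a non-positive score stay as local back-references or literals; the rest are sorted by score and packed into the configured dictionary size.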
| |
Oswald
Registered: Apr 2002 Posts: 5094 |
"The 1024 byte dictionary was filled after the second file, which is why the first file benefits most from the dictionary"
doesn't sound optimal - the dictionary should contain the most-used strings across all files, regardless of order. |
| |
Martin Piper
Registered: Nov 2007 Posts: 722 |
That was from an earlier test before I added the dictionary optimisation pass.
There is a configurable limit to the dictionary size as well. |
| |
Martin Piper
Registered: Nov 2007 Posts: 722 |
Readme: https://github.com/martinpiper/C64Public/blob/master/Dictionary..
Demo: https://github.com/martinpiper/C64Public/raw/master/DictionaryC..
The decompression source is a readable reference implementation. There is plenty of opportunity for optimisation.
I've got some ideas for more compression optimisations, but it will do for now. |
| |
Martin Piper
Registered: Nov 2007 Posts: 722 |
PS. The compression has been optimised again and the tool/example updated.
Original data size: 12710 bytes
Previous compressed size: 7864 bytes
New compressed size: 7574 bytes
Basically it's trying different compression options (and using multiple threads) to hunt for the best savings. |
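Conceptually something like this (a C sketch using pthreads; pack_with() stands in for the real compressor and the parameter set is invented): run the same input through several parameter combinations in parallel and keep the smallest output.

#include <pthread.h>
#include <stddef.h>

typedef struct { int dict_size; int max_match_len; size_t packed_size; } Trial;

extern size_t pack_with(int dict_size, int max_match_len);  /* placeholder for the real packer */

static void *worker(void *arg)
{
    Trial *t = (Trial *)arg;
    t->packed_size = pack_with(t->dict_size, t->max_match_len);
    return NULL;
}

/* Run all trials in parallel, return the index of the smallest result. */
static size_t best_trial(Trial *trials, size_t n)
{
    pthread_t tid[n];                      /* one worker per parameter set */
    for (size_t i = 0; i < n; i++)
        pthread_create(&tid[i], NULL, worker, &trials[i]);
    for (size_t i = 0; i < n; i++)
        pthread_join(tid[i], NULL);
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (trials[i].packed_size < trials[best].packed_size)
            best = i;
    return best;
}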