CSDb User Forums


Forums > C64 Coding > LZW compression on C64
2008-09-10 20:27
Krill

Registered: Apr 2002
Posts: 2980
LZW compression on C64

Hello,

i'm looking for a tool that LZW-compresses data and comes with C64 decompression code. I guess nothing like that exists yet, but i'm asking anyways.

LZW works with a dictionary used to look up uncompressed chunks of data and insert them in place of their respective codes in the output data, unlike LZ77, which works with back-references to already-decompressed data, as commonly done on the C64.

This sort of compression is usable for streaming compressed data through the C64, and hasn't been used up to its potential on the C64 yet, afaik.
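
Roughly what the decoder side of LZW boils down to, sketched in C for clarity (not actual C64 code; the fixed 12-bit code size and the get_code/put_byte helpers are just assumptions):

/* Minimal LZW decode sketch (C for clarity, not 6502 code).
   Assumes fixed 12-bit codes; get_code()/put_byte() are supplied by the caller. */
#include <stdint.h>

#define MAX_CODES 4096                 /* 12-bit codes assumed */

static uint16_t prefix[MAX_CODES];     /* dictionary entry = (prefix code, appended byte) */
static uint8_t  append[MAX_CODES];
static uint8_t  stack[MAX_CODES];      /* entries are emitted back-to-front via this stack */

extern int  get_code(void);            /* next input code, or -1 at end of stream */
extern void put_byte(uint8_t b);

/* walk the prefix chain, emit the entry's bytes in order; returns its first byte */
static uint8_t emit(uint16_t code)
{
    int sp = 0;
    while (code >= 256) { stack[sp++] = append[code]; code = prefix[code]; }
    uint8_t first = (uint8_t)code;
    put_byte(first);
    while (sp > 0) put_byte(stack[--sp]);
    return first;
}

void lzw_decode(void)
{
    uint16_t next = 256;               /* codes 0..255 are the literal bytes */
    int prev = get_code();
    if (prev < 0) return;
    put_byte((uint8_t)prev);

    for (int code = get_code(); code >= 0; prev = code, code = get_code()) {
        uint8_t first;
        if (code < next) {
            first = emit((uint16_t)code);      /* known entry: just look it up */
        } else {
            first = emit((uint16_t)prev);      /* the not-yet-defined-code special case */
            put_byte(first);
        }
        if (next < MAX_CODES) {                /* new entry = previous string + first byte */
            prefix[next] = (uint16_t)prev;
            append[next] = first;
            next++;
        }
    }
}

The point being that the decoder only ever needs the dictionary, never the previously decompressed output, which is what makes it attractive for streaming.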
 
... 9 posts hidden ...
 
2008-09-12 14:32
Krill

Registered: Apr 2002
Posts: 2980
doynax: Thanks for the source. I guess the vanilla byteboozer decompressor is easier to change for my purposes, though. :)

Both buffers are resizeable, and the track buffer is not actually a track buffer. The blocks are put there in the correct order, and a reader just needs to increment its pointer linearly, wait if there's a gap, and wrap around if the end of the buffer has been reached. I will do some tests anyway to see how large that buffer should be.
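
In C-ish pseudocode, the reader side of that buffer would look something like this (names and buffer size made up, the real thing is 6502 of course):

/* Sketch of the reader side of that block buffer (C for clarity; names invented).
   fetch_block_if_ready() is whatever pulls the next finished block from the drive. */
#include <stdint.h>

#define BUF_SIZE 1024u                 /* placeholder; the buffer is resizeable */

static uint8_t  buf[BUF_SIZE];
static uint16_t read_pos;              /* reader's linear position, wraps at BUF_SIZE */
static uint16_t write_pos;             /* up to where contiguous block data has arrived */

extern void fetch_block_if_ready(void);   /* stores the next block in order if the drive has one */

uint8_t get_stream_byte(void)
{
    while (read_pos == write_pos)      /* gap: the needed block isn't in yet, wait for it */
        fetch_block_if_ready();

    uint8_t b = buf[read_pos];
    read_pos = (read_pos + 1) % BUF_SIZE;   /* increment linearly, wrap at the end */
    return b;
}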

Decompression in advance is something that the application can do if it chooses to. The loader itself does decompression on demand and could easily report the amount of continuous input stream bytes remaining in the buffer, so the application can choose whether to have them decompressed now or later.
2008-09-12 15:28
doynax
Account closed

Registered: Oct 2004
Posts: 212
Quoting Krill
Thanks for the source. I guess the vanilla byteboozer decompressor is easier to change for my purposes, though. :)
You may be right.
I figure you'll want to tweak the compressor a bit so matches don't reach outside the window, as well as make sure that neither individual matches nor literal runs cross the window boundary. Then add wrapping logic wherever the destination pointer crosses a page, as well as when the match source pointer is calculated.
This version does this kind of thing already and will hopefully compress better and decompress faster than the original ByteBoozer package. The code is quite a bit larger though and leaves a lot to be desired when it comes to testing and documentation.
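
The kind of wrapping meant above, sketched in C (window size and routine names are placeholders, not the actual ByteBoozer code):

/* Sketch of a windowed LZ77 copy with wrapping (C, illustrative only).
   WINDOW is the ring buffer the decruncher writes into and matches copy from. */
#include <stdint.h>

#define WINDOW 4096u                       /* placeholder; must be a power of two */

static uint8_t  window[WINDOW];
static uint16_t dst;                       /* destination pointer, lives inside the window */

extern uint8_t get_stream_byte(void);      /* next compressed-stream byte */

static void put(uint8_t b)
{
    window[dst] = b;
    dst = (dst + 1) & (WINDOW - 1);        /* wrap whenever the destination crosses the end */
}

void copy_literals(uint16_t n)             /* literal run: n bytes straight from the stream */
{
    while (n--) put(get_stream_byte());
}

void copy_match(uint16_t offset, uint16_t len)   /* back-reference of 'len' bytes, 'offset' back */
{
    uint16_t src = (dst - offset) & (WINDOW - 1);    /* wrap when calculating the match source */
    while (len--) {
        put(window[src]);
        src = (src + 1) & (WINDOW - 1);
    }
}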

Quote:
Both buffers are resizeable, and the track buffer is not actually a track buffer. The blocks are put there in the correct order, and a reader just needs to increment its pointer linearly, wait if there's a gap, and wrap around if the end of the buffer has been reached. I will do some tests anyway to see how large that buffer should be.
Interesting. Then how do you figure out which sectors are relevant ahead of time? I guess you could make an extra scanning pass to read all of the track/sector links, or assume the interleave is constant and make an educated guess, or... something.

Quote:
Decompression in advance is something that the application can do if it chooses to. The loader itself does decompression on demand and could easily report the amount of continuous input stream bytes remaining in the buffer, so the application can choose whether to have them decompressed now or later.
It just seemed surprisingly tricky to put everything together and make the damned thing work without the consumer and the decruncher stepping on each other's toes, especially when the consumer can interrupt the cruncher at any time (e.g. the buffer pointers had to be read and updated without race conditions).
But you're probably right. Different applications have different needs and can make different assumptions, and trying to write a generic loader system which handles it all is likely a bad idea. I just haven't thought much beyond the needs of my particular pet project ;)
2008-09-12 19:02
Krill

Registered: Apr 2002
Posts: 2980
doynax: Hmm, i certainly won't fix any compressor to work with streaming - i'll just see if i can use your stuff and skip it if not.

The loader does an extra scan of each track after seeking, so it knows all the links and thus which blocks to fetch and which index they have, meaning everything is known to put them in the right place. Without streaming, no extra buffers are needed this way. The overhead is ok compared to the gain, but i still need to implement that special format which will render that extra revolution obsolete at the cost of 2 data bytes per block.

About race conditions, does your solution work with interrupts? Because otherwise you'd have none of those, i guess. My loader normally doesn't use interrupts, unless you enable the non-blocking API which keeps polling and loading blocks in the background while the main thread does whatever it is supposed to.
2008-09-12 20:08
doynax
Account closed

Registered: Oct 2004
Posts: 212
Quoting Krill
About race conditions, does your solution work with interrupts? Because otherwise you'd have none of those, i guess. My loader normally doesn't use interrupts, unless you enable the non-blocking API which keeps polling and loading blocks in the background while the main thread does whatever it is supposed to.
Well, aren't interrupts pretty much mandatory when doing background decrunching? I mean you're going to want to spend whatever spare cycles are left in a frame fetching and decompressing sectors.

Not that it was all that bad really, but it's not the kind of thing that can be tacked on by the user afterwards either unless you've designed your loader for it.
2008-09-12 22:50
Jazzcat

Registered: Feb 2002
Posts: 1044
Here is an article Antitrack wrote for the Domination magazine (later published in Recollection).

http://www.atlantis-prophecy.org/recollection/?load=online&issu..


2008-09-13 12:43
Krill

Registered: Apr 2002
Posts: 2980
doynax:

My loader works like this (some of it at least in my head):
There's a get-n-bytes call to the loader to get a pointer to a consecutive chunk of n decompressed stream bytes. This chunk will be situated in the LZ77 back-reference look-up buffer, of course.
Now, that routine will call the decompressor (which holds its current state between calls). The decompressor then repeatedly calls a get-compressed-stream-byte routine to consume data from the compressed input stream.
This get-compressed-stream-byte routine gets the data from the "track buffer". But before returning the next stream byte, it checks if the drive has a block ready for transfer - if so (and the buffer has space for it), that block is fetched and put in the buffer. The drive will then take a while until the next block is ready for transfer. In the meantime, the next few bytes are returned, or - if a gap is reached - the call waits until the next needed block is ready, then fetches it, and then returns the next byte.

So this all works without interrupts. If you want to have a background thread doing all the decompression and loading work, just make a little threading environment yourself. That is, have context-switch interrupts once in a while and then switch registers, the flags and the stack pointer, and decide yourself on the fly how many cycles that loading/decompression thread will get.
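
Sketched in C for clarity (all names invented, the actual code is 6502), the call layering is roughly:

/* Very rough C sketch of the call layering described above (names invented). */
#include <stdint.h>

extern int      block_ready(void);            /* drive has a block waiting for transfer? */
extern int      buffer_has_room(void);
extern void     fetch_block(void);            /* pull the block into the "track buffer" */
extern int      buffer_empty(void);           /* gap: next needed block not in yet */
extern uint8_t  buffer_get(void);             /* next byte from the track buffer */

/* layer 3: compressed input stream, polls the drive on the side */
static uint8_t get_compressed_stream_byte(void)
{
    if (block_ready() && buffer_has_room())
        fetch_block();                        /* opportunistic fetch between stream bytes */
    while (buffer_empty())
        if (block_ready()) fetch_block();     /* hit a gap: wait for the needed block */
    return buffer_get();
}

/* layer 2: the decompressor; keeps its state (match position, run counts) between calls */
extern uint8_t *decompress_bytes(uint16_t n, uint8_t (*get)(void));

/* layer 1: what the application calls */
uint8_t *get_n_bytes(uint16_t n)
{
    /* returns a pointer to n consecutive decompressed bytes,
       situated in the LZ77 back-reference look-up buffer */
    return decompress_bytes(n, get_compressed_stream_byte);
}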
2008-09-13 13:44
HCL

Registered: Feb 2003
Posts: 728
I don't know.. This sounds completely fukkedup. But i suppose it's genious in some way :P. However, as long as all this doesn't render a demo in one or another way, it will stay fukkedup in my world ;).
2008-09-13 13:55
doynax
Account closed

Registered: Oct 2004
Posts: 212
The problem with that approach is that designing the LZ routine as a state machine where you request each individual byte would be quite inefficient. Instead I call the cruncher to unpack an entire page per invocation, and since stopping in the middle of a match or literal run would be rather inconvenient, it additionally completes the final sequence and spills up to 255 bytes into the next page (since that's the maximum run length).
As for fetching sectors, well there's just not much to it in my case. There's a single sector buffer which the cruncher can request to refill whenever it happens to run out, and of course the drive is intended to fetch the next sector while the previous one is being decompressed.

The thread scheduling policy is rather simple: whatever time is left after the game has done its thing is spent on streaming. This means that you don't need split stacks and the like; a simple frame IRQ to run the game while letting the cruncher work in the background will suffice. So at the start of each frame interrupt the game looks at the LZ write pointer to see if there are enough fresh bytes available before the read pointer to let the frame proceed, and if not it simply returns from the interrupt and awaits the next frame.
Similarly, the decompression loop must not overwrite whatever the game is trying to read, so before decrunching the next page it's got a busy-waiting loop checking whether there's enough space between the game's read pointer and the LZ write pointer to call the decrunching routine.
There are some other details involved here, like the need for some extra safety space to account for the aforementioned decruncher 'spill'. Also the write pointer has to be updated somewhat carefully to avoid race conditions (in fact, to simplify things, I'm only ever checking its high byte), while the cruncher, on the other hand, cannot preempt the game, so the read pointer is safe.
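
Boiled down to a C sketch (constants and names are just examples, not the actual code), the scheduling looks roughly like this:

/* C-ish sketch of the frame-IRQ scheduling described above (constants are examples). */
#include <stdint.h>

#define SPILL    256u     /* safety space for the decruncher spilling past a page */
#define NEEDED   512u     /* fresh bytes a frame must have available to proceed */

volatile uint8_t  wr_hi;  /* LZ write pointer; only its high byte is ever checked */
volatile uint16_t rd;     /* game's read pointer; only the game (IRQ side) updates it */

extern void run_game_frame(void);
extern void decrunch_one_page(void);                    /* updates wr_hi when a page is done */
extern uint16_t ring_dist(uint16_t from, uint16_t to);  /* bytes from 'from' up to 'to' in the ring */

/* frame IRQ: run the game only if the cruncher is far enough ahead */
void frame_irq(void)
{
    uint16_t wr = (uint16_t)wr_hi << 8;
    if (ring_dist(rd, wr) >= NEEDED)
        run_game_frame();
    /* else: just return from the interrupt and try again next frame */
}

/* background (main-line) loop: decrunch whenever there's room ahead of the game */
void stream_loop(void)
{
    for (;;) {
        uint16_t wr = (uint16_t)wr_hi << 8;
        while (ring_dist(wr, rd) < 256u + SPILL)
            ;                             /* busy-wait until a page plus spill fits */
        decrunch_one_page();
    }
}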


Quoting HCL
I don't know.. This sounds completely fukkedup. But i suppose it's genious in some way :P. However, as long as all this doesn't render a demo in one or another way, it will stay fukkedup in my world ;).
Hence the need for the streaming loader Krill is writing. It took me bloody ages to work everything out and get streaming to work effectively; if someone just wants to write (say) a vector part with precalculated vertices, they shouldn't have to go through all that crap.
2008-09-13 15:44
doynax
Account closed

Registered: Oct 2004
Posts: 212
Quoting Jazzcat
Here is an article Antitrack wrote for the Domination magazine (later published in Recollection).
That article seems to describe a classic hashing scheme to speed things up, pretty much what I'm doing in my cross-compressor actually. The downside of this approach is that the degenerate case of heavily repeating sequences (e.g. long RLE runs and the like) will slow things down to a crawl.
Not surprisingly, if you do some Googling you'll discover that there are many other clever techniques to speed up the longest-match search, as well as (non-greedy) methods to improve the match quality. In fact there are enough research papers on LZ compression to last anyone a lifetime.
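
For reference, the hash-chain idea sketched in C (not the article's code nor my compressor, just an illustration; constants are placeholders):

/* Sketch of a classic hash-chain match finder for a cross-compressor (illustrative). */
#include <stdint.h>
#include <string.h>

#define HASH_BITS 13
#define HASH_SIZE (1 << HASH_BITS)
#define MAX_DIST  4096                    /* ByteBoozer-ish window, just over 4k in reality */

static int32_t head[HASH_SIZE];           /* most recent position for each hash, -1 = none */
static int32_t chain[1 << 20];            /* previous position with the same hash (1 MB input cap here) */

void init_matcher(void) { memset(head, -1, sizeof head); }

static uint32_t hash3(const uint8_t *p)   /* hash of the next three bytes */
{
    return ((uint32_t)p[0] * 506832829u ^ (uint32_t)p[1] * 2654435761u ^ p[2]) >> (32 - HASH_BITS);
}

/* find the longest match for data[pos..]; caller ensures at least 3 bytes remain
   and that max_len does not run past the end of the input */
int find_match(const uint8_t *data, int32_t pos, int max_len, int32_t *match_pos)
{
    int best = 0;
    for (int32_t cand = head[hash3(data + pos)];
         cand >= 0 && pos - cand <= MAX_DIST;
         cand = chain[cand]) {             /* repetitive input -> very long chains -> crawl */
        int len = 0;
        while (len < max_len && data[cand + len] == data[pos + len]) len++;
        if (len > best) { best = len; *match_pos = cand; }
    }
    return best;
}

void insert_pos(const uint8_t *data, int32_t pos)   /* call this for every encoded position */
{
    uint32_t h = hash3(data + pos);
    chain[pos] = head[h];
    head[h] = pos;
}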

I'm tempted to see if these techniques can be put to use in a native C64 packer, if for no other reason than to get to implement some non-trivial algorithms in 6502 code for once. The really neat thing about ByteBoozer here is that the maximum match distance is just over 4k so there's enough memory available to do all kinds of advanced stuff and compress arbitrarily long files without involving an REU.
The only question is whether anyone actually still uses them or if it'd just be an academic exercise?

edit: Is it just me or does CSDB have issues with displaying links within quotes?
2008-09-14 21:59
Krill

Registered: Apr 2002
Posts: 2980
doynax: I guess it's totally possible to optimize state-machine LZ decompressors like the one i use by having them call a get-n-bytes routine as well when it comes to literal runs or some such, thus avoiding the heavy overhead you mentioned - and that's what i'll try.
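
Something along these lines, sketched in C (names made up):

/* C sketch of the idea above: the state machine stays byte-oriented for matches,
   but literal runs are pulled from the compressed stream in bulk. */
#include <stdint.h>

extern void emit(const uint8_t *src, uint16_t n);      /* into the back-reference window */

/* hands back a pointer to up to 'max' consecutive input bytes (waiting for a block
   if needed) and returns how many it could provide in one go */
extern uint16_t get_compressed_stream_bytes(const uint8_t **p, uint16_t max);

void handle_literal_run(uint16_t len)
{
    while (len) {
        const uint8_t *p;
        uint16_t got = get_compressed_stream_bytes(&p, len);  /* one call per chunk, not per byte */
        emit(p, got);
        len -= got;
    }
}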

The biggest downside is that polling the drive for new blocks is needed, since there is no way for the drive to trigger c64-side interrupts, unless it's a 1581 or similar. :)