| |
Krill
Registered: Apr 2002 Posts: 2980 |
LZW compression on C64
Hello,
I'm looking for a tool that LZW-compresses data and comes with C64 decompression code. I guess nothing like that exists yet, but I'm asking anyway.
LZW works with a dictionary that is used to look up uncompressed chunks of data and insert them in place of their respective codes in the output, unlike LZ77, which works with back-references into already-decompressed data, as is commonly done on the C64.
This sort of compression is usable for streaming compressed data through the C64, and hasn't been used up to its potential on the C64 yet, afaik. |
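To illustrate the difference, a rough C sketch (made-up names, not taken from any existing C64 code):

#include <stdint.h>
#include <string.h>

/* LZ77-style step: a back-reference copies bytes that are already present
   in the decompressed output (the "window"). */
static void lz77_copy(uint8_t *out, size_t *pos, size_t distance, size_t length)
{
    while (length--) {
        out[*pos] = out[*pos - distance];  /* byte-wise, so overlapping copies work */
        ++*pos;
    }
}

/* LZW-style step: a code indexes an entry in a dictionary that is built up
   during decoding; the entry's bytes are appended to the output. */
struct lzw_entry { const uint8_t *bytes; size_t length; };

static void lzw_emit(uint8_t *out, size_t *pos,
                     const struct lzw_entry *dict, unsigned code)
{
    memcpy(out + *pos, dict[code].bytes, dict[code].length);
    *pos += dict[code].length;
}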
|
| |
TNT Account closed
Registered: Oct 2004 Posts: 189 |
The first thing I think of when I hear LZW is GIF, so check out C64/Apple GIF viewers for decompression source. Finding a compatible PC compression routine shouldn't be too hard :)
Searching for a 6502 LZW implementation is made harder by all those pages mixing up LZW and LZ77 variants. |
| |
doynax Account closed
Registered: Oct 2004 Posts: 212 |
If you've got sufficient space for a dictionary then why not use it directly as a traditional LZ decompression window? That way you also get a nice streaming buffer to cover fluctuating read/consumption rates.
At any rate, I wrote a streaming LZ compressor for my game, essentially an optimized version of ByteBoozer, if you're interested.
Not to say that LZW isn't worth experimenting with, for one thing it's bound to get better compression ratios on some files, but I fail to see why it would be better for streaming. |
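Roughly what I mean, as a C sketch (hypothetical names; on the C64 it'd be 6502 with a power-of-two window so the wrap is just a mask or a high-byte fixup):

#include <stdint.h>
#include <stddef.h>

#define WINDOW_SIZE 4096u              /* power of two; doubles as the LZ window */

static uint8_t window[WINDOW_SIZE];    /* decruncher output = streaming buffer   */
static size_t  write_pos;              /* advanced by the decruncher             */
static size_t  read_pos;               /* advanced by the consumer               */

/* The consumer never copies anything: it reads straight out of the window,
   trailing the decruncher by at most WINDOW_SIZE bytes. */
static size_t bytes_available(void)
{
    return write_pos - read_pos;
}

static uint8_t consume_byte(void)
{
    return window[read_pos++ & (WINDOW_SIZE - 1)];
}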
| |
Moloch
Registered: Jan 2002 Posts: 2928 |
C=Hacking #6 (basic compress/decompress)
and I'm assuming you've looked at the various native C64/C128 versions of zip/gzip?
|
| |
HCL
Registered: Feb 2003 Posts: 728 |
I don't have it. Lucky you, you may get the chance to code it yourself :). |
| |
Krill
Registered: Apr 2002 Posts: 2980 |
doynax: Oh, you're right of course. Thanks for pointing that out.
I was thinking about LZW because it would save one extra copy step for the application I have in mind, but then I realized the overhead is the same for both solutions. I'll just go for the streaming Exomizer decompressor then.
Btw., there'll be two buffers then: One ringbuffer for the compressed data, which decouples fetching blocks from the drive from actually reading them into the decompressor's input stream (with my loader, they come out of order as they pass the r/w head - for speed reasons, of course), and another ringbuffer for the decompressed data which will then be used for the LZ77 back references.
The decompressed data will also be used and copied around frantically by the actual application, and so can't be used for back-referencing. |
| |
doynax Account closed
Registered: Oct 2004 Posts: 212 |
Quoting Krill: I'll just go for the streaming Exomizer decompressor then.
I'd reconsider using Exomizer if I were you. That decruncher seems to use any and all tricks in the book to improve the compression rate and keep the code footprint small, at the cost of performance. I'm getting roughly a 2-3x speedup over Exomizer with my decruncher for a 1-2% worse compression ratio (though YMMV).
Not to say you should necessarily use that particular decruncher, for one thing it just wasn't written to handle multiple sector buffers, but it's definitely worth looking into faster alternatives if you intend to get any mileage out of that fancy out-of-order reader you've got.
Quoting Krill: Btw., there'll be two buffers then: One ringbuffer for the compressed data, which decouples fetching blocks from the drive from actually reading them into the decompressor's input stream (with my loader, they come out of order as they pass the r/w head - for speed reasons, of course), and another ringbuffer for the decompressed data which will then be used for the LZ77 back references.
Care to elaborate on just how that interleave-agnostic loader thing works together with decompression? Short of buffering an entire track I don't quite see how it'd work, and even then I imagine things like switching tracks and not stepping on any toes might get tricky.
Oh, and don't forget that you probably want a sizable LZ window for decent results; it depends on the particular files you're compressing of course, but I'd say 2-4k is reasonable. And the less predictable the read rate is, the larger your buffer needs to be to cover for it.. |
| |
Krill
Registered: Apr 2002 Posts: 2980 |
doynax: I'll just implement both and then see, after all there are a lot of non-stream decompressors to choose from already in my loader, and more options are always better.
What do you mean by "multiple sector buffers"? The decompressor just calls a get-next-byte routine, and will be fed the data in the correct order.
And yes, my plan is to have a sector buffer the size of two tracks, so the loader can happily read the next track while the stream feeder goes through the current track's blocks and eventually waits when a block is still missing. At this point I don't see any tricky situations to worry about except for deciding if/when to reject a block to be put in place of a yet-unread one - in favour of a possible next block with less overall overhead.
The decompression look-up buffer is easily resizable and has no constraints regarding the other buffer's size. I don't see a connection to the reading speed though, since the other buffer is already there for that and hands out the next input stream bytes as soon as possible. |
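In rough C terms (purely a sketch, names made up), the feeder side would be something like:

#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE  254u                   /* payload bytes per sector          */
#define SLOT_COUNT  42u                    /* two tracks at up to 21 sectors    */

static uint8_t buffer[SLOT_COUNT * BLOCK_SIZE];
static uint8_t present[SLOT_COUNT];        /* set once a slot has been filled   */
static size_t  read_pos;                   /* linear position in the stream     */

static void poll_drive(void)
{
    /* if the drive has a block ready and its slot is free, transfer it and
       call store_block() with the block's index within the stream */
}

/* Blocks arrive out of order; their stream index is known from the links. */
static void store_block(unsigned index, const uint8_t *payload)
{
    memcpy(buffer + (index % SLOT_COUNT) * BLOCK_SIZE, payload, BLOCK_SIZE);
    present[index % SLOT_COUNT] = 1;
}

/* The stream feeder just walks linearly and stalls at gaps. */
static uint8_t next_stream_byte(void)
{
    unsigned slot = (unsigned)(read_pos / BLOCK_SIZE) % SLOT_COUNT;
    while (!present[slot])
        poll_drive();                      /* wait for the missing block        */
    uint8_t b = buffer[read_pos % (SLOT_COUNT * BLOCK_SIZE)];
    if (++read_pos % BLOCK_SIZE == 0)
        present[slot] = 0;                 /* slot may be reused for a new block */
    return b;
}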
| |
doynax Account closed
Registered: Oct 2004 Posts: 212 |
Quoting Krill: I'll just implement both and then see, after all there are a lot of non-stream decompressors to choose from already in my loader, and more options are always better.
Knock yourself out (download). It should be more-or-less self-explanatory..
Quote: What do you mean by "multiple sector buffers"? The decompressor just calls a get-next-byte routine, and will be fed the data in the correct order.
It's inlined in a couple of places but yeah it's not really a problem, just patch up some load addresses when moving on to the next sector. I guess I hadn't thought it through ;)
Quote: And yes, my plan is to have a sector buffer the size of two tracks, so the loader can happily read the next track while the stream feeder goes through the current track's blocks and eventually waits when a block is still missing. At this point I don't see any tricky situations to worry about except for deciding if/when to reject a block to be put in place of a yet-unread one - in favour of a possible next block with less overall overhead.
But that's like ten and a half kilobytes of buffers!
With Exomizer I'm up to an interleave of 11 or so when doing full-time loading over a blanked screen. Even with the worst possible interleave you'd still have to spend like half the frame loading to stand a chance of gaining anything by it.
Then again if you've got the memory to spare then why not, right? ;)
Quote: The decompression look-up buffer is easily resizable and has no constraints regarding the other buffer's size. I don't see a connection to the reading speed though, since the other buffer is already there for that and hands out the next input stream bytes as soon as possible.
Yeah, you seem to be covering things with the track buffers instead.
It still applies to just the decompression by itself though. That is, by decompressing in advance (and in the background) rather than on demand as you consume the data, you can balance things out a bit so that the loading/crunching is done mostly during otherwise idle frames.
Perhaps that's of limited use for a demo part but in a game it means that you don't have to reserve N cycles each frame for streaming. |
| |
HCL
Registered: Feb 2003 Posts: 728 |
..and remember, it's based on ByteBoozer :D. |
| |
Krill
Registered: Apr 2002 Posts: 2980 |
doynax: Thanks for the source. I guess the vanilla ByteBoozer decompressor is easier to change for my purposes, though. :)
Both buffers are resizable, and the track buffer is not actually a track buffer. The blocks are put there in the correct order, and a reader just needs to increment its pointer linearly, wait if there's a gap, and wrap around if the end of the buffer has been reached. I will do some tests anyway to see how large that buffer should be.
Decompression in advance is something that the application can do if it chooses to. The loader itself does decompression on demand and could easily report the amount of continuous input stream bytes remaining in the buffer, so the application can choose whether to have them decompressed now or later. |
| |
doynax Account closed
Registered: Oct 2004 Posts: 212 |
Quoting Krill: Thanks for the source. I guess the vanilla ByteBoozer decompressor is easier to change for my purposes, though. :)
You may be right.
I figure you'll want to tweak the compressor a bit so matches don't reach outside the window, as well as make sure that neither individual matches nor literal runs cross the window boundary. Then add wrapping logic wherever the destination pointer crosses a page, as well as where the match source pointer is calculated.
This version does this kind of thing already and will hopefully compress better and decompress faster than the original ByteBoozer package. The code is quite a bit larger though and leaves a lot to be desired when it comes to testing and documentation.
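In C terms the decruncher-side wrapping boils down to something like this (a sketch assuming a power-of-two window; the 6502 version fixes up the pointer high-bytes per page instead of masking):

#include <stdint.h>
#include <stddef.h>

#define WINDOW_SIZE 4096u                   /* must be a power of two here       */

/* Copy a match whose source may lie behind the wrap point of the window.
   The compressor must guarantee that distance never exceeds WINDOW_SIZE,
   i.e. matches only reference data still present in the window. */
static void copy_match(uint8_t window[], size_t *dst, size_t distance, size_t length)
{
    size_t src = (*dst - distance) & (WINDOW_SIZE - 1);   /* wrap the source   */
    while (length--) {
        window[*dst] = window[src];
        *dst = (*dst + 1) & (WINDOW_SIZE - 1);            /* wrap the dest     */
        src  = (src + 1) & (WINDOW_SIZE - 1);
    }
}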
Quote: Both buffers are resizable, and the track buffer is not actually a track buffer. The blocks are put there in the correct order, and a reader just needs to increment its pointer linearly, wait if there's a gap, and wrap around if the end of the buffer has been reached. I will do some tests anyway to see how large that buffer should be.
Interesting. Then how do you figure out which sectors are relevant ahead of time? I guess you could make an extra scanning pass to read all of the track/sector links, or assume the interleave is constant and make an educated guess, or... something.
Quote: Decompression in advance is something that the application can do if it chooses to. The loader itself does decompression on demand and could easily report the amount of continuous input stream bytes remaining in the buffer, so the application can choose whether to have them decompressed now or later.
It just seemed surprisingly tricky to put everything together and make the damned thing work without the consumer and the decruncher stepping on each other's toes, especially when the consumer can interrupt the cruncher at any time (e.g. the buffer pointers had to be read and updated without race conditions).
But you're probably right. Different applications have different needs and can make different assumptions, and trying to write a generic loader system which handles it all is likely a bad idea. I just haven't thought much beyond the needs of my particular pet project ;) |
| |
Krill
Registered: Apr 2002 Posts: 2980 |
doynax: Hmm, I certainly won't fix any compressor to work with streaming - I'll just see if I can use your stuff and skip it if not.
The loader does an extra scan of each track after seeking, so it knows all the links, and thus which blocks to fetch and which index each of them has, so everything is known to put them in the right place. Without streaming, no extra buffers are needed this way. The overhead is OK compared to the gain, but I still need to implement that special format which will render that extra revolution obsolete at the cost of 2 data bytes per block.
About race conditions, does your solution work with interrupts? Because otherwise you'd have none of those, I guess. My loader normally doesn't use interrupts, unless you enable the non-blocking API, which keeps polling and loading blocks in the background while the main thread does whatever it is supposed to. |
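The link scan amounts to something like this (a C sketch with invented names; the real thing runs in the drive, of course):

#include <stdint.h>

/* The first two bytes of each sector link to the next track/sector of the
   file. After one revolution spent collecting just these link bytes, the
   loader knows which blocks on the track belong to the file and at which
   position in the stream each of them goes. */
struct link { uint8_t track, sector; };

static void index_track(const struct link links[21], uint8_t entry_sector,
                        uint8_t current_track, uint8_t index_of[21])
{
    uint8_t s = entry_sector;
    for (uint8_t i = 0; ; ++i) {
        index_of[s] = i;                    /* stream position of this block   */
        if (links[s].track != current_track)
            break;                          /* chain leaves the current track  */
        s = links[s].sector;                /* follow the chain                */
    }
}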
| |
doynax Account closed
Registered: Oct 2004 Posts: 212 |
Quoting Krill: About race conditions, does your solution work with interrupts? Because otherwise you'd have none of those, I guess. My loader normally doesn't use interrupts, unless you enable the non-blocking API, which keeps polling and loading blocks in the background while the main thread does whatever it is supposed to.
Well, aren't interrupts pretty much mandatory when doing background decrunching? I mean, you're going to want to spend whatever spare cycles are left in a frame fetching and decompressing sectors.
Not that it was all that bad really, but it's not the kind of thing that can be tacked on by the user afterwards either unless you've designed your loader for it. |
| |
Jazzcat
Registered: Feb 2002 Posts: 1044 |
Here is an article Antitrack wrote for the Domination magazine (later published in Recollection).
http://www.atlantis-prophecy.org/recollection/?load=online&issu..
|
| |
Krill
Registered: Apr 2002 Posts: 2980 |
doynax:
My loader works like this (parts of it only in my head so far):
There's a get-n-bytes call to the loader to get a pointer to a consecutive chunk of n decompressed stream bytes. This chunk will be situated in the LZ77 back-reference look-up buffer, of course.
Now, that routine will call the decompressor (which holds its current state between calls). The decompressor then repeatedly calls a get-compressed-stream-byte routine to consume data from the compressed input stream.
This get-compressed-stream-byte routine gets the data from the "track buffer". But before returning the next stream byte, it checks if the drive has a block ready for transfer - if so (and the buffer has space for it), that block is fetched and put into the buffer. The drive will then take a while until the next block is ready for transfer. In the meantime, the next few bytes are returned, or - if a gap is reached - the call waits until the next needed block is ready, then fetches it, and then returns the next byte.
So this all works without interrupts. If you want to have a background thread doing all the decompression and loading work, just make a little threading environment yourself. That is, have context-switch interrupts once in a while and then swap the registers, flags and stack pointer, and decide on the fly how many cycles that loading/decompression thread will get.
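In C-like pseudocode (hypothetical names; the real routine is 6502, of course), that inner routine looks roughly like this:

#include <stdint.h>
#include <stddef.h>

/* Hypothetical primitives provided by the loader and the track buffer. */
extern int     drive_block_ready(void);          /* poll: block waiting?        */
extern int     buffer_has_space(void);           /* room for another block?     */
extern void    fetch_block_into_buffer(void);    /* transfer + place by index   */
extern int     buffer_has_byte_at(size_t pos);   /* gap check                   */
extern uint8_t buffer_byte_at(size_t pos);

static size_t stream_pos;                        /* next compressed stream byte */

static uint8_t get_compressed_stream_byte(void)
{
    /* Opportunistically pick up a finished block before returning data,
       so the drive can start reading the next one in the background. */
    if (drive_block_ready() && buffer_has_space())
        fetch_block_into_buffer();

    /* If we ran into a gap (needed block not transferred yet), wait for it. */
    while (!buffer_has_byte_at(stream_pos)) {
        while (!drive_block_ready())
            ;                                    /* drive still busy reading    */
        fetch_block_into_buffer();
    }
    return buffer_byte_at(stream_pos++);
}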
|
| |
HCL
Registered: Feb 2003 Posts: 728 |
I don't know.. This sounds completely fukkedup. But I suppose it's genius in some way :P. However, as long as all this doesn't render a demo in one way or another, it will stay fukkedup in my world ;). |
| |
doynax Account closed
Registered: Oct 2004 Posts: 212 |
The problem with that approach is that designing the LZ routine as a state machine where you request each individual byte would be quite inefficient. Instead I call the cruncher to unpack an entire page per invocation, but also, since stopping in the middle of a match or literal run would be rather inconvenient, it will additionally complete the final sequence and spill up to 255 bytes into the next page (since that's the maximum run length).
As for fetching sectors, well there's just not much to it in my case. There's a single sector buffer which the cruncher can request to refill whenever it happens to run out, and of course the drive is intended to fetch the next sector while the previous one is being decompressed.
The thread scheduling policy is rather simple: whatever time is left after the game has done its thing is spent on streaming. This means that you don't need split stacks and the like; a simple frame IRQ to run the game while letting the cruncher work in the background will suffice. So at the start of each frame interrupt the game looks at the LZ write pointer to see if there are enough fresh bytes available before the read pointer to let the frame proceed, and if not it simply returns from the interrupt and awaits the next frame.
Similarly, the decompression loop must not overwrite whatever the game is trying to read, so before fetching the next page it's got a busy-waiting loop checking whether there's enough space between the game's read pointer and the LZ write pointer to call the decrunching routine.
There are some other details involved here, like the need for some extra safety space to account for the aforementioned decruncher 'spill'. Also, the write pointer has to be updated somewhat carefully to avoid race conditions (in fact, to simplify things I'm only ever checking its high byte), while on the other hand the cruncher cannot preempt the game, so the read pointer is safe.
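In outline (a C sketch with made-up names; the actual code is 6502, and the commented-out calls stand for the game and the decruncher):

#include <stdint.h>

#define WINDOW_SIZE 4096u
#define SPILL        256u     /* the decruncher may overshoot by up to one run */

static volatile uint16_t write_pos;   /* only its high byte is ever compared   */
static uint16_t          read_pos;    /* only touched by the game/consumer     */

/* Frame interrupt: run the game only if enough fresh data has been
   decompressed ahead of the read pointer; otherwise skip this frame. */
static void frame_irq(uint16_t bytes_needed_this_frame)
{
    if ((uint16_t)(write_pos - read_pos) >= bytes_needed_this_frame) {
        /* run_game_frame();  - advances read_pos as it consumes data */
    }
}

/* Background loop: decrunch a page at a time, but only when doing so (plus
   the possible spill) cannot overwrite data the game hasn't read yet. */
static void stream_background(void)
{
    for (;;) {
        while (WINDOW_SIZE - (uint16_t)(write_pos - read_pos) < 256u + SPILL)
            ;                 /* busy-wait until the game has consumed enough  */
        /* decrunch_page();   - advances write_pos by at least 256 bytes */
    }
}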
Quoting HCL: I don't know.. This sounds completely fukkedup. But I suppose it's genius in some way :P. However, as long as all this doesn't render a demo in one way or another, it will stay fukkedup in my world ;).
Hence the need for the streaming loader Krill is writing. It took me bloody ages to work everything out and get streaming to work effectively; if someone just wants to write (say) a vector part with precalculated vertices they shouldn't have to go through all that crap. |
| |
doynax Account closed
Registered: Oct 2004 Posts: 212 |
Quoting Jazzcat: Here is an article Antitrack wrote for the Domination magazine (later published in Recollection).
That article seems to describe a classic hashing scheme to speed things up, pretty much what I'm doing in my cross-compressor actually. The downside of this approach is that the degenerate case of heavily repeating sequences (e.g. long RLE runs and the like) will slow things down to a crawl.
Not surprisingly, if you do some Googling you'll discover that there are many other clever techniques to speed up the longest-match search, as well as (non-greedy) methods to improve the match quality. In fact there are enough research papers on LZ compression to last anyone a lifetime.
I'm tempted to see if these techniques can be put to use in a native C64 packer, if for no other reason than to get to implement some non-trivial algorithms in 6502 code for once. The really neat thing about ByteBoozer here is that the maximum match distance is just over 4k so there's enough memory available to do all kinds of advanced stuff and compress arbitrarily long files without involving an REU.
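For reference, the usual hash-chain scheme looks roughly like this in C (a sketch; those chains are exactly what explodes in the degenerate RLE case):

#include <stdint.h>
#include <string.h>

#define HASH_BITS 12
#define HASH_SIZE (1 << HASH_BITS)
#define WINDOW    4096                /* ByteBoozer-ish maximum match distance */
#define MIN_MATCH 3

static int32_t head[HASH_SIZE];      /* most recent position for each hash    */
static int32_t prev[WINDOW];         /* previous position with the same hash  */

static void init_chains(void)
{
    memset(head, 0xff, sizeof head); /* -1: empty chain */
}

static unsigned hash3(const uint8_t *p)
{
    return ((p[0] << 6) ^ (p[1] << 3) ^ p[2]) & (HASH_SIZE - 1);
}

/* Longest earlier match for data[pos..end), searching only positions that
   share the same 3-byte hash instead of the whole window. */
static int find_match(const uint8_t *data, int32_t pos, int32_t end, int32_t *best_dist)
{
    int best_len = 0;
    for (int32_t p = head[hash3(data + pos)];
         p >= 0 && pos - p <= WINDOW;
         p = prev[p % WINDOW]) {
        int len = 0;
        while (pos + len < end && data[p + len] == data[pos + len])
            ++len;
        if (len > best_len) {
            best_len = len;
            *best_dist = pos - p;
        }
    }
    return best_len >= MIN_MATCH ? best_len : 0;
}

/* Every encoded position (whether covered by a literal or a match) gets
   inserted so later positions can find it. */
static void insert_position(const uint8_t *data, int32_t pos)
{
    unsigned h = hash3(data + pos);
    prev[pos % WINDOW] = head[h];
    head[h] = pos;
}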
The only question is whether anyone actually still uses them, or if it'd be a purely academic exercise?
edit: Is it just me or does CSDB have issues with displaying links within quotes? |
| |
Krill
Registered: Apr 2002 Posts: 2980 |
doynax: I guess it's totally possible to optimize state-machine LZ decompressors like the one I use by having them call a get-n-bytes routine as well when it comes to literal runs or some such, and thus avoid the heavy overhead you mentioned - and that's what I'll try.
The biggest downside is that polling the drive for new blocks is needed, since there is no way for the drive to trigger C64-side interrupts, unless it's a 1581 or similar. :)