cadaver
Registered: Feb 2002 Posts: 1160
Exomizer on-the-fly loading/decompressing
Hey,
Anyone want to share what's the lowest disk interleave you've managed to use with on-the-fly Exomizer decompression while loading?
I'm currently at 11, using 2-bit transfer and a lame drivecode (jobcodes only) + 1 sector buffering. However, I don't think the drivecode is the problem; if I try to decrease to IL 10, the C64 often doesn't have to wait for the drive at all for the next sector's data, but occasionally the depack takes too long, resulting in a missed revolution.
I've already done some optimization to the depack routine, including inlining the single-bit fetch (the literal/sequence decision, and reading the gamma code).
Don't think I would switch to another packer just for speed, but nevertheless interested in any battle stories.
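For anyone who wants to put rough numbers on that budget, here's a quick back-of-the-envelope (assuming the usual 300 rpm, i.e. 200 ms per revolution, and the standard 1541 sectors-per-track zones; it ignores sector gaps, GCR decoding and head stepping, so treat it as order-of-magnitude only):

REV_MS = 200.0     # one revolution at 300 rpm

ZONES = [          # (first track, last track, sectors per track)
    (1, 17, 21),
    (18, 24, 19),
    (25, 30, 18),
    (31, 35, 17),
]

def sectors_on_track(track):
    for lo, hi, count in ZONES:
        if lo <= track <= hi:
            return count
    raise ValueError("no such track on a 35-track disk")

def budget_ms(track, interleave):
    """Approximate time until the next sector of the file passes the head."""
    return REV_MS * interleave / sectors_on_track(track)

for il in (10, 11):
    for track in (1, 18, 25, 31):
        print(f"track {track:2d}, interleave {il}: {budget_ms(track, il):5.1f} ms per sector")

On the 21-sector tracks, going from interleave 11 to 10 shrinks the per-sector window from roughly 105 ms to 95 ms, and that window is what the transfer plus depack of one block has to fit into to avoid a missed revolution.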
cadaver
Registered: Feb 2002 Posts: 1160
Ok. Got it down to 10, not 100% reliably but reliably enough to result in a definite speed increase compared to 11.
Final changes were further inlining in the gamma/decision handling (no jsr to a "refill bitbuf" routine; instead jsr directly to getbyte) and not saving the X register except when needed. I'm also purposefully disallowing literal sequences, since at least for my use case they don't result in any actual disk blocks being saved.
For those interested, the current loader code (may not be useful for general use): https://github.com/cadaver/hessian/blob/master/loader.s
(I'm checking for a sprite Y-coordinate range in the sector transfer; that could simply be removed to save approx. 40 lines if sprites were always off.)
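To get a ballpark of what the inlining buys, a quick calculation (the bit-read count per packed byte is a pure assumption here, the real figure depends entirely on the data; the jsr/rts cost is the usual 6+6 cycles):

JSR_RTS_CYCLES = 12              # 6 (jsr) + 6 (rts) on the 6502
PACKED_BYTES_PER_SECTOR = 254    # payload bytes in a standard chained block
BIT_READS_PER_PACKED_BYTE = 6    # assumed average, NOT a measured figure
PAL_CLOCK = 985248               # C64 PAL cycles per second

saved = JSR_RTS_CYCLES * PACKED_BYTES_PER_SECTOR * BIT_READS_PER_PACKED_BYTE
print(f"~{saved} cycles saved per sector, ~{saved / PAL_CLOCK * 1000:.1f} ms at PAL clock")

Depending on the data this lands anywhere from a few ms per sector upwards, which is the same order of magnitude as the ~10 ms that the step from interleave 11 to 10 removes from the per-sector budget on the fastest tracks.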
Oswald
Registered: Apr 2002 Posts: 5094
Can't help, but I hope this means a cool cadaver game :)
tlr
Registered: Sep 2003 Posts: 1790
Perhaps you should try limiting copy lengths to 256 with -M256 as well. Then you can rewrite the copy loop and possibly even the bits/base decoder to optimize for that. Shouldn't reduce compression much.
cadaver
Registered: Feb 2002 Posts: 1160
Actually I just did that :)
Though it seemed that it would still generate longer RLE sequences, so I patched Exomizer2 a bit to honor the -M parameters also for RLE.
tlr
Registered: Sep 2003 Posts: 1790
So did you get any gain from doing that? It's going to generate a few more primary units, so perhaps the increased number of bits used for encoding those eats up the gain in the loop?
ChristopherJam
Registered: Aug 2004 Posts: 1409
Surely the 'perfect' interleave would depend on the data being decompressed? I'd expect a bunch of 'copy substrings' would consume a lot more cycles per byte of input than a chunk of literals.
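To illustrate, a toy calculation with completely invented cycle costs (not taken from Exomizer or any real depacker), just to show how much the per-block depack time can swing with the literal/copy mix:

CYCLES_LITERAL_PER_INPUT_BYTE = 60   # assumed: decision bit + store one byte
CYCLES_COPY_UNIT = 400               # assumed: decode offset/length + copy loop
INPUT_BYTES_PER_COPY_UNIT = 2.5      # assumed average packed size of a copy unit

PACKED_BYTES = 254                   # payload of one block
PAL_CLOCK = 985248                   # C64 PAL cycles per second

def depack_ms(copy_fraction):
    """Estimated ms to consume one block, given the fraction of packed
    bytes that belong to copy units (the rest are literal bytes)."""
    copy_bytes = PACKED_BYTES * copy_fraction
    literal_bytes = PACKED_BYTES - copy_bytes
    cycles = (literal_bytes * CYCLES_LITERAL_PER_INPUT_BYTE
              + (copy_bytes / INPUT_BYTES_PER_COPY_UNIT) * CYCLES_COPY_UNIT)
    return cycles / PAL_CLOCK * 1000

for frac in (0.0, 0.5, 1.0):
    print(f"{int(frac * 100):3d}% copy units: ~{depack_ms(frac):.1f} ms per block")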
lft
Registered: Jul 2007 Posts: 369
Next up: Make a cruncher that simulates disk access times, and weighs that into the cost function that determines when to produce a copy or a literal unit. Too bad access times aren't really predictable.
cadaver
Registered: Feb 2002 Posts: 1160
tlr: I first checked how often it was generating those longer sequences; in my case it wasn't often (a few times in the longer files).
The changes have resulted in data that is a few bytes longer here and there, but at least not in any more blocks being used.
I haven't been able to bump the interleave down from 10, but the optimizations have allowed a larger "safety factor", i.e. in-game loading will be slower as there are raster IRQs and possibly sprites on, and having a faster decompressor reduces the chances of missed-revolution hiccups.
EDIT: lft: rather than crippling the compressed output for speed, it should be possible to build the disk image dynamically, emulating the CPU during loading/depacking to guarantee optimal interleave. Don't think I'll be going there, but it's a possibility.
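A very rough sketch of what that dynamic layout could look like, purely hypothetical; the per-block depack times would have to come from emulating the actual load, and the read/transfer/depack phases are serialised here for simplicity, whereas a real loader overlaps the drive-side read with depacking:

REV_MS = 200.0            # one revolution at 300 rpm
SECTORS = 21              # sectors on tracks 1-17
SECTOR_MS = REV_MS / SECTORS

def place_blocks(depack_ms, transfer_ms=6.0):
    """Greedy layout: give each block the free sector that comes under the
    head first after the previous block has been transferred and depacked."""
    free = set(range(SECTORS))
    t = 0.0                        # running time in ms
    layout = []                    # (logical block, physical sector)
    for i, d in enumerate(depack_ms):
        # pick the free sector with the shortest wait from the current moment
        best = min(free, key=lambda s: (s * SECTOR_MS - t) % REV_MS)
        wait = (best * SECTOR_MS - t) % REV_MS
        t += wait + transfer_ms + d
        free.remove(best)
        layout.append((i, best))
    return layout

# fake per-block depack times, stand-ins for emulator measurements
print(place_blocks([20.0, 35.0, 15.0, 40.0, 25.0]))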
Krill
Registered: Apr 2002 Posts: 2980
I'm with ChristopherJam here, and I don't see an actual benefit unless you really have constant depacking time on every block. That is pretty unlikely to be achieved, except maybe with prohibitive degradation of overall performance.
The goal is the shortest combined loading and depacking time, and given that you want to download a block from the drive just when it is ready, you do that and prioritise this over depacking. Now, it may happen that time is wasted when the blocks arrive in a non-linear order, such that the depacker is doing nothing between new blocks arriving, until the next block in the linear packed data stream is ready.
However, it is fairly simple to have an interleave that makes all blocks arrive in linear order (even when the loader/depacker supports out-of-order loading for the occasional hiccup). Then you can depack as much as possible until the next block is ready, and after downloading it from the drive, go on depacking where you left off. No time is wasted, and the rest of the file will be depacked when all blocks have been loaded.
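In miniature, that scenario looks something like this (all per-block figures are invented placeholders; the model also assumes the drive can always buffer the next block, so missed revolutions aren't modelled):

REV_MS = 200.0            # one revolution at 300 rpm
SECTORS = 19              # e.g. tracks 18-24
SECTOR_MS = REV_MS / SECTORS
INTERLEAVE = 10

def simulate(depack_ms, transfer_ms=6.0):
    """Blocks arrive every INTERLEAVE sectors; the C64 either waits for the
    drive (idle) or carries leftover depacking into the next slot (backlog)."""
    gap = INTERLEAVE * SECTOR_MS
    idle = 0.0
    backlog = 0.0
    for d in depack_ms:
        work = transfer_ms + d + backlog
        if work <= gap:
            idle += gap - work     # depacker ran dry before the next block
            backlog = 0.0
        else:
            backlog = work - gap   # keep depacking after fetching the next block
    return idle, backlog

idle, backlog = simulate([20.0, 35.0, 90.0, 15.0, 25.0])   # fake figures
print(f"waited for the drive: {idle:.1f} ms, depacking left at the end: {backlog:.1f} ms")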
Krill
Registered: Apr 2002 Posts: 2980
lft, cadaver: IMHO, the biggest problem when factoring in actual run-time data to optimise combined loading and depacking is that you have at least one big source of error: the inter-track skew.
As conventional tools to transfer disk images to physical disks (just like most copy programs) do not line up the blocks on every track to the same respective sectors (blocks and sectors are different things here, as the 1541 does not have an index hole sensor), individual copies will start reading a different block when going from one track to the next.
Thus, you will see different timings from one disk to the next, even with the same drive.
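A tiny illustration of that variance, with made-up numbers; the only point is that the wait after a track step can be anything from almost nothing to a full revolution, differently on every copy:

import random

REV_MS = 200.0        # one revolution at 300 rpm
SECTORS = 19
SECTOR_MS = REV_MS / SECTORS
STEP_MS = 20.0        # assumed head-step plus settle time

def wait_for_sector(wanted, skew_ms):
    """Time from starting the track step until 'wanted' passes the head,
    for one particular copy's track-to-track skew."""
    arrival = (wanted * SECTOR_MS + skew_ms) % REV_MS
    return STEP_MS + (arrival - STEP_MS) % REV_MS

random.seed(1)
print([f"{wait_for_sector(0, random.uniform(0.0, REV_MS)):.1f} ms" for _ in range(5)])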