cadaver
Registered: Feb 2002 Posts: 1160
Exomizer on-the-fly loading/decompressing
Hey,
Anyone want to share the lowest disk interleave you've managed to use with on-the-fly Exomizer decompression while loading?
I'm currently at 11, using 2-bit transfer and a lame drivecode (using jobcodes only) + 1-sector buffering. However, I don't think the drivecode is the problem; if I try to decrease to IL 10, the C64 often doesn't have to wait for the drive at all for the next sector's data, but occasionally the depack will take too long, resulting in a missed revolution.
I've already done some optimization of the depack routine, including inlining the single-bit fetch (for the literal/sequence decision and for reading the gamma value).
I don't think I'd switch to another packer just for speed, but I'm nevertheless interested in any battle stories.
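For concreteness, a rough timing-budget sketch of the problem in Python; the rotation speed, sector count, transfer time and depack time below are illustrative assumptions, not measurements from my loader:

# The sector you want next passes under the head roughly
# interleave * (revolution_time / sectors_per_track) after the current one.
# If transferring + depacking the current sector takes longer than that,
# the wanted sector has already passed and you wait an extra revolution.
# All figures here are assumed for illustration only.

REV_MS = 200.0  # ~300 rpm -> one revolution is about 200 ms

def meets_budget(interleave, sectors_per_track, transfer_ms, depack_ms):
    sector_pass_ms = REV_MS / sectors_per_track
    budget_ms = interleave * sector_pass_ms
    return transfer_ms + depack_ms <= budget_ms

# Example: 19 sectors/track, ~5 ms block transfer, ~105 ms worst-case depack
for il in (9, 10, 11):
    print(il, meets_budget(il, 19, 5.0, 105.0))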
... 23 posts hidden ...
cadaver
Registered: Feb 2002 Posts: 1160
Actually I just did that :)
Though it seemed it would still generate longer RLE sequences, so I patched Exomizer2 a bit to honor the -M parameters for RLE as well.
tlr
Registered: Sep 2003 Posts: 1790
So did you get any gain from doing that? It's going to generate a few more primary units, so perhaps the increased number of bits used for encoding those eats up the gain in the loop?
ChristopherJam
Registered: Aug 2004 Posts: 1409
Surely the 'perfect' interleave would depend on the data being decompressed? I'd expect a bunch of 'copy substrings' would consume a lot more cycles per byte of input than a chunk of literals.
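To put a toy number on that intuition, a small sketch; the per-byte cycle constants are invented placeholders, not profiled Exomizer figures:

# Cycles spent depacking one 254-byte *input* block depend on what it encodes.
# Literals map 1:1 to output; a match consumes only a few input bytes but
# emits many output bytes, so match-heavy input blocks cost far more cycles
# per input byte.  Constants below are placeholders, not measured values.

CYCLES_PER_LITERAL   = 60   # assumed: fetch flag bit + copy one byte
CYCLES_PER_MATCH     = 250  # assumed: decode gamma length + offset
CYCLES_PER_COPY_BYTE = 30   # assumed: copy loop body per byte

def input_block_cost(n_literal_bytes, match_lengths):
    """Rough cycle cost of one input block; match_lengths lists its copy lengths."""
    cycles = n_literal_bytes * CYCLES_PER_LITERAL
    out_bytes = n_literal_bytes
    for length in match_lengths:
        cycles += CYCLES_PER_MATCH + length * CYCLES_PER_COPY_BYTE
        out_bytes += length
    return cycles, out_bytes

print(input_block_cost(240, [8, 6]))                 # literal-heavy block
print(input_block_cost(40, [120, 90, 200, 64, 50]))  # match-heavy block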
lft
Registered: Jul 2007 Posts: 369
Next up: make a cruncher that simulates disk access times and weighs that into the cost function that determines when to produce a copy or a literal unit. Too bad access times aren't really predictable.
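Modelling disk access time is the hard, unpredictable part; the simpler half of the idea is just weighing estimated depack cycles into the size cost. A purely hypothetical sketch of that, with made-up weights and cycle counts (not how Exomizer's optimiser actually scores candidates):

# Hypothetical speed-aware cost: score a candidate encoding by its size in
# bits plus a weighted estimate of the CPU time it costs at depack time.
# All constants are invented for illustration.

BIT_COST_WEIGHT   = 1.0    # cost of one compressed bit
CYCLE_COST_WEIGHT = 0.01   # cost of one estimated depack cycle

def unit_cost(bits, est_cycles):
    return BIT_COST_WEIGHT * bits + CYCLE_COST_WEIGHT * est_cycles

# Choosing between a literal run and a match for the same 16 output bytes:
literal_cost = unit_cost(bits=8 * 16, est_cycles=16 * 60)
match_cost   = unit_cost(bits=18,     est_cycles=250 + 16 * 30)
print("emit literals" if literal_cost < match_cost else "emit match")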
cadaver
Registered: Feb 2002 Posts: 1160
tlr: I first checked how often it was generating those longer sequences; in my case it wasn't often (a few times in the longer files).
The changes have resulted in data that's a few bytes longer here and there, but at least not in any more blocks used.
I haven't been able to bump the interleave down from 10, but the optimizations have allowed a larger "safety factor", i.e. in-game loading will be slower as there are raster IRQs and possibly sprites on, and having a faster decompressor reduces the chance of missed-revolution hiccups.
EDIT: lft: rather than crippling the compressed output for speed, it should be possible to build the disk image dynamically by emulating the CPU during loading/depacking to guarantee an optimal interleave. I don't think I'll be going there, but it's a possibility.
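Roughly what I have in mind, as a hypothetical sketch; est_depack_ms() stands in for actually emulating the depacker, and every constant here is an illustrative assumption:

# Hypothetical disk-image builder: place each packed block on the physical
# sector that comes under the head just after the C64 has finished depacking
# the previous block.

REV_MS = 200.0  # assumed ~200 ms per revolution

def layout_track(blocks, sectors_per_track, transfer_ms, est_depack_ms):
    assert len(blocks) <= sectors_per_track
    sector_pass_ms = REV_MS / sectors_per_track
    layout = {}  # physical sector number -> block index
    sector = 0
    for i, block in enumerate(blocks):
        layout[sector] = i
        if i == len(blocks) - 1:
            break
        # time until the C64 is ready to ask for the next block
        busy_ms = transfer_ms + est_depack_ms(block)
        step = int(busy_ms / sector_pass_ms) + 1
        # advance to the first free sector at least 'step' positions ahead
        sector = (sector + step) % sectors_per_track
        while sector in layout:
            sector = (sector + 1) % sectors_per_track
    return layout

# Toy usage with a fake per-block depack estimate:
print(layout_track(list(range(6)), 19, 5.0, lambda b: 60.0 + 10.0 * (b % 3)))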
Krill
Registered: Apr 2002 Posts: 2980
I'm with ChristopherJam here, and I don't see an actual benefit unless you really have constant depacking time on every block. That is pretty unlikely to be achieved, except maybe with prohibitive degradation of overall performance.
The goal is the shortest combined loading and depacking time, and given that you want to download a block from the drive just when it is ready, you do that and prioritise it over depacking. Now, it may happen that time is wasted when the blocks arrive in a non-linear order, such that the depacker is doing nothing between new blocks arriving, until the next block in the linear packed data stream is ready.
However, it is fairly simple to have an interleave that makes all blocks arrive in linear order (even when the loader/depacker supports out-of-order loading for the occasional hiccup). Then you can depack as much as possible until the next block is ready, and after downloading it from the drive, go on depacking where you left off. No time is wasted, and the rest of the file will be depacked when all blocks have been loaded.
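In rough pseudo-Python, that scheme looks like this; the loader primitives used below (drive_has_block(), fetch_block(), and the depacker step interface) are hypothetical stand-ins, not any real loader API:

# Sketch of "depack between block arrivals": the drive always has priority,
# and depacking fills the gaps.  All methods here are hypothetical.

def load_and_depack(stream, depacker):
    while not stream.end_of_file():
        if stream.drive_has_block():        # a block is ready: fetch it first,
            stream.fetch_block()            # never keep the drive waiting
        elif depacker.can_step(stream):     # enough packed data buffered?
            depacker.step(stream)           # depack a little, then poll again
        # else: idle briefly until the next block arrives
    while depacker.can_step(stream):        # after the last block is in,
        depacker.step(stream)               # drain the rest of the stream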
Krill
Registered: Apr 2002 Posts: 2980
lft, cadaver: IMHO, the biggest problem when factoring in actual run-time data to optimise combined loading and depacking is that you have at least one big source of error: the inter-track skew.
As conventional tools to transfer disk images to physical disks (just like most copy programs) do not line up the blocks on every track to the same respective sectors (blocks and sectors are different things here, as the 1541 does not have an index hole sensor), individual copies will start reading a different block when going from one track to the next.
Thus, you will see different timings from one disk to the next, even with the same drive.
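A tiny illustration of the effect, with invented numbers:

# After stepping to the next track, the wait for the first wanted block
# depends on which physical sector happens to be under the head, which in
# turn depends on how that particular copy of the disk was written.

def sectors_to_wait(sector_under_head, wanted_sector, sectors_per_track):
    return (wanted_sector - sector_under_head) % sectors_per_track

# Same layout, two copies written with different track-to-track skew:
print(sectors_to_wait(3, 5, 18))   # copy A: 2 sectors, a short wait
print(sectors_to_wait(6, 5, 18))   # copy B: 17 sectors, almost a full revolution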
cadaver
Registered: Feb 2002 Posts: 1160
Krill: sure, the depacker cycle-use measurement as I imagined it would only be useful within a track, and only as long as the depack CPU time dominates.
In my case I need to minimize the loader/depacker resident code size, so unfortunately that means multiple-sector buffering and/or out-of-order loading are out of the question for me.
Krill
Registered: Apr 2002 Posts: 2980
Why are buffering multiple sectors and out-of-order loading out of the question for you?
The former is not a big issue, as the packed data can be stored at the end of the unpacked area, plus a safety margin to prevent unpacked-data writes from happening before the packed-data read on the same memory cell.
And out-of-order loading... I see no problem there at all, save maybe for some more resident code.
So the main downsides are a few bytes wasted for the safety margin at the end of each uncompressed file, and some resident code to handle ooo loading.
So why are those such big problems? :)
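The memory layout behind that boils down to simple address arithmetic; a hedged sketch (a real tool derives the exact safety margin from the packed stream itself):

# In-place depacking layout: load the packed stream into the top of the
# memory region the unpacked data will occupy, shifted up by a small safety
# margin so the write pointer never overtakes the read pointer.

def packed_load_address(dest_start, unpacked_size, packed_size, safety_margin):
    dest_end = dest_start + unpacked_size
    load_addr = dest_end + safety_margin - packed_size
    assert load_addr >= dest_start, "packed stream would not fit in place"
    return load_addr

# Example: unpack $4000 bytes to $2000, packed stream is $2800 bytes, 4-byte margin
print(hex(packed_load_address(0x2000, 0x4000, 0x2800, 4)))  # -> 0x3804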
Bitbreaker
Registered: Oct 2002 Posts: 508
Quoting cadaver:
In my case I need to minimize the loader/depacker resident code size, so unfortunately that means multiple-sector buffering and/or out-of-order loading are out of the question for me.
Regarding the size of the Exomizer depacker (including tables), this would just be a small addition. Quite a lot of the out-of-order (and serialization) handling can be offloaded to the floppy, where you don't have much code yet, right?