| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
Release id #139503 : Spindle 2.0
So with the spin mode it was now easy to quickly do a speed test with the files i usually test with (most of the files from cl13 side1).
It turns out that Spindle loads nearly as fast as Bitfire with on-the-fly depacking. While Bitfire chews in the tracks a tad faster, it has to take breaks to finish the depacking. So data arrives a bit too fast at first and blocks pile up waiting to be decrunched. Spindle manages to keep a continuous flow thanks to its blockwise packing scheme.
With Spindle the 18 files used get squeezed down to 491 blocks, with Bitfire down to 391 blocks. So Spindle leeches an additional 100 blocks in about the time Bitfire requires for additional depacking.
However, under load the speed of Spindle drops rapidly: with 25% CPU load it is no faster than Krill's loader, and with 75% load it takes eons to leech the 491 blocks in :-( What's happening there?!
When is the 50x version from Krill done? :-D HCL, what's the penis length of your loader? :-D
Results here. |
| |
doynax Account closed
Registered: Oct 2004 Posts: 212 |
Isn't it about time to settle on a standard loader test corpus?
Expecting a single developer to accurately compare his or her own carefully-tuned work to semi-documented off-the-shelf libraries seems dubious, whether in the C64 scene or in academia.
Admittedly no data set is ever going to be perfectly representative and certainly reliability, whether in the form of hardware compatibility or insensitivity to variance, would be penalized by a pure speed metric. It would still serve as an interesting challenge though.
The conditions could also be varied slightly, say by introducing a random CPU load and a range of rotational speeds to accommodate.
Quoting JackAsser: Wouldn't it be possible to devise a compression method that guarantees no two consecutive zero-bits, thus removing the need for GCR-encoding completely? I.e. each nibble read from the disk has a direct meaning to the decompressor?
Possibly. You could certainly devise a Huffman-esque binary prefix code avoiding unrepresentable sequences but it may be difficult to rival the performance of traditional LZ with literal bytes, especially in terms of RAM for tables/buffers. The real kicker is that you'd be shifting load from the drive to the main CPU, both due to additional decode work and transfer overhead.
Plus the 10x one-bit limit is surprisingly annoying.
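To make the two constraints concrete, here is a minimal sketch that checks bit strings against them. It takes the limits as worded in this thread (no two adjacent zero bits, fewer than ten one bits in a row); the function names and the `max_ones=9` default are assumptions for illustration, not part of any existing loader.

```python
from itertools import product

def max_run(bits, symbol):
    """Length of the longest run of `symbol` in a bit string."""
    longest = current = 0
    for b in bits:
        current = current + 1 if b == symbol else 0
        longest = max(longest, current)
    return longest

def is_representable(bits, max_zeros=1, max_ones=9):
    """True if the bit string respects both run-length limits.

    max_zeros=1 encodes "no two consecutive zero bits"; max_ones=9 is an
    assumed reading of the "10x one-bit limit" mentioned in the thread.
    """
    return max_run(bits, "0") <= max_zeros and max_run(bits, "1") <= max_ones

def usable_codewords(length, max_zeros=1):
    """All fixed-length words whose zero runs stay within the limit."""
    words = ("".join(p) for p in product("01", repeat=length))
    return [w for w in words if max_run(w, "0") <= max_zeros]

nibbles = usable_codewords(4)
print(len(nibbles), "of 16 nibbles avoid adjacent zeros:", nibbles)

# Two individually valid words can still violate the limit when joined.
print(is_representable("1010" + "0101"))
```

Only half of the 16 nibbles survive the zero-run rule, and the last line shows that valid words are not automatically valid back to back, which is part of why a directly readable code is harder than it first looks.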
Quoting Bitbreaker: Of course you can write each file with a different interleave, but you won't be able to place 2 files with different interleave on the same track.
I don't see why not. Admittedly loaders without CBM linking or custom formatting would require per-track interleave tables.
The results of mixed strides will naturally be worse but a decent optimizer can still do a reasonable job of it. After all things never did line up perfectly even with a fixed interleave.
The ideal in-order loader would use a profiling pass with a dummy loader (e.g. a cartridge build) to measure the elapsed time between sector transfers. Data which would then be fed back to the layout optimizer. Perhaps refined with a programmable safety margin and priorities or deadline measurements to weigh non-critical files. |
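A minimal sketch of how profiled gaps could drive the layout of a single track. It assumes the profiling pass yields, for each block, the number of whole sectors that pass under the head while the C64 is still busy; `layout_track`, `gaps` and `margin` are hypothetical names, and real tools would also have to deal with track changes and priorities.

```python
def layout_track(gaps, sectors_per_track, margin=0):
    """Return the sector chosen for each block, in loading order.

    gaps[i] is the assumed profiled delay (in sector-times) between
    finishing block i and being ready for block i+1. `margin` adds a
    safety margin in the same unit.
    """
    if len(gaps) + 1 > sectors_per_track:
        raise ValueError("more blocks than sectors on this track")
    free = [True] * sectors_per_track
    order = [0]          # first block goes to sector 0
    free[0] = False
    pos = 0
    for gap in gaps:
        # Earliest sector that arrives once the loader is ready again.
        pos = (pos + 1 + gap + margin) % sectors_per_track
        # If it is taken, fall back to the next free sector.
        while not free[pos]:
            pos = (pos + 1) % sectors_per_track
        free[pos] = False
        order.append(pos)
    return order

# A constant gap of 3 reproduces a fixed interleave of 4.
print(layout_track([3, 3, 3], 21))
# Mixed gaps still yield a usable, if less tidy, layout.
print(layout_track([1, 5, 2], 8))
```

With a constant gap this degenerates to an ordinary fixed interleave; the second call shows a mixed-stride case where one block has to settle for the next free sector.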
| |
Krill
Registered: Apr 2002 Posts: 2981 |
Quoting ChristopherJam: How many bytes can we snaffle per raster line, and how many cpu-cycles does it take to acquire each byte?
If you mean raw transfer power from drive to C-64, my rule of thumb is about 2 bytes per scanline. That is 28 cycles for the actual byte (the usual lda $dd00:lsr:lsr:ora $dd00:... business), times 2, plus a bit of overhead for storing and re-syncing once in a while. |
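A quick back-of-the-envelope version of that rule of thumb. It assumes PAL timing (63 cycles per raster line, 985248 Hz CPU clock), which the post does not state, and it ignores badlines, sprites and any drive-side limits.

```python
CYCLES_PER_LINE = 63   # PAL C64 raster line (assumption: PAL machine)
CPU_HZ = 985_248       # PAL C64 CPU clock

CYCLES_PER_BYTE = 28   # figure from the post
BYTES_PER_LINE = 2     # rule of thumb from the post

def overhead_cycles_per_line():
    """Cycles left per raster line for storing and re-syncing."""
    return CYCLES_PER_LINE - BYTES_PER_LINE * CYCLES_PER_BYTE

def raw_bytes_per_second():
    """Upper bound on raw transfer if every line carried two bytes."""
    lines_per_second = CPU_HZ / CYCLES_PER_LINE
    return BYTES_PER_LINE * lines_per_second

print(overhead_cycles_per_line(), "cycles of slack per line")
print(round(raw_bytes_per_second()), "bytes/s raw, before any waiting or depacking")
```

So two bytes per line leave only a handful of cycles per line for everything else, and the resulting figure is a ceiling for the bus transfer alone, not a loading speed.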
| |
Krill
Registered: Apr 2002 Posts: 2981 |
Quoting Oswald: fixed order loading is the key for all those loaders beating krill's one. you dont need extra time to wait for sectors or find out their order beforehand.
It's not about fixed-order loading vs. out-of-order loading, it's about the last point you mentioned: knowing the sector layout beforehand. When there is no need to scan a track before actually loading anything, you can still load whatever file block happens to pass by and put its data at the right place in C-64 memory.
And yes, this is the main reason for the discussed speed differences. Since today's loaders require only about 4 to 5 revolutions to read an entire track, an extra scan revolution gives quite a bit of penalty compared to a carefully hand-crafted sector layout. |
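To put a number on that penalty, a small sketch assuming the nominal 1541 speed of 300 rpm (real drives vary a little) and that the scan costs exactly one extra revolution:

```python
RPM = 300  # nominal 1541 rotational speed (assumed)

def track_time(revolutions, rpm=RPM):
    """Seconds spent on a track for a given number of revolutions."""
    return revolutions * 60.0 / rpm

def scan_penalty(revolutions):
    """Relative slowdown caused by one extra scan revolution."""
    return 1.0 / revolutions

for revs in (4, 5):
    print(f"{revs} revolutions: {track_time(revs):.1f} s per track, "
          f"scan adds {scan_penalty(revs):.0%}")
```

Under these assumptions one scan revolution costs 20 to 25 percent on top of the pure loading time of each track.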
| |
Krill
Registered: Apr 2002 Posts: 2981 |
Quoting doynax: The ideal in-order loader would use a profiling pass with a dummy loader (e.g. a cartridge build) to measure the elapsed time between sector transfers. Data which would then be fed back to the layout optimizer. Perhaps refined with a programmable safety margin and priorities or deadline measurements to weigh non-critical files.
And it probably requires a mastering process that would align the sectors. When transferring disk images with the usual tools these days, you end up with more or less random track-to-track offsets of the sectors. But you want the first block passing by on the next track to be the next one in your data stream. |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
Quoting HCL: Btw, you are already cheating then because you're using a special d64-tool that places your files in track-order(!).
Try harder :-)
I write out everything in interleave 4, no matter what, but i throw away those annoying t/s links. Spindle does about the same, and both have the best speed possible with it, out of the box, no further tweaking and tuning necessary. If you mean that i write files in a sequential order, everyone would do so, also your disk tool ;-) It's 2015. Why not go for more sophisticated stuff? Cross development entered demomaking, cross-packing did, emulators did, Makefiles did. Furthermore you could simply OOO-enable your loader, there's enough mem free for that, and the few bytes you need on c64 side for it are easily saved, just as you were able to squeeze all in below the magic $200. No need to call the shortcomings of a technique a feature. Also: you skip checksumming. Others might rant on that lack of feature :-D |
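What dropping the t/s links buys can be sketched with simple arithmetic. The standard CBM DOS format spends the first 2 bytes of every 256-byte sector on the track/sector link; whether a given loader reuses those bytes for its own per-block data is not stated here, so treat `keep_links=False` as the best case.

```python
SECTOR_BYTES = 256
LINK_BYTES = 2   # track/sector link at the start of each CBM DOS sector

def blocks_needed(size_bytes, keep_links=True):
    """Number of sectors a file of `size_bytes` occupies."""
    payload = SECTOR_BYTES - (LINK_BYTES if keep_links else 0)
    return -(-size_bytes // payload)   # integer ceiling division

size = 50_000  # arbitrary example size, not taken from the test files
print(blocks_needed(size), "blocks with links,",
      blocks_needed(size, keep_links=False), "without")
```

The saving is small per file, but the loader also no longer has to follow a chain, which is what makes a fixed, known layout possible.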
| |
Krill
Registered: Apr 2002 Posts: 2981 |
Quoting Bitbreaker: Also: you skip checksumming. Others might rant on that lack of feature :-D
This is indeed rant-worthy and should be a major point when comparing speeds. While a lack of checksumming doesn't cause much trouble in a lab setting (= at home), in my experience, a party setting with lots and lots of signal interference is a different beast. But then this might just be biased perception (and i haven't conducted scientifically sound tests on that :D). |
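For reference, a sketch of the kind of check being skipped. It models the standard CBM DOS data block checksum, which is the XOR of the block's data bytes; what the loaders discussed here verify (or skip) in detail is not stated in the thread, and the function names are made up.

```python
from functools import reduce
from operator import xor

def block_checksum(data):
    """XOR of all data bytes, as in the standard CBM DOS data block."""
    return reduce(xor, data, 0)

def verify(data, stored_checksum):
    """True if the block matches the checksum read from disk."""
    return block_checksum(data) == stored_checksum

block = bytes(range(256))
stored = block_checksum(block)

corrupted = bytearray(block)
corrupted[10] ^= 0x04            # a single flipped bit, e.g. from noise

print(verify(block, stored))      # intact block passes
print(verify(corrupted, stored))  # single-bit error is caught
```

On the drive side this costs cycles for every byte of every block, which is exactly why leaving it out shows up in a pure speed comparison.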
| |
chatGPZ
Registered: Dec 2001 Posts: 11390 |
i can at least say that HCL loader is very bitchy about subtle timing differences, and it will then break because of the missing checksumming :) |
| |
Danzig
Registered: Jun 2002 Posts: 440 |
HCL - Hezno Chagsam Lohda |
| |
HCL
Registered: Feb 2003 Posts: 728 |
Quote: Also: you skip checksumming. Others might rant on that lack of feature :-D
That was an easy one, just hand it out to the checksum top dog (=Krill) and he'll smash it in :P. ..and the ranting continues :D. No, Doynax, there is no place for a standard loader test, what would we be doing then all day?!? ;).
Honestly, i can't remember the last time a BoozeDesign demo hung or crashed in a compo (or ever did?). But it is an unfair comparison, yes..
@Groepaz: Are you serious, or did i just mis-interpret that smiley!? |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Quote: Quoting ChristopherJam: How many bytes can we snaffle per raster line, and how many cpu-cycles does it take to acquire each byte?
If you mean raw transfer power from drive to C-64, my rule of thumb is about 2 bytes per scanline. That is 28 cycles for the actual byte (the usual lda $dd00:lsr:lsr:ora $dd00:... business), times 2, plus a bit of overhead for storing and re-syncing once in a while.
That makes sense for the raw transfer, but I was trying to graph the time including decompression, waiting for data, synchronising etc.
Working assumption was that Bitbreaker's frame counts were for spending 312*(100-load)/100 rasters per frame on loading. Did I get that right? |
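That working assumption, written out as a sketch. It uses the 312 raster lines per frame from the formula above; `ideal_slowdown` is a hypothetical baseline that assumes loading time scales only with the rasters left over, which the measured results may or may not follow.

```python
LINES_PER_FRAME = 312  # from the formula in the post

def loading_rasters(load_percent):
    """Raster lines per frame left for loading at a given CPU load."""
    return LINES_PER_FRAME * (100 - load_percent) / 100

def ideal_slowdown(load_percent):
    """Expected slowdown if speed scaled purely with available rasters."""
    return 100 / (100 - load_percent)

for load in (0, 25, 50, 75):
    print(f"{load:2d}% load: {loading_rasters(load):5.1f} rasters, "
          f"ideal slowdown x{ideal_slowdown(load):.2f}")
```

Comparing the measured frame counts against this baseline would show which loaders degrade worse than proportionally under load.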