[CSDb] - User Forums - Release id #139503 : Spindle 2.0


2015-07-01 11:59

Bitbreaker

Registered: Oct 2002
Posts: 508

So with the spin mode it was now easy to quickly do a speedtest with the files i usually test with (most of the files from cl13 side1).
It turns out that spindle loads nearly as fast as bitfire with on-the-fly depacking. While bitfire chews in the tracks a tad faster, it has to take breaks to finalize the depacking: data arrives a bit too fast at first and blocks pile up waiting to be decrunched. Spindle manages a continuous flow thanks to its blockwise packing scheme.
With Spindle, the 18 files used get squeezed down to 491 blocks, versus 391 blocks with bitfire. So Spindle leeches an additional 100 blocks in about the time bitfire requires for additional depacking.
However, under load the speed of spindle drops rapidly: with 25% cpu load it is no faster than krill's loader, and with 75% load it takes eons to leech the 491 blocks in :-( What's happening there?!
When is the 50x version from Krill done? :-D HCL, what's the penis length of your loader? :-D

Results here.

... 91 posts hidden ...

2015-07-07 10:22

Krill

Registered: Apr 2002
Posts: 2980

Quoting ChristopherJam

How many bytes can we snaffle per raster line, and how many cpu-cycles does it take to acquire each byte?

If you mean raw transfer power from drive to C-64, my rule of thumb is about 2 bytes per scanline. That is 28 cycles for the actual byte (the usual lda $dd00:lsr:lsr:ora $dd00:... business), times 2, plus a bit of overhead for storing and re-syncing once in a while.
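
As a rough check of that rule of thumb, the arithmetic can be sketched as below. The 28 cycles per byte are from the post; the PAL timing figures (63 CPU cycles per raster line, 312 lines per frame, about 50 frames per second) are assumptions added here, and cycles stolen by badlines or sprites are ignored.

```python
# Rough arithmetic behind the "about 2 bytes per scanline" rule of thumb.
# Assumptions (not from the post): PAL C-64 timing with 63 CPU cycles per
# raster line, 312 lines per frame, and about 50 frames per second.
CYCLES_PER_LINE = 63
LINES_PER_FRAME = 312
FRAMES_PER_SECOND = 50

CYCLES_PER_BYTE = 28  # figure quoted in the post


def bytes_per_line(cycles_per_byte=CYCLES_PER_BYTE):
    """Whole bytes that fit into one raster line."""
    return CYCLES_PER_LINE // cycles_per_byte


def spare_cycles_per_line(cycles_per_byte=CYCLES_PER_BYTE):
    """Cycles left per line for storing and re-syncing."""
    return CYCLES_PER_LINE - bytes_per_line(cycles_per_byte) * cycles_per_byte


def raw_bytes_per_second():
    """Upper bound on raw transfer if every raster line is spent loading."""
    return bytes_per_line() * LINES_PER_FRAME * FRAMES_PER_SECOND


if __name__ == "__main__":
    print(bytes_per_line(), spare_cycles_per_line(), raw_bytes_per_second())
```

Under these assumptions two bytes take 56 of the 63 cycles, which leaves 7 cycles per line for the overhead mentioned, and the raw ceiling comes out at 31200 bytes per second.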

2015-07-07 10:33

Krill

Registered: Apr 2002
Posts: 2980

Quoting Oswald

fixed order loading is the key for all those loaders beating krill's one. you dont need extra time to wait for sectors or find out their order beforehand.

It's not about fixed-order loading vs. out-of-order loading, it's about the last point you mentioned: knowing the sector layout beforehand. When there is no need to scan a track before actually loading anything, you can still load whatever file block happens to pass by and put its data at the right place in C-64 memory.
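
A minimal sketch of that idea, not taken from any of the loaders discussed: if the sector layout is known beforehand, each block can be stored at its destination the moment it passes by, whatever the arrival order. The function name, the layout table and the addresses are hypothetical.

```python
# Sketch of out-of-order loading with a sector layout known beforehand.
# All names, sector numbers and addresses are hypothetical illustrations.

def load_out_of_order(arrivals, layout, memory):
    """Store each block at its destination as soon as it passes by.

    arrivals: iterable of (sector, data) in the order the drive sees them
    layout:   dict mapping sector -> destination address, known up front
    memory:   bytearray standing in for C-64 RAM
    Returns the set of sectors still missing (empty when complete).
    """
    pending = set(layout)
    for sector, data in arrivals:
        if sector not in pending:
            continue  # belongs to another file, or was already loaded
        dest = layout[sector]
        memory[dest:dest + len(data)] = data
        pending.discard(sector)
        if not pending:
            break  # whole file is in place, stop reading
    return pending


if __name__ == "__main__":
    ram = bytearray(0x10000)
    layout = {0: 0x2000, 4: 0x2100, 8: 0x2200}
    # The head happens to land just before sector 8, so it arrives first.
    arrivals = [(8, bytes([8]) * 256), (0, bytes([1]) * 256), (4, bytes([4]) * 256)]
    print(load_out_of_order(arrivals, layout, ram))
```

No scan pass is needed here because the layout table already says where every sector belongs.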

And yes, this is the main reason for the discussed speed differences. While today's loaders require about 4 to 5 revolutions to read an entire track, an extra scan revolution gives quite a bit of penalty compared to a carefully hand-crafted sector layout.
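
To put a number on that penalty, here is a small sketch. The 4 to 5 revolutions per track are from the post; the nominal 300 rpm of the drive is an assumption added here.

```python
# Cost of one extra scan revolution per track.
# Assumption (not from the post): the drive spins at a nominal 300 rpm.
RPM = 300
MS_PER_REVOLUTION = 60_000 / RPM


def track_time_ms(read_revolutions, scan_revolutions=0):
    """Time spent on one track, in milliseconds."""
    return (read_revolutions + scan_revolutions) * MS_PER_REVOLUTION


def scan_penalty_percent(read_revolutions, scan_revolutions=1):
    """Extra time caused by scanning, relative to reading alone."""
    base = track_time_ms(read_revolutions)
    with_scan = track_time_ms(read_revolutions, scan_revolutions)
    return 100 * (with_scan - base) / base


if __name__ == "__main__":
    for revolutions in (4, 5):
        print(revolutions, track_time_ms(revolutions), scan_penalty_percent(revolutions))
```

With these figures one scan revolution adds 25% to a track read in 4 revolutions and 20% to one read in 5.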

2015-07-07 10:41

Krill

Registered: Apr 2002
Posts: 2980

Quoting doynax

The ideal in-order loader would use a profiling pass with a dummy loader (e.g. a cartridge build) to measure the elapsed time between sector transfers. Data which would then be fed back to the layout optimizer. Perhaps refined with a programmable safety margin and priorities or deadline measurements to weigh non-critical files.

And it probably requires a mastering process that would align the sectors. When transferring disk images with the usual tools these days, you end up with more or less random track-to-track offsets of the sectors. But you want the first block passing by on the next track to be the next one in your data stream.

2015-07-07 10:45

Bitbreaker

Registered: Oct 2002
Posts: 508

Quoting HCL

Btw, you are already cheating then because you're using a special d64-tool that places your files in track-order(!).

Try harder :-) I write out everything in interleave 4, no matter what, but i throw away those annoying t/s links. Spindle does about the same, and both get the best speed possible with it, out of the box, no further tweaking and tuning necessary. If you mean that i write files in sequential order, everyone would do so, also your disk tool ;-) It's 2015. Why not go for more sophisticated stuff? Cross development entered demomaking, cross-packing did, emulators did, Makefiles did. Furthermore you could simply OOO-enable your loader, there's enough mem free for that, and the few bytes you need on the c64 side for it are easily saved, just as you were able to squeeze it all in below the magic $200. No need to call the shortcomings of a technique a feature. Also: you skip checksumming. Others might rant on that lack of feature :-D

2015-07-07 10:55

Krill

Registered: Apr 2002
Posts: 2980

Quoting Bitbreaker

Also: you skip checksumming. Others might rant on that lack of feature :-D

This is indeed rant-worthy and should be a major point of comparison when comparing speeds. While a lack of checksumming doesn't cause much trouble in a lab setting (= at home), in my experience, a party setting with lots and lots of signal interference is a different beast. But then this might just be biased perception (and i haven't conducted scientifically sound tests on that :D).
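
For readers unfamiliar with the trade-off, a minimal sketch of what a per-block checksum buys. It assumes a plain XOR over the block's bytes; the thread does not say which checksum any of the loaders uses, and the function names are hypothetical.

```python
# Sketch of per-block checksumming on the receiving side.
# Assumption: a plain XOR over all payload bytes; the loaders discussed
# in the thread may use something different.
from functools import reduce


def xor_checksum(block):
    """XOR of all bytes in the block."""
    return reduce(lambda acc, byte: acc ^ byte, block, 0)


def receive_block(payload, expected_checksum):
    """Return the payload if it verifies, else None to signal a retry."""
    if xor_checksum(payload) == expected_checksum:
        return payload
    return None


if __name__ == "__main__":
    good = bytes(range(256))
    checksum = xor_checksum(good)
    glitched = bytearray(good)
    glitched[10] ^= 0x04  # one bit flipped by interference
    print(receive_block(good, checksum) is not None)
    print(receive_block(bytes(glitched), checksum) is None)
```

A loader without the check would store the glitched block as-is. An XOR catches any single flipped bit but misses two flips in the same bit position, so it is cheap rather than thorough.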

2015-07-07 10:59

chatGPZ

Registered: Dec 2001
Posts: 11386

i can at least say that HCL loader is very bitchy about subtle timing differences, and it will then break because of the missing checksumming :)

2015-07-07 11:11

Danzig

Registered: Jun 2002
Posts: 440

HCL - Hezno Chagsam Lohda

2015-07-07 13:55

HCL

Registered: Feb 2003
Posts: 728

Quote:

Also: you skip checksumming. Others might rant on that lack of feature :-D

That was an easy one, just hand it out to the checksum top dog (=Krill) and he'll smash it in :P. ..and the ranting continues :D. No, Doynax, there is no place for a standard loader test, what would we be doing then all day?!? ;).

Honestly, i can't remember the last time a BoozeDesign demo hung or crashed in a compo (or ever did?). But it is an unfair comparison, yes..

@Groepaz: Are you serious, or did i just mis-interpret that smiley!?

2015-07-07 14:05

ChristopherJam

Registered: Aug 2004
Posts: 1409

Quote: Quoting ChristopherJam
How many bytes can we snaffle per raster line, and how many cpu-cycles does it take to acquire each byte?
If you mean raw transfer power from drive to C-64, my rule of thumb is about 2 bytes per scanline. That is 28 cycles for the actual byte (the usual lda $dd00:lsr:lsr:ora $dd00:... business), times 2, plus a bit of overhead for storing and re-syncing once in a while.

That makes sense for the raw transfer, but I was trying to graph the time including decompression, waiting for data, synchronising etc.

Working assumption was that Bitbreaker's frame counts were for spending 312*(100-load)/100 rasters per frame on loading. Did I get that right?
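
That working assumption can be written down as below. The 312 rasters and the formula are from the post; combining it with the 2 bytes per raster line quoted above and with 256 bytes per block is an assumption made here, and it only bounds the raw transfer, not decompression, waiting for data or syncing.

```python
# Rasters left for loading at a given CPU load, per the working assumption,
# plus a lower bound on the frames needed for raw transfer alone.
# Assumptions (not from the post): 256 bytes per block, and the quoted
# rule of thumb of 2 bytes per raster line.
import math

LINES_PER_FRAME = 312  # from the post
BYTES_PER_LINE = 2     # rule of thumb quoted above
BYTES_PER_BLOCK = 256  # assumption


def loading_rasters(load_percent):
    """Raster lines per frame available to the loader."""
    return LINES_PER_FRAME * (100 - load_percent) / 100


def min_frames(blocks, load_percent):
    """Lower bound on frames for raw transfer only (no depacking, no waits)."""
    bytes_per_frame = BYTES_PER_LINE * loading_rasters(load_percent)
    return math.ceil(blocks * BYTES_PER_BLOCK / bytes_per_frame)


if __name__ == "__main__":
    for load in (0, 25, 50, 75):
        print(load, loading_rasters(load), min_frames(491, load))
```

Measured frame counts far above this bound would point at time lost outside the transfer itself, which is what the graph is meant to show.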

2015-07-07 14:06

Krill

Registered: Apr 2002
Posts: 2980

Quoting HCL

Honestly, i can't remember the last time a BoozeDesign demo hung or crashed in a compo (or ever did?). But it is an unfair comparison, yes..

Ah, i wasn't thinking about the compo itself, where the machine is run under semi-lab conditions :) Was thinking about C-64s running demos while sitting in row 10, seat 5, at a party like Revision, or in some dark basement with a hundred other C64s, like at Datastorm.. :)

And yes, i took that bait so willingly, it was just hanging so low and an easy bite :D