| |
Sparta
Registered: Feb 2017 Posts: 49 |
Loader Benchmarks
I do not intend to stir up the mud, but it’s been 5 years since the performance of commonly used fast loaders was last compared in this thread. Since then, Lft has updated Spindle, Krill has practically rewritten his loader, HCL has released ByteBoozer 2.0, Bitfire is past version 0.6, and I have released Sparkle (AKA “Chinese clone” :D, apologies to the Chinese sceners). I have been running tests for my own entertainment and figured I’d update Bitbreaker’s benchmark with the latest loader versions using his test files.
The graph below compares the following loader+packer combinations: Sparkle V1.4 and Spindle 2.3 using their own packers, Krill’s loader v184 with TinyCrunch 1.2 (TC), Bitnax (BN), and ByteBoozer 2.0 (B2), BoozeLoader 1.0 with ByteBoozer 2.0, and Bitfire 0.6+ (downloaded from GitHub on April 9, 2020) with Bitnax. The sole purpose of this benchmark was to examine how fast these loaders can load and decompress 728 blocks of data in 18 files under different CPU loads (i.e. it does not test disk zones separately).

I spent quite some time optimizing each loader’s performance as much as I could, with preliminary trial runs in VICE in warp mode to find the best parameters. Thus, Spindle disks were built with the fast serial transfer protocol, and in the case of Sparkle, Krill’s loader, Bitfire, and BoozeLoader, a custom interleave was implemented. Finally, the disk drive’s motor was left running during the tests with BoozeLoader and Bitfire. For each test I used the interleave that proved to be the fastest in the preliminary runs.

All tests were performed the same way: the executable installs the loader and loads a small program from track 1, sector 0 (this way, seek time to track 1 is not included in the test) with a raster IRQ routine that blocks 0%, 25%, 50%, or 75% of the screen while loading and depacking. None of the files are loaded under the I/O area.

Sparkle packed the files from 728 blocks to 442 (60.7%), Spindle to 456 (62.6%), TinyCrunch to 447 (61.4%), and ByteBoozer and Bitnax to 390 blocks (53.6%). D64 images were transferred to the same floppy using Luigi Di Fraia’s IECHost connected to a 1571 disk drive. Each column in the graph represents the average ± 2SD of 10 consecutive tests on my C64C PAL + 1541-II disk drive combo (except for Krill+ByteBoozer, where the test crashed 3 times for some reason, requiring additional runs).
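For anyone who wants to double-check the arithmetic, here is a minimal Python sketch of how the reported figures are derived: the compression percentages are simply packed blocks divided by the original 728 blocks, and each graph column is the mean ± 2 standard deviations of 10 runs. The frame counts in the sketch are made-up placeholders, not my actual measurements.

```python
from statistics import mean, stdev

ORIGINAL_BLOCKS = 728

# Packed sizes in blocks, as listed above.
packed = {
    "Sparkle": 442,
    "Spindle": 456,
    "TinyCrunch": 447,
    "ByteBoozer 2 / Bitnax": 390,
}
for name, blocks in packed.items():
    print(f"{name:22} {blocks} blocks = {100 * blocks / ORIGINAL_BLOCKS:.1f}%")

# Each column in the graph is the mean +/- 2SD of 10 consecutive runs.
# These loading times (in PAL frames) are placeholders, not real measurements.
frames = [2510, 2498, 2522, 2505, 2517, 2499, 2508, 2513, 2502, 2511]
print(f"column value: {mean(frames):.1f} frames, error bar: +/-{2 * stdev(frames):.1f}")
```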
![](https://lh3.googleusercontent.com/ZVxDoqYyIV_nK8nAJY7NhjqKB5h75s1_k_U8fa8W2SY0owEHqGHaFgru2uFRAWyfRnVa3bcqz502h29YoXn_v140PGzrde_Fd9XX8RWDAu_nDaMcll8=w1280)
Interleaves (note: the files only occupy the first two speed zones on the disk):

Loader        0%        25%       50%       75%
Sparkle       4-4-4-4   5-4-4-4   7-4-4-4   7-4-4-4
Spindle       default   default   default   default
Krill         4-4-4-4   5-4-4-4   7-7-7-7   11-10-10-10
Bitfire       5-5-5-5   6-6-6-6   9-9-9-9   6-6-6-6
BoozeLoader   4-4-4-4   5-5-5-5   7-7-7-7   11-11-11-11
Feel free to interpret the data the way you want. Obviously, the authors of the other loaders and packers are way out of my hobby coder league, so I will not attempt to draw conclusions or pretend to have answers. But if anyone is interested, I’d be happy to share my test disk images and spreadsheets. Also, let me know if you want to see any other loader’s performance in the benchmark.
Finally, I am sure many of us do similar tests, so please feel free to post your own benchmarks with some description here.
Cheers, stay healthy and safe,
Sparta/OMG |
|
| |
Sparta
Registered: Feb 2017 Posts: 49 |
Edit:
Here is the graph again:
![](https://lh6.googleusercontent.com/w6KsyqAPRUte6wOCAzn0mSntFMZ888FLvCB0qHD2An4O0MV-P9OMmnyRaUtL0fo)
If it still does not work please use this link:
https://sites.google.com/view/c64loaderbenchmark |
| |
Sparta
Registered: Feb 2017 Posts: 49 |
And here is another way to look at it:
![](https://lh6.googleusercontent.com/oPjdh_N-geydk0SR5SndPuuUdKJjlh3JvQeCmK6nwczZagLGcOifagEdJfI1RA4) |
| |
Krill
Registered: Apr 2002 Posts: 3003 |
I'd prefer graphs where higher bars mean better, i.e., the first.
But i find it very surprising that my loader (with a raw-loading peak performance of 7.7 KB/s) is slower overall when loading compressed files - a mere 7-ish KB/s with 100% CPU for the loader.
The test that comes with Krill's Loader, Repository Version 184 shows significantly higher throughput at about 11 to 13 KB/s for compressed files, as seen in the screenshots.
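For reference, converting a loading time from the benchmark (in PAL frames) into a KB/s figure is straightforward; the sketch below uses a made-up frame count, not a value read off the graph.

```python
# Rough conversion from loading time in PAL frames to effective throughput.
# The frame count is a made-up placeholder, not taken from the benchmark graph.
BLOCKS = 728            # decompressed payload of the test corpus
BYTES_PER_BLOCK = 254   # data bytes per sector (256 minus the 2-byte link)
PAL_FPS = 50.0          # close enough; the exact PAL rate is ~50.12 Hz

frames = 1000
seconds = frames / PAL_FPS
kib = BLOCKS * BYTES_PER_BLOCK / 1024
print(f"{kib:.1f} KB in {seconds:.1f} s = {kib / seconds:.1f} KB/s")
```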
Do you perhaps load first and then decompress rather than using the combined load+depack call?
Oh, and there is a way to have the motor spin continuously, but it's not documented because i don't like that practice, and it doesn't matter much for performance in back-to-back loading of files anyways.
And is there a reason to give out the disk images on request only, rather than linking them as well for anybody to check out? |
| |
Sparta
Registered: Feb 2017 Posts: 49 |
Thank you for your reply, Krill. :)
LOAD_COMPD_API is set to 1, everything else is 0 in the basic and extended features sections of the config file. I presume this means on-the-fly decompression. Please let me know if there is a better way to do it.
No, there is no specific reason to give out the test disks on request only, other than that I did not know if anyone was interested and wasn't sure what common practice was. :) I simply thought this was a good way to start a conversation. So without further ado:
https://drive.google.com/open?id=1lNOFWJtqJCuRvg5uQqRpGcmUAH92m..
Please note that results are not necessarily the same in VICE and on real HW. My personal experience is that with 0% CPU load, for example, there can be up to a 100-frame difference.
Also, I really hope no one takes my benchmark as a personal attack. :)
Cheers,
Sparta |
| |
Krill
Registered: Apr 2002 Posts: 3003 |
Quoting Sparta: Also, I really hope no one takes my benchmark as a personal attack. :)
At least here, no offence taken. :)
But after investigating a little, it seems like it does not quite max out the performance of at least my loader, and in general does not illustrate more different scenarios than just CPU available to the loader.
Firstly, most files are rather small (2 tracks or so - if you do that in a demo, you don't care so much for speed anyways), which makes opening a file (including finding and loading the first block, in my case) pretty dominant in the overall cost and reduces the impact of sustained throughput.
This puts block-based compression (rather than stream-based) and fixed/known layout at an advantage.
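To put toy numbers on that (all values made up, not measured): total time per file is roughly a fixed open/seek overhead plus size divided by sustained throughput, so the overhead's share shrinks as files get bigger.

```python
# Toy model of per-file overhead vs. sustained throughput (all numbers made up).
OVERHEAD_S = 0.3        # assumed cost of opening a file and finding its first block
THROUGHPUT_KB_S = 8.0   # assumed sustained load+depack throughput

def load_time(size_kb):
    return OVERHEAD_S + size_kb / THROUGHPUT_KB_S

for size_kb in (10, 50, 150):  # roughly a 2-track file vs. bigger files
    t = load_time(size_kb)
    print(f"{size_kb:3d} KB: {t:5.2f} s, overhead share {OVERHEAD_S / t:.0%}")
```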
Then, the faster tracks tend to be the upper tracks 18+ (native interleave 3, not 4, in my case), and so the files should end at track 35 rather than start at track 1, if maximum speed is to be the goal.
The tool used to create the images seems a tad suspicious to me (or the parameters used). It seems to save files with correct CBM DOS interleaving behaviour, but puts the first block on each new track at sector 1 rather than 0. |
| |
Sparta
Registered: Feb 2017 Posts: 49 |
I appreciate you taking the time to check my test disks. Let me try to answer your points the best I can.
Quoting Krill: But after investigating a little, it seems like it does not quite max out the performance of at least my loader, and in general does not illustrate more different scenarios than just CPU available to the loader.
I agree; this was the sole purpose of this benchmark, as stated in my first post. :)
Quoting Krill: Firstly, most files are rather small (2 tracks or so - if you do that in a demo, you don't care so much for speed anyways), which makes opening a file (including finding and loading the first block, in my case) pretty dominant in the overall cost and reduces the impact of sustained throughput.
This puts block-based compression (rather than stream-based) and fixed/known layout at an advantage.
I personally also prefer larger files, but I wanted to ensure backward compatibility, so I decided to use Bitbreaker's test files, which he kindly shared with me a year or so ago and which were used in one of their demos.
Quoting Krill: Then, the faster tracks tend to be the upper tracks 18+ (native interleave 3, not 4, in my case), and so the files should end at track 35 rather than start at track 1, if maximum speed is to be the goal.
This is true for all the loaders. So I believe starting at track 1 is still a fair comparison. Of course, I could add more files to see what happens. But then again, no more backward compatibility. :)
Quoting Krill: The tool used to create the images seems a tad suspicious to me (or the parameters used). It seems to save files with correct CBM DOS interleaving behaviour, but puts the first block on each new track at sector 1 rather than 0.
I admit, this is my ad hoc VB tool. I used the following formula: once the last sector on a track is used (which is sector 19 in the case of track 1), I step to the next track, add the interleave to the last sector, and subtract the number of sectors if the result is equal to or greater than the number of sectors on that track (19+4-21=2). I then subtract an additional 1 in the case of tracks 1-17 if the result is greater than 0. I am using the same formula in Sparkle. |
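In rough Python, the stepping rule reads something like this (a sketch of the formula above as I read it, not the actual VB code):

```python
def sectors_per_track(track):
    # 1541 speed zones: 21 sectors on tracks 1-17, then 19, 18 and 17.
    if track <= 17:
        return 21
    if track <= 24:
        return 19
    if track <= 30:
        return 18
    return 17

def first_sector_on_next_track(last_sector, next_track, interleave):
    # Add the interleave to the last sector used, wrap by the sector count
    # if needed (e.g. 19 + 4 - 21 = 2), and on tracks 1-17 subtract one more
    # if the wrapped result is greater than 0.
    count = sectors_per_track(next_track)
    sector = last_sector + interleave
    if sector >= count:
        sector -= count
        if next_track <= 17 and sector > 0:
            sector -= 1
    return sector

# Example from the text: last sector 19 on track 1, interleave 4, stepping to track 2.
print(first_sector_on_next_track(19, 2, 4))  # 2 after the wrap, 1 after the extra -1
```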
| |
Sparta
Registered: Feb 2017 Posts: 49 |
Re: native interleave. I did trial runs using 4-3-3-3 and 4-4-4-4 and chose the one that was faster. I did not investigate why.
Thanks again! |
| |
Krill
Registered: Apr 2002 Posts: 3003 |
Quoting Krill: ...does not illustrate more different scenarios than just CPU available to the loader.
Quoting Sparta: I agree; this was the sole purpose of this benchmark, as stated in my first post. :)
And yet, it fixes all other variables, not allowing for comparison of those, even if the goal were to compare speed vs. CPU again, but in another scenario (such as having big files only).
Not sure what you mean by backward compatibility. Is it to the benchmark of 5 years ago, to get comparable numbers with that?
I'd say that's a different benchmark, and numbers shall only be compared within one benchmark.
And again, having exactly one benchmark isn't so feasible, you'll make everyone optimise their tools for that and make people assume that these numbers would reflect every scenario. |
| |
Sparta
Registered: Feb 2017 Posts: 49 |
Quoting Krill: And yet, it fixes all other variables, not allowing for comparison of those, even if the goal were to compare speed vs. CPU again, but in another scenario (such as having big files only).
For reproducibility, every test needs standardization. Since there are no generally accepted standards for loader benchmarks, I did my best to standardize my tests to provide the same circumstances for comparability. I clearly stated these in my opening post. And again, these are not my own files, so I did not choose their size. I think this is realistic and fair to all the loaders, as I did not pick files to favor Sparkle, for example.
Quoting Krill: Not sure what you mean by backward compatibility. Is it to the benchmark of 5 years ago, to get comparable numbers with that?
I'd say that's a different benchmark, and numbers shall only be compared within one benchmark.
It is still interesting to see how much faster your loader has become at loading the same files (while the rest actually seem to have become slower, except maybe BoozeLoader). :)
Quoting Krill: And again, having exactly one benchmark isn't so feasible, you'll make everyone optimise their tools for that and make people assume that these numbers would reflect every scenario.
I fully agree with you. So let me again encourage everyone to post their benchmarks, with their own rules if you will. Or even better, how about a standard corpus of test files and rules for testing? I think that would be best for everyone. :) |
| |
Krill
Registered: Apr 2002 Posts: 3003 |
At least i have an idea now why BoozeLoader loads the corpus faster than mine, despite using the same compression algorithm. Will optimise the integration of ByteBoozer 2 a little, expecting to bump the speed on that one.
And yet, still can't find any checksumming in that loader... :)
Will first finish work on a different kind of loader, though, before coming back to this. |
| |
Sparta
Registered: Feb 2017 Posts: 49 |
In the meantime, I am going to repeat the tests using your loader with the following additional specifications: test files will occupy the higher tracks (starting on track 35) and the first block on each track will be in sector 0. Additionally, with 0% CPU load, I am going to use interleaves 4-3-3-3. Please let me know if I understand it correctly or if you want to see any other optimizations. :) |
| |
Krill
Registered: Apr 2002 Posts: 3003 |
Hold your horses until i have a new build ready, at least for the ByteBoozer 2 test.
And i'm still surprised that it's generally not faster than loading uncompressed, so will investigate a bit more. |
| |
Krill
Registered: Apr 2002 Posts: 3003 |
Oh, and the same drive should be used both for writing the disks and running the tests. |
| |
HCL
Registered: Feb 2003 Posts: 728 |
I don't like representing the brown column in those graphs; it's not my preferred choice of color and does not reflect on Booze Design in general. .. ;) |
| |
Oswald
Registered: Apr 2002 Posts: 5109 |
nice to see the competitiveness still in you guys :D |
| |
Krill
Registered: Apr 2002 Posts: 3003 |
Quoting HCL: I don't like representing the brown column in those graphs; it's not my preferred choice of color and does not reflect on Booze Design in general. .. ;)
Which of the 16 shades of brown we have on display is your preferred choice, then? :) |
| |
Sparta
Registered: Feb 2017 Posts: 49 |
Quoting HCL: I don't like representing the brown column in those graphs; it's not my preferred choice of color and does not reflect on Booze Design in general. .. ;)
OK HCL, tell me the color of your Booze and I will... Never mind. ;) |
| |
HCL
Registered: Feb 2003 Posts: 728 |
Despite the brown.. It would be kinda interesting to see the compression ratio in this comparison. I guess the three to the left are all using crippled compression due to a tiny search window, or? |
| |
Krill
Registered: Apr 2002 Posts: 3003 |
Quoting HCL: Despite the brown.. It would be kinda interesting to see the compression ratio in this comparison. I guess the three to the left are all using crippled compression due to a tiny search window, or?
OP did mention the compression ratios for this particular corpus.
And for tinycrunch, look here: https://codebase64.org/doku.php?id=base:compression_benchmarks - it's in the 60% compressed size cluster, while ByteBoozer 2 is in the 45-ish% cluster.
But for demos, compression ratio doesn't matter so much as overall throughput. Disk images are cheap. :D |
| |
Frantic
Registered: Mar 2003 Posts: 1650 |
Flipping disk( image)s when watching demos is hard work though. |
| |
tlr
Registered: Sep 2003 Posts: 1793 |
Quote: Flipping disk( image)s when watching demos is hard work though.
Not compared to serial party copying using 15 Seconds Copy 35 Tracks V2.36 after being awake 48 hours. ;) |
| |
Danzig
Registered: Jun 2002 Posts: 443 |
Quote: Not compared to serial party copying using 15 Seconds Copy 35 Tracks V2.36 after being awake 48 hours. ;)
Avantgarde houseparty 1995: Robocop and Jack Alien copying box after box after box... 25y ago already... |
| |
Raistlin
Registered: Mar 2007 Posts: 697 |
Worth noting that the end part of Memento Mori would never have been possible if it wasn’t for the performance of Sparkle. We originally had this part running at 16.666fps (1 pixel per 3 frames). But, with Sparta’s help and the new versions of Sparkle, we were able to push it that little bit more and get 25fps (1 pixel per 2 frames).
I can’t speak highly enough about how good this loader is. It’s really a game changer for us.
That said, it’s not just about the loader. You can’t just throw data at these loaders and expect them to do magic. Having Sparta analysing the loading patterns for us, working out the best interleave methods, the best data forms to get the best from the compression, etc etc.. that was all incredibly invaluable. |
| |
Krill
Registered: Apr 2002 Posts: 3003 |
Now, if only i weren't busy working on that new kind of fastloader, i'd update the IRQ loader to catch up in that benchmark (also found some new tricks on the way).
But alas, first things first. :) |