CSDb User Forums


2020-05-04 13:51
Sparta

Registered: Feb 2017
Posts: 38
Loader Benchmarks

I do not intend to stir up the mud, but it’s been 5 years since the performance of commonly used fast loaders was last compared in this thread. Since then, Lft has updated Spindle, Krill has practically rewritten his loader, HCL has released ByteBoozer 2.0, Bitfire is past version 0.6, and I have released Sparkle (AKA “Chinese clone” :D, apologies to the Chinese sceners). I have been running tests for my own entertainment and figured I’d update Bitbreaker’s benchmark with the latest loader versions using his test files.

The graph below compares the following loader+packer combinations: Sparkle V1.4 and Spindle 2.3 using their own packers; Krill’s loader v184 with TinyCrunch 1.2 (TC), Bitnax (BN), and ByteBoozer 2.0 (B2); BoozeLoader 1.0 with ByteBoozer 2.0; and Bitfire 0.6+ (downloaded from GitHub on April 9, 2020) with Bitnax. The sole purpose of this benchmark was to examine how fast these loaders can load and decompress 728 blocks of data in 18 files under different CPU loads (i.e. it does not test disk zones separately).

I spent quite some time optimizing each loader’s performance as much as I could, with preliminary trial runs in VICE with warp mode to find the best parameters. Thus, Spindle disks were built with the fast serial transfer protocol, and in the case of Sparkle, Krill’s loader, Bitfire, and BoozeLoader, a custom interleave was implemented. Finally, the disk drive’s motor was left running during the tests with BoozeLoader and Bitfire. For each test I used the interleave that proved to be the fastest in the preliminary runs.

All tests were performed the same way: the executable installs the loader and loads a small program from track 1, sector 0 (this way, seek time to track 1 is not included in the test) with a raster IRQ routine that blocks 0%, 25%, 50%, or 75% of the screen while loading and depacking. None of the files are loaded under the I/O area. Sparkle packed the files from 728 blocks to 442 (60.7%), Spindle to 456 (62.6%), TinyCrunch to 447 (61.4%), ByteBoozer and Bitnax to 390 blocks (53.6%).

D64 images were transferred to the same floppy using Luigi Di Fraia’s IECHost connected to a 1571 disk drive. Each column in the graph represents the average ± 2SD of 10 consecutive tests on my C64C PAL + 1541-II disk drive combo (except for Krill+ByteBoozer, in which case the test crashed 3 times for some reason, requiring additional runs).
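For readers who want to check the figures, here is a minimal Python sketch that recomputes the packed-size percentages quoted above and shows one way to get an "average ± 2SD" summary. The block counts come from the post; the function names and the use of the sample standard deviation are assumptions, since the post does not say which SD variant was used, and no measured timings are included.

```python
from statistics import mean, stdev

# Block counts quoted in the post.
ORIGINAL_BLOCKS = 728
PACKED_BLOCKS = {
    "Sparkle": 442,
    "Spindle": 456,
    "TinyCrunch": 447,
    "ByteBoozer/Bitnax": 390,
}


def packed_ratio_percent(packed, original=ORIGINAL_BLOCKS):
    """Packed size as a percentage of the original size, to one decimal."""
    return round(100.0 * packed / original, 1)


def mean_and_2sd(samples):
    """Return (mean, 2 * sample SD) for a list of measurements.

    Assumption: sample SD (n - 1 denominator); the post does not specify.
    """
    return mean(samples), 2 * stdev(samples)


if __name__ == "__main__":
    for name, blocks in PACKED_BLOCKS.items():
        print(f"{name}: {blocks} blocks = {packed_ratio_percent(blocks)}%")
```

To summarise a series of runs, `mean_and_2sd` would be called with the ten per-run load times of one loader at one CPU load.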



Interleaves (note: the files only occupy the first two speed zones on the disk):
Loader		0%		25%		50%		75%
Sparkle		4-4-4-4		5-4-4-4		7-4-4-4		7-4-4-4
Spindle		default		default		default		default
Krill		4-4-4-4		5-4-4-4		7-7-7-7		11-10-10-10
Bitfire		5-5-5-5		6-6-6-6		9-9-9-9		6-6-6-6
BoozeLoader	4-4-4-4		5-5-5-5		7-7-7-7		11-11-11-11
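As a reading aid for the table above, the sketch below assumes that each four-number entry lists one interleave per 1541 speed zone, starting with the zone that holds tracks 1-17. The later discussion of "4-3-3-3" versus "4-4-4-4" suggests this, but it is an interpretation. The helper names are hypothetical; the track ranges are the standard 35-track 1541 layout.

```python
# Standard 1541 speed zones as (first_track, last_track) pairs.
SPEED_ZONES = [(1, 17), (18, 24), (25, 30), (31, 35)]


def zone_index(track):
    """Return the 0-based speed zone for a track number (1-35)."""
    for index, (first, last) in enumerate(SPEED_ZONES):
        if first <= track <= last:
            return index
    raise ValueError(f"track out of range: {track}")


def interleave_for_track(spec, track):
    """Look up the interleave for a track from a spec such as '5-4-4-4'.

    Assumption: the four numbers map to the four zones in track order.
    Entries like 'default' (Spindle) are not handled by this sketch.
    """
    values = [int(part) for part in spec.split("-")]
    if len(values) != len(SPEED_ZONES):
        raise ValueError(f"expected four values, got: {spec!r}")
    return values[zone_index(track)]
```

With this reading, Krill's 75% entry `11-10-10-10` means interleave 11 on tracks 1-17 and interleave 10 on tracks 18-35.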

Feel free to interpret the data the way you want. Obviously, the authors of the other loaders and packers are way out of my hobby coder league, so I will not attempt to draw conclusions or pretend to have answers. But if anyone is interested, I’d be happy to share my test disk images and spreadsheets. Also, let me know if you want to see any other loader’s performance in the benchmark.

Finally, I am sure many of us do similar tests, so please feel free to post your own benchmarks with some description here.

Cheers, stay healthy and safe,
Sparta/OMG
 
... 14 posts hidden ...
 
2020-05-04 19:27
Krill

Registered: Apr 2002
Posts: 2839
Quoting Sparta
Also, I really hope no one takes my benchmark as a personal attack. :)
At least here, no offence taken. :)

But after investigating a little, it seems like it does not quite max out the performance of at least my loader, and in general does not illustrate more different scenarios than just CPU available to the loader.

Firstly, most files are rather small (2 tracks or so - if you do that in a demo, you don't care so much for speed anyways), which makes opening a file (including finding and loading the first block, in my case) pretty dominant in the overall cost and reduces the impact of sustained throughput.
This puts block-based compression (rather than stream-based) and fixed/known layout at an advantage.

Then, the faster tracks tend to be the upper tracks 18+ (native interleave 3, not 4, in my case), and so the files should end at track 35 rather than start at track 1, if maximum speed is to be the goal.

The tool to create the images seems a tad suspicious to me (or the used parameters). It seems to save files with correct CBM DOS interleaving behaviour, but puts every first file block on a new track to sector 1 rather than 0.
2020-05-04 20:10
Sparta

Registered: Feb 2017
Posts: 38
I appreciate you taking the time to check my test disks. Let me try to answer your points as best I can.

Quoting Krill
But after investigating a little, it seems like it does not quite max out the performance of at least my loader, and in general does not illustrate more different scenarios than just CPU available to the loader.

I agree, this was the sole purpose of this benchmark, as stated in my first post. :)

Quoting Krill
Firstly, most files are rather small (2 tracks or so - if you do that in a demo, you don't care so much for speed anyways), which makes opening a file (including finding and loading the first block, in my case) pretty dominant in the overall cost and reduces the impact of sustained throughput.
This puts block-based compression (rather than stream-based) and fixed/known layout at an advantage.

I personally also prefer larger files, but I wanted to ensure backward compatibility, so I decided to use Bitbreaker's test files, which were used in one of their demos and which he kindly shared with me last year or so.

Quoting Krill
Then, the faster tracks tend to be the upper tracks 18+ (native interleave 3, not 4, in my case), and so the files should end at track 35 rather than start at track 1, if maximum speed is to be the goal.

This is true for all the loaders. So I believe starting at track 1 is still a fair comparison. Of course, I could add more files to see what happens. But then again, no more backward compatibility. :)

Quoting Krill
The tool to create the images seems a tad suspicious to me (or the used parameters). It seems to save files with correct CBM DOS interleaving behaviour, but puts every first file block on a new track to sector 1 rather than 0.

I admit, this is my ad hoc VB tool. I used the following formula: once the last sector on a track is used (sector 19 in the case of track 1), I step to the next track, add the interleave to the last sector, and subtract the number of sectors on that track if the result is equal to or greater than it (19+4-21=2). I then subtract an additional 1 in the case of tracks 1-17 if the result is greater than 0. I am using the same formula in Sparkle.
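The formula described in this post can be sketched in Python as follows. This is a reconstruction from the prose, not the actual VB tool or Sparkle source. The function names are hypothetical, and it assumes that "tracks 1-17" refers to the track being stepped onto. The sectors-per-track values are the standard 1541 layout.

```python
def sectors_on_track(track):
    """Number of sectors on a track of a standard 35-track 1541 disk."""
    if 1 <= track <= 17:
        return 21
    if 18 <= track <= 24:
        return 19
    if 25 <= track <= 30:
        return 18
    if 31 <= track <= 35:
        return 17
    raise ValueError(f"track out of range: {track}")


def first_sector_on_next_track(last_sector, interleave, next_track):
    """First sector used on next_track, following the described formula."""
    count = sectors_on_track(next_track)
    # Add the interleave to the last sector used on the previous track
    # and wrap around if the result runs past the end of the track.
    sector = last_sector + interleave
    if sector >= count:
        sector -= count
    # Extra step from the post: on tracks 1-17, subtract 1 more if the
    # result is greater than 0. Assumption: this applies to the new track.
    if next_track <= 17 and sector > 0:
        sector -= 1
    return sector
```

For the example in the post (last sector 19 on track 1, interleave 4), the wrap gives 19+4-21=2 and the extra subtraction gives 1, which is consistent with Krill's observation that the first block on each new track lands on sector 1 rather than 0.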
2020-05-04 20:19
Sparta

Registered: Feb 2017
Posts: 38
Re: native interleave. I did trial runs using 4-3-3-3 and 4-4-4-4 and chose the one that was faster. I did not investigate why.

Thanks again!
2020-05-04 20:48
Krill

Registered: Apr 2002
Posts: 2839
Quoting Krill
does not illustrate more different scenarios than just CPU available to the loader.
Quoting Sparta
I agree, this was the sole purpose of this benchmark, as stated in my first post. :)
And yet, it fixes all other variables, not allowing for comparison of those, even if the goal were to compare speed vs. CPU again, but in another scenario (such as having big files only).

Not sure what you mean by backward compatibility. Is it to the benchmark of 5 years ago, to get comparable numbers with that?

I'd say that's a different benchmark, and numbers shall only be compared within one benchmark.

And again, having exactly one benchmark isn't so feasible, you'll make everyone optimise their tools for that and make people assume that these numbers would reflect every scenario.
2020-05-04 21:24
Sparta

Registered: Feb 2017
Posts: 38
Quoting Krill
And yet, it fixes all other variables, not allowing for comparison of those, even if the goal were to compare speed vs. CPU again, but in another scenario (such as having big files only).

For reproducibility, every test needs standardization. Since there are no generally accepted standards for loader benchmarks, I did my best to standardize my tests to provide the same circumstances for comparability. I clearly stated these in my opening post. And again, these are not my own files, so I did not choose the size. I think this is realistic and fair to all the loaders, as I did not pick files to favor Sparkle, for example.

Quoting Krill
Not sure what you mean by backward compatibility. Is it to the benchmark of 5 years ago, to get comparable numbers with that?

I'd say that's a different benchmark, and numbers shall only be compared within one benchmark.

It is still interesting to see how much faster your loader has become loading the same files (when the rest seems to be slower actually, except maybe BoozeLoader). :)

Quoting Krill
And again, having exactly one benchmark isn't so feasible, you'll make everyone optimise their tools for that and make people assume that these numbers would reflect every scenario.

I fully agree with you. So let me again encourage everyone to post their benchmarks with their own rules, if you will. Or even better, how about a standard corpus of test files and rules for testing? I think that would be best for everyone. :)
2020-05-04 23:29
Krill

Registered: Apr 2002
Posts: 2839
At least i have an idea now why BoozeLoader loads the corpus faster than mine, despite using the same compression algorithm. Will optimise the integration of ByteBoozer 2 a little, expecting to bump the speed on that one.

And yet, still can't find any checksumming in that loader... :)

Will first finish work on a different kind of loader, though, before coming back to this.
2020-05-05 00:49
Sparta

Registered: Feb 2017
Posts: 38
In the meantime, I am going to repeat the tests using your loader with the following additional specifications: test files will occupy the higher tracks (starting on track 35) and the first block on each track will be in sector 0. Additionally, with 0% CPU load, I am going to use interleaves 4-3-3-3. Please let me know if I understand it correctly or if you want to see any other optimizations. :)
2020-05-05 01:09
Krill

Registered: Apr 2002
Posts: 2839
Hold your horses until i have a new build ready, at least for the ByteBoozer 2 test.
And i'm still surprised that it's generally not faster than loading uncompressed, so will investigate a bit more.
2020-05-05 01:10
Krill

Registered: Apr 2002
Posts: 2839
Oh, and the same drive should be used both for writing the disks and running the tests.
2020-05-05 08:58
HCL

Registered: Feb 2003
Posts: 716
I don't like being represented by the brown column in those graphs; it's not my preferred choice of color and does not reflect on Booze Design in general. ;)