| |
Sparta
Registered: Feb 2017 Posts: 49 |
Loader Benchmarks
I do not intend to stir up the mud, but it’s been 5 years since the performance of commonly used fast loaders was last compared in this thread. Since then, Lft has updated Spindle, Krill has practically rewritten his loader, HCL has released ByteBoozer 2.0, Bitfire is past version 0.6, and I have released Sparkle (AKA “Chinese clone” :D, apologies to the Chinese sceners). I have been running tests for my own entertainment and figured I’d update Bitbreaker’s benchmark with the latest loader versions using his test files.
The graph below compares the following loader+packer combinations: Sparkle V1.4 and Spindle 2.3 using their own packers, Krill’s loader v184 with TinyCrunch 1.2 (TC), Bitnax (BN), and ByteBoozer 2.0 (B2), BoozeLoader 1.0 with ByteBoozer 2.0, and Bitfire 0.6+ (downloaded from GitHub on April 9, 2020) with Bitnax. The sole purpose of this benchmark was to examine how fast these loaders can load and decompress 728 blocks of data in 18 files under different CPU loads (i.e. it does not test disk zones separately).

I spent quite some time optimizing each loader’s performance as much as I could, with preliminary trial runs in VICE in warp mode to find the best parameters. Thus, Spindle disks were built with the fast serial transfer protocol, and in the case of Sparkle, Krill’s loader, Bitfire, and BoozeLoader, a custom interleave was implemented. Finally, the disk drive’s motor was left running during the tests with BoozeLoader and Bitfire. For each test I used the interleave that proved to be the fastest in the preliminary runs.

All tests were performed the same way: the executable installs the loader and loads a small program from track 1, sector 0 (this way, seek time to track 1 is not included in the test) with a raster IRQ routine that blocks 0%, 25%, 50%, or 75% of the screen while loading and depacking. None of the files are loaded under the I/O area.

Sparkle packed the files from 728 blocks to 442 (60.7%), Spindle to 456 (62.6%), TinyCrunch to 447 (61.4%), and ByteBoozer and Bitnax to 390 blocks (53.6%). D64 images were transferred to the same floppy using Luigi Di Fraia’s IECHost connected to a 1571 disk drive. Each column in the graph represents the average ± 2SD of 10 consecutive tests on my C64C PAL + 1541-II disk drive combo (except for Krill+ByteBoozer, where the test crashed 3 times for some reason, requiring additional runs).
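For anyone who wants to double-check the arithmetic, here is a minimal Python sketch of how the reported figures are derived: the compression percentages are simply packed blocks divided by the original 728 blocks, and each graph column is the mean ± 2 standard deviations of 10 runs. The frame counts in the sketch are made-up placeholders, not my actual measurements.

```python
from statistics import mean, stdev

ORIGINAL_BLOCKS = 728

# Packed sizes in blocks, as listed above.
packed = {
    "Sparkle": 442,
    "Spindle": 456,
    "TinyCrunch": 447,
    "ByteBoozer 2 / Bitnax": 390,
}
for name, blocks in packed.items():
    print(f"{name:22} {blocks} blocks = {100 * blocks / ORIGINAL_BLOCKS:.1f}%")

# Each column in the graph is the mean +/- 2SD of 10 consecutive runs.
# These loading times (in PAL frames) are placeholders, not real measurements.
frames = [2510, 2498, 2522, 2505, 2517, 2499, 2508, 2513, 2502, 2511]
print(f"column value: {mean(frames):.1f} frames, error bar: +/-{2 * stdev(frames):.1f}")
```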
![](https://lh3.googleusercontent.com/ZVxDoqYyIV_nK8nAJY7NhjqKB5h75s1_k_U8fa8W2SY0owEHqGHaFgru2uFRAWyfRnVa3bcqz502h29YoXn_v140PGzrde_Fd9XX8RWDAu_nDaMcll8=w1280)
Interleaves (note: the files only occupy the first two speed zones on the disk):

Loader        0%        25%       50%       75%
Sparkle       4-4-4-4   5-4-4-4   7-4-4-4   7-4-4-4
Spindle       default   default   default   default
Krill         4-4-4-4   5-4-4-4   7-7-7-7   11-10-10-10
Bitfire       5-5-5-5   6-6-6-6   9-9-9-9   6-6-6-6
BoozeLoader   4-4-4-4   5-5-5-5   7-7-7-7   11-11-11-11
Feel free to interpret the data the way you want. Obviously, the authors of the other loaders and packers are way out of my hobby coder league, so I will not attempt to draw conclusions or pretend to have answers. But if anyone is interested, I’d be happy to share my test disk images and spreadsheets. Also, let me know if you want to see any other loader’s performance in the benchmark.
Finally, I am sure many of us do similar tests, so please feel free to post your own benchmarks with some description here.
Cheers, stay healthy and safe,
Sparta/OMG |
|
| |
Sparta
Registered: Feb 2017 Posts: 49 |
Edit:
Here is the graph again:
![](https://lh6.googleusercontent.com/w6KsyqAPRUte6wOCAzn0mSntFMZ888FLvCB0qHD2An4O0MV-P9OMmnyRaUtL0fo)
If it still does not work please use this link:
https://sites.google.com/view/c64loaderbenchmark |
| |
Sparta
Registered: Feb 2017 Posts: 49 |
And here is another way to look at it:
![](https://lh6.googleusercontent.com/oPjdh_N-geydk0SR5SndPuuUdKJjlh3JvQeCmK6nwczZagLGcOifagEdJfI1RA4) |
| |
Krill
Registered: Apr 2002 Posts: 3003 |
I'd prefer graphs where higher bars mean better, i.e., the first.
But i find it very surprising that my loader (with a raw-loading peak performance of 7.7 KB/s) is slower overall when loading compressed files - a mere 7-ish KB/s with 100% CPU for the loader.
The test that comes with Krill's Loader, Repository Version 184 shows significantly higher throughput at about 11 to 13 KB/s for compressed files, as seen in the screenshots.
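For reference, converting a loading time from the benchmark (in PAL frames) into a KB/s figure is straightforward; the sketch below uses a made-up frame count, not a value read off the graph.

```python
# Rough conversion from loading time in PAL frames to effective throughput.
# The frame count is a made-up placeholder, not taken from the benchmark graph.
BLOCKS = 728            # decompressed payload of the test corpus
BYTES_PER_BLOCK = 254   # data bytes per sector (256 minus the 2-byte link)
PAL_FPS = 50.0          # close enough; the exact PAL rate is ~50.12 Hz

frames = 1000
seconds = frames / PAL_FPS
kib = BLOCKS * BYTES_PER_BLOCK / 1024
print(f"{kib:.1f} KB in {seconds:.1f} s = {kib / seconds:.1f} KB/s")
```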
Do you perhaps load first and then decompress rather than using the combined load+depack call?
Oh, and there is a way to have the motor spin continuously, but it's not documented because i don't like that practice, and it doesn't matter much for performance in back-to-back loading of files anyways.
And is there a reason to give out the disk images on request only, rather than linking them as well for anybody to check out? |
| |
Sparta
Registered: Feb 2017 Posts: 49 |
Thank you for your reply, Krill. :)
LOAD_COMPD_API is set to 1, everything else is 0 in the basic and extended features sections of the config file. I presume this means on-the-fly decompression. Please let me know if there is a better way to do it.
No, there is no specific reason to give out the test disks on request only, other than that I did not know if anyone was interested and wasn't sure what common practice was. :) I simply thought this was a good way to start a conversation. So without further ado:
https://drive.google.com/open?id=1lNOFWJtqJCuRvg5uQqRpGcmUAH92m..
Please note that results are not necessarily the same in VICE and on real HW. My personal experience is that with 0% CPU load, for example, there can be up to a 100-frame difference.
Also, I really hope no one takes my benchmark as a personal attack. :)
Cheers,
Sparta |
| |
Krill
Registered: Apr 2002 Posts: 3003 |
Quoting Sparta: Also, I really hope no one takes my benchmark as a personal attack. :)
At least here, no offence taken. :)
But after investigating a little, it seems like it does not quite max out the performance of at least my loader, and in general does not illustrate more different scenarios than just CPU available to the loader.
Firstly, most files are rather small (2 tracks or so - if you do that in a demo, you don't care so much for speed anyways), which makes opening a file (including finding and loading the first block, in my case) pretty dominant in the overall cost and reduces the impact of sustained throughput.
This puts block-based compression (rather than stream-based) and fixed/known layout at an advantage.
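To put toy numbers on that (all values made up, not measured): total time per file is roughly a fixed open/seek overhead plus size divided by sustained throughput, so the overhead's share shrinks as files get bigger.

```python
# Toy model of per-file overhead vs. sustained throughput (all numbers made up).
OVERHEAD_S = 0.3        # assumed cost of opening a file and finding its first block
THROUGHPUT_KB_S = 8.0   # assumed sustained load+depack throughput

def load_time(size_kb):
    return OVERHEAD_S + size_kb / THROUGHPUT_KB_S

for size_kb in (10, 50, 150):  # roughly a 2-track file vs. bigger files
    t = load_time(size_kb)
    print(f"{size_kb:3d} KB: {t:5.2f} s, overhead share {OVERHEAD_S / t:.0%}")
```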
Then, the faster tracks tend to be the upper tracks 18+ (native interleave 3, not 4, in my case), and so the files should end at track 35 rather than start at track 1, if maximum speed is to be the goal.
The tool used to create the images seems a tad suspicious to me (or the parameters used). It seems to save files with correct CBM DOS interleaving behaviour, but puts the first block on each new track at sector 1 rather than 0. |
| |
Sparta
Registered: Feb 2017 Posts: 49 |
I appreciate you taking the time to check my test disks. Let me try to answer your points the best I can.
Quoting Krill: But after investigating a little, it seems like it does not quite max out the performance of at least my loader, and in general does not illustrate more different scenarios than just CPU available to the loader.
I agree; this was the sole purpose of this benchmark, as stated in my first post. :)
Quoting Krill: Firstly, most files are rather small (2 tracks or so - if you do that in a demo, you don't care so much for speed anyways), which makes opening a file (including finding and loading the first block, in my case) pretty dominant in the overall cost and reduces the impact of sustained throughput.
This puts block-based compression (rather than stream-based) and fixed/known layout at an advantage.
I personally also prefer larger files, but I wanted to ensure backward compatibility, so I decided to use Bitbreaker's test files, which he kindly shared with me a year or so ago and which were used in one of their demos.
Quoting Krill: Then, the faster tracks tend to be the upper tracks 18+ (native interleave 3, not 4, in my case), and so the files should end at track 35 rather than start at track 1, if maximum speed is to be the goal.
This is true for all the loaders. So I believe starting at track 1 is still a fair comparison. Of course, I could add more files to see what happens. But then again, no more backward compatibility. :)
Quoting Krill: The tool used to create the images seems a tad suspicious to me (or the parameters used). It seems to save files with correct CBM DOS interleaving behaviour, but puts the first block on each new track at sector 1 rather than 0.
I admit, this is my ad hoc VB tool. I used the following formula: once the last sector on a track is used (which is sector 19 in the case of track 1), I step to the next track, add the interleave to the last sector, and subtract the number of sectors if the result is equal to or greater than the number of sectors on that track (19+4-21=2). I then subtract an additional 1 in the case of tracks 1-17 if the result is greater than 0. I am using the same formula in Sparkle. |
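In rough Python, the stepping rule reads something like this (a sketch of the formula above as I read it, not the actual VB code):

```python
def sectors_per_track(track):
    # 1541 speed zones: 21 sectors on tracks 1-17, then 19, 18 and 17.
    if track <= 17:
        return 21
    if track <= 24:
        return 19
    if track <= 30:
        return 18
    return 17

def first_sector_on_next_track(last_sector, next_track, interleave):
    # Add the interleave to the last sector used, wrap by the sector count
    # if needed (e.g. 19 + 4 - 21 = 2), and on tracks 1-17 subtract one more
    # if the wrapped result is greater than 0.
    count = sectors_per_track(next_track)
    sector = last_sector + interleave
    if sector >= count:
        sector -= count
        if next_track <= 17 and sector > 0:
            sector -= 1
    return sector

# Example from the text: last sector 19 on track 1, interleave 4, stepping to track 2.
print(first_sector_on_next_track(19, 2, 4))  # 2 after the wrap, 1 after the extra -1
```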
| |
Sparta
Registered: Feb 2017 Posts: 49 |
Re: native interleave. I did trial runs using 4-3-3-3 and 4-4-4-4 and chose the one that was faster. I did not investigate why.
Thanks again! |
| |
Krill
Registered: Apr 2002 Posts: 3003 |
Quoting Krill: ...does not illustrate more different scenarios than just CPU available to the loader.
Quoting Sparta: I agree; this was the sole purpose of this benchmark, as stated in my first post. :)
And yet, it fixes all other variables, not allowing for comparison of those, even if the goal were to compare speed vs. CPU again, but in another scenario (such as having big files only).
Not sure what you mean by backward compatibility. Is it to the benchmark of 5 years ago, to get comparable numbers with that?
I'd say that's a different benchmark, and numbers shall only be compared within one benchmark.
And again, having exactly one benchmark isn't so feasible, you'll make everyone optimise their tools for that and make people assume that these numbers would reflect every scenario. |
| |
Sparta
Registered: Feb 2017 Posts: 49 |
Quoting Krill: And yet, it fixes all other variables, not allowing for comparison of those, even if the goal were to compare speed vs. CPU again, but in another scenario (such as having big files only).
For reproducibility, every test needs standardization. Since there are no generally accepted standards for loader benchmarks, I did my best to standardize my tests to provide the same circumstances for comparability. I clearly stated these in my opening post. And again, these are not my own files, so I did not choose their size. I think this is realistic and fair to all the loaders, as I did not pick files to favor Sparkle, for example.
Quoting Krill: Not sure what you mean by backward compatibility. Is it to the benchmark of 5 years ago, to get comparable numbers with that?
I'd say that's a different benchmark, and numbers shall only be compared within one benchmark.
It is still interesting to see how much faster your loader has become at loading the same files (while the rest actually seem to have become slower, except maybe BoozeLoader). :)
Quoting Krill: And again, having exactly one benchmark isn't so feasible, you'll make everyone optimise their tools for that and make people assume that these numbers would reflect every scenario.
I fully agree with you. So let me again encourage everyone to post their benchmarks, with their own rules if you will. Or even better, how about a standard corpus of test files and rules for testing? I think that would be best for everyone. :) |
| |
Krill
Registered: Apr 2002 Posts: 3003 |
At least i have an idea now why BoozeLoader loads the corpus faster than mine, despite using the same compression algorithm. Will optimise the integration of ByteBoozer 2 a little, expecting to bump the speed on that one.
And yet, still can't find any checksumming in that loader... :)
Will first finish work on a different kind of loader, though, before coming back to this. |
| |
Sparta
Registered: Feb 2017 Posts: 49 |
In the meantime, I am going to repeat the tests using your loader with the following additional specifications: test files will occupy the higher tracks (starting on track 35) and the first block on each track will be in sector 0. Additionally, with 0% CPU load, I am going to use interleaves 4-3-3-3. Please let me know if I understand it correctly or if you want to see any other optimizations. :) |
| |
Krill
Registered: Apr 2002 Posts: 3003 |
Hold your horses until i have a new build ready, at least for the ByteBoozer 2 test.
And i'm still surprised that it's generally not faster than loading uncompressed, so will investigate a bit more. |
| |
Krill
Registered: Apr 2002 Posts: 3003 |
Oh, and the same drive should be used both for writing the disks and running the tests. |
| |
HCL
Registered: Feb 2003 Posts: 728 |
I don't like representing the brown column in those graphs; it's not my preferred choice of color and does not reflect on Booze Design in general. .. ;) |
| |
Oswald
Registered: Apr 2002 Posts: 5109 |
nice to see the competitiveness still in you guys :D |
| |
Krill
Registered: Apr 2002 Posts: 3003 |
Quoting HCL: I don't like representing the brown column in those graphs; it's not my preferred choice of color and does not reflect on Booze Design in general. .. ;)
Which of the 16 shades of brown we have on display is your preferred choice, then? :) |
| |
Sparta
Registered: Feb 2017 Posts: 49 |
Quoting HCL: I don't like representing the brown column in those graphs; it's not my preferred choice of color and does not reflect on Booze Design in general. .. ;)
OK HCL, tell me the color of your Booze and I will... Never mind. ;) |
| |
HCL
Registered: Feb 2003 Posts: 728 |
Despite the brown.. It would be kinda interesting to see the compression ratio in this comparison. I guess the three to the left are all using crippled compression due to a tiny search window, or? |
| |
Krill
Registered: Apr 2002 Posts: 3003 |
Quoting HCL: Despite the brown.. It would be kinda interesting to see the compression ratio in this comparison. I guess the three to the left are all using crippled compression due to a tiny search window, or?
OP did mention the compression ratios for this particular corpus.
And for tinycrunch, look here: https://codebase64.org/doku.php?id=base:compression_benchmarks - it's in the 60% compressed size cluster, while ByteBoozer 2 is in the 45-ish% cluster.
But for demos, compression ratio doesn't matter so much as overall throughput. Disk images are cheap. :D |
| |
Frantic
Registered: Mar 2003 Posts: 1650 |
Flipping disk( image)s when watching demos is hard work though. |
| |
tlr
Registered: Sep 2003 Posts: 1793 |
Quote: Flipping disk( image)s when watching demos is hard work though.
Not compared to serial party copying using 15 Seconds Copy 35 Tracks V2.36 after being awake 48 hours. ;) |
| |
Danzig
Registered: Jun 2002 Posts: 443 |
Quote: Not compared to serial party copying using 15 Seconds Copy 35 Tracks V2.36 after being awake 48 hours. ;)
Avantgarde houseparty 1995: Robocop and Jack Alien copying box after box after box... 25y ago already... |
| |
Raistlin
Registered: Mar 2007 Posts: 697 |
Worth noting that the end part of Memento Mori would never have been possible if it wasn’t for the performance of Sparkle. We originally had this part running at 16.666fps (1 pixel per 3 frames). But, with Sparta’s help and the new versions of Sparkle, we were able to push it that little bit more and get 25fps (1 pixel per 2 frames).
I can’t speak highly enough about how good this loader is. It’s really a game changer for us.
That said, it’s not just about the loader. You can’t just throw data at these loaders and expect them to do magic. Having Sparta analysing the loading patterns for us, working out the best interleave methods, the best data forms to get the best from the compression, etc etc.. that was all incredibly invaluable. |
| |
Krill
Registered: Apr 2002 Posts: 3003 |
Now, if only i weren't busy working on that new kind of fastloader, i'd update the IRQ loader to catch up in that benchmark (also found some new tricks on the way).
But alas, first things first. :) |