| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
Doynamite 1.x
Hi Folx,
after Doynamite was used in some recent productions and people often stumbled over the .prg/.bin pitfall, I decided to make some improvements to the packer. It can now spit out an sfx, level-packed data including a valid load address and depack address, and forward literals to keep the safety margin low. Raw data can still be loaded and output without any bytes added. The optimal bit lengths can now be found by iteration, and the optimal table can be glued to the output file.
I also happened to make a leaner version that makes the files slightly bigger, but shrinks the depacker to $e0 bytes and makes depacking 5-10% faster. This might be useful for demo systems where size matters a lot.
Anything else one could wish for? |
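For anyone who has not hit the .prg/.bin pitfall yet: a .prg file starts with a 2-byte little-endian load address, while a raw .bin has no header at all. Below is a minimal Python sketch of that difference. The helper names `split_prg` and `make_prg` and the address $1000 are made up for illustration and are not part of Doynamite.

```python
import struct
from typing import Tuple


def split_prg(data: bytes) -> Tuple[int, bytes]:
    """Split a .prg image into (load_address, payload)."""
    if len(data) < 2:
        raise ValueError("a .prg needs at least the 2-byte load address")
    # The first two bytes are the load address, low byte first.
    (load_address,) = struct.unpack_from("<H", data, 0)
    return load_address, data[2:]


def make_prg(load_address: int, payload: bytes) -> bytes:
    """Prepend a little-endian load address to raw data."""
    return struct.pack("<H", load_address) + payload


# Hypothetical raw data: lda #$00 / sta $d020 / rts
raw = bytes([0xA9, 0x00, 0x8D, 0x20, 0xD0, 0x60])
prg = make_prg(0x1000, raw)
addr, body = split_prg(prg)
print(hex(addr), len(prg), len(body))
```

Feeding a .prg to a tool that expects raw data (or the other way round) shifts everything by two bytes, which is the kind of mistake the new load-address handling is meant to avoid.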
| |
Oswald
Registered: Apr 2002 Posts: 5094 |
that's a very nice job you've done there. I'd only ask for a win exe, 64tass src depackers & a readme.txt :) |
| |
chatGPZ
Registered: Dec 2001 Posts: 11386 |
acme and source only it is right now - which is perfect =) and friggin fast <3 |
| |
soci
Registered: Sep 2003 Posts: 480 |
Quote: that's a very nice job you've done there. I'd only ask for a win exe, 64tass src depackers & a readme.txt :)
All of these were there, or did I look at the wrong release? |
| |
chatGPZ
Registered: Dec 2001 Posts: 11386 |
it's not yet released :) |
| |
Burglar
Registered: Dec 2004 Posts: 1101 |
some benchmarks would be nice (in terms of pack ratio and depack time), especially compared to others. |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
The .asm source is written in a format that is very easy to adapt to other assemblers, I'd say; if people hurt themselves on that, it is not my fault :-) I won't include any .exe, as the source is platform-independent anyway (I still have to test that with these changes, however).
I should write some extra documentation on the bunch of switches that were added, though.
As for benchmarks, I'll dig some numbers out on Monday. Axis did a comparison with at least ByteBoozer on C13 (a benchmark to load/depack the whole demo). I know that Doynamite beats ByteBoozer in speed and pack ratio, but can't remember the details. I'll tweak that benchmark so I also have numbers at hand for the new features.
Are switches like $01 value and sei/cli for the sfx still necessary nowadays? And is it okay for the depacker to destroy the zeropage to some extent by placing code there? I have never gained any experience with sfx depackers. |
| |
WVL
Registered: Mar 2002 Posts: 902 |
@Enno :
I did just that for the original Doynamite packer. I copied this from my comments for LZWVL :
As you can see I was too lazy to measure how much time Pucrunch took to decrunch, but I'm betting it would be slower than Byteboozer any day ;)
Here's some more hard data :
filesizes
file     bin     rle lzwvl-f lzwvl-s      bb      pu  doynax
1      11008    8020    4529    4151    3383    3410    3265
2       4973    4314    3532    3309    2648    2687    2512
3       3949    3498    2991    2617    2187    2226    2108
4       7016    6456    4242    4085    3681    3595    3617
5      34760   27647   25781   24895   21306   20887   20405
6      31605   12511   11283   10923    9194    8877    8904
7      20392   17295   12108   11285    9627    9460    9289
8       5713    5407    4179    3916    3251    3314    3132
9       8960    7986    6914    6896    5586    5651    5430
filesize in %
file     bin     rle lzwvl-f lzwvl-s      bb      pu  doynax
1       100%     73%     41%     38%     31%     31%     30%
2       100%     87%     71%     67%     53%     54%     51%
3       100%     89%     76%     66%     55%     56%     53%
4       100%     92%     60%     58%     52%     51%     52%
5       100%     80%     74%     72%     61%     60%     59%
6       100%     40%     36%     35%     29%     28%     28%
7       100%     85%     59%     55%     47%     46%     46%
8       100%     95%     73%     69%     57%     58%     55%
9       100%     89%     77%     77%     62%     63%     61%
#frames to depack
file     rle lzwvl-f lzwvl-s      bb  doynax
1         11      13      14      58      27
2          5       7       7      38      17
3          4       6       6      28      12
4          8       9       9      43      20
5         36      39      42     300     119
6         20      25      25     126      49
7         22      25      26     138      60
8          6       8       8      43      18
9          9      12      12      73      32 |
| |
chatGPZ
Registered: Dec 2001 Posts: 11386 |
you guys should spend more time coding demoparts than bothering about depack times, its friggin fast, what else do you need to know? =P |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
Going by the numbers, BB is faster than doynax? But going by the graph, doynax is faster than BB? So which is correct? :-) And can I have the test files so I have comparable results in the future? :-) Thx.
Edit: ah, the table is missing the doynax row! :-) |
| |
Burglar
Registered: Dec 2004 Posts: 1101 |
and wvl's test is using a pretty old doynax version. so what does the current version look like? ;) |
| |
WVL
Registered: Mar 2002 Posts: 902 |
the last table is missing the pu column (I didn't waste time testing pu, since we all know it is slow at depacking, and byteboozer and doynax had better or equal compression on all the files anyway).
So for the last file (file 9), rle takes 9 frames to depack, lzwvl takes 9 (fast mode) or 12 (slow mode), byteboozer takes 73 and doynax took 32 frames.
Anyway, based on the old version : compression was always better than byteboozer and always faster to decompress than byteboozer.
-> when you need small files, go for doynamite or exomizer (but be prepared for really slow decrunch). You can forget about pu or byteboozer.
-> if you need faster depack than doynamite, then go for LZWVL in slow mode or fast mode. |
| |
Oswald
Registered: Apr 2002 Posts: 5094 |
this should all be examined combined with loading speed, whether the faster depack is worth the bigger file or not ;) |
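A rough way to put numbers on that trade-off, as a sketch only: model the total time as load time plus depack time and solve for the loader speed at which two packers tie. The sizes and frame counts are taken from WVL's table for file 5; the loader speeds of 20 and 200 bytes per frame are invented for illustration, and the model assumes loading and depacking happen one after the other rather than overlapped.

```python
def total_frames(packed_bytes, depack_frames, load_bytes_per_frame):
    """Frames needed to load a packed file and then depack it."""
    return packed_bytes / load_bytes_per_frame + depack_frames


def break_even_rate(size_a, frames_a, size_b, frames_b):
    """Loader speed (bytes/frame) at which packer A and packer B tie.

    A is the bigger but faster-to-depack file, B the smaller but slower one.
    Derived from size_a / r + frames_a == size_b / r + frames_b.
    """
    return (size_a - size_b) / (frames_b - frames_a)


# File 5 from the table above: lzwvl-s vs doynax.
LZWVL_S = (24895, 42)
DOYNAX = (20405, 119)

tie = break_even_rate(*LZWVL_S, *DOYNAX)
print(f"break-even loader speed: {tie:.1f} bytes/frame")

for rate in (20, 200):  # hypothetical loader speeds
    a = total_frames(*LZWVL_S, rate)
    b = total_frames(*DOYNAX, rate)
    print(f"{rate:4d} bytes/frame: lzwvl-s {a:7.1f} frames, doynax {b:7.1f} frames")
```

With a loader slower than the break-even speed the smaller file wins, with a faster loader the quicker depacker wins.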
| |
Testa Account closed
Registered: Oct 2004 Posts: 197 |
thanks wvl!!!, very informative.. |
| |
QuasaR
Registered: Dec 2001 Posts: 145 |
Thanx WVL for all the work. What I'm missing is Exomizer, or is it even slower than Pucrunch? In my tests it was nearly as good as Pucrunch at compressing but much faster. |
| |
Burglar
Registered: Dec 2004 Posts: 1101 |
exomizer will beat them all in compression (except maybe on tiny files). It will also beat them all in depacking time: it's the slowest of all. |
| |
Urban Space Cowboy
Registered: Nov 2004 Posts: 45 |
At least with Exomizer you can set custom decrunch effects to help pass the time. Cruncher AB nostalgists try this: -s"lsr $d011" -x"and #$05 sta $d020" -f"rol $d011"
Here's WVL's results table again, this time readable:
filesizes
#   bin   rle wvl-f wvl-s    bb    pu doyna
- ----- ----- ----- ----- ----- ----- -----
1 11008  8020  4529  4151  3383  3410  3265
2  4973  4314  3532  3309  2648  2687  2512
3  3949  3498  2991  2617  2187  2226  2108
4  7016  6456  4242  4085  3681  3595  3617
5 34760 27647 25781 24895 21306 20887 20405
6 31605 12511 11283 10923  9194  8877  8904
7 20392 17295 12108 11285  9627  9460  9289
8  5713  5407  4179  3916  3251  3314  3132
9  8960  7986  6914  6896  5586  5651  5430
filesize in %
#   bin   rle wvl-f wvl-s    bb    pu doyna
- ----- ----- ----- ----- ----- ----- -----
1  100%   73%   41%   38%   31%   31%   30%
2  100%   87%   71%   67%   53%   54%   51%
3  100%   89%   76%   66%   55%   56%   53%
4  100%   92%   60%   58%   52%   51%   52%
5  100%   80%   74%   72%   61%   60%   59%
6  100%   40%   36%   35%   29%   28%   28%
7  100%   85%   59%   55%   47%   46%   46%
8  100%   95%   73%   69%   57%   58%   55%
9  100%   89%   77%   77%   62%   63%   61%
number of frames to depack
#   bin   rle wvl-f wvl-s    bb    pu doyna
- ----- ----- ----- ----- ----- ----- -----
1   n/a    11    13    14    58   n/a    27
2   n/a     5     7     7    38   n/a    17
3   n/a     4     6     6    28   n/a    12
4   n/a     8     9     9    43   n/a    20
5   n/a    36    39    42   300   n/a   119
6   n/a    20    25    25   126   n/a    49
7   n/a    22    25    26   138   n/a    60
8   n/a     6     8     8    43   n/a    18
9   n/a     9    12    12    73   n/a    32 |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Is the Pearls for Pigs corpus downloadable anywhere?
I'd quite like to see how the simplistic packer I put together for Jam Ball 2 fares with it. Compression factor's not the best, but the depacker's less than $80 bytes long; no idea how speed compares. |
| |
WVL
Registered: Mar 2002 Posts: 902 |
I'll try to dig up the files I used and upload them. They're not really what you would call normal testfiles, but I think they're a pretty good average of stuff you'd find in a demo (graphics, tables, code, etc..) |
| |
HCL
Registered: Feb 2003 Posts: 728 |
Hmm.. are you guys saying that byte-boozer decrunching is slow? Perhaps i could do something about it. I assume these values are already using the "optimized" version of the decruncher that has inlined getter of the code-bytes..
Apart from that, i think you guys should spend a minute or two on linking instead. It's quite easy to make 5-10 seconds of black empty screen feel like nothing if you just have something on the screen moving or doing some kind of easy effect. |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
Doynamite 1.1
Here you go, ready for new tests. I'd be interested in suggestions on the simple/ folder. Any further ideas on how to make this smaller and faster? The shifting shit is still eating so many cycles :-( |
| |
chatGPZ
Registered: Dec 2001 Posts: 11386 |
just remove the zeros! |
| |
ruk
Registered: Jan 2012 Posts: 43 |
Quote: Hmm.. are you guys saying that byte-boozer decrunching is slow? Perhaps i could do something about it. I assume these values are already using the "optimized" version of the decruncher that has inlined getter of the code-bytes..
Apart from that, i think you guys should spend a minute or two on linking instead. It's quite easy to make 5-10 seconds of black empty screen feel like nothing if you just have something on the screen moving or doing some kind of easy effect.
Yes, what HCL says. "Dead air" is really boring, but easily remedied. And while a fast depacker surely helps, it is not the only way.
We've successfully used the "oh-no-so-dawg-slow" Exomizer for all our parts in Revolved, Solaris and Continuum without any noticeable pauses. In those cases where we couldn't load and unpack while a part was running, some small rasterbar effect or similar filler-part was inserted. |
| |
algorithm
Registered: May 2002 Posts: 705 |
Yes, even if exomizer is a lot slower, it usually only requires a few extra seconds of 'demo' effect before it has done its job. The main use for faster depacking would probably be when you need fast load and depack (some type of chunk-based streaming), with the data finally compressed by a packer that depacks fast |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
It is also useful if you want to achieve a higher pace than offence :-) But one could of course say it is art and intended to be slow :-) |
| |
Urban Space Cowboy
Registered: Nov 2004 Posts: 45 |
Quoting ChristopherJam: Is the Pearls for Pigs corpus downloadable anywhere?
It's included in LZWVL, the file "bin.rar". |
| |
Oswald
Registered: Apr 2002 Posts: 5094 |
it's easy to talk about it, but in reality a fast-paced demo needs blood, tears, sweat and a human sacrifice |
| |
WVL
Registered: Mar 2002 Posts: 902 |
Quote: Quoting ChristopherJam: Is the Pearls for Pigs corpus downloadable anywhere? It's included in LZWVL, the file "bin.rar".
W00t! I totally forgot that :-D
Those files are a nice collection of 'real' data. Some music, some code, some graphics, tables, etc etc :-) |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Thank you. And yes, it sounds like a pretty representative dataset for what packers face in practice. |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
..and as for things to do while performing a slow decrunch, loader games anyone? ;) |
| |
WVL
Registered: Mar 2002 Posts: 902 |
So.. how did your packer perform then? :-) |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Not quite as good as WVL-S! Here are the results with tinycrunch added:
filesizes
#   bin   rle wvl-f wvl-s    tc    bb    pu doyna
- ----- ----- ----- ----- ----- ----- ----- -----
1 11008  8020  4529  4151  4329  3383  3410  3265
2  4973  4314  3532  3309  3423  2648  2687  2512
3  3949  3498  2991  2617  2972  2187  2226  2108
4  7016  6456  4242  4085  4225  3681  3595  3617
5 34760 27647 25781 24895 25210 21306 20887 20405
6 31605 12511 11283 10923 11614  9194  8877  8904
7 20392 17295 12108 11285 11445  9627  9460  9289
8  5713  5407  4179  3916  3936  3251  3314  3132
9  8960  7986  6914  6896  6572  5586  5651  5430
filesize in %
#   bin   rle wvl-f wvl-s    tc    bb    pu doyna
- ----- ----- ----- ----- ----- ----- ----- -----
1  100%   73%   41%   38%   39%   31%   31%   30%
2  100%   87%   71%   67%   69%   53%   54%   51%
3  100%   89%   76%   66%   75%   55%   56%   53%
4  100%   92%   60%   58%   60%   52%   51%   52%
5  100%   80%   74%   72%   73%   61%   60%   59%
6  100%   40%   36%   35%   37%   29%   28%   28%
7  100%   85%   59%   55%   56%   47%   46%   46%
8  100%   95%   73%   69%   69%   57%   58%   55%
9  100%   89%   77%   77%   73%   62%   63%   61%
number of frames to depack
#   bin   rle wvl-f wvl-s    tc    bb    pu doyna
- ----- ----- ----- ----- ----- ----- ----- -----
1   n/a    11    13    14    15    58   n/a    27
2   n/a     5     7     7     9    38   n/a    17
3   n/a     4     6     6     7    28   n/a    12
4   n/a     8     9     9    10    43   n/a    20
5   n/a    36    39    42    59   300   n/a   119
6   n/a    20    25    25    37   126   n/a    49
7   n/a    22    25    26    32   138   n/a    60
8   n/a     6     8     8    10    43   n/a    18
9   n/a     9    12    12    16    73   n/a    32
As you can see, my sizes are always bracketed by WVL-F and WVL-S, and my decompression speed is two thirds of yours for 6.bin.
(crunching the entire corpus took 7 seconds on a single core of a 3GHz i7. It's a fairly slack python script, I put all recent substrings in a dict. The parameters are tuned for the lower entropy components of JamBall2, I might be able to improve the ratio if I play with it a bit). |
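The "recent substrings in a dict" idea can be sketched roughly as below. This is not tinycrunch's actual code or output format: the token layout, `MIN_MATCH`, `MAX_OFFSET` and `MAX_LEN` are all assumptions chosen to keep the example short, and the parse is a plain greedy one.

```python
MIN_MATCH = 3     # assumed shortest match worth encoding
MAX_OFFSET = 255  # assumed window size
MAX_LEN = 18      # assumed longest match


def lz_parse(data: bytes):
    """Greedy LZ parse using a dict of recently seen substrings."""
    recent = {}   # MIN_MATCH-byte substring -> most recent start position
    tokens = []
    i = 0
    while i < len(data):
        key = data[i:i + MIN_MATCH]
        cand = recent.get(key)
        length = 0
        if len(key) == MIN_MATCH and cand is not None and i - cand <= MAX_OFFSET:
            # The dict hit guarantees the first MIN_MATCH bytes; extend from there.
            length = MIN_MATCH
            while (length < MAX_LEN and i + length < len(data)
                   and data[cand + length] == data[i + length]):
                length += 1
        if length:
            tokens.append(("match", i - cand, length))
        else:
            tokens.append(("lit", data[i]))
        step = length or 1
        # Remember every position that was just consumed.
        for j in range(i, i + step):
            sub = data[j:j + MIN_MATCH]
            if len(sub) == MIN_MATCH:
                recent[sub] = j
        i += step
    return tokens


def lz_unparse(tokens) -> bytes:
    """Rebuild the original bytes from the token list."""
    out = bytearray()
    for token in tokens:
        if token[0] == "lit":
            out.append(token[1])
        else:
            _, offset, length = token
            for _ in range(length):  # byte by byte, so overlapping copies work
                out.append(out[-offset])
    return bytes(out)


sample = b"abcabcabcabc" * 4 + b"xyz"
tokens = lz_parse(sample)
print(len(sample), "bytes ->", len(tokens), "tokens")
```

A real packer would still have to turn the tokens into a bitstream, which is where most of the ratio differences in the tables come from.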
| |
WVL
Registered: Mar 2002 Posts: 902 |
Score \o/ |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Yes, well done!
All I can say in my defence is my decoder's even smaller than yours ;) |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
A bad ratio still spoils the fun, no matter how fast and tiny the depackers get. Here are some benchmark results for loading + depacking the first side (well, most of it, not the 2 that go under IO) of CL13:
bb hclfix               $0ac4
lzwvl                   $0a08
doynax                  $08a9
doynax_small            $08e8
doynax_small loaddecomp $0749
So the additional loading overhead kills all the speed advantage. Loading and decompressing in one go gives the best results, but bloats the code a lot. The spindle system suffers from the same problem: a bad ratio due to having references in only one block. It still feels fast, though, and I get test files loaded at around the same speed as with loaddecomp (is there a frame counter available in spindle to prove the feeling?) |
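To make the hex figures easier to compare, here is a small Python sketch that converts them to decimal and expresses each as a share of the ByteBoozer result. The post does not state the unit; reading the values as frame counts is an assumption.

```python
# Benchmark figures as posted (hex), unit assumed to be frames.
results_hex = {
    "bb hclfix": "0ac4",
    "lzwvl": "0a08",
    "doynax": "08a9",
    "doynax_small": "08e8",
    "doynax_small loaddecomp": "0749",
}

results = {name: int(value, 16) for name, value in results_hex.items()}
baseline = results["bb hclfix"]

for name, count in results.items():
    share = 100 * count / baseline
    print(f"{name:24} {count:5d}  {share:5.1f}% of bb")
```

In decimal, doynax_small ends up slightly behind plain doynax, which matches the remark that the extra loading eats the depacker's speed gain.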
| |
HCL
Registered: Feb 2003 Posts: 728 |
Thanx, now i will not waste my time ;). *this* will not keep me from winning any compo in the future, though perhaps other things will :P. |
| |
enthusi
Registered: May 2004 Posts: 677 |
I get the feeling that it really doesn't matter much :)
Rather for crackers and onefilers or one-siders maybe? :) |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Well, the main reason I was optimising for decoder size with tinycrunch was because that was all I had room for once the demo was decoded. It's admittedly a fairly special case, it's not often I'm scraping for every last fraction of a page.
Even the music data was interleaved into unused fragments of the character definitions (only 5 bytes of every 8 were visible).
Agreed that total time for load+decrunch is usually more significant, except in cases where you can background load into some free space, but then need to quickly decrunch between ending one part and starting the next. |
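The trick of tucking data into unused character rows could look roughly like this in Python. Only the "5 of every 8 bytes visible" figure comes from the post; which three rows were actually free is not stated, so the sketch assumes the last three rows of each character, and `interleave` and `extract` are hypothetical helper names.

```python
CHAR_BYTES = 8      # one character definition is 8 bytes
VISIBLE_ROWS = 5    # from the post: only 5 of every 8 bytes are visible
SPARE = CHAR_BYTES - VISIBLE_ROWS


def interleave(chars: bytes, payload: bytes) -> bytearray:
    """Store payload in the spare rows (assumed: the last 3) of each character."""
    if len(chars) % CHAR_BYTES:
        raise ValueError("charset length must be a multiple of 8")
    capacity = len(chars) // CHAR_BYTES * SPARE
    if len(payload) > capacity:
        raise ValueError("payload does not fit into the spare rows")
    out = bytearray(chars)
    for n, byte in enumerate(payload):
        char, row = divmod(n, SPARE)
        out[char * CHAR_BYTES + VISIBLE_ROWS + row] = byte
    return out


def extract(charset: bytes, length: int) -> bytes:
    """Read the payload back out of the spare rows."""
    return bytes(
        charset[(n // SPARE) * CHAR_BYTES + VISIBLE_ROWS + n % SPARE]
        for n in range(length)
    )


chars = bytes(range(16))          # two dummy character definitions
payload = b"\xaa\xbb\xcc\xdd"     # dummy "music data"
mixed = interleave(chars, payload)
print(mixed.hex())
```

On the C64 the player would of course read the bytes with a stride instead of extracting them first; the Python version only shows the layout.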