| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Nucrunch 0.1
Continuing from the benchmarks WVL posted in Doynamite 1.x:
I dusted off my unfinished nucrunch in December to pack just enough of the second page of Reutastic to give me some workspace for some precalculations. Pity I didn't schedule enough time to pack the entire demo, else it would have been ~90 blocks instead of 190, but I digress. I've spent bits of the past month cleaning up the code, optimizing the packer (mostly by porting it from python to rust :P), and adding reverse direction support.
It's still no more than a component, with an commandline packer and asm decrunch subroutine, but no tools yet for generating an executable from a single commandline. It does at least now support multiple input segments that are unpacked to their destination addresses, and it's also now useable enough to for me to do some benchmarking.
In short, doynamite's ratio looks pretty unbeatable for anything lz based; my ratio's almost identical despite a somewhat different encoding.
Where I can win though is speed at that ratio; nucrunch is usually ten to twenty percent faster. The one exception in the test corpus is 6.bin, where it's 20% slower; not sure why yet.
I've added the times for pucrunch -ffast below for to complete the comparison. Last two columns are nucrunch, and nucrunch -r (the latter decodes in reverse; should be a more useful component for single filers)
If anyone wants to have a play at this stage, poke me and I'll upload some source. Failing that I'll hold off until I at least have something that can make onefilers without any faffing about with relocating the last couple of pages by hand.
filesizes
# bin rle wvl-f wvl-s tc bb pu-f doyna nucru rnucr
- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
1 11008 8020 4529 4151 4329 3383 3711 3265 3225 3230
2 4973 4314 3532 3309 3423 2648 3005 2512 2498 2490
3 3949 3498 2991 2617 2972 2187 2530 2108 2091 2093
4 7016 6456 4242 4085 4225 3681 3924 3617 3622 3614
5 34760 27647 25781 24895 25210 21306 21182 20405 20447 20516
6 31605 12511 11283 10923 11614 9194 9203 8904 8915 8894
7 20392 17295 12108 11285 11445 9627 9789 9289 9140 9144
8 5713 5407 4179 3916 3936 3251 3656 3132 3165 3187
9 8960 7986 6914 6896 6572 5586 6000 5430 5502 5486
filesize in %
# bin rle wvl-f wvl-s tc bb pu-f doyna nucru rnucr
- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
1 100 72.9 41.1 37.7 39.3 30.7 33.7 29.7 29.3 29.3
2 100 86.7 71.0 66.5 68.8 53.2 60.4 50.5 50.2 50.1
3 100 88.6 75.7 66.3 75.3 55.4 64.1 53.4 53.0 53.0
4 100 92.0 60.5 58.2 60.2 52.5 55.9 51.6 51.6 51.5
5 100 79.5 74.2 71.6 72.5 61.3 60.9 58.7 58.8 59.0
6 100 39.6 35.7 34.6 36.7 29.1 29.1 28.2 28.2 28.1
7 100 84.8 59.4 55.3 56.1 47.2 48.0 45.6 44.8 44.8
8 100 94.6 73.1 68.5 68.9 56.9 64.0 54.8 55.4 55.8
9 100 89.1 77.2 77.0 73.3 62.3 67.0 60.6 61.4 61.2
number of frames to depack
# bin rle wvl-f wvl-s tc bb pu-f doyna nucru rnucr
- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
1 0 11 13 14 15 58 54 27 22 22
2 0 5 7 7 9 38 39 17 14 14
3 0 4 6 6 7 28 31 12 10 10
4 0 8 9 9 10 43 51 20 17 18
5 0 36 39 42 59 300 298 119 104 107
6 0 20 25 25 37 126 152 49 59 59
7 0 22 25 26 32 138 139 60 51 52
8 0 6 8 8 10 43 47 18 16 17
9 0 9 12 12 16 73 81 32 28 29
kilobytes output per second
# bin rle wvl-f wvl-s tc bb pu-f doyna nucru rnucr
- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
1 49.0 41.4 38.5 35.9 9.3 10.0 20.0 24.5 24.5
2 48.7 34.8 34.8 27.0 6.4 6.2 14.3 17.4 17.4
3 48.3 32.2 32.2 27.6 6.9 6.2 16.1 19.3 19.3
4 42.9 38.2 38.2 34.3 8.0 6.7 17.2 20.2 19.1
5 47.3 43.6 40.5 28.8 5.7 5.7 14.3 16.4 15.9
6 77.4 61.9 61.9 41.8 12.3 10.2 31.6 26.2 26.2
7 45.4 39.9 38.4 31.2 7.2 7.2 16.6 19.6 19.2
8 46.6 35.0 35.0 28.0 6.5 6.0 15.5 17.5 16.5
9 48.7 36.5 36.5 27.4 6.0 5.4 13.7 15.7 15.1
cycles per byte consumed
# bin rle wvl-f wvl-s tc bb pu-f doyna nucru rnucr
- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
1 0 27 56 66 68 337 286 163 134 134
2 0 23 39 42 52 282 255 133 110 111
3 0 22 39 45 46 252 241 112 94 94
4 0 24 42 43 47 230 255 109 92 98
5 0 26 30 33 46 277 277 115 100 103
6 0 31 44 45 63 269 325 108 130 130
7 0 25 41 45 55 282 279 127 110 112
8 0 22 38 40 50 260 253 113 99 105
9 0 22 34 34 48 257 265 116 100 104
decrunch time for nucrunch/rnucrunch relative to doynamite
1: 81.5% (-18.5%) 81.5% (-18.5%)
2: 82.4% (-17.6%) 82.4% (-17.6%)
3: 83.3% (-16.7%) 83.3% (-16.7%)
4: 85.0% (-15.0%) 90.0% (-10.0%)
5: 87.4% (-12.6%) 89.9% (-10.1%)
6: 120.4% ( 20.4%) 120.4% ( 20.4%)
7: 85.0% (-15.0%) 86.7% (-13.3%)
8: 88.9% (-11.1%) 94.4% ( -5.6%)
9: 87.5% (-12.5%) 90.6% ( -9.4%)
|
|
... 95 posts hidden. Click here to view all posts.... |
| |
JackAsser
Registered: Jun 2002 Posts: 2014 |
Quote: Is this thread just being hijacked!?
Oswald 'The Hijacker' |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Quoting algorithmActually should be able to do a delta every 4 lines..
Set up d800-d827 before the first badline..
1
when the badline occurs, do the delta update for the next 4 lines
force second badline 4 pixels down. at this stage, the updates to d800-d827 will occur
then write the delta to change back the values to how they were originally at d828-d84f
repeat from 1
Exactly what I did. Still waiting on art from someone, or you'd have seen a release by now. |
| |
MagerValp
Registered: Dec 2001 Posts: 1078 |
Quote: I suppose you have a scratchpad somewhere in memory, like in the decruncher code to handle it then?
No, no extra memory, that's the beauty of it, only the end address or a counter. I'll sketch something out to show how it works, there might be a corner case that I'm not thinking of. |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
But wouldn't that only work if we end with a match then? As long as the last thing to do is copying a literal this would still fail as with packers like Doynax, BB and similar, as one would have some control stream bits/bytes before that to signal a literal and give a length. If the following literal bytes would not overlap, the control bits would have been overwritten by the preceeding action and things would fail. In case things end with a match, all would be fine, only the copying of the match would then kill the already read control bits and the decruncher would end afterwards. |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Update time!!
All results from above combined, as well as some new ones
Thanks MagerValp for LZMV* & exo*, and Bitbreaker for Bitfire.
Also thanks to HCL for his patience, I've finally switched to the fast decompressor for BB.
Columns sorted by output rate (guessed positions for LZMPi and exoraw, the only ones I don't yet have times for):
filesizes
# bin rle wvl-f wvl-s LZMV25 tc LZMV4k rnucru nucrun bb2.0 bitfir doynax LZMPi exomem pu-f exoraw
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
1 11008 8020 4529 4151 4539 4329 4205 3230 3225 3322 3324 3265 3184 2988 3711 2988
2 4973 4314 3532 3309 3575 3423 3183 2490 2498 2513 2515 2512 2410 2225 3005 2241
3 3949 3498 2991 2617 3018 2972 2551 2093 2091 2098 2097 2108 1931 1808 2530 1817
4 7016 6456 4242 4085 4314 4225 4343 3614 3622 3682 3682 3617 3571 3442 3924 3454
5 34760 27647 25781 24895 26116 25210 23845 20516 20447 20530 20531 20405 20362 19715 21182 19631
6 31605 12511 11283 10923 11352 11614 10619 8896 8915 8998 9004 8904 8719 8322 9203 8337
7 20392 17295 12108 11285 12188 11445 11154 9145 9140 9241 9242 9289 9256 8765 9789 8751
8 5713 5407 4179 3916 3987 3936 3959 3187 3166 3165 3162 3132 3048 3081 3656 3059
9 8960 7986 6914 6896 6943 6572 6505 5487 5502 5491 5491 5430 5563 5304 6000 5295
filesize in %
# bin rle wvl-f wvl-s LZMV25 tc LZMV4k rnucru nucrun bb2.0 bitfir doynax LZMPi exomem pu-f exoraw
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
1 100.0 72.9 41.1 37.7 41.2 39.3 38.2 29.3 29.3 30.2 30.2 29.7 28.9 27.1 33.7 27.1
2 100.0 86.7 71.0 66.5 71.9 68.8 64.0 50.1 50.2 50.5 50.6 50.5 48.5 44.7 60.4 45.1
3 100.0 88.6 75.7 66.3 76.4 75.3 64.6 53.0 53.0 53.1 53.1 53.4 48.9 45.8 64.1 46.0
4 100.0 92.0 60.5 58.2 61.5 60.2 61.9 51.5 51.6 52.5 52.5 51.6 50.9 49.1 55.9 49.2
5 100.0 79.5 74.2 71.6 75.1 72.5 68.6 59.0 58.8 59.1 59.1 58.7 58.6 56.7 60.9 56.5
6 100.0 39.6 35.7 34.6 35.9 36.7 33.6 28.1 28.2 28.5 28.5 28.2 27.6 26.3 29.1 26.4
7 100.0 84.8 59.4 55.3 59.8 56.1 54.7 44.8 44.8 45.3 45.3 45.6 45.4 43.0 48.0 42.9
8 100.0 94.6 73.1 68.5 69.8 68.9 69.3 55.8 55.4 55.4 55.3 54.8 53.4 53.9 64.0 53.5
9 100.0 89.1 77.2 77.0 77.5 73.3 72.6 61.2 61.4 61.3 61.3 60.6 62.1 59.2 67.0 59.1
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
~ 100.0 80.9 63.1 59.5 63.2 61.3 58.6 48.1 48.1 48.4 48.4 48.1 47.1 45.1 53.7 45.1
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
number of frames to depack
# bin rle wvl-f wvl-s LZMV25 tc LZMV4k rnucru nucrun bb2.0 bitfir doynax LZMPi exomem pu-f exoraw
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
1 0.0 11.5 13.5 14.5 13.7 15.5 16.9 21.3 22.0 24.3 24.4 27.5 0.0 57.9 54.7 0.0
2 0.0 5.5 7.5 7.5 8.4 9.5 9.6 13.6 13.7 15.1 15.1 17.5 0.0 38.5 39.6 0.0
3 0.0 4.5 6.5 6.5 6.2 7.5 7.1 10.1 10.0 10.9 10.9 12.5 0.0 28.5 31.9 0.0
4 0.0 8.5 9.5 9.5 9.4 10.5 11.5 17.1 16.6 18.1 18.1 20.5 0.0 53.1 52.0 0.0
5 0.0 36.5 39.5 42.5 46.6 59.5 58.4 100.6 101.4 107.7 107.4 119.5 0.0 295.9 298.6 0.0
6 0.0 20.5 25.5 25.5 38.2 37.5 35.8 56.6 57.9 61.5 62.0 49.5 0.0 142.3 152.8 0.0
7 0.0 22.5 25.5 26.5 29.1 32.5 35.0 50.0 50.3 54.5 54.5 60.5 0.0 139.2 139.8 0.0
8 0.0 6.5 8.5 8.5 8.9 10.5 10.3 15.8 15.6 16.9 17.0 18.5 0.0 44.8 47.7 0.0
9 0.0 9.5 12.5 12.5 14.2 16.5 16.6 27.2 27.1 29.6 29.4 32.5 0.0 78.9 81.7 0.0
kilobytes output per second
# bin rle wvl-f wvl-s LZMV25 tc LZMV4k rnucru nucrun bb2.0 bitfir doynax LZMPi exomem pu-f exoraw
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
1 46.9 39.9 37.2 39.3 34.8 31.9 25.3 24.5 22.2 22.1 19.6 9.3 9.9
2 44.3 32.5 32.5 29.0 25.6 25.4 17.9 17.7 16.1 16.1 13.9 6.3 6.1
3 43.0 29.7 29.7 31.2 25.8 27.2 19.2 19.4 17.7 17.8 15.5 6.8 6.1
4 40.4 36.2 36.2 36.5 32.7 29.9 20.1 20.6 18.9 19.0 16.8 6.5 6.6
5 46.6 43.1 40.0 36.5 28.6 29.1 16.9 16.8 15.8 15.8 14.2 5.8 5.7
6 75.5 60.7 60.7 40.5 41.3 43.2 27.3 26.7 25.2 25.0 31.3 10.9 10.1
7 44.4 39.1 37.7 34.3 30.7 28.5 20.0 19.8 18.3 18.3 16.5 7.2 7.1
8 43.0 32.9 32.9 31.4 26.6 27.2 17.7 18.0 16.5 16.4 15.1 6.2 5.9
9 46.2 35.1 35.1 30.9 26.6 26.4 16.1 16.2 14.8 14.9 13.5 5.6 5.4
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
~ 0.0 47.8 38.8 38.0 34.4 30.3 29.9 20.1 20.0 18.4 18.4 17.4 0.0 7.2 7.0 0.0
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
cycles per byte consumed
# bin rle wvl-f wvl-s LZMV25 tc LZMV4k rnucru nucrun bb2.0 bitfir doynax LZMPi exomem pu-f exoraw
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
1 0.0 28.2 58.6 68.7 59.3 70.4 79.0 129.4 134.0 143.6 144.1 165.6 0.0 380.9 289.7 0.0
2 0.0 25.1 41.7 44.6 46.2 54.6 59.3 107.5 108.0 118.4 118.4 136.9 0.0 340.1 259.3 0.0
3 0.0 25.3 42.7 48.8 40.4 49.6 54.7 94.6 93.8 102.4 102.0 116.6 0.0 309.8 247.9 0.0
4 0.0 25.9 44.0 45.7 42.8 48.8 52.0 93.1 90.3 96.8 96.4 111.4 0.0 303.2 260.4 0.0
5 0.0 26.0 30.1 33.6 35.1 46.4 48.1 96.4 97.5 103.1 102.8 115.1 0.0 295.0 277.1 0.0
6 0.0 32.2 44.4 45.9 66.1 63.5 66.3 125.1 127.7 134.3 135.4 109.3 0.0 336.1 326.3 0.0
7 0.0 25.6 41.4 46.2 46.9 55.8 61.7 107.4 108.2 115.9 116.0 128.0 0.0 312.2 280.7 0.0
8 0.0 23.6 40.0 42.7 43.9 52.4 51.1 97.2 96.6 105.2 105.9 116.1 0.0 285.8 256.4 0.0
9 0.0 23.4 35.5 35.6 40.2 49.3 50.2 97.4 96.9 105.9 105.4 117.6 0.0 292.4 267.7 0.0
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
|
| |
MagerValp
Registered: Dec 2001 Posts: 1078 |
Quote: But wouldn't that only work if we end with a match then? As long as the last thing to do is copying a literal this would still fail as with packers like Doynax, BB and similar, as one would have some control stream bits/bytes before that to signal a literal and give a length. If the following literal bytes would not overlap, the control bits would have been overwritten by the preceeding action and things would fail. In case things end with a match, all would be fine, only the copying of the match would then kill the already read control bits and the decruncher would end afterwards.
Yes, that's exactly the case. The trick is to ensure that the compressed stream ends with a match. If it ends with a literal, include the literal data without any control bytes at the correct location. Illustration here:
https://www.dropbox.com/s/9zttmj3avjm6krl/LZMV%20in%20place%20u..
Note that it packs backwards, so the start is at the top of memory and the ending literal "!" in the second example is at the bottom. Unpacking backwards enables dey/bpl copy loops for speed and lets you keep the load address. Speed for this should be on par with WVL's packers. |
| |
MagerValp
Registered: Dec 2001 Posts: 1078 |
@ChristopherJam: Lovely! The pattern I see here is that there are three speed classes. You have the fast LZSS variants (WVL, LZMV), you have the efficient but slow LZ + Huffman (exo, pu), and in the middle you have nu/bb2/bitfire/doynax.
It also makes it obvious that LZMV (and maybe tc) needs speed optimization to be competitive :) |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
Thanks MagerValp, that was the tiny bit of info that was missing, in deed one can leave the last literal just there without any more control bits :-D As an extra, when we are guaranteed to end with a match, it is also sufficient to do the end check on matches only, as before with that maximum run as an eof marker. It will however add a few more bytes to the decruncher, but speedwise it would be mostly as fast as a tay + beq as eof detection by having a cmp + beq (in most cases that is, as we would require to check on 16 bit in case)
I'll give this a try the next days :-) |
| |
MagerValp
Registered: Dec 2001 Posts: 1078 |
Go for it! :)
And Excel just crashed on me while I was trying to make a pretty scatter plot of the results :/ |
| |
Martin Piper
Registered: Nov 2007 Posts: 726 |
Very slow LZMPi decompression cycles:
1.bin 1558777
2.bin 1044272
3.bin 813537
4.bin 1482657
5.bin 8471668
6.bin 4323352
7.bin 4044852
8.bin 1278691
9.bin 2307100
Hey, it's optimised for space not speed. :D |
Previous - 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 - Next |