Log inRegister an accountBrowse CSDbHelp & documentationFacts & StatisticsThe forumsAvailable RSS-feeds on CSDbSupport CSDb Commodore 64 Scene Database
You are not logged in 
CSDb User Forums


Forums > C64 Coding > Nucrunch 0.1
2016-02-04 11:02
ChristopherJam

Registered: Aug 2004
Posts: 382
Nucrunch 0.1

Continuing from the benchmarks WVL posted in Doynamite 1.x:

I dusted off my unfinished nucrunch in December to pack just enough of the second page of Reutastic to give me some workspace for some precalculations. Pity I didn't schedule enough time to pack the entire demo, else it would have been ~90 blocks instead of 190, but I digress. I've spent bits of the past month cleaning up the code, optimizing the packer (mostly by porting it from python to rust :P), and adding reverse direction support.

It's still no more than a component, with an commandline packer and asm decrunch subroutine, but no tools yet for generating an executable from a single commandline. It does at least now support multiple input segments that are unpacked to their destination addresses, and it's also now useable enough to for me to do some benchmarking.

In short, doynamite's ratio looks pretty unbeatable for anything lz based; my ratio's almost identical despite a somewhat different encoding.

Where I can win though is speed at that ratio; nucrunch is usually ten to twenty percent faster. The one exception in the test corpus is 6.bin, where it's 20% slower; not sure why yet.

I've added the times for pucrunch -ffast below for to complete the comparison. Last two columns are nucrunch, and nucrunch -r (the latter decodes in reverse; should be a more useful component for single filers)

If anyone wants to have a play at this stage, poke me and I'll upload some source. Failing that I'll hold off until I at least have something that can make onefilers without any faffing about with relocating the last couple of pages by hand.

filesizes
#   bin   rle wvl-f wvl-s    tc    bb  pu-f doyna nucru rnucr
- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
1 11008  8020  4529  4151  4329  3383  3711  3265  3225  3230
2  4973  4314  3532  3309  3423  2648  3005  2512  2498  2490
3  3949  3498  2991  2617  2972  2187  2530  2108  2091  2093
4  7016  6456  4242  4085  4225  3681  3924  3617  3622  3614
5 34760 27647 25781 24895 25210 21306 21182 20405 20447 20516
6 31605 12511 11283 10923 11614  9194  9203  8904  8915  8894
7 20392 17295 12108 11285 11445  9627  9789  9289  9140  9144
8  5713  5407  4179  3916  3936  3251  3656  3132  3165  3187
9  8960  7986  6914  6896  6572  5586  6000  5430  5502  5486

filesize in %
#   bin   rle wvl-f wvl-s    tc    bb  pu-f doyna nucru rnucr
- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
1   100  72.9  41.1  37.7  39.3  30.7  33.7  29.7  29.3  29.3
2   100  86.7  71.0  66.5  68.8  53.2  60.4  50.5  50.2  50.1
3   100  88.6  75.7  66.3  75.3  55.4  64.1  53.4  53.0  53.0
4   100  92.0  60.5  58.2  60.2  52.5  55.9  51.6  51.6  51.5
5   100  79.5  74.2  71.6  72.5  61.3  60.9  58.7  58.8  59.0
6   100  39.6  35.7  34.6  36.7  29.1  29.1  28.2  28.2  28.1
7   100  84.8  59.4  55.3  56.1  47.2  48.0  45.6  44.8  44.8
8   100  94.6  73.1  68.5  68.9  56.9  64.0  54.8  55.4  55.8
9   100  89.1  77.2  77.0  73.3  62.3  67.0  60.6  61.4  61.2

number of frames to depack
#   bin   rle wvl-f wvl-s    tc    bb  pu-f doyna nucru rnucr
- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
1     0    11    13    14    15    58    54    27    22    22
2     0     5     7     7     9    38    39    17    14    14
3     0     4     6     6     7    28    31    12    10    10
4     0     8     9     9    10    43    51    20    17    18
5     0    36    39    42    59   300   298   119   104   107
6     0    20    25    25    37   126   152    49    59    59
7     0    22    25    26    32   138   139    60    51    52
8     0     6     8     8    10    43    47    18    16    17
9     0     9    12    12    16    73    81    32    28    29

kilobytes output per second
#   bin   rle wvl-f wvl-s    tc    bb  pu-f doyna nucru rnucr
- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
1        49.0  41.4  38.5  35.9   9.3  10.0  20.0  24.5  24.5
2        48.7  34.8  34.8  27.0   6.4   6.2  14.3  17.4  17.4
3        48.3  32.2  32.2  27.6   6.9   6.2  16.1  19.3  19.3
4        42.9  38.2  38.2  34.3   8.0   6.7  17.2  20.2  19.1
5        47.3  43.6  40.5  28.8   5.7   5.7  14.3  16.4  15.9
6        77.4  61.9  61.9  41.8  12.3  10.2  31.6  26.2  26.2
7        45.4  39.9  38.4  31.2   7.2   7.2  16.6  19.6  19.2
8        46.6  35.0  35.0  28.0   6.5   6.0  15.5  17.5  16.5
9        48.7  36.5  36.5  27.4   6.0   5.4  13.7  15.7  15.1

cycles per byte consumed
#   bin   rle wvl-f wvl-s    tc    bb  pu-f doyna nucru rnucr
- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
1     0    27    56    66    68   337   286   163   134   134
2     0    23    39    42    52   282   255   133   110   111
3     0    22    39    45    46   252   241   112    94    94
4     0    24    42    43    47   230   255   109    92    98
5     0    26    30    33    46   277   277   115   100   103
6     0    31    44    45    63   269   325   108   130   130
7     0    25    41    45    55   282   279   127   110   112
8     0    22    38    40    50   260   253   113    99   105
9     0    22    34    34    48   257   265   116   100   104

decrunch time for nucrunch/rnucrunch relative to doynamite
1:  81.5% (-18.5%)  81.5% (-18.5%)
2:  82.4% (-17.6%)  82.4% (-17.6%)
3:  83.3% (-16.7%)  83.3% (-16.7%)
4:  85.0% (-15.0%)  90.0% (-10.0%)
5:  87.4% (-12.6%)  89.9% (-10.1%)
6: 120.4% ( 20.4%) 120.4% ( 20.4%)
7:  85.0% (-15.0%)  86.7% (-13.3%)
8:  88.9% (-11.1%)  94.4% ( -5.6%)
9:  87.5% (-12.5%)  90.6% ( -9.4%)
 
... 92 posts hidden. Click here to view all posts....
 
2016-02-15 13:24
enthusi

Registered: May 2004
Posts: 615
The cruncher of choice obviously depends on the environment as well. In particular the speed with which the data is loaded into memory. Which fastloader, tape, cart...
Maybe someone adds certain loading-bitrates to the graphs? ;-)
TurboTape is ~ 450 Byte/sec.
1541 ROM: ~ 410 Byte/sec
Cart (lda $8000,x; sta $0800,x) ~ 160 KB/sec

(i.e. for Caren I now used page aligned ,x loops of RAW data which is still faster than RLE with byte-at-boundary-check. A dedicated RLE that never crosses banked in pages should be slighly faster)
2016-02-15 14:09
Bitbreaker

Registered: Oct 2002
Posts: 400
Quote: Well yes, I was assuming only checking in case of a match… Intriguing that end addr test can be done at zero cost in cycles.

The clobbering cases - is it because of the match spec crossing a byte boundary?


Well, it is possible that the read_pointer crosses the write_pointer already before the last literal/match, at least then, when we encode stuff with variable bitlengths. I did a working prototype (more tests pending) to get the cruncher to spit out a final binary blob as soon as this happens. Thus some files tend to become bigger, others where this works out, get usually 2 bytes smaller.
As for the decruncher a few things need to be changed, in my case i can forgo on the terminator check:

        tay
        beq .lz_end_of_file


Therefore i check when i add the match length to the destination pointer:

        tya
        adc .lz_dst
        sta .lz_dst
        bcc .lz_end_low
.lz_maximum
        inc .lz_dst+1
        lda .lz_dst
.lz_end_low
        cmp #$00
.lz_skip_poll
        bne .lz_poll
        lda .lz_dst+1
.lz_end_hi
        cmp #$00
        bne .lz_skip_end
        rts
.lz_poll



However you also can't rely anymore on the crunchers EOF test, so you need to continue loading after the cruncher returns to be sure any remaining binary blob is still loaded if it goes over a block boundary, until the loader terminates with an EOF. The later of course only applies if you depack with on the fly loading.
2016-02-15 14:51
ChristopherJam

Registered: Aug 2004
Posts: 382
Ah of course - you're replacing the EOF check with something of (99.6% of the time) equal cost.

Interesting point about streaming loaders, but surely in that case you'd just be loading to a one page buffer rather than loading to destination address?
2016-02-15 15:44
Bitbreaker

Registered: Oct 2002
Posts: 400
I never used a buffer but was always depacking in place, yet however still with a small overlap.
2016-02-15 20:55
Bitbreaker

Registered: Oct 2002
Posts: 400
Here is a first experimental test. At least my usual benchmark stuff passes, also overlap for normal cases decreased, possible that the the old algorithm did something wrong (or the new does, haha)
As said, highly experimental, hurt yourself at your own risk. One might want to feed more files to the testsuite in the benchmark folder. The checksums + sizes and loadaddresses are not automated yet, some sed wizardry should help out there soon.
Now, with the end address check the non overflowing case on the dst-pointer addition is favoured and thus one cycled saved compared to the standard version.
2016-03-09 12:28
Bitbreaker

Registered: Oct 2002
Posts: 400
There's one problem with the in place decompression arising:
If you happen to have the source data not in place but at some other location in mem, the last literal will not be copied, unless you include it as a literal sequence into the control stream. Then however, if a file ends with a literal, the end check is missed when it is only done upon matches. Means: one has to either add another bogus match which makes the depacked data 2 bytes longer, or test on both cases, what bloats and slows down the depacker. Also The old sentinel could be used for that.
2016-03-09 13:59
ChristopherJam

Registered: Aug 2004
Posts: 382
Well if the source data is not in place but at some other location in mem, then you're not doing in place decompression :P
2016-03-09 14:02
Bitbreaker

Registered: Oct 2002
Posts: 400
Sure, but one might want to use the same decruncher code for both purposes, for e.g. in a demo ;-)
2016-03-09 14:16
ChristopherJam

Registered: Aug 2004
Posts: 382
Fair point!
2016-03-09 14:19
Bitbreaker

Registered: Oct 2002
Posts: 400
For now, i decided to add the check back in, that ends decrunching after a 257 byte match is decoded. This decreases speed slightly, but i have made other optimizations to compensate for more than that.
Previous - 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 - Next
RefreshSubscribe to this thread:

You need to be logged in to post in the forum.

Search the forum:
Search   for   in  
All times are CET.
Search CSDb
Advanced
Users Online
iceout/Avatar
Moloch/TRIAD
Guests online: 22
Top Demos
1 Uncensored  (9.7)
2 Coma Light 13  (9.7)
3 Edge of Disgrace  (9.7)
4 Comaland  (9.6)
5 Comaland 100%  (9.6)
6 Wonderland XII  (9.5)
7 Fantasmolytic  (9.5)
8 Rocketry  (9.5)
9 GoatLight  (9.5)
10 We Are Demo  (9.4)
Top onefile Demos
1 Treu Love [reu]  (9.5)
2 Daah, Those Acid Pil..  (9.4)
3 Dawnfall  (9.4)
4 One-Der  (9.3)
5 Hardware Accelerated..  (9.2)
6 Animated Poopinski T..  (9.1)
7 Safe VSP  (9.1)
8 Black Magic  (9.0)
9 Goatbeard  (9.0)
10 Te-Te-Te-TechTech It..  (9.0)
Top Groups
1 Censor Design  (9.5)
2 Booze Design  (9.4)
3 Oxyron  (9.4)
4 Crest  (9.4)
5 Nostalgia  (9.2)
Top Cover Designers
1 Duce  (9.8)
2 Electric  (9.7)
3 Junkie  (9.7)
4 The Elegance  (9.4)
5 Cuc  (9.2)

Home - Disclaimer
Copyright © No Name 2001-2016
Page generated in: 0.652 sec.