[CSDb] - User Forums - Nucrunch 0.1

Welcome to our latest new user Harvey ! (Registered 2024-11-25)

You are not logged in - nap

CSDb User Forums

Forums > C64 Coding > Nucrunch 0.1

2016-02-04 10:02

ChristopherJam

Registered: Aug 2004
Posts: 1408

Nucrunch 0.1

Continuing from the benchmarks WVL posted in Doynamite 1.x:

I dusted off my unfinished nucrunch in December to pack just enough of the second page of Reutastic to give me some workspace for some precalculations. Pity I didn't schedule enough time to pack the entire demo, else it would have been ~90 blocks instead of 190, but I digress. I've spent bits of the past month cleaning up the code, optimizing the packer (mostly by porting it from python to rust :P), and adding reverse direction support.

It's still no more than a component, with an commandline packer and asm decrunch subroutine, but no tools yet for generating an executable from a single commandline. It does at least now support multiple input segments that are unpacked to their destination addresses, and it's also now useable enough to for me to do some benchmarking.

In short, doynamite's ratio looks pretty unbeatable for anything lz based; my ratio's almost identical despite a somewhat different encoding.

Where I can win though is speed at that ratio; nucrunch is usually ten to twenty percent faster. The one exception in the test corpus is 6.bin, where it's 20% slower; not sure why yet.

I've added the times for pucrunch -ffast below for to complete the comparison. Last two columns are nucrunch, and nucrunch -r (the latter decodes in reverse; should be a more useful component for single filers)

If anyone wants to have a play at this stage, poke me and I'll upload some source. Failing that I'll hold off until I at least have something that can make onefilers without any faffing about with relocating the last couple of pages by hand.

filesizes
#   bin   rle wvl-f wvl-s    tc    bb  pu-f doyna nucru rnucr
- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
1 11008  8020  4529  4151  4329  3383  3711  3265  3225  3230
2  4973  4314  3532  3309  3423  2648  3005  2512  2498  2490
3  3949  3498  2991  2617  2972  2187  2530  2108  2091  2093
4  7016  6456  4242  4085  4225  3681  3924  3617  3622  3614
5 34760 27647 25781 24895 25210 21306 21182 20405 20447 20516
6 31605 12511 11283 10923 11614  9194  9203  8904  8915  8894
7 20392 17295 12108 11285 11445  9627  9789  9289  9140  9144
8  5713  5407  4179  3916  3936  3251  3656  3132  3165  3187
9  8960  7986  6914  6896  6572  5586  6000  5430  5502  5486

filesize in %
#   bin   rle wvl-f wvl-s    tc    bb  pu-f doyna nucru rnucr
- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
1   100  72.9  41.1  37.7  39.3  30.7  33.7  29.7  29.3  29.3
2   100  86.7  71.0  66.5  68.8  53.2  60.4  50.5  50.2  50.1
3   100  88.6  75.7  66.3  75.3  55.4  64.1  53.4  53.0  53.0
4   100  92.0  60.5  58.2  60.2  52.5  55.9  51.6  51.6  51.5
5   100  79.5  74.2  71.6  72.5  61.3  60.9  58.7  58.8  59.0
6   100  39.6  35.7  34.6  36.7  29.1  29.1  28.2  28.2  28.1
7   100  84.8  59.4  55.3  56.1  47.2  48.0  45.6  44.8  44.8
8   100  94.6  73.1  68.5  68.9  56.9  64.0  54.8  55.4  55.8
9   100  89.1  77.2  77.0  73.3  62.3  67.0  60.6  61.4  61.2

number of frames to depack
#   bin   rle wvl-f wvl-s    tc    bb  pu-f doyna nucru rnucr
- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
1     0    11    13    14    15    58    54    27    22    22
2     0     5     7     7     9    38    39    17    14    14
3     0     4     6     6     7    28    31    12    10    10
4     0     8     9     9    10    43    51    20    17    18
5     0    36    39    42    59   300   298   119   104   107
6     0    20    25    25    37   126   152    49    59    59
7     0    22    25    26    32   138   139    60    51    52
8     0     6     8     8    10    43    47    18    16    17
9     0     9    12    12    16    73    81    32    28    29

kilobytes output per second
#   bin   rle wvl-f wvl-s    tc    bb  pu-f doyna nucru rnucr
- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
1        49.0  41.4  38.5  35.9   9.3  10.0  20.0  24.5  24.5
2        48.7  34.8  34.8  27.0   6.4   6.2  14.3  17.4  17.4
3        48.3  32.2  32.2  27.6   6.9   6.2  16.1  19.3  19.3
4        42.9  38.2  38.2  34.3   8.0   6.7  17.2  20.2  19.1
5        47.3  43.6  40.5  28.8   5.7   5.7  14.3  16.4  15.9
6        77.4  61.9  61.9  41.8  12.3  10.2  31.6  26.2  26.2
7        45.4  39.9  38.4  31.2   7.2   7.2  16.6  19.6  19.2
8        46.6  35.0  35.0  28.0   6.5   6.0  15.5  17.5  16.5
9        48.7  36.5  36.5  27.4   6.0   5.4  13.7  15.7  15.1

cycles per byte consumed
#   bin   rle wvl-f wvl-s    tc    bb  pu-f doyna nucru rnucr
- ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
1     0    27    56    66    68   337   286   163   134   134
2     0    23    39    42    52   282   255   133   110   111
3     0    22    39    45    46   252   241   112    94    94
4     0    24    42    43    47   230   255   109    92    98
5     0    26    30    33    46   277   277   115   100   103
6     0    31    44    45    63   269   325   108   130   130
7     0    25    41    45    55   282   279   127   110   112
8     0    22    38    40    50   260   253   113    99   105
9     0    22    34    34    48   257   265   116   100   104

decrunch time for nucrunch/rnucrunch relative to doynamite
1:  81.5% (-18.5%)  81.5% (-18.5%)
2:  82.4% (-17.6%)  82.4% (-17.6%)
3:  83.3% (-16.7%)  83.3% (-16.7%)
4:  85.0% (-15.0%)  90.0% (-10.0%)
5:  87.4% (-12.6%)  89.9% (-10.1%)
6: 120.4% ( 20.4%) 120.4% ( 20.4%)
7:  85.0% (-15.0%)  86.7% (-13.3%)
8:  88.9% (-11.1%)  94.4% ( -5.6%)
9:  87.5% (-12.5%)  90.6% ( -9.4%)

2016-02-04 10:39

Fungus

Registered: Sep 2002
Posts: 680

could you include exo in the tests pls.

2016-02-04 10:40

Oswald

Registered: Apr 2002
Posts: 5086

demo coders wouldnt mind for 10x speed for 10% bigger file. or smth like that.

2016-02-04 11:08

Bitbreaker

Registered: Oct 2002
Posts: 504

How's about the size of the decoder itself? Can it easily be integrated into some loader system?

2016-02-04 11:09

Bitbreaker

Registered: Oct 2002
Posts: 504

Quote: demo coders wouldnt mind for 10x speed for 10% bigger file. or smth like that.

That would just make the decruncher wait even longer for new data on a on the fly loading/decrunching process :-)

2016-02-04 11:26

HCL

Registered: Feb 2003
Posts: 727

Oh noes.. i get compared again on 10+ years old code. But i don't blame noone but myself, i just have to click the release-button :P.

2016-02-04 11:56

ChristopherJam

Registered: Aug 2004
Posts: 1408

@Fungus, I'll add that to my todo list, unless someone else gets to it first!

@Oswald well, for the moment the KB per second table should be useful for choosing between crunchers, but I'm planning on adding some tuning parameters to nucrunch next

@Bitbreaker Currently around 380 bytes for the standard decruncher/450 for the reverse. But there's some fairly aggressive inlining in there, probably wouldn't lose too much performance if I scale that back a bit.

The plan is to integrate it with marmaload, but my progress on that is pretty glacial at the moment. Should be pretty trivial to replace the 'next page' callback with something that signals a loader/waits for the next page in any case.

2016-02-04 11:56

ChristopherJam

Registered: Aug 2004
Posts: 1408

@HCL it's ok, you're still 20% faster than pucrunch -ffast, and that was my go-to until *very* recently :D

2016-02-04 18:24

Bitbreaker

Registered: Oct 2002
Posts: 504

Quoting ChristopherJam

@Bitbreaker Currently around 380 bytes for the standard decruncher/450 for the reverse. But there's some fairly aggressive inlining in there, probably wouldn't lose too much performance if I scale that back a bit.

Sounds huge yet, but i'd give a smaller version a try, so mind sharing code?

2016-02-04 19:42

ChristopherJam

Registered: Aug 2004
Posts: 1408

Quote: Quoting ChristopherJam

@Bitbreaker Currently around 380 bytes for the standard decruncher/450 for the reverse. But there's some fairly aggressive inlining in there, probably wouldn't lose too much performance if I scale that back a bit.

Sounds huge yet, but i'd give a smaller version a try, so mind sharing code?

OK, I'll bundle something up tomorrow.

2016-02-04 20:08

Kabuto
Account closed

Registered: Sep 2004
Posts: 58

Could you please include my ALZ64 packer too? It's an LZMA-based packer, compression is very good, but decompression is very slow yet still acceptable for 4 KB intros, and some of your test cases are in that range :)

2016-02-05 08:58

mankeli

Registered: Oct 2010
Posts: 141

Can you add Spindle's cruncher to the list?

2016-02-05 10:20

Fungus

Registered: Sep 2002
Posts: 680

HCL: Yours is still best for cracks too, if lots of small files. :)

2016-02-07 22:57

Martin Piper

Registered: Nov 2007
Posts: 718

I couldn't find the test files along with the code used to replicate the tests?

2016-02-08 05:55

ChristopherJam

Registered: Aug 2004
Posts: 1408

@Martin, I grabbed the test files from LZWVL (bin.rar)

2016-02-08 09:02

Martin Piper

Registered: Nov 2007
Posts: 718

Quote: @Martin, I grabbed the test files from LZWVL (bin.rar)

Cheers :)
The number of bytes in the top table, is that the raw file compressed only? No self extracting code?

2016-02-08 09:51

ChristopherJam

Registered: Aug 2004
Posts: 1408

np. Raw file compressed only. (Or at least, that's what I was assuming for rle, wvl, bb & doyna, and it's what I measured for nucrunch, tinycrunch and pucrunch)

2016-02-08 11:35

Martin Piper

Registered: Nov 2007
Posts: 718

Oh cool. :) I also get similar compression results to yours with my old LZ based code. The circular recent history buffer (updateHistoryBuffer) that stores length and offset helps squeeze more compression at the expense of decompression speed.

Current LZMPi results:
1 3,184
2 2,410
3 1,931
4 3,571
5 20,362
6 8,719
7 9,256
8 3,048
9 5,563

https://github.com/martinpiper/C64Public/tree/master/Compression
https://github.com/martinpiper/C64Public/tree/master/Decompress..

2016-02-08 13:12

MagerValp

Registered: Dec 2001
Posts: 1074

For funsies here's the result from LZMV+RLE with a 4k window:

#   bin   rle    wvl-f  wvl-s  tc     bb     pu-f   doyna  nucru  rnucr  LZMV4k
1   100   72.9   41.1   37.7   39.3   30.7   33.7   29.7   29.3   29.3   38.2
2   100   86.7   71.0   66.5   68.8   53.2   60.4   50.5   50.2   50.1   64.0
3   100   88.6   75.7   66.3   75.3   55.4   64.1   53.4   53.0   53.0   64.6
4   100   92.0   60.5   58.2   60.2   52.5   55.9   51.6   51.6   51.5   61.9
5   100   79.5   74.2   71.6   72.5   61.3   60.9   58.7   58.8   59.0   68.6
6   100   39.6   35.7   34.6   36.7   29.1   29.1   28.2   28.2   28.1   33.6
7   100   84.8   59.4   55.3   56.1   47.2   48.0   45.6   44.8   44.8   54.7
8   100   94.6   73.1   68.5   68.9   56.9   64.0   54.8   55.4   55.8   69.3
9   100   89.1   77.2   77.0   73.3   62.3   67.0   60.6   61.4   61.2   72.6

https://github.com/MagerValp/u4remastered/blob/master/tools/bac..
https://github.com/MagerValp/u4remastered/blob/master/src/u4loa..

2016-02-08 22:17

ChristopherJam

Registered: Aug 2004
Posts: 1408

OK, got some updated benchmarks. Still no exo or ALZ64, but I've updated ByteBoozer to version 2.0, optimised nucrunch a little, and I've switched to counting cycles for timing BB, Pu & Nu. I've added half a frame to all the other times on the assumption that they were originally all rounded down.

I've also included sizes for LZMPi and ~LZMV4k (thanks guys!), no timings for those yet. I gather bitnax/doynax is a little faster now, too; update yet to come.

filesizes
#    bin    rle  wvl-f  wvl-s     tc  bb2.0   pu-f  doyna  nucru  rnucr  LZMPi LZMV4k
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
1  11008   8020   4529   4151   4329   3322   3711   3265   3225   3230   3184   4205
2   4973   4314   3532   3309   3423   2513   3005   2512   2498   2490   2410   3183
3   3949   3498   2991   2617   2972   2098   2530   2108   2091   2093   1931   2551
4   7016   6456   4242   4085   4225   3682   3924   3617   3622   3614   3571   4343
5  34760  27647  25781  24895  25210  20530  21182  20405  20447  20516  20362  23845
6  31605  12511  11283  10923  11614   8998   9203   8904   8915   8896   8719  10619
7  20392  17295  12108  11285  11445   9241   9789   9289   9140   9145   9256  11154
8   5713   5407   4179   3916   3936   3165   3656   3132   3166   3187   3048   3959
9   8960   7986   6914   6896   6572   5491   6000   5430   5502   5487   5563   6505

filesize in %
#    bin    rle  wvl-f  wvl-s     tc  bb2.0   pu-f  doyna  nucru  rnucr  LZMPi LZMV4k
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
1  100.0   72.9   41.1   37.7   39.3   30.2   33.7   29.7   29.3   29.3   28.9   38.2
2  100.0   86.7   71.0   66.5   68.8   50.5   60.4   50.5   50.2   50.1   48.5   64.0
3  100.0   88.6   75.7   66.3   75.3   53.1   64.1   53.4   53.0   53.0   48.9   64.6
4  100.0   92.0   60.5   58.2   60.2   52.5   55.9   51.6   51.6   51.5   50.9   61.9
5  100.0   79.5   74.2   71.6   72.5   59.1   60.9   58.7   58.8   59.0   58.6   68.6
6  100.0   39.6   35.7   34.6   36.7   28.5   29.1   28.2   28.2   28.1   27.6   33.6
7  100.0   84.8   59.4   55.3   56.1   45.3   48.0   45.6   44.8   44.8   45.4   54.7
8  100.0   94.6   73.1   68.5   68.9   55.4   64.0   54.8   55.4   55.8   53.4   69.3
9  100.0   89.1   77.2   77.0   73.3   61.3   67.0   60.6   61.4   61.2   62.1   72.6

number of frames to depack
#    bin    rle  wvl-f  wvl-s     tc  bb2.0   pu-f  doyna  nucru  rnucr  LZMPi LZMV4k
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
1    0.0   11.5   13.5   14.5   15.5   46.1   54.7   27.5   22.0   21.3
2    0.0    5.5    7.5    7.5    9.5   31.2   39.6   17.5   13.7   13.6
3    0.0    4.5    6.5    6.5    7.5   22.7   31.9   12.5   10.0   10.1
4    0.0    8.5    9.5    9.5   10.5   37.5   52.0   20.5   16.6   17.1
5    0.0   36.5   39.5   42.5   59.5  222.2  298.6  119.5  101.4  100.6
6    0.0   20.5   25.5   25.5   37.5  112.3  152.8   49.5   57.9   56.6
7    0.0   22.5   25.5   26.5   32.5  109.6  139.8   60.5   50.3   50.0
8    0.0    6.5    8.5    8.5   10.5   35.4   47.7   18.5   15.6   15.8
9    0.0    9.5   12.5   12.5   16.5   62.3   81.7   32.5   27.1   27.2

kilobytes output per second
#    bin    rle  wvl-f  wvl-s     tc  bb2.0   pu-f  doyna  nucru  rnucr  LZMPi LZMV4k
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
1          46.9   39.9   37.2   34.8   11.7    9.9   19.6   24.5   25.3
2          44.3   32.5   32.5   25.6    7.8    6.1   13.9   17.7   17.9
3          43.0   29.7   29.7   25.8    8.5    6.1   15.5   19.4   19.2
4          40.4   36.2   36.2   32.7    9.2    6.6   16.8   20.6   20.1
5          46.6   43.1   40.0   28.6    7.7    5.7   14.2   16.8   16.9
6          75.5   60.7   60.7   41.3   13.8   10.1   31.3   26.7   27.3
7          44.4   39.1   37.7   30.7    9.1    7.1   16.5   19.8   20.0
8          43.0   32.9   32.9   26.6    7.9    5.9   15.1   18.0   17.7
9          46.2   35.1   35.1   26.6    7.0    5.4   13.5   16.2   16.1

cycles per byte consumed
#    bin    rle  wvl-f  wvl-s     tc  bb2.0   pu-f  doyna  nucru  rnucr  LZMPi LZMV4k
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
1    0.0   28.2   58.6   68.7   70.4  272.9  289.7  165.6  134.2  129.4
2    0.0   25.1   41.7   44.6   54.6  244.4  259.3  136.9  108.0  107.5
3    0.0   25.3   42.7   48.8   49.6  212.3  247.9  116.6   93.8   94.6
4    0.0   25.9   44.0   45.7   48.8  200.3  260.4  111.4   90.3   93.1
5    0.0   26.0   30.1   33.6   46.4  212.8  277.1  115.1   97.5   96.4
6    0.0   32.2   44.4   45.9   63.5  245.3  326.3  109.3  127.7  125.1
7    0.0   25.6   41.4   46.2   55.8  233.2  280.7  128.0  108.2  107.4
8    0.0   23.6   40.0   42.7   52.4  219.9  256.4  116.1   96.6   97.2
9    0.0   23.4   35.5   35.6   49.3  223.0  267.7  117.6   96.9   97.4

2016-02-09 06:35

Martin Piper

Registered: Nov 2007
Posts: 718

Are you timing with VICE stopwatch to get the cycle times? With the screen off?

2016-02-09 06:49

ChristopherJam

Registered: Aug 2004
Posts: 1408

Chained DC00 timers (based on a routine from Glasnost) and blanked screen for nucrunch

VICE stopwatch for BB & pu, not blanked; which in the light of day seems a tad unfair - forgot that at 3am.

I need to either translate BB or install the assembler HCL uses so I can build it into my testbed instead of just running the produced executable.

2016-02-09 07:02

HCL

Registered: Feb 2003
Posts: 727

Hmm, compression rate of B2 was something like expected.. but performance was not that impressive, only 15-25% faster than before. I guess i was comparing to original bb, and you probably had the latest optimized decruncher. Never mind, thanx for adding it so quickly!!

2016-02-10 12:22

MagerValp

Registered: Dec 2001
Posts: 1074

I added exoraw 2.0.9 with default options:

filesizes
#    bin    rle  wvl-f  wvl-s     tc  bb2.0   pu-f  doyna  nucru  rnucr  LZMPi LZMV4k exoraw
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
1  11008   8020   4529   4151   4329   3322   3711   3265   3225   3230   3184   4205   2988
2   4973   4314   3532   3309   3423   2513   3005   2512   2498   2490   2410   3183   2241
3   3949   3498   2991   2617   2972   2098   2530   2108   2091   2093   1931   2551   1817
4   7016   6456   4242   4085   4225   3682   3924   3617   3622   3614   3571   4343   3454
5  34760  27647  25781  24895  25210  20530  21182  20405  20447  20516  20362  23845  19631
6  31605  12511  11283  10923  11614   8998   9203   8904   8915   8896   8719  10619   8337
7  20392  17295  12108  11285  11445   9241   9789   9289   9140   9145   9256  11154   8751
8   5713   5407   4179   3916   3936   3165   3656   3132   3166   3187   3048   3959   3059
9   8960   7986   6914   6896   6572   5491   6000   5430   5502   5487   5563   6505   5295

filesize in %
#    bin    rle  wvl-f  wvl-s     tc  bb2.0   pu-f  doyna  nucru  rnucr  LZMPi LZMV4k exoraw
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
1  100.0   72.9   41.1   37.7   39.3   30.2   33.7   29.7   29.3   29.3   28.9   38.2   27.1
2  100.0   86.7   71.0   66.5   68.8   50.5   60.4   50.5   50.2   50.1   48.5   64.0   45.1
3  100.0   88.6   75.7   66.3   75.3   53.1   64.1   53.4   53.0   53.0   48.9   64.6   46.0
4  100.0   92.0   60.5   58.2   60.2   52.5   55.9   51.6   51.6   51.5   50.9   61.9   49.2
5  100.0   79.5   74.2   71.6   72.5   59.1   60.9   58.7   58.8   59.0   58.6   68.6   56.5
6  100.0   39.6   35.7   34.6   36.7   28.5   29.1   28.2   28.2   28.1   27.6   33.6   26.4
7  100.0   84.8   59.4   55.3   56.1   45.3   48.0   45.6   44.8   44.8   45.4   54.7   42.9
8  100.0   94.6   73.1   68.5   68.9   55.4   64.0   54.8   55.4   55.8   53.4   69.3   53.5
9  100.0   89.1   77.2   77.0   73.3   61.3   67.0   60.6   61.4   61.2   62.1   72.6   59.1

For a graph and results sorted by average file size: https://www.icloud.com/numbers/00060NSRmk5sdCbLjtye-UGCA#LZMV_B..

2016-02-10 12:36

Martin Piper

Registered: Nov 2007
Posts: 718

Very cool compression with exoraw. Something to strive towards.

2016-02-10 13:14

ChristopherJam

Registered: Aug 2004
Posts: 1408

Impressively small! Any chance of cycle timings for the new additions?

2016-02-10 14:00

MagerValp

Registered: Dec 2001
Posts: 1074

I don't have time to code anything from scratch at the moment, but if you share your benchmarking code I could probably do LZMV and exomizer.

And while exomizer produces small files, it really is butt slow.

2016-02-10 14:32

ChristopherJam

Registered: Aug 2004
Posts: 1408

Here you go:
https://dl.dropboxusercontent.com/u/62734159/csdb/20160210_cycl..

2016-02-10 16:48

Axis/Oxyron
Account closed

Registered: Apr 2007
Posts: 91

So any volunteers to optimize the exo-decruncher? ;oP

2016-02-11 08:59

Bitbreaker

Registered: Oct 2002
Posts: 504

Quote: So any volunteers to optimize the exo-decruncher? ;oP

Doesn't make much sense as long as you also want the decruncher to be tiny, most of all, as it needs a 156 byte scratchpad during decrunching.

2016-02-11 10:46

MagerValp

Registered: Dec 2001
Posts: 1074

Added timed benchmarks for LZMV, both the original 256 byte sliding window and the 4k version I used in U4. Note that neither is optimized for speed right now, but size and convenience.

filesizes
#    bin    rle  wvl-f  wvl-s     tc  bb2.0   pu-f  doyna  nucru  rnucr  LZMPi LZMV4k LZMV256 exoraw
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------  ------
1  11008   8020   4529   4151   4329   3322   3711   3265   3225   3230   3184   4205    4539   2988
2   4973   4314   3532   3309   3423   2513   3005   2512   2498   2490   2410   3183    3575   2241
3   3949   3498   2991   2617   2972   2098   2530   2108   2091   2093   1931   2551    3018   1817
4   7016   6456   4242   4085   4225   3682   3924   3617   3622   3614   3571   4343    4314   3454
5  34760  27647  25781  24895  25210  20530  21182  20405  20447  20516  20362  23845   26116  19631
6  31605  12511  11283  10923  11614   8998   9203   8904   8915   8896   8719  10619   11352   8337
7  20392  17295  12108  11285  11445   9241   9789   9289   9140   9145   9256  11154   12188   8751
8   5713   5407   4179   3916   3936   3165   3656   3132   3166   3187   3048   3959    3987   3059
9   8960   7986   6914   6896   6572   5491   6000   5430   5502   5487   5563   6505    6943   5295

filesize in %
#    bin    rle  wvl-f  wvl-s     tc  bb2.0   pu-f  doyna  nucru  rnucr  LZMPi LZMV4k LZMV256 exoraw
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------- ------
1  100.0   72.9   41.1   37.7   39.3   30.2   33.7   29.7   29.3   29.3   28.9   38.2    41,2   27.1
2  100.0   86.7   71.0   66.5   68.8   50.5   60.4   50.5   50.2   50.1   48.5   64.0    71,9   45.1
3  100.0   88.6   75.7   66.3   75.3   53.1   64.1   53.4   53.0   53.0   48.9   64.6    76,4   46.0
4  100.0   92.0   60.5   58.2   60.2   52.5   55.9   51.6   51.6   51.5   50.9   61.9    61,5   49.2
5  100.0   79.5   74.2   71.6   72.5   59.1   60.9   58.7   58.8   59.0   58.6   68.6    75,1   56.5
6  100.0   39.6   35.7   34.6   36.7   28.5   29.1   28.2   28.2   28.1   27.6   33.6    35,9   26.4
7  100.0   84.8   59.4   55.3   56.1   45.3   48.0   45.6   44.8   44.8   45.4   54.7    59,8   42.9
8  100.0   94.6   73.1   68.5   68.9   55.4   64.0   54.8   55.4   55.8   53.4   69.3    69,8   53.5
9  100.0   89.1   77.2   77.0   73.3   61.3   67.0   60.6   61.4   61.2   62.1   72.6    77,5   59.1

number of frames to depack
#    bin    rle  wvl-f  wvl-s     tc  bb2.0   pu-f  doyna  nucru  rnucr  LZMPi LZMV4k LZMV256 exoraw
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------- ------
1    0.0   11.5   13.5   14.5   15.5   46.1   54.7   27.5   22.0   21.3          16.9    13.7
2    0.0    5.5    7.5    7.5    9.5   31.2   39.6   17.5   13.7   13.6           9.6     8.4
3    0.0    4.5    6.5    6.5    7.5   22.7   31.9   12.5   10.0   10.1           7.1     6.2
4    0.0    8.5    9.5    9.5   10.5   37.5   52.0   20.5   16.6   17.1          11.5     9.4
5    0.0   36.5   39.5   42.5   59.5  222.2  298.6  119.5  101.4  100.6          58.4    46.6
6    0.0   20.5   25.5   25.5   37.5  112.3  152.8   49.5   57.9   56.6          35.8    38.2
7    0.0   22.5   25.5   26.5   32.5  109.6  139.8   60.5   50.3   50.0          35.0    29.1
8    0.0    6.5    8.5    8.5   10.5   35.4   47.7   18.5   15.6   15.8          10.3     8.9
9    0.0    9.5   12.5   12.5   16.5   62.3   81.7   32.5   27.1   27.2          16.6    14.2

kilobytes output per second
#    bin    rle  wvl-f  wvl-s     tc  bb2.0   pu-f  doyna  nucru  rnucr  LZMPi LZMV4k LZMV256 exoraw
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------- ------
1          46.9   39.9   37.2   34.8   11.7    9.9   19.6   24.5   25.3          33.1    41.0
2          44.3   32.5   32.5   25.6    7.8    6.1   13.9   17.7   17.9          26.4    30.1
3          43.0   29.7   29.7   25.8    8.5    6.1   15.5   19.4   19.2          28.4    32.6
4          40.4   36.2   36.2   32.7    9.2    6.6   16.8   20.6   20.1          31.1    38.1
5          46.6   43.1   40.0   28.6    7.7    5.7   14.2   16.8   16.9          30.3    38.0
6          75.5   60.7   60.7   41.3   13.8   10.1   31.3   26.7   27.3          44.9    42.1
7          44.4   39.1   37.7   30.7    9.1    7.1   16.5   19.8   20.0          29.7    35.6
8          43.0   32.9   32.9   26.6    7.9    5.9   15.1   18.0   17.7          28.1    32.6
9          46.2   35.1   35.1   26.6    7.0    5.4   13.5   16.2   16.1          27.4    32.1

cycles per byte consumed
#    bin    rle  wvl-f  wvl-s     tc  bb2.0   pu-f  doyna  nucru  rnucr  LZMPi LZMV4k LZMV256 exoraw
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------- ------
1    0.0   28.2   58.6   68.7   70.4  272.9  289.7  165.6  134.2  129.4          30.2    24.4
2    0.0   25.1   41.7   44.6   54.6  244.4  259.3  136.9  108.0  107.5          37.8    33.2
3    0.0   25.3   42.7   48.8   49.6  212.3  247.9  116.6   93.8   94.6          35.2    30.6
4    0.0   25.9   44.0   45.7   48.8  200.3  260.4  111.4   90.3   93.1          32.2    26.2
5    0.0   26.0   30.1   33.6   46.4  212.8  277.1  115.1   97.5   96.4          33.0    26.3
6    0.0   32.2   44.4   45.9   63.5  245.3  326.3  109.3  127.7  125.1          22.3    23.8
7    0.0   25.6   41.4   46.2   55.8  233.2  280.7  128.0  108.2  107.4          33.7    28.1
8    0.0   23.6   40.0   42.7   52.4  219.9  256.4  116.1   96.6   97.2          35.5    30.6
9    0.0   23.4   35.5   35.6   49.3  223.0  267.7  117.6   96.9   97.4          36.5    31.1

2016-02-11 11:37

mankeli

Registered: Oct 2010
Posts: 141

just for comparison:

cycles per byte consumed (bin): 4+4 = 8
kilobytes output per second (bin): (312*63*50)/8 -> 122.85kB/s
etc.

:)

2016-02-11 11:46

MagerValp

Registered: Dec 2001
Posts: 1074

That requires an unrolled loop and hardcoded addresses, so it's not a realistic comparison.

2016-02-11 12:21

Bitbreaker

Registered: Oct 2002
Posts: 504

also with direct loading into registers and presorting to maximize register reuse upon same values, we get below 6 cycles ;-) For further brainfuck we can combine registers with SAX, SHA, ... on some stores :-D

2016-02-11 14:22

ChristopherJam

Registered: Aug 2004
Posts: 1408

Quoting Bitbreaker

also with direct loading into registers and presorting to maximize register reuse upon same values, we get below 6 cycles ;-)

Exactly what's required for half-cell Koalas, or there wouldn't be enough CPU time to update 37 bytes of d800 values every four lines :)

But yeah, I was assuming the .bin was already in place, so this is the excess cycles over time taken to transfer from external storage, be it 1541 or something more exotic

2016-02-11 14:23

ChristopherJam

Registered: Aug 2004
Posts: 1408

Quoting MagerValp

Added timed benchmarks for LZMV, both the original 256 byte sliding window and the 4k version I used in U4. Note that neither is optimized for speed right now, but size and convenience.

Thank you! Looking very similar to TinyCrunch there, maybe I should be dusting that off too

2016-02-11 16:31

MagerValp

Registered: Dec 2001
Posts: 1074

Added results of exomizer mem. Butt slow. I also added average compression and total time to depack all 9.

filesizes
#    bin    rle  wvl-f  wvl-s     tc  bb2.0   pu-f  doyna  nucru  rnucr  LZMPi LZMV4k LZMV256 exomem
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------  ------
1  11008   8020   4529   4151   4329   3322   3711   3265   3225   3230   3184   4205    4539   2988
2   4973   4314   3532   3309   3423   2513   3005   2512   2498   2490   2410   3183    3575   2225
3   3949   3498   2991   2617   2972   2098   2530   2108   2091   2093   1931   2551    3018   1808
4   7016   6456   4242   4085   4225   3682   3924   3617   3622   3614   3571   4343    4314   3442
5  34760  27647  25781  24895  25210  20530  21182  20405  20447  20516  20362  23845   26116  19715
6  31605  12511  11283  10923  11614   8998   9203   8904   8915   8896   8719  10619   11352   8322
7  20392  17295  12108  11285  11445   9241   9789   9289   9140   9145   9256  11154   12188   8765
8   5713   5407   4179   3916   3936   3165   3656   3132   3166   3187   3048   3959    3987   3081
9   8960   7986   6914   6896   6572   5491   6000   5430   5502   5487   5563   6505    6943   5304

filesize in %
#    bin    rle  wvl-f  wvl-s     tc  bb2.0   pu-f  doyna  nucru  rnucr  LZMPi LZMV4k LZMV256 exomem
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------- ------
1  100.0   72.9   41.1   37.7   39.3   30.2   33.7   29.7   29.3   29.3   28.9   38.2    41.2   27.1
2  100.0   86.7   71.0   66.5   68.8   50.5   60.4   50.5   50.2   50.1   48.5   64.0    71.9   44.7
3  100.0   88.6   75.7   66.3   75.3   53.1   64.1   53.4   53.0   53.0   48.9   64.6    76.4   45.8
4  100.0   92.0   60.5   58.2   60.2   52.5   55.9   51.6   51.6   51.5   50.9   61.9    61.5   49.1
5  100.0   79.5   74.2   71.6   72.5   59.1   60.9   58.7   58.8   59.0   58.6   68.6    75.1   56.7
6  100.0   39.6   35.7   34.6   36.7   28.5   29.1   28.2   28.2   28.1   27.6   33.6    35.9   26.3
7  100.0   84.8   59.4   55.3   56.1   45.3   48.0   45.6   44.8   44.8   45.4   54.7    59.8   43.0
8  100.0   94.6   73.1   68.5   68.9   55.4   64.0   54.8   55.4   55.8   53.4   69.3    69.8   53.9
9  100.0   89.1   77.2   77.0   73.3   61.3   67.0   60.6   61.4   61.2   62.1   72.6    77.5   59.2
Average    80.9   63.1   59.5   61.2   48.4   53.7   48.1   48.1   48.1   47.1   58.6    63.2   45.1

number of frames to depack
#      rle  wvl-f  wvl-s     tc  bb2.0   pu-f  doyna  nucru  rnucr  LZMPi LZMV4k LZMV256 exomem
--- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------- ------
1     11.5   13.5   14.5   15.5   46.1   54.7   27.5   22.0   21.3          16.9    13.7   57.9
2      5.5    7.5    7.5    9.5   31.2   39.6   17.5   13.7   13.6           9.6     8.4   38.5
3      4.5    6.5    6.5    7.5   22.7   31.9   12.5   10.0   10.1           7.1     6.2   28.5
4      8.5    9.5    9.5   10.5   37.5   52.0   20.5   16.6   17.1          11.5     9.4   53.1
5     36.5   39.5   42.5   59.5  222.2  298.6  119.5  101.4  100.6          58.4    46.6  295.9
6     20.5   25.5   25.5   37.5  112.3  152.8   49.5   57.9   56.6          35.8    38.2  142.3
7     22.5   25.5   26.5   32.5  109.6  139.8   60.5   50.3   50.0          35.0    29.1  139.2
8      6.5    8.5    8.5   10.5   35.4   47.7   18.5   15.6   15.8          10.3     8.9   44.8
9      9.5   12.5   12.5   16.5   62.3   81.7   32.5   27.1   27.2          16.6    14.2   78.9
Sum  125.5  148.5  153.5  199.5  679.3  898.8  358.5  314.6  312.3         201.2   174.7  879.1

kilobytes output per second
#    rle  wvl-f  wvl-s     tc  bb2.0   pu-f  doyna  nucru  rnucr  LZMPi LZMV4k LZMV256 exomem
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------- ------
1   46.9   39.9   37.2   34.8   11.7    9.9   19.6   24.5   25.3          33.1    41.0    9.7
2   44.3   32.5   32.5   25.6    7.8    6.1   13.9   17.7   17.9          26.4    30.1    6.6
3   43.0   29.7   29.7   25.8    8.5    6.1   15.5   19.4   19.2          28.4    32.6    7.0
4   40.4   36.2   36.2   32.7    9.2    6.6   16.8   20.6   20.1          31.1    38.1    6.7
5   46.6   43.1   40.0   28.6    7.7    5.7   14.2   16.8   16.9          30.3    38.0    6.0
6   75.5   60.7   60.7   41.3   13.8   10.1   31.3   26.7   27.3          44.9    42.1   11.3
7   44.4   39.1   37.7   30.7    9.1    7.1   16.5   19.8   20.0          29.7    35.6    7.5
8   43.0   32.9   32.9   26.6    7.9    5.9   15.1   18.0   17.7          28.1    32.6    6.5
9   46.2   35.1   35.1   26.6    7.0    5.4   13.5   16.2   16.1          27.4    32.1    5.8

cycles per byte consumed
#    rle  wvl-f  wvl-s     tc  bb2.0   pu-f  doyna  nucru  rnucr  LZMPi LZMV4k LZMV256 exomem
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------- ------
1   28.2   58.6   68.7   70.4  272.9  289.7  165.6  134.2  129.4          30.2    24.4  103.4
2   25.1   41.7   44.6   54.6  244.4  259.3  136.9  108.0  107.5          37.8    33.2  152.0
3   25.3   42.7   48.8   49.6  212.3  247.9  116.6   93.8   94.6          35.2    30.6  141.9
4   25.9   44.0   45.7   48.8  200.3  260.4  111.4   90.3   93.1          32.2    26.2  148.7
5   26.0   30.1   33.6   46.4  212.8  277.1  115.1   97.5   96.4          33.0    26.3  167.3
6   32.2   44.4   45.9   63.5  245.3  326.3  109.3  127.7  125.1          22.3    23.8   88.5
7   25.6   41.4   46.2   55.8  233.2  280.7  128.0  108.2  107.4          33.7    28.1  134.2
8   23.6   40.0   42.7   52.4  219.9  256.4  116.1   96.6   97.2          35.5    30.6  154.0
9   23.4   35.5   35.6   49.3  223.0  267.7  117.6   96.9   97.4          36.5    31.1  173.0

2016-02-11 20:08

Fungus

Registered: Sep 2002
Posts: 680

Thanks for that MV.

2016-02-11 21:27

Burglar

Registered: Dec 2004
Posts: 1088

2016-02-11 22:09

Burglar

Registered: Dec 2004
Posts: 1088

2016-02-11 22:34

Frantic

Registered: Mar 2003
Posts: 1646

Now just give us the Pareto Optimal set too and then we've settled this properly. ;)

2016-02-11 22:37

Burglar

Registered: Dec 2004
Posts: 1088

It would be invalid to treat Pareto efficiency as equivalent to social optimality.

2016-02-11 22:38

Frantic

Registered: Mar 2003
Posts: 1646

...but it would still be great. :)

2016-02-11 23:37

HCL

Registered: Feb 2003
Posts: 727

I'm sorry CJam, but those numbers for b2 decrunching times just don't make sense. I have setup the same tests (i hope) here at my place, and i get numbers in the same area as Doynax and Nucrunch.. which is expected. Could you please double check you're not using the old bb-decruncher on the b2-files or something.. :)

2016-02-12 04:38

ChristopherJam

Registered: Aug 2004
Posts: 1408

Curious indeed. I just grabbed the new release, built files with

bench_bb_speed/%.prg: corpus/%.prg Makefile
	mkdir -p bench_bb_speed
	cp $< bench_bb_speed
	b2 -c 1000 $@
	mv $@.b2 $@

and set watches in Vice on 080d and 1000, zeroing stopwatch at the first and recording it at the second.

I was going have another go anyway after patching the decoder to zero $d011, but that would only gain 4-5%. I'll keep you posted.

2016-02-12 07:36

HCL

Registered: Feb 2003
Posts: 727

Ah, so you are making executables (-c).. that explains it :). The executables have a smaller decruncher which is alot slower than the normal one.

2016-02-12 08:25

ChristopherJam

Registered: Aug 2004
Posts: 1408

That would do it!

Funny, I was just disassembling in Vice on the off chance the embedded decoder was out of date, and noticing the absence of the sequence of starting STYs :)

2016-02-12 09:35

HCL

Registered: Feb 2003
Posts: 727

I should have understood that when you were talking about translating the decruncher..

2016-02-12 09:41

Martin Piper

Registered: Nov 2007
Posts: 718

So are we meant to be timing with self extracting space optimised exes or speed optimised decompression code? :)
Or both? :)

2016-02-12 09:52

JackAsser

Registered: Jun 2002
Posts: 2014

Quote: So are we meant to be timing with self extracting space optimised exes or speed optimised decompression code? :)
Or both? :)

From a demo coder point of view, the self extracting values are completely useless.

2016-02-12 10:17

ChristopherJam

Registered: Aug 2004
Posts: 1408

Yup, speed optimised. I just didn't realise that wasn't what was bundled into the self-extracting archive.

2016-02-12 10:34

enthusi

Registered: May 2004
Posts: 677

For games you often (and for tapes always) want in-place depacking of 'raw' packed data.
(load data to $1xxx-$2004, depack to $1000-$2000)
Not streamed and not self-executing.
Doynamite ist just great for that.
Many tools/loaders seem to be tuned for demo-usage, not so much games :)

2016-02-12 10:38

JackAsser

Registered: Jun 2002
Posts: 2014

Quote: For games you often (and for tapes always) want in-place depacking of 'raw' packed data.
(load data to $1xxx-$2004, depack to $1000-$2000)
Not streamed and not self-executing.
Doynamite ist just great for that.
Many tools/loaders seem to be tuned for demo-usage, not so much games :)

This is exactly what we do in demos always, isn't it?

2016-02-12 14:47

MagerValp

Registered: Dec 2001
Posts: 1074

I have sketched out a version of LZMV that decrunches in place, with the added finesse that it doesn't need any overlap - e.g. load to $1000-$xxxx, decrunch to $1000-$1fff. That 4 byte overlap always bugged me :)

2016-02-12 15:05

JackAsser

Registered: Jun 2002
Posts: 2014

Quote: I have sketched out a version of LZMV that decrunches in place, with the added finesse that it doesn't need any overlap - e.g. load to $1000-$xxxx, decrunch to $1000-$1fff. That 4 byte overlap always bugged me :)

I suppose you have a scratchpad somewhere in memory, like in the decruncher code to handle it then?

2016-02-12 15:18

Oswald

Registered: Apr 2002
Posts: 5086

just stumbled over this:

"Exactly what's required for half-cell Koalas, or there wouldn't be enough CPU time to update 37 bytes of d800 values every four lines :)"

you have 4 rasterlines to update d800, because it is only read on bad lines.

2016-02-12 15:37

ChristopherJam

Registered: Aug 2004
Posts: 1408

Quoting Oswald

just stumbled over this:

"Exactly what's required for half-cell Koalas, or there wouldn't be enough CPU time to update 37 bytes of d800 values every four lines :)"

you have 4 rasterlines to update d800, because it is only read on bad lines.

Well, yes. 4 raster lines less the 40-43 cycles of DMA, 4 cycles to update d011, and 4 cycles to flip d018. That leaves at best 63*4-40-4-4=204 cycles, or 5.51 cycles per byte of d800, and that's assuming you've magically got useful values for d011 and d018 to hand.

2016-02-12 16:13

Oswald

Registered: Apr 2002
Posts: 5086

true. its harder than I thought. (as always)

anyhow 16 values vs 37 writes its should fit easily with register reuse (ok that was what you said, alright:). also eod wasnt doing nowhere near 30 bytes when it was dynamic. now we know why :)

2016-02-12 17:25

algorithm

Registered: May 2002
Posts: 705

Just doing the lda #$00 sta dxxx sta xxxx.... lda #$01 sta dxxx dxxx.. should give max cycle usage of (4*37 for the writes and (2*16) for the register preload = 180 cycles (and that is in the worst case scenario - if there are 16 different colors per line. Should still be ample time to update d800.

2016-02-12 17:33

algorithm

Registered: May 2002
Posts: 705

To reduce it further. can do the delta writes in the second half of the 4 rasterlines.. only change colors that are different in the same position as directly above..

2016-02-12 17:42

algorithm

Registered: May 2002
Posts: 705

Actually should be able to do a delta every 4 lines..

Set up d800-d827 before the first badline..

1
when the badline occurs, do the delta update for the next 4 lines

force second badline 4 pixels down. at this stage, the updates to d800-d827 will occur

then write the delta to change back the values to how they were originally at d828-d84f

repeat from 1

2016-02-12 18:23

Bitbreaker

Registered: Oct 2002
Posts: 504

If x is preloaded for a whole frame one can easily write 16 values with only 8 register loads:

ldx #$0e
lda #$01
sax $d800
sta $d801
lda #$03
sax $d802
sta $d803
lda #$05
sax $d804
sta $d805
...

2016-02-12 18:26

JackAsser

Registered: Jun 2002
Posts: 2014

Quote: just stumbled over this:

"Exactly what's required for half-cell Koalas, or there wouldn't be enough CPU time to update 37 bytes of d800 values every four lines :)"

you have 4 rasterlines to update d800, because it is only read on bad lines.

Really interesting, but why on earth contaminate THIS thread? :D

2016-02-12 18:28

Bitbreaker

Registered: Oct 2002
Posts: 504

Because he can :-D

2016-02-12 18:37

JackAsser

Registered: Jun 2002
Posts: 2014

Quote: Because he can :-D

I'm OK with that! :)

2016-02-12 18:48

HCL

Registered: Feb 2003
Posts: 727

Is this thread just being hijacked!?

2016-02-12 18:51

JackAsser

Registered: Jun 2002
Posts: 2014

Quote: Is this thread just being hijacked!?

Oswald 'The Hijacker'

2016-02-12 22:55

ChristopherJam

Registered: Aug 2004
Posts: 1408

Quoting algorithm

Actually should be able to do a delta every 4 lines..

Set up d800-d827 before the first badline..

1
when the badline occurs, do the delta update for the next 4 lines

force second badline 4 pixels down. at this stage, the updates to d800-d827 will occur

then write the delta to change back the values to how they were originally at d828-d84f

repeat from 1

Exactly what I did. Still waiting on art from someone, or you'd have seen a release by now.

2016-02-13 09:16

MagerValp

Registered: Dec 2001
Posts: 1074

Quote: I suppose you have a scratchpad somewhere in memory, like in the decruncher code to handle it then?

No, no extra memory, that's the beauty of it, only the end address or a counter. I'll sketch something out to show how it works, there might be a corner case that I'm not thinking of.

2016-02-13 10:47

Bitbreaker

Registered: Oct 2002
Posts: 504

But wouldn't that only work if we end with a match then? As long as the last thing to do is copying a literal this would still fail as with packers like Doynax, BB and similar, as one would have some control stream bits/bytes before that to signal a literal and give a length. If the following literal bytes would not overlap, the control bits would have been overwritten by the preceeding action and things would fail. In case things end with a match, all would be fine, only the copying of the match would then kill the already read control bits and the decruncher would end afterwards.

2016-02-13 10:52

ChristopherJam

Registered: Aug 2004
Posts: 1408

Update time!!
All results from above combined, as well as some new ones

Thanks MagerValp for LZMV* & exo*, and Bitbreaker for Bitfire.
Also thanks to HCL for his patience, I've finally switched to the fast decompressor for BB.

Columns sorted by output rate (guessed positions for LZMPi and exoraw, the only ones I don't yet have times for):

filesizes
#    bin    rle  wvl-f  wvl-s LZMV25     tc LZMV4k rnucru nucrun  bb2.0 bitfir doynax  LZMPi exomem   pu-f exoraw
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
1  11008   8020   4529   4151   4539   4329   4205   3230   3225   3322   3324   3265   3184   2988   3711   2988
2   4973   4314   3532   3309   3575   3423   3183   2490   2498   2513   2515   2512   2410   2225   3005   2241
3   3949   3498   2991   2617   3018   2972   2551   2093   2091   2098   2097   2108   1931   1808   2530   1817
4   7016   6456   4242   4085   4314   4225   4343   3614   3622   3682   3682   3617   3571   3442   3924   3454
5  34760  27647  25781  24895  26116  25210  23845  20516  20447  20530  20531  20405  20362  19715  21182  19631
6  31605  12511  11283  10923  11352  11614  10619   8896   8915   8998   9004   8904   8719   8322   9203   8337
7  20392  17295  12108  11285  12188  11445  11154   9145   9140   9241   9242   9289   9256   8765   9789   8751
8   5713   5407   4179   3916   3987   3936   3959   3187   3166   3165   3162   3132   3048   3081   3656   3059
9   8960   7986   6914   6896   6943   6572   6505   5487   5502   5491   5491   5430   5563   5304   6000   5295

filesize in %
#    bin    rle  wvl-f  wvl-s LZMV25     tc LZMV4k rnucru nucrun  bb2.0 bitfir doynax  LZMPi exomem   pu-f exoraw
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
1  100.0   72.9   41.1   37.7   41.2   39.3   38.2   29.3   29.3   30.2   30.2   29.7   28.9   27.1   33.7   27.1
2  100.0   86.7   71.0   66.5   71.9   68.8   64.0   50.1   50.2   50.5   50.6   50.5   48.5   44.7   60.4   45.1
3  100.0   88.6   75.7   66.3   76.4   75.3   64.6   53.0   53.0   53.1   53.1   53.4   48.9   45.8   64.1   46.0
4  100.0   92.0   60.5   58.2   61.5   60.2   61.9   51.5   51.6   52.5   52.5   51.6   50.9   49.1   55.9   49.2
5  100.0   79.5   74.2   71.6   75.1   72.5   68.6   59.0   58.8   59.1   59.1   58.7   58.6   56.7   60.9   56.5
6  100.0   39.6   35.7   34.6   35.9   36.7   33.6   28.1   28.2   28.5   28.5   28.2   27.6   26.3   29.1   26.4
7  100.0   84.8   59.4   55.3   59.8   56.1   54.7   44.8   44.8   45.3   45.3   45.6   45.4   43.0   48.0   42.9
8  100.0   94.6   73.1   68.5   69.8   68.9   69.3   55.8   55.4   55.4   55.3   54.8   53.4   53.9   64.0   53.5
9  100.0   89.1   77.2   77.0   77.5   73.3   72.6   61.2   61.4   61.3   61.3   60.6   62.1   59.2   67.0   59.1
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
~  100.0   80.9   63.1   59.5   63.2   61.3   58.6   48.1   48.1   48.4   48.4   48.1   47.1   45.1   53.7   45.1
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------

number of frames to depack
#    bin    rle  wvl-f  wvl-s LZMV25     tc LZMV4k rnucru nucrun  bb2.0 bitfir doynax  LZMPi exomem   pu-f exoraw
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
1    0.0   11.5   13.5   14.5   13.7   15.5   16.9   21.3   22.0   24.3   24.4   27.5    0.0   57.9   54.7    0.0
2    0.0    5.5    7.5    7.5    8.4    9.5    9.6   13.6   13.7   15.1   15.1   17.5    0.0   38.5   39.6    0.0
3    0.0    4.5    6.5    6.5    6.2    7.5    7.1   10.1   10.0   10.9   10.9   12.5    0.0   28.5   31.9    0.0
4    0.0    8.5    9.5    9.5    9.4   10.5   11.5   17.1   16.6   18.1   18.1   20.5    0.0   53.1   52.0    0.0
5    0.0   36.5   39.5   42.5   46.6   59.5   58.4  100.6  101.4  107.7  107.4  119.5    0.0  295.9  298.6    0.0
6    0.0   20.5   25.5   25.5   38.2   37.5   35.8   56.6   57.9   61.5   62.0   49.5    0.0  142.3  152.8    0.0
7    0.0   22.5   25.5   26.5   29.1   32.5   35.0   50.0   50.3   54.5   54.5   60.5    0.0  139.2  139.8    0.0
8    0.0    6.5    8.5    8.5    8.9   10.5   10.3   15.8   15.6   16.9   17.0   18.5    0.0   44.8   47.7    0.0
9    0.0    9.5   12.5   12.5   14.2   16.5   16.6   27.2   27.1   29.6   29.4   32.5    0.0   78.9   81.7    0.0

kilobytes output per second
#    bin    rle  wvl-f  wvl-s LZMV25     tc LZMV4k rnucru nucrun  bb2.0 bitfir doynax  LZMPi exomem   pu-f exoraw
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
1          46.9   39.9   37.2   39.3   34.8   31.9   25.3   24.5   22.2   22.1   19.6           9.3    9.9       
2          44.3   32.5   32.5   29.0   25.6   25.4   17.9   17.7   16.1   16.1   13.9           6.3    6.1       
3          43.0   29.7   29.7   31.2   25.8   27.2   19.2   19.4   17.7   17.8   15.5           6.8    6.1       
4          40.4   36.2   36.2   36.5   32.7   29.9   20.1   20.6   18.9   19.0   16.8           6.5    6.6       
5          46.6   43.1   40.0   36.5   28.6   29.1   16.9   16.8   15.8   15.8   14.2           5.8    5.7       
6          75.5   60.7   60.7   40.5   41.3   43.2   27.3   26.7   25.2   25.0   31.3          10.9   10.1       
7          44.4   39.1   37.7   34.3   30.7   28.5   20.0   19.8   18.3   18.3   16.5           7.2    7.1       
8          43.0   32.9   32.9   31.4   26.6   27.2   17.7   18.0   16.5   16.4   15.1           6.2    5.9       
9          46.2   35.1   35.1   30.9   26.6   26.4   16.1   16.2   14.8   14.9   13.5           5.6    5.4       
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
~    0.0   47.8   38.8   38.0   34.4   30.3   29.9   20.1   20.0   18.4   18.4   17.4    0.0    7.2    7.0    0.0
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------

cycles per byte consumed
#    bin    rle  wvl-f  wvl-s LZMV25     tc LZMV4k rnucru nucrun  bb2.0 bitfir doynax  LZMPi exomem   pu-f exoraw
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
1    0.0   28.2   58.6   68.7   59.3   70.4   79.0  129.4  134.0  143.6  144.1  165.6    0.0  380.9  289.7    0.0
2    0.0   25.1   41.7   44.6   46.2   54.6   59.3  107.5  108.0  118.4  118.4  136.9    0.0  340.1  259.3    0.0
3    0.0   25.3   42.7   48.8   40.4   49.6   54.7   94.6   93.8  102.4  102.0  116.6    0.0  309.8  247.9    0.0
4    0.0   25.9   44.0   45.7   42.8   48.8   52.0   93.1   90.3   96.8   96.4  111.4    0.0  303.2  260.4    0.0
5    0.0   26.0   30.1   33.6   35.1   46.4   48.1   96.4   97.5  103.1  102.8  115.1    0.0  295.0  277.1    0.0
6    0.0   32.2   44.4   45.9   66.1   63.5   66.3  125.1  127.7  134.3  135.4  109.3    0.0  336.1  326.3    0.0
7    0.0   25.6   41.4   46.2   46.9   55.8   61.7  107.4  108.2  115.9  116.0  128.0    0.0  312.2  280.7    0.0
8    0.0   23.6   40.0   42.7   43.9   52.4   51.1   97.2   96.6  105.2  105.9  116.1    0.0  285.8  256.4    0.0
9    0.0   23.4   35.5   35.6   40.2   49.3   50.2   97.4   96.9  105.9  105.4  117.6    0.0  292.4  267.7    0.0
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------

2016-02-13 11:11

MagerValp

Registered: Dec 2001
Posts: 1074

Quote: But wouldn't that only work if we end with a match then? As long as the last thing to do is copying a literal this would still fail as with packers like Doynax, BB and similar, as one would have some control stream bits/bytes before that to signal a literal and give a length. If the following literal bytes would not overlap, the control bits would have been overwritten by the preceeding action and things would fail. In case things end with a match, all would be fine, only the copying of the match would then kill the already read control bits and the decruncher would end afterwards.

Yes, that's exactly the case. The trick is to ensure that the compressed stream ends with a match. If it ends with a literal, include the literal data without any control bytes at the correct location. Illustration here:

https://www.dropbox.com/s/9zttmj3avjm6krl/LZMV%20in%20place%20u..

Note that it packs backwards, so the start is at the top of memory and the ending literal "!" in the second example is at the bottom. Unpacking backwards enables dey/bpl copy loops for speed and lets you keep the load address. Speed for this should be on par with WVL's packers.

2016-02-13 11:24

MagerValp

Registered: Dec 2001
Posts: 1074

@ChristopherJam: Lovely! The pattern I see here is that there are three speed classes. You have the fast LZSS variants (WVL, LZMV), you have the efficient but slow LZ + Huffman (exo, pu), and in the middle you have nu/bb2/bitfire/doynax.

It also makes it obvious that LZMV (and maybe tc) needs speed optimization to be competitive :)

2016-02-13 11:37

Bitbreaker

Registered: Oct 2002
Posts: 504

Thanks MagerValp, that was the tiny bit of info that was missing, in deed one can leave the last literal just there without any more control bits :-D As an extra, when we are guaranteed to end with a match, it is also sufficient to do the end check on matches only, as before with that maximum run as an eof marker. It will however add a few more bytes to the decruncher, but speedwise it would be mostly as fast as a tay + beq as eof detection by having a cmp + beq (in most cases that is, as we would require to check on 16 bit in case)
I'll give this a try the next days :-)

2016-02-13 12:05

MagerValp

Registered: Dec 2001
Posts: 1074

Go for it! :)

And Excel just crashed on me while I was trying to make a pretty scatter plot of the results :/

2016-02-13 12:31

Martin Piper

Registered: Nov 2007
Posts: 718

Very slow LZMPi decompression cycles:
1.bin 1558777
2.bin 1044272
3.bin 813537
4.bin 1482657
5.bin 8471668
6.bin 4323352
7.bin 4044852
8.bin 1278691
9.bin 2307100

Hey, it's optimised for space not speed. :D

2016-02-13 12:55

JackAsser

Registered: Jun 2002
Posts: 2014

It's almost as if BB2 and Bitfire is the same code! :)

2016-02-13 13:40

ChristopherJam

Registered: Aug 2004
Posts: 1408

Quoting Martin Piper

Very slow LZMPi decompression cycles:

Hey, it's optimised for space not speed. :D

Ah, thank you!

First couple of speed related tables:

number of frames to depack
#    bin    rle  wvl-f  wvl-s LZMV25     tc LZMV4k rnucru nucrun  bb2.0 bitfir doynax exomem   pu-f  LZMPi exoraw
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
1    0.0   11.5   13.5   14.5   13.7   15.5   16.9   21.3   22.0   24.3   24.4   27.5   57.9   54.7   79.3    0.0
2    0.0    5.5    7.5    7.5    8.4    9.5    9.6   13.6   13.7   15.1   15.1   17.5   38.5   39.6   53.1    0.0
3    0.0    4.5    6.5    6.5    6.2    7.5    7.1   10.1   10.0   10.9   10.9   12.5   28.5   31.9   41.4    0.0
4    0.0    8.5    9.5    9.5    9.4   10.5   11.5   17.1   16.6   18.1   18.1   20.5   53.1   52.0   75.4    0.0
5    0.0   36.5   39.5   42.5   46.6   59.5   58.4  100.6  101.4  107.7  107.4  119.5  295.9  298.6  431.0    0.0
6    0.0   20.5   25.5   25.5   38.2   37.5   35.8   56.6   57.9   61.5   62.0   49.5  142.3  152.8  220.0    0.0
7    0.0   22.5   25.5   26.5   29.1   32.5   35.0   50.0   50.3   54.5   54.5   60.5  139.2  139.8  205.8    0.0
8    0.0    6.5    8.5    8.5    8.9   10.5   10.3   15.8   15.6   16.9   17.0   18.5   44.8   47.7   65.1    0.0
9    0.0    9.5   12.5   12.5   14.2   16.5   16.6   27.2   27.1   29.6   29.4   32.5   78.9   81.7  117.4    0.0

kilobytes output per second
#    bin    rle  wvl-f  wvl-s LZMV25     tc LZMV4k rnucru nucrun  bb2.0 bitfir doynax exomem   pu-f  LZMPi exoraw
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
1          46.9   39.9   37.2   39.3   34.8   31.9   25.3   24.5   22.2   22.1   19.6    9.3    9.9    6.8       
2          44.3   32.5   32.5   29.0   25.6   25.4   17.9   17.7   16.1   16.1   13.9    6.3    6.1    4.6       
3          43.0   29.7   29.7   31.2   25.8   27.2   19.2   19.4   17.7   17.8   15.5    6.8    6.1    4.7       
4          40.4   36.2   36.2   36.5   32.7   29.9   20.1   20.6   18.9   19.0   16.8    6.5    6.6    4.6       
5          46.6   43.1   40.0   36.5   28.6   29.1   16.9   16.8   15.8   15.8   14.2    5.8    5.7    3.9       
6          75.5   60.7   60.7   40.5   41.3   43.2   27.3   26.7   25.2   25.0   31.3   10.9   10.1    7.0       
7          44.4   39.1   37.7   34.3   30.7   28.5   20.0   19.8   18.3   18.3   16.5    7.2    7.1    4.9       
8          43.0   32.9   32.9   31.4   26.6   27.2   17.7   18.0   16.5   16.4   15.1    6.2    5.9    4.3       
9          46.2   35.1   35.1   30.9   26.6   26.4   16.1   16.2   14.8   14.9   13.5    5.6    5.4    3.7       
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
~    0.0   47.8   38.8   38.0   34.4   30.3   29.9   20.1   20.0   18.4   18.4   17.4    7.2    7.0    4.9    0.0
- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------

2016-02-13 13:45

ChristopherJam

Registered: Aug 2004
Posts: 1408

Quoting MagerValp

@ChristopherJam: Lovely! The pattern I see here is that there are three speed classes. You have the fast LZSS variants (WVL, LZMV), you have the efficient but slow LZ + Huffman (exo, pu), and in the middle you have nu/bb2/bitfire/doynax.

It also makes it obvious that LZMV (and maybe tc) needs speed optimization to be competitive :)

Yes, I've done some quick graphs in Matplotlib (no lables yet) and there's some quite pronounced clustering.

exomem, pu-f & LZMPi are all below 10k per second, with exomem by far the smallest.

rle is also in a class of it's own; definite winner for speed if only light compression is required.

Quoting JackAsser

It's almost as if BB2 and Bitfire is the same code! :)

Yes, they are remarkably close! I suspect identical encoding for a start

2016-02-13 14:15

Burglar

Registered: Dec 2004
Posts: 1088

2016-02-14 17:32

Fungus

Registered: Sep 2002
Posts: 680

An how about a comparison of uncompressor size?

2016-02-14 18:13

ChristopherJam

Registered: Aug 2004
Posts: 1408

Size is definitely pertinent. nucrunch is hardly small at the moment.

nucrunch   342
rnucrunch  394
tinycrunch 151

2016-02-14 19:52

HCL

Registered: Feb 2003
Posts: 727

Comparing B2 and Bitfire.. The encoders are generating the same format, however not compatible. It has been the same since the first release of ByteBoozer1.0. ByteBoozer2.0 and Bitfire (and probably most others today) are searching through *all* possible ways to win bits, where as Bb1.0 has a faster gready algorithm. Remember it was running on the c64 also(!), but that's why B2 compresses better. B2 and Bitfire encoders do have rather different ways of achieving their goals, which explains the small differences we see in compression rate.

B2 and Bitfire decrunchers are more identical than the encoders. I was working like hell to find another code design than Bitbreaker's that still wasn't slower, but i guess he already examined all those possibilities before me :). So i guess we sort of agree on this rather optimal way of decrunching lz-code.. Now i'm very qurious on what new tricks Nucrunch is bringing up to the surface :).

2016-02-14 22:25

MagerValp

Registered: Dec 2001
Posts: 1074

LZMV4k decruncher is 137 bytes, LZMV256 is 89 bytes.

2016-02-14 23:43

Martin Piper

Registered: Nov 2007
Posts: 718

Depends if you're including all the post BASIC setup and memory moving code.

2016-02-14 23:48

ChristopherJam

Registered: Aug 2004
Posts: 1408

Quoting Martin Piper

Depends if you're including all the post BASIC setup and memory moving code.

Just the decrunch engine; assume something else has loaded the data/set $01/cleared interrupts etc and the source and destination areas placed so as to not need special handling.

2016-02-15 05:51

lft

Registered: Jul 2007
Posts: 369

Quoting MagerValp

The trick is to ensure that the compressed stream ends with a match. If it ends with a literal, include the literal data without any control bytes at the correct location.

That is absolutely brilliant!

2016-02-15 07:37

ChristopherJam

Registered: Aug 2004
Posts: 1408

Quoting lft

Quoting MagerValp
The trick is to ensure that the compressed stream ends with a match. If it ends with a literal, include the literal data without any control bytes at the correct location.

That is absolutely brilliant!

It is, rather. MagerValp, how were you planning to terminate the stream?

My EOF's over a byte long, and checking end address seems slow if it's not necessarily on a page boundary.

2016-02-15 07:43

ChristopherJam

Registered: Aug 2004
Posts: 1408

Quoting HCL

Comparing B2 and Bitfire.. The encoders are generating the same format, however not compatible. It has been the same since the first release of ByteBoozer1.0. ByteBoozer2.0 and Bitfire (and probably most others today) are searching through *all* possible ways to win bits, where as Bb1.0 has a faster gready algorithm. Remember it was running on the c64 also(!), but that's why B2 compresses better...

...Now i'm very qurious on what new tricks Nucrunch is bringing up to the surface :).

Impressive on the first one doing so well with a native compressor. But yes, I'm doing an exhaustive search too, with the full file rather than a sliding window. I pinched the idea of literals always being followed by matches from Doynax.

Source is out there now (NuCrunch 0.1), but I suspect most of the gains come from a combination of aggressive inlining, and splitting the bitstream into bits, bitpairs, and nybbles, so I can leave out a lot of the 'next byte?' tests.

2016-02-15 08:55

Trash

Registered: Jan 2002
Posts: 122

Quote: Quoting lft
Quoting MagerValp
The trick is to ensure that the compressed stream ends with a match. If it ends with a literal, include the literal data without any control bytes at the correct location.

That is absolutely brilliant!

It is, rather. MagerValp, how were you planning to terminate the stream?

My EOF's over a byte long, and checking end address seems slow if it's not necessarily on a page boundary.

Why use an EOF at all? Inlude how many bytes you should unpack in the header, for speed you could limit that to pages (the tradeoff would be that some data in the end would be overwritten)...

2016-02-15 09:11

Bitbreaker

Registered: Oct 2002
Posts: 504

As already posted earlier, it is sufficient to do any check only in case of a match, be it with an eof marker or by an end address. The check is not more expensive and can even be made without clobbering carry when being done with eor, but takes more size codewise. I have made tests with zero overlap on bitnax, but fail yet on some files with overwriting myself :-/

2016-02-15 09:46

ChristopherJam

Registered: Aug 2004
Posts: 1408

Well yes, I was assuming only checking in case of a match Intriguing that end addr test can be done at zero cost in cycles.

The clobbering cases - is it because of the match spec crossing a byte boundary?

2016-02-15 09:59

ChristopherJam

Registered: Aug 2004
Posts: 1408

Quoting Trash

(the tradeoff would be that some data in the end would be overwritten)...

Oh yes, aware of that. I want an exact endpoint so I can implement MagerValp's end-alignment, where not even the crunched data passes the endpoint.

2016-02-15 12:24

enthusi

Registered: May 2004
Posts: 677

The cruncher of choice obviously depends on the environment as well. In particular the speed with which the data is loaded into memory. Which fastloader, tape, cart...
Maybe someone adds certain loading-bitrates to the graphs? ;-)
TurboTape is ~ 450 Byte/sec.
1541 ROM: ~ 410 Byte/sec
Cart (lda $8000,x; sta $0800,x) ~ 160 KB/sec

(i.e. for Caren I now used page aligned ,x loops of RAW data which is still faster than RLE with byte-at-boundary-check. A dedicated RLE that never crosses banked in pages should be slighly faster)

2016-02-15 13:09

Bitbreaker

Registered: Oct 2002
Posts: 504

Quote: Well yes, I was assuming only checking in case of a match Intriguing that end addr test can be done at zero cost in cycles.

The clobbering cases - is it because of the match spec crossing a byte boundary?

Well, it is possible that the read_pointer crosses the write_pointer already before the last literal/match, at least then, when we encode stuff with variable bitlengths. I did a working prototype (more tests pending) to get the cruncher to spit out a final binary blob as soon as this happens. Thus some files tend to become bigger, others where this works out, get usually 2 bytes smaller.
As for the decruncher a few things need to be changed, in my case i can forgo on the terminator check:

        tay
        beq .lz_end_of_file

Therefore i check when i add the match length to the destination pointer:

        tya
        adc .lz_dst
        sta .lz_dst
        bcc .lz_end_low
.lz_maximum
        inc .lz_dst+1
        lda .lz_dst
.lz_end_low
        cmp #$00
.lz_skip_poll
        bne .lz_poll
        lda .lz_dst+1
.lz_end_hi
        cmp #$00
        bne .lz_skip_end
        rts
.lz_poll

However you also can't rely anymore on the crunchers EOF test, so you need to continue loading after the cruncher returns to be sure any remaining binary blob is still loaded if it goes over a block boundary, until the loader terminates with an EOF. The later of course only applies if you depack with on the fly loading.

2016-02-15 13:51

ChristopherJam

Registered: Aug 2004
Posts: 1408

Ah of course - you're replacing the EOF check with something of (99.6% of the time) equal cost.

Interesting point about streaming loaders, but surely in that case you'd just be loading to a one page buffer rather than loading to destination address?

2016-02-15 14:44

Bitbreaker

Registered: Oct 2002
Posts: 504

I never used a buffer but was always depacking in place, yet however still with a small overlap.

2016-02-15 19:55

Bitbreaker

Registered: Oct 2002
Posts: 504

Here is a first experimental test. At least my usual benchmark stuff passes, also overlap for normal cases decreased, possible that the the old algorithm did something wrong (or the new does, haha)
As said, highly experimental, hurt yourself at your own risk. One might want to feed more files to the testsuite in the benchmark folder. The checksums + sizes and loadaddresses are not automated yet, some sed wizardry should help out there soon.
Now, with the end address check the non overflowing case on the dst-pointer addition is favoured and thus one cycled saved compared to the standard version.

2016-03-09 11:28

Bitbreaker

Registered: Oct 2002
Posts: 504

There's one problem with the in place decompression arising:
If you happen to have the source data not in place but at some other location in mem, the last literal will not be copied, unless you include it as a literal sequence into the control stream. Then however, if a file ends with a literal, the end check is missed when it is only done upon matches. Means: one has to either add another bogus match which makes the depacked data 2 bytes longer, or test on both cases, what bloats and slows down the depacker. Also The old sentinel could be used for that.

2016-03-09 12:59

ChristopherJam

Registered: Aug 2004
Posts: 1408

Well if the source data is not in place but at some other location in mem, then you're not doing in place decompression :P

2016-03-09 13:02

Bitbreaker

Registered: Oct 2002
Posts: 504

Sure, but one might want to use the same decruncher code for both purposes, for e.g. in a demo ;-)

2016-03-09 13:16

ChristopherJam

Registered: Aug 2004
Posts: 1408

Fair point!

2016-03-09 13:19

Bitbreaker

Registered: Oct 2002
Posts: 504

For now, i decided to add the check back in, that ends decrunching after a 257 byte match is decoded. This decreases speed slightly, but i have made other optimizations to compensate for more than that.

2024-09-16 17:33

Martin Piper

Registered: Nov 2007
Posts: 718

LZMPi with -cu option:
https://github.com/martinpiper/C64Public/tree/master/bin

bytes
1 3,143
2 2,372
3 1,912
4 3,524
5 19,036
6 8,147
7 8,809
8 2,912
9 5,207

2024-09-17 16:14

ChristopherJam

Registered: Aug 2004
Posts: 1408

Oh, thanks for posting those.

How many cycles to decrunch? (or, speed in bytes per second produced or consumed, I can work out the rest from there)

2024-09-18 03:33

Martin Piper

Registered: Nov 2007
Posts: 718

| file | cycles |
| 1.bin | 420115 |
| 2.bin | 264355 |
| 3.bin | 199139 |
| 4.bin | 311702 |
| 5.bin | 1930613 |
| 6.bin | 1063796 |
| 7.bin | 954401 |
| 8.bin | 288310 |
| 9.bin | 515944 |

I've been porting this to the Amiga for a self extracting executable, so got around to testing the cycle times for this data: https://github.com/martinpiper/C64Public/blob/master/Decompress..

Refresh

Subscribe to this thread: