[CSDb] - User Forums - Release id #214940 : TSCrunch

CSDb User Forums

2022-02-28 10:11

Krill

Registered: Apr 2002
Posts: 3098

Release id #214940 : TSCrunch

Quoting tonysavon

"TSCrunch is an optimal, byte-aligned, LZ+RLE hybrid encoder, designed to maximize decoding speed on NMOS 6502 and derived CPUs, while achieving decent compression ratio (for a bytecruncher, that is). It crunches as well as other popular bytecrunchers, while being considerably faster at decrunching."
[...]
TSCrunch is a bytepacker, with 2-byte RLE, 1-2 byte LZ tokens and a 512-byte search window. In this "space" it provides the optimal solution to the puzzle. Exomizer is a different beast, being a bit-cruncher.

According to these specs, i'd expect it to fall somewhere into the 60% cluster in the graph below, from https://codebase64.org/doku.php?id=base:compression_benchmarks.

If it is made for in-memory decompression of data, can it also work with prefix data for better compression, i.e., back-referencing to data already in memory (either a static dictionary or, e.g., the previous level)?

And i can't quite follow what "bit-cruncher" vs "byte-cruncher" means. :)

IIRC, Exomizer works on byte-aligned source data as well, also with LZ and RLE.
It produces a bit-stream on the control symbols, though, which is interleaved with byte-aligned literals.

A "bit-cruncher" might maybe be something LZMA- or ANS-like (think ALZ64, or Shrinkler on Amiga), but not Exomizer.
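
To make the "byte-aligned" idea concrete, here is a minimal sketch of a decoder in which every token and every operand is a whole byte, so the decoder never has to shift bits out of a stream. The token layout is invented purely for illustration (literal run, 2-byte RLE, 2-byte LZ); it is NOT TSCrunch's or Exomizer's actual format.

```python
def decode(stream: bytes) -> bytes:
    """Decode a toy byte-aligned LZ+RLE stream (hypothetical format).

    Token byte layout (an assumption for this sketch only):
      0x00        end of stream
      0x01..0x7F  literal run: copy the next `token` bytes verbatim
      0x80..0xBF  RLE: repeat the next byte (token & 0x3F) + 2 times
      0xC0..0xFF  LZ: copy (token & 0x3F) + 2 bytes from `offset` bytes
                  back in the output, where offset = next byte + 1
    """
    out = bytearray()
    i = 0
    while i < len(stream):
        token = stream[i]
        i += 1
        if token == 0:
            break
        if token < 0x80:
            # Literal run: whole bytes copied straight through.
            out += stream[i:i + token]
            i += token
        elif token < 0xC0:
            # RLE: one token byte plus one value byte.
            out += bytes([stream[i]]) * ((token & 0x3F) + 2)
            i += 1
        else:
            # LZ: one token byte plus one offset byte.
            length = (token & 0x3F) + 2
            offset = stream[i] + 1
            i += 1
            for _ in range(length):
                out.append(out[-offset])  # byte-wise, so overlaps work
    return bytes(out)


packed = bytes([3, ord("a"), ord("b"), ord("c"),  # literal "abc"
                0x82, ord("x"),                   # RLE: "xxxx"
                0xC1, 6,                          # LZ: 3 bytes from 7 back
                0])                               # end marker
print(decode(packed))
```

A decoder for a format whose control symbols form a bit-stream would additionally need a bit buffer and shift/refill logic for each control symbol, which is where the extra decoding cost on a 6502 comes from.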

[... 31 posts hidden ...]

2022-03-09 20:30

Bitbreaker

Registered: Oct 2002
Posts: 510

It shows again that the loader is usually the bottleneck: at higher CPU loads it steadily has data available to transfer, as the floppy can keep up. The bottleneck also becomes obvious in that TS should perform way better, judging by the plain depack speed figures. Observed the same when comparing zx0 with bitnax: it yields better speed and ratio, but the figures for loadcomp do not increase as much as those for plain decomp. That is why i decided to go for ratio instead of a speed gain, as you pay for that with lots of blocks, and precious disk space is also a thing to keep an eye on with demos. zx0 gives 121 free extra blocks when applied to c=bit'18.

2022-03-09 22:09

Sparta

Registered: Feb 2017
Posts: 52

Great cruncher, @Tony, and promising numbers indeed, @Krill. :) Are these from real HW or VICE? Can you please share the total number of frames required to load the whole benchmark for each cruncher (100% CPU availability would do it)? Thanks!

2022-03-09 22:29

Krill

Registered: Apr 2002
Posts: 3098

Quoting Sparta

Are these from real HW or VICE? Can you please share the total number of frames required to load the whole benchmark for each cruncher (100% CPU availability would do it)? Thanks!

VICE, i just needed some numbers quickly. =)

Equivalent figures with number of video frames:

CPU% ZX0  TS   WIN
100  03a4 03b8 ZX0
 90  03fd 03d7 TS
 80  0512 0523 ZX0
 70  05d3 05ba TS
 60  068b 0664 TS
 50  07b3 07bc ZX0
 40  0905 0850 TS
 30  0c14 0a8f TS
 20  11da 0f12 TS
 10  2702 202b TS

Formula to get throughput as above is 185526 * 50 / numframes = X B/s.
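
As a quick sketch, the formula can be applied to the hexadecimal frame counts in the table like this. The constant names are my own, and reading 185526 as the benchmark's size in bytes is an assumption based on the B/s unit in the formula.

```python
# Presumably the benchmark size in bytes (assumption; taken from the formula).
BENCHMARK_BYTES = 185526
# Video frames per second, as used in the formula.
FRAMES_PER_SECOND = 50

# CPU% -> (ZX0 frames, TS frames), hexadecimal, copied from the table.
FRAMES = {
    100: ("03a4", "03b8"),
    90: ("03fd", "03d7"),
    80: ("0512", "0523"),
    70: ("05d3", "05ba"),
    60: ("068b", "0664"),
    50: ("07b3", "07bc"),
    40: ("0905", "0850"),
    30: ("0c14", "0a8f"),
    20: ("11da", "0f12"),
    10: ("2702", "202b"),
}


def throughput(frames_hex: str) -> float:
    """Convert a hexadecimal frame count to bytes per second."""
    return BENCHMARK_BYTES * FRAMES_PER_SECOND / int(frames_hex, 16)


for cpu, (zx0, ts) in FRAMES.items():
    # Fewer frames means higher throughput, so the lower count wins.
    winner = "ZX0" if int(zx0, 16) < int(ts, 16) else "TS"
    print(f"{cpu:3d}  {throughput(zx0):8.0f}  {throughput(ts):8.0f}  {winner}")
```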

2022-03-09 22:45

Sparta

Registered: Feb 2017
Posts: 52

Thanks!

2022-03-09 23:38

Krill

Registered: Apr 2002
Posts: 3098

Some more WIP results. :)

Spindle-Code:
CPU% ZX0  TS   WIN
100  7380 7002 ZX0
 90  6580 6770 TS
 80  5120 5425 TS
 70  4668 4722 TS
 60  4117 4539 TS
 50  3378 3562 TS
 40  2782 3225 TS
 30  2056 2564 TS
 20  1321 1756 TS
 10   603  817 TS

Spindle-Graphics:
CPU% ZX0  TS   WIN
100  9461 8618 ZX0
 90  8450 8618 TS
 80  6520 6173 ZX0
 70  5861 5440 ZX0
 60  5259 5259 TIE
 50  4288 4396 TS
 40  3524 3868 TS
 30  2552 3033 TS
 20  1687 2019 TS
 10   778  952 TS

Remarkably, TSCrunch wins practically all real-world scenarios with the former benchmark, and most (again) in the latter.

2022-03-09 23:51

Burglar

Registered: Dec 2004
Posts: 1137

should've been called tonycrunch ;)

2022-03-10 08:38

tonysavon

Registered: Apr 2014
Posts: 27

Quote: Some more WIP results. :)

Spindle-Code:
CPU% ZX0  TS   WIN
100  7380 7002 ZX0
 90  6580 6770 TS
 80  5120 5425 TS
 70  4668 4722 TS
 60  4117 4539 TS
 50  3378 3562 TS
 40  2782 3225 TS
 30  2056 2564 TS
 20  1321 1756 TS
 10   603  817 TS

Spindle-Graphics:
CPU% ZX0  TS   WIN
100  9461 8618 ZX0
 90  8450 8618 TS
 80  6520 6173 ZX0
 70  5861 5440 ZX0
 60  5259 5259 TIE
 50  4288 4396 TS
 40  3524 3868 TS
 30  2552 3033 TS
 20  1687 2019 TS
 10   778  952 TS

Remarkably, TSCrunch wins practically all real-world scenarios with the former benchmark, and most (again) in the latter.

Thanks for putting some time into these tests, krill. Happy to see TSCrunch being an option for fast decrunching in productions where speed is pivotal. I guess you can read these results also the other way around, like if you need a lot of CPU time for your parts but you still want to be able to perform some loading in the background, guaranteeing a certain throughput, then this is a viable option. Animations spring to mind. Of course if you want to amass as much data as possible on a floppy side, zx0 remains the goat. Such a neat cruncher it is!

2022-03-10 09:14

Krill

Registered: Apr 2002
Posts: 3098

Quoting tonysavon

if you need a lot of CPU time for your parts but you still want to be able to perform some loading in the background, guaranteeing a certain throughput, then this is a viable option. Animations spring to mind.

Still not entirely convinced that media data should be packed with general-purpose non-lossy compression.
If anything, this would be some kind of first stage of the decoder, which would then perform more data shuffling until output.
Seems like this kind of thing should be integrated in a custom codec, which would also use the loader's pollblock and getblock calls when loading.

2022-03-10 09:57

tonysavon

Registered: Apr 2014
Posts: 27

Quote: Quoting tonysavon
if you need a lot of CPU time for your parts but you still want to be able to perform some loading in the background, guaranteeing a certain throughput, then this is a viable option. Animations spring to mind.
Still not entirely convinced that media data should be packed with general-purpose non-lossy compression.
If anything, this would be some kind of first stage of the decoder, which would then perform more data shuffling until output.
Seems like this kind of thing should be integrated in a custom codec, which would also use the loader's pollblock and getblock calls when loading.

Well, sometimes you must do both. Speaking only for my parts and my games here, but I usually do Video animations with Vector Quantisation plus delta coding. So you basically end up with a codebook (typically a charset), and this is the lossy part, and deltas for the position of the char data on screen. Now, that, in itself, is a bitstream that can be crunched but must be crunched losslessly. That data you might want to stream from disk, but you have to be really fast, and this is where a fast decruncher helps, especially if you must guarantee a certain throughput to always have the next delta-encoded video frame in ram when the frame counter ticks.
Same for audio, really, I usually do vector quantisation on a fixed audio frame: You keep the codebooks in ram, stream the payload from disk or from wherever you want to do it and that's it. Maybe you refresh the codebook every now and then for very long samples. If you choose an 8-window for audio, then the sample is compressed ~8x PLUS whatever the Cruncher gives you for the payload on top of it. And again, there's not much rastertime left if you are decoding the lossy part and playing the digi, so the faster the loader+decruncher, the higher you can go for playing frequency.
So you are right, one uses a two-stage decoder but the second stage must be lossless, fast, and (now) integrated with your loader :-).
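
To illustrate where the "~8x" for audio comes from, here is a minimal sketch of fixed-frame vector quantisation. It assumes one-byte samples, a window of 8 samples, and a codebook of at most 256 entries so that each frame becomes a single index byte; none of these details are stated in the post, and the function names are made up for this example.

```python
def vq_encode(samples, codebook, window=8):
    """Replace each `window`-sample frame with the index of its nearest codeword."""
    indices = bytearray()
    for start in range(0, len(samples) - window + 1, window):
        frame = samples[start:start + window]
        # Nearest codeword by squared error (the lossy step).
        best = min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(frame, codebook[k])),
        )
        indices.append(best)
    return bytes(indices)


def vq_decode(indices, codebook):
    """Expand index bytes back into samples using the in-RAM codebook."""
    out = []
    for k in indices:
        out.extend(codebook[k])
    return out


# Tiny hypothetical codebook: "silence" and "full level".
codebook = [[0] * 8, [15] * 8]
samples = [1] * 8 + [14] * 8
payload = vq_encode(samples, codebook)
print(len(samples) / len(payload))  # samples per payload byte
```

The index payload is the part that would be streamed from disk and could additionally be run through a lossless cruncher, while the codebook stays resident in RAM.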

2022-03-10 10:52

Krill

Registered: Apr 2002
Posts: 3098

Quoting tonysavon

[...] and this is where a fast decruncher helps, especially if you must guarantee a certain throughput to always have the next delta-encoded video frame in ram when the frame counter ticks.
Same for audio [...]

I see that part of the codec may well be non-lossy. :) Just meant that a tightly-coupled integration of decompression and further decoding seems more performant than two separate stages.

But that receive-input-in-time guarantee isn't required in a real-time streaming setup if the codec is aware of possibly delayed input.
The decoder could skip over stale frames (audio or video) already before the decompression stage, if that's deemed better than simply stalling.
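
A rough sketch of that skipping idea, assuming each still-compressed frame is queued together with the tick at which it is due. The queue layout and the function name are hypothetical, not part of any actual loader or codec.

```python
from collections import deque


def next_frame_to_decode(queue, now):
    """Drop frames whose due tick has already passed, before decompressing them.

    `queue` holds (due_tick, compressed_bytes) tuples in playback order.
    Returns (frame_or_None, number_of_frames_skipped).
    """
    skipped = 0
    while queue and queue[0][0] < now:
        queue.popleft()  # stale: never spend decompression time on it
        skipped += 1
    frame = queue.popleft() if queue else None
    return frame, skipped


pending = deque([(1, b"frame1"), (2, b"frame2"), (5, b"frame5")])
frame, skipped = next_frame_to_decode(pending, now=4)
print(frame, skipped)
```

Skipping before the decompression stage only works if every frame can be decoded independently of the skipped ones, so delta-coded frames would need periodic key frames or some other resynchronisation point.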
