Forums > C64 Coding > Theoretically optimal 1541 GCR encoding scheme
2012-11-14 00:28
Kabuto
Account closed

Registered: Sep 2004
Posts: 58
Theoretically optimal 1541 GCR encoding scheme

I did some research on GCR encoding and ended up getting 7 data bits stored into every 8 bits on disk, without violating the GCR constraints (no 3 "0" bits in a row and no 10 "1" bits in a row).

This is only barely below the theoretical limit.

And interestingly, decoding is very fast using this scheme - but encoding is more difficult.

This scheme uses arithmetic coding similar to what arithmetic packers use, just the other way round (storing constant-entropy 7-bit values in variable-entropy GCR bytes).


Here's a simple javascript implementation to show the idea:

http://kabuto.untergrund.net/files/c64stuff/gcrTest.js


EDIT: here's a sketched decoder loop (still lacking the drive sync stuff) that reads and decodes 8 raw bytes into 7 data bytes.
	; initially A contains the arithmetical decoder state (up to 7 bits) and C might contain a carry to be added to it
loop:
	LDX databyte
	ADC ari_lo, X  ; these table entries contain 7 bits each, too - erroneous entries contain $FF and thus might set C
	ROL            ; since we used bits 0..6 so far bit 7 contains the carry. Shift it into the C flag.
                       ; (And shift the C flag, which is hopefully set upon errors due to adding $FF (TODO change offsets to ensure that's actually the case), into the now-unused bit 0 of tmp.)
	STA tmp        ; Don't store the first decoded 7 bits but use them for filling the next 7-bit values
	LDA ari_hi, X  ; Fetch upper 7 arith bits

	; repeat the following block 7 times

	LDX databyte
	ADC ari_lo, X  ; Same here - add 7 lower bits of table entry for GCR byte
	ROL tmp	       ; Get a data bit and store error bit
	; in the last iteration tmp contains error flags from each of the 8 steps so we could thus BNE to an error handler
	ROL             ; Combine the 7 decoded data bits with the additional bit, also push the carry from bit 7 to flag C
	STA result+i, Y ; and store decoded byte
	LDA ari_hi, X   ; Get 7 upper bits of table entry
	
	; end repeat
2012-11-14 02:23
Krill

Registered: Apr 2002
Posts: 2851
Impressive.

Is that initial decoder state (let me call it seed) required for each encoded 8-byte chunk, or for a whole block?
If the former, then the actual ratio is 7/9, which is slightly worse than GCR's 4/5. If the latter, then I guess those additional 7 bits to determine a whole block can very well be optimized away.

What is the actual theoretical limit, as in entropy of a bitstream under the 3-zeros/10-ones condition? I had thought about computing it but never found a computationally feasible method, the naive approach being to count all valid bit combinations in a chunk of N bytes and to log2 that count.
2012-11-14 07:24
Oswald

Registered: Apr 2002
Posts: 5022
I've no idea how that works, but impressively short indeed. Trick question: does it arrange the result so that the data can be sent 2-bit style right away to the C64? :)
2012-11-14 09:10
Krill

Registered: Apr 2002
Posts: 2851
Reordering and flipping the bits in a data byte does not change the coding scheme, so yeah, perfectly possible.
2012-11-14 21:47
doynax
Account closed

Registered: Oct 2004
Posts: 212
Quoting Krill
What is the actual theoretical limit, as in entropy of a bitstream under the 3-zeros/10-ones condition? I had thought about computing it but never found a computationally feasible method, the naive approach being to count all valid bit combinations in a chunk of N bytes and to log2 that count.
Interesting problem. I believe it's about 7.01948 bits/byte.

Imagine recursively producing all legal codes via the naive approach. You'd always be in one out of eleven coder states deciding what the options for the next state (and bit) are. Except here we work backwards from the end of the track and memoize the counts for each possible state at the subsequent bit.

E.g.:
future[1..N] = 1;
for(bits = 1 to M) {
	present[O1] = future[O2] + future[Z1];
	present[O2] = future[O3] + future[Z1];
	present[O3] = future[O4] + future[Z1];
	present[O4] = future[O5] + future[Z1];
	present[O5] = future[O6] + future[Z1];
	present[O6] = future[O7] + future[Z1];
	present[O7] = future[O8] + future[Z1];
	present[O8] = future[O9] + future[Z1];
	present[O9] = future[Z1];
	present[Z1] = future[O1] + future[Z2];
	present[Z2] = future[O1];
	future = present;
}
result = log2(present[O1]) / bits;
At least I think that should do it.
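
For concreteness, here is the same recurrence as plain runnable JavaScript (the state names are mapped onto array indices, and the counts are renormalized each step so they don't overflow - just a sketch):

var STATES = 11, M = 100000;	// O1..O9 -> indices 0..8, Z1 -> 9, Z2 -> 10
var future = [], present, logSum = 0;
for (var s = 0; s < STATES; s++) future[s] = 1;
for (var bit = 0; bit < M; bit++) {
	present = [];
	for (var k = 0; k < 9; k++)	// after k+1 ones: another one (if fewer than 9 so far) or a zero
		present[k] = (k < 8 ? future[k + 1] : 0) + future[9];
	present[9] = future[0] + future[10];	// after one zero: a one or a second zero
	present[10] = future[0];	// after two zeroes: a one is forced
	var scale = present[0];	// renormalize and accumulate the growth factor
	logSum += Math.log2(scale);
	for (var j = 0; j < STATES; j++) present[j] /= scale;
	future = present;
}
console.log(8 * logSum / M);	// prints roughly 7.0195 bits per raw byte, matching the figure above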

Anyone care to show me how to solve it properly? That is analytically, for an infinite bitstream.
2012-11-14 22:49
Kabuto
Account closed

Registered: Sep 2004
Posts: 58
@doynax: that's what I did as well, just the other way round, but results are the same.

I don't think this is solvable analytically. To make an equation out of it, multiply the left-hand side of each of those assignments by a variable f (to express that the values grow by a constant factor with every step taken). Also you should fix one of the values to 1 (to avoid 0 for everything but f being a valid solution). Example for Maxima: algsys([O1*f = O2 + Z1,O2*f = O3 + Z1,O3*f = O4 + Z1,O4*f = O5 + Z1,O5*f = O6 + Z1,O6*f = O7 + Z1,O7*f = O8 + Z1,O8*f = O9 + Z1,O9*f = Z1,Z1*f = O1 + Z2,Z2*f = O1,O1=1],[f,O1,O2,O3,O4,O5,O6,O7,O8,O9,Z1,Z2]);

In fact I didn't use sequences of 9 '1' bits in a row for technical reasons (because when you get a GCR byte of value $FF you won't be able to determine whether you can add another '1' bit without looking at the byte before as well) so I get just 7.0078 bits per byte.


@Krill: The "initial decoder state" is required for each sequence of GCR-coded values which can have any length you want. You can think about it as follows: each GCR-coded byte contributes its information content to 2 adjacent raw 7-bit values each. So e.g. with 10 GCR bytes you get 9 raw 7-bit values and one incomplete value at start and end each.
2012-11-24 15:06
Urban Space Cowboy

Registered: Nov 2004
Posts: 45
Kabuto, could you please edit your first post so the code part isn't a hundred columns wide and causing the entire page to have to be scrolled horizontally?
2012-11-24 18:11
Frantic

Registered: Mar 2003
Posts: 1627
I believe you can't edit any posts on CSDb except the ones which are last in a thread.
2013-01-30 17:50
Kisiel
Account closed

Registered: Jul 2003
Posts: 56
Yes, I can confirm it works. I tested this idea around 1995 with my 1541-II drive (with burst). In 2013 there is no room for new hardware ideas... Ultimate rules ;)
2013-01-30 18:07
chatGPZ

Registered: Dec 2001
Posts: 11130
isn't this a pure software issue (which applies to the 1541u just as well)?
2013-01-30 19:02
Kisiel
Account closed

Registered: Jul 2003
Posts: 56
IMO the 1541U has no GCR chip emulation; it just has a table of GCR values and streams the correct data to the software with the correct timing (e.g. 26 cycles).
If the Ultimate emulated a GCR chip, it would be possible to copy protected disks with a nibbler. I cannot check this, I don't have a 1541U.
My idea was to pack as many bytes as possible by changing the GCR conversion table (as above, by skipping "wrong" bit combinations). The idea is correct, but these days... you can simply ask Gideon to drop the GCR table altogether; it would improve conversion time, loading time or the "size" of a track.
NO GCR ANY MORE !!! ;)
2013-01-30 19:26
chatGPZ

Registered: Dec 2001
Posts: 11130
where did you get the impression this is somehow related to a "GCR chip" (whatever that is)?
2013-01-30 20:35
Kisiel
Account closed

Registered: Jul 2003
Posts: 56
GCR conversion is done inside a chip in the 1541 "header". In other words, the 1541 Ultimate does not emulate the "header". Sorry for the shortcut, I thought this topic was not about an "academic" implementation but the real one.
2013-01-30 20:53
chatGPZ

Registered: Dec 2001
Posts: 11130
WAT

it's about writing software, dude.
2013-01-30 21:02
Kisiel
Account closed

Registered: Jul 2003
Posts: 56
shame on me :)
2013-07-24 23:32
doynax
Account closed

Registered: Oct 2004
Posts: 212
I told myself I'd work out how this scheme works but I think it's time to admit defeat. For the record it seems to work in practice, though admittedly I haven't yet managed to dump a G64 image.

Quoting Kabuto
This scheme uses arithmetic coding similar to what arithmetic packers use, just the other way round (storing constant-entropy 7-bit values in variable-entropy GCR bytes).
You say that as if the details ought to naturally follow from the realization of (ab-)using arithmetic coding in reverse but there seems to be some deep voodoo going on here. Hell, the 8-bit repacking code alone is remarkable as far as I'm concerned.

My first impulse would be to use a regular range coder. A standard implementation should certainly work but would be dog-slow. With care multiplication and dynamic renormalization may apparently be avoided [1] albeit at the cost of huge tables (they even carry overflows which would seem to defeat the purpose of the fixed-rate refinement.)

A second look at your code reveals that you're not actually doing traditional arithmetic coding at all but rather some backwards LIFO variant. Most everything on the subject seems to be hiding behind IEEE/ACM paywalls but [2] contains a quick recap on enumerative coding in this context.

Apparently enumerative coding is the fancy name for generating all valid "GCR" strings in order and interpreting the binary data to be encoded as an index into this set. The clever bit is that for simple run-length limited codes (e.g. at most two zeroes in a row in our case) the number of possible suffix codes following a one-bit depends only on the bits remaining to be filled. Therefore all possible strings above a certain sequence may be counted simply by adding up a fixed weight per bit position for every zero encountered.

In our case the weight table is the tribonacci series for which the ratio between terms tends to ~1.83929. Rounding down to 7 bits/byte and applying some fixed-point the approximate implementation eventually reduces down to your code with an alternate table and appears to work well enough. Unfortunately there just isn't enough redundancy to avoid long strings of ones even with your brute-force encoder.
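
To make that concrete, here is a toy enumerative coder for just the "at most two zeroes in a row" rule with tribonacci weights (plain JS; only a sketch of the principle, not your actual table or scaling):

var N = 16;	// codeword length in bits
var T = [1, 2, 4];	// T[m] = number of valid m-bit strings (no "000"); T[m]/T[m-1] -> ~1.83929
for (var m = 3; m <= N; m++) T[m] = T[m - 1] + T[m - 2] + T[m - 3];

function encode(index) {	// index in [0, T[N]) -> valid N-bit string
	var bits = "";
	for (var i = 0; i < N; i++) {
		var w = T[N - 1 - i];	// number of valid completions if we put a "1" here
		if (index >= w) { bits += "0"; index -= w; } else bits += "1";
	}
	return bits;
}

function decode(bits) {	// valid N-bit string -> index: just add one weight per "0" encountered
	var index = 0;
	for (var i = 0; i < N; i++)
		if (bits[i] === "0") index += T[N - 1 - i];
	return index;
}
// decode(encode(12345)) === 12345, and encode() never emits three zeroes in a row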

Of course your actual table looks nothing like this; for one thing it is ordered by the number of trailing zeroes/ones.

Any chance you might enlighten us as to how this thing actually works?


1. "A general fixed-rate arithmetic coding method for constrained channels" (Todd, Langdon & Martin)
2. "Effects of floating-point arithmetic in enumerative coding" (Immink & Jansen)
2013-07-25 08:15
Krill

Registered: Apr 2002
Posts: 2851
Quoting doynax
Any chance you might enlighten us as to how this thing actually works?



But seriously, I'm interested to know, too. Honestly haven't spent much time and thought on the topic yet, but if even a coder of your calibre fails to wrap his head around it... *returns to thesis at hand*
2013-07-25 18:00
Kabuto
Account closed

Registered: Sep 2004
Posts: 58
I'll try to explain how this works later; first I need to get some demo code done.

Quote:
Unfortunately there just isn't enough redundancy to avoid long strings of ones even with your brute-force encoder.

There is enough redundancy for that. This encoding scheme never generates more than 8 "1"s in a row (the technical limit would be 9 "1"s in a row, but permitting just 8 ones still leaves a tiny bit of redundancy and it simplifies the scheme - with 9 ones, if you have a "11111111" byte, you'd have to look at the byte before that to determine whether you may write another 1. With 8 ones that problem doesn't exist.)
2016-01-20 18:45
Kabuto
Account closed

Registered: Sep 2004
Posts: 58
I finally wrote down an explanation, hope it'll be halfway understandable :)

http://kabuto.untergrund.net/files/c64stuff/gcr.txt
2016-01-21 19:55
Flavioweb

Registered: Nov 2011
Posts: 447
Quoting gcr.txt
First of all I wrote down all the bytes one could write to disk without violating the rules above, or rather their bit patterns:

00100100
00100101
00100110
00100111
00101001
00100101

If we write to disk in sequence "00100100", "00100110" or other values which "loop" with "0-00" or "00-0", are we violating the rules or not?
2016-01-21 20:02
Kabuto
Account closed

Registered: Sep 2004
Posts: 58
Quote: Quoting gcr.txt
First of all I wrote down all the bytes one could write to disk without violating the rules above, or rather their bit patterns:

00100100
00100101
00100110
00100111
00101001
00100101

If we write to disk in sequence "00100100", "00100110" or other values which "loop" with "0-00" or "00-0", are we violating the rules or not?


Yeah, we would violate the rules if we did.

Should have mentioned that I mean isolated bytes, i.e. without considering neighbour bytes.
2016-01-21 21:55
doynax
Account closed

Registered: Oct 2004
Posts: 212
Thank you! Now I finally "grok" it :)

Somehow I could never quite wrap my head around the fact that the suffix state (the number of trailing zeroes/ones) can simply be adjoined linearly onto the next prefix state, so there is no need for carrying over explicit state. Brilliant!

The claim to being an optimal GCR encoding scheme is certainly false, however, as it is in fact not a grouped code at all. Well, not unless anything with a fixed word length like FM/MFM is to be considered technically "grouped".

Presumably another tricky bit was scaling it such that the width of every code range rounds down and therefore no input ends up being unrepresentable, despite the limited precision and the minimal extra entropy available as slack.

The "brute-force" encoder doesn't appear to be a practical problem for off-line encoding. There may be degenerate inputs, but in practice it always seems to terminate after a shallow search. Much like carries in traditional arithmetic coding, I should imagine.

Incidentally I can confirm that the coding works just fine on hardware, at least on the few drives which I have access to. Admittedly the space and speed savings from dropping standard CBM formatting (at most 40% or so) don't make up for the inconvenience of foregoing standard tools and the general compatibility nightmare, but it is still a lot of fun to tinker with.
2016-01-22 23:14
Kabuto
Account closed

Registered: Sep 2004
Posts: 58
Just rewrote the encoder: it no longer uses recursion and runs much faster. You could even implement it directly on a C64 (though it still might be slightly too large to fit into drive RAM):

http://kabuto.untergrund.net/files/c64stuff/gcrTest2.js

doynax: haha, totally overlooked that this is no longer GCR!
2016-01-23 05:04
Oswald

Registered: Apr 2002
Posts: 5022
how much more data does this allow on a disk?
2016-01-23 07:32
Hoogo

Registered: Jun 2002
Posts: 102
If you keep the current layout of sectors with 320 encoded bytes, then you will get 280 usable bytes instead of 256, about 10% more.
2016-01-23 07:35
ChristopherJam

Registered: Aug 2004
Posts: 1380
(7/8)/(4/5)=1.09375, so about 9%?

If you kept the header blocks the same and just replaced the 320 raw bytes in the data blocks, each block's payload should grow from 256 bytes to 280 bytes.

(edit: @hoogo, snap!)
2016-01-23 13:39
Kabuto
Account closed

Registered: Sep 2004
Posts: 58
You could squeeze 25/23/21/20 sectors into each track depending on the speed zone in use (as opposed to 21/19/18/17 with the original encoding), assuming you could keep overhead small, so you'd get about 18% more disk space. Use tracks 36-40 or higher speed zones than Commodore did and you'd get even more space, but I think there was a reason Commodore chose the speed zones the way they did.
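
(Quick sanity check of that figure, counting 256-byte sectors over the standard zone layout of tracks 1-17/18-24/25-30/31-35 and ignoring per-sector overhead - plain JS, the new per-zone counts are the estimates from above:)

var tracksPerZone = [17, 7, 6, 5];	// standard 35-track layout
var sectorsCBM = [21, 19, 18, 17];	// stock format: 683 sectors in total
var sectorsNew = [25, 23, 21, 20];	// estimated counts with the 7-bits-per-byte scheme
var oldTotal = 0, newTotal = 0;
for (var z = 0; z < 4; z++) {
	oldTotal += tracksPerZone[z] * sectorsCBM[z];
	newTotal += tracksPerZone[z] * sectorsNew[z];
}
console.log(oldTotal, newTotal, newTotal / oldTotal);	// 683, 812, ~1.19 before overhead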

Still, I wonder how useful this will really be. Potential benefits:

* Fewer disk sides for your demo. But OTOH there doesn't seem to be a "maximum number of disk sides" limit anymore. People often run demos from a 1541 Ultimate & Co, so having 3 disk sides instead of 2 doesn't make handling more difficult (like it does when you have to work with actual disks).
* "Flip disk" 15% to 20% less often.
* Load speed will be just a tiny bit faster than with GCR-based loaders. Comparing my proposed loader with LFT's existing loader: both have comparable sizes; the only benefit of my routine is that it loads a sector a bit faster (simply because the sector is shorter on disk) and needs to change tracks less often. But those are small factors, so in real life the advantage will be pretty small.

Compare that with disadvantages:

* .gcr is more difficult to handle than .d64. Both are handled equally well by emulators these days, but transferring a .gcr to a standard disk drive requires a custom-made write/copy program (unless you have a drive with a parallel cable). It's not a big hurdle, but it's something that might be annoying when your favourite copier fails to copy a disk.