Kabuto Account closed
Registered: Sep 2004 Posts: 58
Theoretically optimal 1541 GCR encoding scheme
I did some research on GCR encoding and ended up with a scheme that stores 7 data bits in every 8 bits written to disk, without violating the GCR constraints (no 3 "0" bits in a row and no 10 "1" bits in a row).
This is just barely below the theoretical limit.
And interestingly, decoding is very fast using this scheme - but encoding is more difficult.
This scheme uses arithmetic coding similar to what arithmetic packers use, just the other way round (storing constant-entropy 7-bit values in variable-entropy GCR bytes).
Here's a simple JavaScript implementation to show the idea:
http://kabuto.untergrund.net/files/c64stuff/gcrTest.js
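As a quick sanity check of that claim (a little sketch on the side, not part of gcrTest.js): counting the bit strings that merely avoid three 0s in a row bounds what any scheme can achieve, and 7/8 sits just below it.

// Count n-bit strings with no three 0s in a row (the limit on 1-runs is
// ignored here, so this is a slightly optimistic bound). A valid string
// ends in "1", "10" or "100", hence a(n) = a(n-1) + a(n-2) + a(n-3).
const a = [1, 2, 4]; // counts for lengths 0, 1, 2
for (let n = 3; n <= 40; n++) a[n] = a[n - 1] + a[n - 2] + a[n - 3];
console.log(Math.log2(a[40] / a[39])); // ~0.8791 bits per raw bit at best
console.log(7 / 8);                    // 0.875 - this scheme's rate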
EDIT: here's a sketched decoder loop (still lacking the drive sync stuff, though) that reads 8 raw bytes and decodes them to 7 data bytes.
; initially A contains the arithmetic decoder state (up to 7 bits) and C may contain a carry to be added to it
loop:
LDX databyte
ADC ari_lo, X ; these table entries contain 7 bits each, too - erroneous entries contain $FF and thus may set C
ROL ; since we used bits 0..6 so far, bit 7 contains the carry - shift it into the C flag
; (and shift the C flag, which should be set upon errors due to adding $FF (TODO: change offsets to ensure that's actually the case), into the now-unused bit 0 of tmp)
STA tmp ; don't store the first decoded 7 bits but use them for filling the next 7-bit values
LDA ari_hi, X ; fetch the upper 7 arith bits
; repeat the following block 7 times (i = 0..6)
LDX databyte
ADC ari_lo, X ; same here - add the 7 lower bits of the table entry for this GCR byte
ROL tmp ; get a data bit and store the error bit
; (in the last iteration tmp contains error flags from each of the 8 steps, so we could BNE to an error handler)
ROL ; combine the 7 decoded data bits with the additional bit, also pushing the carry from bit 7 into flag C
STA result+i, Y ; and store the decoded byte
LDA ari_hi, X ; fetch the 7 upper bits of the table entry
; end repeat
... 16 posts hidden ...
chatGPZ
Registered: Dec 2001 Posts: 11386
where did you get the impression this is somehow related to "GCR chip" (whatever that is)?
Kisiel Account closed
Registered: Jul 2003 Posts: 56
The GCR conversion is done inside the chip for the 1541 header. In other words, the 1541 Ultimate does not emulate the "header". Sorry for the shortcut - I thought this topic was about the real implementation, not an "academic" one.
chatGPZ
Registered: Dec 2001 Posts: 11386
WAT
it's about writing software, dude.
Kisiel Account closed
Registered: Jul 2003 Posts: 56
shame on me :)
doynax Account closed
Registered: Oct 2004 Posts: 212
I told myself I'd work out how this scheme works but I think it's time to admit defeat. For the record it seems to work in practice, though admittedly I haven't yet managed to dump a G64 image.
Quoting Kabuto: "This scheme uses arithmetic coding similar to what arithmetic packers use, just the other way round (storing constant-entropy 7-bit values in variable-entropy GCR bytes)."
You say that as if the details ought to naturally follow from the realization of (ab)using arithmetic coding in reverse, but there seems to be some deep voodoo going on here. Hell, the 8-bit repacking code alone is remarkable as far as I'm concerned.
My first impulse would be to use a regular range coder. A standard implementation should certainly work but would be dog-slow. With care, multiplication and dynamic renormalization can apparently be avoided [1], albeit at the cost of huge tables (which even carry overflows, which would seem to defeat the purpose of the fixed-rate refinement).
A second look at your code reveals that you're not actually doing traditional arithmetic coding at all, but rather some backwards LIFO variant. Almost everything on the subject seems to be hiding behind IEEE/ACM paywalls, but [2] contains a quick recap of enumerative coding in this context.
Apparently enumerative coding is the fancy name for generating all valid "GCR" strings in order and interpreting the binary data to be encoded as an index into this set. The clever bit is that for simple run-length-limited codes (e.g. at most two zeroes in a row in our case), the number of possible suffix codes following a one-bit depends only on the number of bits remaining to be filled. Therefore all strings above a given sequence may be counted simply by adding up a fixed weight per bit position for every zero encountered.
In our case the weight table is the tribonacci series, for which the ratio between terms tends to ~1.83929. Rounding down to 7 bits/byte and applying some fixed-point arithmetic, the approximate implementation eventually reduces to your code with an alternate table, and it appears to work well enough. Unfortunately there just isn't enough redundancy to avoid long strings of ones, even with your brute-force encoder.
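For concreteness, here is roughly what that counting argument looks like in code - my own reconstruction for the zeroes-only constraint, nothing like your actual table: sort the valid strings with a one-bit ranking before a zero-bit, and every zero then adds a fixed tribonacci weight to the index.

// W[m] = number of valid suffixes of length m after a one-bit: tribonacci
const n = 8;
const W = [1, 2, 4];
for (let m = 3; m <= n; m++) W[m] = W[m - 1] + W[m - 2] + W[m - 3];

// index -> string: emit a zero whenever all strings with a one here rank lower
function decode(index) {
    let s = "";
    for (let p = n - 1; p >= 0; p--)
        if (index >= W[p]) { s += "0"; index -= W[p]; } else s += "1";
    return s;
}

// string -> index: sum the weight of every zero bit
function encode(s) {
    let index = 0;
    for (let p = 0; p < n; p++) if (s[p] === "0") index += W[n - 1 - p];
    return index;
}

W[8] = 149 valid bytes leave a little slack beyond the 128 needed for 7 data bits, but note that decode(0) = "11111111" - the run-limit on ones is simply ignored here, which is exactly where the missing redundancy bites.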
Of course your actual table looks nothing like this; for one thing it is ordered by the number of trailing zeroes/ones.
Any chance you might enlighten us as to how this thing actually works?
1. "A general fixed-rate arithmetic coding method for constrained channels" (Todd, Langdon & Martin)
2. "Effects of floating-point arithmetic in enumerative coding" (Immink & Jansen) |
Krill
Registered: Apr 2002 Posts: 2980
Quoting doynax: "Any chance you might enlighten us as to how this thing actually works?"
But seriously, I'm interested to know, too. Honestly haven't spent much time and thought on the topic yet, but if even a coder of your calibre fails to wrap his head around it... *returns to thesis at hand*
Kabuto Account closed
Registered: Sep 2004 Posts: 58
I'll try to explain how this works later; first I need to get some demo code done.
Quote: "Unfortunately there just isn't enough redundancy to avoid long strings of ones even with your brute-force encoder."
There is enough redundancy for that. This encoding scheme never generates more than 8 "1"s in a row. (The technical limit would be 9 "1"s in a row, but permitting just 8 still leaves a tiny bit of redundancy and simplifies the scheme - with 9 ones, if you have a "11111111" byte, you'd have to look at the byte before it to determine whether you may write another 1. With 8 ones that problem doesn't exist.)
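To put a rough number on that (a quick side calculation, not the actual encoder): counting strings with both run limits in place still gives a capacity a hair above 7/8.

// Count bit strings whose 0-runs are at most maxZ and 1-runs at most maxO;
// zeros[r] / ones[r] = strings ending in a run of exactly r zeroes / ones.
function capacity(maxZ, maxO, n) {
    let zeros = Array(maxZ + 1).fill(0), ones = Array(maxO + 1).fill(0);
    zeros[1] = ones[1] = 1;                    // the strings "0" and "1"
    let prev = 0, total = 2;
    for (let len = 2; len <= n; len++) {
        const sumZ = zeros.reduce((x, y) => x + y, 0);
        const sumO = ones.reduce((x, y) => x + y, 0);
        const nz = Array(maxZ + 1).fill(0), no = Array(maxO + 1).fill(0);
        for (let r = 1; r < maxZ; r++) nz[r + 1] = zeros[r]; // extend a 0-run
        for (let r = 1; r < maxO; r++) no[r + 1] = ones[r];  // extend a 1-run
        nz[1] = sumO;                          // append a 0 after a 1-run
        no[1] = sumZ;                          // append a 1 after a 0-run
        zeros = nz; ones = no;
        prev = total;
        total = nz.reduce((x, y) => x + y, 0) + no.reduce((x, y) => x + y, 0);
    }
    return Math.log2(total / prev);            // bits per raw bit
}
console.log(capacity(2, 8, 50)); // ~0.876 - still (just) above 7/8 = 0.875
console.log(capacity(2, 9, 50)); // ~0.877 with 9 ones allowed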
Kabuto Account closed
Registered: Sep 2004 Posts: 58
I finally wrote down an explanation, hope it'll be halfway understandable :)
http://kabuto.untergrund.net/files/c64stuff/gcr.txt |
Flavioweb
Registered: Nov 2011 Posts: 463
Quoting gcr.txt: "First of all I wrote down all the bytes one could write to disk without violating the rules above, or rather their bit patterns:"
00100100
00100101
00100110
00100111
00101001
00100101
If we write "00100100" followed by "00100110" on disk in sequence, or other values whose junction forms "0-00" or "00-0", are we violating the rules or not?
Kabuto Account closed
Registered: Sep 2004 Posts: 58
Quote: "If we write "00100100" followed by "00100110" on disk in sequence, or other values whose junction forms "0-00" or "00-0", are we violating the rules or not?"
Yeah, we would violate the rules if we did.
Should have mentioned that I meant isolated bytes, i.e. without considering neighbouring bytes.
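For instance (a quick illustration, not from gcr.txt itself):

// Two individually valid bytes whose seam still forms a forbidden run:
const a = "00100100", b = "00100110";
console.log(a.includes("000"), b.includes("000")); // false false - each byte is fine
console.log((a + b).includes("000"));              // true - "...00" + "00..." makes "0000"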