| | Kabuto Account closed
Registered: Sep 2004 Posts: 58 |
Theoretically optimal 1541 GCR encoding scheme
I did some research on GCR encoding and ended up with a scheme that stores 7 data bits in every 8 bits written to disk, without violating the GCR constraints (no 3 "0" bits in a row and no 10 "1" bits in a row).
This is just barely below the theoretical limit.
Interestingly, decoding is very fast with this scheme - but encoding is more difficult.
This scheme uses arithmetic coding similar to what arithmetic packers use, just the other way round (storing constant-entropy 7-bit values in variable-entropy GCR bytes).
Here's a simple javascript implementation to show the idea:
http://kabuto.untergrund.net/files/c64stuff/gcrTest.js
EDIT: here's a sketched decoder loop (lacking the drive sync handling) that reads 8 raw bytes and decodes them to 7 data bytes.
; initially A contains the arithmetical decoder state (up to 7 bits) and C might contain a carry to be added to it
loop:
LDX databyte
ADC ari_lo, X ; these table entries contain 7 bits each, too - erroneous entries contain $FF and thus might set C
ROL ; since we used bits 0..6 so far bit 7 contains the carry. Shift it into the C flag.
; (And shift the C flag, which is hopefully set upon errors due to adding $FF (TODO change offsets to ensure that's actually the case), into the now-unused bit 0 of tmp.)
STA tmp ; Don't store the first decoded 7 bits but use them for filling the next 7-bit values
LDA ari_hi, X ; Fetch upper 7 arith bits
; repeat the following block 7 times
LDX databyte
ADC ari_lo, X ; Same here - add 7 lower bits of table entry for GCR byte
ROL tmp ; Get a data bit and store error bit
; in the last iteration tmp contains error flags from each of the 8 steps so we could thus BNE to an error handler
ROL ; Combine the 7 decoded data bits with the additional bit, also push the carry from bit 7 to flag C
STA result+i, Y ; and store decoded byte
LDA ari_hi, X ; Get 7 upper bits of table entry
; end repeat
| | doynax Account closed
Registered: Oct 2004 Posts: 212 |
Thank you! Now I finally "grok" it :)
Somehow I could never quite wrap my head around the fact that a suffix state (the trailing run of zeroes/ones) maps linearly onto the set of admissible prefix states of the next byte, so there is no need to carry over explicit state. Brilliant!
The claim to being an optimal GCR encoding scheme is certainly false, however, as it is in fact not a grouped code at all. Well, not unless anything with a fixed word length like FM/MFM is to be considered technically "grouped".
Presumably another tricky bit was scaling it such that the width of every code range rounds down and therefore no input ends up being unrepresentable, despite the limited precision and the minimal extra entropy available as slack.
The "brute-force" encoder doesn't appear to be a practical problem for off-line encoding. There may be degenerate input but in practice it always seems to terminate after a shallow search. Much like carries in traditional arithmetic coding I should imagine.
Incidentally I can confirm that the coding works just fine on hardware, at least on the few drives which I have access to. Admittedly the space and speed savings from dropping standard CBM formatting (at most 40% or so) don't quite make up for the inconvenience of foregoing standard tools and the general compatibility nightmare, but it is still a lot of fun to tinker with. |
| | Kabuto Account closed
Registered: Sep 2004 Posts: 58 |
Just rewrote the encoder; it no longer uses recursion and runs much faster. You could even implement it directly on a C64 (though it might still be slightly too large to fit into drive RAM):
http://kabuto.untergrund.net/files/c64stuff/gcrTest2.js
doynax: haha, totally overlooked that this is no longer GCR! |
| | Oswald
Registered: Apr 2002 Posts: 5126 |
how much more data does this allow on a disk? |
| | Hoogo
Registered: Jun 2002 Posts: 105 |
If you keep the current layout of sectors with 320 encoded bytes, then you will get 280 usable bytes instead of 256, about 10% more. |
| | ChristopherJam
Registered: Aug 2004 Posts: 1423 |
(7/8)/(4/5)=1.09375, so about 9%?
If you kept the header blocks the same and just replaced the 320 raw on-disk bytes in the data blocks, each block's payload should grow from 256 to 280 bytes.
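A quick hedged sanity check of those numbers (my own snippet, assuming the standard 320-GCR-byte data block):
var gcrBytesPerBlock = 320;
console.log(gcrBytesPerBlock * 4 / 5);   // 256 usable bytes with the standard 4-to-5 GCR
console.log(gcrBytesPerBlock * 7 / 8);   // 280 usable bytes with the 7-in-8 scheme
console.log((7 / 8) / (4 / 5));          // 1.09375, i.e. about 9% more per block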
(edit: @hoogo, snap!) |
| | Kabuto Account closed
Registered: Sep 2004 Posts: 58 |
You could squeeze 25/23/21/20 sectors into each track depending on the speed zone in use (as opposed to 21/19/18/17 with the original encoding), assuming you can keep the overhead small, so you'd get about 18% more disk space. Use tracks 36-40 or higher speed zones than Commodore did and you'd get even more space, but I think there was a reason Commodore chose the speed zones the way they did.
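A hedged back-of-the-envelope check of that figure (my own snippet; it assumes the stock 1541 zone layout of 17/7/6/5 tracks per speed zone and ignores per-track overhead):
var tracksPerZone  = [17, 7, 6, 5];
var stockSectors   = [21, 19, 18, 17];  // standard CBM format
var denserSectors  = [25, 23, 21, 20];  // with the 7-in-8 encoding
function totalSectors(perTrack) {
    return perTrack.reduce(function(sum, s, zone) { return sum + s * tracksPerZone[zone]; }, 0);
}
console.log(totalSectors(stockSectors));                               // 683 sectors (a standard .d64)
console.log(totalSectors(denserSectors));                              // 812 sectors
console.log(totalSectors(denserSectors) / totalSectors(stockSectors)); // ~1.19, i.e. roughly 18-19% more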
After all I wonder how useful this will really be. Potential benefits:
* Fewer disk sides for your demo. But OTOH there doesn't seem to be a "maximum number of disk sides" limit anymore. People often run demos from a 1541 Ultimate & Co., so having 3 disk sides instead of 2 doesn't make handling more difficult (like it does when you have to work with actual disks).
* "Flip disk" less often - 15% to 20% less often.
* Load speed will be just a tiny bit faster than for GCR-based loaders. Comparing my proposed loader with LFT's existing loader: both have comparable sizes; the only benefit of my routine is that it loads a sector faster because the sector is shorter on disk, and it needs to change tracks less often. But those are small factors, so in real life the advantage will be pretty small.
Compare that with disadvantages:
* .gcr is more difficult to handle than .d64. Both are handled equally well by emulators these days, but transferring .gcr to a standard disk drive requires a custom-made write/copy program (unless you have a drive with a parallel cable). It's not a big hurdle, but something that might be annoying when your favourite copier fails to copy a disk. |
| | Krill
Registered: Apr 2002 Posts: 3083 |
Just in case the links to Kabuto's stuff go dead (again), here is the content.
gcrTest.js - The original (slow but short) encoder with roundtrip:
// This is Kabuto's 7-per-8-bit GCR encoding scheme. I hope I don't reinvent the wheel here :-)
// Problem with GCR values on disk is that entropy varies, also dependencies between adjacent bytes play a role.
// This scheme uses arithmetic coding similar to what arithmetic packers use, just the other way round (storing constant-entropy 7-bit values in variable-entropy GCR bytes).
// The decoder is very simple (as you'll see below) and should be capable of realtime decoding when implemented on 1541.
// One disadvantage, though: it's necessary to store one extra GCR byte on disk to avoid arithmetical problems at both ends.
// Also, the encoder is quite complicated, but it can be optimized for sure.
// GCR decode table
var gcrMap = [-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,2096,1142,3238,-1,1813,859,2955,339,2435,1481,3577,-1,-1,705,2801,185,2281,1327,3423,-1,1998,1044,3140,524,2620,1666,3762,-1,-1,-1,-1,-1,-1,-1,-1,-1,1767,813,2909,293,2389,1435,3531,-1,-1,659,2755,139,2235,1281,3377,-1,1952,998,3094,478,2574,1620,3716,-1,-1,-1,-1,55,2151,1197,3293,-1,1868,914,3010,394,2490,1536,3632,-1,-1,760,2856,240,2336,1382,3478,-1,2053,1099,3195,579,2675,1721,3817,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,634,2730,114,2210,1256,3352,-1,1927,973,3069,453,2549,1595,3691,-1,-1,-1,-1,30,2126,1172,3268,-1,1843,889,2985,369,2465,1511,3607,-1,-1,735,2831,215,2311,1357,3453,-1,2028,1074,3170,554,2650,1696,3792,-1,-1,-1,-1,-1,-1,-1,-1,-1,1797,843,2939,323,2419,1465,3561,-1,-1,689,2785,169,2265,1311,3407,-1,1982,1028,3124,508,2604,1650,3746,-1,-1,-1,-1,85,2181,1227,3323,-1,1898,944,3040,424,2520,1566,3662,-1,-1,790,2886,270,2366,1412,3508,-1,2083,1129,3225,609,2705,1751,3847];
// The decoder
function decodeGCR(gcrBytes) {
var result = [];
var gcrBuf = gcrMap[gcrBytes[0]];
for (var i = 1; i < gcrBytes.length; i++) {
gcrBuf = (gcrBuf >> 7) + gcrMap[gcrBytes[i]];
result.push(gcrBuf&0x7F);
}
return result;
}
// GCR verifier: no 3 zeroes or 9 ones in a row. The actual hardware limit is 10 ones in a row, but reducing the test to 9 ones still leaves barely enough entropy and permits verification by looking at each pair of adjacent bytes.
function isGCRValid2(a,b) {
var v = (a<<8)|b;
var v2 = v & (v>>3) & (v>>6);
return (v | (v >> 1) | (v >> 2) | 0xC000) == 0xFFFF && (v2 & (v2>>1) & (v2>>2)) == 0;
}
// The encoder - nasty stuff.
// First of all, prepare a few tables
var reverseGcrValues = [];
var reverseGcrIndices = [];
var tmp = [];
for (var i = 0; i < gcrMap.length; i++) {
if (gcrMap[i] != -1) {
tmp.push([gcrMap[i],i]);
}
}
tmp.sort(function(a,b){ return a[0]-b[0]; });
for (var i = 0; i < tmp.length; i++) {
reverseGcrValues[i] = tmp[i][0];
reverseGcrIndices[i] = tmp[i][1];
}
// Returns the index of the largest offset that's not larger than the given value (binary search)
function findOffsetUpTo(value) {
var lo = 0;
var hi = reverseGcrValues.length;
while (lo <= hi) {
var mid = (lo + hi) >> 1;
var test = reverseGcrValues[mid];
if (test == value) {
return mid;
} else if (test > value) {
hi = mid - 1;
} else {
lo = ++mid;
}
}
return mid-1;
}
// GCR main routine
function encodeGCR(rawBytes) {
// Add an offset on top to avoid running into an un-encodeable value
var arithBuf = 1;
var result = [];
if (!encodeGCRRecur(rawBytes, rawBytes.length-1, result, arithBuf, null)) {
throw new Error("GCR encoding failed - did not find a path through the arithmetics tree");
}
return result;
}
function encodeGCRRecur(rawBytes, pos, result, arithBuf, nextByte) {
arithBuf = (arithBuf<<7) + (pos == -1 ? 0x7F : rawBytes[pos]);
// Look for an offset to encode, starting with the highest one that seems to work
for (var ofsIdx = findOffsetUpTo(arithBuf); ofsIdx >= 0; ofsIdx--) {
var newGcrBuf = arithBuf - reverseGcrValues[ofsIdx];
// Test whether the arith buffer is so large that we'll never catch up when continuing from here
if (newGcrBuf*0x7F > reverseGcrValues[reverseGcrValues.length-1]) {
return false;
}
var curByte = reverseGcrIndices[ofsIdx];
if (nextByte == null || isGCRValid2(curByte, nextByte)) {
result[pos+1] = curByte;
if (pos == -1 || encodeGCRRecur(rawBytes, pos-1, result, newGcrBuf, curByte)) {
return true;
}
}
}
return false;
}
// And now: tests
// As we use 7 bits per byte we start with a text and obfuscate a bit to increase the range of covered byte values
var rawText = "Ja, Junge, Alde, datt ist Kaffee. Echt jetzt. Und Kaffee ist nun mal lecker.";
var rawBytes = [];
for (var i = 0; i < rawText.length; i++) rawBytes[i] = (rawText.charCodeAt(i)+i*(i+1))&0x7F;
// Encode
var result = encodeGCR(rawBytes);
console.log(result);
// Verify that GCR is correct
for (var i = 0; i < result.length-1; i++) {
if (!isGCRValid2(result[i], result[i+1])) {
throw new Error("GCR fail");
}
}
// Decode and then undo the obfuscation
var decoded = decodeGCR(result);
for (var i = 0; i < decoded.length; i++) decoded[i] = (decoded[i]-i*(i+1))&0x7F;
// Print... and hope it'll yield the original text
console.log(decoded);
console.log(String.fromCharCode.apply(String,decoded));
|
| | Krill
Registered: Apr 2002 Posts: 3083 |
The write-up gcr.txt:
This document explains how Kabuto's GCR scheme works. It's work in progress, supposed to be understandable by any experienced coder. Feel free to contribute or get back to me (preferably via CSDB) if you don't understand a section :)
Version 0.1
GCR encoding is needed due to how the drive electronics work.
A bit of prerequisite: the drive electronics write bits to disk with a small coil that magnetises the disk media. To encode bits, it can feed the coil with either positive or negative current, magnetising the media in that direction. However, the engineers chose not to write one direction for zeroes and the other direction for ones; instead, the direction is kept the same whenever a 0 is written and reversed whenever a 1 is written. To give you an example of what the magnetic flux looks like for a given bitstream:
| 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
>> <<<<<<< >>> <<<<<<<<<<< >>>>>>> <<<<<<< >>
When reading the disk again a short pulse is induced in the read/write head whenever it comes across a flux change (but no current is being induced while a constant flux is applied!):
>> <<<<<<< >>> <<<<<<<<<<< >>>>>>> <<<<<<< >>
--v-------^---v-----------^-------v-------^--
In the next step the signal is lowpass filtered to transform that back into a rectangular wave that represents the actual flux on the disk:
--v-------^---v-----------^-------v-------^--
_______ ___________ _______
__/ \___/ \_______/ \__
And then edges are detected:
_______ ___________ _______
__/ \___/ \_______/ \__
| | | | | |
Finally the drive electronics outputs a 1 for every edge it detected and a 0 whenever it had to wait for a certain time after it saw an edge:
| | | | | |
1 0 1 1 0 0 1 0 1 0 1
Internally the drive outputs the first 0 when it waited for 1.5 times the normal distance of 2 consecutive 1s and the second 0 after 2.5 times that, so there's a bit of tolerance for speed changes and jitter (detecting edges slightly too early or late).
As you can see there's no global clock telling the drive when to output bits; it deduces that information directly from the data on the disk. Long sequences of 0's impose a few problems: they don't contain clock info (as there's no change while reading them), so if the disk isn't spinning at exactly the speed it was written at, the chances of detecting an edge too early or too late increase. Also, the circuitry that transforms pulses back into a rectangular wave can only cope with pulses up to a certain length, as otherwise it could mistake noise for a signal (yes, the input signal from the read/write head is very noisy). Thus Commodore said "there must never be 3 or more consecutive 0's".
As the bitstream is very fast (up to slightly more than one bit every 4 CPU cycles) the CPU can't deal with individual bits directly, so the drive electronics accept and deliver whole bytes and handle splitting them into / recombining them from bits internally. But how should it know where a byte starts when reading bit after bit? That's why Commodore reserved sequences of 10 or more 1's as a "sync": whenever the last 10 bits read were 1's, the bit accumulator is reset, so it starts accumulating the next byte at the next 0 it encounters (yes, the first byte following a sync must always start with a 0 bit, as leading 1 bits would be considered part of the sync). Thus any sequence of data bits written to disk must not contain 10 or more consecutive 1's either.
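A hedged toy model of that write/read behaviour (my own snippet, not from the original text): a 1 flips the write current, a 0 keeps it, and reading back reports a 1 exactly where the flux direction changes.
function writeFlux(bits) {
    var dir = 1, flux = [];
    bits.forEach(function(b) { if (b) dir = -dir; flux.push(dir); });
    return flux;
}
function readFlux(flux) {
    var bits = [], prev = 1;   // assumes the reader starts with the same direction as the writer
    flux.forEach(function(dir) { bits.push(dir != prev ? 1 : 0); prev = dir; });
    return bits;
}
var example = [1,0,1,1,0,0,1,0,1,0,1];   // the bit stream from the diagrams above
console.log(readFlux(writeFlux(example)).join("") == example.join(""));   // true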
How can you write arbitrary data to disk using these constraints? This is where GCR encoding comes into play: for every 4 bits of raw data the drive looks up 5 bits to actually write to disk:
Data Written to disk
0000 01010
0001 01011
0010 10010
0011 10011
0100 01110
0101 01111
0110 10110
0111 10111
1000 01001
1001 11001
1010 11010
1011 11011
1100 01101
1101 11101
1110 11110
1111 10101
(Source: Wikipedia)
As you can see, no code contains more than two consecutive 0's. Also, no code has more than one 0 or four 1's at either end, so if you concatenate any 2 you'll never get a sequence of more than two 0's or eight 1's either, thus adhering to the limits explained earlier.
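For illustration, a hedged snippet (mine, not from the write-up; the function name is just illustrative) that encodes one byte with the 4-to-5 table above:
var gcr45 = [0x0A,0x0B,0x12,0x13,0x0E,0x0F,0x16,0x17,0x09,0x19,0x1A,0x1B,0x0D,0x1D,0x1E,0x15];
function gcrEncodeByte(b) {
    return (gcr45[b >> 4] << 5) | gcr45[b & 0x0F];   // 8 data bits become 10 disk bits
}
console.log(gcrEncodeByte(0xFF).toString(2));   // "1010110101": both nibbles map to 10101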
This scheme means that you need to write 5/4 as many bits to disk as your original data contains. But it's not optimal either, so I wondered what could be done. Doing a bit of math revealed that the theoretical limit is encoding about 7.01 bits into every 8 bits on disk. Fractional bits, sounds like some evil magic and very advanced maths, probably way too advanced for a poor old commodore to handle!
(TODO include formula)
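(Hedged aside, my formulation rather than the author's: the limit referred to here is presumably the Shannon capacity of the run-length constraint,

    bits per written bit = log2(lambda_max)

where lambda_max is the largest eigenvalue of the constraint's state-transition table, so 8*log2(lambda_max) gives the roughly 7.01 bits per disk byte quoted above. The row-counting experiment further down computes exactly this kind of eigenvalue: 3.38297576... for the 2-bit toy constraint, and a little over 128 for the real 8-bit code.)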
After thinking about it a lot I came up with a scheme that's simple enough to allow decoding in realtime. (For comparison, only very recently did lft come up with a way of decoding Commodore's own GCR in realtime, and doing so requires a number of advanced programming tricks beyond what Commodore dared to do back in the day.) Still, the encoding part is quite difficult. But maybe it can be optimised as well.
First of all I wrote down all the bytes one could write to disk without violating the rules above, or rather their bit patterns:
00100100
00100101
00100110
00100111
00101001
00101010
...
01001001
...
10010010
...
11001001
...
11111100
11111101
11111110
11111111
Then I needed to classify these bit patterns a bit by the sequence of identical bits they start and end with:
# 00100100 #
# 00100101 1
# 00100110 0
# 00100111 3
# 00101001 1
# 00101010 0
# ...
0 01001001 1
0 ...
1 10010010 0
1 ...
2 11001001 1
...
6 11111100 #
6 11111101 1
7 11111110 0
8 11111111 8
Legend:
# 00
0 0
1 1
2 11
3 111
4 1111
5 11111
6 111111
7 1111111
8 11111111
This classification is needed for choosing adjacent bytes.
Unfortunately there's a small problem: when we have a 11111111 byte, we don't know whether the previous byte ended with 01 (i.e. 9 1's) or with 0 (i.e. 8 1's), a problem that doesn't exist for all other bytes (since there's always an internal bit that ends the respective end sequence).
But there's a solution: we disallow sequences of 9 1's, this still allows us to encode 7 raw bits into 8 bits on disk, but this way we can be sure that when we see such a byte that the previous byte ended with a 0.
This leads to the following adjacency table of what previous byte endings are compatible with following byte starts (- = no, * = yes):
 #012345678
#--********
0-*********
1*********-
2********--
3*******---
4******----
5*****-----
6****------
7***-------
8**--------
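A hedged little generator (my own, not part of the write-up) that reproduces the table above from the two run rules ("no 3 zeroes", "no 9 ones"):
var classes = ["#","0","1","2","3","4","5","6","7","8"];   // '#' = two zeroes, '0' = one zero, '1'..'8' = that many ones
function compatible(end, start) {
    if (end == "#") return start != "#" && start != "0";    // 00 followed by another 0 would give 000
    if (end == "0") return start != "#";                    // 0 followed by 00 would give 000
    if (start == "#" || start == "0") return true;          // a 0 terminates the run of ones
    return (+end) + (+start) <= 8;                          // runs of ones must not exceed 8 in total
}
console.log(" " + classes.join(""));
classes.forEach(function(end) {
    console.log(end + classes.map(function(start) { return compatible(end, start) ? "*" : "-"; }).join(""));
});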
Fortunately depending on the last byte's end there's always a *range* of legal successor byte starts, and since the byte list is already sorted by prefixes this means that we can directly select a range of legal successor bytes using a lookup on the previous byte.
E.g. if the current byte is 11010111, its end is classified as "3", and the table says that any following byte with a starting sequence classified from "#" to "5" is legal - thus any byte from 00100100 to 11111011, inclusive.
Still that doesn't tell us how to encode. There are way more than 128 legal byte values we could write, but depending on the previous byte there can be way less than 128 legal byte values. Especially following 11111111 there are way less than 128 legal byte values since they all need to start with 0.
But we can see that it's still theoretically possible to encode enough values in the long term: write down all legal byte values. This gives you TODO maybe around 170 states. Then replace each byte value with all byte values that might follow it. And do so again and again. Of course not on paper (that would be way too much work), but using a computer and clever programming (combine identical byte values with a counter instead of writing them down multiple times). And for every step write down the factor by which the total number of states increased. You'll see that this number converges to about 129: even though some of the encoded bytes allow far fewer than 128 follow-up bytes, that's more than outweighed by the encoded bytes that allow more follow-up bytes.
To demonstrate that, I'll use a simplified version that uses bytes of 2 bits each and allows sequences of 1's of arbitrary length, but still has the limit of just up to 2 0's.
All legal states:
|00|01|10|11|
Add all legal follow-up states:
|00 |01 |10 |11 |
|10|11|00|01|10|11|01|10|11|00|01|10|11|
And doing that one more time:
|00 |01 |10 |11 |
|10 |11 |00 |01 |10 |11 |01 |10 |11 |00 |01 |10 |11 |
|01|10|11|00|01|10|11|10|11|00|01|10|11|01|10|11|00|01|10|11|00|01|10|11|01|10|11|00|01|10|11|10|11|00|01|10|11|01|10|11|00|01|10|11|
And we can do that as often as we like, but of course it'll quickly exceed the pixel density of your monitor, let's do this one more time:
|00 |01 |10 |11 |
|10 |11 |00 |01 |10 |11 |01 |10 |11 |00 |01 |10 |11 |
|01|10|11|00|01|10|11|10|11|00|01|10|11|01|10|11|00|01|10|11|00|01|10|11|01|10|11|00|01|10|11|10|11|00|01|10|11|01|10|11|00|01|10|11|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||||
We can't see much yet, but of course we can zoom in to reveal more detail:
|00 |01 |10 |11 |
|10 |11 |00 |01 |10 |11 |01 |10 |11 |00 |01 |10 |11 |
|01|10|11|00|01|10|11|10|11|00|01|10|11|01|10|11|00|01|10|11|00|01|10|11|01|10|11|00|01|10|11|10|11|00|01|10|11|01|10|11|00|01|10|11|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||||
__________________/ \_____________________________________________________________________
/ \
|01 |
|00 |01 |10 |11 |
|10 |11 |00 |01 |10 |11 |01 |10 |11 |00 |01 |10 |11 |
|01|10|11|00|01|10|11|10|11|00|01|10|11|01|10|11|00|01|10|11|00|01|10|11|01|10|11|00|01|10|11|10|11|00|01|10|11|01|10|11|00|01|10|11|
We're interested in the number of states in each row.
If we keep the order 00,01,10,11 we can just write the initial count as 1,1,1,1.
Determining the follow-up states each:
|00| -> |10|11| = 1,0,0,0 -> 0,0,1,1
|01| -> |00|01|10|11| = 0,1,0,0 -> 1,1,1,1
|10| -> |01|10|11| = 0,0,1,0 -> 0,1,1,1
|11| -> |00|01|10|11| = 0,0,0,1 -> 1,1,1,1
we can easily compute the number of occurrences of each state in each row:
1,1,1,1
2,3,4,4
7,11,13,13
24,37,44,44
81,125,149,149
274,423,504,504
927,1431,1705,1705
3136,4841,5768,5768
10609,16377,19513,19513
35890,55403,66012,66012
...
The total number of states in each row is:
4,13,44,149,504,1705,5768,19513,66012,223317,...
And the ratio compared to the predecessor is:
3.25000000..
3.38461538...
3.38636363...
3.38255033...
3.38293650...
3.38299120...
3.38297503...
3.38297545...
3.38297582...
...
3.38297576...
As you can see this quickly converges, and every step eventually increases the number of possible states by a constant factor of 3.38297576... . States correspond to bits, and the binary logarithm tells us the number of bits we could encode theoretically (e.g. 8 bits = 256 states, log2(256) = 8): each encoded "byte" adds log2(3.38297576...) = 1.75829283... raw bits.
And if we repeat that with our 8-bits-per-byte encoding, we come to the same conclusion as earlier: We get about 7.01 raw bits per encoded byte.
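A hedged sketch of that computation (my own snippet; it uses the slightly stricter 9-ones rule via the isGCRValid2() check from the gcrTest.js listing above, so the growth factor should come out barely above 128, i.e. just over 7 bits per disk byte):
function isGCRValid2(a, b) {
    var v = (a << 8) | b;
    var v2 = v & (v >> 3) & (v >> 6);
    return (v | (v >> 1) | (v >> 2) | 0xC000) == 0xFFFF && (v2 & (v2 >> 1) & (v2 >> 2)) == 0;
}
// Bytes that can appear at all: those with at least one legal successor.
var valid = [];
for (var a = 0; a < 256; a++) for (var b = 0; b < 256; b++) if (isGCRValid2(a, b)) { valid.push(a); break; }
// Count paths: start with one path per valid byte, append all legal successors, watch the growth ratio.
var count = {};
valid.forEach(function(a) { count[a] = 1; });
for (var step = 0; step < 16; step++) {
    var next = {}, oldTotal = 0, newTotal = 0;
    for (var a in count) {
        oldTotal += count[a];
        valid.forEach(function(b) { if (isGCRValid2(+a, b)) next[b] = (next[b] || 0) + count[a]; });
    }
    for (var b in next) newTotal += next[b];
    count = next;
    console.log("growth factor", newTotal / oldTotal, "=", Math.log(newTotal / oldTotal) / Math.LN2, "bits per byte");
}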
But how can we encode actual data with that? All we got so far is just some theory...
Let's go back to appending bytes in legal ways. Imagine you've already appended a large number of bytes:
|00 |01 |10 |11 |
|10 |11 |00 |01 |10 |11 |01 |10 |11 |00 |01 |10 |11 |
|01|10|11|00|01|10|11|10|11|00|01|10|11|01|10|11|00|01|10|11|00|01|10|11|01|10|1 1|00|01|10|11|10|11|00|01|10|11|01|10|11|00|01|10|11|
Just notice that each original state has a range of different length! And the zoom factor we used for zooming in earlier is exactly that value 3.38297576...
The idea here is to assume a length of 1 for the whole range and assign each byte value the fraction of that length its subtree takes up. We also write down the offset (i.e. the sum of the lengths of all previous values), so we end up with ranges that together fill the space between 0 and 1 completely:
Value Length Offset
|00| 0.16071 0.00000
|01| 0.29560 0.16071
|10| 0.24809 0.45631
|11| 0.29560 0.70440
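(A hedged aside, my own snippet: the Length column can be obtained by iterating the follow-up table shown earlier - the length of a state is proportional to how many legal continuations it has, i.e. the dominant eigenvector of that table.)
var follow = { "00": ["10","11"], "01": ["00","01","10","11"], "10": ["01","10","11"], "11": ["00","01","10","11"] };
var len = { "00": 1, "01": 1, "10": 1, "11": 1 };
for (var step = 0; step < 50; step++) {
    var next = {}, total = 0;
    for (var s in follow) { next[s] = follow[s].reduce(function(sum, t) { return sum + len[t]; }, 0); total += next[s]; }
    for (var s in next) len[s] = next[s] / total;   // normalise so the lengths sum to 1
}
console.log(len);   // ~{ "00": 0.16071, "01": 0.29560, "10": 0.24809, "11": 0.29560 }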
Now if we have a byte sequence we could take the first byte's offset, add the second byte's offset scaled down by the scale factor, add the third byte's offset scaled down twice by the scale factor, and so on... wait!
After e.g. |10| not all bytes are permitted, here |00| is not permitted, so all further ranges have different offsets - still, as we saw above, all permitted ranges are always adjacent so we just need to subtract the offset of the first permitted range (scaled down of course), and for speed-up we just create a 3rd column of offsets with the offset of the first permitted follow-up byte already scaled down and subtracted to yield the "adjusted offset".
Value Length Offset AdjustedOffset
|00| 0.16071 0.00000 -0.13488
|01| 0.29560 0.16071 0.16071
|10| 0.24809 0.45631 0.40880
|11| 0.29560 0.70440 0.70440
So to find out where in the range a given byte sequence will end up we use the following algorithm (offset[byte] means "the offset of the byte just read" etc.)
* scale=1, pos=0
* For each byte:
** pos += adjustedOffset[byte]*scale
** scale /= 3.38297576...
Let's have a look at the encoding algorithm again, but this time for 8-bit bytes:
* scale=1, pos=0
* For each byte:
** pos += adjustedOffset[byte]*scale
** scale /= 129...
Of course we don't want to deal with huge floating-point numbers, especially not on c64.
What if we just divide by 128 instead of 129? This means that after each step the ranges will overlap each other a little. Of course the adjusted offsets must be recomputed with the changed scale factor - we might get gaps otherwise - yielding a table we just name "adjustedOffset2".
* scale=1, pos=0
* For each byte:
** pos += adjustedOffset2[byte]*scale
** scale /= 128...
Also, who says that we need to divide? We can just increase the size of the global scale instead of decreasing the size of the scales added:
* scale=1, pos=0
* For each byte:
** pos += adjustedOffset2[byte]
** pos *= 128
* pos /= pow(128,bytecount)
Note that we no longer need the scale. We could also move the "add offset" step after the "multiply" step if we pre-multiply adjustedOffset2 by 128 to yield adjustedOffset3:
* pos=0
* For each byte:
** pos *= 128
** pos += adjustedOffset3[byte]
* pos /= pow(128,bytecount)
Also we could scale adjustedOffset3 up by e.g. a factor of 20, this yields table adjustedOffset4:
* pos=0
* For each byte:
** pos *= 128
** pos += adjustedOffset4[byte]
* pos /= pow(128,bytecount)*20
Remember how I mentioned that ranges are now slightly overlapping? Adjusted offsets were originally values between 0 and 1, but after pre-multiplication they're rather in the range between 0 and 2500. Chances are that we can adjust them enough to make them integers while still not creating gaps between ranges - since they're slightly overlapping we can move them around a bit.
Because the scale factor is a power of two we could in theory extract bits while we're adding them and scaling them up. But as old offsets are constantly scaled up new offsets to be added could introduce a carry that could climb all the way up in worst case... that wouldn't be very nice.
But there's a solution. We go back one step and imagine we were reading bytes the other way round, easily possible since addition doesn't care about order:
* pos=0
* For each byte in reverse order:
** pos += adjustedOffset4[byte]
** pos /= 128
Reading bytes in reverse order? Not so nice, and also not necessary. We just reverse the whole sequence of encoded bits: this means that not only the byte order is reversed (and thus normal again) but also the bit order within each byte - which we can solve by introducing table adjustedOffset5, which is just adjustedOffset4 with its values re-mapped from the same index with the bits reversed.
Also, dividing the offset by 128 is THE opportunity of extracting bits! No risk of bits ever carrying over :)
* pos=0
* For each byte:
** pos += adjustedOffset5[byte]
** Store pos & 127
** pos >>= 7
Since the C64 is an 8-bit machine we split up the offsets into 2 tables, one carrying the lower 7 bits, one carrying the upper 7 bits:
* pos=0
* For each byte:
** pos += adjustedOffset5Upper[byte]<<7
** pos += adjustedOffset5Lower[byte]
** Store pos & 127
** pos >>= 7
And rearrange: since the upper value doesn't affect the bits that get stored, we can move adding it to after pos has been updated:
* pos=0
* For each byte:
** pos += adjustedOffset5Lower[byte]
** Store pos & 127
** pos >>= 7
** pos += adjustedOffset5Upper[byte]
Introducing a "carry":
* pos=0, carry=0
* For each byte:
** pos += adjustedOffset5Lower[byte] + carry
** Store pos & 127
** carry = pos >> 7
** pos = adjustedOffset5Upper[byte]
Just one small bit left: we read 7-bit values that way but want 8-bit values. And adding two 7-bit values gives you your carry in bit 7 instead of the carry flag...
We fix this as follows (a JavaScript sketch of the resulting loop follows the list):
* The first 7-bit value decoded is shifted to the left (-> carry in bit 7 becomes carry) and stored in a temporary location
* The following 7 7-bit values are shifted to the left by shifting in one bit from the temporary location each and then stored as actually decoded data.
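Putting those last steps together, here is a hedged JavaScript rendering (my own, assuming the gcrMap table from gcrTest.js above; the function name is just illustrative) of the resulting two-table loop. It produces the same output as the decodeGCR() function in the listings:
function decodeTwoTables(gcrBytes) {
    var lower = [], upper = [];
    for (var i = 0; i < 256; i++) {
        lower[i] = gcrMap[i] == -1 ? 0 : gcrMap[i] & 0x7F;   // the "adjustedOffset5Lower" half
        upper[i] = gcrMap[i] == -1 ? 0 : gcrMap[i] >> 7;     // the "adjustedOffset5Upper" half
    }
    var result = [], pos = 0, carry = 0;
    for (var i = 0; i < gcrBytes.length; i++) {
        pos += lower[gcrBytes[i]] + carry;   // pos += adjustedOffset5Lower[byte] + carry
        if (i > 0) result.push(pos & 0x7F);  // store pos & 127 (the very first byte only primes the state)
        carry = pos >> 7;                    // carry = pos >> 7
        pos = upper[gcrBytes[i]];            // pos = adjustedOffset5Upper[byte]
    }
    return result;
}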
But how to write a good encoder? I wrote a javascript version that brute forces its way through the ranges and offsets to create a byte stream that decodes properly. Maybe one day someone will write a proper encoder that even works on C64 :) |
| | Krill
Registered: Apr 2002 Posts: 3083 |
The optimised encoder gcrTest2.js:
// This is Kabuto's 7-per-8-bit GCR encoding scheme. I hope I don't reinvent the wheel here :-)
// Problem with GCR values on disk is that entropy varies, also dependencies between adjacent bytes play a role.
// This scheme uses arithmetic coding similar to what arithmetic packers use, just the other way round (storing constant-entropy 7-bit values in variable-entropy GCR bytes).
// The decoder is very simple (as you'll see below) and should be capable of realtime decoding when implemented on 1541.
// One disadvantage, though: it's necessary to store one extra GCR byte on disk to avoid arithmetical problems at both ends.
// Also, the encoder is quite complicated, but it can be optimized for sure.
// GCR decode table
var gcrMap = [-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,2096,1142,3238,-1,1813,859,2955,339,2435,1481,3577,-1,-1,705,2801,185,2281,1327,3423,-1,1998,1044,3140,524,2620,1666,3762,-1,-1,-1,-1,-1,-1,-1,-1,-1,1767,813,2909,293,2389,1435,3531,-1,-1,659,2755,139,2235,1281,3377,-1,1952,998,3094,478,2574,1620,3716,-1,-1,-1,-1,55,2151,1197,3293,-1,1868,914,3010,394,2490,1536,3632,-1,-1,760,2856,240,2336,1382,3478,-1,2053,1099,3195,579,2675,1721,3817,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,634,2730,114,2210,1256,3352,-1,1927,973,3069,453,2549,1595,3691,-1,-1,-1,-1,30,2126,1172,3268,-1,1843,889,2985,369,2465,1511,3607,-1,-1,735,2831,215,2311,1357,3453,-1,2028,1074,3170,554,2650,1696,3792,-1,-1,-1,-1,-1,-1,-1,-1,-1,1797,843,2939,323,2419,1465,3561,-1,-1,689,2785,169,2265,1311,3407,-1,1982,1028,3124,508,2604,1650,3746,-1,-1,-1,-1,85,2181,1227,3323,-1,1898,944,3040,424,2520,1566,3662,-1,-1,790,2886,270,2366,1412,3508,-1,2083,1129,3225,609,2705,1751,3847];
/*
// For each GCR value we iterate over legal predecessors and successors
var gcrRangesMin = [], gcrRangesMax = [];
for (var i = 0; i < 16; i++) {
gcrRangesMin[i] = [];
gcrRangesMax[i] = [];
for (var a = 0; a < 256; a++) if (gcrMap[a] != -1) for (var b = 0; b < 256; b++) if (gcrMap[b] != -1 && isGCRValid2(a,b)) {
var min = (i ? gcrRangesMin[i-1] : gcrMap)[a]/128 + gcrMap[b];
var max = (i ? gcrRangesMax[i-1] : gcrMap)[a]/128 + gcrMap[b];
if (gcrRangesMin[i][b] == null) {
gcrRangesMin[i][b] = min;
gcrRangesMax[i][b] = max;
} else {
gcrRangesMin[i][b] = Math.min(gcrRangesMin[i][b],min);
gcrRangesMax[i][b] = Math.max(gcrRangesMax[i][b],max);
}
}
}
var idxList = [];
for (var a = 0; a < 256; a++) if (gcrMap[a] != -1) idxList.push(a);
idxList.sort(function(a,b){return gcrRangesMin[15][a]-gcrRangesMin[15][b];});
var lastlast;
var boundaries = [];
for (var a = 0; a < idxList.length; a++) {
var i = idxList[a];
// We require a bit of minimum overlap here, this simplifies all follow-up steps
var limit = Math.max(gcrRangesMin[1][i],gcrRangesMin[15][i]);
if (a && limit+1/128 > lastlast) {
throw new Error("Range overlap error");
}
if (a) boundaries.push(Math.floor((limit + lastlast)/2*128));
console.log((i+0x100).toString(2).substring(1),Math.floor(gcrRangesMin[15][i]*128),Math.floor(gcrRangesMax[15][i]*128),Math.floor(gcrRangesMin[1][i]*128),Math.floor(gcrRangesMax[1][i]*128),Math.floor(gcrRangesMin[0][i]*128),Math.floor(gcrRangesMax[0][i]*128));
lastlast = Math.min(gcrRangesMax[1][i],gcrRangesMax[15][i]);
}
// Slot 0: entry and its predecessor each. There is some range overlap. Dimension of entries: 4096, 32
// Slot 1 and onward: No range overlap. (Slot 1 added an entry of dimension 1/4. That's more than enough for cancelling out all further possible overlap.)
// Compute table for fast lookup of the relevant boundary
var boundaryLookup = [];
var p = 0;
for (var i = 0; i < boundaries.length; i++) {
while ((p+1)*2048 <= boundaries[i]) {
boundaryLookup[p++] = i*2+0;
}
if ((boundaries[i]&2047) != 0) {
if (p != boundaries[i]>>11) throw new Error();
boundaryLookup[p++] = i*2+1;
}
}
for (var i = 0; i < boundaryLookup.length/64; i++) boundaryLookup[p++] = boundaries.length*2;
console.log(boundaryLookup.join(","));
*/
var idxList = [36,164,100,228,148,84,212,52,180,116,244,76,204,44,172,108,236,156,92,220,60,188,124,252,146,82,210,50,178,114,242,74,202,42,170,106,234,154,90,218,58,186,122,250,38,166,102,230,150,86,214,54,182,118,246,78,206,46,174,110,238,158,94,222,62,190,126,254,73,201,41,169,105,233,153,89,217,57,185,121,249,37,165,101,229,149,85,213,53,181,117,245,77,205,45,173,109,237,157,93,221,61,189,125,253,147,83,211,51,179,115,243,75,203,43,171,107,235,155,91,219,59,187,123,251,39,167,103,231,151,87,215,55,183,119,247,79,207,47,175,111,239,159,95,223,63,191,127,255];
var boundaries = [3857,7680,10897,14630,18432,21649,25452,27537,31360,34577,38194,41361,45164,47249,51072,54289,58022,61824,65041,68844,70929,74752,77969,81185,84992,88209,92012,94097,97920,101137,104754,107921,111724,113809,117632,120849,124582,128384,131601,135404,137489,141312,144529,147998,150033,153856,157073,160806,164608,167825,171628,173713,177536,180753,184370,187537,191340,193425,197248,200465,204198,208000,211217,215020,217105,220928,224145,226834,230033,233836,235921,239744,242961,246694,250496,253713,257516,259601,263424,266641,270110,272145,275968,279185,282918,286720,289937,293740,295825,299648,302865,306482,309649,313452,315537,319360,322577,326310,330112,333329,337132,339217,343040,346257,349473,353280,356497,360300,362385,366208,369425,373042,376209,380012,382097,385920,389137,392870,396672,399889,403692,405777,409600,412817,416286,418321,422144,425361,429094,432896,436113,439916,442001,445824,449041,452658,455825,459628,461713,465536,468753,472486,476288,479505,483308,485393,489216,492433];
var boundaryLookup = [0,1,2,3,4,5,6,7,8,10,11,12,13,15,16,17,19,20,21,22,23,24,25,27,29,30,31,32,33,34,35,37,38,39,41,42,43,44,45,47,48,49,50,51,53,55,56,57,58,59,60,61,63,64,65,67,68,69,70,71,73,74,75,76,77,78,79,81,82,84,85,86,87,89,90,91,93,94,95,96,97,99,100,101,103,104,105,106,107,108,109,111,112,113,115,116,117,119,120,121,122,123,124,125,127,128,129,131,132,133,135,136,137,138,139,141,142,143,145,146,147,148,149,151,152,153,155,156,157,158,159,161,163,164,165,166,167,168,169,170,172,173,174,175,177,178,179,181,182,183,184,185,186,187,189,191,192,193,194,195,196,197,199,200,201,203,204,205,206,207,209,210,211,212,213,215,217,218,219,220,221,222,223,225,226,227,229,230,231,232,233,235,236,237,238,239,240,241,243,244,246,247,248,249,251,252,253,255,256,257,258,259,261,262,263,265,266,267,268,269,270,271,273,274,275,277,278,279,281,282,283,284,285,286,287,289,290,291,293,294,295,296,296,296,296];
var reverseGCRMap = {}
for (var i = 0; i < gcrMap.length; i++) if (gcrMap[i] != -1) reverseGCRMap[gcrMap[i]] = i;
// The decoder
function decodeGCR(gcrBytes) {
var result = [];
var gcrBuf = gcrMap[gcrBytes[0]];
for (var i = 1; i < gcrBytes.length; i++) {
gcrBuf = (gcrBuf >> 7) + gcrMap[gcrBytes[i]];
result.push(gcrBuf&0x7F);
}
return result;
}
// GCR verifier: no 3 zeroes or 9 ones in a row. The actual hardware limit is 10 ones in a row, but reducing the test to 9 ones still leaves barely enough entropy and permits verification by looking at each pair of adjacent bytes.
function isGCRValid2(a,b) {
var v = (a<<8)|b;
var v2 = v & (v>>3) & (v>>6);
return (v | (v >> 1) | (v >> 2) | 0xC000) == 0xFFFF && (v2 & (v2>>1) & (v2>>2)) == 0;
}
// GCR encoder
function encodeGCR(rawBytes) {
// The encoder scans the bytes to be encoded in reverse order, last to first, always selecting GCR values that allow us to encode any further preceding GCR values.
// The decoder effectively computes (gcrMap[gcr[0]]>>7) + gcrMap[gcr[1]] + (gcrMap[gcr[2]]<<7) + (gcrMap[gcr[3]]<<14) + (gcrMap[gcr[4]]<<21) + ... + (gcrMap[gcr[n]]<<((n-1)*7))
// and stores the result into raw[0] + (raw[1]<<7) + (raw[2]<<14) + (raw[3]<<21) + ... + (raw[n-1]<<((n-1)*7)) + extra<<(n*7).
// Both these terms must be equal of course.
// This "extra" is the content of the accumulator that's normally discarded after decoding, but it plays an important role here. It can contain values from 1 to 31 (not 100% sure, might be off by 1 or 2 at either end).
// Choosing a value for gcr[n-1] limits the total possible range of the sum - the previous addends only have a certain range they can span (based on gcrMap) and as not all combinations of adjacent GCR bytes are possible the size of the range is restricted further... in the end each possible value for gcr[n-1] creates a range and all ranges overlap a little bit. This is where boundaries come into play - these are precomputed boundaries that tell us for which raw bytes which gcr value to choose.
// After choosing gcr[n-1] we can pop off both the last raw and the last gcr byte from the above formula by decrementing n. But before doing so we must recompute extra as:
// extra := extra << 7 + raw[n-1] - gcrMap[gcr[n]]
// by adding the effects of the bytes just popped off to it.
// We do that until we're left with just one raw byte and 2 GCR bytes. This time the boundary list isn't used because it becomes too imprecise in this case - we just scan over all possible values for gcr[0] and choose matching gcr[1] values from a precomputed map.
// The actual algorithm always considers the topmost 2 raw bytes for finding the correct boundary.
// Initial "extra" value
var val = 16;
// Load one "byte"
var idx = rawBytes.length;
val = val*128 + rawBytes[--idx];
var result = [];
// Load all further "bytes" and for each loaded byte output a GCR byte
while (idx) {
val = val*128 + rawBytes[--idx];
// Look up which boundary between adjacent GCR values we're close to
var p = boundaryLookup[val>>11];
if (p&1) {
p = (p >> 1) + (boundaries[p >> 1] <= val ? 1 : 0);
} else {
p /= 2;
}
// It sometimes happens that we accidentally select a GCR value that can't be legally combined with the next GCR value - quite possible due to the overlap between sections. We might have to choose an adjacent entry from our list in this case.
if (result.length && !isGCRValid2(idxList[p],result[0])) {
if (p && isGCRValid2(idxList[p-1],result[0])) p--; else p++;
}
result.unshift(idxList[p]);
val -= gcrMap[idxList[p]]*128;
}
// Still need to encode the first 2 GCR bytes, just scan over all values for the first GCR byte and if we find a matching second byte then return both.
// Also ensure the first byte starts with a '0' bit so it's valid after sync.
for (var i = 0; i < idxList.length; i++) {
var a = idxList[i];
var b = reverseGCRMap[val - (gcrMap[a]>>7)];
if (b != null && a < 128 && isGCRValid2(a,b) && isGCRValid2(b,result[0])) {
result.unshift(b);
result.unshift(a);
return result;
}
}
throw new Error("Couldn't find a solution, and that really shouldn't happen.");
}
// And now: create sectors of random data and decode them again.
for (var j = 0; j < 1024; j++) {
var rawBytes = [];
for (var i = 0; i < 256; i++) rawBytes[i] = Math.floor(Math.random()*128);
// Encode
var result = encodeGCR(rawBytes);
// Verify that GCR is correct
for (var i = 0; i < result.length-1; i++) {
if (!isGCRValid2(result[i], result[i+1])) {
throw new Error("GCR fail");
}
}
var decoded = decodeGCR(result);
for (var i = 0; i < rawBytes.length; i++) if (rawBytes[i] != decoded[i]) throw new Error("Mismatch!");
}
console.log("Success!"); |
| | tlr
Registered: Sep 2003 Posts: 1807 |
Good, would be a shame if this disappeared.
I have a C implementation/conversion of this somewhere in my repo. |