{167, 217, 268, 318, 369, 419, 470},
+----------------+-----------------+--------------+ | Pulse duration | arithmetic code | huffman code | +----------------+-----------------+--------------+ | 167 cycles | 0.31886 | 11 | | 217 cycles | 0.22645 | 01 | | 267 cycles | 0.16083 | 101 | | 317 cycles | 0.11422 | 100 | | 367 cycles | 0.08112 | 000 | | 417 cycles | 0.05761 | 0011 | | 467 cycles | 0.04091 | 0010 | +----------------+-----------------+--------------+ mean cycles per bit, arithmetic code = 101.3 mean cycles per bit, huffman code = 102.9
+----------------+-----------------+--------------+ | Pulse duration | arithmetic code | huffman code | +----------------+-----------------+--------------+ | 150 cycles | 0.39979 | 0 | | 230 cycles | 0.24517 | 10 | | 310 cycles | 0.15035 | 110 | | 390 cycles | 0.09220 | 1110 | | 470 cycles | 0.05654 | 11111 | | 550 cycles | 0.03468 | 111101 | | 630 cycles | 0.02127 | 111100 | +----------------+-----------------+--------------+ mean cycles per bit, arithmetic code = 113.4 mean cycles per bit, huffman code = 116.2
+----------------+-----------------+--------------+ | Pulse duration | arithmetic code | huffman code | +----------------+-----------------+--------------+ | 150 cycles | 0.41389 | 0 | | 230 cycles | 0.25856 | 10 | | 320 cycles | 0.15230 | 110 | | 416 cycles | 0.08660 | 1110 | | 516 cycles | 0.04810 | 11111 | | 618 cycles | 0.02640 | 111101 | | 724 cycles | 0.01415 | 111100 | +----------------+-----------------+--------------+ mean cycles per bit, arithmetic code = 117.9 mean cycles per bit, huffman code = 119.7
+----------------+-----------------+--------------+ | Pulse duration | arithmetic code | huffman code | +----------------+-----------------+--------------+ | 180 cycles | 0.38997 | 0 | | 270 cycles | 0.24352 | 10 | | 360 cycles | 0.15207 | 110 | | 450 cycles | 0.09497 | 1110 | | 540 cycles | 0.05930 | 11110 | | 630 cycles | 0.03703 | 111111 | | 720 cycles | 0.02313 | 111110 | +----------------+-----------------+--------------+ mean cycles per bit, arithmetic code = 132.5 mean cycles per bit, huffman code = 136.4
Cool stuff. I remember that I did some similar research back then.
You limited your research to 7 different pulse lengths. Depending on the shortest pulse and the spacing, this may not be the optimal solution. E.g. for the last, conservative approach, I am pretty sure that a lower number of pulse lengths would give a better result.
Also, what sample data did you base your pattern distribution on? E.g uncompressed data must result into pretty different values depending on the content and very different than compressed one. For the latter, all bits and combinations should have about the same probability.
With random data all shouldn't arithmetic data be 0.5 for single bits and 0.25 for two bit combinations? How did you get to these different values? I am sure I am missing something (again).
Cycles|Bits| Prob.| Cycles per bit and probability ------+----+------+------------------------------- 100 | 0 | 0.5 | 50.00 150 | 10 | 0.25 | 18.75 200 | 11 | 0.25 | 25.00 ------+----+------+------ Total: 93,75
Cycles|Bits| Prob.| Cycles per bit and probability ------+----+------+------------------------------- 167 |11 |0,25 | 20.875 217 |01 |0,25 | 27.125 267 |101 |0,125 | 11.125 317 |100 |0,125 | 13.208 367 |000 |0,125 | 15.292 417 |0011|0,0625| 6.516 467 |0010|0,0625| 7.297 ------+----+------+-------- total: 101.4375
from math import log def log2(x): return log(x)/log(2) def cycles_per_bit_given_arithmetic_code(code): den = num = 0 for dur,p in code: bits = -log2(p) weight = p*bits # this is critical num += weight*dur/bits den += weight cycles_per_bit = num/den return cycles_per_bit def hc_to_ac(hc_code): # convert a huffman code to corresponding arithmetic code return [(d,0.5**len(s)) for d,s in hc_code] hccode = [ (100, '0'), (150, '10'), (200, '11'), ] print(cycles_per_bit_given_arithmetic_code(hc_to_ac(hccode)))
+-----+----+------+-----------+--------+-------+ Cycles|Bits| Prob.| w = p*b | c/b | c/b*w | (b=len(Bits)) +-----+----+------+-----------+--------+-------+ | 100 | 0 | 0.50 | 0.5 | 100.0 | 50.0 | | 150 | 10 | 0.25 | 0.5 | 75.0 | 37.5 | | 200 | 11 | 0.25 | 0.5 | 100.0 | 50.0 | +-----+----+------+-----------+--------+-------+ total weighted rates = 137.5 total weights = 1.5 weighted average = 91.667
You're making the same mistake I was a few days ago :)
Symbols that output longer bit sequences consume more of the original file when it's being converted to a pulse stream, so they need to be weighted accordingly when you're computing the expected cycles per bit.
...There a lots of factors we can play with: 1. short pulse length 2. pulse gap 3. number of pulse lengths (a result of 1. + 2.) 4. ???
And what is causing the inaccuracies? - the varying tape speed (how much? +/-5%?) (increasing pulse gaps can cope with that) - frequencies? E.g high frequencies have a lower amplitude than lower ones, causing less correct reads (then we would need decreasing pulse gaps to handle that) - enthusi's graph shows some intervals of heavy distortions, what is causing these? - varying frequencies? (e.g. a low frequency followed by a high one give less accurate results than two high ones) - aging?...
I just encoded a sequence of 1000 random bits with the [0,10,11]=>[100,150,200] encoding...
Could you show something for 4 pulse lenghts, please? I did a little test in Basic V2 with 2KBit of random data. "1/00/01/111" was always longer than "00/01/10/11", "0/1/000/111" was even worse.
...But can I ask what pulse lengths you were using? 00/01/10/11 will always minimise the count of pulses you output, but I wouldn’t expect it to minimise the total recording length unless your four pulse lengths are quite similar to each other.
spacing of (100+ 100*n) +----------------+-----------------+--------------+ | Pulse duration | arithmetic code | huffman code | +----------------+-----------------+--------------+ | 100 cycles | 0.51879 | 1 | | 200 cycles | 0.26914 | 01 | | 300 cycles | 0.13963 | 001 | | 400 cycles | 0.07244 | 000 | +----------------+-----------------+--------------+ mean cycles per bit, arithmetic code = 105.6 mean cycles per bit, huffman code = 107.1 spacing of (100+ 28*n) +----------------+-----------------+--------------+ | Pulse duration | arithmetic code | huffman code | +----------------+-----------------+--------------+ | 100 cycles | 0.36380 | 0 | | 128 cycles | 0.27410 | 10 | | 156 cycles | 0.20651 | 111 | | 184 cycles | 0.15559 | 110 | +----------------+-----------------+--------------+ mean cycles per bit, arithmetic code = 68.6 mean cycles per bit, huffman code = 71.1 spacing of (100+ 27*n) +----------------+-----------------+--------------+ | Pulse duration | arithmetic code | huffman code | +----------------+-----------------+--------------+ | 100 cycles | 0.36057 | 11 | | 127 cycles | 0.27376 | 10 | | 154 cycles | 0.20785 | 01 | | 181 cycles | 0.15781 | 00 | +----------------+-----------------+--------------+ mean cycles per bit, arithmetic code = 68.0 mean cycles per bit, huffman code = 70.2