Cycles | Bits | Prob. | Cycles per bit and probability
-------+------+-------+-------------------------------
 100   | 0    | 0.5   | 50.00
 150   | 10   | 0.25  | 18.75
 200   | 11   | 0.25  | 25.00
-------+------+-------+-------------------------------
Total: 93.75

Cycles | Bits | Prob.  | Cycles per bit and probability
-------+------+--------+-------------------------------
 167   | 11   | 0.25   | 20.875
 217   | 01   | 0.25   | 27.125
 267   | 101  | 0.125  | 11.125
 317   | 100  | 0.125  | 13.208
 367   | 000  | 0.125  | 15.292
 417   | 0011 | 0.0625 |  6.516
 467   | 0010 | 0.0625 |  7.297
-------+------+--------+-------------------------------
Total: 101.4375

from math import log

def log2(x):
    return log(x) / log(2)

def cycles_per_bit_given_arithmetic_code(code):
    den = num = 0
    for dur, p in code:
        bits = -log2(p)
        weight = p * bits  # this is critical
        num += weight * dur / bits
        den += weight
    cycles_per_bit = num / den
    return cycles_per_bit

def hc_to_ac(hc_code):
    # convert a huffman code to corresponding arithmetic code
    return [(d, 0.5 ** len(s)) for d, s in hc_code]

hccode = [
    (100, '0'),
    (150, '10'),
    (200, '11'),
]
print(cycles_per_bit_given_arithmetic_code(hc_to_ac(hccode)))

+--------+------+-------+---------+-------+---------+
| Cycles | Bits | Prob. | w = p*b |  c/b  | c/b * w |   (b = len(Bits))
+--------+------+-------+---------+-------+---------+
|  100   |  0   | 0.50  |   0.5   | 100.0 |  50.0   |
|  150   |  10  | 0.25  |   0.5   |  75.0 |  37.5   |
|  200   |  11  | 0.25  |   0.5   | 100.0 |  50.0   |
+--------+------+-------+---------+-------+---------+
total weighted rates = 137.5
total weights       = 1.5
weighted average    = 91.667

You're making the same mistake I was a few days ago :)

Symbols that output longer bit sequences consume more of the original file when it's being converted to a pulse stream, so they need to be weighted accordingly when you're computing the expected cycles per bit.
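A minimal sketch of that point, using the 100/150/200 code from the tables above: the naive average weights each symbol only by its probability, while the correct expected cycles per bit weights each symbol by p*b, since a symbol emitting b bits covers b bits of the source.

```python
# Each entry: (pulse duration in cycles, emitted bits, probability).
code = [(100, '0', 0.5), (150, '10', 0.25), (200, '11', 0.25)]

# Naive: average of per-symbol rates, weighted only by probability.
naive = sum(p * dur / len(bits) for dur, bits, p in code)

# Correct: expected cycles divided by expected bits (weight = p * b).
weighted = (sum(p * dur for dur, bits, p in code) /
            sum(p * len(bits) for dur, bits, p in code))

print(naive)     # 93.75   (the mistaken figure)
print(weighted)  # ~91.667 (the correct expected cycles per bit)
```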

...There are lots of factors we can play with:
1. short pulse length
2. pulse gap
3. number of pulse lengths (a result of 1. + 2.)
4. ???

And what is causing the inaccuracies?
- the varying tape speed (how much? +/-5%?) (increasing pulse gaps can cope with that)
- frequencies? E.g. high frequencies have a lower amplitude than low ones, causing fewer correct reads (then we would need decreasing pulse gaps to handle that)
- enthusi's graph shows some intervals of heavy distortion; what is causing these?
- varying frequencies? (e.g. a low frequency followed by a high one gives less accurate results than two high ones)
- aging?...

I just encoded a sequence of 1000 random bits with the [0,10,11]=>[100,150,200] encoding...
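A hypothetical re-creation of that experiment (the seed is arbitrary): encode 1000 random bits with the prefix code 0 -> 100 cycles, 10 -> 150, 11 -> 200, and measure the empirical cycles per bit.

```python
import random

random.seed(1)
bits = ''.join(random.choice('01') for _ in range(1000))

durations = {'0': 100, '10': 150, '11': 200}
total_cycles = 0
i = 0
while i < len(bits):
    if bits[i] == '0':
        total_cycles += durations['0']
        i += 1
    elif i + 1 < len(bits):
        total_cycles += durations[bits[i:i + 2]]  # '10' or '11'
        i += 2
    else:
        break  # a trailing lone '1' cannot complete a codeword; drop it

print(total_cycles / i)  # should land near 91.7, not the naive 93.75
```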

Could you show something for 4 pulse lengths, please? I did a little test in BASIC V2 with 2KBit of random data. "1/00/01/111" was always longer than "00/01/10/11", "0/1/000/111" was even worse.

...But can I ask what pulse lengths you were using? 00/01/10/11 will always minimise the count of pulses you output, but I wouldn’t expect it to minimise the total recording length unless your four pulse lengths are quite similar to each other.
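A quick illustration of that claim, with made-up pulse lengths (not the ones from the test above): for widely spaced pulses a skewed code beats the flat 00/01/10/11 code, while for closely spaced pulses the flat code wins.

```python
def mean_cycles_per_bit(durations, codewords):
    # Random input bits, so a codeword of length b occurs with prob 2^-b.
    num = sum(0.5 ** len(w) * d for d, w in zip(durations, codewords))
    den = sum(0.5 ** len(w) * len(w) for w in codewords)
    return num / den

flat = ['00', '01', '10', '11']
skewed = ['0', '10', '110', '111']  # shortest codeword -> shortest pulse

print(mean_cycles_per_bit([100, 200, 300, 400], flat))    # 125.0
print(mean_cycles_per_bit([100, 200, 300, 400], skewed))  # ~107.1
print(mean_cycles_per_bit([100, 110, 120, 130], flat))    # 57.5
print(mean_cycles_per_bit([100, 110, 120, 130], skewed))  # ~62.1
```

Note that the flat code always minimises pulses per bit (exactly 0.5), but only minimises recording length when the four durations are close together.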

spacing of (100 + 100*n)

+----------------+-----------------+--------------+
| Pulse duration | arithmetic code | huffman code |
+----------------+-----------------+--------------+
| 100 cycles     | 0.51879         | 1            |
| 200 cycles     | 0.26914         | 01           |
| 300 cycles     | 0.13963         | 001          |
| 400 cycles     | 0.07244         | 000          |
+----------------+-----------------+--------------+
mean cycles per bit, arithmetic code = 105.6
mean cycles per bit, huffman code    = 107.1

spacing of (100 + 28*n)

+----------------+-----------------+--------------+
| Pulse duration | arithmetic code | huffman code |
+----------------+-----------------+--------------+
| 100 cycles     | 0.36380         | 0            |
| 128 cycles     | 0.27410         | 10           |
| 156 cycles     | 0.20651         | 111          |
| 184 cycles     | 0.15559         | 110          |
+----------------+-----------------+--------------+
mean cycles per bit, arithmetic code = 68.6
mean cycles per bit, huffman code    = 71.1

spacing of (100 + 27*n)

+----------------+-----------------+--------------+
| Pulse duration | arithmetic code | huffman code |
+----------------+-----------------+--------------+
| 100 cycles     | 0.36057         | 11           |
| 127 cycles     | 0.27376         | 10           |
| 154 cycles     | 0.20785         | 01           |
| 181 cycles     | 0.15781         | 00           |
+----------------+-----------------+--------------+
mean cycles per bit, arithmetic code = 68.0
mean cycles per bit, huffman code    = 70.2
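For reference, a sketch of how the arithmetic-code probabilities above can be computed, assuming they are the capacity-achieving assignment p_i = 2^(-r*d_i), where the rate r (bits per cycle) solves sum_i 2^(-r*d_i) = 1; the mean cycles per bit is then simply 1/r.

```python
def optimal_rate(durations, lo=0.0, hi=1.0, iters=100):
    # Bisect on r: sum 2^(-r*d) is strictly decreasing in r.
    for _ in range(iters):
        mid = (lo + hi) / 2
        if sum(2.0 ** (-mid * d) for d in durations) > 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

durations = [100, 200, 300, 400]
r = optimal_rate(durations)
probs = [2.0 ** (-r * d) for d in durations]

print([round(p, 5) for p in probs])  # ~ [0.51879, 0.26914, 0.13963, 0.07244]
print(round(1 / r, 1))               # mean cycles per bit, ~105.6
```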