Cool stuff. I remember that I did some similar research back then.
You limited your research to 7 different pulse lengths. Depending on the shortest pulse and the spacing, this may not be the optimal solution. E.g. for the last, conservative approach, I am pretty sure that a lower number of pulse lengths would give a better result.
Also, what sample data did you base your pattern distribution on? E.g. uncompressed data must give pretty different values depending on the content, and very different ones than compressed data does. For the latter, all bits and bit combinations should have about the same probability.
With random data, shouldn't all the probabilities be 0.5 for single bits and 0.25 for two-bit combinations? How did you get to these different values? I am sure I am missing something (again).
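(For what it's worth, that intuition does hold for genuinely random or well-compressed data: every n-bit pattern occurs with probability 0.5^n, so 0.5 per single bit and 0.25 per two-bit pattern. A quick sanity check in Python; the 100000-bit sample size is arbitrary:

import random
from collections import Counter

# Empirical check on random bits: each single-bit value should come out near 0.5
# and each two-bit pattern near 0.25.
N = 100_000
bits = "".join(random.choice("01") for _ in range(N))

for length in (1, 2):
    chunks = Counter(bits[i:i + length] for i in range(0, N - length + 1, length))
    total = sum(chunks.values())
    for value, count in sorted(chunks.items()):
        print(f"{value}: {count / total:.3f}")

The differing values in the tables that follow appear to be exactly this: 0.5 raised to the length of each bit pattern.)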
Cycles|Bits| Prob.| Cycles per bit and probability
------+----+------+-------------------------------
  100 |  0 | 0.5  | 50.00
  150 | 10 | 0.25 | 18.75
  200 | 11 | 0.25 | 25.00
------+----+------+------
Total: 93.75
Cycles|Bits | Prob. | Cycles per bit and probability
------+-----+-------+-------------------------------
  167 |  11 | 0.25  | 20.875
  217 |  01 | 0.25  | 27.125
  267 | 101 | 0.125 | 11.125
  317 | 100 | 0.125 | 13.208
  367 | 000 | 0.125 | 15.292
  417 |0011 | 0.0625|  6.516
  467 |0010 | 0.0625|  7.297
------+-----+-------+--------
Total: 101.4375
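To make the arithmetic behind the two tables explicit: each row is cycles * probability / number of bits, with the probability being 0.5^len(bits), and the totals are the plain column sums (whether that simple sum is the right figure of merit is exactly what the weighting discussion further down is about). A small sketch that reproduces the rows, with durations and probabilities copied from the tables:

# durations, bit patterns and probabilities copied from the tables above
three_pulse = [(100, "0", 0.5), (150, "10", 0.25), (200, "11", 0.25)]
seven_pulse = [(167, "11", 0.25), (217, "01", 0.25), (267, "101", 0.125),
               (317, "100", 0.125), (367, "000", 0.125),
               (417, "0011", 0.0625), (467, "0010", 0.0625)]

def table_total(rows):
    total = 0.0
    for cycles, pattern, prob in rows:
        per_row = cycles * prob / len(pattern)   # the "Cycles per bit and probability" column
        print(f"{cycles:>4} | {pattern:>4} | {prob:<6} | {per_row:.3f}")
        total += per_row
    return total

print("total:", table_total(three_pulse))   # 93.75
print("total:", table_total(seven_pulse))   # ~101.4375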
from math import log

def log2(x):
    return log(x) / log(2)

def cycles_per_bit_given_arithmetic_code(code):
    # code: list of (pulse duration in cycles, symbol probability)
    den = num = 0
    for dur, p in code:
        bits = -log2(p)      # information content of the symbol in bits
        weight = p * bits    # this is critical: weight each symbol by the bits it consumes
        num += weight * dur / bits
        den += weight
    cycles_per_bit = num / den
    return cycles_per_bit

def hc_to_ac(hc_code):
    # convert a huffman code to corresponding arithmetic code
    return [(d, 0.5 ** len(s)) for d, s in hc_code]

hccode = [
    (100, '0'),
    (150, '10'),
    (200, '11'),
]

print(cycles_per_bit_given_arithmetic_code(hc_to_ac(hccode)))
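Run as-is, that final print gives ~91.67 cycles per bit for the 100/150/200 code, i.e. the weighted average worked out in the table below rather than the 93.75 from the plain column sum above.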
+------+----+------+---------+-------+-------+
|Cycles|Bits| Prob.| w = p*b |  c/b  | c/b*w |   (b = len(Bits))
+------+----+------+---------+-------+-------+
|  100 |  0 | 0.50 |   0.5   | 100.0 |  50.0 |
|  150 | 10 | 0.25 |   0.5   |  75.0 |  37.5 |
|  200 | 11 | 0.25 |   0.5   | 100.0 |  50.0 |
+------+----+------+---------+-------+-------+
total weighted rates = 137.5
total weights        = 1.5
weighted average     = 91.667
You're making the same mistake I was a few days ago :)
Symbols that output longer bit sequences consume more of the original file when it's being converted to a pulse stream, so they need to be weighted accordingly when you're computing the expected cycles per bit.
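Put as a formula: if symbol i takes d_i cycles, encodes b_i bits, and occurs with probability p_i = 0.5^b_i in random data, the expected rate is sum(p_i * d_i) / sum(p_i * b_i) cycles per bit. For the 100/150/200 code that is 137.5 / 1.5, roughly 91.67, which is exactly what the weighted table and the script above arrive at.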
...There are lots of factors we can play with:
1. short pulse length
2. pulse gap
3. number of pulse lengths (a result of 1. + 2.)
4. ???
And what is causing the inaccuracies?
- the varying tape speed (how much? +/-5%?) (increasing pulse gaps can cope with that)
- frequencies? E.g. high frequencies have a lower amplitude than low ones, causing less reliable reads (then we would need decreasing pulse gaps to handle that)
- enthusi's graph shows some intervals of heavy distortion; what is causing those?
- varying frequencies? (e.g. a low frequency followed by a high one gives less accurate results than two high ones)
- aging?
...
I just encoded a sequence of 1000 random bits with the [0,10,11]=>[100,150,200] encoding...
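For reference, a minimal sketch of that kind of encoding run; the 1000-bit length and the 0/10/11 => 100/150/200 mapping are from the post above, while the details (greedy parsing, dropping an incomplete trailing symbol) are just one way to do it:

import random

# prefix code: bit pattern -> pulse duration in cycles (mapping from the post above)
CODE = {"0": 100, "10": 150, "11": 200}

def encode(bits, code=CODE):
    # split the bit string into codewords (unambiguous, since the code is prefix-free)
    # and emit one pulse duration per codeword
    pulses = []
    i = 0
    while i < len(bits):
        pattern = next((p for p in code if bits.startswith(p, i)), None)
        if pattern is None:
            break  # a lone trailing '1' does not complete a codeword; a real encoder would pad
        pulses.append(code[pattern])
        i += len(pattern)
    return pulses, i

bits = "".join(random.choice("01") for _ in range(1000))
pulses, used = encode(bits)
print(f"{len(pulses)} pulses, {sum(pulses)} cycles, {sum(pulses) / used:.2f} cycles per encoded bit")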
Could you show something for 4 pulse lengths, please? I did a little test in BASIC V2 with 2 kbit of random data. "1/00/01/111" was always longer than "00/01/10/11", and "0/1/000/111" was even worse.
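Not a hardware answer, but here is how such a comparison can be sketched on paper with the same weighting idea as the Python snippet earlier in the thread. Two caveats: the 100/150/200/250 cycle durations are placeholders (the real ones depend on the shortest pulse and the gap), and "1/00/01/111" and "0/1/000/111" are not prefix-free ("1" is a prefix of "111"), so the sketch uses "0/10/110/111" as the unbalanced candidate:

def expected_cycles_per_bit(code):
    # code: list of (pulse duration in cycles, bit pattern) for a prefix-free code;
    # with random data each pattern occurs with probability 0.5**len(pattern)
    cycles_per_symbol = sum(0.5 ** len(bits) * dur for dur, bits in code)
    bits_per_symbol = sum(0.5 ** len(bits) * len(bits) for dur, bits in code)
    return cycles_per_symbol / bits_per_symbol

# pulse durations are placeholders, not measured values
candidates = {
    "00/01/10/11":  [(100, "00"), (150, "01"), (200, "10"), (250, "11")],
    "0/10/110/111": [(100, "0"), (150, "10"), (200, "110"), (250, "111")],
}
for name, code in candidates.items():
    print(f"{name}: {expected_cycles_per_bit(code):.2f} cycles per bit")

Which candidate wins depends entirely on the durations that are actually achievable for the longer pulses, so the placeholder numbers say nothing about real tape behaviour.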