[CSDb] - User Forums - Tape loaders using more than two pulse widths for data

You are not logged in - nap

CSDb User Forums

Forums > C64 Coding > Tape loaders using more than two pulse widths for data

2018-12-08 20:16

ChristopherJam

Registered: Aug 2004
Posts: 1378

Tape loaders using more than two pulse widths for data

I’ve been thinking a bit about a novel tape encoding, but it would rely on (among other things) having more than two pulse widths. So far as I can see, none of the old turbo loaders used a long pulse for anything beyond end of byte/end of block - the bitstream itself was just a sequence of short and medium pulses (usually around 200 cycles for short, 300 to 450 for long).

Is there any particular reason none of the popular loaders used (eg) four pulse widths for each bitpair? Even using quite widely separated durations of 210/320/450/600 would then lead to an average time per bit of just 197 cycles, a massive improvement on the 265 cycles per bit you’d get for 210/320.

(The idea I’ve been working on is a bit more complex than that, but if a simple bitpair scheme wouldn’t work for some reason, then the idea I have would need to be scaled back somewhat. Promising that long pulses were used for framing, mind..)

2018-12-08 20:37

tlr

Registered: Sep 2003
Posts: 1714

Blueload* uses 3 pulse lengths to implement GCR but proved unreliable on hardware. I suspect that is more due to the syncronization scheme but I haven't really investigated it deeply. I was toying with the idea to use a more complex scheme like RLL1,7 but haven't implemented anything so far.

There was some tape loader a few years back that implements hamming code IIRC.
Pavloda uses different pulse lengths but maybe that is just for end markers?

One complexity when using a lot of different pulse lengths is that the transfer function of the datasette is far from linear. There is a bandpass in the path so I guess pulses can get moved around quite a bit due to that.

* found in the trance sector master in the examples dir of tapmaster 0.4.

2018-12-08 23:44

DanPhillips

Registered: Jan 2003
Posts: 30

I wrote a 2 bits per pulse loader for the Kixx version of Armalyte, but I'm not sure that Thalamus actually used it.

Cheers

Dan

2018-12-09 05:56

ChristopherJam

Registered: Aug 2004
Posts: 1378

Dan:

http://c64tapes.org/title.php?id=1214 comments that "Main game is Cyberload loader, but level data is using a currently undocumented format," so it looks like there's a fair chance they actually used it :)

(edit: I've just been corrected by SLC on IRC, apparently the rest of the files on that release are just Warpload after all. Aww.)

Thanks for an excellent game btw; happy to say I actually bought that one on disk back in the day.

2018-12-09 06:42

ChristopherJam

Registered: Aug 2004
Posts: 1378

tlr:

Yes, I was wondering about the transfer function. But on the other hand, the CBM standard loader uses long pulses of 688 cycles, and rainbow arts microload used short pulses of 200, so surely anything in that range should be pretty safe?

Thanks for the pointer to Blueload, I should investigate if I carry on any further.

2018-12-09 12:15

tlr

Registered: Sep 2003
Posts: 1714

Quote: tlr:

Yes, I was wondering about the transfer function. But on the other hand, the CBM standard loader uses long pulses of 688 cycles, and rainbow arts microload used short pulses of 200, so surely anything in that range should be pretty safe?

Thanks for the pointer to Blueload, I should investigate if I carry on any further.

I'm thinking there might be problems if you use both extremes in combination with some medium length pulses, but that needs to be measured.

With regards to multiple pulse lengths, Kabuto style encoding should be close to optimal for three different lengths. Think of those representing '1', '10' and '100' to get the idea. I use that representation for the standardish GCR in Blueload with timings being a multiple of the single '1'. An even more efficient way would be to "compress" the timing slightly, e.g use the original kernel timings.

2018-12-09 23:26

ChristopherJam

Registered: Aug 2004
Posts: 1378

Yes, a few days ago I was looking at assigning 'costs' of 1,2,3,4 (or 2,3,4,5) to four different pulse widths, then computing the tree of all possible codes of a given length. I suspect that's isomorphic to the Kabuto approach you've just mentioned.

I was just starting to think about how to store the tree, when I had a better idea.

The optimal usage of the codes happens when each code consumes an amount of time directly proportional to the number of bits of information it represents. Eg, for two pulses one of which is double the length of the other, the probability of the longer pulse appearing should be the square of the probability of the smaller.

Throw something like
solve 0.5^(210/r)+0.5^(320/r)+0.5^(450/r)+0.5^(600/r)=1 for r
at Wolfram Alpha or some such, and you get a theoretical efficiency of one bit per 178 cycles; 3% better than I could manage from the exhaustive tree approach.

( p per symbol = [0.442 0.288 0.174 0.097]
bits per symbol = [1.179 1.797 2.527 3.369])

The loader would extract the bitstream by arithmetically encoding the incoming pulse widths. This only requires low precision constant multiplies, so a few simple to build tables would make that quite fast. Easier for me to wrap my brain around than Kabuto's decoder, too.

Sadly, for the example set of pulses widths above it's only an 11% improvement on just using each pulse widths for a different output bitpair, so I really don't know how worthwhile any of this is. At this stage I'm just throwing the idea out there, because I'd prefer to be working on other things right now. I just don't want the idea to get lost :)

2018-12-10 10:08

enthusi

Registered: May 2004
Posts: 675

The most prominent multi-pulse-format would probably be Super Tape that utilized 3 different lengths.
It was first announced in german C't mag 04/1984 and ported to many systems of that time. It gained 3600 Bits/s and also had a 7200 Bit/s moe that however, was supposed to only work with suitable hardware.
The problem is less the min/max times but rather the proper thresholding. Some loaders self-adjust to the pilot tone which helps a great deal.
If you are interested in heavy error correction mechanisms for tape loading this would be suitable:
Neoload
Possibly the most overkill loader ever ;-)
I recently dumped a loader that used 6 (!) different pulselengths by Thomas Jentzsch. ;-)

We eventually got most of the data from the old 90'er tapes to load. The sources of the loader helped, though :)

2018-12-10 13:32

Frantic

Registered: Mar 2003
Posts: 1627

I know Deep Throat involved experimentation with three (or four?) different pulse lengths for data while it was created, but when I asked the authors just now none of them could remember if that made it into the final production. One of them said they probably settled for two different ones in the end after all, and the idea of using three different pulse lengths was abandoned for various reasons relating to limitations for what works well on the real hardware and things like code size and so forth.

(Is it true that the format used by the Kernal actually represents each bit of data four times on a cassette, or is that something I've dreamed in some dreadful nightmare at some point?)

2018-12-10 14:52

ChristopherJam

Registered: Aug 2004
Posts: 1378

Ooh, interesting about Super Tape. Haven't manage to find any documentation on that one, though I'll keep looking.

Neoload's error correction is seriously impressive; I guess that's a must for such a long file mind :)

What were the range of pulse lengths on the loader that used six of them? Interesting to see that they're evenly spaced in any case; I'd been pondering using larger gaps between the longer lengths so as to be more tolerant of speed variations (a 10% slower tape motor will affect the duration of long pulses more than the duration of short ones).

Frantic: yup, Kernal represents each bit as short+medium for zero, medium+short for one. Then the entire file is repeated. There's a nice writup at

http://c64tapes.org/dokuwiki/doku.php?id=loaders:rom_loader

But yes, looks like the final version of Deep Throat just used two pulse lengths for data. The tap file contains a long pulse every 20 pulses (framing bytes or words I assume), with the 19 between being a mix of short and medium pulses.

2018-12-10 15:24

Frantic

Registered: Mar 2003
Posts: 1627

I may be wrong, but I think think the long pulses in deep throat are used as "key frames", to allow the user to ffwd and rewind, and sync again, to watch the movie from that point on. At least I remember that this was a feature that was supported in this demo.

2018-12-11 10:00

enthusi

Registered: May 2004
Posts: 675

The 6-pulse loader is NOTHING you'd want for a release.
It was used privately to store own sources/code by Thomas 'back in the day'. Each pulse encoded 3 bits:
%101,%111,%110,%100,%001,%000
Ah, but it used 7 pulses in fact :)
One pulse length encodes 2 bits '01'
I just made a quick histogram (cycles):

While this looks nice and clean, this is how the actual data of basically the best dump looked like:

For a 'public use' loader I strongly suggest to use MUCH larger gaps. The loaders I wrote for the load of tap demo or Jars' Revenge etc are pretty stable. Stability over speed :)
(I just skipped the kernal load header for the histogram, it is based on that data)

2018-12-12 08:40

ChristopherJam

Registered: Aug 2004
Posts: 1378

Wow, so fitting a line to the approximate peaks, I'm getting pulse lengths of

 {167, 217, 268, 318, 369, 419, 470},

so increments of around 50 cycles. That's... brave.

Thanks for the encoding details, simulating how well it performs has revealed a flaw in my calculations. (I was only weighting the symbols by probability that they'd be next, instead of also including a factor for how much output they'd produce).

(also, everywhere I've said pulse width on this page, substitute pulse length; apologies for the ambiguity).

2018-12-12 09:53

Hoogo

Registered: Jun 2002
Posts: 102

...this is how the actual data of basically the best dump looked like...

Do you have any idea what caused these absolutely blurred signals mostly at the end of the blocks?

I guess these were multiple files, and those signals might be some of the original music...

2018-12-12 10:24

enthusi

Registered: May 2004
Posts: 675

Yes, at least the backside had music on it ;-)
But two other things to note. If you press stop/play the starting motor will also produce all sorts of pulse-lengths (at least when read back via c2n) and of course these graphs show TAP bytes, not mere data bytes, so encoded pauses are also shown as pulses. That is the reason for those seemingly spurios pixels at the same horizontal position. Those others I'd bet as well are music.
50 cycles is _WAY_ too short unless you record/play on the very same device without touching anything (and no bad lines, sprites etc obviously).

2018-12-12 12:07

thrust64

Registered: Jun 2006
Posts: 8

That loader is mine. :)

And of course it was only meant for personal use on the same hardware. And hey, it worked for me! :D

I don't remember exactly how I came to 7 pulse lengths. But I did some statistics about bit combinations from some files I had and then some math based on the minimum pulse length differences which seemed feasible. And that resulted into the 7 lengths for optimal performance.

2018-12-12 12:11

thrust64

Registered: Jun 2006
Posts: 8

My target speed was 10k. That also played a role, IIRC.

2018-12-12 12:30

ChristopherJam

Registered: Aug 2004
Posts: 1378

I'm impressed :)

2018-12-12 12:47

enthusi

Registered: May 2004
Posts: 675

Ah good to see you here :)
I, too, was/am impressed, hehe.

2018-12-15 00:39

ChristopherJam

Registered: Aug 2004
Posts: 1378

I've managed to verify my hypothesis that the optimal weights to use for an arithmetic encoding based loader are those that ensure each symbol outputs the same number of bits per cycle.

Approximating the arithmetic encoder with a huffman code only loses one or two percent of efficiency, so that would probably be a somewhat saner path to take.

On the way, I found a marginally faster encoding using thrust64's pulse lengths. Using output words of { 11, 01, 101, 100, 000, 0011, 0010 } results in an average cost per bit of 102.9 cycles, compared
to thrust64's 108.8

Have some pretty graphs, and the script that generated them.

Each graph covers a family of pulse lengths, as given by the equations in the titles. The two lines show optimal cost per output bit, as a function of the number of pulse lengths used for the encoding.
This is purely for the data rate within each word; framing bits and error correction are outside the scope of this investigation, and would in practice add an overhead of 10-20%

There's a table after each graph giving the optimal encoding for the case where seven distinct pulse lengths are used. Note that if the gap is comparable to the shortest pulse length, the
optimal huffman encoding is isomorphic to transition=1, no transition=0, with some extra transitions inserted after long runs of zeros. This changes when the gap is relatively
small compared to the shortest length.

Using Thomas Jentzsch's pulse spacing (167+50*n cycles)
but optimising the encoding for that spacing

+----------------+-----------------+--------------+
| Pulse duration | arithmetic code | huffman code |
+----------------+-----------------+--------------+
| 167 cycles     | 0.31886         | 11           |
| 217 cycles     | 0.22645         | 01           |
| 267 cycles     | 0.16083         | 101          |
| 317 cycles     | 0.11422         | 100          |
| 367 cycles     | 0.08112         | 000          |
| 417 cycles     | 0.05761         | 0011         |
| 467 cycles     | 0.04091         | 0010         |
+----------------+-----------------+--------------+
mean cycles per bit, arithmetic code = 101.3
mean cycles per bit,    huffman code = 102.9

Using AR Turbo's short pulse, Fast Evil's spacing
(150+80*n cycles)

+----------------+-----------------+--------------+
| Pulse duration | arithmetic code | huffman code |
+----------------+-----------------+--------------+
| 150 cycles     | 0.39979         | 0            |
| 230 cycles     | 0.24517         | 10           |
| 310 cycles     | 0.15035         | 110          |
| 390 cycles     | 0.09220         | 1110         |
| 470 cycles     | 0.05654         | 11111        |
| 550 cycles     | 0.03468         | 111101       |
| 630 cycles     | 0.02127         | 111100       |
+----------------+-----------------+--------------+
mean cycles per bit, arithmetic code = 113.4
mean cycles per bit,    huffman code = 116.2

Using AR Turbo's short pulse, Fast Evil's min spacing,
increasing gap for longer pulses (150+80*n**1.1 cycles)

+----------------+-----------------+--------------+
| Pulse duration | arithmetic code | huffman code |
+----------------+-----------------+--------------+
| 150 cycles     | 0.41389         | 0            |
| 230 cycles     | 0.25856         | 10           |
| 320 cycles     | 0.15230         | 110          |
| 416 cycles     | 0.08660         | 1110         |
| 516 cycles     | 0.04810         | 11111        |
| 618 cycles     | 0.02640         | 111101       |
| 724 cycles     | 0.01415         | 111100       |
+----------------+-----------------+--------------+
mean cycles per bit, arithmetic code = 117.9
mean cycles per bit,    huffman code = 119.7

something a bit more conservative (180+ 90*n)

+----------------+-----------------+--------------+
| Pulse duration | arithmetic code | huffman code |
+----------------+-----------------+--------------+
| 180 cycles     | 0.38997         | 0            |
| 270 cycles     | 0.24352         | 10           |
| 360 cycles     | 0.15207         | 110          |
| 450 cycles     | 0.09497         | 1110         |
| 540 cycles     | 0.05930         | 11110        |
| 630 cycles     | 0.03703         | 111111       |
| 720 cycles     | 0.02313         | 111110       |
+----------------+-----------------+--------------+
mean cycles per bit, arithmetic code = 132.5
mean cycles per bit,    huffman code = 136.4

https://jamontoads.herokuapp.com/csdb/201812/compare_encodings...

No, I'm not going to go and write a loader based on this investigation in the near future. But, I hope this is useful or at least interesting to someone else out there.

Many thanks to SLC, Enthusi, thrust64 & tlr for their assistance.

2018-12-16 14:52

thrust64

Registered: Jun 2006
Posts: 8

Cool stuff. I remember that I did some similar research back then.

You limited your research to 7 different pulse lengths. Depending on the shortest pulse and the spacing, this may not be the optimal solution. E.g. for the last, conservative approach, I am pretty sure that a lower number of pulse lengths would give a better result.

Also, what sample data did you base your pattern distribution on? E.g uncompressed data must result into pretty different values depending on the content and very different than compressed one. For the latter, all bits and combinations should have about the same probability.

2018-12-16 21:18

ChristopherJam

Registered: Aug 2004
Posts: 1378

Quoting thrust64

Cool stuff. I remember that I did some similar research back then.

Thanks! I assumed as much.

Quote:

You limited your research to 7 different pulse lengths. Depending on the shortest pulse and the spacing, this may not be the optimal solution. E.g. for the last, conservative approach, I am pretty sure that a lower number of pulse lengths would give a better result.

My apologies, the tables are results for 7 pulse lengths, but the graphs show average cycles per bit as a function of the number of pulse lengths used, from 2 to 8. If you look at those, you can see that the more pulses lengths you use, the better your throughput (though admittedly you're most of the way to optimal by five or six).

Quote:

Also, what sample data did you base your pattern distribution on? E.g uncompressed data must result into pretty different values depending on the content and very different than compressed one. For the latter, all bits and combinations should have about the same probability.

I'm assuming random data - compression helps a *lot* if your raw speed is this low.

2018-12-16 21:55

thrust64

Registered: Jun 2006
Posts: 8

> ...but the graphs show average cycles per bit as a function of the number of pulse lengths used, from 2 to 8

Ah, my bad. Interesting that 8 is still the best even with conservative timing.

> I'm assuming random data...

Hm... With random data all shouldn't arithmetic data be 0.5 for single bits and 0.25 for two bit combinations? How did you get to these different values? I am sure I am missing something (again).

2018-12-17 09:08

ChristopherJam

Registered: Aug 2004
Posts: 1378

Quoting thrust64

With random data all shouldn't arithmetic data be 0.5 for single bits and 0.25 for two bit combinations? How did you get to these different values? I am sure I am missing something (again).

It's a bit inside out - the loaded file is effectively a compressed representation of the stream of pulses from the tape (something of a headfuck, I know) - so it's the probabilities of the pulses arriving in the stream that's the pertinent factor.

The key point is that longer pulses (by definition) contribute more to the length of the recording than the short pulses, so unless they contain as much information per second as the shorter pulses, they're not pulling their weight.

I initially wasn't entirely sure of my intuition on that count, so I also wrote a function that just incrementally adjusts the weights for a set of symbols and runs simulated encodings.

That took a bit more to implement but got identical results in the end; I left the code for that in the script I linked above (it's just commented out).

Short version:

If r is the optimal transmission rate in bits per second, then each symbol needs to encode d*r bits, where d is the duration of the symbol in question. The sum of the probabilities of the symbols is one, so you just need to solve sum(0.5^(d*r))=1 to find r, then you can use r to find the bits per symbol

2018-12-17 09:44

thrust64

Registered: Jun 2006
Posts: 8

I could be wrong, but I think your math is slightly incorrect.

Simple example:

Cycles|Bits| Prob.| Cycles per bit and probability
------+----+------+-------------------------------
  100 | 0  | 0.5  | 50.00
  150 | 10 | 0.25 | 18.75
  200 | 11 | 0.25 | 25.00
------+----+------+------
            Total:  93,75

That's easy to verify: 0 takes 100 cycles and 1 takes (150+200)/2/2 = 87,5 cycles on average. So the average length per bit is (100+87.5)/2 = 93.75 cycles.

Using the same math for my own loader I get:

Cycles|Bits| Prob.| Cycles per bit and probability
------+----+------+-------------------------------
  167 |11  |0,25  | 20.875
  217 |01  |0,25  | 27.125
  267 |101 |0,125 | 11.125
  317 |100 |0,125 | 13.208
  367 |000 |0,125 | 15.292
  417 |0011|0,0625|  6.516
  467 |0010|0,0625|  7.297
------+----+------+--------
            total: 101.4375

So that would be ~101.4 cycles/bit vs ~102.9 as in your math (both using Huffman).

2018-12-17 11:00

ChristopherJam

Registered: Aug 2004
Posts: 1378

You're making the same mistake I was a few days ago :)

Symbols that output longer bit sequences consume more of the original file when it's being converted to a pulse stream, so they need to be weighted accordingly when you're computing the expected cycles per bit.

I just encoded a sequence of 1000 random bits with the [0,10,11]=>[100,150,200] encoding, and found 322 bare zeros, 165 "10" pairs, and 174 "11" pairs;
total duration 322*100+165*150+174*200=91750 cycles, or 91.75 cycles per bit, quite close to the 91.667 I get from the python snippet below:

from math import log
def log2(x):  return log(x)/log(2)

def cycles_per_bit_given_arithmetic_code(code):
    den = num = 0
    for dur,p in code:
        bits = -log2(p)
        weight = p*bits         # this is critical
        num += weight*dur/bits
        den += weight
    cycles_per_bit = num/den
    return cycles_per_bit

def hc_to_ac(hc_code):  # convert a huffman code to corresponding arithmetic code
    return [(d,0.5**len(s)) for d,s in hc_code]

hccode = [
        (100, '0'),
        (150, '10'),
        (200, '11'),
        ]

print(cycles_per_bit_given_arithmetic_code(hc_to_ac(hccode)))

or in table form:

+-----+----+------+-----------+--------+-------+
Cycles|Bits| Prob.|  w = p*b  |  c/b   | c/b*w |  (b=len(Bits))
+-----+----+------+-----------+--------+-------+
| 100 | 0  | 0.50 |    0.5    |  100.0 |  50.0 |
| 150 | 10 | 0.25 |    0.5    |   75.0 |  37.5 |
| 200 | 11 | 0.25 |    0.5    |  100.0 |  50.0 |
+-----+----+------+-----------+--------+-------+

total weighted rates = 137.5
total weights        =   1.5
weighted average     =  91.667

2018-12-17 11:25

thrust64

Registered: Jun 2006
Posts: 8

On average, I would expect ~333 0 bits plus ~167 01 and ~167 10 bits. Then we would have 333*100 + 167*150 + 167*200 = 91,750 for 1001 bits. Which about matches your number.

Quote:

You're making the same mistake I was a few days ago :)

Looks so. :)

Quote:

Symbols that output longer bit sequences consume more of the original file when it's being converted to a pulse stream, so they need to be weighted accordingly when you're computing the expected cycles per bit.

I stand corrected. You are right.

Thanks for explaining again.

2018-12-17 11:56

thrust64

Registered: Jun 2006
Posts: 8

So when we look at the results and graphs (also from enthusi), what would be the best way for a fast but still robust solution?

There a lots of factors we can play with:
1. short pulse length
2. pulse gap
3. number of pulse lengths (a result of 1. + 2.)
4. ???

And what is causing the inaccuracies?
- the varying tape speed (how much? +/-5%?) (increasing pulse gaps can cope with that)
- frequencies? E.g high frequencies have a lower amplitude than lower ones, causing less correct reads (then we would need decreasing pulse gaps to handle that)
- enthusi's graph shows some intervals of heavy distortions, what is causing these?
- varying frequencies? (e.g. a low frequency followed by a high one give less accurate results than two high ones)
- aging?
- what else?

This is really tricky. :)

2018-12-17 15:36

Hoogo

Registered: Jun 2002
Posts: 102

Quoting thrust64

...There a lots of factors we can play with:
1. short pulse length
2. pulse gap
3. number of pulse lengths (a result of 1. + 2.)
4. ???

4. 1 pulse lenght for all gaps, or make a pulse as long as the following gap?
5. Use a Datasette for recording, or can Hifi equipment create better signals?
6. ???

I'd base anything on pulses of 1/44100Hz or multiples. It's much more easy to do this kind of stuff on PC.
Quoting thrust64

And what is causing the inaccuracies?
- the varying tape speed (how much? +/-5%?) (increasing pulse gaps can cope with that)
- frequencies? E.g high frequencies have a lower amplitude than lower ones, causing less correct reads (then we would need decreasing pulse gaps to handle that)
- enthusi's graph shows some intervals of heavy distortions, what is causing these?
- varying frequencies? (e.g. a low frequency followed by a high one give less accurate results than two high ones)
- aging?...

Did some tests last sunday.

The reading test showed a little in $d020/$d418 when something was received. That happened every 1 or 2 seconds, even when no tape was running. The space was crowded, cable was running close to the monitor.

A simple write test was writing one byte repeatedly, with 22 cycles for each bit. There is a scope available, but I need more practice in its usage (and more hands).

Seems that writing 10101010 creates a flat line, 00001111 gives a very nice wave.

Most likely I will do more tests next weekend.

2018-12-17 15:46

Hoogo

Registered: Jun 2002
Posts: 102

Quoting ChristopherJam

I just encoded a sequence of 1000 random bits with the [0,10,11]=>[100,150,200] encoding...

Could you show something for 4 pulse lenghts, please?
I did a little test in Basic V2 with 2KBit of random data.
"1/00/01/111" was always longer than "00/01/10/11", "0/1/000/111" was even worse.

2018-12-17 19:57

ChristopherJam

Registered: Aug 2004
Posts: 1378

Quoting Hoogo

Could you show something for 4 pulse lenghts, please?
I did a little test in Basic V2 with 2KBit of random data.
"1/00/01/111" was always longer than "00/01/10/11", "0/1/000/111" was even worse.

Sure, I’ll put something together tomorrow. But can I ask what pulse lengths you were using? 00/01/10/11 will always minimise the count of pulses you output, but I wouldn’t expect it to minimise the total recording length unless your four pulse lengths are quite similar to each other.

Those hardware tests sound useful; I’d love to know what frequency your square waves stop being reliably readable, and how much difference the wave shape makes to that threshold (eg, if 50 cycles high/50 cycles low is fine, how about 60/40?). Sadly I sold my datasettes a few months ago, so I can’t measure anything myself at the moment,

2018-12-17 21:18

Hoogo

Registered: Jun 2002
Posts: 102

Quoting ChristopherJam

...But can I ask what pulse lengths you were using? 00/01/10/11 will always minimise the count of pulses you output, but I wouldn’t expect it to minimise the total recording length unless your four pulse lengths are quite similar to each other.

I have used 1 pulse + 1 to 4 gaps of equal size, so 2...5 overall length.

For the random test data:

00 207
01 232
10 239
11 202
=>1800 Bits, length 3156 pulses+gaps.
If all appeared 225 times, length would be 3150.

1 365
00 297
01 311
111 73
=>1800 Buts, length 3230.

The arithmetic distribution would be best at 39%, 26%, 19%, 16%, but I don't get the idea how to find a nice Huffmann for that.

2018-12-17 22:17

ChristopherJam

Registered: Aug 2004
Posts: 1378

oh!

Um, that's not a huffman code. (you can't tell from the bits read so far whether you've finished the current codeword or not)
Try
0
10
110
111

Algorithm is to combine the two smallest probability items into a tree node that has the two items as children, and replace the children in your list of things to process with the parent. (Have a look at the function huffman_encode in compare_encodings.py linked above - I lifted it from rosettacode,org )

Naming the codes for the four pulse lengths A B C & D
codes C and D combine first, so whether you get a flat code (two bits each) or not depends on whether the next combine step joins code B with {parent of C & D} or with code A

If the gap is more than around 28% of the minimum pulse length, the likelyhood of C or D is less than the likelyhood of A, so an unbalanced tree is generated.

spacing of (100+ 100*n)
+----------------+-----------------+--------------+
| Pulse duration | arithmetic code | huffman code |
+----------------+-----------------+--------------+
| 100 cycles     | 0.51879         | 1            |
| 200 cycles     | 0.26914         | 01           |
| 300 cycles     | 0.13963         | 001          |
| 400 cycles     | 0.07244         | 000          |
+----------------+-----------------+--------------+
mean cycles per bit, arithmetic code = 105.6
mean cycles per bit,    huffman code = 107.1

spacing of (100+  28*n)
+----------------+-----------------+--------------+
| Pulse duration | arithmetic code | huffman code |
+----------------+-----------------+--------------+
| 100 cycles     | 0.36380         | 0            |
| 128 cycles     | 0.27410         | 10           |
| 156 cycles     | 0.20651         | 111          |
| 184 cycles     | 0.15559         | 110          |
+----------------+-----------------+--------------+
mean cycles per bit, arithmetic code =  68.6
mean cycles per bit,    huffman code =  71.1

spacing of (100+  27*n)
+----------------+-----------------+--------------+
| Pulse duration | arithmetic code | huffman code |
+----------------+-----------------+--------------+
| 100 cycles     | 0.36057         | 11           |
| 127 cycles     | 0.27376         | 10           |
| 154 cycles     | 0.20785         | 01           |
| 181 cycles     | 0.15781         | 00           |
+----------------+-----------------+--------------+
mean cycles per bit, arithmetic code =  68.0
mean cycles per bit,    huffman code =  70.2

2018-12-17 23:07

Hoogo

Registered: Jun 2002
Posts: 102

Got it! Soo obvious now :)

Refresh

Subscribe to this thread: