[CSDb] - User Forums - Understanding 1541 byte-sync and buffering

2016-01-16 13:18

doynax
Account closed

Registered: Oct 2004
Posts: 212

Understanding 1541 byte-sync and buffering

Lately I have been attempting to work the kinks out of some old drive code. To be honest much of it was produced by trial-and-error and by peeking at the code of others, so I've been putting off getting to grips with how the device _actually_ works for quite some time now.

At the moment I'm stuck trying to resolve some issues with the drive head occasionally dropping bytes during reads and injecting extra bits during writes and I've come to the conclusion that I ought to make sense of how the GCR byte buffer actually works.

Unfortunately the documentation available is somewhat lacking and it is difficult to know how far to trust the emulator sources. Incidentally, I don't suppose there is a high-quality scan of the classic 1541 schematic (the discrete version without the PLA) out there? Ideally annotated for the hardware-challenged among us :)

My mental model is that of an 8-bit shift-register clocking through flux transitions as set bits to/from the drive head. Once empty/full the next byte is placed onto/taken from the VIA2 PRA port and a byte-ready pulse sent to the 6502 V-flag input along with VIA2 CA1. Plus there is a counter detecting >=10-bit SYNC fields during reads, at which point the shift register is reset and the SYNC signal asserted. While writing the speedzone-divider clocks this directly, whereas during reads the clock is recovered from the flux-transitions or after spaces somewhat wider than the bit period.

This broadly jibes with observations such as the initial post-SYNC $FF byte, the echoing of previously read data after a write-mode transition, and observed behavior when a byte is read/written late. Except I still see glitches and oddities.
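The SYNC part of that mental model can be sketched in a few lines of Python. This is only an illustration: the names `to_bits` and `frame_after_sync` are made up here, and holding the byte framing in reset for as long as the run of 1-bits lasts is an assumption about the hardware, not something read off the schematic.

```python
def to_bits(byte):
    """Expand one byte into a list of 8 bits, MSB first."""
    return [(byte >> shift) & 1 for shift in range(7, -1, -1)]


def frame_after_sync(bits, sync_len=10):
    """Return the bytes framed after a run of at least sync_len 1-bits."""
    ones = 0          # length of the current run of 1-bits
    synced = False    # becomes True once the first SYNC has been seen
    current = []      # bits collected for the byte in progress
    framed = []
    for bit in bits:
        ones = ones + 1 if bit else 0
        if ones >= sync_len:
            # SYNC asserted: hold the byte framing in reset (assumption).
            synced = True
            current = []
            continue
        if synced:
            current.append(bit)
            if len(current) == 8:
                framed.append(int("".join(map(str, current)), 2))
                current = []
    return framed


# Some unaligned gap bits, a 12-bit SYNC, then two bytes of payload.
stream = [1, 0, 1] + [1] * 12 + to_bits(0x52) + to_bits(0x55)
print([hex(b) for b in frame_after_sync(stream)])
```

Whatever the bit alignment was before the SYNC, the bytes after it come out framed from the first 0-bit onwards.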

For one thing there appears to be some form of handshaking affecting the byte-ready signal. One thing my code does, arrived at by trial-and-error, is a dummy read of the $FF byte after the sync field, without which the tag byte doesn't get extracted properly, i.e.:

	bit $1c00	;Wait for sync
	bmi *-3
	nop $1c01	;Reading any other address causes trouble
	clv
	bvc *
	lda $1c01	;Tag byte

This is despite VIA read handshaking having been disabled with SoE on CA2 kept permanently asserted.
At any rate my 1541-II/1571 and Kryoflux have trouble whereas VICE 2.4 doesn't care aside from unlatching a 1571 status-bit on any VIA2 register read.
It is not immediately obvious from the Kryoflux code what is going on but then VHDL is admittedly hard going for me. Plus given how it handles write buffering (byte-aligning to the stream and dropping the first two bytes) I'm not putting much faith in its accuracy.

</rant>

I apologize for making a mountain out of a molehill here but I really do keep running into weird glitches which I can't quite understand and this is about the only reproducible one out of the lot ;)

Side-note: I warmly endorse the Kryoflux for anyone tinkering with the 1541 and wanting to know what is getting written out to disk

2016-01-16 13:38

chatGPZ

Registered: Dec 2001
Posts: 11386

have you looked at the current VICE source (not 2.4, its way old and _very_ broken)? there is a longish description of how the data separator works (written by one of the kryoflux guys :))

edit: https://sourceforge.net/p/vice-emu/code/HEAD/tree/trunk/vice/sr..

2016-01-16 16:02

doynax
Account closed

Registered: Oct 2004
Posts: 212

Quoting Groepaz

have you looked at the current VICE source (not 2.4, its way old and _very_ broken)? there is a longish description of how the data separator works (written by one of the kryoflux guys :))

Thank you, I wasn't aware of that new code. Those comments certainly look like the circuit description I was looking for :)

I did run the test in the Kryoflux "SPS" branch of VICE and a recent nightly build with the same results as well, but to be honest I don't have a firm grasp of what VICE releases are considered stable or how the project has forked in recent years.

Incidentally I meant to say that I had been testing on the 1541-U before and not the Kryoflux. I guess I must have that new toy of mine on my mind ;)

2016-01-16 16:39

tlr

Registered: Sep 2003
Posts: 1790

Quoting doynax

Unfortunately the documentation available is somewhat lacking and it is difficult to know how far to trust the emulator sources. Incidentally, I don't suppose there is a high-quality scan of the classic 1541 schematic (the discrete version without the PLA) out there? Ideally annotated for the hardware-challenged among us :)

I can recommend the SAMS "Commodore 1541 Troubleshooting & Repair Guide":
http://www.codebase64.net/books/commodore-1541-troubleshooting-..

2016-01-16 16:40

chatGPZ

Registered: Dec 2001
Posts: 11386

using a recent VICE build is recommended, especially for drive related stuff... its usually stable enough for daily use (except exceptions!)

using 1541U for testing on the other hand isnt a good idea, the drive emu is in some parts even more broken than VICE is - really always use a real drive, or you will get nasty surprises.

2016-01-16 22:43

doynax
Account closed

Registered: Oct 2004
Posts: 212

Well, it turned out that it was all just me being stupid. As usual.

There is even a terse "FIXME" note in the recent VICE sources about it.

Essentially the byte-sync input latching isn't done externally by the shift-register but internally by the VIA, based on the byte-ready edges. When enabled anyway.
Furthermore PA isn't latched in on every edge. Rather the port is sampled every cycle up until byte-sync is asserted, at which point it stops and waits for $1c01 to be read by the CPU, which resets the process. Lather, rinse and repeat.

In other words the post-sync dummy read is required in order for the byte-sync edge after the first proper payload byte to be latched in. The byte returned isn't even necessarily $FF but whatever came along immediately after the sampling had last stopped, so spinning until the expected $52/$54 tag byte shows up is subtly buggy.

I suppose a neat side-effect is that the spin-down case of the sector reading loop doesn't require strict timing in grabbing the final byte.
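A toy Python model of that latching behavior may make the need for the dummy read easier to see. It is a sketch under assumptions: `PortALatch` and `read_tag` are hypothetical names, and the model only captures the "transparent until a byte-ready edge, then frozen until read" behavior described above, not the rest of the VIA.

```python
class PortALatch:
    """Toy model of VIA port A input latching driven by byte-ready edges."""

    def __init__(self):
        self.pins = 0x00      # what the shift register currently presents
        self.latched = None   # value frozen by a byte-ready edge, if any

    def cycle(self, pins, byte_ready_edge=False):
        """Advance one step: sample the pins, freeze them on the first edge."""
        self.pins = pins
        if byte_ready_edge and self.latched is None:
            self.latched = pins   # further edges are ignored until a read

    def read(self):
        """Model a read of $1c01: return the value and re-arm the latch."""
        value = self.pins if self.latched is None else self.latched
        self.latched = None
        return value


def read_tag(dummy_read):
    """Return what the first real read after the SYNC sees."""
    via = PortALatch()
    via.cycle(0x55, byte_ready_edge=True)  # last byte before the SYNC, never read
    via.cycle(0xFF)                        # SYNC field passes under the head
    if dummy_read:
        via.read()                         # throw away the stale value, re-arm
    via.cycle(0x52, byte_ready_edge=True)  # tag byte arrives
    return via.read()


print(hex(read_tag(dummy_read=False)), hex(read_tag(dummy_read=True)))
```

In this model `read_tag(False)` returns the stale $55 left over from before the SYNC, while `read_tag(True)` returns the $52 tag byte.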

Quoting tlr

I can recommend the SAMS "Commodore 1541 Troubleshooting & Repair Guide"

Thank you, that's another excellent resource I had somehow managed to miss out on. It seems to be easily sourced off-of eBay even.

Quoting Groepaz

using 1541U for testing on the other hand isnt a good idea, the drive emu is in some parts even more broken than VICE is - really always use a real drive, or you will get nasty surprises.

Curiously it is actually more accurate in this particular case. The VICE 1541 VIA2 emulation seems to be the minimum required for the normal operating mode whereas 1541-U uses a fully-featured generic VIA block.

2016-01-17 10:46

chatGPZ

Registered: Dec 2001
Posts: 11386

Quote:

The VICE 1541 VIA2 emulation seems to be the minimum required for the normal operating mode whereas 1541-U uses a fully-featured generic VIA block.

not really. try running the VIA testprogs from vice repo.... VICE passes more of them than 1541U (using 2.6k fw at least, i didnt bother testing the 3.0beta yet). 1541U even fails a couple CPU tests from the lorenz suite for that matter (which is pretty wtf)

2016-01-17 11:20

doynax
Account closed

Registered: Oct 2004
Posts: 212

Quoting Groepaz

not really. try running the VIA testprogs from vice repo.... VICE passes more of them than 1541U (using 2.6k fw at least, i didnt bother testing the 3.0beta yet).

I think you are right. Looking a little deeper it seems that it is mostly just latching being compiled out/dodgy and instead the floppy stream code takes partial responsibility for handling the latching. Nothing comparable to the horrendous Kryoflux GCR write hack certainly.

Incidentally would you recommend submitting test cases on running into these sorts of edge cases? I am much too lazy to work out all of the implications of enabling VIA latching globally but I ought to at least be able to manage a short repro case ;)

2016-01-17 11:31

chatGPZ

Registered: Dec 2001
Posts: 11386

if you have small test cases, please submit them to the VICE repo indeed :) the existing VIA tests only test very basic things, i am sure there is a lot more thats broken and needs fixing :)

2016-01-17 12:50

Thierry

Registered: Oct 2009
Posts: 48

Quote: Quote:
The VICE 1541 VIA2 emulation seems to be the minimum required for the normal operating mode whereas 1541-U uses a fully-featured generic VIA block.

not really. try running the VIA testprogs from vice repo.... VICE passes more of them than 1541U (using 2.6k fw at least, i didnt bother testing the 3.0beta yet). 1541U even fails a couple CPU tests from the lorenz suite for that matter (which is pretty wtf)

Try unofficial Horrocks Update,

http://ar.c64.org/rrwiki/images/a/af/1541_U_II_2.6k_-_Unofficia..

many fixes and more compatible

2016-01-17 12:54

chatGPZ

Registered: Dec 2001
Posts: 11386

i know that update obviously, since i added it to that page =) (however, i only use official versions for testing things)

2016-01-17 14:48

soci

Registered: Sep 2003
Posts: 480

He only up-streamed some VIA related changes for version 3 as far as I know.

2016-01-17 20:17

Fungus

Registered: Sep 2002
Posts: 686

Hrm setting the clock divider to the wrong frequency when reading data will result in incorrect data being read. So it can't be derived from reading the incoming flux transitions.

I was just reading up on Frequency Shift Keying (like tape uses) and it would seem that the drives are using the same type of technique in hardware rather than software. Since the incorrect divider frequency would change the time period for flux transitions to be valid, no?

Also the read is needed to clear the latched value, that's pretty straight forward and normal VIA/CIA behavior. If it's not cleared fast enough then clocked in transitions will be missed.

2016-01-18 10:36

Martin Piper

Registered: Nov 2007
Posts: 722

Back a few years ago I had better success with debugging drive code with Hoxs. I seem to remember the ability to single-step either the C64 or the 1541 helped a lot.

2016-01-18 12:34

chatGPZ

Registered: Dec 2001
Posts: 11386

you can do that with VICE just fine :)

2016-01-18 12:52

Martin Piper

Registered: Nov 2007
Posts: 722

Probably can now. Years ago VICE emulation with regards to drive code timing was very poor. Hoxs and real hardware were the only options.
I must try it in the newest VICE sometime.

2016-01-18 12:55

chatGPZ

Registered: Dec 2001
Posts: 11386

it has been possible in VICE since forever. and HOXS just (relatively) recently caught up with it.

2016-01-18 13:10

Martin Piper

Registered: Nov 2007
Posts: 722

I distinctly remember Hoxs and real hardware working when VICE did not, when debugging 2 bit drive transfer code.

2016-01-18 13:20

chatGPZ

Registered: Dec 2001
Posts: 11386

let me guess, you were using that 2.1 VICE from your repo?

2016-01-18 13:49

Martin Piper

Registered: Nov 2007
Posts: 722

Before that.

2016-01-18 13:51

chatGPZ

Registered: Dec 2001
Posts: 11386

yeah, ok. not "a few" years ago then in my book (more like 10 or even more :=))

2016-01-18 13:52

Martin Piper

Registered: Nov 2007
Posts: 722

I'm old. Time passes differently.

2016-01-18 15:34

doynax
Account closed

Registered: Oct 2004
Posts: 212

Quoting Fungus

Hrm setting the clock divider to the wrong frequency when reading data will result in incorrect data being read. So it can't be derived from reading the incoming flux transitions.

I was just reading up on Frequency Shift Keying (like tape uses) and it would seem that the drives are using the same type of technique in hardware rather than software. Since the incorrect divider frequency would change the time period for flux transitions to be valid, no?

As near as I can tell it is implemented by a counter clocked by a multiple of the bit timer. On a flux transition the counter is reset and a 1-bit shifted out, whereas a lack of flux changes runs up the timer until a 0-bit is shifted out when no data seems to be forthcoming.
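A minimal Python sketch of that scheme, working on the gaps between flux transitions rather than on clock ticks. The name `decode_flux`, the `cell` parameter and the choice to decide each 0-bit half a cell after the previous cell boundary are assumptions for illustration.

```python
def decode_flux(gaps, cell=1.0):
    """Turn gaps between flux transitions into bits.

    gaps: time from each transition to the next, in nominal bit cells.
    cell: bit-cell length the drive is clocked for, in the same unit.
    """
    bits = []
    for gap in gaps:
        bits.append(1)            # the transition resets the timer: a 1-bit
        elapsed = 1.5 * cell      # first 0-bit is decided 1.5 cells later
        while elapsed < gap:      # no new transition yet: shift out a 0-bit
            bits.append(0)
            elapsed += cell
    return bits


print(decode_flux([1, 2, 3]))             # clock matches the recording
print(decode_flux([1, 2, 3], cell=0.75))  # clock running too fast
```

With a matching clock the gaps 1, 2, 3 decode as 1, 10, 100. With the cell set to 0.75 the same gaps decode as 1, 100, 1000, which illustrates why a wrong divider setting yields wrong data even though the timer is resynchronized on every transition.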

Quoting Fungus

Also the read is needed to clear the latched value, that's pretty straight forward and normal VIA/CIA behavior. If it's not cleared fast enough then clocked in transitions will be missed.

Indeed, I don't know how I missed that. I had somehow completely forgotten about the internal VIA latching and gotten the notion that it was handled externally.

Quoting Martin Piper

Probably can now. Years ago VICE emulation with regards to drive code timing was very poor. Hoxs and real hardware were the only options.
I must try it in the newest VICE sometime.

Well, Hoxs does have the advantage of being able to step cycle-by-cycle, which comes in handy when debugging something timing-critical. Plus the lazy C64/drive synchronization in VICE can be a tad confusing when single-stepping through an IEC communication loop in parallel.

Of course for general code VICE makes up for it all with being able to import label files and script breakpoints/assertions.

2016-01-18 17:49

Fungus

Registered: Sep 2002
Posts: 686

Yes, it is using FSK then, the flux transitions are all edge triggered and half waves. This makes sense since it's the technology they use for tape and modem communications too. It's old and easy to implement and works. I was looking at the timing diagrams in the PRG after reading this and that appears to be a correct assumption which is easily verifiable by anyone with a scope or capture tool.

So that does mean that the divider has to be set to the correct speed or the in-clocking will produce invalid results. It's possible this could be exploited for copy protection purposes... hrm interesting idea.

2016-01-18 19:25

Kabuto
Account closed

Registered: Sep 2004
Posts: 58

Wondering how reliably a disk could be read where all pulses (= ranges of same flux direction) are shortened by some % of a bit's duration. This could be abused for creating nearly uncopyable disks.

With standard GCR and its average pulse duration of 1.6 bits, shortening all of them by 25% would allow squeezing in 18% more data.

But copying such a disk would be impossible with standard equipment. You could slow down the motor, but that would reduce the length of 3-bit pulses (i.e. encoded bits 1001) to 2.5 bits, and as mentioned earlier that's exactly where the electronics decide whether to treat it as 101 or 1001.
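The 18% figure follows directly from the numbers given: if every pulse loses a quarter of a bit cell, the average pulse shrinks from 1.6 to 1.35 cells, so the same stretch of track holds 1.6 / 1.35 times as many pulses. As a quick check in Python (the function name `capacity_gain` is made up for this example, and the 1.6-cell average is taken from the post as-is):

```python
def capacity_gain(avg_pulse_cells, shortened_by_cells):
    """Fractional gain in pulses per track when every pulse is shortened."""
    return avg_pulse_cells / (avg_pulse_cells - shortened_by_cells) - 1


gain = capacity_gain(1.6, 0.25)
print(f"{gain:.1%}")
```

This works out to roughly 18.5%.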

2016-01-18 19:28

chatGPZ

Registered: Dec 2001
Posts: 11386

thats actually not uncommon - vmax for example did this iirc

2016-01-18 19:51

tlr

Registered: Sep 2003
Posts: 1790

Quoting Fungus

Yes, it is using FSK then, the flux transitions are all edge triggered and half waves. This makes sense since it's the technology they use for tape and modem communications too. It's old and easy to implement and works.

This is not what we'd normally call FSK. FSK as used on CBM tapes has variable bit lengths.

The scheme is rather a very crude PLL trying to lock on to the rate of bits by just resetting every time a '1' is seen.
The encoding is still just constant length '1's or '0's which have the requirement that no more than two '0's can be in a row.

There are two reasons for the requirement:
1. if there are too many '0's the PLL can't keep track of the bits within the variation of speed it is required to handle.
2. there is an anomaly in the implementation that wraps around after a '1' and three '0's. When it wraps it will generate a spurious '1' (and repeat the process). This is what is seen if the track contains no flux transitions. Three '0's in a row _can_ work but is unreliable. I seem to remember that newer 1541s have more problems with these.

2016-01-18 19:54

chatGPZ

Registered: Dec 2001
Posts: 11386

too bad mr.drew cant comment at this point :)

2016-01-18 21:09

Kabuto
Account closed

Registered: Sep 2004
Posts: 58

Quote: Quoting Fungus
Yes, it is using FSK then, the flux transitions are all edge triggered and half waves. This makes sense since it's the technology they use for tape and modem communications too. It's old and easy to implement and works.

This is not what we'd normally call FSK. FSK as used on CBM tapes has variable bit lengths.

The scheme is rather a very crude PLL trying to lock on to the rate of bits by just resetting every time a '1' is seen.
The encoding is still just constant length '1's or '0's which have the requirement that no more than two '0's can be in a row.

There are two reasons for the requirement:
1. if there are too many '0's the PLL can't keep track of the bits within the variation of speed it is required to handle.
2. there is an anomaly in the implementation that wraps around after a '1' and three '0's. When it wraps it will generate a spurious '1' (and repeat the process). This is what is seen if the track contains no flux transitions. Three '0's in a row _can_ work but is unreliable. I seem to remember that newer 1541s have more problems with these.

Hmm... according to the VICE doc sequences of 3 0's should be perfectly stable, even longer sequences (except that every 4th 0 becomes a 1), as long as timing deviations don't destroy sync. This also makes me wonder how copy protection schemes worked that relied on the 1541 reading different data every time a long sequence of 0s was read. Maybe it wasn't really long sequences of 0s but actual noise or pulses of 1.5 times a bit's length that deliberately caused random data to be read.

Could the analog circuitry play a role too? I'm not advanced enough in electronics to understand it... Reading back data written to magnetic media gives you the 1st derivative of what was originally written, so this needs to be taken care of, and it looks like they added a low-pass filter for that purpose. I don't know how well this can deal with unexpected pulse lengths, since it might just be tuned for the sweet spot of usual pulse lengths to be able to better cope with weak signals.

2016-01-18 21:17

Zer0-X
Account closed

Registered: Aug 2008
Posts: 78

1541 has a simple 4 bit counter that wraps around producing the 100010001000...

1541-II has a different implementation that doesn't have this "limit", so in theory it can produce an infinite amount of 0s.

Variation of spindle speed is the reason runs of more than 3x 0s become unreliable to count, more so the further the speed is off from the norm.

The other variation is the read head amplifier circuitry, which can easily pick up noise that resets the counter, thus injecting random 1s into a long stream of 0s.

And there's also a filter to hide pulses that are too close together from the counter logic.
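The wrap-around is easy to reproduce in a small Python model. Everything specific here is an assumption for illustration: four counter ticks per bit cell, a bit shifted out in the middle of each cell, a 1 whenever the counter is still in its first cell after a reset, and a non-wrapping counter (`wraps=False`) standing in for the 1541-II, whose actual circuit is not described in this thread. The name `read_cells` is made up.

```python
def read_cells(n_cells, flux_ticks=(), wraps=True):
    """Simulate n_cells bit cells of the counter-based data separator.

    flux_ticks: tick numbers (4 ticks per cell) at which a flux
    transition resets the counter. The counter starts at 0, as if a
    transition had just been seen.
    """
    flux = set(flux_ticks)
    counter = 0
    bits = []
    for tick in range(n_cells * 4):
        if tick in flux:
            counter = 0                    # transition: restart the count
        if counter % 4 == 2:               # middle of a bit cell
            bits.append(1 if counter < 4 else 0)
        counter += 1
        if wraps:
            counter &= 0x0F                # 4-bit counter rolls over
    return bits


print("".join(map(str, read_cells(12))))               # no flux at all
print("".join(map(str, read_cells(12, wraps=False))))  # non-wrapping variant
```

With no transitions at all the wrapping counter yields 100010001000, while the non-wrapping variant yields a single 1 followed by 0s.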

2016-01-18 21:18

chatGPZ

Registered: Dec 2001
Posts: 11386

3 0s "work" (most of the time) on the very old longboard version... on later revisions it becomes more and more unstable.

2016-01-18 21:39

doynax
Account closed

Registered: Oct 2004
Posts: 212

Just how large are the margins for reliably recording at higher bit densities anyway?

40 tracks seems to be reasonably reliable after all. So then taking the density of track 40 at speed-zone 0 as an upper bound would allow the inner tracks to use lower zones.

If my figures are correct zone 3 up to track 26, zone 2 up to track 33, zone 1 up to track 36 and zone 0 up to track 40. Thereby gaining ~5% or so of extra storage.
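A rough sanity check of that gain in Python, counting whole sectors. The assumptions are that the usual sectors-per-track counts for each speed zone (21, 19, 18 and 17) carry over unchanged to the shifted zone boundaries, and that the baseline is a 40-track disk with the standard boundaries at tracks 17, 24 and 30. The name `total_sectors` is made up for this example.

```python
def total_sectors(layout):
    """layout: list of (last_track, sectors_per_track) in ascending order."""
    total = 0
    first = 1
    for last, per_track in layout:
        total += (last - first + 1) * per_track
        first = last + 1
    return total


# Zone 3, 2, 1, 0 in that order.
standard_40 = [(17, 21), (24, 19), (30, 18), (40, 17)]
shifted_40 = [(26, 21), (33, 19), (36, 18), (40, 17)]

base = total_sectors(standard_40)
more = total_sectors(shifted_40)
print(base, more, f"{more / base - 1:.1%}")
```

Under these assumptions the count goes from 768 to 801 sectors, a gain of a little over 4%, which is in the same ballpark as the estimate above.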

2016-01-19 08:00

Fungus

Registered: Sep 2002
Posts: 686

hrm right the clock itself controls the data rate, I forgot about that. But it will still mess with the reading if it's set wrong, I did some tests with that once, and I would get unreliable data. There's a trick that pirate slayer does with toggling the data rate to change the track framing.

Something else that's interesting is the 4040 (or was it 8050?) had hardware GCR decoding using a rom with some address line trickery. This could be reproduced and stuck into a 1541 pretty easily.

Yes V-Max already does the pulse shortening, pretty nifty stuff. Also short syncs and other weird stuff like signature checking in pre and post gaps. Lord Crass could explain some of v-max's mechanisms more clearly, he's got a lot of experience dealing with v-max in general.

Also only 38 tracks are reliable, very old mechanisms will get the head stuck any higher than that. Some of the newer drives can go in to 41, but most will lock up higher than 40.

2016-01-19 08:51

chatGPZ

Registered: Dec 2001
Posts: 11386

40 is perfectly fine, thats what the mechs are made for actually.... it only was limited to 35 because of crappy media in the late 70s :)

prologicdos iirc uses a decoder ROM for GCR decoding

2016-01-19 12:43

doynax
Account closed

Registered: Oct 2004
Posts: 212

Quoting Groepaz

40 is perfectly fine, thats what the mechs are made for actually.... it only was limited to 35 because of crappy media in the late 70s :)

Do you happen to have a source for that?

It would be comforting to know for certain the maximum number of tracks which can be targeted safely.

2016-01-19 12:56

chatGPZ

Registered: Dec 2001
Posts: 11386

it should be mentioned in the datasheets for the mech - however, no idea where to find them.

2016-03-14 08:19

Bitbreaker

Registered: Oct 2002
Posts: 508

I guess this is related to that:
https://sourceforge.net/p/vice-emu/bugs/582/
Also, do not expect this to be $ff on real hardware, except maybe if you read $1c01 just before the sync arrives :-) I injured myself as well at that point back then, as i hoped to save those 3 bytes, but you will fail miserably on real hardware if you do so :-)

2016-03-14 17:13

Fungus

Registered: Sep 2002
Posts: 686

The data sheets say they should go to track 40, but you know commodore and their cheapness. They used faulty mechs sometimes and they do indeed get stuck past track 38. I had one such drive. It was mentioned somewhere else at some point too... maybe kracker jax documentations or the v-max usenet thread, my memory fails me.
