[CSDb] - User Forums - Drive code for IECHost

2015-11-22 14:12

TCE

Registered: Sep 2011
Posts: 29

Drive code for IECHost

Hi guys.

I recently started the IECHost project: an IEC bus master that can control Commodore drives such as the 1541 and 1541-II, and possibly other Commodore devices interfaced via the IEC bus such as printers and plotters.

I got to the point where I can upload code into my 1541's RAM and then use a common 2-bit protocol to transfer data to and from the drive pretty quickly. In particular, the IECHost side is considerably faster than a C64, so I can shorten the 1541 byte receive routine by at least 6 cycles (8 or 12 in a better or best-case scenario, but I need to do more testing for that).

I've created a simple server application that runs on the drive and waits for a command to which it replies with a message: both transfers use the 2-bit protocol.

; Main loop on the drive
mloop
	jsr readbyte
	cmp #'h'
	bne mloop

	lda #hmsge-hellomsg	; number of bytes that we'll be sending
	jsr sendbyte

	ldx #hmsge-hellomsg
	stx $06

sendhello
	ldx $06
	lda hellomsg-1,x	; send message to IECHost using the fast protocol
	jsr sendbyte
	dec $06
	bne sendhello

	beq mloop

hellomsg
	.text "!DLROW OLLEH"	; hello world!
hmsge
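
The exchange above can be modelled on a PC to see why the message is stored reversed: the drive loop counts X down from the message length and reads hellomsg-1,x, so the last stored byte goes out first. The following is only a sketch of that logic, not TCE's host code. The function names drive_reply_hello and host_parse_hello are hypothetical, and character set conversion (PETSCII vs ASCII) is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Message exactly as stored in drive RAM (reversed). */
static const char hellomsg[] = "!DLROW OLLEH";

/* Hypothetical model of the drive's reply to the 'h' command.
 * out[0] is the length byte, followed by the payload in wire order.
 * Returns the number of bytes written to out. */
size_t drive_reply_hello(unsigned char *out, size_t cap)
{
    size_t len = strlen(hellomsg);
    size_t n = 0;

    assert(len < 256 && cap >= len + 1);
    out[n++] = (unsigned char)len;                  /* lda #hmsge-hellomsg / jsr sendbyte */
    for (size_t x = len; x != 0; x--)               /* ldx $06 ... dec $06 / bne sendhello */
        out[n++] = (unsigned char)hellomsg[x - 1];  /* lda hellomsg-1,x / jsr sendbyte */
    return n;
}

/* Hypothetical host side: treats the wire bytes as a length-prefixed message
 * and copies it into msg as a NUL-terminated string. Returns the length. */
size_t host_parse_hello(const unsigned char *wire, size_t n, char *msg, size_t cap)
{
    size_t len;

    assert(n >= 1);
    len = wire[0];
    assert(n >= len + 1 && cap >= len + 1);
    memcpy(msg, wire + 1, len);
    msg[len] = '\0';
    return len;
}
```

Feeding the output of drive_reply_hello into host_parse_hello yields the text in reading order, so the host does not need to reverse anything.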

In the short term I'd like to add disk dumping through the use of the server application and fast 2-bit protocol but I also thought to ask for opinions and suggestions here.
Is there anything you would think worthwhile adding at this stage? Would you be interested in writing your own flavour of the server code?

Have a say, join the fun!

Cheers,

TCE/HF

2015-12-07 15:12

TCE

Registered: Sep 2011
Posts: 29

Hi all.

Quick update on the status of IECHost: I had a chance to finish the drive code for sector reading, using the DOS API as per below:

chkr	cmp #'r'		; read sector command
	bne mloop

	jsr readbyte		; read track number
	sta $0c
	jsr readbyte		; read sector number
	sta $0d

	lda #$03		; use buffer #3 ($0600-$06ff)
	sta $f9

	cli
	jsr $d586		; read sector into buffer
	sei

	lda $03			; send error code to IECHost
	jsr sendbyte

	lda $03
	cmp #$01		; if the job completed successfully
	bne mloop

	lda #$00
	sta srclow

sndsec
srclow=*+1
	lda $0600		; send buffer contents to IECHost
	jsr sendbyte
	inc srclow
	bne sndsec

Here's the corresponding code running on IECHost:

	fastaccess_send_byte('r');
	fastaccess_send_byte(track);
	fastaccess_send_byte(sector);

	b = fastaccess_receive_byte();

	switch (b) {
	case 0x01: /* Job completed successfully */
		fastaccess_read(buffer, 256);
		break;
	...
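
To turn the per-sector read above into a full D64 dump, the host needs to know how many sectors each track holds and where each block lands in the image. The sketch below assumes a standard 35-track disk and a D64 image without error bytes. The helper names are hypothetical and it does not call any of the fastaccess_* functions.

```c
#include <assert.h>

/* Sectors per track on a standard 35-track 1541 disk. */
int sectors_in_track(int track)
{
    assert(track >= 1 && track <= 35);
    if (track <= 17) return 21;
    if (track <= 24) return 19;
    if (track <= 30) return 18;
    return 17;
}

/* Byte offset of (track, sector) inside a D64 image without error info. */
long d64_offset(int track, int sector)
{
    long blocks = 0;

    assert(sector >= 0 && sector < sectors_in_track(track));
    for (int t = 1; t < track; t++)
        blocks += sectors_in_track(t);
    return (blocks + sector) * 256L;
}

/* Total image size: offset of the last block plus one block. */
long d64_size(void)
{
    return d64_offset(35, 16) + 256;
}
```

A dump loop would then walk tracks 1 to 35, request every sector of the track with the 'r' command, and copy the 256 received bytes to d64_offset(track, sector).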

In order to get a speed gain, e.g. to get D64 dumping times down to WarpCopy64 levels, I am looking into custom sector reading without GCR decoding on the drive (which I can instead do on the fly at the destination).
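
Decoding at the destination could look roughly like the sketch below, which converts between groups of 4 data bytes and 5 GCR bytes using the standard Commodore 4-to-5 code table. This is not IECHost's actual code. The function names are made up for illustration and only the table values are standard.

```c
#include <assert.h>
#include <stdint.h>

/* Commodore 4-to-5 GCR code: index = nybble, value = 5-bit code. */
static const uint8_t gcr_enc[16] = {
    0x0A, 0x0B, 0x12, 0x13, 0x0E, 0x0F, 0x16, 0x17,
    0x09, 0x19, 0x1A, 0x1B, 0x0D, 0x1D, 0x1E, 0x15
};

/* Returns the nybble for a 5-bit code, or -1 if the code is not valid GCR. */
static int gcr_lookup(unsigned code)
{
    assert(code < 32);
    for (int i = 0; i < 16; i++)
        if (gcr_enc[i] == code)
            return i;
    return -1;
}

/* Encodes 4 data bytes into 5 GCR bytes, high nybble first. */
void gcr_encode4(const uint8_t in[4], uint8_t out[5])
{
    uint64_t bits = 0;

    for (int i = 0; i < 4; i++) {
        bits = (bits << 5) | gcr_enc[in[i] >> 4];
        bits = (bits << 5) | gcr_enc[in[i] & 0x0F];
    }
    for (int i = 4; i >= 0; i--) {
        out[i] = (uint8_t)(bits & 0xFF);
        bits >>= 8;
    }
}

/* Decodes 5 GCR bytes into 4 data bytes.
 * Returns 0 on success, -1 if any 5-bit group is invalid. */
int gcr_decode5(const uint8_t in[5], uint8_t out[4])
{
    uint64_t bits = 0;

    for (int i = 0; i < 5; i++)
        bits = (bits << 8) | in[i];
    for (int i = 3; i >= 0; i--) {
        int lo = gcr_lookup((unsigned)(bits & 0x1F));
        int hi = gcr_lookup((unsigned)((bits >> 5) & 0x1F));
        if (lo < 0 || hi < 0)
            return -1;
        out[i] = (uint8_t)((hi << 4) | lo);
        bits >>= 10;
    }
    return 0;
}
```

The linear table search keeps the sketch short; a 32-entry reverse table would be the obvious choice if decoding speed on the host mattered.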

I know there have been advances in the area since WarpCopy64 came out, so if you have any recommendations, please feel free to drop a line here.
Cheers,

TCE/HF

2015-12-17 22:49

Repose

Registered: Oct 2010
Posts: 222

Can we see your sendbyte routine, from the drive side? I worked one out before, and it seems to me that it should take about 32 cycles.

I had an idea for synchronizing just based on bit changes on the lines, ie. use the pin change interrupt on the server side to synchronize the bit timing. Once you have any two pin changes, you know the relative clock speed and can keep it updated.

You should be able to send the raw data in real time in one pass, no need for sector seek etc., just start sending out whatever passes under the head and interpret that as track data later.

I believe there's just enough time to do this at the highest track density.

2015-12-18 11:27

TCE

Registered: Sep 2011
Posts: 29

Quote: Can we see your sendbyte routine, from the drive side? I worked one out before, and it seems to me that it should take about 32 cycles.

I had an idea for synchronizing just based on bit changes on the lines, ie. use the pin change interrupt on the server side to synchronize the bit timing. Once you have any two pin changes, you know the relative clock speed and can keep it updated.

You should be able to send the raw data in real time in one pass, no need for sector seek etc., just start sending out whatever passes under the head and interpret that as track data later.

I believe there's just enough time to do this at the highest track density.

Thanks for your reply.

I've now replaced the sector read approach with one based on lft's on-the-fly GCR decoder. I also let the drive scan a track until all sectors are read and sent, instead of requesting individual sectors from IECHost.

After that, I ended up with a testing setup in VICE with a C64 program that receives data from the drive. My new faster transfer routines seem to not work as they should so I am not getting the expected data across in the sector buffer on the C64, but I see the sector data is exactly as it should be in the sector buffer on the drive itself at $0100-$01ff (i.e. it's in reverse order and off by a single byte).
With this setup the disk program manages to read in one sector out of three at each disk revolution: the rest of the time it's transferring data to the C64. I am fairly convinced that by replacing the C64 side with IECHost, whose clock can be run at a different speed than the C64's, I might get away with fewer cycles "wasted" for synchronization purposes and end up with just two revolutions per track during a disk dump.
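
The difference between reading one sector in three and one in two can be illustrated with a toy model of the head passing over sector slots. It assumes that reading takes exactly one slot and that the transfer keeps the drive busy for a whole number of following slots; gaps, speed variation and track stepping are ignored. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Counts how many sector slots pass under the head until every sector of a
 * track has been read, when each read is followed by (step - 1) busy slots
 * during which the drive is transferring and cannot read. */
int sector_slots_needed(int sectors, int step)
{
    bool done[32] = { false };
    int remaining = sectors;
    int pos = 0;
    int busy = 0;

    assert(sectors > 0 && sectors <= 32 && step >= 1);
    while (remaining > 0) {
        int s = pos % sectors;
        if (busy > 0) {
            busy--;                 /* still transferring the previous sector */
        } else if (!done[s]) {
            done[s] = true;         /* read this sector, then transfer it */
            remaining--;
            busy = step - 1;
        }
        pos++;
    }
    return pos;
}
```

For a 21-sector track this model gives 63 slots (three revolutions) with step 3 and 41 slots (just under two revolutions) with step 2, which is where the two-revolution target comes from.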

In fact, by disabling the transfer from the drive to the C64 in VICE my whole disk dump program running on the drive finishes its job in just about 10 seconds.

All of the above stats need further validation, but they seem to be consistent with expectations. If they were one order of magnitude better than expected I'd avoid publishing any of them until verified in the field.

If you wrote a fast protocol for transferring data from the drive to the C64 I'd be interested in reading more about it. Again, I'd like to stress the fact that the core clock in IECHost is not a limitation and that the device runs uninterrupted during a transfer so it might be possible to push things to the very limit and implement a 2 revolution whole track dump tool.

2015-12-19 09:27

doynax
Account closed

Registered: Oct 2004
Posts: 212

Quoting Repose

I had an idea for synchronizing just based on bit changes on the lines, ie. use the pin change interrupt on the server side to synchronize the bit timing. Once you have any two pin changes, you know the relative clock speed and can keep it updated.

You should be able to send the raw data in real time in one pass, no need for sector seek etc., just start sending out whatever passes under the head and interpret that as track data later.

I believe there's just enough time to do this at the highest track density.

I agree with your approach in principle but I can't quite get the timing to work out.

At the highest density you'll need to hit somewhere a bit shy of 26-cycles per byte, depending on how much tolerance for variance in rotational speeds you want to put up with.
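
The 26-cycle figure follows from the nominal bit rates of the four 1541 speed zones, where the bit cell is derived from a 16 MHz clock divided by 13, 14, 15 or 16 and then by 4. The sketch below computes the nominal drive CPU cycles (at 1 MHz) per GCR byte for a given track. Real disks deviate with rotational speed, which is why the practical budget is a little below these values. The function name is hypothetical.

```c
#include <assert.h>

/* Nominal 1 MHz drive CPU cycles per GCR byte passing under the head. */
int cycles_per_gcr_byte(int track)
{
    int divisor;    /* 16 MHz clock divider selected by the density bits */

    assert(track >= 1 && track <= 35);
    if (track <= 17)      divisor = 13;
    else if (track <= 24) divisor = 14;
    else if (track <= 30) divisor = 15;
    else                  divisor = 16;

    /* Bit cell = 4 * divisor / 16 MHz = divisor / 4 microseconds,
     * and one byte is 8 bit cells. */
    return divisor * 8 / 4;
}
```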

The best I can come up with is a basic 28-cycle best-case inner loop, assuming ATN ACK masking via $1802. Add to that occasional byte synchronization and loop overhead, presumably with ATN interrupts to break out of it all. Perhaps 30 cycles in all on average.

Of course both streaming reads and writes should easily be doable on 1571 drives in 2 MHz mode.

edit: Come to think of it the floppy bit-stream doesn't afford clock-recovery on the receive end when transmitted with 2-bits in parallel, and even if it did BVC jitter would cause trouble.

2015-12-19 11:15

tlr

Registered: Sep 2003
Posts: 1714

Quoting doynax

edit: Come to think of it the floppy bit-stream doesn't afford clock-recovery on the receive end when transmitted with 2-bits in parallel, and even if it did BVC jitter would cause trouble.

We are talking cross-platform here so assuming the raw bits could be sent out, shouldn't there always be something toggling due to the encoding? (well, maybe not during sync)
Synchronization could then easily be done by correlation on a sampled buffer load.

Perhaps the bvc jitter could be solved in the same way assuming enough bits are transferred for each bvc?

The main question is though, can enough bits be sent out per byte in average?

2015-12-19 13:17

TCE

Registered: Sep 2011
Posts: 29

Quoting tlr

Quoting doynax
edit: Come to think of it the floppy bit-stream doesn't afford clock-recovery on the receive end when transmitted with 2-bits in parallel, and even if it did BVC jitter would cause trouble.

We are talking cross-platform here so assuming the raw bits could be sent out, shouldn't there always be something toggling due to the encoding? (well, maybe not during sync)
Synchronization could then easily be done by correlation on a sampled buffer load.

Perhaps the bvc jitter could be solved in the same way assuming enough bits are transferred for each bvc?

The main question is though, can enough bits be sent out per byte in average?

So you guys are suggesting that instead of doing on-the-fly GCR decoding, I go for on-the-fly transmission of each GCR image byte, possibly starting after the Header and Data IDs?
That is something I'd be happy to look into, but I'd have to set up a test application for this one too, which probably means work to be done next year.

But let's go back to my approach for a second. From measurements in VICE I get about 28237 CPU cycles between the read head crossing the sync mark of sector n and sector n+3 on track 1, which gives around 9412 CPU cycles per sector. With my own approach (on-the-fly GCR decoding) that leaves me 28237 / 3 / 259 = 36.341 CPU cycles per byte transmitted (256 bytes in a sector plus track, sector, and checksum = 259 bytes) on the IEC bus with a fast protocol in order to dump a whole track in just two revolutions:

1. dump sector n
2. transmit sector n (while the read head is crossing sector n+1)
3. dump sector n+2
4. transmit sector n+2 (while the read head is crossing sector n+3)
5. and so on...
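
The per-byte budget quoted above is plain division and can be reproduced as follows. The function name is made up, and the inputs are the figures from the post (28237 cycles for three sectors as measured in VICE, 259 bytes per sector).

```c
#include <assert.h>

/* Cycles available per transmitted byte when one sector has to cross the bus
 * in the time the head needs to pass one sector. */
double cycles_per_byte_budget(long window_cycles, int sectors_in_window,
                              int bytes_per_sector)
{
    assert(window_cycles > 0 && sectors_in_window > 0 && bytes_per_sector > 0);
    return (double)window_cycles / sectors_in_window / bytes_per_sector;
}
```

With cycles_per_byte_budget(28237, 3, 259) the result is roughly 36.34, so any protocol that needs more than about 36 cycles per byte on average cannot keep up with the every-other-sector schedule.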

It should be possible to judge the feasibility of this approach (to rule it in, if not yet entirely out) by checking what the fastest known transfer protocol can achieve in terms of CPU cycles per byte, and assuming one can do slightly better as IECHost's clock is a multiple of the 1541's.

I am open to suggestions, of course.

2015-12-19 17:37

doynax
Account closed

Registered: Oct 2004
Posts: 212

Quoting tlr

We are talking cross-platform here so assuming the raw bits could be sent out, shouldn't there always be something toggling due to the encoding? (well, maybe not during sync)

Not necessarily. Assuming in-order bit transmission then a sequence of $55 bytes would arrive as static 01 or 10 pairs.

What I forgot is that we do have some leeway in the bit-order. Consider:

	lax $1c01	; GCR byte from the head (VIA2 port A) into both A and X
	sta $1800
	asl
	sta $1800
	lda swizzle,x
	sta $1800
	asl
	sta $1800

If the "swizzle" table mirrors the bits instead of swapping the nybbles then repeating 01/10 pairs would correspond to invalid %11000011/%00111100 GCR bytes.
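
The claim about the swizzle table can be checked by brute force. The sketch below models the four writes to $1800 for every possible byte, keeping only the two output bits (bit 1 = DATA OUT, bit 3 = CLK OUT), and lists the bytes that leave both lines static at different levels. It is a model of the loop above under those assumptions, with hypothetical function names, and it ignores the other bits of $1800.

```c
#include <assert.h>
#include <stdint.h>

#define IEC_OUT_MASK 0x0A   /* $1800 bit 1 = DATA OUT, bit 3 = CLK OUT */

uint8_t swap_nybbles(uint8_t b) { return (uint8_t)((b << 4) | (b >> 4)); }

uint8_t mirror_bits(uint8_t b)
{
    uint8_t r = 0;
    for (int i = 0; i < 8; i++)
        if (b & (1u << i))
            r |= (uint8_t)(0x80u >> i);
    return r;
}

/* The four line states written to $1800 for one GCR byte. */
void line_states(uint8_t gcr, uint8_t (*swizzle)(uint8_t), uint8_t out[4])
{
    uint8_t s = swizzle(gcr);
    out[0] = gcr & IEC_OUT_MASK;                  /* sta $1800 */
    out[1] = (uint8_t)(gcr << 1) & IEC_OUT_MASK;  /* asl / sta $1800 */
    out[2] = s & IEC_OUT_MASK;                    /* lda swizzle,x / sta $1800 */
    out[3] = (uint8_t)(s << 1) & IEC_OUT_MASK;    /* asl / sta $1800 */
}

/* Collects, in ascending order, the bytes that keep both lines static at
 * different levels (a constant 01 or 10 pair). Returns how many were found. */
int static_pair_bytes(uint8_t (*swizzle)(uint8_t), uint8_t found[], int cap)
{
    int n = 0;
    for (int b = 0; b < 256; b++) {
        uint8_t st[4];
        line_states((uint8_t)b, swizzle, st);
        int is_static = st[0] == st[1] && st[1] == st[2] && st[2] == st[3];
        int lines_differ = st[0] == 0x02 || st[0] == 0x08;
        if (is_static && lines_differ) {
            assert(n < cap);
            found[n++] = (uint8_t)b;
        }
    }
    return n;
}

/* Longest run of 0 bits in an endless repetition of byte b (8 if b is 0). */
int max_zero_run_repeated(uint8_t b)
{
    unsigned w = ((unsigned)b << 8) | b;  /* two copies cover wrap-around runs */
    int best = 0, run = 0;

    if (b == 0)
        return 8;
    for (int i = 15; i >= 0; i--) {
        run = (w & (1u << i)) ? 0 : run + 1;
        if (run > best)
            best = run;
    }
    return best;
}
```

With mirror_bits the only static bytes are $3C and $C3, and a repeated stream of either contains four consecutive zero bits, which valid GCR (at most two zeros in a row) never produces. With swap_nybbles the static bytes are $33 and $CC, whose zero runs stay at two.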

Quoting tlr

Synchronization could then easily be done by correlation on a sampled buffer load. Perhaps the bvc jitter could be solved in the same way assuming enough bits are transferred for each bvc?

Certainly. Assuming proper CBM formatting, with sufficient analysis you almost certainly have enough clues to decipher anything not intentionally crafted to confuse the system.

Perhaps the BVC effects may also be alleviated by using BVC *+2 to induce a single cycle of jitter.

Quote:

The main question is though, can enough bits be sent out per byte in average?

I can't see how. At a minimum we need one GCR load and four stores per byte, leaving less than 6 cycles for bit-swizzling and synchronization/loop overhead. Possibly some nasty shenanigans with JMP ($1C00) and 256x unique transfer routines and partially frequency-modulated data, but certainly not in 2k of RAM.
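
The cycle counts behind this estimate can be tallied from the standard 6502 timings: 4 cycles for an absolute load or store, 2 for ASL on the accumulator, and 4 for an absolute,X load that does not cross a page. The sketch below adds them up; it assumes the swizzle table is page-aligned and the function names are hypothetical.

```c
#include <assert.h>

enum {
    CYC_LAX_ABS   = 4,  /* undocumented LAX absolute */
    CYC_STA_ABS   = 4,
    CYC_ASL_A     = 2,
    CYC_LDA_ABS_X = 4   /* no page crossing assumed */
};

/* One GCR load plus four stores: the bare minimum per byte. */
int min_transfer_cycles(void)
{
    return CYC_LAX_ABS + 4 * CYC_STA_ABS;
}

/* The full swizzle loop body: minimum plus two shifts and one table lookup. */
int swizzle_loop_cycles(void)
{
    return min_transfer_cycles() + 2 * CYC_ASL_A + CYC_LDA_ABS_X;
}

/* Cycles left over for swizzling and loop overhead within a byte budget. */
int slack_cycles(int budget_per_byte)
{
    assert(budget_per_byte > 0);
    return budget_per_byte - min_transfer_cycles();
}
```

The minimum comes to 20 cycles, which leaves 6 against a nominal 26-cycle byte time, while the complete loop body costs 28 cycles before any synchronization or branching.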

It is easily doable on the 1571 though, which doesn't seem an entirely unreasonable requirement for a fast dumping utility. It might even be possible to dump the back-side from the reversed bit-stream with some careful reconstruction.

2015-12-20 06:42

Flavioweb

Registered: Nov 2011
Posts: 447

Isn't it possible to transfer raw GCR data to IECHost and rebuild the bytes there?

2015-12-21 20:45

TCE

Registered: Sep 2011
Posts: 29

Quoting Flavioweb

Isn't it possible to transfer raw GCR data to IECHost and rebuild the bytes there?

Yes, that is possible indeed and is what WarpCopy64 does as well AFAIK.
However, I was hoping that by transferring decoded GCR, a whole sector (259 bytes, including track, sector, and checkbyte) could cross the IEC bus in the same time it takes for the read head to scan through a sector so that a whole track could be dumped in just two revolutions. 259 bytes is significantly less than 320-ish: that's why I chose to use lft's approach for on-the-fly GCR decoding.
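
The size difference comes from the 4-to-5 expansion of GCR: every 4 decoded bytes occupy 5 bytes on disk. A minimal sketch of that arithmetic, with a hypothetical function name, and rounding up to whole 5-byte groups:

```c
#include <assert.h>

/* Number of raw GCR bytes needed to hold the given number of decoded bytes,
 * rounded up to whole groups of 4 data bytes -> 5 GCR bytes. */
int gcr_length(int decoded_bytes)
{
    assert(decoded_bytes >= 0);
    return (decoded_bytes + 3) / 4 * 5;
}
```

For 256 data bytes this gives 320, and for the 259 bytes mentioned above it gives 325, so sending raw GCR moves roughly a quarter more data per sector.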

2017-04-12 14:44

TCE

Registered: Sep 2011
Posts: 29

I eventually came up with a non-on-the-fly GCR implementation of the disk imaging process using IECHost, "a la" WarpCopy64.

In fact, during earlier tests I pretty much convinced myself that even using on-the-fly GCR decoding, as per lft's code, and the fastest possible transfer protocol over the IEC bus, it's impossible to dump any track in just two disk revolutions, even when replacing the C64 with custom hardware.
As I mentioned before, that would require every sector transfer to complete in 28237/3 cycles, i.e. about 9412 cycles. In one of my tests I was able to get to around 12k cycles to transfer 257 bytes, which is still too much for dumping any track in only two revolutions.

The 12k figure applies to a protocol in which 4 bytes are transferred over the IEC bus before the C64 and the drive are re-synchronized. In total, each group of 4 bytes plus the re-sync takes 194 cycles, and a few more cycles are required for the initial synchronization. Therefore 259 bytes can only be transferred in about 12k cycles. Even if it were possible to remove the re-sync step while working with a custom-hardware IEC host, one would almost surely be unable to save the extra 3k cycles required for dumping any track in only two revolutions. That would mean saving 12 cycles on average per byte transferred, getting down to 34-ish, which is asking a lot. Fetching a byte from the drive's RAM with a "lda buff,x" alone takes 4 cycles, and each bit pair is held on the bus for 8 cycles: we're at 36 cycles already without any loop branching.
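
The figures in this post can be reproduced with a few lines of arithmetic. The sketch below uses hypothetical function names, takes the numbers from the text (4 bytes per 194-cycle group, 4-cycle fetch, four bit pairs held for 8 cycles each) and leaves out the initial synchronization cost.

```c
#include <assert.h>

/* Total cycles to move a block when bytes travel in fixed-size groups,
 * each group (including its re-sync) costing cycles_per_group. */
long grouped_transfer_cycles(int bytes, int group_bytes, int cycles_per_group)
{
    long groups;

    assert(bytes >= 0 && group_bytes > 0 && cycles_per_group > 0);
    groups = (bytes + group_bytes - 1) / group_bytes;   /* round up */
    return groups * cycles_per_group;
}

/* Lower bound per byte: one fetch plus the hold time of every bit pair,
 * with no loop or synchronization overhead. */
int floor_cycles_per_byte(int fetch_cycles, int bit_pairs, int hold_cycles)
{
    assert(fetch_cycles >= 0 && bit_pairs > 0 && hold_cycles > 0);
    return fetch_cycles + bit_pairs * hold_cycles;
}
```

For 259 bytes the grouped protocol needs 65 groups, or 12610 cycles, against a target of about 9412, and the 36-cycle floor per byte already uses up nearly the whole per-byte budget before any loop overhead is counted.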

Well, at least it was a lot of fun trying to beat WarpCopy64 with a 5 GBP hardware solution :)

2017-04-12 21:10

Repose

Registered: Oct 2010
Posts: 222

Can't you send bytes while reading from the drive? I know it takes 11 cycles to read a byte but then you have over 16 cycles left to send something. It would be a custom xfer and then switch to a faster one during the sector gap.

2017-04-12 22:11

TCE

Registered: Sep 2011
Posts: 29

Quote: Can't you send bytes while reading from the drive? I know it takes 11 cycles to read a byte but then you have over 16 cycles left to send something. It would be a custom xfer and then switch to a faster one during the sector gap.

Thanks for suggesting that. However, I am afraid it would complicate the approach to the extent that it might become difficult to maintain in future.

I am satisfied with the current solution and will only pursue a more feature rich experience for users.

Credit and glory go to Graham for a "first release" of the Warp mode. Happy to have got there, even if late. Second perhaps. Perhaps later than that, but there.
