[CSDb] - User Forums - Multi-master on IEC bus

2022-01-04 17:34

Repose

Registered: Oct 2010
Posts: 225

Multi-master on IEC bus

Continuing an off-topic thread from https://csdb.dk/forums/?roomid=11&topicid=91766, starting from #117.

The question at hand: imagine programming a set of drives to perform computations on behalf of the C64. How can you handle communication between the C64 and one or more drives such that the drives can initiate communication when they have a result to send?

Technical info of the CIA for reference:
https://ist.uwaterloo.ca/~schepers/MJK/cia6526.html

Mapping the C64:
https://e4aws.silverdr.com/project64/mapc64/ (search DD00)

Ultimate Drive Talk:
https://hackaday.com/2021/09/19/the-ultimate-commodore-1541-dri..

56576 $DD00 CI2PRA
Data Port Register A

Bits 0-1: Select the 16K VIC-II chip memory bank (11=bank 0, 00=bank 3)
Bit 2: RS-232 data output (Sout)/Pin M of User Port
Bit 3: Serial bus ATN signal output
Bit 4: Serial bus clock pulse output
Bit 5: Serial bus data output
Bit 6: Serial bus clock pulse input
Bit 7: Serial bus data input
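
For reference, the bit list above can be turned into masks. This is only a sketch for reading the table: the constant names and `decode_dd00` are made up for illustration, and the function returns the raw register bits without modelling the electrical polarity of the bus lines.

```python
# Bit assignments for CIA 2 port A ($DD00), copied from the list above.
# All names here are illustrative, not taken from any existing library.
VIC_BANK_MASK = 0b00000011  # bits 0-1: VIC-II bank select
RS232_OUT = 1 << 2          # bit 2: RS-232 data output
ATN_OUT = 1 << 3            # bit 3: serial bus ATN output
CLK_OUT = 1 << 4            # bit 4: serial bus clock output
DATA_OUT = 1 << 5           # bit 5: serial bus data output
CLK_IN = 1 << 6             # bit 6: serial bus clock input
DATA_IN = 1 << 7            # bit 7: serial bus data input


def decode_dd00(value: int) -> dict:
    """Return the raw bit fields of a $DD00 byte (no bus polarity applied)."""
    return {
        "vic_bank_bits": value & VIC_BANK_MASK,
        "rs232_out": bool(value & RS232_OUT),
        "atn_out": bool(value & ATN_OUT),
        "clk_out": bool(value & CLK_OUT),
        "data_out": bool(value & DATA_OUT),
        "clk_in": bool(value & CLK_IN),
        "data_in": bool(value & DATA_IN),
    }


fields = decode_dd00(0b11000011)
print(fields)
```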

56589 $DD0D CI2ICR
Interrupt Control Register
Bit 4: Read: was a signal sent on the FLAG line? (1=yes)
       Write: enable or disable FLAG line interrupt (1=enable, 0=disable)

It seems there's no way to make the serial bus an interrupt source, but by connecting FLAG to one of the pins you could.

Is polling that bad? What are the needs? For a use case, imagine plotting a fractal. I wouldn't mind if the screen filled in arbitrary chunks. You could easily poll at a 50/60 Hz rate and it would seem instant. The method would be for the drive to set CLK high; on the next poll, the C64 pulls it low and holds it; within up to another poll time the 1541 sees it's low and sets DATA high... anyhow, from there we can synchronize the two CPUs by the various methods used by drive loaders.
Ultimately, you can transfer bytes in about 32 cycles at a time. I've heard the CPUs can stay in sync for at least 40 bytes at a time.
The key to efficiency is keeping the two CPUs in sync; then the handshaking is fast, and from there you can do a big burst of data.
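
To put the numbers above in perspective, here is a rough back-of-the-envelope sketch. The 32 cycles per byte and the 40-byte burst come from the post; the PAL frame length of 63 cycles x 312 lines is a known C64 figure, and the function names are made up. It ignores badlines, interrupts and handshake overhead, so treat the results as upper bounds.

```python
PAL_CYCLES_PER_FRAME = 63 * 312  # 19656 CPU cycles in one PAL video frame
CYCLES_PER_BYTE = 32             # per-byte transfer cost quoted in the post
BURST_BYTES = 40                 # burst length the CPUs reportedly stay in sync for


def burst_cycles(n_bytes: int = BURST_BYTES) -> int:
    """Cycles needed to move one burst, ignoring handshake overhead."""
    return n_bytes * CYCLES_PER_BYTE


def max_bytes_per_frame() -> int:
    """Upper bound on bytes per frame if the bus did nothing but transfer."""
    return PAL_CYCLES_PER_FRAME // CYCLES_PER_BYTE


def burst_share() -> float:
    """Fraction of one frame that a single burst occupies."""
    return burst_cycles() / PAL_CYCLES_PER_FRAME


print(burst_cycles(), max_bytes_per_frame(), round(burst_share(), 3))
```

Under these assumptions a 40-byte burst costs 1280 cycles, roughly 6.5% of a PAL frame, so polling once per frame leaves plenty of room for several bursts.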

2022-01-04 18:24

Krill

Registered: Apr 2002
Posts: 2980

An idea that just occurred to me is some kind of time-division multiplex.

Consider a C-64 and 4 drives on the bus, but no other computer on the other end of the chain. Numbers are somewhat arbitrary, and some tolerance may be assumed.

The C-64 would assert and release ATN once a video frame.
This would trigger interrupts in all drives, which would reset a VIA timer used to determine valid time slots.

Once a drive has finished its work, it would wait for the next timeslot it may use. This could be something like a fixed-position window of 64 cycles within a period of 256 cycles, with each window reserved for exactly one of the 4 drives.

Once the beginning of such a slot is reached, the drive would sense the bus for ongoing communication (DATA or CLK asserted within say 32 cycles).

If communication is going on, it would wait for the next slot and retry.
Otherwise, it would assert DATA or CLK to signal the C-64 that it has finished work.
Then it would wait for an acknowledgement, send its ID followed by the burst of result data, then receive a new chunk of work to process.
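
The slot arithmetic described above could look roughly like this. It is a sketch under the numbers given in the post (4 drives, 64-cycle windows in a 256-cycle period); the function names, the 0-based drive numbering and the idea of counting cycles since the last ATN sync are illustrative assumptions.

```python
PERIOD = 256     # cycles per slot period (from the post)
SLOT_WIDTH = 64  # cycles reserved per drive (from the post)


def slot_owner(cycles_since_sync: int) -> int:
    """Drive index (0..3) whose window covers this point in time."""
    return (cycles_since_sync % PERIOD) // SLOT_WIDTH


def cycles_until_slot(drive: int, cycles_since_sync: int) -> int:
    """Cycles to wait for the start of the drive's next window (0 if it starts now)."""
    return (drive * SLOT_WIDTH - cycles_since_sync) % PERIOD


def may_signal(drive: int, cycles_since_sync: int, bus_busy_during_sense: bool) -> bool:
    """True if the drive is inside its own window and saw an idle bus while sensing."""
    return slot_owner(cycles_since_sync) == drive and not bus_busy_during_sense


print(slot_owner(130), cycles_until_slot(2, 0), may_signal(1, 70, False))
```

On a real drive, `cycles_since_sync` would come from the VIA timer that the ATN interrupt resets; here it is just an integer argument.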

Now... some tricky bits could be avoiding interference of data transfers with the ATN sync signal once a video frame, and also keeping the polling overhead of the C-64 somewhat low, as it would need to check the bus for a work-finished signal periodically while also performing some work of its own.

Having two computers on the bus adds some more complexity, but this case can be ignored until later. :)

2022-01-04 18:56

Repose

Registered: Oct 2010
Posts: 225

Great thinking!
Time slots can certainly solve the problem. You could have 5 time-slots, and whoever brings a line high in their time-slot gets control; then you can burst the data. If many nodes need to send, they go in priority round-robin order.

Or, you can use the CAN method; we can use CLK in/out for this. Idle is when CLK is high. The node that wants to send brings CLK low. Everyone else detects the low and switches to listening mode. This is normally done in hardware yes but can also be done in software. The other difference is doing it every tick so you aren't constantly polling.
To continue, the talker sends the ID of who it is and how much data it's sending, then the listeners can go back to what they're doing until the bus is expected to be free again, and due to the amount of data sent, this is allowed to even exceed the tick time.
The next tick after that it can start again.
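
A minimal model of the CLK-based scheme above, assuming an open-collector style line that reads low as soon as any node pulls it low. The class and function names are made up, and the tie-break (lowest node ID wins when several nodes want to send in the same tick) is an assumption for the sketch; the post does not say how simultaneous requests are resolved.

```python
class OpenCollectorLine:
    """Shared line that reads low while at least one node pulls it low."""

    def __init__(self):
        self._pulling = set()

    def pull_low(self, node: int) -> None:
        self._pulling.add(node)

    def release(self, node: int) -> None:
        self._pulling.discard(node)

    @property
    def is_high(self) -> bool:
        return not self._pulling


def arbitration_tick(clk: OpenCollectorLine, node_ids, wants_to_send):
    """Run one tick; return (talker, listeners) or (None, []) if nothing starts."""
    if not clk.is_high:
        return None, []  # bus busy: a transfer is still in progress
    contenders = sorted(n for n in node_ids if n in wants_to_send)
    if not contenders:
        return None, []  # idle tick: everyone goes straight back to work
    talker = contenders[0]  # assumed tie-break: lowest ID wins
    clk.pull_low(talker)
    listeners = [n for n in node_ids if n != talker]
    return talker, listeners


clk = OpenCollectorLine()
print(arbitration_tick(clk, [0, 1, 2], {2}))
```

The talker would then send its ID and length, and release CLK when the burst is done so that the next tick sees an idle bus again.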

2022-01-04 19:27

Krill

Registered: Apr 2002
Posts: 2980

Quoting Repose

The node that wants to send brings CLK low. Everyone else detects the low and switches to listening mode. This is normally done in hardware yes but can also be done in software.

This seems to be a bit too much overhead. Every device on the bus would need to poll for CLK going low, in between doing work.

Quoting Repose

the talker sends the ID of who it is and how much data it's sending, then the listeners can go back to what they're doing until the bus is expected to be free again

Having all devices (somehow) know when a transfer is expected to end sounds like a good idea. :) But then the devices don't need to know that until they have data to send and nothing else to do than to poll for an idle bus.

2022-01-04 19:39

Repose

Registered: Oct 2010
Posts: 225

Quoting Krill

Quoting Repose
The node that wants to send brings CLK low. Everyone else detects the low and switches to listening mode. This is normally done in hardware yes but can also be done in software.
This seems to be a bit too much overhead. Every device on the bus would need to poll for CLK going low, in between doing work.

I agree that I've failed to make use of ATN here to interrupt each device; however, whether a device is interrupted by ATN or interrupted by a timer, it's the same thing. Just to be clear, I don't propose that the drives are in an infinite loop waiting for CLK low; they only do this once a tick, and for a limited time. In other words, I see this as a loop of maybe 100 cycles every 1/60th of a second. ATN is one way for the C64 to send to a drive; instead of using that, I'm proposing that they all listen for each other once in a while, and if no one has anything to say, they can quickly go back to work.

Quoting Repose

the talker sends the ID of who it is and how much data it's sending, then the listeners can go back to what they're doing until the bus is expected to be free again

Quoting Krill

Having all devices (somehow) know when a transfer is expected to end sounds like a good idea. :) But then the devices don't need to know that until they have data to send and nothing else to do than to poll for an idle bus.

Polling can be very minimal; it only adds latency. For the use cases I'm thinking of, a latency equal to screen refresh will always be sufficient.

2022-01-04 19:49

Krill

Registered: Apr 2002
Posts: 2980

Quoting Repose

whether a device is interrupted by ATN or interrupted by a timer, it's the same thing.

The point of an ATN interrupt once a video frame is to synchronise the clocks, as they would sooner or later all drift apart. Synchronised clocks are required to properly determine valid time slots for each device.
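
To illustrate why the periodic resync matters, here is a small sketch of accumulated drift. The 100 ppm clock mismatch is a pure placeholder, not a measured value, and the function names are made up; the 64-cycle slot width is the one suggested earlier in the thread.

```python
SLOT_WIDTH = 64  # cycles per time slot, as suggested earlier in the thread


def drift_cycles(cycles_between_syncs: int, mismatch_ppm: float) -> float:
    """Cycles two clocks drift apart between syncs for a given mismatch in ppm."""
    return cycles_between_syncs * mismatch_ppm / 1_000_000


def slots_still_valid(cycles_between_syncs: int, mismatch_ppm: float,
                      guard_cycles: float) -> bool:
    """True if the drift stays within the guard band a slot can tolerate."""
    return drift_cycles(cycles_between_syncs, mismatch_ppm) <= guard_cycles


# Hypothetical 100 ppm mismatch: about one frame versus one second without sync.
print(drift_cycles(20_000, 100), drift_cycles(1_000_000, 100))
```

With the placeholder figure, syncing about once a frame keeps the error at a couple of cycles, while a second without sync would already exceed a whole 64-cycle slot.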

Quoting Repose

Polling can be very minimal; it only adds latency. For the use cases I'm thinking of, a latency equal to screen refresh will always be sufficient.

Polling is minimal if a device only polls after having finished work, as it doesn't have anything else to do.
So it can just as well poll in a loop and have minimum latency, while only being interrupted once a video frame for the ATN sync strobe, doing its work with minimum overhead.

The main source of latency would be the time slot granularity.

The use cases i'm thinking of shall certainly produce video measurable in frames per second, not seconds per frame. :)

2022-01-04 20:03

Repose

Registered: Oct 2010
Posts: 225

Quoting Krill

Polling is minimal if a device only polls after having finished work, as it doesn't have anything else to do.
So it can just as well poll in a loop and have minimum latency, while only being interrupted once a video frame for the ATN sync strobe, doing its work with minimum overhead.

Yes, that's a good point. But, I'm noticing a new assumption here; you are thinking of them in a more restricted sense as slaves only reporting back calculations and not accepting any new commands (such as a status check or more work) until completing the work. In that case, they can simply not participate during the polling interval. If a node is not expecting work due to it not sending finished work, it has no need to participate in arbitration.
So, when it is finished, it has to wait until the next ATN to indicate it's ready to send. It can send as long as it wants, the c64 can stop the regular ATN until the bus is free again.

2022-01-04 20:41

Krill

Registered: Apr 2002
Posts: 2980

Quoting Repose

But, I'm noticing a new assumption here; you are thinking of them in a more restricted sense as slaves only reporting back calculations and not accepting any new commands (such as a status check or more work) until completing the work.

Any commands to unsuspecting devices can be sent along with the ATN sync. They are interrupted anyways and can then check the other two lines for being asserted.

Quoting Repose

In that case, they can simply not participate during the polling interval.

Which polling interval do you mean?

Quoting Repose

So, when it is finished, it has to wait until the next ATN to indicate it's ready to send.

The ATN sync is just there to sync the clocks (and possibly dish out commands). If a device has data to send, it would not wait for ATN but its next valid time slot.

Quoting Repose

It can send as long as it wants, the c64 can stop the regular ATN until the bus is free again.

True, the ATN sync does not have to be aligned to video frames.

2022-01-04 21:08

mankeli

Registered: Oct 2010
Posts: 146

I thought it's really not possible to daisy chain too many devices on the IEC bus, because every NMOS device has a pull-up resistor and the transistors are not strong enough?

2022-01-04 21:41

tlr

Registered: Sep 2003
Posts: 1790

Quote: I thought it's really not possible to daisy chain too many devices on the IEC bus, because every NMOS device has a pull-up resistor and the transistors are not strong enough?

There's a 7406 TTL buffer in there. I always thought it would be the capacitive loading making the edges too slow, but I see now that the pull ups are only 1k so maybe that's it.

2022-01-04 21:49

Krill

Registered: Apr 2002
Posts: 2980

Quoting mankeli

I thought it's really not possible to daisy chain too many devices on the IEC bus

That's quite a tautology. :)

The official specs support at least 4 devices in addition to the host computer, so... will be fun to find out how many more drives people can add until it starts acting funny. =)

2022-01-04 23:05

Copyfault

Registered: Dec 2001
Posts: 478

Maybe I'm naive here, maybe not deep enough into IEC bus constraints, but the idea with the time slots should basically do the trick, at least in a setup with n drives and one C64 as "master system" - no?

In my head, the drives all do the same:
IDLE/POLL
SEND
CALC
(looping after CALC is done)

When idling, they poll the bus in order to be signalled to send their buffer content. After sending, the drive does its calculation job.
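
The three-phase loop above can be written as a tiny state machine. This is only a sketch of the cycle as described; the `State` enum, `next_state` and the flag names are made up for illustration.

```python
from enum import Enum, auto


class State(Enum):
    IDLE_POLL = auto()  # waiting for the master's signal to send
    SEND = auto()       # transferring the buffer content
    CALC = auto()       # working on the calculation job


def next_state(state: State, polled_by_master: bool = False,
               send_done: bool = False, calc_done: bool = False) -> State:
    """Advance the drive's IDLE/POLL -> SEND -> CALC -> IDLE/POLL cycle."""
    if state is State.IDLE_POLL:
        return State.SEND if polled_by_master else State.IDLE_POLL
    if state is State.SEND:
        return State.CALC if send_done else State.SEND
    return State.IDLE_POLL if calc_done else State.CALC


s = State.IDLE_POLL
s = next_state(s, polled_by_master=True)  # master signals the drive
s = next_state(s, send_done=True)         # buffer has been sent
s = next_state(s, calc_done=True)         # calculation finished
print(s)
```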

This way, the C64 (assuming only one is in play as master) can get data from each drive one after another; while one drive is sending data, the others can still do the calculation job. This job can either be executed until the internal buffer is full or it is done for a certain time which can be ensured by a timer irq (that's what Krill meant with time slots iiuc). The length of a time slot per drive can be determined depending on the no. of drives in the chain and the job that has to be done.

And if we spend some kind of initial calculation phase, i.e. that each drive can do its calc job before the master starts the pollings, it should even be possible to permanently receive data without any waiting phases.

Where did I take the wrong path? Or are there other aims than splitting up calculation jobs and receiving the results of the partial calculation in a most fluent way?

2022-01-04 23:16

Krill

Registered: Apr 2002
Posts: 2980

Quoting Copyfault

This job can either be executed until the internal buffer is full or it is done for a certain time which can be ensured by a timer irq (that's what Krill meant with time slots iiuc). The length of a time slot per drive can be determined depending on the no. of drives in the chain and the job that has to be done.

The time slots in my half-baked scheme are only relevant for the drives' polling phase.
Only in the time slots relevant for a given drive, it will check if the bus is idle, and if so, signal the master that data is ready (then send the data and receive new work).
A job may take any amount of time, and it may not be known when it's done. The amount of time to process a job is not relevant for the time slot timing, but as soon as it is done, poll again etc. =)

2022-01-05 01:11

Copyfault

Registered: Dec 2001
Posts: 478

Quoting Krill

Quoting Copyfault
This job can either be executed until the internal buffer is full or it is done for a certain time which can be ensured by a timer irq (that's what Krill meant with time slots iiuc). The length of a time slot per drive can be determined depending on the no. of drives in the chain and the job that has to be done.
The time slots in my half-baked scheme are only relevant for the drives' polling phase.
Only in the time slots relevant for a given drive, it will check if the bus is idle, and if so, signal the master that data is ready (then send the data and receive new work).
A job may take any amount of time, and it may not be known when it's done. The amount of time to process a job is not relevant for the time slot timing, but as soon as it is done, poll again etc. =)

If I get this right, in this approach the drive signals the state of "being done with the calculation". My thinking went the other way around: do calculations for a fixed amount of time. Then the master can always poll the data from a drive, since the time slice it is granted (i.e. by an internal timer irq) ensures that the drive is ready with the chunk it could handle within that time slice.

But I think I got it that it might be better to give a calc job to a drive and be informed when this task has been successfully finished. Such a job need not be squeezable into the time slices I have in mind in my sketch above.

Hmm... any chance to have the calculation jobs performed in a way that they can fill the drive internal buffer fast enough, s.t. the buffer (assuming an init calc phase was permitted) will always be full when the master polls the data? Might be possible to have the master poll only a certain amount of bytes s.t. the drive can fill new data to the buffer while the master polls the other drives...

2022-01-05 01:44

Krill

Registered: Apr 2002
Posts: 2980

Quoting Copyfault

Hmm... any chance to have the calculation jobs performed in a way that they can fill the drive internal buffer fast enough, s.t. the buffer (assuming an init calc phase was permitted) will always be full when the master polls the data? Might be possible to have the master poll only a certain amount of bytes s.t. the drive can fill new data to the buffer while the master polls the other drives...

Probably possible with some parallel algorithms, but this wouldn't be generic and run counter to my idea of a computer cluster built with stock Commodore 8-Bit hardware. :)

And what would be the gain?

It's quite important not to have a fixed order in which the drives receive arguments or submit results.
Allowing for a dynamic communication order resulting from a given algorithm and data is a superset of having a fixed order with fixed computation times and fixed communication slots.

2022-01-05 21:04

Hoogo

Registered: Jun 2002
Posts: 105

What about a kind of Token passing?
During an init phase, all drives are detected, and each drive is given an internal number (1..) + the total number of drives + a token counter initialized with 1. That token counter tells the drives that drive 1 has the token now.

Later, the C64 polls by setting ATN. The IRQ in the drive compares the token counter with the drive number and reacts if it wants to send something. Otherwise, it just increases the internal token counter and resets it to 1 on overflow.

Should work with a 1bit loader. I guess 2bit loaders would be difficult, as every ATN would interrupt the calculation in all drives.

I just assume now that interrupt programming in the drive is somewhat similar to C64 with a vector for that, I'd have to look that up...

2022-01-05 22:06

Krill

Registered: Apr 2002
Posts: 2980

Quoting Hoogo

What about a kind of Token passing?
[...]
Later, the C64 polls by setting ATN.

ATN syncing shall not happen more often than say 50 or 60 times a second, because overhead. By the same token, you want to get more than one drive's worth of data in a videoframe.

Quoting Hoogo

Should work with a 1bit loader. I guess 2bit loaders would be difficult, as every ATN would interrupt the calculation in all drives.

Sorry, i don't understand this. What has the transfer protocol to do with the calculation? And the ATN interrupt can be masked if a drive has nothing to send.

Quoting Hoogo

I just assume now that interrupt programming in the drive is somewhat similar to C64 with a vector for that, I'd have to look that up...

It's... complicated and comes with quite a bit of overhead.

2022-01-05 23:07

Hoogo

Registered: Jun 2002
Posts: 105

Quoting Krill

Quoting Hoogo
I just assume now that interrupt programming in the drive is somewhat similar to C64 with a vector for that, I'd have to look that up...
It's... complicated and comes with quite a bit of overhead.

Nice.
So every time the C64 sets ATN, all drives can interrupt their calculation for some Token checking:
- Do drive ID and Token counter match?
- If yes: Has this drive data to send?
- If yes: React on Clock/Data and start the transfer protocol.
- In any case: Increase the token counter and return to calculation.
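
The checklist above, sketched per drive. The class and method names are illustrative; the numbering from 1 and the wrap back to 1 follow the earlier description of the token counter, and the actual transfer is left out.

```python
class DriveTokenState:
    """Per-drive token bookkeeping, updated on every ATN pulse from the C64."""

    def __init__(self, drive_number: int, total_drives: int):
        self.drive_number = drive_number  # internal number, 1..total_drives
        self.total_drives = total_drives
        self.token = 1                    # drive 1 holds the token first

    def on_atn(self, has_data: bool) -> bool:
        """Handle one ATN pulse; return True if this drive should start sending."""
        my_turn = self.token == self.drive_number
        respond = my_turn and has_data
        # In any case: advance the token, wrapping back to 1 after the last drive.
        self.token = self.token % self.total_drives + 1
        return respond


drives = [DriveTokenState(n, 3) for n in (1, 2, 3)]
ready = {2}  # only drive 2 has data in this example
for pulse in range(1, 4):
    responders = [d.drive_number for d in drives if d.on_atn(d.drive_number in ready)]
    print(pulse, responders)
```

Every drive has to see every pulse, otherwise the counters fall out of step; that is the shared-state cost the scheme pays for its short polls.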

Quoting Krill

ATN syncing shall not happen more often than say 50 or 60 times a second, because overhead. By the same token, you want to get more than one drive's worth of data in a videoframe....
What has the transfer protocol to do with the calculation? And the ATN interrupt can be masked if a drive has nothing to send.

Every setting/resetting of ATN is one poll, allowing only one drive to react and send data. Polling 8 drives means that ATN has to be set/reset 8 times.
A drive must process every IRQ and must not mask it, it must do the counting. No drive ID is sent by the C64, so each poll is very short and simple.

But this also means that the protocol for data transfer can't use ATN as clock, only CLOCK and DATA. You're right, this doesn't exclude 2bit transfer, but for simplicity, I'd choose a 1bit IRQ loader protocol.

2022-01-05 23:15

Krill

Registered: Apr 2002
Posts: 2980

I don't quite see the gain with a token, though. As you say, it would require all drives to maintain a correct shared state upon every transmission including those not concerning them, no matter their individual states.

True that 2bit+ATN transfer protocols won't work, but that stuff is an invention geared towards sprites with their quasi-random DMA interference.
So not much of a loss for something that would plot to a bitmap comparatively slowly. =) (Need something to work around badlines, though. But those cause rather regular DMA interference.)

2022-01-06 01:00

Hoogo

Registered: Jun 2002
Posts: 105

Quoting Krill

I don't quite see the gain with a token, though. As you say, it would require all drives to maintain a correct shared state upon every transmission including those not concerning them, no matter their individual states.

Simplicity, assuming that the overhead for ATN-Irq isn't too evil.
You can adjust the poll frequency and the data block size, and C64 can have sprites, music and any funny timing.

2022-01-06 01:29

Krill

Registered: Apr 2002
Posts: 2980

Quoting Hoogo

Simplicity, assuming that the overhead for ATN-Irq isn't too evil.

Seems like the price is a reduced maximum throughput, though.

Quoting Hoogo

You can adjust the poll frequency and the data block size, and C64 can have sprites, music and any funny timing.

And this seems like an entirely different usecase than i have in mind. :) Music is not a problem, though, but sprites and funny timing aren't required.

All times are CET.
