| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
Drivecode
Hi guys,
Finally I wanted to give drivecode a try, but the transfer is the bottleneck (and 2 KB of memory sucks as well). Actually I would only push this further if the transfer of two bytes is seriously faster than 154 cycles, as that is what I need to transform one vertex, which is what I thought of offloading to the drive. I'd love to also implement backface culling within the drive, but that seems to be mostly impossible due to lack of memory (assuming we do more complex stuff than a cube).
So what I do so far on the C64 side is:
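-
        lda $d012           ;current raster line
        sbc #$31
        bcc +               ;underflow: still before the compared line, go on
        clc
        and #$07
        beq -               ;low three bits are zero: poll again
+
        lda #%00001011      ;set bit 3 (ATN out) to signal the drive
        sta $dd00
        nop
        eor #%00001000      ;clear bit 3 again
        sta $dd00
        lda #$ff
        eor $dd00           ;1st bit pair arrives in bits 6/7
        lsr
        lsr
        eor $dd00           ;2nd bit pair
        lsr
        lsr
        eor $dd00           ;3rd bit pair
        lsr
        asr #$fe            ;lets carry be cleared after lsr!
        eor $dd00           ;4th bit pair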
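And on the 1541 side:
!align 255,0                    ;start the table on a page boundary
bin2ser
        !byte %1111, %0111, %1101, %0101, %1011, %0011, %1001, %0001
        !byte %1110, %0110, %1100, %0100, %1010, %0010, %1000, %0000
        ldx #$0f
        sbx #$00                ;x = a and $0f (low nybble)
        lsr
        lsr
        lsr
        lsr                     ;a = high nybble
        sta .y1+1               ;keep y free
        lda bin2ser,x           ;pattern for the low nybble
-
        ldx $1800
        bpl -                   ;wait until bit 7 (ATN in) is set
        sta $1800               ;1st bit pair
        asl
        and #$0f
        sta $1800               ;2nd bit pair
.y1     lda bin2ser             ;low byte patched above: pattern for the high nybble
        sta $1800               ;3rd bit pair
        asl
        and #$0f
        sta $1800               ;4th bit pair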
Any idea how to get this reasonably faster? I'd also be okay if just bits 0-6 are transferred from each byte, but that does not seem to help much, as bits 6 and 7 are the last in the transfer. I also thought of doing a burst of two bytes per sync, but that somehow did not work, as I then get jitter in the second byte :-(
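For anyone who wants to check the bit shuffling of the two routines without a drive attached, here is a small model of it. It is only a sketch in Python, not 6502 code, and it rests on assumptions the post does not spell out: bits 1 and 3 of $1800 are taken to be DATA and CLK out (a set bit pulls the line low), bits 7 and 6 of $dd00 are taken to read DATA and CLK in, and the lower six bits of $dd00 are assumed to read back as %000011. The function names are made up for illustration; timing, ATN and jitter are not modelled.

```python
# Table copied from the 1541 routine.
BIN2SER = [
    0b1111, 0b0111, 0b1101, 0b0101, 0b1011, 0b0011, 0b1001, 0b0001,
    0b1110, 0b0110, 0b1100, 0b0100, 0b1010, 0b0010, 0b1000, 0b0000,
]

def drive_writes(byte):
    """Return the four values the drive stores to $1800 for one byte."""
    writes = []
    for nybble in (byte & 0x0F, byte >> 4):   # low nybble first, then high
        a = BIN2SER[nybble]
        writes.append(a)                      # sta $1800
        writes.append((a << 1) & 0x0F)        # asl / and #$0f / sta $1800
    return writes

def dd00_read(via_value):
    """Assumed C64 view: a set output bit on the drive reads back as 0."""
    clk = 1 - ((via_value >> 3) & 1)          # $1800 bit 3 -> $dd00 bit 6
    data = 1 - ((via_value >> 1) & 1)         # $1800 bit 1 -> $dd00 bit 7
    return (data << 7) | (clk << 6) | 0b00000011

def c64_receive(reads):
    """Mirror of the lda #$ff / eor / lsr sequence on the C64 side."""
    a = 0xFF ^ reads[0]
    a >>= 2                                   # lsr / lsr
    a ^= reads[1]
    a >>= 2                                   # lsr / lsr
    a ^= reads[2]
    a >>= 1                                   # lsr
    a = (a & 0xFE) >> 1                       # asr #$fe
    a ^= reads[3]
    return a

def transfer(byte):
    """Send one byte through the modelled drive and C64 routines."""
    return c64_receive([dd00_read(w) for w in drive_writes(byte)])

# Every byte value should survive the round trip under these assumptions.
assert all(transfer(b) == b for b in range(256))
```

Under these assumptions the initial lda #$ff is what cancels the %11 that each read of $dd00 drags in through the low two bits, so no extra masking is needed at the end.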
Bitbreaker |
|
| |
tlr
Registered: Sep 2003 Posts: 1790 |
Really fast transfers are done by synchronizing more exactly than just a bit:bne loop (i.e. 0-6 cycles) and then transferring many bytes in a row.
This synchronization can be accomplished by sending a pattern from the transmitting side and then using several single-cycle adjustment polls on the receiving side.
Check stuff like the AR turbo, Oliver Stiller's loaders and Graham's warpcopy for examples of this.
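To make the "0-6 cycles" concrete, here is a tiny Python sketch of a plain polling loop. It assumes a 7-cycle loop (a 4-cycle read plus a 3-cycle taken branch) that samples the line once per pass; `detection_delay` is a made-up helper, and real hardware samples at a fixed point inside the read instruction, which shifts the phase but not the spread.

```python
LOOP_CYCLES = 7  # assumption: 4-cycle read + 3-cycle taken branch

def detection_delay(event_cycle, loop_cycles=LOOP_CYCLES):
    """Cycles from the line changing to the next sample.

    The loop is modelled as sampling at cycles 0, 7, 14, ...
    """
    return (-event_cycle) % loop_cycles

# Try every possible phase between the sender and the polling loop.
delays = sorted({detection_delay(t) for t in range(100)})
print(delays)
```

The receiver can be late by anything from 0 to 6 cycles, and that uncertainty is what the extra adjustment polls have to remove before a long burst.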
|
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
But how much is the overhead of tedious synchronisation compared to the gain of a burst transfer? On the 1541 side the preparation of the byte to be transferred is also somewhat costly, so there is not too much to save here except the sync. Or is there also a faster way of splitting up a byte into 2-bit slices on the 1541 side?
Also, needless to say, I want the screen turned on during the transfer and thus have to cope with badlines, which should not interrupt the transfer. |
| |
MagerValp
Registered: Dec 2001 Posts: 1078 |
According to my calculations you should be able to sync and transfer 8 bytes between two badlines, at 44 (drive) cycles per byte. |
| |
Repose
Registered: Oct 2010 Posts: 225 |
There's code for this
http://codebase64.org/doku.php?id=base:drivecalc_vectors
And off the top of my head, you can get almost 256 bytes from precise sync, someone told me they had to do 1 cycle sync every once in a while to keep it going.
Really, I doubt there's much advantage to drivecalc: there's a huge overhead to the communication, so it needs a huge calculation-to-data ratio, and the 1541 is quite slow due to low memory, whereas the C64 can use huge tables.
|
| |
Repose
Registered: Oct 2010 Posts: 225 |
Looks like:
; NTSC: 16 bytes * 45 cycles at 1022727 Hz = 704.0002 µs
; drive: 16 bytes * 44 cycles at 1000000 Hz = 704 µs
|
| |
MagerValp
Registered: Dec 2001 Posts: 1078 |
That's with the screen closed, or in the border, which doesn't help here. It also doesn't work out so well in PAL, where you have to alternate between 43 and 44 cycles. |
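The difference between the two video standards can be seen with a quick calculation. This is a sketch in Python; the NTSC clock is the one quoted in the comment above, while the PAL clock of 985248 Hz is an assumption added here, since it is not stated in the thread.

```python
DRIVE_HZ = 1_000_000   # 1541 CPU clock
NTSC_HZ = 1_022_727    # C64 NTSC clock, as quoted in the thread
PAL_HZ = 985_248       # C64 PAL clock (assumption, not from the thread)

def c64_cycles(drive_cycles, c64_hz):
    """How many C64 cycles pass while the drive runs drive_cycles."""
    return drive_cycles * c64_hz / DRIVE_HZ

ntsc_per_byte = c64_cycles(44, NTSC_HZ)
pal_per_byte = c64_cycles(44, PAL_HZ)
print(f"NTSC: {ntsc_per_byte:.4f} cycles per byte")
print(f"PAL:  {pal_per_byte:.4f} cycles per byte")
```

In NTSC, 44 drive cycles come out as almost exactly 45 C64 cycles, so a fixed loop stays in step for a long time. In PAL the same 44 µs falls between 43 and 44 C64 cycles, so no single whole number of cycles per byte fits.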
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
@Repose:
The article from codebase64 is well known to me, but it uses a full handshake on each byte and even calls the transfer routine with a jsr. So I would not say it is the fastest way to transfer.
I am rather asking whether the preparation of the 2 nybbles on the 1541 side could be made faster, as well as the stuff before the sync on the C64 side. And whether the eor/lsr/lsr is the only way to go for transferring 2 bits, or if there are other (faster) ways to do so.
And yes, please refrain from telling me that drivecode makes no sense. That is not part of my question. My math is at least good enough to calculate the gain/loss when using drivecode. I just want to give this a try and make it perform as fast as possible. If it is not of much use in the current case, it might come in handy for some future demo I am coding, or be used for a dedicated drivecode article on codebase64.
|
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
@MagerValp
Hmm, 44 cycles sounds tight, but with the ~63 cycles of the badline available to prepare the next 8 bytes on the 1541 side for the next burst, it could work out well. I'll give that a try and would then sync to the first good line only. |
| |
MagerValp
Registered: Dec 2001 Posts: 1078 |
You have 63 * 7 + 20 = 461 cycles between each badline. At 43.5 cycles per byte in PAL you have 461 - (43 * 4 + 44 * 4) = 113 cycles to sync. You have 8 µs between each bit pair, and a clock skew of about 1.2 µs, so you should be fine with a sync to within ±2 cycles. |
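The budget above can be re-derived in a few lines. This is a Python sketch; the PAL clock of 985248 Hz is an assumption added here, and the skew calculation is only one plausible way to arrive at a figure of about 1.2, since the working is not shown in the post.

```python
CYCLES_PER_LINE = 63       # PAL raster line length in C64 cycles
PAL_HZ = 985_248           # C64 PAL clock (assumption, not from the thread)
DRIVE_HZ = 1_000_000       # 1541 CPU clock

# C64 cycles available between two badlines, as given in the post.
budget = CYCLES_PER_LINE * 7 + 20

# Eight bytes, alternating 43 and 44 C64 cycles per byte.
burst_cycles = 43 * 4 + 44 * 4
sync_budget = budget - burst_cycles

# Time the same eight bytes take on each side, in microseconds.
c64_us = burst_cycles / PAL_HZ * 1e6
drive_us = 8 * 44 / DRIVE_HZ * 1e6
skew_us = c64_us - drive_us

print(budget, burst_cycles, sync_budget)
print(f"skew over the burst: {skew_us:.2f} microseconds")
```

With these numbers the two sides drift apart by a little over one microsecond across the whole eight-byte burst, which is small next to the roughly eight microseconds each bit pair stays on the bus.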
| |
Dano
Registered: Jul 2004 Posts: 234 |
afaik krill did the drivecalls in the lower border. imho drivecalc makes sense where you can parallelize computing, like clearing the screen while the drive does the calc. so you can compensate slower code within the drive.
i'd be pretty much interested in some sources to try a little myself, yet i'm not into that stuff more than doing some bits of thinking. |