CSDb User Forums


Forums > C64 Coding > SD2IEC fastloader
2019-02-19 15:51
oziphantom

Registered: Oct 2014
Posts: 490
SD2IEC fastloader

Does anybody know of a system that fast loads, or a disassembly of a fast loader that SD2IEC supports?
 
 
2019-02-21 08:40
Knight Rider

Registered: Mar 2005
Posts: 131
Please remember that the Sam's Journey fastloader is only supported by sd2iec v1.0.0. Some SD2IECs run other firmware and can't be upgraded. For example, the SDRIVE1564 has sd2iec v0.10.3 and cannot be upgraded.
2019-02-21 09:05
cadaver

Registered: Feb 2002
Posts: 1160
oziphantom: You can do a Y-range check and use the timed protocol only while outside the sprite area. Slower, but at least doesn't crash or force you to hide sprites.
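A rough Python model of that Y-range check, as a sketch only. It assumes the current raster line and the sprite Y positions are already known, that an unexpanded sprite is 21 lines tall, and adds a hypothetical one-line safety margin to cover the sprite DMA starting around the sprite's first line. `lines_needed` defaults to 2 because a byte at around 72 cycles spans more than one 63-cycle PAL raster line. All function and parameter names are illustrative:

```python
SPRITE_HEIGHT = 21  # unexpanded sprite height in raster lines

def sprite_windows(sprite_ys, y_expanded=(), margin=1):
    """(first, last) raster lines each sprite may affect, padded by an assumed margin."""
    windows = []
    for i, y in enumerate(sprite_ys):
        height = SPRITE_HEIGHT * (2 if i in y_expanded else 1)
        windows.append((y - margin, y + height - 1 + margin))
    return windows

def timed_transfer_allowed(raster_line, sprite_ys, lines_needed=2, y_expanded=(), margin=1):
    """True if a timed transfer starting now ends before every sprite window begins
    or starts after it ends. Raster wrap-around at the bottom of the frame is ignored."""
    start, end = raster_line, raster_line + lines_needed - 1
    return all(end < first or start > last
               for first, last in sprite_windows(sprite_ys, y_expanded, margin))

# One sprite at Y=100: its window is raster lines 99-121.
print(timed_transfer_allowed(50, [100]))   # True: well above the sprite
print(timed_transfer_allowed(98, [100]))   # False: the transfer would run into line 99
```

Outside the windows the fast timed protocol runs; inside them the loader would fall back to a slower, handshaked transfer, as suggested above.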
2019-02-21 11:41
Krill

Registered: Apr 2002
Posts: 2980
Quoting oziphantom
and I could load with burst.. but lots of work very little gain.
When it comes to burst, indeed. I've recently added it to my loader, and it's not faster at all than a well-optimised bit-banging protocol.

But where do you get all the content to fill 5-6 disk sides? How much space do you reckon would everything take when well-compressed?
2019-02-21 11:48
Krill

Registered: Apr 2002
Posts: 2980
Quoting Knight Rider
Please remember that the Sam's Journey fastloader is only supported by sd2iec v1.0.0. Some SD2IECs run other firmware and can't be upgraded. For example, the SDRIVE1564 has sd2iec v0.10.3 and cannot be upgraded.
Really, those crappy non-upgradeable devices should go die in a fire. There's a limit to catering to emulators. Explicit SD2IEC support is already questionable, with it not being a drive replacement but a mass storage device.
2019-02-21 12:14
oziphantom

Registered: Oct 2014
Posts: 490
Quoting Krill
Quoting oziphantom
and I could load with burst.. but lots of work very little gain.
When it comes to burst, indeed. I've recently added it to my loader, and it's not faster at all than a well-optimised bit-banging protocol.

But where do you get all the content to fill 5-6 disk sides? How much space do you reckon would everything take when well-compressed?

I think burst, while not faster than 2-bit in a tightly coupled loop, will make things a lot easier if I'm pulling in a byte around other things such as IRQs, sprites and game logic, since all 8 bits are transferred without the CPU having to be involved. I can't miss the byte, and it will wait for me without needing to be sent again.
However, does your code use burst + 2-bit bang? That would be faster again, right, since the burst byte won't occupy CPU time while you transfer the 2 bits, and hence comes almost for "free"?

The 512K cart will already be compressed. It will have some uncompressed code to do the initial boot etc., but all the data will be compressed.
2019-02-22 09:39
cadaver

Registered: Feb 2002
Posts: 1160
One note after a shot at implementing eload: doing the setup for a file load using the IEC protocol will be murder for your interrupts if you use the Kernal (and it takes several frames). However, as the device is known at this point, you could take a happy-case approach and implement your own simplified IEC send-byte routine that can coexist with your interrupts. This also has the advantage that you can clobber whatever memory you like.
2019-02-22 10:38
Krill

Registered: Apr 2002
Posts: 2980
Quoting cadaver
One note after a shot at implementing eload: doing the setup for a file load using the IEC protocol will be murder for your interrupts if you use the Kernal (and it takes several frames). However, as the device is known at this point, you could take a happy-case approach and implement your own simplified IEC send-byte routine that can coexist with your interrupts. This also has the advantage that you can clobber whatever memory you like.
I did some research into that for Thundax's yet-to-be-released game, which premiered at X 2018. He required an IRQ loader as well as an IRQ saver, neither resident in the drive, so uploading the code using the KERNAL also had to be IRQ-friendly. I don't have the code here at the moment, and it needs some clean-up, but I'll post the gist of it soonish.
2019-02-22 13:33
Krill

Registered: Apr 2002
Posts: 2980
Quoting oziphantom
I think burst, while not faster than 2-bit in a tightly coupled loop, will make things a lot easier if I'm pulling in a byte around other things such as IRQs, sprites and game logic, since all 8 bits are transferred without the CPU having to be involved. I can't miss the byte, and it will wait for me without needing to be sent again.
However, does your code use burst + 2-bit bang? That would be faster again, right, since the burst byte won't occupy CPU time while you transfer the 2 bits, and hence comes almost for "free"?
Burst is C-128 only (C-64 requires a relatively simple hardware mod), and burst and non-burst transfers are mutually exclusive, thus both cannot be employed at the same time. Burst uses the DATA line for data and only adds the SRQ line for clocking. The CLK line is required to signal "byte received" so the sender can clock out the next one. (There is still ATN, but it's only host->drive, and hardware ATN acknowledgement soiling the DATA line is a thing with burst as well, so let's not go into that.)

Now, burst-clocking out a bit requires 14 cycles at 2 MHz minimum (CIA timer latch set to 6, i.e. an underflow every 7 cycles; two underflows make up a bit cycle). Anything faster yields errors due to the physical properties of the serial bus. That's 14 * 8 / 2 = 56 cycles at 1 MHz to transfer a byte. Add 6 cycles for the aforementioned CLK toggle, another 5 cycles to store the byte, and finally another 5 cycles for looping.

You're looking at 72 cycles minimum on C-64, which coincidentally is exactly what the 2-bits+ATN protocol I use takes (it puts a new bit-pair on the bus every 18 cycles at 1 MHz). It also allows for interrupts, sprites and DMA without any restrictions.
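The cycle budget above can be checked with a small calculation. This sketch assumes a CIA timer in continuous mode underflows every latch + 1 cycles, which is what makes a latch of 6 come out at 14 cycles per bit; the function names are illustrative only:

```python
def burst_byte_cycles(timer_latch=6, clk_toggle=6, store=5, loop=5):
    """Minimum C-64 cycles (1 MHz) per burst byte, following the figures above."""
    underflow = timer_latch + 1         # assumed: continuous mode, latch + 1 cycles per underflow
    bit_2mhz = 2 * underflow            # two underflows per bit, counted at 2 MHz
    shift_1mhz = bit_2mhz * 8 // 2      # 8 bits, halved to convert to 1 MHz cycles
    return shift_1mhz + clk_toggle + store + loop

def two_bit_atn_byte_cycles(cycles_per_pair=18):
    """2-bits+ATN: four bit-pairs per byte, one every 18 cycles at 1 MHz."""
    return 4 * cycles_per_pair

print(burst_byte_cycles(), two_bit_atn_byte_cycles())  # 72 72
```

Both protocols land on the same 72 cycles per byte, which is the point being made: burst buys convenience, not speed.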

So actually, the good old 2-bit asynchronous method (lda $dd00:lsr:lsr:eor $dd00 etc.) at 4*7 = 28 cycles for the raw byte would be the fastest one. After overhead, it comes out at something like 64-ish cycles when allowing badlines, and 32-ish with the screen off and no interrupts or DMA.
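A simplified Python model of that receive sequence, assuming only bits 7 and 6 of $dd00 (DATA in and CLK in) carry data, and ignoring line inversion, the other port bits and the timing between reads. Because the shifted bits never overlap the fresh pair, `eor` behaves like `or` here, and the cycle count reproduces the 4*7 = 28 quoted above. `split_pairs` is a hypothetical sender-side helper chosen to match this bit order:

```python
LDA_ABS, LSR, EOR_ABS = 4, 2, 4  # 6502 cycles: lda/eor absolute, lsr accumulator

def split_pairs(byte):
    """Sender side (hypothetical): order in which bit-pairs are put on the bus."""
    return [(byte >> shift) & 3 for shift in (0, 2, 4, 6)]

def receive_byte(pairs):
    """Model lda $dd00 followed by three rounds of lsr : lsr : eor $dd00.

    pairs[k] is the 2-bit value seen on bits 7-6 of $dd00 at the k-th read.
    Returns (byte, cycles).
    """
    reads = [p << 6 for p in pairs]    # each read shows the pair in bits 7-6
    a = reads[0]                       # lda $dd00
    cycles = LDA_ABS
    for port in reads[1:]:
        a >>= 2                        # lsr : lsr
        a ^= port                      # eor $dd00
        cycles += 2 * LSR + EOR_ABS
    return a, cycles

print(receive_byte(split_pairs(0xA5)))  # (165, 28)
```

The first pair read ends up in the lowest two bits after three rounds of shifting, which is why the sender has to put the low pair on the bus first in this model.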

Bottom line: burst may make things "easier" to implement, but does not make for faster loading at all, interrupts, sprites and DMA or not.
2019-02-22 14:42
oziphantom

Registered: Oct 2014
Posts: 490
I forgot ATN was only one way.

If we step back from the leaf and look at the tree, though, is there room to be gained?

So at the moment the code does this:
- grabs GCR bytes, decodes them on the fly to "bytes" for the block
- once done, sends the block to the C128
- looks for and syncs to the next block
- repeats until done

Right?
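To make the cost of that loop concrete, here is a back-of-the-envelope Python model in which reading/decoding and sending happen strictly one after the other. The cycle figures passed in are hypothetical placeholders, not measurements, and `sync` is assumed to also cover waiting for the next wanted sector to come around:

```python
def sequential_load_cycles(blocks, read_decode, send, sync, payload=256):
    """Drive cycles for the read-block-then-send-block loop.

    read_decode: cycles to read and decode one byte (assumed constant)
    send:        cycles to transfer one byte to the host
    sync:        cycles to find the next block (sync mark, header, rotation wait)
    """
    per_block = sync + payload * read_decode + payload * send
    return blocks * per_block

# Reading and sending add up per block, because neither overlaps the other.
print(sequential_load_cycles(blocks=1, read_decode=64, send=112, sync=1000))  # 46056
```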

So while the burst transfer is not faster than the 2 bits + ATN transfer, it does leave the 2 MHz drive CPU free, giving you 112 clocks where it could be doing other things. Like, say, reading and decoding a byte off the disk? At which point you probably don't need the fancy decode table anymore, giving you more drive RAM?
So a burst transfer would look like:
- find the start of the block
- read GCR
- convert
- start sending the byte
- read GCR
- convert
- start sending the byte
- ...and so on until all bytes are transferred
But to loosen the timings, you could check whether the machine is ready and send the byte directly, or buffer it and send one from the buffer the next time you check.
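The interleaved variant with a small buffer can be modelled in a similar way. This Python sketch assumes the disk delivers a decoded byte every `disk_period` cycles whether or not anyone is ready, the link needs `send_cycles` per byte, and a hypothetical `host_ready(t)` callback stands in for the host's readiness. If the buffer is full when a byte arrives, the model reports an overflow; on real hardware the sector would have to be read again:

```python
from collections import deque

def interleaved_load(n_bytes, disk_period, send_cycles, host_ready, buffer_size=4):
    """Cycle-stepped model of read/convert/send interleaving with a small FIFO.

    host_ready must eventually return True, otherwise the loop never ends.
    Returns (cycle at which the last byte has been sent, overflowed?).
    """
    buffer = deque()
    arrived = sent = 0
    link_busy_until = 0
    t = 0
    while sent < n_bytes:
        t += 1
        # A decoded byte comes off the disk; the disk never waits for the host.
        if arrived < n_bytes and t % disk_period == 0:
            if len(buffer) == buffer_size:
                return t, True          # nowhere to put it: byte lost
            buffer.append(arrived)
            arrived += 1
        # Send directly if the link is idle and the host is ready, else keep it buffered.
        if buffer and t >= link_busy_until and host_ready(t):
            buffer.popleft()
            link_busy_until = t + send_cycles
            sent += 1
    return link_busy_until, False

always = lambda t: True
print(interleaved_load(256, 64, 56, always))                         # (16440, False)
print(interleaved_load(256, 64, 56, lambda t: not 100 <= t < 200))   # (16440, False)
print(interleaved_load(256, 64, 80, always)[1])                      # True
```

The second call shows a short host stall being absorbed by the buffer without changing the finish time; the third shows that buffering cannot help when the link is permanently slower than the disk.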

Also, if the C128 is in 1 MHz mode, you have up to 56 clocks to "deflate" the byte on the host? And if I don't care and just want the data as fast as possible, 112 clocks to "deflate" in 2 MHz mode.

So while the transfer itself is no faster, you no longer have read from disk, send data, read from disk; you can just pull bytes from the disk and send them immediately, and then you have a constant stream, barring sector and track movements?

Also, the 1571 has a drive-side auto-boot that lets you load blocks into drive RAM at disk insert, saving you from having to install the drive code from the host?

Or even switch to MFM and let the WD1770 chip do the bits -> byte conversion, giving you more CPU time to sync, buffer and send?
2019-02-22 15:15
Krill

Registered: Apr 2002
Posts: 2980
Some apt ideas there, will have to think about them thoroughly.

But a few remarks:
- On 1571, the decoding tables are in ROM, so already free in terms of RAM usage. They are somewhat sub-optimal and entirely different to what I came up with for 1541 (standing on lft's shoulders), but still, 100% on-the-fly reading + decoding + checksumming with 2 MHz.
- Interleaving GCR read/decode and transfer implies no interruptions/DMA on the host side, which also implies screen off. (Pretty sure the byte-received detection and buffering is ruled out due to reasons.)
- When going down that route, 50x-ish speed (around 20 KB/s) is possible without burst on C-64+1541.
- 1571 auto-boot is rather slow.
- I know close to nothing about WD1770.
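As a rough sanity check on the 20 KB/s figure, this Python sketch compares the 1541's physical read ceiling with the host-side rates implied by the cycle counts quoted earlier in the thread. It assumes 300 rpm, the standard zone layout (21/19/18/17 sectors per track), 254 payload bytes per sector, a whole track read in one revolution, and PAL timing (985248 Hz); all of these are simplifications:

```python
RPM = 300
PAL_CPU_HZ = 985_248
# (first track, last track, sectors per track) for a standard 1541 disk
ZONES = [(1, 17, 21), (18, 24, 19), (25, 30, 18), (31, 35, 17)]

def disk_ceiling(sectors_per_track, payload=254, rpm=RPM):
    """Best-case bytes/s if every sector of a track is read in one revolution."""
    return sectors_per_track * payload * rpm // 60

def host_rate(cycles_per_byte, cpu_hz=PAL_CPU_HZ):
    """Bytes/s the host can accept at a given cost per byte."""
    return cpu_hz / cycles_per_byte

for first, last, sectors in ZONES:
    print(f"tracks {first}-{last}: {disk_ceiling(sectors)} bytes/s")
print(round(host_rate(72)), round(host_rate(32)))  # 13684 30789
```

Even the innermost zone allows 21590 bytes/s, so around 20 KB/s is close to what the disk itself can deliver. A 72-cycle-per-byte transfer (about 13.7 KB/s) cannot keep up with that, while a 32-ish-cycle one can, which fits the screen-off remark above.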