CSDb User Forums


2019-02-19 15:51
oziphantom

Registered: Oct 2014
Posts: 490
SD2IEC fastloader

Does anybody know of a system that fast loads, or a disassembly of a fast loader that SD2IEC supports?
 
 
2019-02-22 13:33
Krill

Registered: Apr 2002
Posts: 2980
Quoting oziphantom
I think Burst, while not faster than 2-bit in a tightly coupled loop, will make things a lot easier if I'm pulling in a byte around other things, such as IRQs, sprites and game logic, since 8 bits get transferred without the CPU having to be involved. I can't miss the byte, and it will wait for me without needing to be sent again.
However, does your code use Burst plus 2-bit bit-banging? That would be faster again, right, as the Burst byte won't occupy CPU time while you transfer the 2 bits and hence would be almost "free"?
Burst is C-128 only (C-64 requires a relatively simple hardware mod), and burst and non-burst transfers are mutually exclusive, thus both cannot be employed at the same time. Burst uses the DATA line for data and only adds the SRQ line for clocking. The CLK line is required to signal "byte received" so the sender can clock out the next one. (There is still ATN, but it's only host->drive, and hardware ATN acknowledgement soiling the DATA line is a thing with burst as well, so let's not go into that.)

Now, burst-clocking out a bit requires 14 cycles at 2 MHz minimum (CIA timer period set to 6, two periods make up a bit cycle), anything faster yields errors due to the physical properties of the serial bus. That's 14 * 8 / 2 = 56 cycles at 1 MHz to transfer a byte. Add 6 cycles for the aforementioned CLK toggle, another 5 cycles to store the byte, and finally another 5 cycles for looping.

You're looking at 72 cycles minimum on C-64, which co-incidentally is exactly what the 2-bits+ATN protocol I use takes (it puts a new bit-pair every 18 cycles at 1 MHz on the bus). It also allows for interrupts, sprites and DMA without any restrictions.
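
The cycle budget above can be tallied in a few lines of Python. This is only a sketch of the arithmetic quoted in the post: the function names are made up for illustration, and the PAL clock figure of 985248 Hz used for the throughput estimate is an assumption that does not appear in the post.

```python
# Tally of the per-byte cycle counts quoted in the post (1 MHz host cycles).

PAL_C64_CLOCK_HZ = 985_248  # assumption: PAL C-64 CPU clock, not from the post


def burst_cycles_per_byte():
    """Burst: 14 cycles per bit at 2 MHz, plus handshake, store and loop."""
    clocking = 14 * 8 // 2   # 8 bits, converted from 2 MHz to 1 MHz cycles
    clk_toggle = 6           # "byte received" signal on CLK
    store = 5                # store the received byte
    loop = 5                 # loop overhead
    return clocking + clk_toggle + store + loop


def two_bit_atn_cycles_per_byte():
    """2-bits+ATN: a new bit pair every 18 cycles, four pairs per byte."""
    return 4 * 18


def bytes_per_second(cycles_per_byte, clock_hz=PAL_C64_CLOCK_HZ):
    """Raw throughput for a given per-byte cost, ignoring all other overhead."""
    return clock_hz / cycles_per_byte


if __name__ == "__main__":
    for name, cycles in (("burst", burst_cycles_per_byte()),
                         ("2-bits+ATN", two_bit_atn_cycles_per_byte())):
        print(f"{name}: {cycles} cycles/byte, "
              f"{bytes_per_second(cycles):.0f} bytes/s")
```

Both protocols come out at the same per-byte cost, which is the point of the comparison.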

So actually, the good old 2-bit asynchronous method (lda $dd00:lsr:lsr:eor $dd00 etc.) at 4*7 = 28 cycles for the raw byte would be the fastest one, after overhead it comes out at something like 64-ish cycles when allowing badlines, and 32-ish with screen off and no interrupts or DMA.
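
To see why the lda $dd00:lsr:lsr:eor $dd00 sequence reassembles a byte, here is a small Python model of it. It is a simplified sketch under stated assumptions: the two incoming bits are taken to sit in bits 7 and 6 of the port, the low six port bits are assumed to stay constant during the transfer, and the pair order (lowest pair first) as well as any line inversion on real hardware are ignored. All function names are hypothetical.

```python
# Model of the 2-bit receive sequence: lda $dd00, then three times
# (lsr : lsr : eor $dd00). Cycle cost on a 6502: 4 + 3 * (2 + 2 + 4) = 28.

RAW_CYCLES = 4 + 3 * (2 + 2 + 4)


def receive_byte(reads):
    """Combine four successive port reads the way the shift/XOR chain does."""
    acc = reads[0]
    for value in reads[1:]:
        acc = (acc >> 2) ^ value
    return acc & 0xFF


def port_reads(payload, low_bits):
    """Four port values carrying the payload's bit pairs in bits 7..6.

    low_bits stands for the unrelated low six port bits (assumed constant).
    """
    return [(((payload >> shift) & 0b11) << 6) | (low_bits & 0x3F)
            for shift in (0, 2, 4, 6)]


def correction_mask(low_bits):
    """Residue left by the constant low bits; XOR it away after receiving."""
    return receive_byte(port_reads(0, low_bits))


def decode(payload, low_bits):
    """Send a byte through the model and undo the constant residue."""
    return receive_byte(port_reads(payload, low_bits)) ^ correction_mask(low_bits)
```

Because shifting and XOR are both linear, the constant low bits only contribute a fixed mask, which a real loader can remove with a single final eor or fold into a lookup table.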

Bottom line: burst may make things "easier" to implement, but does not make for faster loading at all, interrupts, sprites and DMA or not.
2019-02-22 14:42
oziphantom

Registered: Oct 2014
Posts: 490
I forgot ATN was only one way.

If we step back from the leaf and inspect the tree, though, is there room to be gained?

So at the moment the code does this:
- grabs GCR bytes, decoding them on the fly to "bytes" for the block
- once done, sends the block to the C128
- looks for and syncs to the next block
- repeats until done

Right?

So while the burst transfer is not faster than the 2 bits + ATN transfer, it does leave the 2 MHz CPU free, giving you 112 clocks per byte in which the CPU could be doing other things, like, say, reading and decoding a byte off the disk. At which point you probably don't need the fancy decode table to do it any more, giving you more drive RAM?
So a burst transfer looks like:
- find start block
- read GCR
- convert
- start sending byte
- read GCR
- convert
- start sending byte
- ... and so on until all bytes are transferred

To loosen the timings, you could detect whether the host is ready and send the byte directly, or else buffer the byte and send one from the buffer the next time you check.

Also, if the C128 is in 1 MHz mode, you have up to 56 clocks to "deflate" the byte on the host? And if I don't care and just want it as fast as possible, 112 clocks to "deflate" in 2 MHz mode.

So while the transfer itself is no faster, you don't have a "read from disk, send data, read from disk" cycle; you can just pull bytes from the disk and send them immediately, and then you have a constant stream, barring sector and track movements?

Also the 1571 has a drive side auto-boot, that lets you load blocks into Drive RAM at disk insert, saving you from having to install the drive code from the host?

Or even switch to MFM and let the WD1770 chip do the bits -> byte conversion, giving you more CPU time to sync, buffer and send?
2019-02-22 15:15
Krill

Registered: Apr 2002
Posts: 2980
Some apt ideas there, will have to think about them thoroughly.

But a few remarks:
- On 1571, the decoding tables are in ROM, so already free in terms of RAM usage. They are somewhat sub-optimal and entirely different to what I came up with for 1541 (standing on lft's shoulders), but still, 100% on-the-fly reading + decoding + checksumming with 2 MHz.
- Interleaving GCR read/decode and transfer implies no interruptions/DMA on the host side, which also implies screen off. (Pretty sure the byte-received detection and buffering is ruled out due to reasons.)
- When going down that route, 50x-ish speed (around 20 KB/s) is possible without burst on C-64+1541.
- 1571 auto-boot is rather slow.
- I know close to nothing about WD1770.
2019-02-22 15:23
oziphantom

Registered: Oct 2014
Posts: 490
All valid, however I would counter:
- 1571 auto-boot is rather slow.

Not the 128 autoboot, but the disk drive loading into its own RAM. And yes, it's slow, but in most cases is it slower than the time between the user closing the disk handle and then typing RUN"* and pressing return? Or does the delay it causes after the user types RUN"* make it take longer than having the RUN"* command instruct the drive to do its install?

- Interleaving GCR read/decode and transfer implies no interruptions/DMA on the host side, which also implies screen off. (Pretty sure the byte-received detection and buffering is ruled out due to reasons.)

Maybe it's due to clocks; however, maybe you could make it buffer a couple of bytes up front and then always read from the buffer, which removes the "check if I need to use buffer" penalty.
2019-02-22 15:33
Krill

Registered: Apr 2002
Posts: 2980
Certainly worth some investigation.

As for autoboot, not sure if the boot block has provisions to load stuff into drive RAM, but when booting a C-128 program, yeah, why not also load stuff into drive RAM in the same boot procedure.

The problem, however, is that you cannot load into the entire RAM, unlike with the old two-stage "upload from host computer" technique.
2019-02-22 16:38
oziphantom

Registered: Oct 2014
Posts: 490
The drive autoboot is very well hidden. The 1571 manual mentions it in passing. For details you need the Abacus 1571 Internals book, section 3.1.5 (p. 99 in my copy). It's a custom USR file (which can't be made with BASIC 7 commands and needs BASIC 3 XD). You can chain the USR blocks together to load as many 255 blocks as you want, and each block has its own "destination address" and size if you want to load smaller parts to various parts of drive RAM. However, reading it in detail now, it doesn't sound like it actually autoboots: you have to issue an open 1,8,15,"&filename" command, upon which it will load the USR file into drive RAM and start executing it.
2019-02-22 16:40
Krill

Registered: Apr 2002
Posts: 2980
Sounds like the old "utility loader" feature, which is already present in 1541.

For the C-128 target of my loader (where i've added burst support), i did implement autoboot of the test application using the bootblock, but i can't find the reference of the bootblock layout right now.

So no idea atm whether the bootblock can also load stuff to drive RAM.
2019-02-22 16:44
oziphantom

Registered: Oct 2014
Posts: 490
*= $b00                     ; Boot sector image, assembled for $0b00.

	.text "cbm"             ; Autoboot signature.

	.word $0000             ; Load address for additional sectors. (T1, S1)
	.byte $00               ; Bank number for additional sectors.
	.byte $00               ; Number of additional sectors to load.

	.text "dropzone", $00   ; Name shown in the boot message ("BOOTING ...").

	.text "boot", $00       ; Program to load on boot.

	jmp $1c0e               ; Boot code: jump to $1c0e.
	.align $100,0           ; Pad the sector to 256 bytes with zeros.
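
For anyone who wants to generate such a sector outside the assembler, the same layout can be sketched in Python. This follows only the field order shown in the listing above. The function name and parameters are hypothetical, and the sketch assumes the strings end up as upper-case PETSCII letters (which share their codes with upper-case ASCII), as they would when the assembler converts the lower-case source text.

```python
# Build a 256-byte C128 boot sector image following the listing's layout:
# signature, load address, bank, sector count, two zero-terminated strings,
# boot code, zero padding.

SECTOR_SIZE = 256


def build_boot_sector(disk_name, filename, boot_code,
                      load_address=0x0000, bank=0, extra_sectors=0):
    """Return the boot sector as bytes (hypothetical helper, for illustration)."""
    image = b"CBM"                                   # autoboot signature
    image += load_address.to_bytes(2, "little")      # address for extra sectors
    image += bytes([bank, extra_sectors])            # bank and sector count
    # Assumption: letters are stored as upper-case PETSCII (same as ASCII).
    image += disk_name.upper().encode("ascii") + b"\x00"
    image += filename.upper().encode("ascii") + b"\x00"
    image += bytes(boot_code)                        # code after the strings
    if len(image) > SECTOR_SIZE:
        raise ValueError("boot sector contents exceed 256 bytes")
    return image.ljust(SECTOR_SIZE, b"\x00")         # pad like .align $100,0


if __name__ == "__main__":
    jmp_1c0e = [0x4C, 0x0E, 0x1C]                    # jmp $1c0e
    sector = build_boot_sector("dropzone", "boot", jmp_1c0e)
    print(sector[:24].hex(" "))
```

The resulting 256 bytes would then still need to be written to the boot sector of the disk image with a tool of your choice.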

I thought there was something else as well, from memory. The 128 autoboot uses T1 S0, and there was another autoboot system that used T0 or T2 or something like that, but it might be the 1581.
2019-02-23 08:52
ChristopherJam

Registered: Aug 2004
Posts: 1409
Quoting oziphantom
the 512K cart will already be compressed. It will have some code uncompressed to do the initial boot etc, but all the data will be compressed.


How is this 5-6 sides then? 512K is only 3*683 blocks, so 512k only takes more than 3 sides if you reserve space for pesky things like directories and track/sector links, surely?
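
The side count can be checked with a quick calculation. This is a sketch: the 683 blocks per side come from the post, while the 254 payload bytes per block (256 minus the two-byte track/sector link) and the 664 blocks left once the directory track is set aside are the standard 1541 figures, brought in here as assumptions. The helper name is made up.

```python
# How many 1541 disk sides does 512 KB need under different accounting?

CART_BYTES = 512 * 1024
BLOCKS_PER_SIDE = 683        # all blocks on a 35-track side (from the post)
FREE_BLOCKS_PER_SIDE = 664   # assumption: 683 minus the 19 blocks of track 18
BLOCK_SIZE = 256
LINK_BYTES = 2               # assumption: track/sector link at block start


def sides_needed(total_bytes, blocks_per_side, payload_per_block):
    """Round up total_bytes / (capacity of one side) using integer math."""
    capacity = blocks_per_side * payload_per_block
    return -(-total_bytes // capacity)


if __name__ == "__main__":
    print("raw blocks:       ",
          sides_needed(CART_BYTES, BLOCKS_PER_SIDE, BLOCK_SIZE))
    print("with links:       ",
          sides_needed(CART_BYTES, BLOCKS_PER_SIDE, BLOCK_SIZE - LINK_BYTES))
    print("links + directory:",
          sides_needed(CART_BYTES, FREE_BLOCKS_PER_SIDE, BLOCK_SIZE - LINK_BYTES))
```

With raw 256-byte blocks the data just fits on three sides; as soon as link bytes or the directory track are accounted for, it spills onto a fourth.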
2019-02-23 08:58
oziphantom

Registered: Oct 2014
Posts: 490
If you do file packing and use partial blocks, then sure. However, when you factor in partial blocks + the loader code + side detection code + the "which disk is that file on" data, it would push you to 4 sides. So you could probably squeeze it onto 4, but usually in such cases it's better to keep a stash of common data on each side to limit the amount of flipping you need to do, and hence you start to spill into 5/6 sides.
All times are CET.