[CSDb] - User Forums - Release id #167152 : Krill's Loader, repository version 164

2018-08-13 21:37

Krill

Registered: Apr 2002
Posts: 2845

Release id #167152 : Krill's Loader, repository version 164

If no problems emerge (i know they will, but anyways)... I can explain a bit about the full on-the-fly GCR block read+decode+checksumming.

... 36 posts hidden ...

2018-08-21 10:32

Krill

Registered: Apr 2002
Posts: 2845

Right, LOAD_RAW_API should have no or minimal impact on speed.

The options which do make it somewhat slower are NTSC_COMPATIBILITY, as you said, and also LOAD_UNDER_D000_DFFF and LOAD_VIA_KERNAL_FALLBACK.

Exomizer, despite big speed improvements from version 2 to 3, is still among the slowest, if not the slowest.

For tinycrunch vs. *nax, which one is faster at combined loading + depacking may depend more on the actual corpus of test files. The trade-off between the difference in pack ratio and the difference in depacking speed may tilt the scale in favour of one or the other, depending on the actual file.

2018-08-28 19:55

Sparta

Registered: Feb 2017
Posts: 39

Krill, first of all, congratulations: your loader is truly a masterpiece. I spent considerable time deciphering it, and I think I now understand what you are doing. The GCR loop is an amazing feat. One of its major advantages over checksum verification integrated into either side of the transfer loop is that you do not need to wait until the last block of a track has been transferred before changing tracks. Shrydar stepping cuts the delay to 12 bycles. This delay, however, can be eliminated completely. The following (Spartan) method provides seamless, uninterrupted transfer of data across neighboring tracks. This is how it works in the latest version of my loader, developed for personal use:

		lda	$1c00		//First half-track step
		sec
		rol
		and	#$03
		eor	$1c00
		sta	$1c00		//Update VIA 2 Port B

		sec			//Calculate second half step...
		rol
		and	#$03
		eor	$1c00
		sta	LastStep+1	//…and save it for later

Then start data transfer immediately:

		ldy	#$00
		…
		lsr
		dey
TrBranch:	bmi	Loop		//Send #$81 bytes first, then the remaining #$7f

		bit	$1800
		bpl	*-3
		sta	$1800		//Last 2 bits completed

		lda	#$d0		//Replace "BMI" with "BNE"
		sta	TrBranch
LastStep:	lda	#$00
		sta	$1c00		//Update VIA 2 Port B
		cpy	#$00		//Y is zero once the remaining bytes are sent
		bne	Loop2		//Back to transfer if not done
					//C64 loop has a similar delay built in

This can be adapted to almost any transfer loop, reducing the delay to a few cycles.
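The bit manipulation in the first fragment can be checked with a small Python model. This is only a sketch of the arithmetic, not drive code: it assumes that reading $1c00 returns the value last written for the bits involved, and the function names and the starting port value are made up for illustration. Under those assumptions, each `sec / rol / and #$03 / eor $1c00` sequence advances the two-bit stepper phase in bits 0-1 by one and leaves the upper six bits untouched.

```python
def next_stepper_value(port_b: int) -> int:
    """Model `sec / rol / and #$03 / eor $1c00` for an 8-bit port value."""
    rotated = ((port_b << 1) | 1) & 0xFF  # SEC + ROL: shift left, carry (1) enters bit 0
    low_bits = rotated & 0x03             # AND #$03: keep bits 0-1 only
    return port_b ^ low_bits              # EOR $1c00: toggle those bits in the port value


def two_half_steps(port_b: int) -> tuple:
    """Return (value written immediately, value saved for LastStep)."""
    first = next_stepper_value(port_b)
    second = next_stepper_value(first)
    return first, second


if __name__ == "__main__":
    start = 0b01100000  # hypothetical port value, stepper phase 0
    for value in (start, *two_half_steps(start)):
        print(f"{value:08b}")
```

The second value is the one the fragment stores into the operand of `lda #$00` at `LastStep`, so the later `sta $1c00` needs no further calculation inside the transfer loop.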

2018-08-28 21:20

Krill

Registered: Apr 2002
Posts: 2845

Sparta: Thanks! :)

I've considered something like your method (Spartan Stepping :D), but ultimately decided against it.

Its central concept is issuing the second half-track step in the middle of the block transfer.

However, this poses a few problems in a general-purpose standard format loader:
- The computer-side resident code needs to be aware of the slight delay in the middle and wait accordingly, which would increase resident code size ("C64 loop has a similar delay built in", as commented in your example).
- The computer-side code needs to be aware that the currently-transferred block is, indeed, the final file block of the current track, otherwise the extra delay would be in vain (and possibly a net loss due to just-missed following blocks). This would increase resident code size and also require that information to be sent to the resident code somehow, meaning extra protocol overhead.
- The drive-side code is extremely tight as it is (tightest code i ever made, and i've squeezed and squeezed again to fit in everything i needed to fit). It might not be possible to use this approach without throwing out some other functionality.

2018-08-28 22:07

Sparta

Registered: Feb 2017
Posts: 39

Yes, you got it. Spartan stepping uses the transfer loop to pace half-track steps instead of a timer. :)

I respectfully disagree with your second point. The computer-side code does not need to know whether the currently transferred block is the final block of a track. Thus, code can be simplified. Fetching and transferring a block takes roughly 27000-29000 cycles depending on speed zones. Spartan stepping adds 17 cycles to this (72 vs. 72.06 bycles/block transfer). I do not think this causes a significant delay resulting in missing the next block. The total loss while loading a full 35-track disk is 664*17= 11288 cycles, spread out evenly. Shrydar stepping, on the other hand, adds 12*256*34=104448 cycles delay. The difference is about 10-fold.
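The totals above can be reproduced with a few lines of Python. This is a sketch of the arithmetic only; it assumes one bycle is 256 cycles (consistent with the 12*256*34 term), and the constant and function names are illustrative.

```python
CYCLES_PER_BYCLE = 256       # assumption: one "bycle" = 256 cycles

BLOCKS_ON_DISK = 664         # blocks loaded from a full 35-track disk (from the post)
TRACK_CHANGES = 34           # track-to-track steps across 35 tracks
SPARTAN_EXTRA_CYCLES = 17    # extra cycles per block transfer (from the post)
SHRYDAR_DELAY_BYCLES = 12    # delay per track change (from the post)


def spartan_total_cycles() -> int:
    """Overhead paid on every block, spread evenly over the disk."""
    return BLOCKS_ON_DISK * SPARTAN_EXTRA_CYCLES


def shrydar_total_cycles() -> int:
    """Overhead paid once per track change."""
    return SHRYDAR_DELAY_BYCLES * CYCLES_PER_BYCLE * TRACK_CHANGES


def overhead_ratio() -> float:
    """How many times larger the per-track-change total is."""
    return shrydar_total_cycles() / spartan_total_cycles()


if __name__ == "__main__":
    print(spartan_total_cycles(), shrydar_total_cycles(), overhead_ratio())
```

With these inputs the ratio works out to roughly 9.25, in line with the "about 10-fold" estimate.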

After the on-the-fly GCR loop and the 72-cycles/byte transfer loop, Spartan stepping was the first thing that resulted in a significant speed improvement in my loader.

I can see your point in your third comment. Your code's complexity-to-tightness ratio is extremely high. :)

2018-08-29 11:14

ChristopherJam

Registered: Aug 2004
Posts: 1378

shrydar here.

Yes, I have wondered about doing a half track step mid transfer too, but I think the pertinent performance metric is percentage time saved, which even at interleave of three is only 0.5% (3072 cycles every 600,000 - and that's assuming no errors, and either perfectly aligned tracks or out of order loading).
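As a quick check of the percentage, here is a minimal Python sketch. The helper name is made up. Reading the 600,000 cycles as three disk revolutions at 300 rpm on a 1 MHz clock is an assumption on my part, not something stated in the post.

```python
def relative_saving_percent(saved_cycles: int, period_cycles: int) -> float:
    """Share of the period that the saved cycles represent, in percent."""
    return 100.0 * saved_cycles / period_cycles


SAVED_CYCLES = 12 * 256    # 12 bycles, i.e. the 3072 cycles quoted in the post
PERIOD_CYCLES = 600_000    # per-track time at interleave 3 (figure from the post)

if __name__ == "__main__":
    print(relative_saving_percent(SAVED_CYCLES, PERIOD_CYCLES))
```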

Either shrydar stepping or spartan stepping is a huge improvement over the old "wait until you're about to try and read the next block, then spend 60+ bycles on stepping and stabilisation" approach. The biggest win is almost certainly from allowing the head to settle during the transfer.

I'm still undecided about when to do the second step in Marmaload; my own loader development's been on hold while I've been distracted by crunchers and demo effects.

At this rate I suspect that'll remain the case until I've at least one production out the door using Krill's instead - we'll see :)

2018-08-29 11:30

Krill

Registered: Apr 2002
Posts: 2845

Quoting Sparta

I respectfully disagree with your second point. The computer-side code does not need to know whether the currently transferred block is the final block of a track.

You're probably right there. But the first and third points alone seem to prohibit Spartan Stepping in my case. And yes, what Shrydar aka ChristopherJam said. :)

2018-08-29 16:32

Sparta

Registered: Feb 2017
Posts: 39

Quoting ChristopherJam

The biggest win is almost certainly from allowing the head to settle during the transfer.

Agreed on this. In my loader Sparkle, which will never be as versatile as Krill's, I think I am going to settle (huh) with the best of both worlds. I.e. I will time the second half-track step in the transfer loop about 12 bycles after the first one to allow enough time for the head to settle. Call it the Spartan Shrydar Step. :))

P.S. I was aware of the mysterious Shrydar's identity. Google knows everything. :)

2018-08-30 10:10

bubis
Account closed

Registered: Apr 2002
Posts: 18

Quoting Sparta

Quoting ChristopherJam

The biggest win is almost certainly from allowing the head to settle during the transfer.

Agreed on this. In my loader Sparkle, which will never be as versatile as Krill's, I think I am going to settle (huh) with the best of both worlds. I.e. I will time the second half-track step in the transfer loop about 12 bycles after the first one to allow enough time for the head to settle. Call it the Spartan Shrydar Step. :))

P.S. I was aware of the mysterious Shrydar's identity. Google knows everything. :)

We just don't know who you are, my Hungarian fellow. :)

2019-03-13 06:39

map

Registered: Feb 2002
Posts: 27

Quoting Groepaz

python has the same problems, i think (is it a single exe requiring no install?) (the point is: right now "our" dev environment is completely freestanding requiring no msys or cygwin or any of that, and no installing either - breaking that is not an option :))

the problem with ca65 are at least nucrunch and tinycrunch, those will not build.

One possibility here might be to use PyInstaller with the --onefile option to create an .exe from the .py.
https://pypi.org/project/PyInstaller/
Using the UPX packer you can minimize the filesize of the .exe.
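A minimal sketch of what such a build step could look like, assuming PyInstaller is installed in the Python used for the build. The helper name and the paths passed to it are hypothetical. `--onefile` and `--upx-dir` are PyInstaller command-line options; whether the resulting .exe fits a freestanding toolchain would still need to be tried.

```python
import sys
from typing import List, Optional


def pyinstaller_command(script: str, upx_dir: Optional[str] = None) -> List[str]:
    """Build the argument list for a one-file PyInstaller build of `script`."""
    cmd = [sys.executable, "-m", "PyInstaller", "--onefile"]
    if upx_dir is not None:
        # Tell PyInstaller where the UPX packer lives so it can compress the output.
        cmd += ["--upx-dir", upx_dir]
    cmd.append(script)
    return cmd
```

The returned list can be passed to `subprocess.run(..., check=True)` on the build machine; by default the executable ends up in PyInstaller's `dist` directory.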

2019-03-14 08:43

Krill

Registered: Apr 2002
Posts: 2845

Quoting Groepaz

python has the same problems, i think (is it a single exe requiring no install?) (the point is: right now "our" dev environment is completely freestanding requiring no msys or cygwin or any of that, and no installing either - breaking that is not an option :))

Apparently, this does seem to exist: http://winpython.github.io/ - "The easiest way to run Python [...] out of the box on any Windows PC, without installing anything!", "WinPython lives entirely in its own directory, without any OS installation" and similar claims. There's some small print, though, so YMMV*.

* "Your metrage may vary" in PAL-land.
