[CSDb] - User Forums - GCR decoding on the fly

2013-03-31 12:46

lft

Registered: Jul 2007
Posts: 369

GCR decoding on the fly

Here's how to do it:

http://linusakesson.net/programming/gcr-decoding/index.php

[... 149 posts not shown ...]

2013-04-08 07:35

ChristopherJam

Registered: Aug 2004
Posts: 1424

I finally got around to reading lft's post. Amazing work, especially realising you can get down to just two tables. And yes, squeezing every last cycle out of a loop can be quite the brainworm :)

Well done!

2013-04-08 19:52

lft

Registered: Jul 2007
Posts: 369

I got some benchmark figures for my loader. The unit is the number of
revolutions needed to load a track, a.k.a. the optimal interleave (although with
out-of-order loading you don't need to think about interleave). The test
conditions are: no sprites, no interrupts, 25 badlines. This models the most
favourable setup that is still useful in practice, either for silent loading
while displaying the BASIC screen (or something else), or for loading with a
blanked screen and a normal SID playroutine being called every frame.

The first row represents the version of the loader used in Shards of Fancy, but
without any decrunching. This version verifies the checksum in a separate pass
after reading a sector, to detect read errors. Then the checksum is verified
again on the C64 side to detect transmission errors.
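For reference, the standard 1541 data block stores a one-byte checksum: the XOR of its 256 data bytes. A minimal Python sketch of such a separate verification pass, assuming the sector has already been GCR-decoded into a 256-byte buffer plus the stored checksum byte (the function names are hypothetical, not taken from lft's loader):

```python
from functools import reduce
from operator import xor


def sector_checksum(data: bytes) -> int:
    """XOR of all data bytes, as stored after a 1541 data block."""
    return reduce(xor, data, 0)


def verify_sector(data: bytes, stored_checksum: int) -> bool:
    """Separate pass after the read: recompute the XOR and compare."""
    if len(data) != 256:
        raise ValueError("a 1541 sector holds 256 data bytes")
    return sector_checksum(data) == stored_checksum


sector = bytes((i * 7 + 3) & 0xFF for i in range(256))
stored = sector_checksum(sector)
print(verify_sector(sector, stored))           # True
damaged = bytes([sector[0] ^ 0x10]) + sector[1:]
print(verify_sector(damaged, stored))          # False
```

An XOR checksum catches any single flipped bit, but not two errors that cancel out in the same bit position.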

As I mentioned to Krill at Revision, I had an idea to combine these into a
single checksum verification performed on the C64 side, and then re-read
(possibly another sector) and re-transmit on error. This was implemented, and
corresponds to the second row in the table.
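The combined scheme can be sketched as a retry loop on the receiving side. This is a simplified Python model under stated assumptions: `fetch` is a hypothetical stand-in for "the drive reads a sector and sends its data plus the checksum stored on disk", and it always returns the same sector, whereas the real loader may deliver a different sector on the retry. A read error and a transmission error both show up as a mismatch, which is why a single check on the C64 side suffices:

```python
from functools import reduce
from operator import xor
from typing import Callable, Tuple

# Hypothetical drive interface: returns (256 received data bytes,
# checksum byte as read from disk).
FetchSector = Callable[[], Tuple[bytes, int]]


def load_sector(fetch: FetchSector, max_attempts: int = 5) -> Tuple[bytes, int]:
    """Verify once on the C64 side; on a mismatch, request a re-read and
    re-transmission. Returns the data and the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        data, stored = fetch()
        if reduce(xor, data, 0) == stored:
            return data, attempt
        # Mismatch: either the disk read or the transfer went wrong.
        # The remedy is the same in both cases, so simply try again.
    raise OSError("sector failed verification")


good = bytes((i * 5 + 1) & 0xFF for i in range(256))
stored = reduce(xor, good, 0)
bad = bytes([good[0] ^ 0x80]) + good[1:]
responses = iter([(bad, stored), (good, stored)])
data, attempts = load_sector(lambda: next(responses))
print(attempts)  # 2
```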

Finally, I optimised the transfer routine and got it down to 74 C64 cycles per
byte. This is a regular ATN handshake protocol, with the checksum computed
during transfer. The correct checksum is transmitted as an extra byte at the
end. The performance of this version is shown in the third row.
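A sketch of the receiving side of such a transfer, folding each byte into a running XOR as it is stored, so that no separate checksum pass is needed. It is modelled in Python at the byte level only (the handshake and bitpair timing are left out), and the framing of 256 data bytes followed by one checksum byte is an interpretation of the description above, not lft's actual code; the function names are hypothetical:

```python
from typing import Iterable, List, Tuple


def drive_send(sector: bytes, stored_checksum: int) -> List[int]:
    """Hypothetical framing: the data bytes, then the correct checksum."""
    return list(sector) + [stored_checksum]


def c64_receive(stream: Iterable[int], length: int = 256) -> Tuple[bytes, bool]:
    """Store each byte and update the checksum in the same loop."""
    it = iter(stream)
    buf = bytearray()
    running = 0
    for _ in range(length):
        b = next(it)
        buf.append(b)    # store the byte
        running ^= b     # checksum computed during transfer
    expected = next(it)  # extra byte at the end
    return bytes(buf), running == expected


sector = bytes((i * 13 + 5) & 0xFF for i in range(256))
checksum = 0
for b in sector:
    checksum ^= b
data, ok = c64_receive(drive_send(sector, checksum))
print(ok)  # True
```

The cost of this design is one extra transferred byte per sector (257 transfers instead of 256), in exchange for dropping the separate verification pass on the drive.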

For easy comparison, I also computed a rough loading speed for this last
version by dividing the number of bytes loaded by the time needed for the given
number of revolutions. This figure should not be confused with actual loading
speed, for which you'd also need to take into account such things as overhead
from the high-level format (necessary to compensate for out-of-order loading),
track stepping, motor spin-up time and skipping sectors that don't belong to
the file. But it provides a rough estimate and an upper bound.

I have verified that the latest version works on real hardware, but the
measurements were obtained using VICE.

                       
Revolutions needed to load one track (fewer is better), and the resulting raw speed for v3:

Tracks                        1-17   18-24   25-30   31-35
                              ----------------------------
v1 (Shards)                      4       4       4       3
v2 (combined checksum)           4       4       3       3
v3 (74-cycle transfer)           4       3       3       3
v3 raw loading speed (B/s)    6720    8107    7680    7253
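The raw speed row can be reproduced from the revolution counts. The Python sketch below assumes the standard 1541 zone layout (21, 19, 18 and 17 sectors per track), 256 bytes counted per sector and a nominal spindle speed of 300 rpm (0.2 s per revolution). These are assumptions added here, not stated in the post, but they give exactly the figures in the table:

```python
# Sectors per track in each standard 1541 speed zone (assumed layout).
SECTORS = {"1-17": 21, "18-24": 19, "25-30": 18, "31-35": 17}
# Revolutions per track for v3, taken from the table.
V3_REVOLUTIONS = {"1-17": 4, "18-24": 3, "25-30": 3, "31-35": 3}

RPM = 300               # assumed nominal spindle speed
BYTES_PER_SECTOR = 256  # assumed: the full data block is counted


def raw_speed(sectors: int, revolutions: int) -> float:
    """Bytes on the track divided by the time taken by `revolutions` turns,
    i.e. bytes / (revolutions * 60 / RPM)."""
    return sectors * BYTES_PER_SECTOR * RPM / (revolutions * 60)


for zone, sectors in SECTORS.items():
    print(f"tracks {zone}: {raw_speed(sectors, V3_REVOLUTIONS[zone]):.0f} B/s")
# Prints 6720, 8107, 7680 and 7253 B/s, matching the table.
```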

2013-04-09 06:38

HCL

Registered: Feb 2003
Posts: 731

Interesting results! I also made some tweaks to my own loader last week, re-introducing SAX to get zero overhead in the reading loop. However, this gives a speed increase of less than 10% compared to before (Cycle loader, EoD, etc.).

Fair enough. Then I went on to the transfer loop, which (just like lft's loop) takes 74 cycles. There should be room for some optimization here, but when I cut cycles, the transfer screws up! There is probably some theoretical explanation for this: fewer than 18 cycles between reads of $dd00 makes it fail. That is when using the ATN handshake, of course; otherwise it's possible to reduce it a lot.
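As background for the 18-cycle discussion: in a 2-bit ATN transfer, each byte crosses the serial bus as four bitpairs on the DATA and CLK lines, which the C64 reads as bits 7 and 6 of $dd00, and the C64 flips ATN to request each next pair. The Python model below covers only the bit shuffling, not the timing. The most-significant-pair-first order is an assumption, and the model ignores line inversion and the other bits of $dd00, which real loaders have to mask or cancel out; the function names are hypothetical:

```python
from typing import List


def drive_bitpairs(byte: int) -> List[int]:
    """Split a byte into four bitpairs, most significant pair first (assumed
    order). Each pair is placed where the C64 sees it in $dd00:
    bit 7 = DATA IN, bit 6 = CLK IN. All other bits are left at zero."""
    pairs = []
    for shift in (6, 4, 2, 0):
        pair = (byte >> shift) & 0b11
        pairs.append(pair << 6)  # simulated $dd00 value
    return pairs


def c64_assemble(dd00_reads: List[int]) -> int:
    """Rebuild the byte from four simulated $dd00 reads, one per ATN flip.
    On real hardware, consecutive reads must be spaced far enough apart
    for the drive to react (the 18-cycle limit discussed in this thread)."""
    value = 0
    for reading in dd00_reads:
        value = (value << 2) | ((reading >> 6) & 0b11)
    return value


for b in (0x00, 0x5A, 0xA5, 0xFF):
    assert c64_assemble(drive_bitpairs(b)) == b
```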

Does anyone have an idea why 18 cycles seems to be the limit? If it's confirmed, then I'm pretty much done: there are 2 cycles left to optimize, and that's all. I'm not even sure I'll go for those 2 in that case :).

2013-04-09 06:46

Fungus

Registered: Sep 2002
Posts: 752

Be sure to check it on some 1541-II and 1571 drives; later VIA revisions sometimes need an extra cycle for the handshake.

2013-04-09 06:50

HCL

Registered: Feb 2003
Posts: 731

I think 18 cycles is used by most loaders, at least for some of the bitpairs. I have 18+18+20+18, most others have more, but if it didn't work with 18, then 90% of all modern demos would not work on the 1571 or 1541-II. I have a 1541-II myself, and it works there, of course :).
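HCL's figures can be checked with a few lines of arithmetic: the four gaps between $dd00 reads add up to the loop length, and the slack above 4 × 18 cycles is the "2 cycles left to optimize" from his earlier post. A Python sketch, assuming the same four gaps repeat for every byte and treating 18 cycles as the empirical minimum from this thread rather than a hardware constant:

```python
MIN_GAP = 18  # cycles between $dd00 reads; empirical limit from this thread


def check_schedule(gaps):
    """Return (cycles per byte, all gaps >= MIN_GAP, slack over the minimum)."""
    total = sum(gaps)
    ok = all(g >= MIN_GAP for g in gaps)
    slack = total - MIN_GAP * len(gaps)
    return total, ok, slack


print(check_schedule([18, 18, 20, 18]))  # (74, True, 2)
print(check_schedule([16, 18, 18, 18]))  # (70, False, -2)
```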

Now, this is on the computer side, I should say. On the drive side you should of course go below 18, at least here and there, to be safe if the drive is running slightly faster than the computer.

2013-04-09 10:45

Krill

Registered: Apr 2002
Posts: 3098

Interesting results! I get close to them with my planned speed-ups, but don't quite reach them yet. I guess I must hurry now to optimize a bit more and push the next release out the door :)

A while ago I added a new experimental protocol reaching 70 cycles per byte (including loop and store overhead). It has a few strange-seeming limitations, though (e.g. 0 or 5-8 sprites are okay, but not 1-4). The variant without sprite limitations needs a whopping 82 cycles. There might be room for improvement in both versions, but this is yet to be explored.
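To put these per-byte costs in perspective, a rough Python conversion into peak transfer rates on a PAL C64, assuming a CPU clock of about 985,248 Hz. It ignores badlines, sprites and interrupts, which all steal cycles in practice, so the results are ceilings, not measurements:

```python
PAL_CLOCK_HZ = 985_248  # approximate PAL C64 CPU clock


def peak_rate(cycles_per_byte: float) -> float:
    """Bytes per second if the CPU did nothing but run the transfer loop."""
    return PAL_CLOCK_HZ / cycles_per_byte


for label, cycles in [
    ("74-cycle ATN loop", 74),
    ("experimental protocol", 70),
    ("variant without sprite limitations", 82),
]:
    print(f"{label}: {peak_rate(cycles):.0f} B/s")
```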

As for the 18-cycle limit with plain 2bit+ATN, which I confirm: my explanation is that waiting for the ATN flip in a loop takes 6 cycles minimum, then 7 cycles for a miss, which does happen, then 4 cycles to set the next bitpair, then another cycle due to slightly different clocks, wire delay, missed sampling windows and whatnot. That makes 6+7+4+1=18 cycles.
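Krill's budget, written out as a small Python table. The instruction-level reading in the comments (a drive-side polling loop such as `BIT $1800` / `BPL`, costing 4 + 3 = 7 cycles per missed pass and 4 + 2 = 6 cycles for the pass that sees the flip, followed by a 4-cycle `STA $1800`) is a hypothetical interpretation, not spelled out in the post:

```python
# Cycle budget per bitpair on the drive side, as given by Krill.
# Hypothetical instruction mapping: BIT abs (4) + BPL taken (3) = 7 per miss,
# BIT abs (4) + BPL not taken (2) = 6 for the detecting pass, STA abs (4).
BUDGET = [
    ("final polling pass that sees the ATN flip", 6),
    ("one polling pass that just missed it", 7),
    ("store the next bitpair to the port", 4),
    ("clock skew, wire delay, sampling windows", 1),
]

for what, cycles in BUDGET:
    print(f"{cycles:2d}  {what}")
print(f"{sum(c for _, c in BUDGET):2d}  total per bitpair")  # total: 18
```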

HCL: I might be wrong, but the drive being slightly faster actually gives you more than 18 cycles here and there on the drive side, according to my understanding.
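Krill's point can be made concrete with nominal clock rates, which are assumptions added here rather than figures from the thread: about 1 MHz for the 1541's CPU, about 985,248 Hz for a PAL C64 and about 1,022,727 Hz for an NTSC C64. A Python sketch converting an 18-cycle gap on the C64 into drive cycles:

```python
DRIVE_HZ = 1_000_000                          # nominal 1541 CPU clock
C64_HZ = {"PAL": 985_248, "NTSC": 1_022_727}  # approximate C64 CPU clocks


def drive_cycles(c64_cycles: float, c64_hz: float) -> float:
    """How many drive cycles elapse during `c64_cycles` C64 cycles."""
    return c64_cycles * DRIVE_HZ / c64_hz


for system, hz in C64_HZ.items():
    print(f"{system}: 18 C64 cycles = {drive_cycles(18, hz):.2f} drive cycles")
```

With a PAL machine, the drive gets a little more than 18 of its own cycles per gap, while against an NTSC machine it gets slightly fewer. Real crystals can also deviate from their nominal frequencies, so some margin is needed either way.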

2013-04-09 11:43

HCL

Registered: Feb 2003
Posts: 731

Quote:

HCL: I might be wrong, but the drive being slightly faster actually gives you more than 18 cycles here and there on the drive side, according to my understanding.

Oh, yes of course :). So, in case the drive is slightly slower, you need to go below 18 cycles here and there; the drive is, after all, waiting for the computer when needed. I tend to believe my transfer loop actually works, since it has been around for ~10 years by now in numerous demos. I don't know if I have ever done loading while displaying a sprite multiplexer, though, with loading *on* the sprites :). Perhaps the AFLI zoomer in EoD?

2013-04-09 11:54

doynax
Account closed

Registered: Oct 2004
Posts: 212

Quoting Krill

As for the 18-cycle limit with plain 2bit+ATN, which i confirm: My explanation is that waiting for ATN flip in a loop is 6 cycles minimum, then 7 cycles for a miss which does happen, then 4 cycles to set next bitpair, then another cycle due to slightly different clocks, wire delay, missed sampling windows and whatnot. Makes 6+7+4+1=18 cycles.

I think I've managed 16 cycles, actually (66.5 cycles per byte in practice, with 2x unrolling).

The trick is to reduce the delay between reading the bits and flipping ATN by combining both in a single RMW instruction (e.g. SLO/SRE).
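A back-of-the-envelope check of doynax's figures in Python, assuming for illustration that all four bitpairs take exactly 16 cycles (the post does not say whether the spacing is uniform):

```python
GAP = 16              # cycles per bitpair, as claimed (assumed uniform)
PAIRS_PER_BYTE = 4
MEASURED = 66.5       # cycles per byte in practice, with 2x unrolling
UNROLL = 2            # bytes per loop iteration

overhead_per_byte = MEASURED - GAP * PAIRS_PER_BYTE
print(overhead_per_byte)           # 2.5 cycles per byte
print(overhead_per_byte * UNROLL)  # 5.0 cycles per unrolled iteration
print(18 * PAIRS_PER_BYTE)         # 72: the floor with 18-cycle spacing
```

Under that assumption, one unrolled iteration takes 133 cycles for two bytes, and the remaining store and loop-control cycles are shared between both bytes, which is why the per-byte figure is not a whole number.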

2013-04-09 12:18

Krill

Registered: Apr 2002
Posts: 3098

Hmm, how does that speed up the drive side, which is the bottleneck here, as it has to wait for the C-64 and respond to ATN flips asap?

2013-04-09 12:28

HCL

Registered: Feb 2003
Posts: 731

I would say the computer side is the bottleneck; at least, I have NOPs in my transfer loop on the computer side.

@doynax: Hehe, cool. And you were actually able to do something useful with the data you got from those instructions, too. Impressive!
