[CSDb] - User Forums - Shortest code for stable raster timer setup

2020-01-20 16:20

Krill

Registered: Apr 2002
Posts: 2980

Shortest code for stable raster timer setup

While working on my ICC 2019 4K entry (now postponed to ICC 2020, but i hope it'll be worth the wait), i came up with this (14 bytes):

initstabilise   lda $d012
                ldx #10          ; 2
-               dex              ;   (10 * 5) + 4
                bpl -            ; 54
                nop              ; 2
                eor $d012 - $ff,x; 5 = 63
                bne initstabilise; 7 = 70

                [...]; timer setup

The idea is to loop until the same current raster line is read at the very beginning (first cycle) and at the very end (last cycle) of a raster line, implying 0 cycles jitter.

With 63 cycles per line on PAL, the delay between the reads must be 63 cycles (and not 62), reading $d012 at cycle 0 and cycle 63 of a video frame's last line (311), which is one cycle longer due to the vertical retrace.
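The cycle comments in the listing above can be tallied mechanically. The following is only a bookkeeping sketch in Python, not part of the original post; the per-instruction cycle counts are the standard 6502 timings (the `eor` takes 5 cycles because indexing `$d012 - $ff` by X = $ff crosses a page), and the names `read_to_read_delay` and `loop_length` are made up for illustration.

```python
# Cycles spent between the read in "lda $d012" and the read in "eor ...,x".
# Both instructions access $d012 on their final cycle.
CYCLES_AFTER_FIRST_READ = [
    ("ldx #10", 2),
    ("dex / bpl delay loop", 10 * 5 + 4),  # 10 taken branches, 1 fall-through
    ("nop", 2),
    ("eor $d012 - $ff,x", 5),              # absolute,X with page crossing
]

LDA_ABSOLUTE = 4  # lda $d012
BNE_TAKEN = 3     # bne initstabilise, branch taken, same page (assumed)


def read_to_read_delay():
    """Cycles from the first $d012 read to the second one."""
    return sum(cycles for _, cycles in CYCLES_AFTER_FIRST_READ)


def loop_length():
    """Cycles for one full pass when the loop repeats."""
    return LDA_ABSOLUTE + read_to_read_delay() + BNE_TAKEN


print(read_to_read_delay(), loop_length())
```

The first number is the 63-cycle read-to-read distance the post relies on; the second matches the "= 70" in the last comment of the listing.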

The downside is that effectively only one line per video frame is attempted, so the loop may take a few frames to terminate, and the worst case is somewhere just beyond 1 second.

The upside is that it always comes out at the same X raster position AND raster line (0), plus it leaves with accu = 0 and X = $ff, which can be economically re-used for further init code.

Now, is there an even shorter approach, or at least a same-size solution without the possibly-long wait drawback?

[... 177 posts hidden ...]

2020-07-02 18:19

JackAsser

Registered: Jun 2002
Posts: 2014

Quote: This is an idea I got after talking to Copyfault.
At least in the cycle-correct version of Vice (i.e., x64sc) this seems to work. Haven't tried on a real machine.

* = $0f00 ; Some address with (H+1)&1 = 0 and (H+1)&$10 = $10

        ldy #$00
loop:   ldx #$11
        shx cont, y
cont:   bpl loop

It uses the fact that we will AND the written value with H+1 unless a badline pauses the CPU between the third and fourth cycle of shx. The latter then changes the "bpl" into an "ora" and drops us out of the loop at horizontal position 61.

Haha! Wow!
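For readers who, like some posters in this thread, need to look up SHX: the quoted explanation boils down to one AND. Below is a small Python model of just that byte, written as a sketch under simplifying assumptions. It ignores all timing and the page-crossing quirks of SHX (Y is 0 here), and `shx_stored_value` and `CONT` are illustrative names, with `CONT` taken as $0f07 from the layout of the quoted listing.

```python
BPL_OPCODE = 0x10    # BPL rel: keeps the loop spinning
ORA_INDY_OPCODE = 0x11  # ORA (zp),Y: two bytes, falls through


def shx_stored_value(x, target_addr, and_dropped):
    """Byte written by SHX in this simplified model.

    Normally the CPU stores X AND (high byte of target + 1). If the badline
    stall lands between the third and fourth cycle, the AND is dropped and
    plain X is stored (this is the behaviour described in the quoted post).
    """
    if and_dropped:
        return x & 0xFF
    high_plus_one = ((target_addr >> 8) + 1) & 0xFF
    return x & high_plus_one


CONT = 0x0F07  # assumed address of "cont" when assembling at $0f00

print(hex(shx_stored_value(0x11, CONT, and_dropped=False)))
print(hex(shx_stored_value(0x11, CONT, and_dropped=True)))
```

With X = $11 and H+1 = $10, the normal case rewrites `cont` with the BPL opcode it already holds, while the badline case turns it into ORA (zp),y, which consumes the branch offset as its operand and drops out of the loop.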

2020-07-02 19:05

Burglar

Registered: Dec 2004
Posts: 1101

Quoting Quiss

* = $0f00 ; Some address with (H+1)&1 = 0 and (H+1)&$10 = $10

        ldy #$00
loop:   ldx #$11
        shx cont, y
cont:   bpl loop

wait what?? I need to look up SHX... at first glance this does not make any sense to me :)

2020-07-02 19:17

ChristopherJam

Registered: Aug 2004
Posts: 1409

Holy shit, that’s brilliant! Well found.

2020-07-02 19:19

Burglar

Registered: Dec 2004
Posts: 1101

even Crossbow cannot beat this!

2020-07-02 19:29

chatGPZ

Registered: Dec 2001
Posts: 11386

i so have to steal this and use as an example in my pdf :)

edit: quick test on C64 confirms it works :)

2020-07-02 20:16

JackAsser

Registered: Jun 2002
Posts: 2014

Quote: i so have to steal this and use as an example in my pdf :)

edit: quick test on C64 confirms it works :)

So at a controlled X pos but at a "random" y*8+c pos depending on $d011, which is good enough to launch a 63c timer ofc.

2020-07-02 20:43

TWW

Registered: Jul 2009
Posts: 545

Damn, nice one.

2020-07-03 00:44

Copyfault

Registered: Dec 2001
Posts: 478

Quoting Quiss

This is an idea I got after talking to Copyfault.
At least in the cycle-correct version of Vice (i.e., x64sc) this seems to work. Haven't tried on a real machine.

* = $0f00 ; Some address with (H+1)&1 = 0 and (H+1)&$10 = $10

        ldy #$00
loop:   ldx #$11
        shx cont, y
cont:   bpl loop

It uses the fact that we will AND the written value with H+1 unless a badline pauses the CPU between the third and fourth cycle of shx. The latter then changes the "bpl" into an "ora" and drops us out of the loop at horizontal position 61.

Lovely!!! Quiss, I knew you would come up with exactly this kind of brilliance sooner or later. Sooo good to have you back;)

If you want to "overdo" (optimize, erm) this, let's save another 2 bytes:

* = $0faa  ; _a very nice_ address with (H+1)&1 = 0 and (H+1)&$10 = $10

0FAA   loop:  ldx #$11
0FAC          shx cont, y
0FAF   cont:  bpl loop

with start address $0FAD (you guess the operand bytes of the SHX ; ))

Branching directly to the SHX-opcode should also work (8-cycle loop instead of 10-cycle loop, both coprime to 63), though I'm not sure which one will be faster.
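The two loop lengths and the coprimality remark can be checked with a few lines of Python. This is only an arithmetic sketch: the cycle counts are the usual 6502 timings (taken branch without page crossing assumed), and it verifies the "coprime to 63" statement and nothing more.

```python
from math import gcd

PAL_CYCLES_PER_LINE = 63

LDX_IMMEDIATE = 2  # ldx #$11
SHX_ABSOLUTE_Y = 5  # shx cont, y
BPL_TAKEN = 3      # bpl, branch taken, same page (assumed)

full_loop = LDX_IMMEDIATE + SHX_ABSOLUTE_Y + BPL_TAKEN  # branch to "loop"
short_loop = SHX_ABSOLUTE_Y + BPL_TAKEN                 # branch to the SHX

for name, length in (("full", full_loop), ("short", short_loop)):
    print(name, length, gcd(length, PAL_CYCLES_PER_LINE))
```

A gcd of 1 means the loop phase drifts through every horizontal position on lines where no cycles are stolen; it says nothing yet about how badlines shift the phase.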

Only "drawback" is that you do not know at which raster position you end up, only that it will be (at the very end of) a badline. Not too bad for my taste :))

2020-07-03 08:47

Rastah Bar
Account closed

Registered: Oct 2012
Posts: 336

Quoting Copyfault

If you want to "overdo" (optimize, erm) this, let's save another 2 bytes:

* = $0faa  ; _a very nice_ address with (H+1)&1 = 0 and (H+1)&$10 = $10

0FAA   loop:  ldx #$11
0FAC          shx cont, y
0FAF   cont:  bpl loop

with start address $0FAD (you guess the operand bytes of the SHX ; ))

Interesting idea, but I do not completely understand it. How does the Y register get the right value?

I was thinking about possibly saving one byte, if one could find a suitable start address and a ZP location with the right contents after entering from BASIC:

* = $????     ;magic address that allows us to save 1 byte

      lax ZP  ;another one of those magic addresses
      tay
loop: shx cont,y
cont: bpl loop

There might also exist variations where you let the SHX instruction change itself or change the value after the BPL into e.g. 0 (or another suitable value).

2020-07-03 17:28

Quiss

Registered: Nov 2016
Posts: 43

Neat! Right, no reason to make those two address bytes go to waste. :)

Another amusing thing to contemplate is how this code could be placed at, say, $08xx. Preferably without messing up the basic upstart.

Also, careful with the loop length. The number of CPU cycles between two badlines is 461, except when the loop's one write cycle (last cycle of SHX) sneaks into the three cycle RDY grace period. Then it's 462 ticks.
(Imagine a graph with n nodes, in which node i is connected to node (i+461)%n for 0 < i < n-1 and to (i+462)%n for i = 0. Node n-1 isn't connected to anything. You want that graph to be acyclic.)
In the range 5-20, the lengths that do work are 5, 10, 12, 16, 18 and 19. But note that in particular, length 8 (a.k.a. branching directly to the SHX) does not.
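The graph described above is easy to brute-force. Here is a minimal Python sketch of that check, taking the model exactly as stated in the post (node i steps by 461, node 0 steps by 462, node n-1 is the exit); the function names are invented for illustration, and the remark on where 461 comes from is a reading of the post rather than something it states.

```python
CYCLES_BETWEEN_BADLINES = 461  # presumably 8 * 63 - 43 (assumption)
CYCLES_WITH_GRACE_WRITE = 462  # write cycle falls into the RDY grace period


def loop_length_works(n):
    """True if every loop phase 0..n-1 eventually reaches phase n-1.

    Each node has at most one outgoing edge, so the graph is acyclic
    exactly when no walk revisits a node before hitting n-1.
    """
    for start in range(n):
        node = start
        seen = set()
        while node != n - 1:
            if node in seen:
                return False  # phase repeats: the loop would never exit
            seen.add(node)
            step = CYCLES_WITH_GRACE_WRITE if node == 0 else CYCLES_BETWEEN_BADLINES
            node = (node + step) % n
    return True


def working_lengths(lo, hi):
    """Loop lengths n with lo <= n < hi (upper bound exclusive) that work."""
    return [n for n in range(lo, hi) if loop_length_works(n)]


print(working_lengths(5, 20))
```

Run over the half-open range 5 to 20, this reproduces the list given in the post, and it confirms that a length of 8 gets stuck in a cycle (0 to 6 to 3 and back to 0).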