[CSDb] - User Forums - Shortest code for stable raster timer setup

2020-01-20 16:20

Krill

Registered: Apr 2002
Posts: 2980

Shortest code for stable raster timer setup

While working on my ICC 2019 4K entry (now postponed to ICC 2020, but I hope it'll be worth the wait), I came up with this (14 bytes):

initstabilise   lda $d012
                ldx #10          ; 2
-               dex              ;   (10 * 5) + 4
                bpl -            ; 54
                nop              ; 2
                eor $d012 - $ff,x; 5 = 63
                bne initstabilise; 7 = 70

                [...]; timer setup
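A quick way to double-check the cycle comments is to tally them. The Python below is only a sketch. It assumes standard NMOS 6510 timings, no branch crossing a page, and no badlines or sprite DMA, and the constant and function names are made up for illustration.

```python
# Tally of cycles and bytes for the 14-byte loop (assumes NMOS 6510 timings,
# no page-crossing branches, badlines and sprites ignored).
LDA_ABS = 4             # lda $d012: operand read on its 4th (last) cycle
LDX_IMM = 2             # ldx #10
DEX = 2
BPL_TAKEN = 3
BPL_NOT_TAKEN = 2
NOP = 2
EOR_ABSX_PAGECROSS = 5  # eor $d012-$ff,x with X=$ff crosses from page $cf to $d0
BNE_TAKEN = 3

# instruction sizes in bytes: lda abs, ldx #, dex, bpl, nop, eor abs,x, bne
SIZES = [3, 2, 1, 2, 1, 3, 2]


def delay_loop_cycles(x_start=10):
    """ldx #x_start / dex / bpl: X counts down to $ff, so dex runs x_start + 1 times."""
    runs = x_start + 1
    return runs * DEX + (runs - 1) * BPL_TAKEN + BPL_NOT_TAKEN


def cycles_between_reads():
    """From the read cycle of lda $d012 to the read cycle of eor $d012-$ff,x."""
    return LDX_IMM + delay_loop_cycles() + NOP + EOR_ABSX_PAGECROSS


def cycles_per_iteration():
    """One full pass through the loop when bne is taken."""
    return LDA_ABS + cycles_between_reads() + BNE_TAKEN


print(delay_loop_cycles(), cycles_between_reads(), cycles_per_iteration(), sum(SIZES))
# 54 63 70 14
```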

The idea is to loop until the same current raster line is read at the very beginning (first cycle) and at the very end (last cycle) of a raster line, implying 0 cycles jitter.

With 63 cycles per line on PAL, the delay between the reads must be 63 cycles (and not 62), reading $d012 at cycle 0 and cycle 63 of a video frame's last line (311), which is one cycle longer due to the vertical retrace.
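To see why only one read position per frame can succeed, here is a small model of what $d012 returns on each cycle. It is a sketch. Following the description above, it assumes that the value switches from 311 to 0 one cycle late, so that line 311 reads as 64 cycles and line 0 as 62. Badlines are ignored, and the names are illustrative.

```python
LINE_CYCLES = 63                # PAL
LINES = 312
FRAME = LINE_CYCLES * LINES     # 19656 cycles per PAL frame


def d012_seen(cycle):
    """Low byte of the raster line as read from $d012 at an absolute cycle.
    Assumption: the counter switches from 311 to 0 one cycle late."""
    line, pos = divmod(cycle % FRAME, LINE_CYCLES)
    if line == 0 and pos == 0:
        line = LINES - 1        # still shows 311 on the first cycle of line 0
    return line & 0xFF


def matching_read_cycles(gap):
    """Frame cycles c where reads at c and c + gap return the same value."""
    return [c for c in range(FRAME) if d012_seen(c) == d012_seen(c + gap)]


print(matching_read_cycles(63))       # [19593], i.e. cycle 0 of line 311
print(len(matching_read_cycles(62)))  # 312
```

With a gap of 62, reads at cycle 0 and cycle 62 of ordinary lines already agree, so the match would no longer be unique. That is why the delay between the reads has to be 63 cycles.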

The downside is that effectively only one line per video frame is attempted, so the loop may take a few frames to terminate, and the worst case is somewhere just beyond 1 second.

The upside is that it always comes out at the same X raster position AND raster line (0), plus it leaves with accu = 0 and X = $ff, which can be economically re-used for further init code.

Now, is there an even shorter approach, or at least a same-size solution without the possibly-long wait drawback?

[... 177 posts hidden ...]

2020-01-31 23:50

Copyfault

Registered: Dec 2001
Posts: 478

Quoting Copyfault

[...]
But back to your INC-based solution: why does it take 9 frames at most?[...]

This kept me awake for quite some time. I think I finally have an explanation for it!

If I did the calculations correctly (read: set up my surrounding framework, including those frame counters, right ;)), the lda-based method takes at most 7 frames. Uh, why is it 7 now, even fewer than the 9-frame maximum of the inc-based approach?

The answer lies in the respective entry points of the delay loops. Taking a look at the INC-method, we see that it starts with

waitline:
   inc $d012
   bne waitline
   ...

Once this first waiting loop has finished, the delay part begins (which we decided to fill with init code, for example). To simplify things, let's set aside for a moment the case of starting this code in the middle of line=$ff (the check would succeed immediately, while being more than 9 cycles past the start of that line). How many cycles of the line have passed when leaving the waiting loop? It's 4 cycles if $d012=$ff on the fourth (read) cycle of the INC, but it amounts to 12 cycles if $d012=$ff happens one cycle later! So this gives a variance of 12-4=8.

Exactly this variance is what we need to get rid of to have a stable raster. The INC and LDA loops presented in this thread cancel one cycle of variance per frame. For the INC approach, this means we need 8 frames in the worst case (i.e. 12 cycles off).
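The 4-to-12 window can be reproduced with a tiny cycle-stepping simulation. This is a sketch under simplifying assumptions: no badlines, a start well before line=$ff, and the INC reading $d012 on its 4th cycle. The function and its parameters are hypothetical.

```python
LINE_CYCLES = 63  # PAL


def exit_cycle(start, run_cycles, read_offset, cycles_after_read, target_line):
    """Simulate `waitline: <read $d012> / bne waitline` from absolute cycle `start`.

    Assumes `start` lies in an earlier line than `target_line`, no badlines, and
    that the loop falls through exactly when the read sees `target_line`.
    Returns the exit position counted as in the post: cycles from the start of
    `target_line` to the last cycle of the falling-through bne.
    """
    t = start
    while True:
        read = t + read_offset
        if read // LINE_CYCLES == target_line:
            return read - target_line * LINE_CYCLES + cycles_after_read
        t += run_cycles


# inc $d012 + bne: 9 cycles per run, INC reads on its 4th cycle (offset 3);
# after that read come 2 write cycles and the 2-cycle untaken bne.
starts = range(200 * LINE_CYCLES, 201 * LINE_CYCLES)  # one whole line of start phases
inc_exits = sorted({exit_cycle(s, 9, 3, 4, 0xFF) for s in starts})
print(inc_exits, inc_exits[-1] - inc_exits[0])
# [4, 5, 6, 7, 8, 9, 10, 11, 12] 8
```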
Now we still have the "bad case" I ignored for the sake of simplicity. In fact, it does not do much harm: if the loop really starts in the middle of the tested line ($ff in the INC approach), the first run of the delay loop will fail. The loop construction ensures 71 cycles between consecutive $d012 checks at the start of each delay-loop run, with gcd(71,63)=1 (coprime), and one run of the waiting loop takes 9 cycles. So the next start of the delay loop will be at a cycle c of the form c = 9*k + 71 = 9*(k+8) - 1 ≡ -1 (mod 9). [Mind that 9 is a factor of 63=7*9, so advancing by a multiple of 9 cycles keeps you at the same cycle position (mod 9) in any other line, or in the next frame; the -1 (mod 9) ensures that the position changes!]
This means that from the second run of the delay loop onwards, we step through the first nine cycles of the line.
So, back to counting the maximum number of frames needed: this "bad case" adds one to the frame count, so the INC approach has a maximum frame count of 9.
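The modular arithmetic of the "bad case" can be checked mechanically. This is a sketch only. The 71-cycle check period and the 9-cycle waiting loop are taken from the text above, and the helper names are invented.

```python
from math import gcd

LINE_CYCLES = 63   # PAL
WAIT_RUN = 9       # one run of inc $d012 / bne
CHECK_PERIOD = 71  # cycles between $d012 checks at the start of each delay-loop run


def next_start(k):
    """c = 9*k + 71, which equals 9*(k+8) - 1."""
    c = WAIT_RUN * k + CHECK_PERIOD
    assert c == WAIT_RUN * (k + 8) - 1
    return c


def positions_mod9(first, runs):
    """Cycle position (mod 9) of successive delay-loop starts; multiples of 9 drop out."""
    return [(first + n * CHECK_PERIOD) % WAIT_RUN for n in range(runs)]


print(gcd(CHECK_PERIOD, LINE_CYCLES))                  # 1
print({next_start(k) % WAIT_RUN for k in range(100)})  # {8}, i.e. -1 (mod 9)
print(positions_mod9(0, 9))                            # [0, 8, 7, 6, 5, 4, 3, 2, 1]
```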

Looking at the LDA-based method, we have a waiting loop like this:

waitline:
   lda $d012
   bne waitline
   ...

This part finishes 2-8 cycles after the beginning of line=$00. Following the above arguments, this approach needs at most 6+1 (for the "bad case") = 7 frames. Interestingly, one run of the waiting loop here also takes a number of cycles that divides 63 (=7*9), namely 7 cycles. So here the c formula reads c = 7*k + 71 = 7*(k+10) + 1 ≡ 1 (mod 7). Thus we deal with a 7-cycle window in this case.
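As a closed formula, the exit window is one cycle narrower than the loop run, and the "bad case" adds one frame, so both maxima fall out directly. This sketch simply restates the argument above, and its names are made up.

```python
def exit_window(run_cycles, cycles_after_read):
    """The exiting $d012 read lands 0..run_cycles-1 cycles into the target line;
    cycles_after_read is what the loop still spends before falling through."""
    return cycles_after_read, run_cycles - 1 + cycles_after_read


def max_frames(run_cycles, cycles_after_read):
    """One frame per cycle of variance, plus one for the "bad case"."""
    first, last = exit_window(run_cycles, cycles_after_read)
    return (last - first) + 1


# lda $d012 / bne: 7-cycle run, only the 2-cycle untaken bne follows the read
print(exit_window(7, 2), max_frames(7, 2))     # (2, 8) 7
# inc $d012 / bne: 9-cycle run, 2 write cycles + 2-cycle bne follow the read
print(exit_window(9, 4), max_frames(9, 4))     # (4, 12) 9
# c = 7*k + 71 = 7*(k+10) + 1, always 1 (mod 7)
print({(7 * k + 71) % 7 for k in range(100)})  # {1}
```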
One other thing to mention: with that lda, there's no way to check explicitly for a unique raster line (unless you use compare opcodes, but that takes more bytes!!!). The fact that line=$000 consists of only 62 cycles, together with the construction of the delay loop, ensures that the check in this line will always fail. This is no real problem either, as we hit line=$100 once per frame, so the overall approach will come to an end!

Maybe someone is interested enough to read this, maybe this was all clear to you. Anyway, I felt the urge to write it down now that I finally understood it (I think).

2020-02-01 00:06

Copyfault

Registered: Dec 2001
Posts: 478

Quote: Sorry, I did not phrase it properly: by "fast" I meant that it stabilizes fast, i.e. within ~0.3 seconds max, a time span that doesn't matter to us humans :) so 9 frames max will do. However, looking at the new version and explanation: your skills at this are truly impressive, sir.

Ah come on, I'm just too fond of playing around with things that seem to hide certain mathematical mysteries ;) That does not really help to get things *done*.
On the contrary: I'd say you are the one to admire here! I will never ever reach the level of coding that you simply own, Oswald! I mean it :)

But thanks for your kind words. Gives me the positive feeling that there are people like you out there that care about explanations'n'stuff!

2020-02-01 11:54

Rastah Bar
Account closed

Registered: Oct 2012
Posts: 336

I find this problem surprisingly hard to understand. I think I get most of what you are saying, but aren't you neglecting the presence of badlines? The number of cycles available to the CPU is less on badlines and can even vary because of RMW instructions in the init code. So it seems there may be cases where neither of the approaches (INC, LAX) locks. Or am I mistaken?

2020-02-01 12:02

Copyfault

Registered: Dec 2001
Posts: 478

Quote: I find this problem surprisingly hard to understand. I think I get most of what you are saying, but aren't you neglecting the presence of badlines? The number of cycles available to the CPU is less on badlines and can even vary because of RMW instructions in the init code. So it seems there may be cases where neither of the approaches (INC, LAX) locks. Or am I mistaken?

I compared the approaches with having a clean setup before doing the stabilization routine, i.e. no badlines, no irqs.

If you allow badlines, for example, my reasoning in the large posting above no longer holds. I did some measurements yesterday that show both approaches take more than 9 (resp. 7) frames with badlines enabled. I have to admit that I had no motivation to do the calculations taking the badlines in between into account, but it *could* be done...

2020-02-01 12:13

Rastah Bar
Account closed

Registered: Oct 2012
Posts: 336

I guess it can be easily fixed by blanking the screen in the init code. This is often required anyway when setting up the graphics, so this is not really a constraint.

I have tried to analyze my timer-based approach.

One loop takes 18 cycles. Between the same cycle of two consecutive badlines there are 461 available cycles. If the STA ZP starts on a certain cycle of a badline (and there is no lock), it will start 7 cycles later on the next badline, because 461 = 26*18 - 7. Since a non-locking badline has 20 cycles, which is coprime with 7, the algorithm will always lock.
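The numbers in this post can be replayed in Python. This is a sketch. The 43-cycle badline loss is our assumption, chosen because it reproduces the 461 figure (a badline halts the CPU for 40 to 43 cycles). The helper names are illustrative.

```python
BADLINE_LOSS = 43  # assumption: a badline halts the CPU for 40..43 cycles; 43 reproduces 461


def available_between_badlines(line_cycles, loss=BADLINE_LOSS):
    """CPU cycles from one badline to the same cycle of the next (8 lines later)."""
    return 8 * line_cycles - loss


def shift_per_badline(available, loop_cycles):
    """How many cycles later the loop's STA starts on the next badline."""
    return -available % loop_cycles


pal = available_between_badlines(63)
print(pal, shift_per_badline(pal, 18))  # 461 7, i.e. 461 = 26*18 - 7
```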

What are your thoughts about this?

2020-02-01 16:54

Rastah Bar
Account closed

Registered: Oct 2012
Posts: 336

I can shave off one byte:

sync: lax $dc04
      sbx #51
      sta ZP      ;RMW instruction
      cpx $dc04
      bne sync
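Tallying this loop gives one reading of where the constant #51 comes from. Without interference, the two $dc04 reads are only 9 cycles apart, so X can only match when the CPU is stalled by 42 extra cycles in between (presumably by the badline). This is a sketch assuming NMOS 6510 timings and a timer counting down once per cycle, and the names are made up.

```python
# Cycle counts (NMOS 6510, no page crossing) for the sync loop.
LOOP = [
    ("lax $dc04", 4),  # timer A low byte read on the 4th cycle
    ("sbx #51", 2),    # X = (A & X) - 51 = timer - 51
    ("sta ZP", 3),     # opcode fetch, operand fetch, write
    ("cpx $dc04", 4),  # timer A low byte read again on the 4th cycle
    ("bne sync", 3),   # taken while there is no lock
]


def loop_cycles():
    return sum(cycles for _, cycles in LOOP)


def cycles_between_timer_reads():
    """From the read cycle of lax to the read cycle of cpx (sbx + sta + cpx)."""
    return sum(cycles for _, cycles in LOOP[1:4])


def stall_needed(subtrahend=51):
    """The timer counts down once per cycle, so X == timer only if the two reads
    are `subtrahend` cycles apart; the rest has to be stolen from the CPU."""
    return subtrahend - cycles_between_timer_reads()


print(loop_cycles(), cycles_between_timer_reads(), stall_needed())  # 16 9 42
```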

The loop is 16 cycles and since 461 = 29*16 - 3, this also should always lock. It needs at most 20 consecutive badlines, so the very worst case is that the lower border is reached after 19 badlines and you have to start again at the first badline. So locking is guaranteed in less than 1.4 frames.
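The 16-cycle figures and the 1.4-frame bound can be replayed as well. This is a sketch only. The 461 cycles and the 20-badline bound are taken from the posts. The badline positions assume the default YSCROLL of 3 (first badline at line $33, 25 per frame), and the names are made up.

```python
LINES_PER_FRAME = 312
FIRST_BADLINE = 0x33      # assumption: default YSCROLL = 3, 25 text rows
BADLINES_PER_FRAME = 25


def shift_per_badline(available, loop_cycles):
    """Cycles the loop start moves on from one badline to the next."""
    return -available % loop_cycles


def badline(i):
    return FIRST_BADLINE + 8 * i


def worst_case_frames(badlines_needed=20):
    """Start with only badlines_needed - 1 badlines left before the lower border,
    fail on all of them, then lock on badline number badlines_needed next frame."""
    start = badline(BADLINES_PER_FRAME - (badlines_needed - 1))
    lock = badline(badlines_needed - 1)
    return (LINES_PER_FRAME - start + lock) / LINES_PER_FRAME


print(shift_per_badline(461, 16))     # 3, since 461 = 29*16 - 3
print(round(worst_case_frames(), 3))  # 1.333
```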

2020-02-01 21:36

JackAsser

Registered: Jun 2002
Posts: 2014

Quote: I can shave off one byte:

sync: lax $dc04
      sbx #51
      sta ZP      ;RMW instruction
      cpx $dc04
      bne sync

The loop is 16 cycles and since 461 = 29*16 - 3, this also should always lock. It needs at most 20 consecutive badlines, so the very worst case is that the lower border is reached after 19 badlines and you have to start again at the first badline. So locking is guaranteed in less than 1.4 frames.

Exploiting the Kernal's setup values in $dc04 and $dc05 (different on PAL and NTSC)?! But we're only in the PAL domain in this thread anyway.

2020-02-01 22:24

Rastah Bar
Account closed

Registered: Oct 2012
Posts: 336

See post #38 for what I have in mind. Do you think this could work? I'm always a little bit afraid that I missed something.

It should also lock on NTSC, since 477 = 30*16 - 3, but the routine exits on a different cycle number.
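The NTSC figure follows the same pattern. This sketch assumes 65-cycle lines (the common 6567R8 VIC-II; the older 6567R56A has 64) and the same 43-cycle badline loss that reproduces PAL's 461. The helper name is illustrative.

```python
BADLINE_LOSS = 43  # same assumption as for the PAL figure of 461


def available_between_badlines(line_cycles, loss=BADLINE_LOSS):
    """CPU cycles from one badline to the same cycle of the next (8 lines later)."""
    return 8 * line_cycles - loss


ntsc = available_between_badlines(65)  # assumption: 65-cycle NTSC lines (6567R8)
print(ntsc, -ntsc % 16)                # 477 3, i.e. 477 = 30*16 - 3
```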

2020-02-02 12:09

Copyfault

Registered: Dec 2001
Posts: 478

Quoting Rastah Bar

I guess it can be easily fixed by blanking the screen in the init code. This is often required anyway when setting up the graphics, so this is not really a constraint.[...]

Forgot to stress this detail, but I had it in mind: you don't even have to set it before the start of the stabilization loop (the first check of $d012 might be a "bad case" anyway), so it suffices to blank the screen/kill IRQs/etc. in the init code blob.

Quoting Rastah Bar

I can shave off one byte:

sync: lax $dc04
      sbx #51
      sta ZP      ;RMW instruction
      cpx $dc04
      bne sync

The loop is 16 cycles and since 461 = 29*16 - 3, this also should always lock. It needs at most 20 consecutive badlines, so the very worst case is that the lower border is reached after 19 badlines and you have to start again at the first badline. So locking is guaranteed in less than 1.4 frames.

This one looks quite clever, though I did not deep-check "all the math" behind it. One thing (besides the badline timing) that might also cause a cycle mismatch at the cpx $dc04 instruction is the behaviour of the timers: AFAIR, it never reaches the $00 value, but gets initialized with the max value (so $dc04 outputs the same value in two consecutive cycles, but never $00).
And as a sidenote: a STA ZP is just a write-instruction, no Read-Modify-Write (the RMW-comment in your code examples confused me a little;)). But the idea you posted with one write-cycle is correct and should work...

2020-02-02 18:21

Rastah Bar
Account closed

Registered: Oct 2012
Posts: 336

Quoting Copyfault

Quoting Rastah Bar

I guess it can be easily fixed by blanking the screen in the init code. This is often required anyway when setting up the graphics, so this is not really a constraint.[...]

Forgot to stress this detail, but I had it in mind: you don't even have to set it before the start of the stabilization loop (the first check of $d012 might be a "bad case" anyway), so it suffices to blank the screen/kill IRQs/etc. in the init code blob.

Yes, you are right.
Quoting Copyfault

This one looks quite clever, though I did not deep-check "all the math" behind it. One thing (besides the badline timing) that might also cause a cycle mismatch at the cpx $dc04 instruction is the behaviour of the timers: AFAIR, it never reaches the $00 value, but gets initialized with the max value (so $dc04 outputs the same value in two consecutive cycles, but never $00).
And as a sidenote: a STA ZP is just a write-instruction, no Read-Modify-Write (the RMW-comment in your code examples confused me a little;)). But the idea you posted with one write-cycle is correct and should work...

Thanks for your feedback. $dc04 can reach 0 because it is linked with $dc05. So as long as $dc05>0, $dc04 goes from 0 to $ff, and there is no problem. But when ($dc05,$dc04)=$0001, it goes directly to $4025 after that (on PAL), which cannot cause an accidental lock; it may only delay the locking a bit. So there is no problem there, I think.
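The claim that the reload cannot cause an accidental lock can be checked exhaustively under the post's own timer model. This sketch assumes the $0001 to $4025 reload exactly as described, 9 cycles between the two timer reads (from counting the loop), and a badline stall of 40 to 43 cycles. All names are made up.

```python
LATCH = 0x4025  # PAL Kernal timer A latch, as stated in the post


def timer_after(value, elapsed):
    """Timer A after `elapsed` cycles, modelled as described in the post: it counts
    down to $0001 and then goes straight to LATCH (valid for elapsed < LATCH)."""
    if elapsed < value:
        return value - elapsed
    return LATCH - (elapsed - value)


def locks(value, elapsed):
    """lax $dc04 / sbx #51 ... cpx $dc04: does the low-byte compare match?"""
    x = ((value & 0xFF) - 51) & 0xFF
    return x == (timer_after(value, elapsed) & 0xFF)


# Assumption: 9 cycles between the reads, plus 0 or 40..43 cycles stolen by a badline.
for elapsed in (9, 49, 50, 52):
    assert not any(locks(v, elapsed) for v in range(1, LATCH + 1))
# With exactly 51 cycles the loop locks unless the reload happened in between.
assert all(locks(v, 51) == (v > 51) for v in range(1, LATCH + 1))
print("no accidental lock")
```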

You are right, STA ZP is an RRW instruction (two read cycles, then the write), but the W-cycle at the end is important.

All times are CET.
