| |
Krill
Registered: Apr 2002 Posts: 2940 |
Shortest code for stable raster timer setup
While working on my ICC 2019 4K entry (now postponed to ICC 2020, but i hope it'll be worth the wait), i came up with this (14 bytes):
initstabilise lda $d012
              ldx #10           ; 2
-             dex               ; (10 * 5) + 4
              bpl -             ; 54
              nop               ; 2
              eor $d012 - $ff,x ; 5 = 63
              bne initstabilise ; 7 = 70
              [...]             ; timer setup
The idea is to loop until the same current raster line is read at the very beginning (first cycle) and at the very end (last cycle) of a raster line, implying 0 cycles jitter.
With 63 cycles per line on PAL, the delay between the reads must be 63 cycles (and not 62), reading $d012 at cycle 0 and cycle 63 of a video frame's last line (311), which is one cycle longer due to the vertical retrace.
The downside is that effectively only one line per video frame is attempted, so the loop may take a few frames to terminate, and the worst case is somewhere just beyond 1 second.
The upside is that it always comes out at the same X raster position AND raster line (0), plus it leaves with accu = 0 and X = $ff, which can be economically re-used for further init code.
Now, is there an even shorter approach, or at least a same-size solution without the possibly-long wait drawback? |
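For anyone who wants to double-check the cycle comments in the listing, here is a small Python tally (not C64 code). It assumes the standard 6502 cycle costs and only adds up cycles; it does not emulate the VIC, bad lines or sprites. The function names are made up for illustration.

```python
# Cycle tally for the 14-byte loop (a sketch, not an emulator).
# Standard 6502 costs: immediate/implied 2, absolute read 4,
# absolute,X read with page crossing 5, branch taken 3 / not taken 2.

def delay_loop_cycles(start_x):
    """Cycles spent in 'dex / bpl -' entered with X = start_x; returns (cycles, final X)."""
    x = start_x
    cycles = 0
    while True:
        x = (x - 1) & 0xFF      # dex
        cycles += 2
        if x < 0x80:            # bpl is taken while X is still positive
            cycles += 3
        else:                   # X wrapped to $ff: bpl falls through
            cycles += 2
            return cycles, x

def read_to_read_delay():
    """Cycles from the $d012 read of lda to the $d012 read of eor."""
    loop, x = delay_loop_cycles(10)
    ldx, nop, eor = 2, 2, 5     # eor $d012-$ff,x crosses a page when X = $ff
    return ldx + loop + nop + eor, x

delay, final_x = read_to_read_delay()
loop_total = delay + 3 + 4      # failed pass: bne taken (3) + lda $d012 (4)
print(delay, loop_total, hex(final_x))   # 63 70 0xff
```

Both reads happen on the last cycle of their instruction, which is why the delay is simply the sum of everything after the lda up to and including the eor.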
|
[... 177 posts hidden ...] |
| |
Krill
Registered: Apr 2002 Posts: 2940 |
Hah, that's pretty dirty. :)
I briefly considered DMA-based methods, but yeah, they usually come with visual artefacts or VSP hazards and the like.
One could argue that
-       nop
        lda $d012
        lsr
        asl
        [54 cycles worth of user code not touching the accu]
        cmp $d012
        bne -
with 11 bytes net size as proposed by Frantic is shorter, though. =D |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1403 |
Haha well if we can pad with other code, just put something else that doesn't touch X in place of dex:bmi *-1, and we're down to 10 bytes :)
But yes, I'm not that keen on visible artefacts even for a frame. Easier to do a DMA on line $30 at the start of a blanked frame, and put all the init code somewhere that'll be overwritten by decrunched graphics or mainloop code. |
| |
Krill
Registered: Apr 2002 Posts: 2940 |
Quoting ChristopherJam: "Haha well if we can pad with other code, just put something else that doesn't touch X in place of dex:bmi *-1, and we're down to 10 bytes :)"
True, but it's quite hard to hit exactly, uhm, 1189 cycles. :)
Quoting ChristopherJam: "and put all the init code somewhere that'll be overwritten by decrunched graphics or mainloop code."
In the usual size coding categories, you want the init code to be as small as possible as well, though, as the executable size counts. :) |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1403 |
Well, I'm assuming more "just enough to pad the gap between comparison becoming true and being in the DMA enabled area", so just a couple dozen cycles should be safe.
Fair point on minimizing initcode. |
| |
Rastah Bar Account closed
Registered: Oct 2012 Posts: 336 |
Quoting Krill: "With 63 cycles per line on PAL, the delay between the reads must be 63 cycles (and not 62), reading $d012 at cycle 0 and cycle 63 of a video frame's last line (311), which is one cycle longer due to the vertical retrace."
Funny, didn't know that. Does Vice emulate it? Hoxs doesn't. |
| |
Krill
Registered: Apr 2002 Posts: 2940 |
Quoting Rastah Bar: "Quoting Krill: 'With 63 cycles per line on PAL, the delay between the reads must be 63 cycles (and not 62), reading $d012 at cycle 0 and cycle 63 of a video frame's last line (311), which is one cycle longer due to the vertical retrace.'
Funny, didn't know that. Does Vice emulate it? Hoxs doesn't."
From https://sourceforge.net/p/vice-emu/code/HEAD/tree/trunk/vice/sr..
/* Line 0 is 62 cycles long, while line (SCREEN_HEIGHT - 1) is 64
   cycles long. As a result, the counter is incremented one
   cycle later on line 0. */ |
| |
Rastah Bar Account closed
Registered: Oct 2012 Posts: 336 |
Oh, I see. Thanks. |
| |
Copyfault
Registered: Dec 2001 Posts: 468 |
Quoting Krill: "Ah, bummer. This is the correct one with even-numbered lines only. =)"
initstabilise ldx #10
              lda $d012
              lsr               ; 2
              asl               ; 2
-             dex               ; (10 * 5) + 4
              bpl -             ; 54
              cmp $d012         ; 4 = 62
              bne initstabilise ; 9 = 71
Absolutely fantastic, Krill! It feels tempting to shave off another byte by replacing [lsr:asl] with an and-instruction, i.e.
newline_loop:
        ldx #$f7
        lda $d012-$f7,x
        and inc_opcode:#$e8
//-p-a-g-e-b-r-e-a-k
        bne inc_opcode
        cmp $d012
        bne newline_loop
Unfortunately, rasline $00 (or $100 resp.) breaks this :(
Also gave it a try to utilise the operand of the [ldx #10] in your code as $0a -> asl, but it all ends up with >=15 bytes total netweight.
Anyway, thanks for sharing and "en passant" giving something to ponder about ;) |
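A quick Python look at the two masking variants, in case it helps to see where line $00 goes wrong. This is only one reading of the snippets: [lsr:asl] clears bit 0, while and #$e8 clears bits 0-2 and 4, so some raster values collapse to 0 and the bne falls straight through without running the delay. The sketch assumes the second $d012 read returns the same low byte as the first; the names are hypothetical.

```python
def lsr_asl(v):
    """Accumulator after lsr / asl: bit 0 cleared."""
    return ((v >> 1) << 1) & 0xFF

def and_e8(v):
    """Accumulator after and #$e8: only bits 7, 6, 5 and 3 survive."""
    return v & 0xE8

# $d012 values for which 'bne inc_opcode' is not taken (A == 0 after the and)
skips_delay = [v for v in range(256) if and_e8(v) == 0]

# ... and of those, the ones where 'cmp $d012' then matches anyway
exits_early = [v for v in skips_delay if and_e8(v) == v]

print(len(skips_delay), exits_early)   # 16 [0]
```

With [lsr:asl] every even value maps to itself and is never 0 by accident except on line 0, which still runs the full delay loop there, so the problem is specific to using the and result as the branch condition.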
| |
Rastah Bar Account closed
Registered: Oct 2012 Posts: 336 |
I tried something like this:
start:  ldx $d011
        bpl start
loop:   pha
        cpx $d012
        pla
        inc safe_mem,x
        bcs loop
But it can stabilize either on the PHA or on the INC safe_mem,x so no cigar. |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1403 |
Like the mathematician's hypothetical can opener, let us assume we have 50 cycles worth of init code that we are happy to run as many as seven times over.
Then we can sync with just ten bytes of code, in at most seven frames.
sync:
lda $d012 ; will be zero on cycles 0,1,2,3,4,5 or 6
bne sync
.res 25,$ea ; replace this with 50 cycles of init code
lsr $d012 ; if it's zero, either we're too early on line 0, or we're on line 256..
bcc sync ; fall through if we read on cycle 62, 56 after cycle 6
We use lsr instead of lda for the second test, as lda would result in a 63 cycle loop. |
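The numbers in the comments can be tallied the same way in Python. This is a sketch that assumes standard 6502 cycle costs and exactly 50 cycles of padding; the function names are made up for illustration.

```python
# Cycle tally for the ten-byte sync loop (a sketch, not an emulator).
LDA_ABS = 4            # $d012 is read on the 4th (last) cycle
LSR_ABS = 6            # read-modify-write: $d012 is read on the 4th cycle
READ_CYCLE = 4         # cycle within lda/lsr abs on which the operand is read
BRANCH_TAKEN, BRANCH_NOT_TAKEN = 3, 2
PADDING = 25 * 2       # .res 25,$ea -> 25 nops at 2 cycles each

def read_to_read():
    """Cycles from the first $d012 read to the second one."""
    return BRANCH_NOT_TAKEN + PADDING + READ_CYCLE

def failed_pass(second_test_cycles):
    """Length of one full pass that branches back to sync."""
    return LDA_ABS + BRANCH_NOT_TAKEN + PADDING + second_test_cycles + BRANCH_TAKEN

print(read_to_read(), failed_pass(LSR_ABS), failed_pass(LDA_ABS))   # 56 65 63
```

A failed pass with lsr is 65 cycles, so the next attempt presumably lands at a different cycle within the line, whereas a 63-cycle pass with lda would keep hitting the same one (ignoring bad lines and the line 256 case).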