| |
Krill
Registered: Apr 2002 Posts: 2980 |
Shortest code for stable raster timer setup
While working on my ICC 2019 4K entry (now postponed to ICC 2020, but i hope it'll be worth the wait), i came up with this (14 bytes):
initstabilise lda $d012
ldx #10 ; 2
- dex ; (10 * 5) + 4
bpl - ; 54
nop ; 2
eor $d012 - $ff,x; 5 = 63
bne initstabilise; 7 = 70
[...]; timer setup
The idea is to loop until the same current raster line is read at the very beginning (first cycle) and at the very end (last cycle) of a raster line, implying 0 cycles jitter.
With 63 cycles per line on PAL, the delay between the reads must be 63 cycles (and not 62), reading $d012 at cycle 0 and cycle 63 of a video frame's last line (311), which is one cycle longer due to the vertical retrace.
The downside is that effectively only one line per video frame is attempted, so the loop may take a few frames to terminate, and the worst case is somewhere just beyond 1 second.
The upside is that it always comes out at the same X raster position AND raster line (0), plus it leaves with accu = 0 and X = $ff, which can be economically re-used for further init code.
Now, is there an even shorter approach, or at least a same-size solution without the possibly-long wait drawback? |
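For reference, the byte and cycle figures in the code comments above can be re-added mechanically. The sketch below is not part of the original post. It assumes standard 6510 timings (a taken branch costs 3 cycles when no page is crossed, and `eor abs,x` costs 5 cycles because X = $ff crosses a page here) and that each $d012 read happens on the last cycle of its instruction.

```python
# (description, bytes, cycles) for one pass through the loop.
# Timings are assumed standard 6510 values, see the note above.
steps = [
    ("lda $d012",         3, 4),
    ("ldx #10",           2, 2),
    ("dex / bpl, 11x",    3, 11 * 5 - 1),  # final bpl is not taken: 54
    ("nop",               1, 2),
    ("eor $d012-$ff,x",   3, 5),           # page crossing with X = $ff
    ("bne initstabilise", 2, 3),           # taken while the reads differ
]

total_bytes = sum(size for _, size, _ in steps)
loop_cycles = sum(cycles for _, _, cycles in steps)

# Cycles from the lda read to the eor read: everything after the lda
# up to and including the eor.
read_gap = sum(cycles for _, _, cycles in steps[1:5])

print(total_bytes, read_gap, loop_cycles)
```

The three numbers correspond to the 14 bytes, the 63-cycle gap between the two reads, and the 70-cycle loop period given in the comments.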
|
[... 177 posts hidden ...] |
| |
JackAsser
Registered: Jun 2002 Posts: 2014 |
Quote: This is an idea I got after talking to Copyfault.
At least in the cycle-correct version of Vice (i.e., x64sc) this seems to work. Haven't tried on a real machine.
* = $0f00 ; Some address with (H+1)&1 = 0 and (H+1)&$10 = $10
ldy #$00
loop: ldx #$11
shx cont, y
cont: bpl loop
It uses the fact that we will AND the written value with H+1 unless a badline pauses the CPU between the third and fourth cycle of shx. The latter then changes the "bpl" into an "ora" and drops us out of the loop at horizontal position 61.
Haha! Wow! |
| |
Burglar
Registered: Dec 2004 Posts: 1101 |
Quoting Quiss
* = $0f00 ; Some address with (H+1)&1 = 0 and (H+1)&$10 = $10
ldy #$00
loop: ldx #$11
shx cont, y
cont: bpl loop
wait what?? I need to look up SHX... at first glance this does not make any sense to me :) |
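A minimal sketch of the opcode arithmetic, for anyone else who has to look up SHX. It assumes, as described in the quoted post, that SHX normally stores X AND (H+1), with H the high byte of the operand address, and stores plain X when a badline stall drops the AND. The function name `shx_store` and the flag `and_dropped` are made up for illustration.

```python
BPL = 0x10        # opcode of BPL rel
ORA_IND_Y = 0x11  # opcode of ORA (zp),y

def shx_store(x, operand_addr, and_dropped=False):
    """Byte written by SHX under the assumptions stated above."""
    h_plus_1 = ((operand_addr >> 8) + 1) & 0xFF
    return x if and_dropped else x & h_plus_1

# cont follows ldy #imm (2 bytes), ldx #imm (2 bytes), shx abs,y (3 bytes).
cont = 0x0F00 + 2 + 2 + 3

normal = shx_store(0x11, cont)
stalled = shx_store(0x11, cont, and_dropped=True)
print(hex(cont), hex(normal), hex(stalled))
```

Under these assumptions the byte at `cont` is rewritten as $10 (BPL) on every normal pass, and becomes $11 (ORA (zp),y) the one time the AND is dropped, so execution falls through.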
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Holy shit, that’s brilliant! Well found. |
| |
Burglar
Registered: Dec 2004 Posts: 1101 |
even Crossbow cannot beat this! |
| |
chatGPZ
Registered: Dec 2001 Posts: 11386 |
i so have to steal this and use as an example in my pdf :)
edit: quick test on C64 confirms it works :) |
| |
JackAsser
Registered: Jun 2002 Posts: 2014 |
Quote: i so have to steal this and use as an example in my pdf :)
edit: quick test on C64 confirms it works :)
So at a controlled X pos but at a ”random” y*8+c pos depending on $d011, which is good enough to launch a 63c timer ofc. |
| |
TWW
Registered: Jul 2009 Posts: 545 |
Quote: This is an idea I got after talking to Copyfault.
At least in the cycle-correct version of Vice (i.e., x64sc) this seems to work. Haven't tried on a real machine.
* = $0f00 ; Some address with (H+1)&1 = 0 and (H+1)&$10 = $10
ldy #$00
loop: ldx #$11
shx cont, y
cont: bpl loop
It uses the fact that we will AND the written value with H+1 unless a badline pauses the CPU between the third and fourth cycle of shx. The latter then changes the "bpl" into an "ora" and drops us out of the loop at horizontal position 61.
Damn, nice one. |
| |
Copyfault
Registered: Dec 2001 Posts: 478 |
Quoting Quiss
This is an idea I got after talking to Copyfault.
At least in the cycle-correct version of Vice (i.e., x64sc) this seems to work. Haven't tried on a real machine.
* = $0f00 ; Some address with (H+1)&1 = 0 and (H+1)&$10 = $10
ldy #$00
loop: ldx #$11
shx cont, y
cont: bpl loop
It uses the fact that we will AND the written value with H+1 unless a badline pauses the CPU between the third and fourth cycle of shx. The latter then changes the "bpl" into an "ora" and drops us out of the loop at horizontal position 61.
Lovely!!! Quiss, I knew you would come up with exactly this kind of brilliance sooner or later. Sooo good to have you back;)
If you want to "overdo" (optimize, erm) this, let's save another 2 bytes:
* = $0faa ; _a very nice_ address with (H+1)&1 = 0 and (H+1)&$10 = $10
0FAA loop: ldx #$11
0FAC shx cont, y
0FAF cont: bpl loop
with start address $0FAD (you guess the operand bytes of the SHX ; ))
Branching directly to the SHX-opcode should also work (8-cycle loop instead of 10-cycle loop, both coprime to 63), though I'm not sure which one will be faster.
Only "drawback" is that you do not know at which raster position you end up, only that it will be (at the very end of) a badline. Not too bad for my taste :)) |
| |
Rastah Bar Account closed
Registered: Oct 2012 Posts: 336 |
Quoting Copyfault
If you want to "overdo" (optimize, erm) this, let's save another 2 bytes:
* = $0faa ; _a very nice_ address with (H+1)&1 = 0 and (H+1)&$10 = $10
0FAA loop: ldx #$11
0FAC shx cont, y
0FAF cont: bpl loop
with start address $0FAD (you guess the operand bytes of the SHX ; ))
Interesting idea, but I do not completely understand it. How does the Y register get the right value?
I was thinking about possibly saving one byte, if one could find a suitable start address and a ZP location with the right contents after entering from BASIC:
* = $???? ;magic address that allows us to save 1 byte
lax ZP ;another one of those magic addresses
tay
loop: shx cont,y
cont: bpl loop
There might also exist variations where you let the SHX instruction change itself or change the value after the BPL into e.g. 0 (or another suitable value). |
| |
Quiss
Registered: Nov 2016 Posts: 43 |
Neat! Right, no reason to make those two address bytes go to waste. :)
Another amusing thing to contemplate is how this code could be placed at, say, $08xx. Preferably without messing up the basic upstart.
Also, careful with the loop length. The number of CPU cycles between two badlines is 461, except when the loop's one write cycle (last cycle of SHX) sneaks into the three cycle RDY grace period. Then it's 462 ticks.
(Imagine a graph with n nodes, in which node i is connected to node (i+461)%n for 0 < i < n-1 and to (i+462)%n for i = 0. Node n-1 isn't connected to anything. You want that graph to be acyclic.)
In the range 5-20, the lengths that do work are 5, 10, 12, 16, 18 and 19. But note that in particular, length 8 (a.k.a. branching directly to the SHX) does not. |
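The graph described above can be checked by brute force. This is a sketch of a literal reading of that description, not Quiss's own code: node n-1 is treated as the exit, node 0 advances by 462, every other node by 461, and a loop length is accepted when every starting node reaches the exit. The name `loop_always_exits` is made up, and the upper bound 20 is taken as exclusive, which is an assumption.

```python
def loop_always_exits(n, gap=461, gap_grace=462):
    """True if every start node reaches node n-1 in the graph described above."""
    def successor(i):
        return (i + (gap_grace if i == 0 else gap)) % n

    for start in range(n - 1):
        seen = set()
        node = start
        while node != n - 1:
            if node in seen:      # revisited a node: this start never exits
                return False
            seen.add(node)
            node = successor(node)
    return True

working = [n for n in range(5, 20) if loop_always_exits(n)]
print(working)  # prints [5, 10, 12, 16, 18, 19]
```

Under this reading, length 8 fails because node 0 goes to 6, 6 to 3, and 3 back to 0, so a loop that starts in one of those phases never exits.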