| | Krill
Registered: Apr 2002 Posts: 2980 |
Shortest code for stable raster timer setup
While working on my ICC 2019 4K entry (now postponed to ICC 2020, but i hope it'll be worth the wait), i came up with this (14 bytes):
initstabilise lda $d012
              ldx #10 ; 2
-             dex ; (10 * 5) + 4
              bpl - ; 54
              nop ; 2
              eor $d012 - $ff,x; 5 = 63
              bne initstabilise; 7 = 70
              [...]; timer setup
The idea is to loop until the same current raster line is read at the very beginning (first cycle) and at the very end (last cycle) of a raster line, implying 0 cycles jitter.
With 63 cycles per line on PAL, the delay between the reads must be 63 cycles (and not 62), reading $d012 at cycle 0 and cycle 63 of a video frame's last line (311), which is one cycle longer due to the vertical retrace.
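The cycle comments in the listing can be re-added mechanically. The following is a minimal sketch, assuming standard 6502 timings with no DMA interference; the function names are hypothetical and only serve this illustration.

```python
def delay_between_reads(loop_start=10):
    """Cycles from the LDA read of $d012 to the EOR read of $d012."""
    iterations = loop_start + 1                 # X runs 10, 9, ..., 0, then $ff
    # DEX (2) + taken BPL (3) for all but the last pass, DEX (2) + BPL (2) on exit
    dex_bpl = (iterations - 1) * (2 + 3) + (2 + 2)
    ldx = 2
    nop = 2
    eor_abs_x = 5                               # 4 + 1: $cf13 + $ff crosses a page
    return ldx + dex_bpl + nop + eor_abs_x


def loop_length(loop_start=10):
    """Cycles for one full pass when the BNE is taken."""
    bne_taken = 3
    lda_abs = 4
    return delay_between_reads(loop_start) + bne_taken + lda_abs


print(delay_between_reads())  # 63
print(loop_length())          # 70
```

Because 70 is not a multiple of 63, each pass shifts the read position within the raster line, which is what lets the loop eventually hit the required alignment.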
The downside is that effectively only one line per video frame is attempted, so the loop may take a few frames to terminate, and the worst case is somewhere just beyond 1 second.
The upside is that it always comes out at the same X raster position AND raster line (0), plus it leaves with accu = 0 and X = $ff, which can be economically re-used for further init code.
Now, is there an even shorter approach, or at least a same-size solution without the possibly-long wait drawback? |
|
... 177 posts hidden ... |
| | Copyfault
Registered: Dec 2001 Posts: 478 |
Quoting Rastah Bar: Nice, but it would be quite a coincidence that you would need exactly these presettings in the rest of the intro or demo.
Perhaps there are ZP addresses that normally (I mean, after a cold start) have the required values. I did not dig deeper through the default ZP settings, but since the sync loop must be started by jumping inside, calling it after decrunching is more or less mandatory. So why not establish some special vector settings ;))?
And though other combinations are possible, it's all quite rigid and every variant needs extra checks and so forth. Getting rid of the vectors completely would be awesome (without mem constraints & 7 bytes in total), but well - this whole problem had a black hole effect for far too long on my mind... and obviously still has :( |
| | Rastah Bar Account closed
Registered: Oct 2012 Posts: 336 |
I know almost nothing about decrunchers, so I don't have a clue what they can do in terms of "initial conditions" of ZP addresses or registers, etc.
If they can, for example, give you a desired value of X and Y (without increasing net code size), then perhaps a 6-byte loop is possible with something like this:
shx $HH00,y
BYTE any_value
bne *-4
The code location and $HH should be such that X & {H+1} is the opcode for instructions like TXA, TYA, while X should contain the opcode for a 3-byte instruction.
So without DMA, the byte after the SHX instruction is replaced by e.g. TYA ensuring the branch is taken, and with DMA the loop exits with the 3-byte instruction whose opcode was in X. But this is stretching it really far! |
| | Copyfault
Registered: Dec 2001 Posts: 478 |
Quoting Rastah Bar: I know almost nothing about decrunchers, so I don't have a clue what they can do in terms of "initial conditions" of ZP addresses or registers, etc.
With "after decrunch" I just mean that all memory is initialised with values as needed and that the jump to whatever starting point belongs to the decruncher code.
Quoting Rastah Bar: If they can, for example, give you a desired value of X and Y (without increasing net code size), then perhaps a 6-byte loop is possible with something like this:
shx $HH00,y
BYTE any_value
bne *-4
The code location and $HH should be such that X & {H+1} is the opcode for instructions like TXA, TYA, while X should contain the opcode for a 3-byte instruction.
So without DMA, the byte after the SHX instruction is replaced by e.g. TYA ensuring the branch is taken, and with DMA the loop exits with the 3-byte instruction whose opcode was in X. But this is stretching it really far!
Yes, this should work. But you're right, it's really shifting *a lot* of preparations to the realm of decruncher & init code. Still quite doable I think. Time to dig out the shortest-code-medal and polish it for the new owner;) |
| | Rastah Bar Account closed
Registered: Oct 2012 Posts: 336 |
The code cannot be freely placed in memory, so you may keep that medal :-)
One example (there are probably a lot more):
X = $38 (opcode for SEC)
SHX $HH00,Y
CLC
BCC *-4
HH can be $17..$1E, $57..$5E, $97..$9E, $D7..$DE. Without DMA, the CLC (opcode $18) does not change, with DMA it is replaced with SEC. |
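The list of usable HH values can be checked by brute force. This is a small sketch, assuming the no-DMA behaviour described in the thread (SHX stores X AND (operand high byte + 1)) and that indexing with Y does not cross a page; `shx_value` is a hypothetical helper name.

```python
def shx_value(x, operand_high):
    # Assumed no-DMA behaviour: the stored byte is X AND (high byte + 1)
    return x & ((operand_high + 1) & 0xFF)


SEC, CLC = 0x38, 0x18

# Every HH for which SHX rewrites the CLC with an identical CLC
valid = [hh for hh in range(0x100) if shx_value(SEC, hh) == CLC]

print(" ".join(f"${hh:02X}" for hh in valid))
print(len(valid))  # 32
```

Under these assumptions the search returns exactly the four ranges given in the post and nothing else.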
| | Rastah Bar Account closed
Registered: Oct 2012 Posts: 336 |
Quote: The STA $ZP instruction (see post #44) can be made part of the init code, which reduces the timer-based stabilization approach to effectively 10 bytes:
ldy #init_value ;Init code
sync: lax $dc04
sbx #51
sty ZP ;RRW instruction. Part of init code.
cpx $dc04
bne sync
STY ABS is also allowed, in combination with SBX #52.
If I'm not mistaken, this should work on PAL, NTSC, and DREAN, but the loop exit cycle may depend on the system.
Correction: STY ABS is not guaranteed to lock(*), but STY ZP is, on all models (PAL, old and new NTSC, DREAN).
(*) Unless the border saves it, but I still have to check that. |
| | Copyfault
Registered: Dec 2001 Posts: 478 |
Quoting Rastah Bar: Quoting Copyfault: Quoting Rastah Bar: If they can, for example, give you a desired value of X and Y (without increasing net code size), then perhaps a 6-byte loop is possible with something like this:
shx $HH00,y
BYTE any_value
bne *-4
[...]
Yes this should work. But you're right, it's really shifting *a lot* of preparations to the reign of decruncher & init code. Still quite doable I think. Time to dig out the shortest-code-medal and polish it for the new owner;)
The code cannot be freely placed in memory, so you may keep that medal :-)[...]
Well, in large parts it's the same as what I proposed in post#86. But now that we entered the territory of over-stretching, here a version that does it in only 5 bytes (again putting all required reg settings on the decruncher's bill):
$fdfc 9E D0 FD shx $fdd0,y
$fdff D0 FC bne $fdfd
Comes with all constraints one could think of: mem loc fixed, y=$2f fixed val mandatory, x=$d1 fixed val mandatory, setting of vector $fc/$fd has influence on the number of cycles taken when the loop is left, to be started with z-flag=0, ... maybe more! OK, it's possible to do it with any branch opcode, but this doesn't really make it any better;) |
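To see why X and Y are pinned to these values, the write target and the stored byte can be computed directly. A minimal sketch, assuming the SHX behaviour used elsewhere in the thread (X AND (operand high byte + 1) without DMA, plain X with DMA); the variable names are hypothetical.

```python
BASE, Y, X = 0xFDD0, 0x2F, 0xD1            # operand and register values from the post

target = (BASE + Y) & 0xFFFF               # address the SHX writes to
no_dma = X & (((BASE >> 8) + 1) & 0xFF)    # assumed: X AND (high byte + 1)
with_dma = X                               # assumed: the AND drops out under DMA

print(f"target  ${target:04X}")            # $FDFF, the opcode byte of the final BNE
print(f"no DMA  ${no_dma:02X}")            # $D0 = BNE, so the loop keeps running
print(f"DMA     ${with_dma:02X}")          # $D1 = CMP (zp),y, so the loop falls through
```

With DMA the bytes at $fdff become D1 FC, i.e. CMP ($fc),y, which fits the remark that the vector at $fc/$fd influences the exit cycle count.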
| | Rastah Bar Account closed
Registered: Oct 2012 Posts: 336 |
Quoting Copyfault: Well, in large parts it's the same as what I proposed in post#86.
Yes, you are right. It's also very similar to what I wrote in post#71. I lost track a bit of all the variants.
Quote:
But now that we entered the territory of over-stretching, here a version that does it in only 5 bytes (again putting all required reg settings on the decruncher's bill) :
$fdfc 9E D0 FD shx $fdd0,y
$fdff D0 FC bne $fdfd
Comes with all constraints one could think of: mem loc fixed, y=$2f fixed val mandatory, x=$d1 fixed val mandatory, setting of vector $fc/$fd has influence on the number of cycles taken when the loop is left, to be started with z-flag=0, ... maybe more! OK, it's possible to do it with any branch opcode, but this doesn't really make it any better;)
Very ingenious, but an 8-cycle loop doesn't work, does it? See post #61. |
| | Copyfault
Registered: Dec 2001 Posts: 478 |
Quoting Rastah Bar: Quoting Copyfault: Well, in large parts it's the same as what I proposed in post#86.
Yes, you are right. I lost track a bit of all the variants.
Quote:
But now that we entered the territory of over-stretching, here a version that does it in only 5 bytes (again putting all required reg settings on the decruncher's bill) :
$fdfc 9E D0 FD shx $fdd0,y
$fdff D0 FC bne $fdfd
Comes with all constraints one could think of: mem loc fixed, y=$2f fixed val mandatory, x=$d1 fixed val mandatory, setting of vector $fc/$fd has influence on the number of cycles taken when the loop is left, to be started with z-flag=0, ... maybe more! OK, it's possible to do it with any branch opcode, but this doesn't really make it any better;)
Very ingenious, but an 8-cycle loop doesn't work, does it? See post #61.
It's actually a 12-cycle loop, because the first branch is 4 cycles long (page break!), the branch in the operand of the SHX takes 3 cycles and the SHX itself 5 -> 12 cycles in total;)
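The 12-cycle figure can be reproduced from the byte values alone. This is a sketch assuming standard 6502 branch timing (3 cycles taken, 4 when the target lies on a different page than the following instruction) and 5 cycles for SHX abs,y; `branch` is a hypothetical helper.

```python
def branch(pc, offset_byte):
    """Return (target, cycles) for a taken relative branch located at pc."""
    next_pc = (pc + 2) & 0xFFFF
    offset = offset_byte - 0x100 if offset_byte & 0x80 else offset_byte
    target = (next_pc + offset) & 0xFFFF
    page_crossed = (next_pc >> 8) != (target >> 8)
    return target, 3 + int(page_crossed)


# D0 FC at $fdff: the visible BNE
t1, c1 = branch(0xFDFF, 0xFC)
# D0 FD at $fdfd: the SHX operand bytes, executed as a second BNE
t2, c2 = branch(0xFDFD, 0xFD)

SHX_ABS_Y = 5
total = c1 + c2 + SHX_ABS_Y

print(f"${t1:04X} {c1}")  # $FDFD 4
print(f"${t2:04X} {c2}")  # $FDFC 3
print(total)              # 12
```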
It could even be done with just 4 bytes (continuing the abuse of the byte-counting):
loop: sha (vec),y
bne loop
If this is located at the end of a page such that the BNE incurs a page break (pb), it's a 10-cycle loop in total.
Still, too far-fetched, too many things must be configured correctly. Personally, I think the 7-bytes-solution (as in post#110) that "only" comes with requirements on zp-values set in a special way is the best compromise between flexibility and byte-count! |
| | Rastah Bar Account closed
Registered: Oct 2012 Posts: 336 |
Quoting Copyfault: Quoting Rastah Bar:
Very ingenious, but an 8-cycle loop doesn't work, does it? See post #61.
Quote: It's actually a 12-cycle loop, because the first branch is 4 cycles long (page break!), the branch in the operand of the SHX takes 3 cycles and the SHX itself 5 -> 12 cycles in total;)
Yes, I misread the branch. I thought it was to $FDFC.
Quote:
It could even be done with just 4 bytes (continuing the abuse of the byte-counting):
loop: sha (vec),y
bne loop
If this is located at the end of a page such that the BNE incurs a page break (pb), it's a 10-cycle loop in total.
Awesome! With SHA(vec),y even 3 bytes is possible for a 12-cycle loop. One example:
$5f00 SHA (VEC),y
$5f02 RTS
Assume that the decruncher provides the following initial conditions: {A&X} = $EA (opcode of NOP), Y = 2, the ZP addresses VEC and VEC+1 point to $5F00, and the stack is completely filled with the return address $5F00. Without DMA, the SHA writes $EA & {$5F+1} = $60 (opcode for RTS) and repeats that until a DMA makes it write a NOP.
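The numbers in this 3-byte variant can be checked the same way. A minimal sketch, assuming SHA stores A AND X AND (target high byte + 1) without DMA and plain A AND X with DMA, and using the usual 6 cycles each for SHA (zp),y and RTS; the names are hypothetical.

```python
CODE, Y, A_AND_X = 0x5F00, 2, 0xEA         # initial conditions from the post

target = (CODE + Y) & 0xFFFF               # VEC points to $5F00, so SHA writes to $5F02
no_dma = A_AND_X & (((CODE >> 8) + 1) & 0xFF)  # assumed: AND with (high byte + 1)
with_dma = A_AND_X                         # assumed: the AND drops out under DMA

SHA_IND_Y, RTS = 6, 6
loop_cycles = SHA_IND_Y + RTS

print(f"target ${target:04X}")             # $5F02, where the RTS sits
print(f"no DMA ${no_dma:02X}")             # $60 = RTS, loop continues
print(f"DMA    ${with_dma:02X}")           # $EA = NOP, loop falls through
print(loop_cycles)                         # 12
```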
Quote:
Still, too far-fetched, too many things must be configured correctly. Personally, I think the 7-bytes-solution (as in post#110) that "only" comes with requirements on zp-values set in a special way is the best compromise between flexibility and byte-count!
I'll leave that judgement to the people who want to use any of the variants. |
| | Copyfault
Registered: Dec 2001 Posts: 478 |
Quoting Rastah Bar
Awesome! With SHA(vec),y even 3 bytes is possible for a 12-cycle loop. One example:
$5f00 SHA (VEC),y
$5f02 RTS
Assume that the decruncher provides the following initial conditions: {A&X} = $EA (opcode of NOP), Y = 2, the ZP addresses VEC and VEC+1 point to $5F00, and the stack is completely filled with the return address $5F00. Without DMA, the SHA writes $EA & {$5F+1} = $60 (opcode for RTS) and repeats that until a DMA makes it write a NOP.
Yeah, already told you that I like this approach for its level of insanity alone :)) Maybe instead of $5f00 one could choose $5f5f as "start address" so the whole stack can be filled with the same byte and no matter at which position the SP will be, it will always return to the right spot! |
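The benefit of a single fill byte can be illustrated with a tiny model of RTS. This is only a sketch: it assumes the standard RTS behaviour (pull low byte, pull high byte, resume at the pulled address plus one), so with a $5F fill the code would resume at $5F60 rather than $5F5F, which is my reading and not something stated in the thread. The two-byte fill pattern used for comparison is likewise hypothetical.

```python
def rts_target(stack, sp):
    """Address at which execution resumes after RTS, for stack pointer sp."""
    lo = stack[(sp + 1) & 0xFF]
    hi = stack[(sp + 2) & 0xFF]
    return (((hi << 8) | lo) + 1) & 0xFFFF


# Whole stack page filled with one byte: SP position no longer matters
uniform = [0x5F] * 256
uniform_targets = {rts_target(uniform, sp) for sp in range(256)}

# Alternating low/high bytes of $5EFF (returns to $5F00): depends on SP parity
alternating = [0xFF if i % 2 == 0 else 0x5E for i in range(256)]
alternating_targets = {rts_target(alternating, sp) for sp in range(256)}

print(sorted(f"${t:04X}" for t in uniform_targets))      # ['$5F60']
print(sorted(f"${t:04X}" for t in alternating_targets))  # ['$5F00', '$FF5F']
```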