| | Hermit
Registered: May 2008 Posts: 208 |
shortest CIA-stable raster
<Post edited by moderator on 4/4-2009 14:47>
Hi, Guys :)
While preparing for compo I've developed maybe the shortest
CIA-type stable raster solution (fits in 64 bytes, 24 asm-rows).
If you can do even shorter, I'm curious :)
It works fine in practice: you don't have to type novels to achieve a
stable raster, and there's no need for a raster IRQ; the CMP $d012 method is enough.
If you find it useful for fast & short demo-writing, we may implement it
into codebase64.
;setting the CIA1-timerA to beam in the program beginning:
-----------------------------------------------------------
sei ;we don't want lost cycles by IRQ calls :)
sync cmp $d012 ;scan for begin rasterline (A=$11 after first return)
bne *-3 ;wait if not reached rasterline #$11 yet
ldy #8 ;the value for cia timer fetch & for y-delay loop
sty $dc04 ;CIA Timer will count from 8,8 down to 7,6,5,4,3,2,1
dey ;Y=Y-1 (8 iterations: 7,6,5,4,3,2,1,0)
bne *-1 ;loop needed to complete the poll-delay with 39 cycles
sty $dc05 ;no need Hi-byte for timer at all (or it will mess up)
sta $dc0e,y ;forced restart of the timer to value 8 (set in dc04)
lda #$11 ;value for d012 scan and for timerstart in dc0e
cmp $d012 ;check if line ended (new line) or not (same line)
sty $d015 ;switch off sprites, they eat cycles when fetched
bne sync ;if line changed after 63 cycles, resynchronize it!
.... the rest (this is also a stable-timed point, can be used for sg.)
;EXAMPLE-using timerA to stabilize 7 cycle jitter when using CMPd012:
-----------------------------------------------------------------------
scan ldx #$31 ;a good value that's not badline, in border and 1=white
cpx $d012 ;scan rasterline
bne *-3 ;wait until rasterline will be $31
lda $dc04 ;check timer A, here it jitters between 7...1
eor #7 ;A=7-A so jitter will be 0...6 in A
sta corr+1 ;self-writing code, the bpl jump-address = A
corr bpl *+2 ;the jump to timer (A) dependent byte
cmp #$c9 ;if A=0, cmp#$c9; if A=1, cmp #$c9 again 2 cycles later
cmp #$c9 ;if A=2, cmp #$c9; if A=3, cmp #$2c then bit $ea
bit $ea24 ;if A=4,bit$ea24; if A=5, bit $ea, if A=6, only NOP
stx $d020 ;x was 1 so border is white at the stable cycle
sty $d020 ;y ended in 0 in sync routine, so border black after 4 cycles
jmp scan ;go to the raster again (or can go new raster)
-----------------------------------------------------------------------
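A quick way to sanity-check the delay slide after `corr` is to decode its seven bytes from every entry offset and add up the cycles. The Python sketch below does that with a hand-written four-opcode table using the documented 6502 timings. The names `SLIDE`, `OPCODES` and `slide_cycles` are made up for this illustration, and it only models A=0..6, not the case where the timer reads 8.

```python
# Bytes that follow "corr bpl": cmp #$c9, cmp #$c9, bit $ea24
SLIDE = [0xC9, 0xC9, 0xC9, 0xC9, 0x2C, 0x24, 0xEA]

# opcode -> (length in bytes, cycles); only the opcodes the slide can decode to
OPCODES = {
    0xC9: (2, 2),  # CMP #imm
    0x2C: (3, 4),  # BIT abs
    0x24: (2, 3),  # BIT zp
    0xEA: (1, 2),  # NOP
}

def slide_cycles(entry):
    """Cycles spent from byte offset `entry` to the end of the slide."""
    pc, total = entry, 0
    while pc < len(SLIDE):
        length, cycles = OPCODES[SLIDE[pc]]
        pc += length
        total += cycles
    return total

for a in range(7):
    print(a, slide_cycles(a))
```

Under these assumptions each step of A removes exactly one cycle (8 cycles for A=0 down to 2 cycles for A=6), which is what cancels the 7-cycle jitter of the CMP $d012 loop.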
Opinions?
Hermit Software Hungary |
| | spider-j
Registered: Oct 2004 Posts: 498 |
Sorry to dig this up, but I also stumbled over this lda $dc04/$dd04 returns $08 and therefore eor #7 produces $0f "thingy". I did some experiments with NMI (CIA2 TIMER B counting PAL cycles and TIMER A to "stabilize") and Krill loader instead of IRQ and made a "dirty fix" like this to help myself:
pha
lda $dd04
eor #7
sta *+4
bpl *+2
cmp #$c9
cmp #$c9
bit $ea24
bit $ea24
jmp *+8
nop
nop
jmp *+3
txa
pha
tya
pha
Yes, I know it's a lot more bytes & cycles "wasted". I just saw while linking Trafolta that Achim used this code snippet and played around a little bit.
Maybe someone who is a bit more creative than me should update the codebase64 page with a proper solution. Or at least there should be some kind of warning ... or whatsoever... |
| | Repose
Registered: Oct 2010 Posts: 225 |
lol so I come back exactly 6 years later and this thread is still going.
What I meant was, I wanted to answer two questions: 1) what is a methodical approach to finding the shortest or quickest sync code, and 2) how do you tell if you've found the best possible solution?
Instead of playing around with ideas and guessing, I was thoroughly going through every opcode to see how they could be combined to make various delays. With that method, the best way to do this can be proven, and then it's done for good. I guess no one really understood it, but I still found my own post very interesting; of course, it makes sense to me.
I'll have to look over the latest proposed segments and decide if I feel any of them are probably the last answer. |
| | Repose
Registered: Oct 2010 Posts: 225 |
Ok so the conclusion of my msg is, "we can use just 3 bytes to write a delay of anywhere from 4 to 12 cycles". I mean two instructions of the right type, a one byte and a two byte one, together can add up to any possible time from 4 (nop:nop) to 12 (unspecified 4 cycle and 8 cycle instruction) cycles.
If we jump into each segment, that's one approach.
So what I'm saying is, for that approach this is the smallest code you could ever write.
Then I give two other approaches that could be shorter, but not faster overall (if that's important). |
| | ChristopherJam
Registered: Aug 2004 Posts: 1408 |
Repose, so it looks like you were searching for the smallest number of bytes for which there are a set of at least eight routines of that length that between them cover a run of eight different durations?
Nine delay states easily confirmable in Python thusly:
>>> b1={2,3,4}
>>> b2={2,3,4,5,6,8}
>>> set([x+y for x in b1 for y in b2])
{4, 5, 6, 7, 8, 9, 10, 11, 12}
I'm not clear how closely related that is to finding the shortest possible anti jitter routine mind, as multiple fragments would ordinarily introduce the cost of dispatching to them and returning. (also note that the jmp ($dc03) approach doesn't require same-length delay routines)
It does raise the entertaining possibility of this construct, mind:
ldy $dc04
lda frag1,y
sta rna+1
lda frag2,y
sta rna+2
rna:
.byt 0,0,0
24 to 32 cycles, depending on the content of $dc04. A 9 cycle timer wouldn't work because of the duplicated-8 issue, but a 63 cycle timer should be fine if the alignment is appropriate (alternately, avoid undefined opcodes in main). |
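The 24 to 32 figure can be reproduced with a few lines of Python, reusing the b1/b2 sets from above. This is only a sketch: it assumes no page crossing on the two `lda fragN,y` reads (4 cycles each) and absolute addressing for `ldy $dc04` and both `sta`. `SETUP` and `total_cycles` are illustrative names, and b1/b2 are read here as the cycle counts available from the one-byte and the two-byte instruction.

```python
# Fixed cost before the patched bytes run:
# ldy abs (4) + lda abs,y (4) + sta abs (4) + lda abs,y (4) + sta abs (4)
SETUP = 4 + 4 + 4 + 4 + 4

b1 = {2, 3, 4}           # copied from the REPL session above
b2 = {2, 3, 4, 5, 6, 8}  # copied from the REPL session above

def total_cycles():
    """All total durations reachable with one delay from each set."""
    return sorted({SETUP + x + y for x in b1 for y in b2})

print(total_cycles())
```

The result is every value from 24 through 32 with no gaps, so nine consecutive timer states could be absorbed under these assumptions.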
| | lft
Registered: Jul 2007 Posts: 369 |
That is an excellent idea, and it can be taken further!
We can do something like this:
ldy $dc04
lda opcodes,y
sta mod
mod .byt $00,$1b,$a9,$13,$ea
; 18-27 cycles later...
So that's six cycles faster, and one more cycle of jitter supported.
Here is a clarifying table:
y   code             cycles  trashes
1   a9 1b|a9 13|ea   6
2   a5 1b|a9 13|ea   7
3   b5 1b|a9 13|ea   8
4   06 1b|a9 13|ea   9       1b
5   a1 1b|a9 13|ea   10
6   ea|1b a9 13|ea   11      13a9,y
7   ad 1b a9|13 ea   12      (ea),y
8   99 1b a9|13 ea   13      a91b,y (ea),y
9   0e 1b a9|13 ea   14      a91b (ea),y
10  1b 1b a9|13 ea   15      a91b,y (ea),y
This can be varied according to taste, to trash different memory locations. Note that the value of Y is known, so the exact address of the trashed location is also known. |
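The cycle column of the table can be double-checked by decoding the five bytes at `mod` for each patched opcode. The Python sketch below uses a small hand-written opcode table (documented timings, plus the commonly cited 7 and 8 cycles for the undocumented $1b and $13). `FIRST_BYTES`, `TAIL`, `SETUP` and `fragment_cycles` are names invented for this check, and the 12-cycle setup assumes no page crossing on `lda opcodes,y`.

```python
# opcode -> (length in bytes, cycles)
OPCODES = {
    0xA9: (2, 2),  # LDA #imm
    0xA5: (2, 3),  # LDA zp
    0xB5: (2, 4),  # LDA zp,x
    0x06: (2, 5),  # ASL zp
    0xA1: (2, 6),  # LDA (zp,x)
    0xEA: (1, 2),  # NOP
    0xAD: (3, 4),  # LDA abs
    0x99: (3, 5),  # STA abs,y
    0x0E: (3, 6),  # ASL abs
    0x1B: (3, 7),  # SLO abs,y (undocumented)
    0x13: (2, 8),  # SLO (zp),y (undocumented)
}

# First byte patched into "mod" for y = 1..10, in table order
FIRST_BYTES = [0xA9, 0xA5, 0xB5, 0x06, 0xA1, 0xEA, 0xAD, 0x99, 0x0E, 0x1B]
TAIL = [0x1B, 0xA9, 0x13, 0xEA]  # the four fixed bytes after "mod"

def fragment_cycles(first_byte):
    """Decode the five bytes at 'mod' and sum their cycle counts."""
    code = [first_byte] + TAIL
    pc, total = 0, 0
    while pc < len(code):
        length, cycles = OPCODES[code[pc]]
        pc += length
        total += cycles
    return total

SETUP = 4 + 4 + 4  # ldy abs + lda abs,y (no page cross) + sta abs

for y, op in enumerate(FIRST_BYTES, start=1):
    print(y, fragment_cycles(op), SETUP + fragment_cycles(op))
```

With these timings the fragment takes 6 to 15 cycles and the whole sequence 18 to 27, matching the listing and the table.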
| | ChristopherJam
Registered: Aug 2004 Posts: 1408 |
Ooh, very nice indeed.
I guess the next question is, what's the fewest cycles required for each jitter length; lft's 18 cycle minimum is likely optimal for ten jitter states; if there's only two (eg we know we're interrupting NOPs) one could just
ldy $dc04
lda $xxnn,y
(eight or nine cycles, depending whether $dc04 is greater than 255-nn)
I'm not wrapping my brain around the sets of bcX *+n above at the moment; it's been a long day. (oh, and I should have been doing STA to rna+0 and rna+1 two comments ago too, but you've probably guessed that already) |
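For the two-state case, the "eight or nine cycles" comes from the page-crossing penalty on `lda $xxnn,y`. A minimal Python sketch of that arithmetic follows; `two_state_cycles` is a hypothetical helper, and the value nn=$fe is picked purely for illustration.

```python
def two_state_cycles(nn, y):
    """Cycles for: ldy $dc04 / lda $xxnn,y with low address byte nn."""
    ldy_abs = 4
    lda_abs_y = 4 + (1 if nn + y > 0xFF else 0)  # +1 when the index crosses a page
    return ldy_abs + lda_abs_y

# Hypothetical choice: nn = $fe, so a timer value of 1 stays in-page and 2 crosses
print(two_state_cycles(0xFE, 1), two_state_cycles(0xFE, 2))
```

So nn would be chosen such that exactly one of the two possible timer values pushes the indexed read over the page boundary.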
| | Repose
Registered: Oct 2010 Posts: 225 |
Apparently I worked on this 5 years ago.
http://forum.6502.org/viewtopic.php?p=18148#18148
This does 14+A in the range A=(1,8) or 13 with A=0.
;A=1..8
*=$1000
clc
adc #$ff-8 ;A=8-A so result will be 7..0 in A
eor #$ff
sta corr+1 ;self-writing code, the bpl jump-address = A
corr bpl *+2 ;the jump to (A) dependent byte (13 cycles so far)
cmp #$c9 ;A=8->A=0->BPL +2
cmp #$c9 ;
cmp #$c9 ;
cmp $ea ;3 =9 (13+9=22 max delay)
Nothing innovative, just a different idea. |
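The "14+A, or 13 with A=0" claim can be checked by modelling the routine in Python. This is a sketch under stated assumptions: `sta corr+1` is absolute (4 cycles), the `bpl` is always taken without crossing a page (3 cycles), and `SLIDE`, `FIXED` and `routine_cycles` are names made up here.

```python
# Bytes after "corr bpl": cmp #$c9 three times, then cmp $ea
SLIDE = [0xC9] * 6 + [0xC5, 0xEA]

# opcode -> (length in bytes, cycles)
OPCODES = {
    0xC9: (2, 2),  # CMP #imm
    0xC5: (2, 3),  # CMP zp
    0xEA: (1, 2),  # NOP
}

FIXED = 2 + 2 + 2 + 4 + 3  # clc, adc #, eor #, sta abs, bpl taken (same page)

def routine_cycles(a):
    """Total cycles for input A in 0..8."""
    offset = ((a + 0xF7) & 0xFF) ^ 0xFF  # clc / adc #$ff-8 / eor #$ff gives 8 - A
    pc, total = offset, FIXED
    while pc < len(SLIDE):
        length, cycles = OPCODES[SLIDE[pc]]
        pc += length
        total += cycles
    return total

print([routine_cycles(a) for a in range(9)])
```

For A=0 the branch skips all eight slide bytes, which is why that case costs only the 13 fixed cycles.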