CSDb User Forums

2009-04-04 14:39

Registered: May 2008
Posts: 205
shortest CIA-stable raster

<Post edited by moderator on 4/4-2009 14:47>

Hi, Guys :)

While preparing for a compo I've developed what may be the shortest
CIA-type stable raster solution (it fits in 64 bytes, 24 asm rows).
If you can do it even shorter, I'm curious :)

It works fine in practice: you don't have to type novels to achieve a
stable raster, and there is no need for a raster IRQ; the CMP $d012 method is enough.
If you find it useful for fast & short demo writing, we could add it
to Codebase64.

;syncing CIA1 timer A to the raster beam at the start of the program:

     sei                   ;we don't want lost cycles by IRQ calls :)
sync cmp $d012             ;scan for begin rasterline (A=$11 after first return)
     bne *-3       ;wait if not reached rasterline #$11 yet
     ldy #8        ;the value for the CIA timer latch & for the y-delay loop
     sty $dc04     ;CIA Timer will count from 8,8 down to 7,6,5,4,3,2,1
     dey           ;Y=Y-1 (8 iterations: 7,6,5,4,3,2,1,0)
     bne *-1       ;loop needed to complete the poll-delay with 39 cycles
     sty $dc05     ;no need Hi-byte for timer at all (or it will mess up)
     sta $dc0e,y   ;forced restart of the timer to value 8 (set in dc04)
     lda #$11      ;value for d012 scan and for timerstart in dc0e
     cmp $d012     ;check if line ended (new line) or not (same line)
     sty $d015     ;switch off sprites, they eat cycles when fetched
     bne sync      ;if the line changed after 63 cycles, resynchronize!
     .... the rest (this is also a stable-timed point, can be used for something)
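
The 39-cycle figure quoted for the dey/bne loop can be checked with a few lines of Python. This is only a sketch, assuming the standard 6502 timings (dey = 2 cycles, bne = 3 cycles when taken and 2 when not, no page crossing); the function name is made up for illustration.

```python
# Standard 6502 timings (assumption: the branch does not cross a page boundary).
DEY_CYCLES = 2
BNE_TAKEN_CYCLES = 3
BNE_NOT_TAKEN_CYCLES = 2


def dey_bne_loop_cycles(start_y: int) -> int:
    """Cycles spent in 'dey / bne *-1' when the loop is entered with Y = start_y."""
    cycles = 0
    y = start_y
    while True:
        y = (y - 1) & 0xFF                    # dey
        cycles += DEY_CYCLES
        if y != 0:                            # bne taken: back to dey
            cycles += BNE_TAKEN_CYCLES
        else:                                 # bne falls through, loop is done
            cycles += BNE_NOT_TAKEN_CYCLES
            return cycles


print(dey_bne_loop_cycles(8))  # 7 taken rounds of 5 cycles + 4 for the last round
```

With Y = 8 this gives 7 * 5 + 4 = 39 cycles, matching the comment in the listing.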

;EXAMPLE: using timer A to stabilize the 7-cycle jitter when polling with CMP $d012:
scan ldx #$31    ;a good value that's not badline, in border and 1=white
     cpx $d012   ;scan rasterline
     bne *-3     ;wait until rasterline will be $31
     lda $dc04   ;check timer A, here it jitters between 7...1
     eor #7      ;A=7-A so jitter will be 0...6 in A
     sta corr+1  ;self-writing code, the bpl jump-address = A
corr bpl *+2     ;the jump to timer (A) dependent byte
     cmp #$c9    ;if A=0, cmp#$c9; if A=1, cmp #$c9 again 2 cycles later
     cmp #$c9    ;if A=2, cmp #$c9; if A=3, cmp #$2c (operand byte + BIT opcode)
     bit $ea24   ;if A=4,bit$ea24; if A=5, bit $ea, if A=6, only NOP

     stx $d020   ;x was 1 so border is white at the stable cycle
     sty $d020   ;y ended in 0 in sync routine, so border black after 4 cycles
     jmp scan    ;go to the raster again (or can go new raster)
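
To see why the computed bpl works, it helps to decode the bytes behind it from every possible entry offset. The sketch below is not part of the original routine; it assumes the three instructions assemble to C9 C9 C9 C9 2C 24 EA and uses the standard cycle counts for those opcodes. All names are hypothetical.

```python
# Bytes following the self-modified 'bpl':
#   cmp #$c9 / cmp #$c9 / bit $ea24  ->  C9 C9 C9 C9 2C 24 EA
SLIDE = bytes([0xC9, 0xC9, 0xC9, 0xC9, 0x2C, 0x24, 0xEA])

# opcode -> (mnemonic, length in bytes, cycles); only the opcodes that occur here
OPCODES = {
    0xC9: ("cmp #imm", 2, 2),
    0x2C: ("bit abs", 3, 4),
    0x24: ("bit zp", 2, 3),
    0xEA: ("nop", 1, 2),
}


def slide_delay(offset: int, slide: bytes = SLIDE) -> int:
    """Cycles executed from 'offset' until execution leaves the end of the slide."""
    cycles = 0
    pc = offset
    while pc < len(slide):
        _name, length, cost = OPCODES[slide[pc]]
        cycles += cost
        pc += length
    if pc != len(slide):
        raise ValueError("decoding ran past the end of the slide")
    return cycles


for timer in range(7, 0, -1):
    offset = timer ^ 7                        # eor #7
    print(f"timer={timer} offset={offset} delay={slide_delay(offset)}")
```

Under these assumptions the delay is 8 - offset cycles, so each extra cycle of lateness (a lower timer reading) is cancelled by one cycle less in the slide.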


Hermit Software Hungary
... 17 posts hidden ...
2009-04-05 00:15

Registered: Dec 2001
Posts: 195
Quote: Good news, the CIA setter-routine (first part) works well also for raster IRQ. :)
Set the IRQ then,
The code after IRQ entry (when the IRQ is called) should be something like this:

lda $dc04
eor #7
sta *+4
bpl *+2
lda #$a9
lda #$a9
lda $eaa5
...the rest processes in the IRQ routine
asl $d019

I'll refresh Codebase64 with this info:)

Hermit Software Hungary

Hmm, the first part of your code can be "optimized" ;)

What about...

lda $dc04
bcs *+2
asr #$07
bcc *+6
bcs *+4
bne .end
bne *-2

eats up the same amount of cycles but saves one byte (unless you're running the code within the zp, which makes it even more inflexible)

Maybe one can even cut it down further by two bytes... it's tempting to try to get rid of one of these branch opcodes, but up to now I haven't seen a way to do it.

2009-04-05 10:14

Registered: May 2008
Posts: 205
Good to see this approach. I was thinking about a similar delayer with branch commands but couldn't work it out yet.
As far as I could, I avoided illegal opcodes, because some assemblers (or machine types?) make them hard to use, and beginners also need clear code to understand on Codebase64.

I've tried this routine and it really works, and GREAT NEWS: no need for ASR #7, an LSR is quite enough. Why? Because our CIA only counts over 9 cycles, and at the LDA $dc04 only 7..1 appears, so there is no need to mask off any bits. 1 byte saved again.

lda $dc04
bcs *+2
lsr
bcc *+6
bcs *+4
bne .end
bne *-2

That said, if you can recommend a C64 Turbo Assembler that accepts illegals, I would be happy.

Another idea: we could build this bne/bcc/bcs-style delayer with a halving method, which may reduce the number of steps.

My other approach is to load $dc04 into X or Y and make an indexed jump to a delay routine. Then there is no need to invert (EOR) the $dc04 value, which unfortunately counts BACK from 7 to 1 (from 8 to 0 (8), to be precise). Or the "JMP ($dc03)" method could be useful to reduce rows.

Hermit Software Hungary
2009-04-05 15:56

Registered: Dec 2001
Posts: 195
I really wonder how a simple LSR instead of ASR #$07 can work. The timing registers NEVER go down to #$00. After #$01, instead of counting down to #$00 the regs are directly reset to the initial value (here $3e most probably). So without masking, we can not make sure that the bne-commands work as intended.

Be careful with the jitter range: it is true that a (legal) RMW opcode eats max. 7 cycles, but in combination with branch opcodes the number of cycles to be considered for jitter can be even higher. This also depends on page crossings. I once started a thread here about this... will post a link later when I find it.

If desired I can send some acme-source code which clearly shows that there are 8 possible jitter states (value read by $dc04 goes from e.g. $10 to $17). You can do these tests yourself by experimenting with some code like

inc abs,x
bpl *-3
bmi *-5

as main routine.

The jmp ($dc03)-approach has already been done before. Ninja took the idea behind it to perfection. IIRC there was some article out there in VN.

2009-04-05 16:47

Registered: Dec 2001
Posts: 195
Me again,

the discussion I mentioned above was about Stable Raster via Timer.

2009-04-05 20:49

Registered: Jan 2002
Posts: 382
Nice to see such a coding thread on CSDb \o/

While nothing beats the experience gained by doing your own timing stuff, I see some practical issues with this routine:

- As Copyfault mentioned, the 8-cycle jitter is not taken care of.

- It was often useful to me to have a counter synced to a rasterline (i.e. counting 63 cycles not 9). That makes it easier to abuse it more than once IMHO. Might be a personal preference, though.

- This routine could need several frames to reach a stable raster. Some code generation or depacking might have happened meanwhile.

- If you want to be really short, your approach won't beat $d013-based techniques, I am afraid.
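
The 9-versus-63 point in the list above comes down to simple modular arithmetic. The following Python sketch assumes a PAL machine with 63 cycles per raster line and a CIA timer in continuous mode that underflows every latch + 1 cycles; the helper names are invented for illustration.

```python
PAL_CYCLES_PER_LINE = 63  # assumption: PAL C64, as used throughout this thread


def timer_period(latch: int) -> int:
    """Cycles between two underflows of a continuously running CIA timer."""
    return latch + 1


def drift_per_line(latch: int, cycles_per_line: int = PAL_CYCLES_PER_LINE) -> int:
    """How far the timer phase shifts from one raster line to the next."""
    return cycles_per_line % timer_period(latch)


def positions_per_reading(latch: int, cycles_per_line: int = PAL_CYCLES_PER_LINE) -> int:
    """Number of cycle positions within one line that give the same timer reading."""
    if drift_per_line(latch, cycles_per_line) != 0:
        raise ValueError("timer is not in phase with the raster line")
    return cycles_per_line // timer_period(latch)


print(drift_per_line(8), positions_per_reading(8))    # latch 8: period 9
print(drift_per_line(62), positions_per_reading(62))  # latch 62: period 63
```

A latch of 8 stays in phase with the line because 63 = 7 * 9, but each reading is shared by 7 positions in the line. A latch of 62 gives one reading per position, which is what makes a line-synced counter easier to reuse.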

Still, nice to see you playing around with it. While I see the above issues, there are still nice ideas in this one.
2009-04-05 22:13

Registered: Dec 2001
Posts: 195
Hey Ninja, greetings my friend,

when seeing that you posted a reply here I first thought you found a way to further optimize this branch-approach...

Maybe this is the limit (considering the used bytes/cycles-ratio).

2010-11-26 16:44

Registered: May 2008
Posts: 205
I'd like to emphasize what Copyfault pointed out to me. The 8-cycle jitter really is something we have to pay attention to...
I'm coding a program, and some weird things happened when the main program (outside the IRQ) executed more than a simple jmp or so (especially when the IRQ loader started to operate).
I had to slightly modify the stable-raster waiter routine in the IRQ. Adding another line of bit $ea24 seems to prevent any issues coming from the 8-cycle jitter... (even 9)

lda $dc04 ;check timer A, here it jitters between 7...1
eor #7 ;A=7-A so jitter will be 0...6 in A
sta corr+1 ;self-writing code, the bpl jump-address = A
corr bpl *+2 ;the jump to timer (A) dependent byte
cmp #$c9 ;if A=0, cmp#$c9; if A=1, cmp #$c9 again 2 cycles later
cmp #$c9 ;if A=2, cmp #$c9; if A=3, cmp #$2c (operand byte + BIT opcode)
bit $ea24 ;if A=4,bit$ea24; if A=5, bit $ea, if A=6, only NOP
bit $ea24 ;IMPORTANT to handle 8th cycle jitter

Not a big effort, but from now on I have to write stable-raster IRQs with this in mind.
(Not the best solution, but a simple NOP did the job too; however, that may not be as stable for the 8th cycle..)
Thanks for telling me about this 8-9 cycle jitter thingy..

Hermit Software Hungary
2011-03-26 00:54

Registered: Oct 2010
Posts: 75
I'm going to try to develop a mathematical proof of the shortest/quickest code.
According to http://visual6502.org/wiki/index.php?title=6502_all_256_Opcodes
We have these possibilities:
1 byte: 2-4 cycles
2 bytes: 2-6, 8 cycles
3 bytes: 4-7 cycles

And we are trying to create 8 delay states.
Now here's the table of delay states:
1 byte: 3 states (2, 3 or 4 cycles)
2 byte: 5 states (2, 3, 4, 5, or 6)*
3 byte: 4 states (4, 5, 6, or 7 cycles)
*We'll have to special case this later for the 8 cycles instructions;
there's no way to use two of these to get 15 cycles.
You can see that 1 byte instructions are most efficient for consuming
Combining instructions doesn't double the states! I think of it this way;
at the longest delay, each instruction has the same number
of cycles, which loses a state possibility. The formula is:
total states=(state)*(n)-(n-1), where n is the number of times the instruction
is repeated, and state is the number of cycles possibilities it has.
Here's a table:
states  n  total
3       1  3
3       2  5
3       3  7
3       4  9
5       1  5
5       2  9
4       1  4
4       2  7
4       3  10
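
The formula and the table can be cross-checked by brute force. This is a small Python sketch that takes the cycle ranges listed earlier in this post as given (2-4, 2-6 and 4-7 cycles) and simply counts the distinct sums; the function names are made up for the example.

```python
from itertools import product


def distinct_totals(cycle_choices, n):
    """All distinct total cycle counts when n such instructions run in a row."""
    return sorted({sum(combo) for combo in product(cycle_choices, repeat=n)})


def states_by_formula(states, n):
    """The formula from the post: total states = states * n - (n - 1)."""
    return states * n - (n - 1)


# Cycle ranges as listed in the post (taken as assumptions, not re-verified here).
CASES = {
    "1 byte": (2, 3, 4),
    "2 byte": (2, 3, 4, 5, 6),
    "3 byte": (4, 5, 6, 7),
}

for name, choices in CASES.items():
    for n in range(1, 5):
        brute = len(distinct_totals(choices, n))
        assert brute == states_by_formula(len(choices), n)
        print(name, "n =", n, "total states =", brute)
```

The formula holds because each range is contiguous: n copies span n * (states - 1) + 1 consecutive totals.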

So which combinations gives at least 8 states?
size  n  total states  total bytes
1     4  9             4
2     2  9             4
3     3  10            9
In this table, size is the bytes in the opcode, n is the number of times
an instruction of that length appears in a row.
But 4 bytes isn't the best because we're overdoing it, if we use
a combination of 2 byte and 1 byte opcodes can we get 8 states in less
Combining the 1 byte and 2 byte opcodes, we get (2,3,4)cycles+(2,3,4,5,6)cycles
which is 4-10cycles or by our formula, (3 states+5 states)-(2-1)=7 states.
This isn't quite enough. However, now we consider the special case of 8 cycles,
which turns out to be very special!
We get (2,3,4)cycles+(2,3,4,5,6,8)cycles=4-12 cycles or 9 states.
Notice that there is just enough overlap here, i.e. (4+6)=10cycles and
(3+8)=11 cycles and (4+8)=12 cycles.
So theoretically, we can use just 3 bytes to write a delay of between 4 and 12 cycles.
These formulas can also be used for e.g. the Z80 in the C128 to set a limit
on optimized code.
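
The mixed case, including the "very special" 8-cycle instruction, can be enumerated the same way. A minimal Python sketch, again taking the cycle sets from this post as assumptions:

```python
ONE_BYTE = (2, 3, 4)                  # cycle counts assumed for 1-byte opcodes
TWO_BYTE = (2, 3, 4, 5, 6)            # 2-byte opcodes without the special case
TWO_BYTE_WITH_8 = TWO_BYTE + (8,)     # ... plus the 8-cycle special case


def pair_totals(first, second):
    """Distinct total cycle counts of one instruction from each set."""
    return sorted({a + b for a in first for b in second})


without_8 = pair_totals(ONE_BYTE, TWO_BYTE)
with_8 = pair_totals(ONE_BYTE, TWO_BYTE_WITH_8)

print(without_8, len(without_8))      # 4..10
print(with_8, len(with_8))            # 4..12
```

Without the 8-cycle case the totals cover 4 to 10 cycles (7 states); with it they cover 4 to 12 without a gap (9 states), because 3 + 8 = 11 fills the hole between 4 + 6 = 10 and 4 + 8 = 12.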

The Table of Fixed Delays
So how does this help us write the shortest delay routine? The only use
for this in short code is to use it with a computed branch. The only obvious
way to do it is with something like:
lda timer1;4 cycles
asl;2 cycles
sta *+3;4 cycles
bne *+2; selfmod to branch into delay fragments (3 cycles)
xxx;3 state opcode
xxx;6 state opcode (4-12 cycles)
bne continue;do raster processing (3 cycles)
This is obviously horrible for code size but for quickest sync it's promising;
it's 20-28 cycles.
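
The 20-28 cycle figure follows directly from the cycle counts written in the comments of the listing. A quick Python check, using only those numbers (the dictionary keys are just labels):

```python
# Fixed costs, taken from the comments in the listing above.
FIXED_CYCLES = {
    "lda timer1": 4,
    "asl": 2,
    "sta *+3": 4,
    "bne into delay fragments": 3,
    "bne continue": 3,
}

# Variable part: the two 'xxx' delay opcodes together take 4 to 12 cycles.
DELAY_MIN, DELAY_MAX = 4, 12

fixed_total = sum(FIXED_CYCLES.values())
total_min = fixed_total + DELAY_MIN
total_max = fixed_total + DELAY_MAX

print(fixed_total, total_min, total_max)
```

The fixed part adds up to 16 cycles, so the whole sequence takes between 20 and 28 cycles.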

The "multi-threaded" code trick
Using BIT followed by data which happens to be a valid opcode, you can do
a computed branch into coincidently up to 3 different instructions giving
you 3 states in 3 bytes. This seems obviously the most efficient way to make
a short delay; you are trading multiple custom code fragments for a single
code thread array. The code array is indexed by a byte at a time so it
can only consume one state. A first estimate of such techniques is 8 bytes
to consume 8 states. While it saves memory, it can't possibly sync quicker
than the delay above.

The Computed Delay Trick
The last way to make a delay is by calculations which take varying amounts
of time. You can use branches to add 1 cycle of delay based on each of
the flags, N, Z, and C. This can create 4 states of delay. We'll have
to do another calculation to create more delay states. It should take
two calculations and a whole bunch of branches to do it.
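
The "4 states from three flags" claim can be illustrated with a chain of three branches that each jump to the next instruction (for example bmi *+2 / beq *+2 / bcs *+2; this concrete sequence is an assumption, the post does not spell it out). A taken branch costs 3 cycles and an untaken one 2, assuming no page crossing:

```python
from itertools import product

BRANCH_TAKEN = 3       # cycles, no page crossing assumed
BRANCH_NOT_TAKEN = 2


def branch_chain_cycles(n_flag: bool, z_flag: bool, c_flag: bool) -> int:
    """Cycles for three branch-to-next-instruction opcodes testing N, Z and C."""
    return sum(BRANCH_TAKEN if flag else BRANCH_NOT_TAKEN
               for flag in (n_flag, z_flag, c_flag))


# Assumption: the three flags are treated as independent. After a plain load or
# arithmetic result N and Z cannot both be set, so reaching the longest delay
# would need something like BIT or PLP to set the flags.
delays = sorted({branch_chain_cycles(n, z, c)
                 for n, z, c in product((False, True), repeat=3)})
print(delays)
```

Eight flag combinations collapse to only four distinct delays (6 to 9 cycles), which is why a second calculation is needed to reach 8 states.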

Combining Methods
What if you used something like bne $EA to combine threading with computation?
I think you've squeezed an extra state in there somewhere. I think this
might work but your code would be scattered all over the place; it's still
valid though as you can write other code in between.

I haven't fully worked out all the ideas but I believe I've generalized the

This reminds me of the 3 or 4 ways of speed optimizing; you can make a loop,
unroll a loop, use "decision tree optimization" where every decision leads
to its own code fragment, or of course the easier way of doing it which is
a table of subroutines for every possible argument. This is a way to
make a two argument table but with less memory.
Has anyone made a multiply routine for every possible multiplier? I looked
at this and it's about 32 cycles max and sometimes quite less, much faster
than the table of squares method.
2011-03-28 00:37

Registered: Dec 2001
Posts: 195
@Repose: what exactly do you want to prove? The smallest number of bytes needed for a de-jitter-routine?

If scattering of routine fragments is allowed I guess Ninja's approach used in his 2x2-FLI-routines is the shortest possible.

Maybe I didn't fully get the idea behind your lines, but don't we need something like 'axiomatic semantics' to do a correct mathematical proof?
2011-07-19 16:00

Registered: Feb 2003
Posts: 423
Quoting name

lda $dc04 ;check timer A, here it jitters between 7...1
eor #7 ;A=7-A so jitter will be 0...6 in A
sta corr+1 ;self-writing code, the bpl jump-address = A
corr bpl *+2 ;the jump to timer (A) dependent byte
cmp #$c9 ;if A=0, cmp#$c9; if A=1, cmp #$c9 again 2 cycles later
cmp #$c9 ;if A=2, cmp #$c9; if A=3, cmp #$2c (operand byte + BIT opcode)
bit $ea24 ;if A=4,bit$ea24; if A=5, bit $ea, if A=6, only NOP
bit $ea24 ;IMPORTANT to handle 8th cycle jitter

@Hermit: this code doesn't patch the previous one when encountering the 8-cycle jitter. Just check this: $dc04=8, EOR #7 produces $0f and with bpl you end up out of your code.
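
The objection is easy to reproduce in Python. This sketch only models the EOR #7 step and compares the resulting branch offset with the size of the delay code; the 10-byte length assumes the four instructions assemble to C9 C9 C9 C9 2C 24 EA 2C 24 EA.

```python
# cmp #$c9 / cmp #$c9 / bit $ea24 / bit $ea24 -> 2 + 2 + 3 + 3 bytes (assumed encoding)
SLIDE_LENGTH = 10


def branch_offset(timer_value: int) -> int:
    """Branch offset written by 'eor #7 / sta corr+1' for a given $dc04 reading."""
    return (timer_value ^ 7) & 0xFF


for timer in range(8, 0, -1):
    offset = branch_offset(timer)
    inside = offset < SLIDE_LENGTH
    print(f"$dc04={timer}  offset=${offset:02x}  lands inside delay code: {inside}")
```

Readings 7..1 map to offsets 0..6 as intended, but a reading of 8 maps to $0f, which is past the end of the 10-byte delay code even with the extra bit $ea24.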