| |
TWW
Registered: Jul 2009 Posts: 545 |
Timing Challenge
Hello everyone. Figured I'd give you a small challenge before the weekend: write a timing delay routine in the fewest bytes, without destroying any registers, for each of the following cycle counts:
26 cycles delay:
pha // 3
pha // 3
pha // 3
nop // 2
bit $00 // 3
pla // 4
pla // 4
pla // 4 <- 26 cycles | 9 bytes
27 cycles delay:
pha // 3
pha // 3
pha // 3
nop // 2
nop // 2
nop // 2
pla // 4
pla // 4
pla // 4 <- 27 cycles | 9 bytes
31 cycles delay:
pha // 3
lda #%00001000 // 2
lsr // 2 2 2 2
bcc *-1 // 3 3 3 2
bit $00 // 3
pla // 4 <- 31 cycles | 9 bytes
32 cycles delay:
pha // 3
lda #%00001000 // 2
lsr // 2 2 2 2
bcc *-1 // 3 3 3 2
nop // 2
nop // 2
pla // 4 <- 32 cycles | 9 bytes
34 cycles delay:
pha // 3
lda #%00001000 // 2
lsr // 2 2 2 2
bcc *-1 // 3 3 3 2
nop // 2
nop // 2
nop // 2
pla // 4 <- 34 cycles | 10 bytes
I have posted my solutions for reference and hope to see creative (hehe got it?) solutions ;-) Have a nice weekend^^ |
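As a cross-check on the five listings above, the totals can be tallied mechanically. This is a minimal Python sketch, assuming standard NMOS 6502 timings and no page crossing on the branch; the `OPS` table and the `tally` helper are illustrative names, not something from the post. With `lda #%00001000` the `lsr`/`bcc` loop runs four times: three taken branches at 3 cycles and one fall-through at 2. The last listing comes out at 34 cycles, matching its trailing comment.

```python
# (bytes, cycles) for the instructions used in the listings; no page crossings assumed.
OPS = {
    "pha": (1, 3),
    "pla": (1, 4),
    "nop": (1, 2),
    "bit zp": (2, 3),
    "lda #": (2, 2),
    "lsr": (1, 2),
    "bcc": (2, 2),  # not taken; a taken branch costs one extra cycle
}

def tally(straight, loop_iterations=0):
    """Return (bytes, cycles) for a straight-line listing plus an optional lsr/bcc loop."""
    size = sum(OPS[op][0] for op in straight)
    cycles = sum(OPS[op][1] for op in straight)
    if loop_iterations:
        size += OPS["lsr"][0] + OPS["bcc"][0]
        # every pass costs lsr + bcc; all passes but the last take the branch (+1 cycle)
        cycles += loop_iterations * (OPS["lsr"][1] + OPS["bcc"][1]) + (loop_iterations - 1)
    return size, cycles

results = {
    "26": tally(["pha"] * 3 + ["nop", "bit zp"] + ["pla"] * 3),
    "27": tally(["pha"] * 3 + ["nop"] * 3 + ["pla"] * 3),
    "31": tally(["pha", "lda #", "bit zp", "pla"], loop_iterations=4),
    "32": tally(["pha", "lda #", "nop", "nop", "pla"], loop_iterations=4),
    "34": tally(["pha", "lda #", "nop", "nop", "nop", "pla"], loop_iterations=4),
}
for name, (size, cycles) in results.items():
    print(f"{name}: {cycles} cycles, {size} bytes")
```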
|
| |
JackAsser
Registered: Jun 2002 Posts: 2014 |
Obviously SP, flags and stack may be clobbered |
| |
soci
Registered: Sep 2003 Posts: 480 |
.1000 20 03 20 jsr wait26
.2003 20 07 20 wait26 jsr +
.2006 ea nop
.2007 60 + rts
|
| |
soci
Registered: Sep 2003 Posts: 480 |
.1000 18 wait27 clc
.1001 b0 gcc +
.1002 38 - sec
.1003 08 + php
.1004 28 plp
.1005 90 fb bcc -
.1007 ea nop
|
| |
soci
Registered: Sep 2003 Posts: 480 |
.1000 20 07 28 jsr $2807 jsr wait31
.2807 20 08 28 jsr $2808 wait31 jsr wait31+1
.280a 60 rts rts
Same idea can be used to shorten the 27 cycle delay above:
.1000 20 23 28 jsr $2823 jsr wait27
.2823 20 24 28 jsr $2824 wait27 jsr wait27+1
.2826 60 rts rts
Or the 26 cycle one:
.1000 20 c8 28 jsr $28c8 jsr wait26
.28c8 20 c9 28 jsr $28c9 wait26 jsr wait26+1
.28cb 60 rts rts
Etc. |
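The trick in these three listings is that the inner `jsr` targets its own operand, so the operand's low byte, its high byte and the following `rts` opcode get executed as instructions. Here is a rough Python sketch of that decode, assuming the routine is placed so that those two address bytes form valid opcodes. The tiny opcode table and `overlap_delay` are illustrative names and cover only the opcodes that occur here.

```python
# opcode -> (mnemonic, length in bytes, cycles); only the opcodes needed here
OPCODES = {
    0x08: ("php", 1, 3),
    0x28: ("plp", 1, 4),
    0x24: ("bit zp", 2, 3),
    0xC9: ("cmp #", 2, 2),
    0x60: ("rts", 1, 6),
}
JSR_CYCLES = 6

def overlap_delay(addr):
    """Cycles burned by `jsr addr` when addr holds `jsr addr+1 / rts`."""
    target = addr + 1
    code = [0x20, target & 0xFF, target >> 8, 0x60]  # jsr addr+1 ; rts
    cycles = 2 * JSR_CYCLES      # the caller's jsr plus the inner jsr
    trace = []
    pc = 1                       # the inner jsr lands on its own operand
    while True:
        name, length, cost = OPCODES[code[pc]]
        trace.append(name)
        cycles += cost
        if name == "rts":
            break
        pc += length
    # that rts returns to the byte after the inner jsr, which is the same rts again
    cycles += OPCODES[0x60][2]
    return cycles, trace

for addr in (0x2807, 0x2823, 0x28C8):
    cycles, trace = overlap_delay(addr)
    print(f"${addr:04x}: {cycles} cycles via {trace}")
```

Each variant stays at 4 bytes plus the 3-byte call; only the load address changes, which is why the placement matters.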
| |
Bob
Registered: Nov 2002 Posts: 71 |
cool... but what is the point of throwing away cycles?
I'd rather use them ;) |
| |
TWW
Registered: Jul 2009 Posts: 545 |
Quoting JackAsser: Obviously SP, flags and stack may be clobbered
Sorry, I should have been more specific. Flags can be messed with. The stack may be used too, as long as SP is restored upon exit. |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Standard opcodes only, or are illegals allowed?
Can the routine require a page crossing at a given point within the sequence, or should we assume the sequence is all on the same page? |
| |
TWW
Registered: Jul 2009 Posts: 545 |
IOPs and page alignment trickery allowed ;-) |
| |
HCL
Registered: Feb 2003 Posts: 728 |
I would go for Soci's method, or something with JMP to share the code. Like..:
Wait10
jmp Wait7
Wait9
jmp Wait6
Wait8
jmp Wait5
Wait7
jmp Wait4
Wait6
jmp Wait3
Wait5
jmp Wait2
Wait4
nop
Wait2
nop
rts
Wait3
bit 0
rts
|
| |
Trash
Registered: Jan 2002 Posts: 122 |
I wouldn't consider size at all...
wait63 .byte $c9 ; cmp#
wait62 .byte $c9 ; cmp#
...
wait3 .byte $24 ; bit $
wait2 .byte $ea ; nop
rts ; 63 bytes for all delays between 69 and 8 cycles
...unless I had a really specific case |
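The slide works because every entry point decodes cleanly: `$c9` swallows the next byte as an immediate operand, and the tail `$24 $ea` resolves to either `bit $ea` or a lone `nop`, depending on the parity of the entry point. A quick Python sketch follows. It assumes the elided middle of the listing is all `$c9` bytes, so that the labels run from wait63 down to wait2, and it counts only the cycles spent inside the slide, before the `rts`. `slide_cycles` is an illustrative name.

```python
OPCODES = {
    0xC9: (2, 2),  # cmp #imm : 2 bytes, 2 cycles
    0x24: (2, 3),  # bit zp   : 2 bytes, 3 cycles
    0xEA: (1, 2),  # nop      : 1 byte,  2 cycles
}
RTS = 0x60

# assumed layout: wait63..wait4 are $c9, wait3 is $24, wait2 is $ea, then rts
slide = [0xC9] * 60 + [0x24, 0xEA, RTS]

def slide_cycles(n):
    """Cycles executed when entering at label wait<n>, up to but excluding the rts."""
    pc = len(slide) - n          # wait2 is the byte just before the rts
    cycles = 0
    while slide[pc] != RTS:
        length, cost = OPCODES[slide[pc]]
        cycles += cost
        pc += length
    return cycles

print([slide_cycles(n) for n in (2, 3, 4, 5, 63)])  # prints [2, 3, 4, 5, 63]
```

Adding the 6 cycles of the `rts` gives 8 to 69, which is presumably where the range in the listing's comment comes from; the calling `jsr` costs another 6 on top.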
| |
soci
Registered: Sep 2003 Posts: 480 |
Exactly, just JSR into the slide and each new delay costs only 3 bytes. For a large enough number of different delays it might be worth it. |
| |
lft
Registered: Jul 2007 Posts: 369 |
What if we disallow stack usage? Then the JSR approach doesn't work anymore. Is that an interesting challenge? |
| |
JackAsser
Registered: Jun 2002 Posts: 2014 |
Quote: What if we disallow stack usage? Then the JSR approach doesn't work anymore. Is that an interesting challenge?
27 cycle delay in 10 bytes, no stack usage:
stx :++ +1
ldx #4
:dex
bne :-
:ldx #0
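The cycle count of this listing follows a simple closed form, and the same 10 bytes give other delays if only the loop count changes. This is a small Python sketch, assuming the `stx` uses absolute addressing (4 cycles, patching the operand of the final `ldx`) and that the `bne` does not cross a page; `dex_loop_delay` is an illustrative name.

```python
def dex_loop_delay(n):
    """Cycles for: stx abs / ldx #n / dex / bne loop / ldx #imm, with 1 <= n <= 255."""
    stx_abs, ldx_imm, dex, bne_not_taken = 4, 2, 2, 2
    # each pass costs dex + bne; all passes but the last take the branch (+1 cycle)
    loop = n * (dex + bne_not_taken) + (n - 1)
    return stx_abs + ldx_imm + loop + ldx_imm

print(dex_loop_delay(4))  # the listing above: 27 cycles
```

That works out to 5n + 7 cycles, so this shape only reaches delays in steps of five (12, 17, 22, 27, ...); anything in between needs padding instructions.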
|
| |
Oswald
Registered: Apr 2002 Posts: 5094 |
1 rasterline is about ldx #7-8. memories from 26 years ago :) |
| |
HCL
Registered: Feb 2003 Posts: 728 |
It is somewhat unclear what the challenge really is.. except for wasting cycles in different ways :) |
| |
Pex Mahoney Tufvesson
Registered: Sep 2003 Posts: 52 |
I would use the REU for doing nothing. At program init, write #0 to $df08. Then, with the number of cycles to waste minus 10 in the accumulator:
sta $df07
lda #$b0
sta $df01
Done! Anything between approximately 10 - 265 cycles wasted, in 8 bytes of code.
@HCL, the real challenge is to make a demo for X'2018. :) |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Some shorter delays..; minimal bytes, no IOPS
; preserves a,x,y,sp
; may clobber stack and flags
; 2 cycles (1 byte)
nop
; 3 cycles (2 bytes)
bit 3
; 4 cycles (2 bytes)
nop
nop
; 5 cycles (3 bytes)
nop
bit 3
; 6 cycles (3 bytes)
nop
nop
nop
; 7 cycles (2 bytes)
pha
pla
; 8 cycles (4 bytes)
nop
nop
nop
nop
; 9 cycles (3 bytes)
pha
nop
pla
;10 cycles (4 bytes)
pha
bit 3
pla
;11 cycles (4 bytes)
pha
nop
nop
pla
;12 cycles (5 bytes)
pha
nop
bit 3
pla
;13 cycles (5 bytes)
pha
nop
nop
nop
pla
;14 cycles (4 bytes)
pha
pha
pla
pla
|
| |
MagerValp
Registered: Dec 2001 Posts: 1078 |
You can replace three nops with cmp ($00,x) and save a byte. |
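To see where this substitution pays off, a small unbounded-knapsack search over (bytes, cycles) building blocks reproduces the byte counts in the table above and shows which entries shrink once `cmp ($00,x)` is allowed. This is a sketch under assumptions: `pha`/`pla` is modelled as one balanced 2-byte, 7-cycle unit, page crossings are ignored, and `min_bytes` is an illustrative name.

```python
BASE = [(1, 2), (2, 3), (2, 7)]   # nop, bit zp, balanced pha+pla pair
CMP_IZX = (2, 6)                  # cmp ($00,x): 2 bytes, 6 cycles

def min_bytes(max_cycles, blocks):
    """Fewest bytes for each exact cycle count 2..max_cycles (None if unreachable)."""
    best = [0] + [None] * max_cycles
    for c in range(1, max_cycles + 1):
        for size, cost in blocks:
            if cost <= c and best[c - cost] is not None:
                cand = best[c - cost] + size
                if best[c] is None or cand < best[c]:
                    best[c] = cand
    return {c: best[c] for c in range(2, max_cycles + 1)}

plain = min_bytes(14, BASE)
with_cmp = min_bytes(14, BASE + [CMP_IZX])
# cycle counts where allowing cmp ($00,x) saves bytes, and how many
saved = {c: plain[c] - with_cmp[c] for c in plain if with_cmp[c] < plain[c]}
print(plain)
print(saved)
```

Under these assumptions the 6, 8, 12 and 13 cycle delays each get one byte shorter, and the rest of the table is unchanged.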
| |
JackAsser
Registered: Jun 2002 Posts: 2014 |
Quote: You can replace three nops with cmp ($00,x) and save a byte.
Same goes for the 5c delay with nop+bit => cmp ($00),y |
| |
MagerValp
Registered: Dec 2001 Posts: 1078 |
Not for any Y though, it's 6 cycles on page crossing. If you can sacrifice a zp address you can use INC/DEC zp instead. |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Quote: You can replace three nops with cmp ($00,x) and save a byte.
Oh, good point. For some reason I was only considering BIT, which doesn't have anywhere near as many addressing modes. |
| |
Kruthers
Registered: Jul 2016 Posts: 21 |
Slight tangent, but this thread reminds me of the timing hell I was in working on Sidistic. I had to constantly adjust timing all over the place, which would cause code to adjust and cross pages, changing timing.... it was driving me nuts.
So I wrote a macro that would always use 8 bytes to burn any amount of cycles (except 2 or 4, grrr!) Always wondered if that would be useful to anyone else, though it needs to trash a register and/or a ZP location for some amounts of delay.
But what still nags me: is there some way to burn 4 cycles in 8 bytes that I missed? Obviously, without using page crossing... |
| |
Frantic
Registered: Mar 2003 Posts: 1648 |
Quote: Slight tangent, but this thread reminds me of the timing hell I was in working on Sidistic. I had to constantly adjust timing all over the place, which would cause code to adjust and cross pages, changing timing.... it was driving me nuts.
So I wrote a macro that would always use 8 bytes to burn any amount of cycles (except 2 or 4, grrr!) Always wondered if that would be useful to anyone else, though it needs to trash a register and/or a ZP location for some amounts of delay.
But what still nags me: is there some way to burn 4 cycles in 8 bytes that I missed? Obviously, without using page crossing...
The only solution (to burn 4 cycles in 8 bytes) I can think of is one of the bxx branch instructions precisely when the branch is always taken due to some known register/flag state AND crossing a page boundary, since you could then skip some bytes in 4 cycles. ....so this clearly does not adhere to your specification, but my point is just that this is the only way I can think of. |
| |
Kruthers
Registered: Jul 2016 Posts: 21 |
Quote: The only solution (to burn 4 cycles in 8 bytes) I can think of is one of the bxx branch instructions precisely when the branch is always taken due to some known register/flag state AND crossing a page boundary, since you could then skip some bytes in 4 cycles. ....so this clearly does not adhere to your specification, but my point is just that this is the only way I can think of.
Yeah, pretty much what I figured. At first I held out hope that some illegal instruction would help, but after reading groepaz' doc was surprised that there are no weird branches out of all those unused opcodes. I guess a "branch slowly" was wishful thinking. ;) |