Bitbreaker
Registered: Oct 2002 Posts: 508
Optimizing tricks
Hi folks,
I put together a few optimizing tricks for the 6502, including a section about illegal opcodes. Is there anything else I could mention? The illegal opcode section in particular could use more examples and more opcodes discussed, I guess. Any mistakes?
http://www.codebase64.org/doku.php?id=base:advanced_optimizing
Bitbreaker
doynax Account closed
Registered: Oct 2004 Posts: 212
Personally, I'd be interested to know whether anyone has found clever uses for the more esoteric illegal opcodes.
I managed to squeeze SRE/SLO into a 2-bit IRQ loader. The idea was to minimize the time on the C64 side between reading new bits and sending the acknowledgment.
Bitbreaker: unfortunately the ALR #$FE trick for clearing carry won't work with ARR. For some bizarre reason ARR actually copies the MSB into carry instead (i.e. the MSB after the AND but before the shift). You may want to replace the note in the wiki with a warning for the unwary.
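To make the ALR/ARR difference concrete, here is a small Python model of both opcodes. It is a sketch, not an emulator: it assumes NMOS behaviour in binary mode, ignores decimal mode and leaves out the N, V and Z flags. The function names `alr` and `arr` are just illustrative.

```python
# Hypothetical model of two NMOS 6502 illegal opcodes, binary mode only.
# Decimal-mode ARR and the N/V/Z flags are deliberately not modelled.

def alr(a: int, imm: int) -> tuple[int, int]:
    """ALR #imm (also listed as ASR): A = (A AND imm) >> 1.
    Carry receives bit 0 of (A AND imm), exactly like a plain LSR."""
    t = a & imm
    return t >> 1, t & 1

def arr(a: int, imm: int, carry_in: int) -> tuple[int, int]:
    """ARR #imm: A = ROR(A AND imm), old carry rotated into bit 7.
    Carry ends up as bit 6 of the result, i.e. bit 7 of (A AND imm)."""
    t = a & imm
    result = (t >> 1) | (carry_in << 7)
    return result, (result >> 6) & 1

# ALR #$FE: bit 0 is masked off before the shift, so carry is always clear.
for a in range(256):
    assert alr(a, 0xFE) == (a >> 1, 0)

# ARR #$FE: carry is NOT cleared; it follows the MSB of (A AND $FE).
print(arr(0x80, 0xFE, 0))  # (64, 1)
print(arr(0x7F, 0xFE, 0))  # (63, 0)
```

So ALR #$FE is a valid "shift right and clear carry" in one instruction, while ARR #$FE leaves carry depending on the input's top bit.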
Bitbreaker
Registered: Oct 2002 Posts: 508
Thanks for the info on ARR; that's what happens when you rely on all those articles on the net that contain only half of the truth :-) So I corrected the note and added some extra info.
As for SRE/SLO, I happened to use them in a line routine; more precisely, to shift a bit counter that is at the same time the pixel mask to be combined with memory (SLO ORs it in, SRE EORs it in). Depending on whether you step x-- or x++, you can use either SRE or SLO. There are faster ways of drawing lines, but still:
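lda #$80            ;initial value of the pixel mask / bit counter
sta pix
...
lda (zp),y          ;fetch the bitmap byte
sre pix             ;pix = pix >> 1, then A = A EOR pix
bcs advance_column  ;bit fell out of pix: mask exhausted, go to the next column
sta (zp),y          ;write back the updated byte
...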
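A rough Python model of that loop, assuming `pix` starts at $80 as above and ignoring everything hidden behind the `...` (the dict-as-memory, `PIX` and the helper names are hypothetical). It shows the mask sequence SRE produces and that carry only fires once the bit has been shifted out. The $80 position itself is never produced inside the loop; presumably that is handled at `advance_column`, which the post does not show.

```python
# Hypothetical model: memory is a dict, PIX stands in for the zero-page 'pix'.
PIX = 0x02

def sre(mem: dict, addr: int, a: int) -> tuple[int, int]:
    """SRE addr: LSR the memory byte, then A = A EOR new byte. Returns (A, carry)."""
    carry = mem[addr] & 1
    mem[addr] >>= 1
    return a ^ mem[addr], carry

def slo(mem: dict, addr: int, a: int) -> tuple[int, int]:
    """SLO addr: ASL the memory byte, then A = A OR new byte. Returns (A, carry)."""
    carry = mem[addr] >> 7
    mem[addr] = (mem[addr] << 1) & 0xFF
    return a | mem[addr], carry

mem = {PIX: 0x80}          # lda #$80 / sta pix
screen_byte = 0x00         # stands in for (zp),y (the real routine also steps y)
masks = []
while True:
    a, carry = sre(mem, PIX, screen_byte)   # lda (zp),y / sre pix
    if carry:                               # bcs advance_column
        break
    screen_byte = a                         # sta (zp),y
    masks.append(mem[PIX])

print([f"${m:02x}" for m in masks])  # ['$40', '$20', '$10', '$08', '$04', '$02', '$01']
```

Since SRE EORs the mask in, a pixel that is already set would be cleared again; SLO, used for the other stepping direction, ORs and does not have that issue.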
As for making use of them within IRQ loaders, could you be more specific? As you might have seen, I also recently started a thread about drive code, so this might be of great interest there :-)
JAC
Registered: Aug 2002 Posts: 57
Nice compilation of tricks!
>Comparisons/Faster loops
I think the compile time version would be more helpful here.
ldy #$18-$10
-
sta $1000-$10,y
dey
bne -
plus a hint that an additional cycle for crossing the page boundary is then required, unless the base address is chosen wisely.
> When using BMI/BPL or BVS/BVC (you then need to test bits with BIT, however) you might even count to 1, 2, 6 or 7.
Why? You can LDA #$40 to count to 7, $20 to count to 6 etc. without changing anything.
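A quick Python check of the counting claim, assuming the loop shifts A right with LSR and exits via BNE once A becomes zero (the wiki's exact loop may differ; `lsr_bne_iterations` is a made-up name):

```python
def lsr_bne_iterations(a: int) -> int:
    """Count loop passes for:  loop: ... / lsr / bne loop  with A starting at 'a'."""
    count = 0
    while True:
        count += 1
        a >>= 1          # lsr
        if a == 0:       # bne falls through once the single bit is shifted out
            return count

print(lsr_bne_iterations(0x40))  # 7
print(lsr_bne_iterations(0x20))  # 6
```

Any single set bit works the same way: the loop runs once per bit position between that bit and bit 0, inclusive.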
> lda xposl ;load some value
and #$06 ;either jump 0, 2, 4 or 6 bytes far
sec ;force upcoming jump, can be saved if beq or bne is applicable
sta .jt1+1 ;setup jump
lda (zp),y ;load value to be shifted
.jt1 bcs *+2 ;jump into code with right offset
Wouldn't that be a use case for the nice ANC #$06 => BCC trick?
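A Python sketch of why ANC #$06 fits here, assuming the usual NMOS semantics (ANC ANDs like AND #imm and then copies bit 7 of the result into carry). With #$06, bit 7 can never be set, so carry is always clear after the ANC: the SEC can be dropped and BCS replaced by BCC, which saves 2 cycles and 1 byte, since the STA and LDA in between leave carry alone.

```python
def anc(a: int, imm: int) -> tuple[int, int]:
    """ANC #imm: A = A AND imm; carry (like N) = bit 7 of the result."""
    result = a & imm
    return result, result >> 7

# Every possible xposl gives a branch offset of 0, 2, 4 or 6 and a clear carry,
# so a following BCC is always taken.
offsets = set()
for xposl in range(256):
    offset, carry = anc(xposl, 0x06)
    assert carry == 0
    offsets.add(offset)

print(sorted(offsets))  # [0, 2, 4, 6]
```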
>SBX
Cool, using it as a simple implicit A AND X is nice.
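SBX (also listed as AXS) computes X = (A AND X) - imm and sets carry like CMP, without using the incoming carry. A small Python model under those assumed semantics shows why SBX #$00 boils down to X = A AND X while A stays untouched:

```python
def sbx(a: int, x: int, imm: int) -> tuple[int, int]:
    """SBX #imm: X = (A AND X) - imm, carry set as by CMP (no borrow in).
    Returns (new X, carry); A is not modified."""
    t = a & x
    return (t - imm) & 0xFF, int(t >= imm)

print(sbx(0xF0, 0x3C, 0x00))  # (48, 1)  -> X = $30 = $F0 AND $3C
```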
Bitbreaker
Registered: Oct 2002 Posts: 508
Quoting JAC
>Comparisons/Faster loops
I think the compile time version would be more helpful here.
ldy #$18-$10
-
sta $1000-$10,y
dey
bne -
plus a hint that an additional cycle for crossing the page boundary is then required, unless the base address is chosen wisely.
That's right :-)
Quoting JAC
> When using BMI/BPL or BVS/BVC (you then need to test bits with BIT, however) you might even count to 1, 2, 6 or 7.
Why? You can LDA #$40 to count to 7, $20 to count to 6 etc. without changing anything.
Yes, that would work as well, but the other branch types let you make use of the same content in A. That is handy if you want to reuse the value in A as well. If A serves only as a counter, your approach is fine.
Quoting JAC
lda xposl ;load some value
and #$06 ;either jump 0, 2, 4 or 6 bytes far
sec ;force upcoming jump, can be saved if beq or bne is applicable
sta .jt1+1 ;setup jump
lda (zp),y ;load value to be shifted
.jt1 bcs *+2 ;jump into code with right offset
Wouldn't that be a use case for the nice ANC #$06 => BCC trick?
Yes, that would make it even faster! :-) Thanks!
Cruzer
Registered: Dec 2001 Posts: 1048
Quote:
>Comparisons/Faster loops
I think the compile time version would be more helpful here.
ldy #$18-$10
-
sta $1000-$10,y
dey
bne -
plus a hint that an additional cycle for crossing the page boundary is then required, unless the base address is chosen wisely.
STA abs,y doesn't take an extra cycle when crossing a page boundary. Only LDA does.
Bago Zonde
Registered: Dec 2010 Posts: 29
LDA $1000,Y takes 4 cycles, plus 1 if the access crosses a page boundary, while STA $1000,Y always takes 5 cycles.
-------------------------------------------------------------
www.commocore.com
Bitbreaker
Registered: Oct 2002 Posts: 508
Oh right, it is STA, so it was right not to add a hint on that :-) The penalty cycle of course only applies to read operations that add an index to a 16-bit address, i.e. ($xx),y, $xxxx,x and $xxxx,y.
This applies to AND, ADC, SBC, ORA, EOR, CMP, LDA, LDX and LDY.
The same goes for taken branches that cross a page boundary.
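The timing rules from the last few posts, written down as a hypothetical Python helper (function names are made up; standard NMOS 6502 timings, with the branch penalty computed against the address of the instruction following the branch). For ($xx),y the same `crosses_page` test applies to the 16-bit pointer value plus Y.

```python
def crosses_page(base: int, index: int) -> bool:
    """True if base + index lies in a different 256-byte page than base."""
    return (base & 0xFF) + index > 0xFF

def lda_abs_y_cycles(base: int, y: int) -> int:
    """Indexed read: 4 cycles, +1 on a page crossing."""
    return 4 + int(crosses_page(base, y))

def sta_abs_y_cycles(base: int, y: int) -> int:
    """Indexed write: always 5 cycles, page crossing or not."""
    return 5

def branch_cycles(taken: bool, next_pc: int, target: int) -> int:
    """Relative branch: 2 if not taken, 3 if taken, 4 if taken into another page."""
    if not taken:
        return 2
    return 3 + int((next_pc & 0xFF00) != (target & 0xFF00))

print(lda_abs_y_cycles(0x10F0, 0x20))  # 5
print(lda_abs_y_cycles(0x1000, 0x20))  # 4
print(sta_abs_y_cycles(0x10F0, 0x20))  # 5
```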
Skate
Registered: Jul 2003 Posts: 494
Am I the only one who thinks Cruzer's reminder was enough? :)
Jak T Rip
Registered: Feb 2002 Posts: 39
Excellent stuff, Bitbreaker!
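I included my favourite BIT trick for skipping the following instruction:
beq +           ;Z set: skip ahead and load $05
lda #$04
.byte $2c       ;opcode of BIT abs: the next two bytes (lda #$05) become its operand
+ lda #$05
sta somewhere   ;A = $04 on the fall-through path, $05 when beq was taken
rts
I learned this from the Omikron reassembler, which uses this technique.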
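Why the `.byte $2c` works becomes obvious when the bytes between the BEQ and the STA are decoded from both entry points. The sketch below is a deliberately tiny Python disassembler that only knows the two opcodes involved (an illustration with made-up names, not a general tool):

```python
# Only the two opcodes used in the snippet: (mnemonic, instruction length).
OPCODES = {
    0xA9: ("lda #", 2),   # LDA immediate
    0x2C: ("bit", 3),     # BIT absolute
}

# lda #$04 / .byte $2c / lda #$05
CODE = bytes([0xA9, 0x04, 0x2C, 0xA9, 0x05])

def decode(code: bytes, pc: int) -> list[str]:
    """Disassemble linearly from offset pc until the bytes run out."""
    out = []
    while pc < len(code):
        name, length = OPCODES[code[pc]]
        operand = int.from_bytes(code[pc + 1:pc + length], "little")
        if length == 2:
            out.append(f"{name}${operand:02x}")
        else:
            out.append(f"{name} ${operand:04x}")
        pc += length
    return out

print(decode(CODE, 0))  # ['lda #$04', 'bit $05a9']  (beq not taken)
print(decode(CODE, 3))  # ['lda #$05']               (beq taken, lands on '+')
```

On the fall-through path the CPU therefore executes LDA #$04 followed by BIT $05A9, which only reads that address and changes N, V and Z; A keeps $04. The dummy read is harmless in plain RAM, but the same trick with a different operand could end up reading I/O registers.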
Bitbreaker
Registered: Oct 2002 Posts: 508
@Jak T Rip:
This trick also occurs in the KERNAL; however, it does not save cycles (which is what the article aims for) but makes things slower. It is certainly a good choice if you need to keep your code small, though. Depending on the range of values, a simple lookup table might also do the trick.
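To put numbers on "slower but smaller", here is a hypothetical Python tally of the snippet's fall-through path with three ways of skipping the LDA #$05. It assumes standard NMOS timings and no page crossings; the BNE variant only works because LDA #$04 leaves Z clear, so the branch is always taken.

```python
# (bytes, cycles) of the instruction used to skip "lda #$05".
SKIPS = {
    "bit": (1, 4),   # .byte $2c -> BIT abs swallows the next 2 bytes
    "jmp": (3, 3),   # jmp + (absolute)
    "bne": (2, 3),   # bne +, always taken after lda #$04
}

def fall_through_cost(skip: str) -> tuple[int, int]:
    """Bytes and cycles for  beq + / lda #$04 / <skip> / + lda #$05,
    cycles measured along the path where beq is NOT taken."""
    skip_bytes, skip_cycles = SKIPS[skip]
    size = 2 + 2 + skip_bytes + 2        # beq, lda #$04, skip, lda #$05
    cycles = 2 + 2 + skip_cycles         # beq not taken, lda #$04, skip
    return size, cycles

for name in SKIPS:
    print(name, fall_through_cost(name))
# bit (7, 8)
# jmp (9, 7)
# bne (8, 7)
```

The BIT version is the smallest but costs one cycle more than either alternative on that path; the BEQ-taken path is identical in all three.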