[CSDb] - User Forums - Optimizing tricks

2012-03-14 08:52

Bitbreaker

Registered: Oct 2002
Posts: 508

Optimizing tricks

Hi folks,

I put together a few optimizing tricks for the 6502, including a section about illegal opcodes. Anything else I could mention there? The illegal opcode section especially could use more examples and more opcodes discussed, I guess. Any mistakes?
http://www.codebase64.org/doku.php?id=base:advanced_optimizing

Bitbreaker

2012-03-14 09:11

yago

Registered: May 2002
Posts: 333

Very nice and big :-)

I don't understand the part in Zeropage about X/Y.. (why TXA? You can STX everywhere)

The Clobbering Registers section has the offset wrong (should be +1, not +2)

2012-03-14 09:16

andym00

Registered: Jun 2009
Posts: 45

A little thing I started using a lot recently for 16bit negates..

    lax #$00
    sbx #lo
    sbc #hi

Comes in very handy at times if it's what you need.. Although that need is probably very small :)
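For anyone who wants to check the sequence above without an emulator, here is a minimal Python sketch of it. It assumes the commonly documented behaviour of the illegal opcodes (LAX #$00 leaves 0 in both A and X; SBX computes X = (A AND X) - imm and sets carry like CMP) and binary mode for SBC. The function name `negate16` is made up for this illustration.

```python
def negate16(value):
    """Model of: lax #$00 / sbx #lo / sbc #hi. Returns A (high byte) and X (low byte) as one 16-bit number."""
    lo = value & 0xFF
    hi = (value >> 8) & 0xFF

    a = x = 0                          # lax #$00: A and X both become 0
    diff = (a & x) - lo                # sbx #lo: X = (A AND X) - lo, ignores carry-in
    x = diff & 0xFF
    carry = 1 if diff >= 0 else 0      # carry set means "no borrow", as with CMP
    a = (a - hi - (1 - carry)) & 0xFF  # sbc #hi: subtract with borrow (binary mode assumed)
    return (a << 8) | x


print(hex(negate16(0x0001)))  # 0xffff
print(hex(negate16(0x0100)))  # 0xff00
```

The point of the trick is that SBX leaves the carry exactly as the following SBC needs it, so no SEC is required in between.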

2012-03-14 09:29

Bitbreaker

Registered: Oct 2002
Posts: 508

@yago
Thanks, fixed! made the things about zeropage more clear.

2012-03-14 09:38

Bitbreaker

Registered: Oct 2002
Posts: 508

@andym00

Hmm, that can also be used to negate two 8-bit values at a time, though the sbx operand is immediate, which would mean the extra overhead of setting the value by self-modifying code:

       stx .val1+1
       lax #$00
.val1  sbx #$00
       sec
       sbc .val2

2012-03-14 11:16

LHS

Registered: Dec 2002
Posts: 66

Good work!

one tip for the DCP opcode:

x1
.byte $7
x2
.byte $1a

-
;an effect
dec x2
lda x2
cmp x1
bne -

can be written as

-
;an effect
lda x1
dcp x2
bne -

2012-03-14 14:11

Bitbreaker

Registered: Oct 2002
Posts: 508

Added the new examples, thanks for that!
the dcp could then also be used like:

-
    ...some code
    lda #end
    dcp counter
    bne -

instead of

-
    ...some code
    dec counter
    lda counter
    cmp #end    ;of course silly if this would be #$00
    bne -
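To see that both loop endings behave the same as far as the bne is concerned, here is a small Python sketch that counts the passes of each variant. It assumes the usual description of DCP (decrement memory, then compare A with the new memory value); the function names are invented for this example. Only the zero flag matters for bne, so the carry difference between the two variants is ignored.

```python
def passes_dec_lda_cmp(counter, end):
    """dec counter / lda counter / cmp #end / bne -"""
    passes = 0
    while True:
        passes += 1
        counter = (counter - 1) & 0xFF  # dec counter
        a = counter                     # lda counter
        if a == end:                    # cmp #end sets Z when equal, bne falls through
            return passes


def passes_lda_dcp(counter, end):
    """lda #end / dcp counter / bne -"""
    passes = 0
    while True:
        passes += 1
        a = end                         # lda #end
        counter = (counter - 1) & 0xFF  # dcp: decrement memory ...
        if a == counter:                # ... then compare A with it
            return passes


# LHS's values: x1 = $07, x2 = $1a
print(passes_dec_lda_cmp(0x1A, 0x07), passes_lda_dcp(0x1A, 0x07))  # 19 19
```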

2012-03-14 20:37

Testa
Account closed

Registered: Oct 2004
Posts: 197

Thanks for sharing at codebase... I learned a few new tricks! Always interesting how creative you can be with legal and illegal opcodes..

2012-03-14 21:45

Peiselulli

Registered: Oct 2006
Posts: 81

Perhaps too simple, but

	cmp #$80
	ror

for doing an arithmetic shift right.
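A quick way to convince yourself: cmp #$80 sets the carry exactly when bit 7 of A is set, and ror then rotates that carry back into bit 7. The Python sketch below models this for all 256 values; the helper names are made up for the example.

```python
def asr_cmp_ror(a):
    """Model of: cmp #$80 / ror (accumulator)."""
    carry = 1 if a >= 0x80 else 0   # cmp #$80: carry set when A >= $80, i.e. bit 7 is set
    return (carry << 7) | (a >> 1)  # ror: old carry moves into bit 7


def to_signed(byte):
    """Interpret an 8-bit value as two's complement."""
    return byte - 256 if byte & 0x80 else byte


print(hex(asr_cmp_ror(0x81)))  # 0xc0, i.e. -127 >> 1 == -64
print(hex(asr_cmp_ror(0x7E)))  # 0x3f
```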

2012-03-14 21:53

JackAsser

Registered: Jun 2002
Posts: 2014

Quote: Perhaps too simple, but

	cmp #$80
	ror

for doing an arithmetic shift right.

Yay, my favorite! :)

2012-03-15 08:20

Bitbreaker

Registered: Oct 2002
Posts: 508

I added the arithmetic shift right to the maths section some time ago ;-) It can also be combined with the illegal opcode anc instead of cmp to force carry to be always clear afterwards.

2012-03-16 18:47

tlr

Registered: Sep 2003
Posts: 1790

Perhaps this?

  lda ptr
  cmp eptr
  lda ptr+1
  sbc eptr+1
  lda ptr+2
  sbc eptr+2
  bcc ptr_lt_eptr
ptr_ge_eptr:
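For reference, here is a minimal Python sketch of what the chain above computes, assuming binary mode: the cmp produces the borrow for the low byte, and each sbc carries it on to the next byte, so the final carry tells whether ptr >= eptr for the full 24 bits. The function name is chosen for this example only.

```python
def bcc_taken(ptr, eptr):
    """Model of the cmp/sbc/sbc chain; True when 'bcc ptr_lt_eptr' would branch."""
    p = [(ptr >> shift) & 0xFF for shift in (0, 8, 16)]
    e = [(eptr >> shift) & 0xFF for shift in (0, 8, 16)]

    carry = 1 if p[0] >= e[0] else 0     # lda ptr / cmp eptr: carry set when no borrow
    for i in (1, 2):                     # lda ptr+i / sbc eptr+i
        diff = p[i] - e[i] - (1 - carry)
        carry = 1 if diff >= 0 else 0
    return carry == 0                    # bcc branches when carry is clear


print(bcc_taken(0x00FFFF, 0x010000))  # True  (ptr < eptr)
print(bcc_taken(0x010000, 0x00FFFF))  # False (ptr >= eptr)
print(bcc_taken(0x123456, 0x123456))  # False (equal counts as >=)
```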

2012-03-17 22:45

Peiselulli

Registered: Oct 2006
Posts: 81

I have also often used the following:

     lsr
     asr #3*2

instead of

     lsr
     lsr
     and #3
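The two variants can be compared with a short Python sketch, assuming the usual description of ASR (also known as ALR): AND the accumulator with the immediate, then shift right once. Only the value left in A is compared here; the carry after the two sequences can differ. The function names are invented for the example.

```python
def lsr_lsr_and(a):
    """lsr / lsr / and #3"""
    a >>= 1             # lsr
    a >>= 1             # lsr
    return a & 0x03     # and #3


def lsr_asr(a):
    """lsr / asr #3*2"""
    a >>= 1                 # lsr
    return (a & 0x06) >> 1  # asr #6: AND with the immediate, then shift right


print(lsr_lsr_and(0xAD), lsr_asr(0xAD))  # 3 3
```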

2012-03-19 10:10

Bitbreaker

Registered: Oct 2002
Posts: 508

@Peiselulli:
Ah, short but nice. I guess I'll add a subsection for shifting tricks then, and also include the arithmetic shift right in there.

@tlr:
Where's the trick? If it is about replacing an sbc with a cmp to save a sec, that is handled already in the article :-)

2012-03-19 16:02

tlr

Registered: Sep 2003
Posts: 1790

Quoting Bitbreaker

@tlr:
Where's the trick? If it is about replacing an sbc with a cmp to save a sec, that is handled already in the article :-)

Yes, although I consider it more like using sbc's to extend a cmp.

I guess it is a bit too rudimentary to be called a trick, but it's useful nevertheless. :)

2012-03-21 10:03

Bitbreaker

Registered: Oct 2002
Posts: 508

Okay, I have now added a section about shifting and, in doing so, also slipped into the subject of jumpcode (or what should that be called?)

2012-03-29 18:52

doynax
Account closed

Registered: Oct 2004
Posts: 212

Personally I'd be interested to know if anyone has found any clever uses for the more esoteric illegal opcodes.

I managed to squeeze SRE/SLO into a 2-bit IRQ loader. The idea being to minimize the time on the C64 side between reading new bits and sending the acknowledgment.

Bitbreaker: Unfortunately the ALR #$FE trick for clearing carry won't work with ARR. For some bizarre reason it actually copies the MSB to carry instead (i.e. the MSB after the AND but before the shift). You may want to replace the note in the wiki with a warning for the unwary.
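The difference can be sketched in a few lines of Python. This is a simplified model under stated assumptions: binary mode only, the overflow flag of ARR is left out, and ARR is taken to behave as commonly documented (AND, rotate right through carry, then carry = bit 6 of the result, which is bit 7 of the ANDed value). The function names are made up for the example.

```python
def alr(a, imm):
    """ALR #imm: AND, then LSR. Returns (result, carry)."""
    t = a & imm
    return t >> 1, t & 1                 # carry is bit 0 of the ANDed value


def arr(a, imm, carry_in):
    """ARR #imm in binary mode: AND, then ROR. Returns (result, carry); V flag not modelled."""
    t = a & imm
    result = (t >> 1) | (carry_in << 7)  # rotate right, old carry enters bit 7
    carry = (result >> 6) & 1            # bit 6 of the result == bit 7 of the ANDed value
    return result, carry


print(alr(0xFF, 0xFE))     # (127, 0): carry is always clear with #$FE
print(arr(0xFF, 0xFE, 0))  # (127, 1): carry follows the MSB instead
```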

2012-03-30 08:28

Bitbreaker

Registered: Oct 2002
Posts: 508

Thanks for the info on ARR. That's what happens when you rely on all those articles on the net that contain only half of the truth :-) I corrected the note and it now gives some extra info.

As for SRE/SLO, I happened to use them in a line routine, more precisely to shift a bit counter that doubles as the pixel mask to be combined with the memory (SLO ORs the shifted mask into A, SRE EORs it in). Depending on whether you step x-- or x++, you can use either SRE or SLO. There are faster ways of drawing lines, though:

lda #$80
sta pix

...

lda (zp),y
sre pix
bcs advance_column
sta (zp),y


...

As for making use of them within IRQ loaders, could you be more precise on that? As you might have seen, I also started a thread about drivecode recently, so this might be of high interest there :-)

2012-04-23 21:44

JAC

Registered: Aug 2002
Posts: 57

Nice compilation of tricks!

>Comparisons/Faster loops
I think the compile time version would be more helpful here.
ldy #$18-$10
-
sta $1000-$10,y
dey
bne -

plus a hint that an additional cycle for crossing the page boundary is then required unless the base address is chosen wisely.

> When using BMI/BPL or BVS/BVC (need then to test bits with BIT however) you might even count to 1, 2, 6 or 7.
Why? You can LDA #$40 to count to 7, $20 to count to 6 etc. without changing anything.

> lda xposl ;load some value
> and #$06 ;either jump 0, 2, 4 or 6 bytes far
> sec ;force upcoming jump, can be saved if beq or bne is applicable
> sta .jt1+1 ;setup jump
> lda (zp),y ;load value to be shifted
> .jt1 bcs *+2 ;jump into code with right offset

Wouldn't that be use-case for this nice ANC #$06 => BCC thing?

>SBX
Cool, using it as simple implicit A&X is nice.

2012-04-24 11:35

Bitbreaker

Registered: Oct 2002
Posts: 508

Quoting JAC

>Comparisons/Faster loops
I think the compile time version would be more helpful here.
ldy #$18-$10
-
sta $1000-$10,y
dey
bne -

plus a hint that an additional cycle for crossing the page boundary is then required unless the base address is chosen wisely.

right that is :-)

Quoting JAC

> When using BMI/BPL or BVS/BVC (need then to test bits with BIT however) you might even count to 1, 2, 6 or 7.
Why? You can LDA #$40 to count to 7, $20 to count to 6 etc. without changing anything.

Yes, that would be possible as well, but using other types of branches lets you work with the same content in A. That is a good thing if you want to reuse the value of A as well. If A only serves as a counter, your approach is okay.

Quoting JAC

lda xposl ;load some value
and #$06 ;either jump 0, 2, 4 or 6 bytes far
sec ;force upcoming jump, can be saved if beq or bne is applicable
sta .jt1+1 ;setup jump
lda (zp),y ;load value to be shifted
.jt1 bcs *+2 ;jump into code with right offset

Wouldn't that be use-case for this nice ANC #$06 => BCC thing?

Yes, could be made even faster with that! :-) Thanks!

2012-04-24 16:19

Cruzer

Registered: Dec 2001
Posts: 1048

Quote:

>Comparisons/Faster loops
I think the compile time version would be more helpful here.
ldy #$18-$10
-
sta $1000-$10,y
dey
bne -

plus a hint that an additional cycle for crossing the page boundary is then required unless the base address is chosen wisely.

Sta abs,y doesn't take an extra cycle when crossing page boundaries. Only lda does.

2012-04-25 07:21

Bago Zonde

Registered: Dec 2010
Posts: 29

LDA $1000,Y takes 4 cycles (+1 when a page boundary is crossed) and STA $1000,Y always takes 5 cycles.

-------------------------------------------------------------
www.commocore.com

2012-04-25 09:14

Bitbreaker

Registered: Oct 2002
Posts: 508

Oh right, it is STA, so it was right not to add a hint about that :-) The penalty cycle of course only applies to reading operations that add an index to a 16-bit address, i.e. the addressing modes ($xx),y, $xxxx,x and $xxxx,y.
It applies to AND, ADC, SBC, ORA, EOR, CMP, LDA, LDX, LDY.
The same goes for branches if they cross a page boundary.
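As a small illustration of the rule, the Python sketch below computes the cycle counts for the absolute indexed case discussed in this thread (4 cycles plus 1 on a page crossing for LDA, a flat 5 for STA). The function names are hypothetical, and the addresses are taken from JAC's loop above.

```python
def page_crossed(base, index):
    """True when base+index lands in a different 256-byte page than base."""
    return (base & 0xFF00) != ((base + index) & 0xFF00)


def lda_abs_indexed_cycles(base, index):
    """LDA $xxxx,X / LDA $xxxx,Y: 4 cycles, +1 if the index crosses a page."""
    return 4 + (1 if page_crossed(base, index) else 0)


def sta_abs_indexed_cycles(base, index):
    """STA $xxxx,X / STA $xxxx,Y: always 5 cycles, page crossing or not."""
    return 5


base = 0x1000 - 0x10                       # $0ff0, as in 'sta $1000-$10,y'
print(lda_abs_indexed_cycles(base, 0x08))  # 4 (stays in page $0f)
print(lda_abs_indexed_cycles(base, 0x10))  # 5 (crosses into page $10)
print(sta_abs_indexed_cycles(base, 0x10))  # 5
```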

2012-04-25 13:44

Skate

Registered: Jul 2003
Posts: 494

Am I the only one who thinks Cruzer's reminder was enough? :)

2012-06-10 20:53

Jak T Rip

Registered: Feb 2002
Posts: 39

Excellent stuff, Bitbreaker!

I included my favourite BIT trick to ignore upcoming commands:

        beq +
        lda #$04
        .byte $2c
+       lda #$05
        sta somewhere
        rts

I learned this from the Omikron reassembler that uses this technique.
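To show why this works, here is a tiny Python sketch that decodes the same five bytes from two entry points. It only knows the two opcodes involved ($A9 = lda #imm, $2C = bit abs); the table and function names are made up for the illustration.

```python
# lda #$04 / .byte $2c / lda #$05, as assembled bytes
CODE = bytes([0xA9, 0x04, 0x2C, 0xA9, 0x05])

# opcode -> (mnemonic, instruction length in bytes); just the two needed here
OPCODES = {0xA9: ("lda #", 2), 0x2C: ("bit", 3)}


def decode(code, start):
    """Decode instructions from 'start' to the end of 'code'."""
    out = []
    pc = start
    while pc < len(code):
        name, size = OPCODES[code[pc]]
        operand = int.from_bytes(code[pc + 1:pc + size], "little")
        out.append((name, hex(operand)))
        pc += size
    return out


print(decode(CODE, 0))  # [('lda #', '0x4'), ('bit', '0x5a9')]  fall-through path
print(decode(CODE, 3))  # [('lda #', '0x5')]                    path taken by beq +
```

On the fall-through path the second lda is swallowed as the operand of a BIT, so A keeps $04. Note that the BIT still reads from $05a9 and changes the N, V and Z flags.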

2012-06-11 08:42

Bitbreaker

Registered: Oct 2002
Posts: 508

@Jak T Rip:
This trick also occurs in the kernal. However, it does not save cycles (which is what the article is aiming for) but makes things slower. It is a good thing if you need to make your code small, no doubt. Depending on the range of values, a simple lookup table might also do the trick.
