| |
mhindsbo
Registered: Dec 2014 Posts: 51 |
fastest or smallest 'switch' statement
I use the following code a lot to switch between two values and was wondering what others do and if there is a faster, smaller or simply more elegant way someone has come up with.
lda #value1 ; default value: AR = value1
ldy switch ; get value of switch in YR
cpy #case1 ; compare switch
beq @cont
lda #value2 ; if switch != #case1 then AR = value2
@cont ... ; AR = value1/value2 depending on switch
|
|
| |
iAN CooG
Registered: May 2002 Posts: 3194 |
a table, use y as index
ldy switch
lda table,y
table
byte value1, value2, etc |
| |
mhindsbo
Registered: Dec 2014 Posts: 51 |
Absolutely ... That is probably the most elegant, especially for multiple values.
But what about the case where the two switch values are not sequential? E.g. two different screen addresses (hi byte), where the switch value is not 0, 1, 2, 3, ... but e.g. $c0 and $d0.
Any elegant alternatives in that specific case? |
| |
iAN CooG
Registered: May 2002 Posts: 3194 |
If all you have is an 8-bit index and you want to return an 8-bit value, a $100-byte table is always the fastest solution: just place the values where needed. The only catch is that you need a free page ($100 bytes) in memory for the lookup.
But if you just need to select 2 values, a cmp is enough.. |
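To make the memory trade-off concrete, here is a small Python model of the full-page lookup (a sketch, not 6502 code). The names `build_page_table`, `VALUE1`, `VALUE2` and the filler byte are hypothetical and chosen only for illustration.

```python
# Model of a one-page ($100 byte) lookup table indexed by an 8-bit switch value.
# VALUE1/VALUE2 and the filler byte are illustrative assumptions.
VALUE1, VALUE2 = 0x04, 0x08

def build_page_table(cases, filler=0x00):
    """Return a 256-entry table with each case's result placed at its index."""
    table = [filler] * 0x100
    for index, result in cases.items():
        table[index & 0xFF] = result
    return table

page_table = build_page_table({0xC0: VALUE1, 0xD0: VALUE2})

def lookup(switch):
    # Equivalent of: ldy switch / lda table,y
    return page_table[switch & 0xFF]
```

Only two of the 256 entries carry information here, which is the cost the rest of the thread tries to avoid.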
| |
Fungus
Registered: Sep 2002 Posts: 686 |
Just 2 values?
mod1
lda value
eor #$xx
sta mod1+1
where xx is the eor to get the other value you want to toggle between. |
| |
soci
Registered: Sep 2003 Posts: 480 |
Fungus: There was at least a '#' missing.
mod1 lda #value1
eor #value1 ^ value2
sta mod1 + 1
But this is not what he wants to do.
For $c0 and $d0 only a shorter table could do it fast.
ldy switch
lda table - $c0,y
table .fill $d0 - $c0, value1
.byte value2
|
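As a rough Python model of the shortened table above (a sketch under assumptions, not assembler output): the address expression `table - $c0` is modelled by subtracting the base from the index, and `VALUE1`/`VALUE2` are hypothetical results.

```python
# Model of the 17-byte table: 16 copies of value1 followed by one value2.
VALUE1, VALUE2 = 0x04, 0x08   # hypothetical results
BASE, LAST = 0xC0, 0xD0       # the two switch values from the thread

short_table = [VALUE1] * (LAST - BASE) + [VALUE2]

def lookup_short(switch):
    # Only meaningful for BASE <= switch <= LAST; on the real machine any
    # other index would read whatever bytes sit around the table.
    return short_table[switch - BASE]
```

The table costs $11 bytes instead of $100, at the price of only covering the range between the two switch values.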
| |
lft
Registered: Jul 2007 Posts: 369 |
Quoting iAN CooG: a table, use y as index
ldy switch
lda table,y
table
byte value1, value2, etc
Sometimes this could be faster:
switch = * + 1
lda table
|
| |
mhindsbo
Registered: Dec 2014 Posts: 51 |
Thanks for all the input. Good stuff. I don't have $100 to spend on a table, unfortunately.
I use it in a number of functions where the switch values are different (but all to switch between two states)... so it would lead to multiple tables.
The 'cmp' is probably the best in this use case, but it just felt so inelegant using it all the time ;-) |
| |
Oswald
Registered: Apr 2002 Posts: 5094 |
can you cite a real world example where you need to use this ? maybe it can be done algorithmically better. |
| |
mhindsbo
Registered: Dec 2014 Posts: 51 |
In my game I have a number of objects (enemies, bullets, ...) and each has a specific identifier (0-255). In many of the object routines I check for a specific object or a specific state and take one of two actions or set one of two parameters.
Hope this helps explain it. It's always a balance of speed and size for a game. Tables can be the obvious choice, as can unrolled code or specific code for specific objects ... but with level graphics, music, etc. I often find myself compromising between the two.
E.g. the table lookup is faster, but I can't afford too many $100 tables. Specific code for each object is faster, but similarly ends up eating memory quickly.
Hope that gives some context. I am decently happy with what I mentioned originally as a compromise for switching between two values depending on a parameter ... but just thought I would seek some inspiration as well.
Thanks to everyone who chimed in! |
| |
soci
Registered: Sep 2003 Posts: 480 |
Quoting mhindsbo: The 'cmp' is probably the best in this use case, but it just felt so inelegant using it all the time ;-)
Well, then use the one below for a change ;)
lda switch ; get value of switch in AR.
eor #case1 ; compare switch.
beq @cont
lda #value2^value1 ; if switch != #case1 then AR = value2
@cont eor #value1 ; default value: AR = value1
; AR = value1/value2 depending on switch
It's not faster or shorter, but it's only using the accumulator and does not destroy the carry. |
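A quick way to convince yourself that the EOR variant returns the right value is to model it in Python. This is only a sketch of the accumulator contents; `select_eor` is a hypothetical name and flags are ignored.

```python
def select_eor(switch, case1, value1, value2):
    """Mirror of: lda switch / eor #case1 / beq / lda #value2^value1 / eor #value1."""
    a = (switch ^ case1) & 0xFF   # zero exactly when switch == case1
    if a != 0:                    # beq not taken
        a = (value2 ^ value1) & 0xFF
    # 0 ^ value1 = value1, and (value2 ^ value1) ^ value1 = value2
    return a ^ value1
```

For example, `select_eor(0xC0, 0xC0, 0x04, 0x08)` gives `0x04`, and any other switch value gives `0x08`.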
| |
mhindsbo
Registered: Dec 2014 Posts: 51 |
nice ... I like it :-) |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
Depending on the result values you want, there's also the option to go like this:
cmp #switch
arr #$00
This works if you need $80 and $00 as the resulting values, so it would be interesting to know what the input and output values are. For some, there are nice code constructs that generate the results with a few mnemonics and without a branch.
Also, if the result is just one of two values, the table approach only needs two bytes tucked into the code somewhere, which should be bearable in terms of memory footprint.
Another example that sets bit 4 depending on carry and toggling bit 5 on every call:
cmp #switch
and #$ef ;clear bit 4
adc #$20 ;toggle bit 5 and set bit 4 depending on carry -> adc #$20/21
ora #$0f ;set all lower bits again (might be omitted)
|
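The first snippet above can be modelled in Python as follows. This is a sketch that only tracks the carry and the accumulator, ignoring decimal mode and the other flags; `cmp_carry` and `arr_zero` are hypothetical helper names.

```python
def cmp_carry(a, operand):
    """Carry after CMP: set when a >= operand (unsigned 8-bit compare)."""
    return 1 if (a & 0xFF) >= (operand & 0xFF) else 0

def arr_zero(carry):
    """Accumulator after ARR #$00: A AND 0, then rotated right through carry."""
    return (carry << 7) & 0xFF
```

Note that the carry reflects "greater than or equal", so the result is `$80` for every accumulator value at or above the compared constant, not just for an exact match.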
| |
lft
Registered: Jul 2007 Posts: 369 |
Quoting mhindsbo: In my game I have a number of objects (enemies, bullets, ...) and each has a specific identifier (0-255). In many of the object routines I check for a specific object or a specific state and take one of two actions or set one of two parameters.
This will again depend on circumstances, but sometimes it is useful to encode information about the objects in a flag table. If you have 256 objects, and you need to switch one way or the other depending on whether an object is edible, held, a key, dangerous etc., then you might encode that as flags in a table. Some flags will be static, and some will change during gameplay. Then you have a one-page table that tracks eight such flags per object.
; object number in y
lda flags1,y
and #$40 ; is this a bullet?
beq ...
|
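A minimal Python sketch of the flag-table idea, assuming one byte of flags per object id. Only the `$40` "bullet" bit comes from the post; `FLAG_DANGEROUS` and the sample object ids are made up for illustration.

```python
FLAG_BULLET = 0x40      # bit tested in the post
FLAG_DANGEROUS = 0x20   # hypothetical extra flag

flags1 = [0x00] * 256   # one page: eight flags per object id
flags1[7] = FLAG_BULLET | FLAG_DANGEROUS
flags1[9] = FLAG_DANGEROUS

def is_bullet(object_id):
    # Equivalent of: lda flags1,y / and #$40 / beq ...
    return (flags1[object_id & 0xFF] & FLAG_BULLET) != 0
```

One page then answers eight different yes/no questions per object, instead of one page per question.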
| |
Frantic
Registered: Mar 2003 Posts: 1648 |
http://codebase64.org/doku.php?id=base:dispatch_on_a_byte |
| |
mhindsbo
Registered: Dec 2014 Posts: 51 |
thanks all! some good input. I'm glad I asked. |
| |
Fred
Registered: Feb 2003 Posts: 285 |
There is also a way of doing this without using any branch instruction or jump table.
Certainly not the fastest and also not the smallest code on a 6510 CPU:
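lda switch
cmp #case1
php
pla
lsr ; bit 0 = Z flag, carry = C flag left by cmp
and #$01 ; AR = 1 if equal, 0 if not
eor #$ff
sec ; cmp leaves carry clear when switch < case1, so force it
adc #$00 ; completes the negation: AR = $ff if equal, $00 if not
and #value1 - value2
clc
adc #value2
This can be optimized a bit by using an undocumented instruction: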
lda switch
cmp #case1
php
pla
asr #$02
eor #$ff
adc #$01
and #value1 - value2
clc
adc #value2
On the 6510 CPU, avoiding branches doesn't matter much. On Intel-based CPUs, for example, it does matter in certain cases. Some compilers will optimize the following:
if (condition) {
value = 12;
} else {
value = 34;
}
into this:
value = ((-Integer(condition)) and (12 - 34)) + 34;
Btw, the example of Bitbreaker using ARR #$00 will not work since the zero flag isn't taken into account. |
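The branchless expression from this post can be checked with a short Python model of 8-bit arithmetic. It is a sketch only; `select_add` is a hypothetical name, and the `& 0xFF` masks stand in for the 8-bit accumulator.

```python
def select_add(condition, value1, value2):
    """8-bit model of: value = ((-condition) AND (value1 - value2)) + value2."""
    mask = (-int(bool(condition))) & 0xFF   # $ff when true, $00 when false
    diff = (value1 - value2) & 0xFF         # wraps like an 8-bit subtraction
    return ((mask & diff) + value2) & 0xFF
```

With the numbers from the example, `select_add(True, 12, 34)` returns 12 and `select_add(False, 12, 34)` returns 34.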
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
Quoting Fred: Btw, the example of Bitbreaker using ARR #$00 will not work since the zero flag isn't taken into account.
As I said, it depends on the values: if value1 < value2, the carry is set on equal and clear on not equal, which is fair enough if you only need to distinguish two cases with static values. |
| |
Fred
Registered: Feb 2003 Posts: 285 |
True. With the right values it is a nice and short solution. |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
Quoting Fred
lda switch
cmp #case1
php
pla
asr #$02
eor #$ff
adc #$01
and #value1 - value2
clc
adc #value2
value = ((-Integer(condition)) and (12 - 34)) + 34;
Wouldn't this also be the same?
lda switch
cmp #case1
php
pla
asr #$02
sbc #$00 ;results in either $00 or $ff
and #value2 - value1
clc
adc #value1
Where we take:
value = (Integer(condition) - 1) and (34 - 12) + 12;
|
| |
Fred
Registered: Feb 2003 Posts: 285 |
Nice one.
Another possible optimization: when the AND uses a value of less than 128, the ANC instruction can be used instead. ANC copies bit 7 of the result into the carry, so the carry ends up clear and the CLC afterwards can be removed. |
| |
lft
Registered: Jul 2007 Posts: 369 |
Or, you know, don't use addition in the first place.
; accumulator is either 00 or ff
and #value2 ^ value1
eor #value1
|
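The AND/EOR form above can be modelled in Python in a couple of lines. This is a sketch; `select_xor` is a hypothetical name and `mask` stands for the `$00`/`$ff` value already in the accumulator.

```python
def select_xor(mask, value1, value2):
    """Mirror of: and #value2^value1 / eor #value1, with mask being $00 or $ff."""
    return ((mask & (value2 ^ value1)) ^ value1) & 0xFF
```

A mask of `$00` yields value1 and a mask of `$ff` yields value2, with no carry to clear beforehand.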
| |
soci
Registered: Sep 2003 Posts: 480 |
BB/Fred: Ok, great. As mentioned earlier there are no pipeline stalls to avoid here, and that switching construct is suboptimal in every way.
What's next, how to avoid cache line bouncing on large multi processor 6502 systems? Various synchronization primitives for my threaded code? Pre-fetching? How to optimize unaligned access? Use of barriers for memory mapped I/O? |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
Sure, but optimising is fun :-D |
| |
soci
Registered: Sep 2003 Posts: 480 |
Yes, no problem with that. But it seemed a bit pointless, and then it was pushed even further ;) |