| |
Trap
Registered: Jul 2010 Posts: 223 |
How to make efficient double-sine calculations
Hi,
I am trying to improve my effect animation skills a little. To that end, I'd like to hear how you guys handle double-sine table calculations. As I am by no means a math guru (not even close), please try to keep it at a practical level :)
Surely there has to be some clever way to do this. Usually I'd do something like the following mock-up code:
lda Counter1 // Copy counters to indexes
sta Index1
lda Counter2
sta Index2
ldx #TableSize
!CalcAnim: ldy Index1
lda SineWave1,y // Get first value
iny // Index1 Delta + 1
sty Index1
ldy Index2
clc
adc SineWave2,y // Add second value
iny // Index2 Delta + 1
sty Index2
tay
lda Lookuptable,y // Find the value and store it
sta Destinationtable,x
dex
bne !CalcAnim-
lda Counter1
clc
adc #1 // Velocity 1
sta Counter1
lda Counter2
clc
adc #1 // Velocity 2
sta Counter2
Apart from unrolling the loop, I am short of good ideas on how to make this efficient. Use of ZP for the indexes saves a few cycles as well.
How do you guys approach this in your demos? |
|
| |
Mixer
Registered: Apr 2008 Posts: 452 |
Consider whether some of the maths give constant results and precalculate those. For instance if the velocities are the same all the time, then (sin(a)+sin(b)) could perhaps be precalculated to a single lookup.
Align the sine tables to a page boundary, use the low byte of the address as the index, and run the code in ZP.
If the add or subtract is always 1, then inc/dec may be better.
Sometimes the second lookup can be coded to the sine data bits. Depends on what is desired. |
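The single-lookup idea can be modelled in a few lines of Python. This is only a sketch under assumptions: the table contents, the names `build_combined` and `one_lookup`, and the fixed phase difference are all made up for illustration. It relies on both indexes advancing at the same rate, so that the distance between them never changes.

```python
import math

TABLE_LEN = 256  # one full period, so 8-bit indexes wrap for free

# Hypothetical table contents: 64+63*sin keeps any sum of two entries below 256.
sin1 = [round(64 + 63 * math.sin(i * math.pi / 128)) for i in range(TABLE_LEN)]
sin2 = [round(64 + 63 * math.cos(i * math.pi / 128)) for i in range(TABLE_LEN)]


def two_lookups(counter1, counter2, n):
    """Reference: one lookup per table for each output value."""
    return [sin1[(counter1 + i) & 255] + sin2[(counter2 + i) & 255] for i in range(n)]


def build_combined(phase):
    """Precalculate sin1[i] + sin2[i + phase] for a fixed phase difference."""
    return [sin1[i] + sin2[(i + phase) & 255] for i in range(TABLE_LEN)]


def one_lookup(combined, counter1, n):
    """Same output from a single lookup per value."""
    return [combined[(counter1 + i) & 255] for i in range(n)]
```

The combined table only stays valid while the phase difference is constant. Once the two velocities differ, it has to be rebuilt or the two-lookup version used instead.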
| |
Glasnost Account closed
Registered: Aug 2011 Posts: 26 |
If it is very time critical, I would use speedcode, with x and y for the two counters. The following code requires that each table is duplicated to fill e.g. 2x256 bytes:
( i times)
lda sin1+i,x
*clc
adc sin2+i,y
sta destination+i
*clc is optional in some cases. You know whether your sines can mess up the carry or not.
If you want it looped, you can init zp1 to sin1+counter1 and zp2 to sin2+counter2. This example only works for i up to 128, because the bpl loop needs Y to start at 127 or below.
ldy #(i-1)
!loop:
lda (zp1),y
*clc
adc (zp2),y
sta destination,y
dey
bpl !loop-
Lastly, a note about the sine addition. If you want better precision you can use:
lda sin1,x
adc sin2,y
ror |
| |
Cruzer
Registered: Dec 2001 Posts: 1048 |
PROTIP: Code looks better in a [code] block. |
| |
Cruzer
Registered: Dec 2001 Posts: 1048 |
Quoting Glasnost
lda sin1,x
adc sin2,y
ror
Remember clc after ror. Alternatively, if the sum of the two sines is always < 256 you can use:
lda sin1,x
adc sin2,y
alr #$fe //throw away least significant bit and then lsr
//(always results in cleared carry) |
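To see what the ror and alr variants compute, here is a small Python model of the three instructions involved. It is a sketch, not an emulator: the helper names are made up, and it assumes the carry is clear going into the adc.

```python
def adc(a, operand, carry):
    """8-bit add with carry: returns (result, carry_out)."""
    total = a + operand + carry
    return total & 0xFF, total >> 8


def ror(a, carry):
    """Rotate right: old carry enters bit 7, bit 0 leaves into carry."""
    return (carry << 7) | (a >> 1), a & 1


def alr(a, mask):
    """Undocumented ALR (also called ASR): AND with mask, then LSR."""
    a &= mask
    return a >> 1, a & 1


def average_ror(s1, s2):
    """lda / adc / ror: the carry from the add becomes bit 7 of the result."""
    a, c = adc(s1, s2, 0)  # assumes clc before the adc
    return ror(a, c)


def average_alr(s1, s2):
    """lda / adc / alr #$fe: the carry from the add is thrown away."""
    a, _ = adc(s1, s2, 0)
    return alr(a, 0xFE)
```

In this model the ror version returns the full 9-bit sum divided by two for any pair of bytes, but leaves bit 0 of the sum in the carry, which is why a clc is needed afterwards. The alr version always leaves the carry clear, but is only correct while the sum fits in 8 bits.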
| |
Digger
Registered: Mar 2005 Posts: 437 |
Great tip with bit shifting to smooth the sine, never thought about that. |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
So, if all your tables are page aligned, and if you also follow the sinewave table with a second copy of itself, the following should have the same result:
lda Counter1 // Copy counters to indexes
sta rna0+1
lda Counter2
sta rna1+1
ldx #TableSize
ldy #0
!CalcAnim:
clc
rna0:
lda SineWave1,y // Get first value (operand low byte = Counter1)
rna1:
adc SineWave2,y // Add second value (operand low byte = Counter2)
sta rna2+1
rna2:
lda Lookuptable // Find the value and store it
sta Destinationtable,x
iny
dex
bne !CalcAnim-
lda Counter1
clc
adc #1 // Velocity 1
sta Counter1
lda Counter2
clc
adc #1 // Velocity 2
sta Counter2
But the above loop only needs separate indices for source and destination because Y is increasing and X is decreasing.
Also, as others have pointed out, you don't need the CLC if you know the results will never overflow (eg because your sine tables contain 64+63*sin(x*pi/128) )
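The claim that the doubled-table version matches the original loop can be checked with a Python model. This is a sketch under assumptions: `TABLE_SIZE`, the table contents and the stand-in lookup table are all hypothetical, and `TABLE_SIZE` must not exceed 256 for one extra copy of each table to be enough.

```python
import math

TABLE_SIZE = 128  # hypothetical; must not exceed 256 for the doubled tables to suffice

sine1 = [round(64 + 63 * math.sin(i * math.pi / 128)) for i in range(256)]
sine2 = [round(64 + 63 * math.sin(i * math.pi / 64)) for i in range(256)]
lookup = [(v * 7) & 0xFF for v in range(256)]  # arbitrary stand-in for Lookuptable


def original(counter1, counter2):
    """Model of the first post: two indexes, each wrapping at 256."""
    dest = [None] * (TABLE_SIZE + 1)  # slot 0 is never written, as in the posted loop
    index1, index2 = counter1, counter2
    for x in range(TABLE_SIZE, 0, -1):
        a = (sine1[index1] + sine2[index2]) & 0xFF  # clc + adc
        index1 = (index1 + 1) & 0xFF                # iny / sty Index1
        index2 = (index2 + 1) & 0xFF                # iny / sty Index2
        dest[x] = lookup[a]
    return dest


def doubled(counter1, counter2):
    """Model of the self-modifying version: counter in the operand, one shared Y."""
    s1, s2 = sine1 + sine1, sine2 + sine2  # each table followed by a copy of itself
    dest = [None] * (TABLE_SIZE + 1)
    y = 0
    for x in range(TABLE_SIZE, 0, -1):
        a = (s1[counter1 + y] + s2[counter2 + y]) & 0xFF
        dest[x] = lookup[a]
        y += 1
    return dest
```

The second copy of each table takes over the job of the 8-bit wrap: counter plus Y can reach 255 + TABLE_SIZE - 1, which still lands inside the 512-byte doubled table.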
So, you should be able to get the same effects from an inner loop like this
!loop:
lda sin+counter1,y
adc sin+counter2,y
tax
lda lut,x
sta dst,y
dey
bne !loop-
You can also halve the number of DEY/BNEs by a partial unroll, dividing the output into first/second half:
ldy #TableSize/2
!loop:
lda sin+counter1,y
adc sin+counter2,y
tax
lda lut,x
sta dst,y
lda sin+counter1+TableSize/2,y
adc sin+counter2+TableSize/2,y
tax
lda lut,x
sta dst+TableSize/2,y
dey
bne !loop-
The ALR mentioned above is good to know about (it's news to me!), but if all you want is a straight divide by two of the eight bit result, you can fold that into the table lookup and save the cycles.
(Alternately, if you put a ROR after the ADC you can use 128+127*sin(x*pi/128) - the carry will contain noise, but even without a CLC before the ADC the result will still be more accurate than using the smaller scale factor on your tables.) |
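Two of the points above are easy to check numerically: that tables scaled as 64+63*sin(x*pi/128) can never overflow when added, and that a divide by two can be folded into the lookup table. The Python below is a sketch. The table `lut` and its contents are made up for illustration, and only the sine scaling comes from the post.

```python
import math

# Tables scaled as suggested in the post: 64 + 63*sin(x*pi/128).
sin_table = [round(64 + 63 * math.sin(x * math.pi / 128)) for x in range(256)]

# Largest possible sum of two entries; below 256 means the adc never sets carry.
max_sum = 2 * max(sin_table)

# Hypothetical 128-entry effect table, indexed by (sum / 2).
lut = [(v * 3) & 0xFF for v in range(128)]

# Folding the divide-by-two into the lookup: index by the raw sum instead,
# at the cost of a 256-entry table.
lut_folded = [lut[s >> 1] for s in range(256)]
```

With the folded table, the inner loop can go straight from adc to tax and the lookup, without an alr or lsr in between.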
| |
Oswald
Registered: Apr 2002 Posts: 5094 |
depending on what your lookup values are you may skip the entire lookup by already putting the lookup values into sin1/sin2 or fabricating them so that after adc you get the right values :P :) |
| |
lft
Registered: Jul 2007 Posts: 369 |
Or use character mode, and integrate the lookup table into the font. |
| |
Trap
Registered: Jul 2010 Posts: 223 |
So much awesome info. Thanks guys.
I think some of you are thinking a traditional movement system, but that is not what I was after. Pre-calculating the movements for a table would produce an awful lot of data - for every line, I'd have at least 128 bytes (for a semi-smooth experience). I am not looking for a follow-path, but creating dynamic waves.
With your input I've managed a 33% speed improvement and there's still some cycles left I can shave off.
The ALR/ROR tips were awesome <3
Lft, that's an interesting thought - could you elaborate? |
| |
Oswald
Registered: Apr 2002 Posts: 5094 |
the examples are not about precalculating movements into a table, the most extreme here is to precalculate your counter+velocity values into table lookups in an unrolled loop, and that unrolling (speedcode generation) also can happen realtime when changing effect movement.
unrolling your loop with your labels would give you:
ldx Counter1
ldy Counter2
for loopcount=0 to tablesize
lda sin1+LoopCount*Velocity1,x
clc
adc sin2+LoopCount*Velocity2,y
sta destinationtable+loopcount
next
(out of registers for lookuptable)
speed increase would be many fold instead of lowly 33% |
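As an illustration of the speedcode generation, here is a Python sketch that emits the unrolled source text for given velocities. The function names are made up, and it assumes the loop runs for loopcount = 0 to tablesize-1. A real-time generator on the C64 would write opcodes and operands into memory rather than produce text.

```python
def unrolled_speedcode(table_size, velocity1, velocity2):
    """Return the unrolled source lines for one frame of the effect."""
    lines = ["ldx Counter1", "ldy Counter2"]
    for i in range(table_size):
        lines += [
            f"lda sin1+{i * velocity1},x",
            "clc",
            f"adc sin2+{i * velocity2},y",
            f"sta destinationtable+{i}",
        ]
    return lines


def table_bytes_needed(table_size, velocity):
    """Bytes a sine table must span so that offset + index never runs past its end."""
    return (table_size - 1) * velocity + 255 + 1
```

Each output value then costs lda abs,x (4 cycles), clc (2), adc abs,y (4) and sta abs (4), so about 14 cycles plus any page-crossing penalties. The price is memory: the sine tables have to be repeated far enough to cover the largest offset plus an index of 255.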