Luke
Registered: Dec 2004 Posts: 19
8- or 16-bit muls/divs
About multiply:
Bad ones:
a*b = ((a+b)/2)^2 - ((a+b)/2 - b)^2, using an x*x table.
It takes about 30-40 cycles, but it is inaccurate when bit 0 of (a+b) is set, because the ror after the adc throws that bit away.
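The bit-0 problem can be sidestepped with the well-known quarter-square variant of the same identity (a swapped-in technique, not the routine described above): store floor(n^2/4) in the table instead of halving a+b first. Since a+b and a-b always have the same parity, the two truncations cancel and the product is exact. A minimal Python model, assuming unsigned 8-bit inputs; SQ4 and the function names are hypothetical.

```python
# Hypothetical quarter-square table: SQ4[n] = floor(n*n / 4) for n = 0..510.
# On a 6502 this would be split into lo/hi byte tables (SQ4[510] = 65025 fits in 16 bits).
SQ4 = [n * n // 4 for n in range(511)]


def mul8_halved(a: int, b: int) -> int:
    """The formula as posted: a+b is halved first, so bit 0 of a+b is lost."""
    h = (a + b) >> 1          # like adc followed by ror
    return h * h - (h - b) * (h - b)


def mul8_quarter_square(a: int, b: int) -> int:
    """Exact unsigned 8x8 -> 16 multiply: two table lookups and one subtraction."""
    return SQ4[a + b] - SQ4[abs(a - b)]


print(mul8_halved(3, 2), mul8_quarter_square(3, 2))  # 4 6
```

The halved version is only correct when a+b is even; the quarter-square table costs 511 16-bit entries but needs no correction step.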
Some people use "nybble tables" ($0n x $xy, i.e. nybble times byte) with $1000 bytes of tables plus swapping and adding, but that still costs more than 50 cycles.
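Reading "$0n x $xy" as a nybble-times-byte table (16 x 256 = 4096 = $1000 entries), the idea can be modelled as below. This is an interpretation, not the exact routine the post refers to; NYB and mul8_nybble are hypothetical names, and each entry needs more than 8 bits (up to 15 * 255 = 3825).

```python
# Hypothetical nybble table: NYB[n][b] = n * b for a nybble n (0..15) and a byte b (0..255).
NYB = [[n * b for b in range(256)] for n in range(16)]


def mul8_nybble(a: int, b: int) -> int:
    """Unsigned 8x8 multiply built from two nybble-times-byte lookups."""
    lo = NYB[a & 0x0F][b]   # low nybble of a times b
    hi = NYB[a >> 4][b]     # high nybble of a times b
    return lo + (hi << 4)   # the high-nybble product is worth 16 times as much
```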
And the last bad one is the old asl/bcc/adc/ror routine, which is very slow: 80+ cycles.
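For reference, a Python model of the shift-and-add idea behind that routine: one multiplier bit per pass, add the multiplicand to the high byte when the bit is set, then rotate the 16-bit product right. The exact instruction order here (lsr/bcc/adc/ror) is one common variant and an assumption; mul8_shift_add is a hypothetical name. The eight passes are what make it cost so many cycles.

```python
def mul8_shift_add(a: int, b: int) -> int:
    """Unsigned 8x8 -> 16 shift-and-add multiply, one bit of b per pass."""
    hi = 0   # high byte of the running product (the accumulator)
    lo = b   # low byte starts as the multiplier; product bits shift in from the top
    for _ in range(8):
        if lo & 1:           # next multiplier bit (lsr/ror puts it in carry; bcc skips the add)
            hi += a          # adc: hi may now be 9 bits wide (bit 8 = carry)
        # ror hi, ror lo: carry goes into bit 7 of hi, bit 0 of hi goes into bit 7 of lo
        lo = (lo >> 1) | ((hi & 1) << 7)
        hi >>= 1
    return (hi << 8) | lo


print(mul8_shift_add(200, 100))  # 20000
```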
Now I'm trying to write a 16x16-bit signed multiply, but it's really hard to make a fast routine. Does anyone have an idea?
About divu/divs:
The classical lsr/bcs/sbc/rol routine is too slow.
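The classical routine is restoring shift-and-subtract division: shift one dividend bit into a partial remainder, try to subtract the divisor, and record the result as the next quotient bit. A Python model for 16/16 bits, where divu16 is a hypothetical name; the 16 passes through this loop are what make it slow on the 6502.

```python
def divu16(n: int, d: int) -> tuple[int, int]:
    """Unsigned 16/16 restoring division: returns (quotient, remainder)."""
    if d == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    q, r = n & 0xFFFF, 0
    for _ in range(16):
        # shift the next dividend bit out of q into r (asl q / rol r)
        r = (r << 1) | (q >> 15)
        q = (q << 1) & 0xFFFF
        if r >= d:           # cmp: trial subtraction succeeds
            r -= d           # sbc
            q |= 1           # set the new quotient bit
    return q, r


print(divu16(43256, 9721))  # (4, 4372)
```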
The log method, a/b = e^(ln a - ln b) with an e^x table, looks quite fast, but it is quite inaccurate.
Does anybody have another, faster idea for it, particularly for 16/16-bit routines?
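To see where the inaccuracy comes from, here is a Python model of log/exp table division using log2 instead of ln (the base only changes the table scaling). It assumes 8-bit operands with a <= b, so the result a/b <= 1 is returned in 8.8 fixed point as a*256/b; SCALE, LOG, EXP and div_log are hypothetical. Every table entry is rounded, and the exponential amplifies those rounding errors.

```python
import math

SCALE = 32  # hypothetical log resolution: table steps of 1/32 in log2 units

# LOG[x] ~ SCALE * log2(x) for x = 1..255 (LOG[0] is unused)
LOG = [0] + [int(SCALE * math.log2(x) + 0.5) for x in range(1, 256)]
# EXP[k] ~ 256 * 2^(-k / SCALE) for every possible difference k = 0..LOG[255]
EXP = [int(256 * 2 ** (-k / SCALE) + 0.5) for k in range(LOG[255] + 1)]


def div_log(a: int, b: int) -> int:
    """Approximate a*256/b for 1 <= a <= b <= 255 with two lookups and a subtraction."""
    k = LOG[b] - LOG[a]   # SCALE * log2(b/a), never negative because a <= b
    return EXP[k]


print(div_log(200, 255), 200 * 256 / 255)
```

For a = 200, b = 255 the model returns 202, while the exact value is about 200.78; a finer SCALE shrinks the error but makes both tables larger.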
Graham Account closed
Registered: Dec 2002 Posts: 990
For a 16-bit mul, simply do this:
(256*x_hi + x_lo)*(256*y_hi + y_lo) = 65536*x_hi*y_hi + 256*x_hi*y_lo + 256*x_lo*y_hi + x_lo*y_lo
So a 16-bit mul is just four 8-bit muls. Of course, the sign is a problem: you have to remove it before the mul and apply it again afterwards.
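A Python sketch of that decomposition, with the sign handled as described: take absolute values, multiply unsigned, and negate the 32-bit result if exactly one input was negative. mul8 is a placeholder for whatever fast 8x8 -> 16 routine you use (e.g. a table-based one); all names are hypothetical.

```python
def mul8(a: int, b: int) -> int:
    """Placeholder for a fast unsigned 8x8 -> 16 multiply (e.g. table-based)."""
    return a * b


def mulu16(x: int, y: int) -> int:
    """Unsigned 16x16 -> 32 from four 8x8 partial products."""
    xh, xl = x >> 8, x & 0xFF
    yh, yl = y >> 8, y & 0xFF
    return ((mul8(xh, yh) << 16)
            + ((mul8(xh, yl) + mul8(xl, yh)) << 8)
            + mul8(xl, yl))


def muls16(x: int, y: int) -> int:
    """Signed 16x16 -> 32 for inputs in -32768..32767: strip signs, multiply, reapply."""
    negative = (x < 0) != (y < 0)
    p = mulu16(abs(x), abs(y))   # abs(-32768) = 32768 still fits in 16 unsigned bits
    return -p if negative else p


print(muls16(-300, 250))  # -75000
```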
And div is always a problem. That's why DIV is much slower than MUL on most CPUs: there is no good way to optimize it.
Luke
Registered: Dec 2004 Posts: 19
I remember that, but in my tests it was slower than the classical asl/bcc/adc/ror routine :) But I'll check it again.
edit: You are right, it can be faster with tables. Damn, how do I crunch those tables now? They take too much memory :)
edit2: btw, does anyone have a fast arithmetic method for a = x*x?
Oswald
Registered: Apr 2002 Posts: 5094
to my knowledge the best method, considering both speed and accuracy, is a*b = ((a+b)/2)^2 - ((a+b)/2 - b)^2.
for division it depends heavily on what you want to do. for getting line slopes the a/b = e^(ln a - ln b) method is accurate enough; if you need accuracy you have to multiply by the reciprocal, (1/a)*b, imho.
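A sketch of the (1/a)*b idea for 8-bit values: keep a table of rounded-up 0.16 fixed-point reciprocals and replace the division by a multiply and a shift. With the reciprocal rounded up, the result equals floor(b/a) exactly for every 8-bit b and a >= 1, because the rounding error times b stays below 65536. RECIP and div8_recip are hypothetical names; RECIP[1] = 65536 does not fit in 16 bits, so a real 6502 table would special-case a = 1.

```python
# RECIP[a] = ceil(65536 / a): a 0.16 fixed-point reciprocal for a = 1..255 (RECIP[0] unused).
RECIP = [0] + [-(-65536 // a) for a in range(1, 256)]


def div8_recip(b: int, a: int) -> int:
    """floor(b / a) for 0 <= b <= 255, 1 <= a <= 255: one multiply, keep the high bits."""
    return (b * RECIP[a]) >> 16


print(div8_recip(200, 7))  # 28
```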
Luke
Registered: Dec 2004 Posts: 19
I need a 16/16 div for 3D engine routines, particularly for projecting the 3D world onto the observer's axes (the "pyramid of visibility", i.e. the view frustum? :D). I mean a method where the observer can fly between the objects in the 3D world.
Oswald
Registered: Apr 2002 Posts: 5094
well, if I were you I'd first do it the slow and accurate way, and only switch to the optimized shit once it works.
btw, do you know Stephen Judd's Cool World? google for Commodore Hacking and find the issue with the article about it; it describes and demonstrates a true 3D engine for the C64.
Luke
Registered: Dec 2004 Posts: 19
ffd2.com rox :DDDD lol, nice, but why not jsrffd2.com? :DDDD Thank you, I will check it out :) It looks very interesting :) That gives me hope for a nice evening today :)
btw, what do you mean by "getting line slopes"? Do you mean the classical dy/dx, or something more?
Skate
Registered: Jul 2003 Posts: 494
Like Oswald said, you can always use (1/a)*b for div routines. Keep your coordinate system between 0 and 1 and divide this interval into 65536 pieces. Then you will never need to calculate something like 43256/9721; instead you'll always multiply these values and take the high 16 bits of the 32-bit result. For example:
39532*3424/65536 => (16 bit x 16 bit) >> 16
You will need a 16 x 16 = 32-bit multiply routine, which as far as I know is always faster than a 16/16-bit division with the alternative algorithms. And you won't need the low 16 bits of the 32-bit result, only the high part, so you can shorten the calculation a bit.
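A Python model of that scheme, where a 16-bit value v stands for v/65536. mul_hi16 keeps only the high 16 bits of the 32-bit product; mul_hi16_fast additionally never computes the lo*lo partial product, which saves one of the four 8x8 multiplies and can make the result low by at most 1 (the dropped term is below 65536). mul8 is a placeholder for a fast 8x8 routine; all names are hypothetical.

```python
def mul8(a: int, b: int) -> int:
    """Placeholder for a fast unsigned 8x8 -> 16 multiply."""
    return a * b


def mul_hi16(x: int, y: int) -> int:
    """(x * y) >> 16: product of two 0.16 fixed-point values, still in 0.16 format."""
    xh, xl = x >> 8, x & 0xFF
    yh, yl = y >> 8, y & 0xFF
    full = (mul8(xh, yh) << 16) + ((mul8(xh, yl) + mul8(xl, yh)) << 8) + mul8(xl, yl)
    return full >> 16


def mul_hi16_fast(x: int, y: int) -> int:
    """Same, but the lo*lo partial product is skipped (result may be 1 too low)."""
    xh, xl = x >> 8, x & 0xFF
    yh, yl = y >> 8, y & 0xFF
    return ((mul8(xh, yh) << 16) + ((mul8(xh, yl) + mul8(xl, yh)) << 8)) >> 16


print(mul_hi16(39532, 3424), mul_hi16_fast(39532, 3424))  # 2065 2065
```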
Oswald
Registered: Apr 2002 Posts: 5094
I mean the dx/dy thingie for line slopes.
ready.
Registered: Feb 2003 Posts: 441
Hi,
about division: in the demo Aurora (the final version still has to be uploaded) I needed a division routine. A VERY fast method is based on the following.
Let's say you want to scale a quantity x. Do:
lda X
lsr
sta d1 ;x/2
lsr
sta d2 ;x/4
lsr
sta d3 ;x/8
....
Then, at some point, you just add up the d? terms you want:
d1+d2+d3=x*7/8
d1+d2 =x*3/4
d1+d3 =....
In this way, just by adding terms, you can build whatever fraction of x you want. In the example the finest resolution is x/8, but with more lsr you can get x/16, x/32, ...
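The same idea in Python: the three bits of a fraction in eighths (0.bbb in binary) select which of x/2, x/4 and x/8 get added. Because every lsr truncates, this three-term version can come out up to 2 below floor(x * eighths / 8); scale_eighths is a hypothetical name.

```python
def scale_eighths(x: int, eighths: int) -> int:
    """Approximate x * eighths / 8 (eighths = 0..7) by summing shifted copies of x."""
    d1, d2, d3 = x >> 1, x >> 2, x >> 3   # the lsr chain: x/2, x/4, x/8 (each truncated)
    total = 0
    if eighths & 4:
        total += d1   # 4/8
    if eighths & 2:
        total += d2   # 2/8
    if eighths & 1:
        total += d3   # 1/8
    return total


print(scale_eighths(255, 7))  # 221 (exact value 223.125)
```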
This method proved to be quite fast for zooming the flower shapes at the end of the demo. I'm sure it's also easy to implement for 16 bit. Give it a try.
ciao,
Ready.
WVL
Registered: Mar 2002 Posts: 902
Quote: I remember that, but in my tests it was slower than the classical asl/bcc/adc/ror routine :) But I'll check it again.
edit: You are right, it can be faster with tables. Damn, how do I crunch those tables now? They take too much memory :)
edit2: btw, does anyone have a fast arithmetic method for a = x*x?
you can make an x^2 table like this:
0,1,4,9,16,25,36
as you can see, the differences are
1,3,5,7,9,11, etc. (the odd numbers)
so the first number is 0, and each next number is next = previous + 1 + 2*n, where n is the position of the previous number (counting from 0).
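A minimal Python sketch of that recurrence, building the square table with additions only (square_table is a hypothetical name). On the 6502 the 1 + 2*n term could equally be kept as a running odd number that grows by 2 each step.

```python
def square_table(n: int) -> list[int]:
    """Build [0, 1, 4, 9, ...] (n entries) with next = previous + 1 + 2 * index_of_previous."""
    table = [0]
    for i in range(n - 1):
        table.append(table[i] + 1 + 2 * i)
    return table


print(square_table(7))  # [0, 1, 4, 9, 16, 25, 36]
```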