[CSDb] - User Forums - 8 or 16bit muls/divs

2006-10-15 13:01

Luke

Registered: Dec 2004
Posts: 19

8 or 16bit muls/divs

About multiply:

Bad one:

a*b=((a+b)/2)^2-(((a+b)/2-b))^2 with an x*x table

It takes around 30-40 cycles, but it's inaccurate when bit 0 of (a+b) is set, because the ror after the adc throws that bit away.
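The lost-bit problem described above can be checked with a small model. This is a Python sketch of the arithmetic only, not 6502 code, and the names `SQ4`, `mul_halved` and `mul_quarter` are made up for illustration. The second function is a commonly used variant (not something stated in this thread) that tabulates floor(n^2/4), so nothing is shifted away before the lookup.

```python
# Quarter-square multiply, modelled in Python for unsigned 8-bit inputs.
SQ4 = [(n * n) // 4 for n in range(511)]   # floor(n^2/4) for n = 0..510

def mul_halved(a, b):
    """((a+b)/2)^2 - ((a+b)/2 - b)^2 with the halving done by a shift."""
    h = (a + b) >> 1          # bit 0 of a+b is lost here (the ror after adc)
    return h * h - (h - b) * (h - b)

def mul_quarter(a, b):
    """a*b = floor((a+b)^2/4) - floor((a-b)^2/4), exact for 8-bit a and b."""
    return SQ4[a + b] - SQ4[abs(a - b)]
```

The variant stays exact because a+b and a-b are always both even or both odd, so the two table lookups drop the same fraction and it cancels in the subtraction.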

Some people use "nybble tables" ($0n x $xy) with $1000 bytes of tables and swap+add, but that still takes more than 50 cycles.

And the last bad one is the old asl bcc adc ror routine, which is very slow,
like 80+ cycles.

Now I'm trying to do a 16x16bit signed multiply, but it's really hard to make a "short time" routine. Has anyone got any ideas?

About Divu/Divs

The classical lsr bcs sbc rol is too slow.

a/b = e^(ln a - ln b) looks quite fast with an e^x table, but it's
quite inaccurate.
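To see how inaccurate the log/exp approach gets, here is a minimal Python sketch. The table layout is an assumption made for illustration: it uses base 2 instead of e (any base works, since only the difference of the logs matters), a scale of 32 table steps per doubling, and 8-bit operands. `SCALE`, `LOG`, `EXP` and `div_log` are hypothetical names.

```python
import math

SCALE = 32                      # assumed: table entries store 32*log2(n)
LOG = [0] + [round(SCALE * math.log2(n)) for n in range(1, 256)]
EXP = [round(2 ** (i / SCALE)) for i in range(8 * SCALE + 1)]

def div_log(a, b):
    """Approximate a / b for 1 <= b <= a <= 255 with two lookups and a subtract."""
    return EXP[LOG[a] - LOG[b]]
```

With this scale each lookup can be off by half a table step, so the quotient can be off by a couple of percent. A finer scale improves that at the cost of bigger tables.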

Has anybody got another "faster" idea for it? Particularly for 16/16bit routines.

2006-10-15 13:09

Graham
Account closed

Registered: Dec 2002
Posts: 990

For 16 bit mul simply do this:

(256*x_hi + x_lo)*(256*y_hi + y_lo) = 65536*x_hi*y_hi + 256*x_hi*y_lo + 256*x_lo*y_hi + x_lo*y_lo

So a 16 bit mul is just four 8 bit muls. Of course, the sign is a problem: you have to remove it before the mul and apply it again afterwards.
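The decomposition above, including the sign handling, can be modelled like this. It is a Python sketch of the arithmetic, not a 6502 routine. `mul8` is a placeholder for whatever unsigned 8x8 multiply you already have, and `umul16`/`smul16` are illustrative names.

```python
def mul8(a, b):
    """Stand-in for any unsigned 8x8 -> 16 bit multiply routine."""
    return a * b

def umul16(x, y):
    """Unsigned 16x16 -> 32 bit product built from four 8x8 products."""
    x_hi, x_lo = x >> 8, x & 0xFF
    y_hi, y_lo = y >> 8, y & 0xFF
    return ((mul8(x_hi, y_hi) << 16)
            + (mul8(x_hi, y_lo) << 8)
            + (mul8(x_lo, y_hi) << 8)
            + mul8(x_lo, y_lo))

def smul16(x, y):
    """Signed multiply: strip the signs, multiply, then re-apply the sign."""
    negative = (x < 0) != (y < 0)
    p = umul16(abs(x), abs(y))
    return -p if negative else p
```

For example, `smul16(-300, 200)` goes through `umul16(300, 200)` and negates the result afterwards.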

And div is always a problem. That's why DIV is much slower on all CPUs: There is no good way to optimize it.

2006-10-15 13:28

Luke

Registered: Dec 2004
Posts: 19

I remember that, but in my tests it was slower than the classical asl bcc adc ror :) But I'll check it again.

edit. You are right, it can be faster with tables. Damn, how to crunch those tables now, they take too much memory :)

edit2. btw. Has anyone got a fast arithmetic a=x*x method?

2006-10-16 06:38

Oswald

Registered: Apr 2002
Posts: 5094

to my knowledge the best method is the a*b=((a+b)/2)^2-(((a+b)/2-b))^2 considering both speed and accuracy.

for divide it heavily depends on what you want to do. for getting line slopes the a/b = e^(ln a - ln b) method is accurate enough, if you need accuracy you have to use a multiply (a*(1/b)) imho.

2006-10-16 11:23

Luke

Registered: Dec 2004
Posts: 19

I need a 16/16 div for 3D engine routines, particularly for rendering the 3D world along the observer's axes ("pyramid of visibility"? :D ). I mean the method where the observer can fly between objects inside the 3D world.

2006-10-16 11:30

Oswald

Registered: Apr 2002
Posts: 5094

well if I were you I'd first do it the slow and accurate way, and only when it works would I change to the optimized shit.

btw do you know Stephen Judd's Cool World? google for Commodore Hacking and find the issue with the article about it, it describes and demonstrates a true 3d engine for the c64.

2006-10-16 11:56

Luke

Registered: Dec 2004
Posts: 19

ffd2.com rox :DDDD lol nice but why not jsrffd2.com ?:DDDD Thank you, I will check it out :) It looks very interesting :) Gives me hope for a nice evening today :)

btw. What do you mean by "getting line slopes"? Do you mean the classical (dy/dx) or something more?

2006-10-16 11:59

Skate

Registered: Jul 2003
Posts: 494

Like Oswald said, you can always use a*(1/b) for div routines. Keep your coordinate system between 0 and 1 and divide this interval into 65536 steps. Then you will never need to calculate something like 43256/9721. Instead you'll always multiply these values and take the high part of the result. For example:

39532*3424/65536 => (16bit x 16 bit) >> 16 bit

You will need a 16 bit x 16 bit = 32 bit multiply routine, which is always faster than a 16 bit/16 bit divide done with the alternative algorithms (as far as I know). And you won't need the low part (16 bits) of the 32 bit result, only the high part, so that shortens the calculation a bit.
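A small Python model of this idea is below. `mul_hi16` is the (16 bit x 16 bit) >> 16 operation from the example above. `recip16` and `div_by_recip` are hypothetical additions that show one way a divide could be turned into such a multiply, assuming the reciprocal comes from a table or is computed once and reused.

```python
def mul_hi16(x, y):
    """High 16 bits of the 32-bit product, i.e. (x * y) >> 16."""
    return (x * y) >> 16

def recip16(b):
    """0.16 fixed-point reciprocal of b (2 <= b <= 65535); assumed precomputed."""
    return 65536 // b

def div_by_recip(a, b):
    """Approximate a // b as a * (1/b); the result may come out one too small."""
    return mul_hi16(a, recip16(b))
```

Because the reciprocal is rounded down, the quotient is either exact or one less than the true integer quotient. Whether that matters depends on what the result is used for.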

2006-10-16 15:16

Oswald

Registered: Apr 2002
Posts: 5094

I mean for line slopes the dx/dy thingie.

2006-10-16 16:55

ready.

Registered: Feb 2003
Posts: 441

Hi,
about division: in the demo Aurora (the final version still has to be uploaded) I needed a division routine. A VERY fast method is based on the following:

Let's say you want to scale an x quantity. Do
lda X
lsr
sta d1 ;x/2
lsr
sta d2 ;x/4
lsr
sta d3 ;x/8
....

Then at some point you just add the d? you want:

d1+d2+d3=x*7/8
d1+d2 =x*3/4
d1+d3 =....
In this way, just by adding terms, you can build the fraction of x you want. In the example the finest resolution is x/8, but do more lsr and you can get x/16, x/32,....

This method proved to be quite fast for zooming the flower shapes at the end of the demo. I'm sure it's also easy to implement for 16 bit. Give it a try.

ciao,
Ready.
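The shift-and-sum scaling described above can be modelled as follows. This is a Python sketch, not the routine from the demo. The `mask` parameter is a made-up way of choosing which of the d? terms to add, with its top bit selecting x/2.

```python
def scale(x, mask, bits=3):
    """Sum x>>1, x>>2, ... for each set bit in mask; the top bit of mask selects x/2."""
    total = 0
    for k in range(1, bits + 1):
        if mask & (1 << (bits - k)):
            total += x >> k        # d1 = x/2, d2 = x/4, d3 = x/8, ...
    return total
```

With `bits=3`, `scale(x, 0b111)` is d1+d2+d3 and `scale(x, 0b110)` is d1+d2. Each lsr truncates, so the sum can end up slightly below the exact fraction when x is not a multiple of 8.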

2006-10-16 18:40

WVL

Registered: Mar 2002
Posts: 902

Quote: I remember that, but in my tests it was slower than the classical asl bcc adc ror :) But I'll check it again.

edit. You are right, it can be faster with tables. Damn, how to crunch those tables now, they take too much memory :)

edit2. btw. Has anyone got a fast arithmetic a=x*x method?

you can make an x^2 table like this:

0,1,4,9,16,25,36

as you can see, the differences are

1,3,5,7,9,11,etc

so the first number is 0, and then you calc all the other numbers by nextnumber=previousnumber+1+2*positionoldnumber (positionoldnumber being the index of the previous number in the table)
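The recurrence above, written out as a Python sketch (the function name `build_squares` is just for illustration). On the C64 the same loop would need 16-bit adds once the squares pass 255.

```python
def build_squares(count):
    """Return [0, 1, 4, 9, ...] with count entries, using only additions."""
    table = [0]
    for n in range(count - 1):
        table.append(table[n] + 1 + 2 * n)   # (n+1)^2 = n^2 + 1 + 2n
    return table
```

Calling `build_squares(7)` gives the seven values listed above.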
