Log inRegister an accountBrowse CSDbHelp & documentationFacts & StatisticsThe forumsAvailable RSS-feeds on CSDbSupport CSDb Commodore 64 Scene Database
You are not logged in - nap
CSDb User Forums


Forums > C64 Coding > Fast large multiplies
2012-06-09 19:45
Repose

Registered: Oct 2010
Posts: 222
Fast large multiplies

I've discovered some interesting optimizations for multiplying large numbers, if the multiply routine time depends on the bits of the mulitplier. Usually if there's a 1 bit in the multiplier, with a standard shift and add routine, there's a "bit" more time or that bit.
The method uses several ways of transforming the input to have less 1 bits. Normally, if every value appears equally, you average half 1 bits. In my case, that becomes the worst case, and there's about a quarter 1 bits. This can speed up any routine, even the one that happens to be in rom, by using pre- and post- processing of results. The improvement is about 20%.
Another speedup is optimizing the same multiplier applied to multiple multiplicands. This saves a little in processing the multiplier bits once. This can save another 15%.
Using the square table method will be faster but use a lot of data and a lot of code.
Would anyone be interested in this?

 
... 144 posts hidden. Click here to view all posts....
 
2017-04-08 12:18
ChristopherJam

Registered: Aug 2004
Posts: 1378
Thanks!

Of course, going to have to start forking off "best best case" vs "best worst case" vs "best average time" pretty soon.

Down to 699 cycles for 0*0, btw ;)
2017-04-08 14:09
Repose

Registered: Oct 2010
Posts: 222
I've thought about how to decide or statistically optimize by input, I think 0 and 1 would be good cases to be faster, but not at the expense of a lot of avg speed, which will vastly dominate in any sane situation.

If we finish this, next steps are signed, floating, and the big one is division. With a great multiply, you can use reciprocal division, but you still need remainder.

Ultimately I'd like to replicate all the basic arithmetic of a 32bit cpu, then it would be a complete library for the doom port (which compiles to an emulated cpu), C compilers, etc.
2017-04-10 09:53
ChristopherJam

Registered: Aug 2004
Posts: 1378
Yes, my average for multiplying 10 randomly selected pairs is around 760 cycles at the moment, ranging from around 740 to 780.

Floats only need 24bit x 24bit, so that should be a lot faster. The shifting for adds will be a bit of a hassle. Do you care about correct behaviour for NaNs etc? And how critical is exact rounding? I'm guessing IEEE standard would be considerably slower than "good enough for most purposes."
2017-04-10 09:55
ChristopherJam

Registered: Aug 2004
Posts: 1378
Also, if you're only planning on supporting 32bit ints, there's no way to access the high 32 bits of the result from C anyway - instant 2x speedup :D
2017-04-10 12:45
JackAsser

Registered: Jun 2002
Posts: 1989
Quote: I've thought about how to decide or statistically optimize by input, I think 0 and 1 would be good cases to be faster, but not at the expense of a lot of avg speed, which will vastly dominate in any sane situation.

If we finish this, next steps are signed, floating, and the big one is division. With a great multiply, you can use reciprocal division, but you still need remainder.

Ultimately I'd like to replicate all the basic arithmetic of a 32bit cpu, then it would be a complete library for the doom port (which compiles to an emulated cpu), C compilers, etc.


Funny, I just wrote my own doom-renderer using original WAD-files, albeit in C/SDL, not C64. :) But still, it was a fun exercise down memory lane. Coded a lot of such stuff back in the day.
2017-04-10 17:44
Oswald

Registered: Apr 2002
Posts: 5017
Quote: Funny, I just wrote my own doom-renderer using original WAD-files, albeit in C/SDL, not C64. :) But still, it was a fun exercise down memory lane. Coded a lot of such stuff back in the day.

cool, I hope its for some c64 demo thingie :)
2017-04-10 18:42
JackAsser

Registered: Jun 2002
Posts: 1989
Quote: cool, I hope its for some c64 demo thingie :)

No sorry. :) just curiosity.
2017-04-11 04:52
Oswald

Registered: Apr 2002
Posts: 5017
Quote: No sorry. :) just curiosity.

nah next time you make a true doom fx, or you'll be kicked out :)
2017-04-11 04:59
JackAsser

Registered: Jun 2002
Posts: 1989
Quote: nah next time you make a true doom fx, or you'll be kicked out :)

Hehe, wasn't the Andropolis vector enough? ;) there I only used 16-bit muls and divs (8.8 fixpoint) hence the coarse resolution of the level (thickness of walls and minimum stair step height). I also optimised to only have axis-aligned rooms. Moving to 16.16 muls and remove the axis alignment would perhaps slow it down by 30% but it would be able to render an untextured Doom level I'm sure.
2017-04-11 21:49
Repose

Registered: Oct 2010
Posts: 222
Very impressive. So if I can speed up a 16x16, that would be of practical benefit in speeding the framerate of Andropolis? Now there's a goal, would you be able to substitute my routine to see what happens? (If I can make a major speedup to your code)
Previous - 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ... | 16 - Next
RefreshSubscribe to this thread:

You need to be logged in to post in the forum.

Search the forum:
Search   for   in  
All times are CET.
Search CSDb
Advanced
Users Online
DKT/Samar
iAN CooG/HVSC
Rare Candy
Airwolf/F4CG
Guests online: 166
Top Demos
1 Next Level  (9.8)
2 Mojo  (9.7)
3 Coma Light 13  (9.7)
4 Edge of Disgrace  (9.6)
5 Comaland 100%  (9.6)
6 No Bounds  (9.6)
7 Uncensored  (9.6)
8 The Ghost  (9.6)
9 Wonderland XIV  (9.6)
10 Bromance  (9.6)
Top onefile Demos
1 It's More Fun to Com..  (9.8)
2 Party Elk 2  (9.7)
3 Cubic Dream  (9.6)
4 Copper Booze  (9.5)
5 Rainbow Connection  (9.5)
6 TRSAC, Gabber & Pebe..  (9.5)
7 Onscreen 5k  (9.5)
8 Wafer Demo  (9.5)
9 Dawnfall V1.1  (9.5)
10 Quadrants  (9.5)
Top Groups
1 Oxyron  (9.3)
2 Nostalgia  (9.3)
3 Booze Design  (9.3)
4 Censor Design  (9.3)
5 Crest  (9.3)
Top Organizers
1 Burglar  (9.9)
2 Sixx  (9.8)
3 hedning  (9.7)
4 Irata  (9.7)
5 MWS  (9.6)

Home - Disclaimer
Copyright © No Name 2001-2024
Page generated in: 0.066 sec.