CSDb User Forums


2012-06-09 19:45
Repose

Registered: Oct 2010
Posts: 222
Fast large multiplies

I've discovered some interesting optimizations for multiplying large numbers, which apply when the multiply routine's time depends on the bits of the multiplier. Usually, with a standard shift-and-add routine, each 1 bit in the multiplier costs a "bit" more time for that bit.
The method uses several ways of transforming the input to have fewer 1 bits. Normally, if every value appears equally often, you average half 1 bits. In my case, that becomes the worst case, and the average is about a quarter 1 bits. This can speed up any routine, even the one that happens to be in ROM, by pre-processing the inputs and post-processing the results. The improvement is about 20%.
Another speedup is optimizing for the same multiplier applied to multiple multiplicands. This saves a little by processing the multiplier bits only once. This can save another 15%.
Using the square table method will be faster but use a lot of data and a lot of code.
Would anyone be interested in this?
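
For readers who want something concrete: the post does not say which transformations are used, so the following is only a sketch of one possible transformation, written in Python rather than 6502 assembly. It assumes an 8-bit multiplier and replaces it with its complement whenever more than half of its bits are set, using a*b = (a << 8) - a - a*(~b). The function names are hypothetical, and the add counter ignores the two fixup subtractions.

```python
def shift_add_multiply(a, b, bits=8):
    """Plain shift-and-add. Returns (product, number of adds performed)."""
    product = 0
    adds = 0
    for i in range(bits):
        if (b >> i) & 1:          # only 1 bits in the multiplier cost an add
            product += a << i
            adds += 1
    return product, adds


def complement_multiply(a, b, bits=8):
    """Assumed transformation: use ~b when b has more than bits/2 one bits."""
    mask = (1 << bits) - 1
    if bin(b).count("1") <= bits // 2:
        return shift_add_multiply(a, b, bits)
    # b = mask - (b ^ mask), so a*b = (a << bits) - a - a*(b ^ mask)
    partial, adds = shift_add_multiply(a, b ^ mask, bits)
    return (a << bits) - a - partial, adds
```

With this single trick the worst case drops from 8 adds to 4 for an 8-bit multiplier, which matches the "half becomes the worst case" remark. It does not by itself reach the quarter average quoted above, so the actual method presumably combines it with other transformations.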

 
 
2017-04-14 00:55
Repose

Registered: Oct 2010
Posts: 222
00 01 02 03 * 04 05 06 07, and manipulate the tables to get what you want to test: adds for every branch, and the number of carries per column up to 14. I think that should do it.
2017-04-14 15:15
ChristopherJam

Registered: Aug 2004
Posts: 1378
Had a thought this morning - the difference of squares is already well established, the only thing that really needs testing is the carry handling for each column. I'll post about that over at sets of add/sub shortly.
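
As background, the difference-of-squares (quarter-square) identity referred to here can be sketched in Python as below. This is a minimal illustration for unsigned 8-bit operands, not the table layout used by any routine in this thread; the names `SQ` and `mul8` are made up for the example.

```python
# SQ[x] = floor(x*x / 4) for every possible sum of two bytes (0..510).
SQ = [x * x // 4 for x in range(511)]


def mul8(a, b):
    """a*b = f(a+b) - f(|a-b|) with f(x) = floor(x^2 / 4), for 0 <= a, b <= 255.

    a+b and a-b always have the same parity, so the two floor() roundings
    cancel and the result is exact.
    """
    return SQ[a + b] - SQ[abs(a - b)]
```

The multiply becomes two table lookups and one subtraction, which is why only the carry handling between columns needs separate testing.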
2017-04-14 20:36
Repose

Registered: Oct 2010
Posts: 222
That's basically what I just said - multiplying is just adding from a table. Test coverage would include each carry and each amount of carries per column.
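
To make the "carries per column" idea concrete, here is a rough Python model of a 16x16 unsigned multiply built from four 8x8 partial products whose bytes are summed column by column. It is a simplified sketch: the partial products are computed with `*` where a C64 routine would read them from tables, and `mul16_columns` is a hypothetical name. It returns the carry out of each column so a test harness could check which carry counts were exercised.

```python
def mul16_columns(x, y):
    """Return (product, carries) where carries[i] is the carry out of byte column i."""
    x0, x1 = x & 0xFF, x >> 8
    y0, y1 = y & 0xFF, y >> 8
    columns = [[] for _ in range(4)]
    # Each 8x8 partial product contributes a low byte to one column
    # and a high byte to the next column up.
    for product, col in ((x0 * y0, 0), (x1 * y0, 1), (x0 * y1, 1), (x1 * y1, 2)):
        columns[col].append(product & 0xFF)
        columns[col + 1].append(product >> 8)
    result = 0
    carry = 0
    carries = []
    for i, col in enumerate(columns):
        total = sum(col) + carry
        result |= (total & 0xFF) << (8 * i)
        carry = total >> 8        # may exceed 1 when a column has several terms
        carries.append(carry)
    return result, carries
```

A coverage test in this model would run inputs until every column has produced every carry value it can reach.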
2017-04-15 06:35
ChristopherJam

Registered: Aug 2004
Posts: 1378
Quote: That's basically what I just said - multiplying is just adding from a table. Test coverage would include each carry and each amount of carries per column.

Fair point - I guess I got distracted by your talk of table manipulation.

Posting some analysis of the individual carries in the other thread shortly.

But back to multiplies - I was curious as to how you got away with not offsetting the g() table, then it finally struck me - using SBC instead of ADC is exactly equivalent to doing an ADC of a $ffff-g() table.

Do you have working code yet? I would expect you to need a different offset for each column.
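
The SBC/ADC equivalence mentioned above can be checked with a small Python model of the two instructions. This is a sketch assuming binary (non-decimal) mode and modelling only the accumulator and carry flag; `adc` and `sbc` are illustrative helper names.

```python
def adc(a, m, carry):
    """6502 ADC in binary mode: returns (result byte, carry out)."""
    total = a + m + carry
    return total & 0xFF, total >> 8


def sbc(a, m, carry):
    """6502 SBC in binary mode: a clear carry means a pending borrow."""
    diff = a - m - (1 - carry)
    return diff & 0xFF, 0 if diff < 0 else 1


# SBC m gives the same result and carry as ADC ($ff - m) for every input.
equivalent = all(
    sbc(a, m, c) == adc(a, 0xFF - m, c)
    for a in range(256) for m in range(256) for c in (0, 1)
)
```

Because $ffff - g() complements each byte independently, the per-byte equivalence carries over to a two-byte table entry when the carry is chained from the low byte to the high byte.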
2017-04-15 06:45
Repose

Registered: Oct 2010
Posts: 222
Just about to work out the subs, though I'm sure it works in some equivalent way, I'm thinking at most a sec or clc when switching between runs of adds and runs of subs. You can do one fixup at the end. The way I'm doing it makes sense too. No offsets needed.
(ps why did Ice T suddenly flash in my mind singing, no beepers needed?)
Sounds like mine is gonna be a lot cleaner, not to mention faster but we'll see :)
2017-04-15 11:19
ChristopherJam

Registered: Aug 2004
Posts: 1378
OK, 16x16 done and tested. Minimum 205 cycles, mean of around 216, including 12 cycles for the JSR/RTS

(assuming multiplier, multiplicand and destination all in ZP). I've just modified the codegen for the 32x32 for now, will have a look later to see if I've missed any obvious optimisations.
2017-04-15 13:01
JackAsser

Registered: Jun 2002
Posts: 1989
Quote: OK, 16x16 done and tested. Minimum 205 cycles, mean of around 216, including 12 cycles for the JSR/RTS

(assuming multiplier, multiplicand and destination all in ZP). I've just modified the codegen for the 32x32 for now, will have a look later to see if I've missed any obvious optimisations.


How does this compare to my stuff on Codebase? Also unsigned?
2017-04-15 14:22
ChristopherJam

Registered: Aug 2004
Posts: 1378
Under the same conditions, your stuff averages ~241 cycles, with a minimum of 232. So, only about 10% faster?

Unsigned, yes.
2017-04-15 14:47
Frantic

Registered: Mar 2003
Posts: 1627
10% faster ain't bad!
2017-04-15 16:49
JackAsser

Registered: Jun 2002
Posts: 1989
Quote: Under the same conditions, your stuff averages ~241 cycles, with a minimum of 232. So, only about 10% faster?

Unsigned, yes.


Nice!!! Haven't checked in detail, same table space overhead?
All times are CET.