[CSDb] - User Forums - Fast large multiplies

You are not logged in - nap

CSDb User Forums

Forums > C64 Coding > Fast large multiplies

2012-06-09 19:45

Repose
Account closed

Registered: Oct 2010
Posts: 227

Fast large multiplies

I've discovered some interesting optimizations for multiplying large numbers, if the multiply routine time depends on the bits of the mulitplier. Usually if there's a 1 bit in the multiplier, with a standard shift and add routine, there's a "bit" more time or that bit.
The method uses several ways of transforming the input to have less 1 bits. Normally, if every value appears equally, you average half 1 bits. In my case, that becomes the worst case, and there's about a quarter 1 bits. This can speed up any routine, even the one that happens to be in rom, by using pre- and post- processing of results. The improvement is about 20%.
Another speedup is optimizing the same multiplier applied to multiple multiplicands. This saves a little in processing the multiplier bits once. This can save another 15%.
Using the square table method will be faster but use a lot of data and a lot of code.
Would anyone be interested in this?

... 145 posts hidden. Click here to view all posts....

2017-03-29 10:21

Repose
Account closed

Registered: Oct 2010
Posts: 227

Yes, that's the one I beat already.

2017-03-29 11:46

JackAsser

Registered: Jun 2002
Posts: 2038

Quote: Yes, that's the one I beat already.

Interesting approach! I'll try it myself someday!

2017-03-29 12:03

ChristopherJam

Registered: Aug 2004
Posts: 1424

Stop distracting me, I've got other stuff I'm meant to be working on!

2017-03-29 12:22

JackAsser

Registered: Jun 2002
Posts: 2038

Quote: Stop distracting me, I've got other stuff I'm meant to be working on!

Circle closed. You were the first to teach me multiplications using squares. I remember I found you on some forum ages ago. :D

2017-03-29 13:46

ChristopherJam

Registered: Aug 2004
Posts: 1424

Oh, sweet. I remember hammering that one out with Stephen Judd on IRC back in the day.

It was a long time ago now, but I think the difference of squares concept was one he introduced me to, I just helped optimise the implementation (I'm fairy sure the truncating divide by 4 was one of my contributions?)

2017-03-29 14:41

JackAsser

Registered: Jun 2002
Posts: 2038

Quote: Oh, sweet. I remember hammering that one out with Stephen Judd on IRC back in the day.

It was a long time ago now, but I think the difference of squares concept was one he introduced me to, I just helped optimise the implementation (I'm fairy sure the truncating divide by 4 was one of my contributions?)

Yeah maybe. I remember Graham did the implicit add-optimization to it. I.e. self-mod the highbyte with X and then use ,Y in the table lookup. That way the X+Y comes for "free"

2017-03-29 15:34

Repose
Account closed

Registered: Oct 2010
Posts: 227

That very idea I always thought was brilliant, because it made a useful calculation with the indexing at the same time as a lookup. As a result, this algorithm is faster than could ever be possible. I call it a 'super' algorithm.

And yes, Judd told me that at the time we were working on the C=Hacking article together.

Anyone remember that coding thread in comp.sys.cbm and later, the z80 vs 6502 'contest' ?

2017-03-29 15:48

Frantic

Registered: Mar 2003
Posts: 1661

Quote: Yes, that's the one I beat already.

Don't hesitate to write an article, or at least post some source code, at Codebase64. :) Would be much appreciated!

2017-04-07 22:12

ChristopherJam

Registered: Aug 2004
Posts: 1424

Repose, did you get around to completing a 32x32->64 routine? I'm trying out some of the ideas from this and the sets of add/sub threads; currently down to around 1300 cycles, though I've still got some optimising to go.

I'm using inx to track carries, but have yet to remove the CLCs after each branch.

2017-04-08 03:52

Repose
Account closed

Registered: Oct 2010
Posts: 227

Yes I wrote a 16x16 to test, and it takes 211 cycles (that's avg with equal branches). The one posted on codebase is 231 cycles. I am just testing it now but having some trouble finding a single step debugger that works for me, see thread.

The time to beat for 32x32 is at least 1300 as posted on codebase so you have to keep going...

I can derive the formula for my current version to estimate exact timing of 32x32 I'll have to get back to you in a bit.

Previous - 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ... | 16 - Next

Refresh

Subscribe to this thread:

You need to be logged in to post in the forum.

Search the forum:
Search for in
All times are CET.

Search CSDb

Advanced

Users Online

Sokrates
CA$H/TRiAD
Didi/Laxity
Laddh
Spinball/Excess
DonChaos
Guests online: 137

Top Demos

1 Next Level  (9.7)
2 13:37  (9.7)
3 Codeboys & Endians  (9.7)
4 Mojo  (9.6)
5 Coma Light 13  (9.6)
6 Edge of Disgrace  (9.6)
7 Signal Carnival  (9.6)
8 Wonderland XIV  (9.5)
9 Uncensored  (9.5)
10 Comaland 100%  (9.5)

Top onefile Demos

1 Nine  (9.7)
2 Layers  (9.6)
3 Cubic Dream  (9.6)
4 Party Elk 2  (9.6)
5 Copper Booze  (9.5)
6 Scan and Spin  (9.5)
7 Onscreen 5k  (9.5)
8 Grey  (9.5)
9 Dawnfall V1.1  (9.5)
10 Rainbow Connection  (9.5)

Top Groups

1 Artline Designs  (9.3)
2 Booze Design  (9.3)
3 Oxyron  (9.3)
4 Performers  (9.3)
5 Censor Design  (9.3)

Top Webmasters

1 Slaygon  (9.7)
2 Perff  (9.6)
3 Sabbi  (9.5)
4 Morpheus  (9.4)
5 CreaMD  (9.1)

Page generated in: 0.052 sec.