| |
Repose
Registered: Oct 2010 Posts: 225 |
Fast large multiplies
I've discovered some interesting optimizations for multiplying large numbers, if the multiply routine time depends on the bits of the mulitplier. Usually if there's a 1 bit in the multiplier, with a standard shift and add routine, there's a "bit" more time or that bit.
The method uses several ways of transforming the input to have less 1 bits. Normally, if every value appears equally, you average half 1 bits. In my case, that becomes the worst case, and there's about a quarter 1 bits. This can speed up any routine, even the one that happens to be in rom, by using pre- and post- processing of results. The improvement is about 20%.
Another speedup is optimizing the same multiplier applied to multiple multiplicands. This saves a little in processing the multiplier bits once. This can save another 15%.
Using the square table method will be faster but use a lot of data and a lot of code.
Would anyone be interested in this?
|
|
... 144 posts hidden. Click here to view all posts.... |
| |
Repose
Registered: Oct 2010 Posts: 225 |
Yes, that's the one I beat already. |
| |
JackAsser
Registered: Jun 2002 Posts: 2014 |
Quote: Yes, that's the one I beat already.
Interesting approach! I'll try it myself someday! |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Stop distracting me, I've got other stuff I'm meant to be working on! |
| |
JackAsser
Registered: Jun 2002 Posts: 2014 |
Quote: Stop distracting me, I've got other stuff I'm meant to be working on!
Circle closed. You were the first to teach me multiplications using squares. I remember I found you on some forum ages ago. :D |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Oh, sweet. I remember hammering that one out with Stephen Judd on IRC back in the day.
It was a long time ago now, but I think the difference of squares concept was one he introduced me to, I just helped optimise the implementation (I'm fairy sure the truncating divide by 4 was one of my contributions?) |
| |
JackAsser
Registered: Jun 2002 Posts: 2014 |
Quote: Oh, sweet. I remember hammering that one out with Stephen Judd on IRC back in the day.
It was a long time ago now, but I think the difference of squares concept was one he introduced me to, I just helped optimise the implementation (I'm fairy sure the truncating divide by 4 was one of my contributions?)
Yeah maybe. I remember Graham did the implicit add-optimization to it. I.e. self-mod the highbyte with X and then use ,Y in the table lookup. That way the X+Y comes for "free" |
| |
Repose
Registered: Oct 2010 Posts: 225 |
That very idea I always thought was brilliant, because it made a useful calculation with the indexing at the same time as a lookup. As a result, this algorithm is faster than could ever be possible. I call it a 'super' algorithm.
And yes, Judd told me that at the time we were working on the C=Hacking article together.
Anyone remember that coding thread in comp.sys.cbm and later, the z80 vs 6502 'contest' ? |
| |
Frantic
Registered: Mar 2003 Posts: 1648 |
Quote: Yes, that's the one I beat already.
Don't hesitate to write an article, or at least post some source code, at Codebase64. :) Would be much appreciated! |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Repose, did you get around to completing a 32x32->64 routine? I'm trying out some of the ideas from this and the sets of add/sub threads; currently down to around 1300 cycles, though I've still got some optimising to go.
I'm using inx to track carries, but have yet to remove the CLCs after each branch. |
| |
Repose
Registered: Oct 2010 Posts: 225 |
Yes I wrote a 16x16 to test, and it takes 211 cycles (that's avg with equal branches). The one posted on codebase is 231 cycles. I am just testing it now but having some trouble finding a single step debugger that works for me, see thread.
The time to beat for 32x32 is at least 1300 as posted on codebase so you have to keep going...
I can derive the formula for my current version to estimate exact timing of 32x32 I'll have to get back to you in a bit. |
Previous - 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ... | 16 - Next |