Log inRegister an accountBrowse CSDbHelp & documentationFacts & StatisticsThe forumsAvailable RSS-feeds on CSDbSupport CSDb Commodore 64 Scene Database
You are not logged in - nap
CSDb User Forums


Forums > C64 Coding > Fast large multiplies
2012-06-09 19:45
Repose

Registered: Oct 2010
Posts: 225
Fast large multiplies

I've discovered some interesting optimizations for multiplying large numbers, if the multiply routine time depends on the bits of the mulitplier. Usually if there's a 1 bit in the multiplier, with a standard shift and add routine, there's a "bit" more time or that bit.
The method uses several ways of transforming the input to have less 1 bits. Normally, if every value appears equally, you average half 1 bits. In my case, that becomes the worst case, and there's about a quarter 1 bits. This can speed up any routine, even the one that happens to be in rom, by using pre- and post- processing of results. The improvement is about 20%.
Another speedup is optimizing the same multiplier applied to multiple multiplicands. This saves a little in processing the multiplier bits once. This can save another 15%.
Using the square table method will be faster but use a lot of data and a lot of code.
Would anyone be interested in this?

 
... 144 posts hidden. Click here to view all posts....
 
2017-03-29 10:21
Repose

Registered: Oct 2010
Posts: 225
Yes, that's the one I beat already.
2017-03-29 11:46
JackAsser

Registered: Jun 2002
Posts: 2014
Quote: Yes, that's the one I beat already.

Interesting approach! I'll try it myself someday!
2017-03-29 12:03
ChristopherJam

Registered: Aug 2004
Posts: 1409
Stop distracting me, I've got other stuff I'm meant to be working on!
2017-03-29 12:22
JackAsser

Registered: Jun 2002
Posts: 2014
Quote: Stop distracting me, I've got other stuff I'm meant to be working on!

Circle closed. You were the first to teach me multiplications using squares. I remember I found you on some forum ages ago. :D
2017-03-29 13:46
ChristopherJam

Registered: Aug 2004
Posts: 1409
Oh, sweet. I remember hammering that one out with Stephen Judd on IRC back in the day.

It was a long time ago now, but I think the difference of squares concept was one he introduced me to, I just helped optimise the implementation (I'm fairy sure the truncating divide by 4 was one of my contributions?)
2017-03-29 14:41
JackAsser

Registered: Jun 2002
Posts: 2014
Quote: Oh, sweet. I remember hammering that one out with Stephen Judd on IRC back in the day.

It was a long time ago now, but I think the difference of squares concept was one he introduced me to, I just helped optimise the implementation (I'm fairy sure the truncating divide by 4 was one of my contributions?)


Yeah maybe. I remember Graham did the implicit add-optimization to it. I.e. self-mod the highbyte with X and then use ,Y in the table lookup. That way the X+Y comes for "free"
2017-03-29 15:34
Repose

Registered: Oct 2010
Posts: 225
That very idea I always thought was brilliant, because it made a useful calculation with the indexing at the same time as a lookup. As a result, this algorithm is faster than could ever be possible. I call it a 'super' algorithm.

And yes, Judd told me that at the time we were working on the C=Hacking article together.

Anyone remember that coding thread in comp.sys.cbm and later, the z80 vs 6502 'contest' ?
2017-03-29 15:48
Frantic

Registered: Mar 2003
Posts: 1648
Quote: Yes, that's the one I beat already.

Don't hesitate to write an article, or at least post some source code, at Codebase64. :) Would be much appreciated!
2017-04-07 22:12
ChristopherJam

Registered: Aug 2004
Posts: 1409
Repose, did you get around to completing a 32x32->64 routine? I'm trying out some of the ideas from this and the sets of add/sub threads; currently down to around 1300 cycles, though I've still got some optimising to go.

I'm using inx to track carries, but have yet to remove the CLCs after each branch.
2017-04-08 03:52
Repose

Registered: Oct 2010
Posts: 225
Yes I wrote a 16x16 to test, and it takes 211 cycles (that's avg with equal branches). The one posted on codebase is 231 cycles. I am just testing it now but having some trouble finding a single step debugger that works for me, see thread.

The time to beat for 32x32 is at least 1300 as posted on codebase so you have to keep going...

I can derive the formula for my current version to estimate exact timing of 32x32 I'll have to get back to you in a bit.
Previous - 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ... | 16 - Next
RefreshSubscribe to this thread:

You need to be logged in to post in the forum.

Search the forum:
Search   for   in  
All times are CET.
Search CSDb
Advanced
Users Online
Flashback
MWR/Visdom
Sulevi/Virtual Dreams
A3/AFL
GI-Joe/MYD!
t0m3000/hf^boom!^ibx
Freeze/Blazon
Fred/Channel 4
Mr. Sex/Byterapers
Guests online: 107
Top Demos
1 Next Level  (9.7)
2 13:37  (9.7)
3 Mojo  (9.7)
4 Coma Light 13  (9.6)
5 Edge of Disgrace  (9.6)
6 What Is The Matrix 2  (9.6)
7 The Demo Coder  (9.6)
8 Uncensored  (9.6)
9 Comaland 100%  (9.6)
10 Wonderland XIV  (9.6)
Top onefile Demos
1 Layers  (9.6)
2 No Listen  (9.6)
3 Cubic Dream  (9.6)
4 Party Elk 2  (9.6)
5 Copper Booze  (9.6)
6 Rainbow Connection  (9.5)
7 Dawnfall V1.1  (9.5)
8 Onscreen 5k  (9.5)
9 Morph  (9.5)
10 Libertongo  (9.5)
Top Groups
1 Performers  (9.3)
2 Booze Design  (9.3)
3 Oxyron  (9.3)
4 Triad  (9.3)
5 Censor Design  (9.3)
Top NTSC-Fixers
1 Pudwerx  (10)
2 Booze  (9.7)
3 Stormbringer  (9.7)
4 Fungus  (9.6)
5 Grim Reaper  (9.3)

Home - Disclaimer
Copyright © No Name 2001-2024
Page generated in: 0.044 sec.