Log inRegister an accountBrowse CSDbHelp & documentationFacts & StatisticsThe forumsAvailable RSS-feeds on CSDbSupport CSDb Commodore 64 Scene Database
You are not logged in - nap
CSDb User Forums


Forums > C64 Coding > Fast large multiplies
2012-06-09 19:45
Repose

Registered: Oct 2010
Posts: 225
Fast large multiplies

I've discovered some interesting optimizations for multiplying large numbers, if the multiply routine time depends on the bits of the mulitplier. Usually if there's a 1 bit in the multiplier, with a standard shift and add routine, there's a "bit" more time or that bit.
The method uses several ways of transforming the input to have less 1 bits. Normally, if every value appears equally, you average half 1 bits. In my case, that becomes the worst case, and there's about a quarter 1 bits. This can speed up any routine, even the one that happens to be in rom, by using pre- and post- processing of results. The improvement is about 20%.
Another speedup is optimizing the same multiplier applied to multiple multiplicands. This saves a little in processing the multiplier bits once. This can save another 15%.
Using the square table method will be faster but use a lot of data and a lot of code.
Would anyone be interested in this?

 
... 150 posts hidden. Click here to view all posts....
 
2023-12-14 13:26
Repose

Registered: Oct 2010
Posts: 225
Oh, you definitely have added some great ideas!

However, I researched it before and there's a few people who came up with that idea before any of us, not c64 sceners either.
2024-02-13 08:13
Repose

Registered: Oct 2010
Posts: 225
I've finally published my new fastest routine, 2023 version. It's even faster now, 187.1.
https://codebase64.org/doku.php?id=base:fastest_multiplication_..
And independently benchmarked here
https://github.com/tobyLobster/multiply_test/?tab=readme-ov-fil..
Both faster and shorter, so not bad.
2024-02-13 13:37
ChristopherJam

Registered: Aug 2004
Posts: 1409
Nice work!
2024-02-13 20:58
Repose

Registered: Oct 2010
Posts: 225
Of course, only now do I realize a dumb trick:

x1 = p_sqr_lo

umult16
; set multiplier as x1
lda x1
;sta p_sqr_lo
sta p_sqr_hi
eor #$ff
sta p_neg_sqr_lo
sta p_neg_sqr_hi

; set multiplier as x0
; *x1 is no longer preserved

To easily save 3 cycles and 1 byte zp.
Going a bit further, if you add 6 zp, using two sets of pointers, then x0 and x1 can be p_sqr_lo1 and p_sqr_lo2. And one more step, make the caller set up both pointers, but that's a bit more difficult as we can't assume the caller can use A for the calculation eor #$ff (yes, A will be trashed when you call the umult16, but I mean when the caller has the other half of x_ ready).
So, save 3, 6 or 12 cycles.
One more hack is passing y0 in Y, but in the worst case, that's not convenient to the caller, if it happens to be in X, then it's easier to stx y0 rather than txa:tay.

So, I would say saving 3 or 6 cycles are good options, and 3 for sure. That brings the routine down to 180.1 :) Saving 14.5 cycles and being 10% faster definitely makes me happy, losing another 6 bytes zp not so much.
Previous - 1 | ... | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 - Next
RefreshSubscribe to this thread:

You need to be logged in to post in the forum.

Search the forum:
Search   for   in  
All times are CET.
Search CSDb
Advanced
Users Online
Flashback
Holy Moses/Role
Beast/Crescent
Lead/House Designs
Technotron/I-I F
sachy/BOOM!
DeeKay/Crest
Guests online: 123
Top Demos
1 Next Level  (9.7)
2 13:37  (9.7)
3 Mojo  (9.7)
4 Coma Light 13  (9.6)
5 Edge of Disgrace  (9.6)
6 What Is The Matrix 2  (9.6)
7 The Demo Coder  (9.6)
8 Uncensored  (9.6)
9 Comaland 100%  (9.6)
10 Wonderland XIV  (9.6)
Top onefile Demos
1 Layers  (9.6)
2 No Listen  (9.6)
3 Cubic Dream  (9.6)
4 Party Elk 2  (9.6)
5 Copper Booze  (9.6)
6 Rainbow Connection  (9.5)
7 Dawnfall V1.1  (9.5)
8 Onscreen 5k  (9.5)
9 Morph  (9.5)
10 Libertongo  (9.5)
Top Groups
1 Performers  (9.3)
2 Booze Design  (9.3)
3 Oxyron  (9.3)
4 Triad  (9.3)
5 Censor Design  (9.3)
Top Graphicians
1 Mirage  (9.8)
2 Archmage  (9.7)
3 Pal  (9.6)
4 Carrion  (9.6)
5 Sulevi  (9.6)

Home - Disclaimer
Copyright © No Name 2001-2024
Page generated in: 0.06 sec.