You are not logged in -
nap
CSDb User Forums
Forums
>
C64 Coding
>
Fast large multiplies
2012-06-09
19:45
Repose
Registered: Oct 2010
Posts: 225
Fast large multiplies
I've discovered some interesting optimizations for multiplying large numbers, if the multiply routine time depends on the bits of the mulitplier. Usually if there's a 1 bit in the multiplier, with a standard shift and add routine, there's a "bit" more time or that bit.
The method uses several ways of transforming the input to have less 1 bits. Normally, if every value appears equally, you average half 1 bits. In my case, that becomes the worst case, and there's about a quarter 1 bits. This can speed up any routine, even the one that happens to be in rom, by using pre- and post- processing of results. The improvement is about 20%.
Another speedup is optimizing the same multiplier applied to multiple multiplicands. This saves a little in processing the multiplier bits once. This can save another 15%.
Using the square table method will be faster but use a lot of data and a lot of code.
Would anyone be interested in this?
... 150 posts hidden. Click
here
to view all posts....
2023-12-14
13:26
Repose
Registered: Oct 2010
Posts: 225
Oh, you definitely have added some great ideas!
However, I researched it before and there's a few people who came up with that idea before any of us, not c64 sceners either.
2024-02-13
08:13
Repose
Registered: Oct 2010
Posts: 225
I've finally published my new fastest routine, 2023 version. It's even faster now, 187.1.
https://codebase64.org/doku.php?id=base:fastest_multiplication_..
And independently benchmarked here
https://github.com/tobyLobster/multiply_test/?tab=readme-ov-fil..
Both faster and shorter, so not bad.
2024-02-13
13:37
ChristopherJam
Registered: Aug 2004
Posts: 1409
Nice work!
2024-02-13
20:58
Repose
Registered: Oct 2010
Posts: 225
Of course, only now do I realize a dumb trick:
x1 = p_sqr_lo
umult16
; set multiplier as x1
lda x1
;sta p_sqr_lo
sta p_sqr_hi
eor #$ff
sta p_neg_sqr_lo
sta p_neg_sqr_hi
; set multiplier as x0
; *x1 is no longer preserved
To easily save 3 cycles and 1 byte zp.
Going a bit further, if you add 6 zp, using two sets of pointers, then x0 and x1 can be p_sqr_lo1 and p_sqr_lo2. And one more step, make the caller set up both pointers, but that's a bit more difficult as we can't assume the caller can use A for the calculation eor #$ff (yes, A will be trashed when you call the umult16, but I mean when the caller has the other half of x_ ready).
So, save 3, 6 or 12 cycles.
One more hack is passing y0 in Y, but in the worst case, that's not convenient to the caller, if it happens to be in X, then it's easier to stx y0 rather than txa:tay.
So, I would say saving 3 or 6 cycles are good options, and 3 for sure. That brings the routine down to 180.1 :) Saving 14.5 cycles and being 10% faster definitely makes me happy, losing another 6 bytes zp not so much.
Previous
-
1
| ... |
6
|
7
|
8
|
9
|
10
|
11
|
12
|
13
|
14
|
15
| 16 - Next
Refresh
Subscribe to this thread:
You need to be logged in to post in the forum.
Search the forum:
Search
All forums
C64 Coding
C64 Composing
C64 Pixeling
C64 Productions
CSDb Bug Reports
CSDb Development
CSDb Discussions
CSDb Entries
CSDb Feedback
CSDb Info
CSDb moderators
CSDb Questions
Messages to moderators
Requests
for
in
Writer & text
Text
Writer
All times are CET.
Search CSDb
All
Releases
Groups
Sceners
Events
BBS
SIDs
-------
Forum
Comments
Advanced
Users Online
Flashback
Holy Moses/Role
Beast/Crescent
Lead/House Designs
Technotron/I-I F
sachy/BOOM!
DeeKay/Crest
Guests online: 123
Top Demos
1
Next Level
(9.7)
2
13:37
(9.7)
3
Mojo
(9.7)
4
Coma Light 13
(9.6)
5
Edge of Disgrace
(9.6)
6
What Is The Matrix 2
(9.6)
7
The Demo Coder
(9.6)
8
Uncensored
(9.6)
9
Comaland 100%
(9.6)
10
Wonderland XIV
(9.6)
Top onefile Demos
1
Layers
(9.6)
2
No Listen
(9.6)
3
Cubic Dream
(9.6)
4
Party Elk 2
(9.6)
5
Copper Booze
(9.6)
6
Rainbow Connection
(9.5)
7
Dawnfall V1.1
(9.5)
8
Onscreen 5k
(9.5)
9
Morph
(9.5)
10
Libertongo
(9.5)
Top Groups
1
Performers
(9.3)
2
Booze Design
(9.3)
3
Oxyron
(9.3)
4
Triad
(9.3)
5
Censor Design
(9.3)
Top Graphicians
1
Mirage
(9.8)
2
Archmage
(9.7)
3
Pal
(9.6)
4
Carrion
(9.6)
5
Sulevi
(9.6)
Home
-
Disclaimer
Copyright © No Name 2001-2024
Page generated in: 0.06 sec.