Log inRegister an accountBrowse CSDbHelp & documentationFacts & StatisticsThe forumsAvailable RSS-feeds on CSDbSupport CSDb Commodore 64 Scene Database
 Welcome to our latest new user CORE321-VPN ! (Registered 2018-05-25) You are not logged in 
CSDb User Forums

Forums > C64 Coding > Fast large multiplies
2012-06-09 19:45

Registered: Oct 2010
Posts: 138
Fast large multiplies

I've discovered some interesting optimizations for multiplying large numbers, if the multiply routine time depends on the bits of the mulitplier. Usually if there's a 1 bit in the multiplier, with a standard shift and add routine, there's a "bit" more time or that bit.
The method uses several ways of transforming the input to have less 1 bits. Normally, if every value appears equally, you average half 1 bits. In my case, that becomes the worst case, and there's about a quarter 1 bits. This can speed up any routine, even the one that happens to be in rom, by using pre- and post- processing of results. The improvement is about 20%.
Another speedup is optimizing the same multiplier applied to multiple multiplicands. This saves a little in processing the multiplier bits once. This can save another 15%.
Using the square table method will be faster but use a lot of data and a lot of code.
Would anyone be interested in this?

... 77 posts hidden. Click here to view all posts....
2017-04-12 08:42

Registered: Aug 2004
Posts: 786

I test the multiply using a set of 20 randomly selected 32 bit integers, comparing the product of each to a known result, and I time each multiply using CIA#2 with screen blanked and IRQs disabled. So, it's not comprehensive, but I'm reasonably confident.

Yes, generalising for a set of operand widths would probably be quite handy. Good idea instrumenting the generator to calculate the likely execution time. I've been meaning to do something similar with another project..

The post fixup is fine if it's the last addition performed, then the carry from that addition can propagate through to the next column.

Do you mean 65816? I started learning that once, but SuperCPU never really took off, and I tend to ignore most platforms between C64 and modern hardware these days, at least for hobby coding. REU last year was already something of a leap for me.
2017-04-12 12:14

Registered: Oct 2010
Posts: 138
Btw, one more idea: squaring and cubing based on this can be optimized significantly as well.
2017-04-13 08:42

Registered: Oct 2010
Posts: 138
Test your routine with these magic numbers:
00 00 54 56
00 00 03 03

If my theory is correct, that's the only case that will fail (not really, just the only one I'm bothering to solve for).
It's quite a subtle bug and random values won't catch it, you need 'whitebox' testing.

The failure is rare and of the form that the high byte is higher by 1.
2017-04-13 09:47

Registered: Aug 2004
Posts: 786
Still works :)

Thanks for the test case.

Here are the first two columns of code (and remember that my g(x)=0x4000-f(x-255) ):
    ldy mT2+0
    lda (zp_fl0),y
    adc (zp_gl0),y
    sta mRes+0

    lda (zp_fh0),y
    adc (zp_gh0),y
    bcc *+3
    adc (zp_fl1),y
    bcc *+3
    adc (zp_gl1),y
    bcc *+3
    ldy mT2+1
    adc (zp_fl0),y
    bcc *+3
    adc (zp_gl0),y
    bcc *+3
    sbc id+$3f,x
    sta mRes+1

The inverse borrow from the final SBC carries forward to the next column; the SBC itself corrects for the false carries while also compensating for the excess 0x40 in the high byte of the g() table.
2017-04-13 10:18

Registered: Oct 2010
Posts: 138
Oh, I know why it works, I constructed those special values for the normal sense, I mean
   54 56
   03 03
   01 xx
00 ff
00 ff

The whole point was to get those 3 partials to be added, ff+ff+01. Where you are adding with offset, I have to construct the multipliers differently. Not only that, but I'm doubly wrong here - I need to find multipliers which cause the f(x)'s to result that way (where my example works only on the production of f()-g()).

I'll have to finish this later. In the meantime, I suggest you test every possible 16x16. Not so easy I know, I had to write such things in a 6502 simulator in C, actually just simulated the few instructions I needed, but there's a source code out there you could use for a full simulator.
2017-04-13 13:24

Registered: Aug 2004
Posts: 786
I'm going to have to think some more about how to synthesise the equivalent test case.

I did start coding an exhaustive test in 6502 (can determine the required result just by repeated adds; a*(b+1)=a*b+b), then realised it wasn't 2**16 test cases but 2*32. Even at 30x realtime that would take VICE 28 hours assuming 700 cycles per iteration..
2017-04-14 00:55

Registered: Oct 2010
Posts: 138
00 01 02 03 * 04 05 06 07 and manipulate the tables to what you want to test adds for every branch, and number of carries per column up to 14, think that should do it.
2017-04-14 15:15

Registered: Aug 2004
Posts: 786
Had a thought this morning - the difference of squares is already well established, the only thing that really needs testing is the carry handling for each column. I'll post about that over at sets of add/sub shortly.
2017-04-14 20:36

Registered: Oct 2010
Posts: 138
That's basically what I just said - multiplying is just adding from a table. Test coverage would include each carry and each amount of carries per column.
2017-04-15 06:35

Registered: Aug 2004
Posts: 786
Quote: That's basically what I just said - multiplying is just adding from a table. Test coverage would include each carry and each amount of carries per column.

Fair point - I guess I got distracted by your talk of table manipulation.

Posting some analysis of the individual carries in the other thread shortly.

But back to multiplies - I was curious as to how you got away with not offsetting the g() table, then it finally struck me - using SBC instead of ADC is exactly equivalent to doing an ADC of a $ffff-g() table.

Do you have working code yet? I would expect you too need a different offset for each column.
Previous - 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 - Next
RefreshSubscribe to this thread:

You need to be logged in to post in the forum.

Search the forum:
Search   for   in  
All times are CET.
Search CSDb
Users Online
Knight Rider/Excess
Guests online: 23
Top Demos
1 Uncensored  (9.7)
2 Edge of Disgrace  (9.7)
3 Coma Light 13  (9.6)
4 The Shores of Reflec..  (9.6)
5 Comaland 100%  (9.6)
6 We Come in Peace  (9.6)
7 Lunatico  (9.6)
8 Incoherent Nightmare  (9.5)
9 Wonderland XII  (9.5)
10 Wonderland XIII  (9.5)
Top onefile Demos
1 FMX Music Demo  (9.5)
2 Pandemoniac Part 2 o..  (9.5)
3 Daah, Those Acid Pil..  (9.5)
4 Dawnfall V1.1  (9.5)
5 Treu Love [reu]  (9.5)
6 In Memoriam BHF  (9.5)
7 Merry Xmas 2017  (9.4)
8 Dawnfall  (9.4)
9 SWiRL  (9.4)
10 Synthesis  (9.4)
Top Groups
1 Oxyron  (9.4)
2 Booze Design  (9.4)
3 Censor Design  (9.4)
4 Finnish Gold  (9.4)
5 Crest  (9.3)
Top NTSC-Fixers
1 Pudwerx  (10)
2 Horizon  (9.8)
3 The Mind Slayer  (9.7)
4 The Shadow  (9.7)
5 Stormbringer  (9.6)

Home - Disclaimer
Copyright © No Name 2001-2018
Page generated in: 0.054 sec.