Log inRegister an accountBrowse CSDbHelp & documentationFacts & StatisticsThe forumsAvailable RSS-feeds on CSDbSupport CSDb Commodore 64 Scene Database
You are not logged in - nap
CSDb User Forums


Forums > C64 Coding > Fast large multiplies
2012-06-09 19:45
Repose

Registered: Oct 2010
Posts: 225
Fast large multiplies

I've discovered some interesting optimizations for multiplying large numbers, if the multiply routine time depends on the bits of the mulitplier. Usually if there's a 1 bit in the multiplier, with a standard shift and add routine, there's a "bit" more time or that bit.
The method uses several ways of transforming the input to have less 1 bits. Normally, if every value appears equally, you average half 1 bits. In my case, that becomes the worst case, and there's about a quarter 1 bits. This can speed up any routine, even the one that happens to be in rom, by using pre- and post- processing of results. The improvement is about 20%.
Another speedup is optimizing the same multiplier applied to multiple multiplicands. This saves a little in processing the multiplier bits once. This can save another 15%.
Using the square table method will be faster but use a lot of data and a lot of code.
Would anyone be interested in this?

 
... 144 posts hidden. Click here to view all posts....
 
2022-01-03 22:48
JackAsser

Registered: Jun 2002
Posts: 2014
Quote: Thanks for the tip! I hadn't considered it in details. However, why use 2's complement? I was thinking to just keep the sign in a byte. When multiplying, it's just EORing the bytes then. Adding/subtracting could cause an overflow, which is the only case that needs a fix.

I didn't really though much about it and just used the hint from that article. EOR doesn't suffice. EOR+1 though..

And it also depends on if the inputs are both signed (no EOR+1 needed) or if just one of them is.
2022-01-03 23:23
Krill

Registered: Apr 2002
Posts: 2980
Quoting Repose
Jackasser is right, initially I am focused on pure speed, but I realize that smaller options would be useful.
Yeah, the reason i'm asking is drive-code, of course.

I've had a co-processor drive-code maths library on my list for years, and now that the plumbing (proper custom-drive code API in the IRQ loader) is ready, building stuff on top of that (like the saver plug-in) is quite easy.

Demos using the drive to speed up a certain class of algorithms are still quite rare, after all these years, but there may be a flower pot or two to win, still. =)
2022-01-04 05:15
Repose

Registered: Oct 2010
Posts: 225
Quote: Quoting Repose
Jackasser is right, initially I am focused on pure speed, but I realize that smaller options would be useful.
Yeah, the reason i'm asking is drive-code, of course.

I've had a co-processor drive-code maths library on my list for years, and now that the plumbing (proper custom-drive code API in the IRQ loader) is ready, building stuff on top of that (like the saver plug-in) is quite easy.

Demos using the drive to speed up a certain class of algorithms are still quite rare, after all these years, but there may be a flower pot or two to win, still. =)


Cool! I've thought of this as well, and it's always been my dream to make either a RAID array or CPU multiprocessing cluster. I put it aside though, when I realized how slow the communication was. It would work best for something like fractals where you do a lot of loops, with small data. Now, on an expanded drive, especially a 1571, that would be quite useful.
Even further, I thought of more database functions a drive could do, or searching, compressing files.
2022-01-04 10:24
Krill

Registered: Apr 2002
Posts: 2980
Quoting Repose
Cool! I've thought of this as well, and it's always been my dream to make either a RAID array or CPU multiprocessing cluster. I put it aside though, when I realized how slow the communication was. It would work best for something like fractals where you do a lot of loops, with small data. Now, on an expanded drive, especially a 1571, that would be quite useful.
Even further, I thought of more database functions a drive could do, or searching, compressing files.
I was also thinking of a computer cluster, with a raytracing/raycasting application to showcase.
It's trivially parallelisable in theory*, highly scalable on many dimensions, and you can put all kinds of programmable devices on the bus, including a C-128 (or C16/+4, etc.) on the other end. =)

* In practice, there's one big problem i've been thinking on and off about but not solved yet: the bus arbitration.
The computers on either end should not just poll all devices but do work themselves and be notified whenever some other work is completed, but that gives rise to race conditions and other problems.
2022-01-04 13:03
Repose

Registered: Oct 2010
Posts: 225
Quote: Quoting Repose
Cool! I've thought of this as well, and it's always been my dream to make either a RAID array or CPU multiprocessing cluster. I put it aside though, when I realized how slow the communication was. It would work best for something like fractals where you do a lot of loops, with small data. Now, on an expanded drive, especially a 1571, that would be quite useful.
Even further, I thought of more database functions a drive could do, or searching, compressing files.
I was also thinking of a computer cluster, with a raytracing/raycasting application to showcase.
It's trivially parallelisable in theory*, highly scalable on many dimensions, and you can put all kinds of programmable devices on the bus, including a C-128 (or C16/+4, etc.) on the other end. =)

* In practice, there's one big problem i've been thinking on and off about but not solved yet: the bus arbitration.
The computers on either end should not just poll all devices but do work themselves and be notified whenever some other work is completed, but that gives rise to race conditions and other problems.


This is a common problem which has been solved. For ethernet, it's called CSMA/CD. Then there's the method from CAN, the 2 wire car bus, which is a better protocol.
https://copperhilltech.com/blog/controller-area-network-can-bus..
2022-01-04 13:20
Krill

Registered: Apr 2002
Posts: 2980
Quoting Repose
This is a common problem which has been solved. For ethernet, it's called CSMA/CD. Then there's the method from CAN, the 2 wire car bus, which is a better protocol.
I am very well aware of that. But we're still talking about stock C-64 plus original periphery, aren't we?
2022-01-04 15:36
Repose

Registered: Oct 2010
Posts: 225
Sorry I wasn't clear, this CAN thing works on an I2C bus, to put it in simple terms, it works exactly the same way as the IEC bus. You should be able to translate the ideas regardless, there will always be something that works. If I get time I can explain it better. Besides which, the kernal itself does this, there is the c64 as the master and it can negotiate to talk to any number of drives.
2022-01-04 16:29
Krill

Registered: Apr 2002
Posts: 2980
Quoting Repose
If I get time I can explain it better.
Again, no need to. I've worked professionally with I2C and CAN for about a decade.
And those do bus arbitration in hardware, doing this in software efficiently is the problem.

Quoting Repose
Besides which, the kernal itself does this, there is the c64 as the master and it can negotiate to talk to any number of drives.
Afaik, bus arbitration is not a thing with single-master buses, like when only the C-64 acts as the master (the drives cannot assert the ATN line).
Different thing with a multi-master bus when any participant may wish to talk at any time.
2022-01-04 17:34
Repose

Registered: Oct 2010
Posts: 225
Continued in Multi-master on IEC bus
2022-01-07 07:36
Repose

Registered: Oct 2010
Posts: 225
Quoting JackAsser
I didn't really though much about it and just used the hint from that article. EOR doesn't suffice. EOR+1 though..

And it also depends on if the inputs are both signed (no EOR+1 needed) or if just one of them is.


Hi,
I didn't get back to you on this. I didn't mean to EOR the product, I meant to EOR the signs.
X Y P
+ + +
- + -
+ - -
- - +

So if the signs were stored with 0=+ and 1=-, then EORing the signs would give the sign of the result.
When multiplying, there is no fixing of the products; they always stay as a "positive" number and don't lose the sign bit.
Add/sub would be slower, you need to know their relative magnitudes.
Previous - 1 | ... | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 - Next
RefreshSubscribe to this thread:

You need to be logged in to post in the forum.

Search the forum:
Search   for   in  
All times are CET.
Search CSDb
Advanced
Users Online
Acidchild/Padua
Unlock/Padua/Albion
csabanw
Scrap/Genesis Project
Guests online: 104
Top Demos
1 Next Level  (9.7)
2 13:37  (9.7)
3 Mojo  (9.7)
4 Coma Light 13  (9.6)
5 Edge of Disgrace  (9.6)
6 What Is The Matrix 2  (9.6)
7 The Demo Coder  (9.6)
8 Uncensored  (9.6)
9 Comaland 100%  (9.6)
10 Wonderland XIV  (9.6)
Top onefile Demos
1 Layers  (9.6)
2 No Listen  (9.6)
3 Party Elk 2  (9.6)
4 Cubic Dream  (9.6)
5 Copper Booze  (9.6)
6 Rainbow Connection  (9.5)
7 Dawnfall V1.1  (9.5)
8 Onscreen 5k  (9.5)
9 Morph  (9.5)
10 Libertongo  (9.5)
Top Groups
1 Performers  (9.3)
2 Booze Design  (9.3)
3 Oxyron  (9.3)
4 Triad  (9.3)
5 Censor Design  (9.3)
Top Swappers
1 Derbyshire Ram  (10)
2 Jerry  (9.8)
3 Violator  (9.7)
4 Acidchild  (9.7)
5 Cash  (9.6)

Home - Disclaimer
Copyright © No Name 2001-2024
Page generated in: 0.048 sec.