Log inRegister an accountBrowse CSDbHelp & documentationFacts & StatisticsThe forumsAvailable RSS-feeds on CSDbSupport CSDb Commodore 64 Scene Database
You are not logged in - nap
CSDb User Forums


Forums > C64 Coding > Fast large multiplies
2012-06-09 19:45
Repose

Registered: Oct 2010
Posts: 225
Fast large multiplies

I've discovered some interesting optimizations for multiplying large numbers, if the multiply routine time depends on the bits of the mulitplier. Usually if there's a 1 bit in the multiplier, with a standard shift and add routine, there's a "bit" more time or that bit.
The method uses several ways of transforming the input to have less 1 bits. Normally, if every value appears equally, you average half 1 bits. In my case, that becomes the worst case, and there's about a quarter 1 bits. This can speed up any routine, even the one that happens to be in rom, by using pre- and post- processing of results. The improvement is about 20%.
Another speedup is optimizing the same multiplier applied to multiple multiplicands. This saves a little in processing the multiplier bits once. This can save another 15%.
Using the square table method will be faster but use a lot of data and a lot of code.
Would anyone be interested in this?

 
... 144 posts hidden. Click here to view all posts....
 
2021-12-15 11:59
Repose

Registered: Oct 2010
Posts: 225
Me too, just yesterday!
https://media.digikey.com/pdf/Data%20Sheets/Parallax%20PDFs/uM-..

The uM-FPU is a 32-bit floating point coprocessor that can be easily interfaced with microcontrollers to provide
support for 32-bit IEEE 754 floating point operations and long integer operations. The uM-FPU is easy to
connect using either an I2C or SPI compatible interface.

Which can be connected to the user port. I believe the serial port would work, then it's 16 cycles per byte.
2021-12-15 13:23
Krill

Registered: Apr 2002
Posts: 2980
The only acceptable arithmetics accelerator is the 6502 in the disk drive. =)
2021-12-15 15:18
Martin Piper

Registered: Nov 2007
Posts: 722
Quote: Me too, just yesterday!
https://media.digikey.com/pdf/Data%20Sheets/Parallax%20PDFs/uM-..

The uM-FPU is a 32-bit floating point coprocessor that can be easily interfaced with microcontrollers to provide
support for 32-bit IEEE 754 floating point operations and long integer operations. The uM-FPU is easy to
connect using either an I2C or SPI compatible interface.

Which can be connected to the user port. I believe the serial port would work, then it's 16 cycles per byte.


I was thinking more like a MC68882 :)
But finding something similar operating at 5V, with parallel address/data rather than serial and still in production, is tough.
2021-12-15 19:52
Repose

Registered: Oct 2010
Posts: 225
Some kind of extra SPI to parallel circuit would help. Since the chip I mentioned has 4MHz serial speed, it's fast enough to read a parallel port at full speed.
https://highfieldtales.wordpress.com/2011/12/04/very-fast-spi-t..

In the end, what do we want though? You could make a GPU cart for the c64, which runs it's own 2d/3d graphic engine, where you simply define the objects. The motions, display updates, collisions, could all be handled automatically, all you need to to keep score, play music, run the display mode, and handle joystick. Just like Blue REU, the output could be stuffed directly to display memory using an advanced FLI mode with practically full color. It would be like a brand new game machine in 160x200x16.

Or, it could be an assist, using memory mapped registers as you describe, and leave the c64 more in control.

I think of this a lot - do I even want to buy a c64 ultimate? It's convenient and fun to play demos and games with full power, but I think I still want to do obscure hardware hacks and real assists. For example, I've always wanted to use the CIA to measure variations in 50/60 Hz power line frequency. I want to attach the serial port to an SD card and bit bang the storage.
2021-12-15 20:27
tlr

Registered: Sep 2003
Posts: 1790
Quoting Repose
Some kind of extra SPI to parallel circuit would help. Since the chip I mentioned has 4MHz serial speed, it's fast enough to read a parallel port at full speed.
https://highfieldtales.wordpress.com/2011/12/04/very-fast-spi-t..

You do already have two CIA synchronous serial ports though, maybe they could be used?
2021-12-16 00:53
Repose

Registered: Oct 2010
Posts: 225
That's what I suggested earlier - just to be clear, Martin wanted a fast, efficient ALU ability by writing arguments to memory, which would be latched until the LSB were written, then immediately read the product back. I digressed a little by suggesting a GPU cart, which takes the idea of assisted computing to it's logical conclusion, and I felt it's a bit pointless by then.

I then suggested to use the serial port to communicate with the SPI port of a proposed ALU (which is still in stock). This would be a bit slow, as I think the SP port could only clock out in 16 cycles.

To overcome that difficulty, one could use a circuit or chip to clock at the full 4MHz of the ALU chip, and leave the result at the parallel port of the CIA, on the Userport, somehow.

To be fair, many functions can be done fairly quickly with large tables, using either the Ramlink or REU, and I wrote sample code for both. There are also methods documented at https://wilsonminesco.com/16bitMathTables/
Large Look-up Tables for Hyperfast, Accurate, 16-Bit Scaled-Integer Math
Plus: The Unexpected Power of Scaled-Integer Math
2021-12-16 11:54
tlr

Registered: Sep 2003
Posts: 1790
Quoting Repose
That's what I suggested earlier - just to be clear, Martin wanted a fast, efficient ALU ability by writing arguments to memory, which would be latched until the LSB were written, then immediately read the product back. I digressed a little by suggesting a GPU cart, which takes the idea of assisted computing to it's logical conclusion, and I felt it's a bit pointless by then.

I then suggested to use the serial port to communicate with the SPI port of a proposed ALU (which is still in stock). This would be a bit slow, as I think the SP port could only clock out in 16 cycles.

I obviously should have read more of the thread. :)

I think it'll take 32 cycles. 250kbit/s max IIRC.
2021-12-16 14:59
Martin Piper

Registered: Nov 2007
Posts: 722
I was thinking a 74LS165 or MCP23S08 might be able to talk to the uM-FPU 32-bit floating point coprocessor and provide an interface to the C64. Either via user port or cartridge port.
2021-12-31 12:15
Repose

Registered: Oct 2010
Posts: 225
Regarding convolutions, I remembered the term I was looking for. There is a fast convolution credited to Winograd. It works out to a lot of additions. I think it might be slow. At the end, I think you have to reduce overflowed digits into a mod plus carry to the next digit.
2022-01-03 16:53
Repose

Registered: Oct 2010
Posts: 225
I've been working hard on my mathlib. I think I've worked out all the use cases and made completely flexible macros that work together in a system, almost like a macro language. These can produce fully optimal code in an easy way.
Covered:
-squaring
-constant multipliers
-re-using multipliers
-distributed multiplication (a*(b,c))
-chained multiplication (a*b*c)
-zp and self-mod versions; and they are the same speed
-use cases for multiple pointers
-cycle timing variables that respond to all variations
-automatic carry flag tracking/setting
-custom table generation for built-in constant add/multiply
-low-precision outputs with optional rounding

To do:
-update test program
-verify the 32x32
-add signed and other versions
-squaring
-polynomials

At least a few more weeks to release something I'm satisfied with :)
I also discovered xc_basic and I'd like to add my library to their source, which is using a simple shift/add loop.
Previous - 1 | ... | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 - Next
RefreshSubscribe to this thread:

You need to be logged in to post in the forum.

Search the forum:
Search   for   in  
All times are CET.
Search CSDb
Advanced
Users Online
Steffan/BOOM!
Mike
iAN CooG/HVSC
deetsay
Twilight/Excess/Arcade
XmikeX
Airwolf/F4CG
Copyfault/Extend^tsn..
Walt/Bonzai
New Design/Excess
Linus/MSL
Bansai/BSILabs
Guests online: 145
Top Demos
1 Next Level  (9.7)
2 13:37  (9.7)
3 Mojo  (9.7)
4 Coma Light 13  (9.6)
5 Edge of Disgrace  (9.6)
6 What Is The Matrix 2  (9.6)
7 The Demo Coder  (9.6)
8 Uncensored  (9.6)
9 Comaland 100%  (9.6)
10 Wonderland XIV  (9.6)
Top onefile Demos
1 Layers  (9.6)
2 No Listen  (9.6)
3 Cubic Dream  (9.6)
4 Party Elk 2  (9.6)
5 Copper Booze  (9.6)
6 Dawnfall V1.1  (9.5)
7 Rainbow Connection  (9.5)
8 Onscreen 5k  (9.5)
9 Morph  (9.5)
10 Libertongo  (9.5)
Top Groups
1 Performers  (9.3)
2 Booze Design  (9.3)
3 Oxyron  (9.3)
4 Triad  (9.3)
5 Censor Design  (9.3)
Top Musicians
1 Rob Hubbard  (9.7)
2 Mutetus  (9.7)
3 Jeroen Tel  (9.7)
4 Linus  (9.6)
5 Stinsen  (9.6)

Home - Disclaimer
Copyright © No Name 2001-2024
Page generated in: 0.064 sec.