Log inRegister an accountBrowse CSDbHelp & documentationFacts & StatisticsThe forumsAvailable RSS-feeds on CSDbSupport CSDb Commodore 64 Scene Database
You are not logged in - nap
CSDb User Forums


Forums > C64 Coding > Fast large multiplies
2012-06-09 19:45
Repose

Registered: Oct 2010
Posts: 227
Fast large multiplies

I've discovered some interesting optimizations for multiplying large numbers, if the multiply routine time depends on the bits of the mulitplier. Usually if there's a 1 bit in the multiplier, with a standard shift and add routine, there's a "bit" more time or that bit.
The method uses several ways of transforming the input to have less 1 bits. Normally, if every value appears equally, you average half 1 bits. In my case, that becomes the worst case, and there's about a quarter 1 bits. This can speed up any routine, even the one that happens to be in rom, by using pre- and post- processing of results. The improvement is about 20%.
Another speedup is optimizing the same multiplier applied to multiple multiplicands. This saves a little in processing the multiplier bits once. This can save another 15%.
Using the square table method will be faster but use a lot of data and a lot of code.
Would anyone be interested in this?

 
... 145 posts hidden. Click here to view all posts....
 
2023-08-22 06:09
Repose

Registered: Oct 2010
Posts: 227
A diagram of the calculations:
               y1    y0
            x  x1    x0
               --------
            x0y0h x0y0l
      x0y1h x0y1l
      x1y0h x1y0l
x1y1h x1y1l
-----------------------
   z3    z2    z1    z0


A listing of the macros is quite long, as a lot of effort went into parsing the arguments the way I'm using them, plus some other features in my libary. Requires the latest C64 Studio 7.5, as there were a number of features I had added. Here is one of them:
!macro mult8_set_mier_snippet .mier {
;set multiplier as mier
;mier can be m/A
;requires .p_.sqr* and .p_.neg.sqr* pointers,
;(mathlib.p_sqr_lo, mathlib.p_neg_sqr_lo, mathlib.p_sqr_hi, mathlib.p_neg_sqr_hi)
;uses these macros from mathlib_util_macros.asm:
;reset_cycles, add_cycles_const
  !if .mier = "X" or .mier = "Y" {
    !error "multlib: mult8_snippet: mier must be m/A, not ", .mier
  } 
  +reset_cycles
  !if .mier != "A" {
    lda .mier
    !if .mier<$100 {
        +add_cycles_const 3
      } else {
        +add_cycles_const 4
      }
  }
  sta mathlib.p_sqr_lo ;3
  sta mathlib.p_sqr_hi ;3
  eor #$ff ;2
  sta mathlib.p_neg_sqr_lo ;3
  sta mathlib.p_neg_sqr_hi ;3; 3+3+3+2+3+3 = 17 cycles
  +add_cycles_const 14
}

Here is the order of multiplies I used:
mier = x1
(x1) * y0 -> x1y0l, x1y0h
(x1) * y1 -> x1y1l, z3
mier = x0
(x0) * (y1) -> x0y1l, X
(x0) * y0 -> z0, A
2023-08-22 13:32
Frantic

Registered: Mar 2003
Posts: 1661
Very good!
2023-08-23 04:05
ChristopherJam

Registered: Aug 2004
Posts: 1423
Nice work, Repose!

Going to have to take another run at this one of these days..
2023-12-11 09:35
Repose

Registered: Oct 2010
Posts: 227
I can now support my claim with some confidence that I have achieved a world record for a published 16x16=32 unsigned mult.

Even my old routine has won out of 120 routines tested independently.

Take a look at the entry mult15.a here:
https://github.com/TobyLobster/multiply_test#the-results

This was a thrilling moment for me :)

But in general, its interesting to read about the various approaches, and to be fair, if you need a smaller routine, there's better choices.
2023-12-11 10:00
Raistlin

Registered: Mar 2007
Posts: 757
Very nice write up, Repose!
2023-12-11 12:51
Frantic

Registered: Mar 2003
Posts: 1661
Quote: I can now support my claim with some confidence that I have achieved a world record for a published 16x16=32 unsigned mult.

Even my old routine has won out of 120 routines tested independently.

Take a look at the entry mult15.a here:
https://github.com/TobyLobster/multiply_test#the-results

This was a thrilling moment for me :)

But in general, its interesting to read about the various approaches, and to be fair, if you need a smaller routine, there's better choices.


Congratulations! I totally get that it must have been thrilling. :D
2023-12-11 13:57
Krill

Registered: Apr 2002
Posts: 3083
Quoting Repose
The time is 188.1 cycles, averaged over all possible inputs.
Pretty impressive. :)

It seems that most of the required memory of about 2 KB is spent by tables (which are generated), not so much the code itself, right?

If so, i guess this 16x16=32 routine could come in quite handy for something like https://github.com/exoticorn/upkr which still doesn't seem to have a C-64/plain 6502 unpacker (while 6502 unpackers using hardware-accelerated mul exist).
2023-12-11 14:09
Repose

Registered: Oct 2010
Posts: 227
The code seems to be 120 bytes and the tables are 2044 bytes. I think I need to add some 30 more to create the tables.
2023-12-12 09:31
JackAsser

Registered: Jun 2002
Posts: 2038
Very very nice write up @Repose!
2023-12-12 17:47
Repose

Registered: Oct 2010
Posts: 227
Bruh! You were my hero before I started this :) You held the record before me, and still we are both way ahead of most posts and published books. Respects :)
Previous - 1 | ... | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 - Next
RefreshSubscribe to this thread:

You need to be logged in to post in the forum.

Search the forum:
Search   for   in  
All times are CET.
Search CSDb
Advanced
Users Online
iAN CooG/HVSC
Scrap/Genesis Project
JEZ
Enforcer/Deers
Didi/Laxity
master_hacker
Guests online: 219
Top Demos
1 Next Level  (9.7)
2 13:37  (9.7)
3 Codeboys & Endians  (9.7)
4 Mojo  (9.6)
5 Coma Light 13  (9.6)
6 Edge of Disgrace  (9.6)
7 Signal Carnival  (9.6)
8 Uncensored  (9.5)
9 Wonderland XIV  (9.5)
10 No Bounds  (9.5)
Top onefile Demos
1 Nine  (9.7)
2 Layers  (9.6)
3 Cubic Dream  (9.6)
4 Party Elk 2  (9.6)
5 Copper Booze  (9.5)
6 Scan and Spin  (9.5)
7 Onscreen 5k  (9.5)
8 Grey  (9.5)
9 Dawnfall V1.1  (9.5)
10 Rainbow Connection  (9.5)
Top Groups
1 Artline Designs  (9.3)
2 Booze Design  (9.3)
3 Oxyron  (9.3)
4 Performers  (9.3)
5 Censor Design  (9.3)
Top Crackers
1 Mr. Z  (9.9)
2 OTD  (9.8)
3 Antitrack  (9.8)
4 Fungus  (9.8)
5 S!R  (9.8)

Home - Disclaimer
Copyright © No Name 2001-2025
Page generated in: 0.291 sec.