[CSDb] - User Forums - Fast division with constant dividend?

You are not logged in - nap

CSDb User Forums

Forums > C64 Coding > Fast division with constant dividend?

2021-06-29 05:25

Strobe

Registered: Jun 2007
Posts: 6

Fast division with constant dividend?

Is there some trick way to quickly do division with a constant dividend?

i.e. result = constant / variable

There's oodles of info out there on dividing _by_ a constant but this is the opposite.

In my case my input variable is <= 14bits and useful values of output after clamping are <=8bits (7bits actually) so I _could_ burn 16KB RAM on a lookup table but I'd rather not..

Thanks.

2021-06-29 11:51

ChristopherJam

Registered: Aug 2004
Posts: 1378

Hrm. One of the things that makes constant divisor tractable is it's basically a multiply in disguise. Your question however still needs the reciprocal computed.

If you only need a 7bit result, that's only going to depend on the seven or eight bits below the most significant '1' bit in the divisor, and the position of that bit.. so you might be able to use a few one page tables.

Do you have some examples of the sorts of constants and input ranges you're looking at?

2021-06-29 12:03

chatGPZ

Registered: Dec 2001
Posts: 11108

I'd check if you cant transform into fixedpoint multiplication (and use the usual "very fast multiply" with tables of squares)

2021-06-29 12:03

Krill

Registered: Apr 2002
Posts: 2839

Afaik, apart from a lookup-table, there is no general optimisation for 1/N, regardless of a scaling factor (your constant to divide).

Depending on your desired accuracy, there are a number of possible approaches with smaller lookup tables than 16 KB and possibly some multiplication (which does have more optimisation potential), including hybrids.

E.g., you could use the top 8 bits of your variable for a table lookup (256 bytes), then linearly interpolate between the lookup result and the next entry in the table by scaling the low bits of your argument (those not used for the lookup) into that range and adding this to the looked-up value.

Then there might be some other approaches based on log/exp, such as using the argument's top-bit to scale the result from a smaller div lookup table with shifting.

Question is if any of these or similar approaches would be faster than your current solution.

2021-06-29 14:09

Peiselulli

Registered: Oct 2006
Posts: 81

Another option:
Build a table with 128 entries, each repesrenting one of the possible results. Each of this 128 entries represents the lowest value that results to its corresponding 7 bit value. Use "binsearch" with "variable" to the get the right index between this boundaries stored in the table. Maye the upper 6 bits of your 14 bit value can be used to speed up the binsearch with another small table.

2021-06-29 14:50

Strobe

Registered: Jun 2007
Posts: 6

Thanks cjam / gpz / krill

Constant is currently ~15000 but can change at compile time, variable is literally anything from 0 to ~$3fff.

A set of tables based on the significant bit is something that I haven't tried, I'll give that a go. Should be 1.75k or so of tables for 0-6 significant bits instead of 16k. Might even be able to get away with only 7 bits of the lower byte as significant to half the table size again.

I'm not sure interpolating is going to work, as the input values approach zero you wind up with a broader range of valid output values, so even using 8 bits of the input as an index leaves a lot of potentially valid values to interpolate through, and other than a straight average with adds/shifts that sounds like more maths which is the original problem..

My current "better but not great" approach is linearly searching a table of the thresholds of each resultant value, using the high byte of the input as an index to the starting point. Needs about 320 bytes of tables. But has a worst case searching ~60 values. That seemed like a really ugly hack and I thought there must be a faster and more elegant solution... will try cjam's set of tables one.

2021-06-29 16:09

Krill

Registered: Apr 2002
Posts: 2839

Quoting Strobe

I'm not sure interpolating is going to work, as the input values approach zero you wind up with a broader range of valid output values, so even using 8 bits of the input as an index leaves a lot of potentially valid values to interpolate through, and other than a straight average with adds/shifts that sounds like more maths which is the original problem..

Not sure whether i got my point across there. :)

I was proposing this:

Consider the near-zero divisor cases you mentioned, such as
15000 / 0x100 = 58 (floored) through
15000 / 0x1ff = 29 (floored), and also
15000 / 0x200 = 29 (floored).

Our final result will be anywhere between 58 (table[1]) and 29 (table[2]), depending on the divisor's low-byte.

Let's interpolate linearly:

The divisor's low-byte is multiplied by 58 - 29 = 29.
This is an 8.0*8.0=8.8 operation, but we're casually discarding the low-byte, so it's an 0.8*8.0=8.0 scaling mul.
See https://codebase64.org/doku.php?id=base:8bit_multiplication_16b.. (which might be sped up even further when neglecting the result's low-byte.)

Finally, subtract the mul result's high-byte from 58.

So, we're approximating the entire C/N function you require by piecing together straight lines, with the "joints" at the divisor argument's $xx00 points.

2021-06-29 16:19

chatGPZ

Registered: Dec 2001
Posts: 11108

perhaps using 16384 for the constant could also help?

2021-06-29 16:36

Krill

Registered: Apr 2002
Posts: 2839

Quoting Groepaz

perhaps using 16384 for the constant could also help?

This might improve accuracy on the log/shift-based approaches. Consider:

$4000 / $10 = $0400
$4000 / $18 = $02aa
$4000 / $20 = $0200
$4000 / $30 = $0155
$4000 / $40 = $0100
$4000 / $60 = $00aa
$4000 / $80 = $0080

2021-06-29 16:45

chatGPZ

Registered: Dec 2001
Posts: 11108

now i expect someone like Quiss to post a totally rad solution using ARR maybe? :)

2021-06-29 17:24

Quiss

Registered: Nov 2016
Posts: 37

Quoting Groepaz

now i expect someone like Quiss to post a totally rad solution using ARR maybe? :)

I wish. Not sure about ARR, though RRA feels like it would have the potential to speed up bit-wise multiplication/division. But I have never managed to make that work. That extra carry always spoils things.

Refresh

Subscribe to this thread: