 
Strobe
Registered: Jun 2007 Posts: 6 
Fast division with constant dividend?
Is there some trick way to quickly do division with a constant dividend?
i.e. result = constant / variable
There's oodles of info out there on dividing _by_ a constant but this is the opposite.
In my case my input variable is <= 14bits and useful values of output after clamping are <=8bits (7bits actually) so I _could_ burn 16KB RAM on a lookup table but I'd rather not..
Thanks. 

 
ChristopherJam
Registered: Aug 2004 Posts: 1174 
Hrm. One of the things that makes constant divisor tractable is it's basically a multiply in disguise. Your question however still needs the reciprocal computed.
If you only need a 7bit result, that's only going to depend on the seven or eight bits below the most significant '1' bit in the divisor, and the position of that bit.. so you might be able to use a few one page tables.
Do you have some examples of the sorts of constants and input ranges you're looking at? 
 
Groepaz
Registered: Dec 2001 Posts: 9970 
I'd check if you cant transform into fixedpoint multiplication (and use the usual "very fast multiply" with tables of squares) 
 
Krill
Registered: Apr 2002 Posts: 1973 
Afaik, apart from a lookuptable, there is no general optimisation for 1/N, regardless of a scaling factor (your constant to divide).
Depending on your desired accuracy, there are a number of possible approaches with smaller lookup tables than 16 KB and possibly some multiplication (which does have more optimisation potential), including hybrids.
E.g., you could use the top 8 bits of your variable for a table lookup (256 bytes), then linearly interpolate between the lookup result and the next entry in the table by scaling the low bits of your argument (those not used for the lookup) into that range and adding this to the lookedup value.
Then there might be some other approaches based on log/exp, such as using the argument's topbit to scale the result from a smaller div lookup table with shifting.
Question is if any of these or similar approaches would be faster than your current solution. 
 
Peiselulli
Registered: Oct 2006 Posts: 77 
Another option:
Build a table with 128 entries, each repesrenting one of the possible results. Each of this 128 entries represents the lowest value that results to its corresponding 7 bit value. Use "binsearch" with "variable" to the get the right index between this boundaries stored in the table. Maye the upper 6 bits of your 14 bit value can be used to speed up the binsearch with another small table. 
 
Strobe
Registered: Jun 2007 Posts: 6 
Thanks cjam / gpz / krill
Constant is currently ~15000 but can change at compile time, variable is literally anything from 0 to ~$3fff.
A set of tables based on the significant bit is something that I haven't tried, I'll give that a go. Should be 1.75k or so of tables for 06 significant bits instead of 16k. Might even be able to get away with only 7 bits of the lower byte as significant to half the table size again.
I'm not sure interpolating is going to work, as the input values approach zero you wind up with a broader range of valid output values, so even using 8 bits of the input as an index leaves a lot of potentially valid values to interpolate through, and other than a straight average with adds/shifts that sounds like more maths which is the original problem..
My current "better but not great" approach is linearly searching a table of the thresholds of each resultant value, using the high byte of the input as an index to the starting point. Needs about 320 bytes of tables. But has a worst case searching ~60 values. That seemed like a really ugly hack and I thought there must be a faster and more elegant solution... will try cjam's set of tables one. 
 
Krill
Registered: Apr 2002 Posts: 1973 
Quoting StrobeI'm not sure interpolating is going to work, as the input values approach zero you wind up with a broader range of valid output values, so even using 8 bits of the input as an index leaves a lot of potentially valid values to interpolate through, and other than a straight average with adds/shifts that sounds like more maths which is the original problem.. Not sure whether i got my point across there. :)
I was proposing this:
Consider the nearzero divisor cases you mentioned, such as
15000 / 0x100 = 58 (floored) through
15000 / 0x1ff = 29 (floored), and also
15000 / 0x200 = 29 (floored).
Our final result will be anywhere between 58 (table[1]) and 29 (table[2]), depending on the divisor's lowbyte.
Let's interpolate linearly:
The divisor's lowbyte is multiplied by 58  29 = 29.
This is an 8.0*8.0=8.8 operation, but we're casually discarding the lowbyte, so it's an 0.8*8.0=8.0 scaling mul.
See https://codebase64.org/doku.php?id=base:8bit_multiplication_16b.. (which might be sped up even further when neglecting the result's lowbyte.)
Finally, subtract the mul result's highbyte from 58.
So, we're approximating the entire C/N function you require by piecing together straight lines, with the "joints" at the divisor argument's $xx00 points. 
 
Groepaz
Registered: Dec 2001 Posts: 9970 
perhaps using 16384 for the constant could also help? 
 
Krill
Registered: Apr 2002 Posts: 1973 
Quoting Groepazperhaps using 16384 for the constant could also help? This might improve accuracy on the log/shiftbased approaches. Consider:$4000 / $10 = $0400
$4000 / $18 = $02aa
$4000 / $20 = $0200
$4000 / $30 = $0155
$4000 / $40 = $0100
$4000 / $60 = $00aa
$4000 / $80 = $0080 
 
Groepaz
Registered: Dec 2001 Posts: 9970 
now i expect someone like Quiss to post a totally rad solution using ARR maybe? :) 
 
Quiss
Registered: Nov 2016 Posts: 22 
Quoting Groepaz
now i expect someone like Quiss to post a totally rad solution using ARR maybe? :)
I wish. Not sure about ARR, though RRA feels like it would have the potential to speed up bitwise multiplication/division. But I have never managed to make that work. That extra carry always spoils things. 