| |
Krill
Registered: Apr 2002 Posts: 2982 |
Long division/modulo with byte-size divisor
So i needed something to divide a large integer (160 or more bits) by a small integer (5 bits) while also performing a modulo operation.
After a few rounds of optimisation, turns out the 6502 implementation is surprisingly compact and fast:
            ; dividend is in BIGNUM, little-endian
            ; divisor is in accu
            sta DIVISOR
            lda #0
            ldx #BIGNUMBYTES
longdivmod  ldy #8
            asl BIGNUM - 1,x
-           rol
            cmp DIVISOR
            bcc +
            sbc DIVISOR
+           rol BIGNUM - 1,x
            dey
            bne -
            dex
            bne longdivmod
            ; quotient is in BIGNUM, little-endian
            ; remainder is in accu
This runs in O(n) linear time. The routine can be executed continuously on the dividend/quotient to extract modulo values.
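For readers who want to check results off-target, here is a minimal Python reference model of the same idea (not the 6502 code itself). It works a byte at a time instead of a bit at a time, and the names `long_divmod` and `extract_digits` as well as the example value and the base 27 are made up for illustration.

```python
def long_divmod(bignum: bytearray, divisor: int) -> int:
    """Divide a little-endian bignum in place by a small divisor; return the remainder."""
    remainder = 0
    # Most significant byte first, i.e. walk the little-endian array from its end.
    for i in reversed(range(len(bignum))):
        value = (remainder << 8) | bignum[i]
        # remainder < divisor, so the per-byte quotient always fits in one byte.
        bignum[i], remainder = divmod(value, divisor)
    return remainder


def extract_digits(bignum: bytearray, base: int) -> list:
    """Run the division repeatedly on its own quotient, collecting the remainders."""
    digits = []
    while any(bignum):
        digits.append(long_divmod(bignum, base))
    return digits  # least significant digit first


number = 0x0123456789ABCDEF0123456789ABCDEF01234567  # arbitrary 160-bit example
work = bytearray(number.to_bytes(20, "little"))
digits = extract_digits(work, 27)
```

Each call leaves the quotient in the array, so the next call continues where the previous one stopped, which is the "executed continuously" use described above.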
It's also possible to have a big-endian long integer argument/result by simply iterating over the byte array in reverse order.
Of course, performance can be slightly improved (by up to 2 cycles per output bit) by sacrificing a few bytes to turn the two DIVISOR arguments into self-modified immediate operands. Or by unrolling the entire thing.
This shall go to Codebase64 at some point, of course, but there might be some more optimisation opportunities.
Also i think there are some requirements on the arguments, such that their MSB must be clear or so. |
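A rough way to probe the argument requirements is a Python model that mimics the 8-bit accumulator and the carry flag of the posted loop, step by step. This is a sketch under the assumption that the flag behaviour is modelled correctly; `longdivmod_8bit` and `check` are hypothetical helper names. In this model the only restriction that shows up is on the divisor, which has to be at most 128 ($80), because otherwise the `rol` on the accumulator can push a set bit 7 out and the following `cmp` overwrites that carry. The dividend's top bit does not seem to matter here.

```python
def longdivmod_8bit(bignum: bytearray, divisor: int) -> int:
    """Model of the posted loop with an 8-bit accumulator; returns the remainder."""
    a = 0
    for x in range(len(bignum), 0, -1):        # ldx #BIGNUMBYTES ... dex / bne
        m = bignum[x - 1]
        carry = m >> 7                          # asl BIGNUM - 1,x
        m = (m << 1) & 0xFF
        for _ in range(8):                      # ldy #8 ... dey / bne
            a = ((a << 1) | carry) & 0xFF       # rol: old bit 7 of A is lost
            carry = 1 if a >= divisor else 0    # cmp DIVISOR
            if carry:
                a -= divisor                    # sbc DIVISOR (carry set, so exact)
            next_carry = m >> 7                 # rol BIGNUM - 1,x
            m = ((m << 1) | carry) & 0xFF       # quotient bit in, dividend bit out
            carry = next_carry
        bignum[x - 1] = m
    return a


def check(value: int, size: int, divisor: int) -> bool:
    """True if the model agrees with Python's divmod for this input."""
    work = bytearray(value.to_bytes(size, "little"))
    remainder = longdivmod_8bit(work, divisor)
    return (int.from_bytes(work, "little"), remainder) == divmod(value, divisor)
```

For example, `check(256, 2, 255)` comes out false in this model, while divisors from 1 to 128 agree with `divmod` for the values tried.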
|
... 13 posts hidden ... |
| |
Krill
Registered: Apr 2002 Posts: 2982 |
Yeah cool, but the point was having arbitrarily large dividends to divide by an 8-bit divisor. =) |
| |
Fred
Registered: Feb 2003 Posts: 287 |
Right, the routine I provided is 16 bits dividend only. |
| |
Fred
Registered: Feb 2003 Posts: 287 |
Here is my version of a long division, with the dividend size in bytes specified by DIV_IN_BYTES:
; div
; input:
; - n-bytes dividend, little-endian
; - 8-bit divisor
; output:
; - n-bytes result stored in dividend
; - AC: remainder
; - XR: 0
; - YR: 0
DIV_IN_BYTES = 20
div         ldy #DIV_IN_BYTES * 8
            lda #0
-           clc
            ldx #-DIV_IN_BYTES & $ff
-           rol dividend + DIV_IN_BYTES - $100,x
            inx
            bmi -
            rol
            cmp divisor
            bcc +
            sbc divisor
            inc dividend
+           dey
            bne --
            rts
divisor     .byte $00
dividend    .byte $00, $00, $00, $00, $00, $00, $00, $00
            .byte $00, $00, $00, $00, $00, $00, $00, $00
            .byte $00, $00, $00, $00
|
| |
Krill
Registered: Apr 2002 Posts: 2982 |
At a quick glance, this seems to run in O(n^2) quadratic time rather than O(n) linear.
With a growing input dividend, both the inner and the outer loops take more iterations.
So i guess this will run slower. |
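To put rough numbers on the linear versus quadratic point, the following Python sketch models both loop structures and counts memory shift/rotate operations (not cycles). The function names are made up for this illustration, and the accumulator is a plain Python integer, so the model is only meaningful for divisors that fit the 8-bit routines.

```python
def shift_subtract_whole(bignum: bytearray, divisor: int):
    """Rotate the entire array once per result bit. Returns (remainder, rotates)."""
    n = len(bignum)
    a = 0
    rotates = 0
    for _ in range(n * 8):                      # ldy #DIV_IN_BYTES * 8
        carry = 0                               # clc
        for i in range(n):                      # rol every byte, low to high
            shifted = (bignum[i] << 1) | carry
            bignum[i] = shifted & 0xFF
            carry = shifted >> 8
            rotates += 1
        a = (a << 1) | carry                    # rol
        if a >= divisor:                        # cmp / bcc / sbc
            a -= divisor
            bignum[0] |= 1                      # inc dividend (bit 0 is clear here)
    return a, rotates


def shift_subtract_bytewise(bignum: bytearray, divisor: int):
    """Finish each byte before moving to the next. Returns (remainder, shifts)."""
    a = 0
    shifts = 0
    for i in reversed(range(len(bignum))):
        m = bignum[i]
        carry = m >> 7                          # asl
        m = (m << 1) & 0xFF
        shifts += 1
        for _ in range(8):
            a = (a << 1) | carry                # rol
            q = 1 if a >= divisor else 0        # cmp
            if q:
                a -= divisor                    # sbc
            carry = m >> 7                      # rol memory: quotient bit in
            m = ((m << 1) | q) & 0xFF
            shifts += 1
        bignum[i] = m
    return a, shifts
```

For a 20-byte dividend the first variant performs 8 * 20 * 20 = 3200 byte rotates and the second 9 * 20 = 180 shifts, while both leave the same quotient and remainder.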
| |
Monte Carlos
Registered: Jun 2004 Posts: 364 |
Using both cmp as well as sbc seems subject to possible optimization. |
| |
Krill
Registered: Apr 2002 Posts: 2982 |
Quoting Monte Carlos: Using both cmp as well as sbc seems subject to possible optimization.
If the X register can be spared, replacing
            cmp DIVISOR
            bcc +
            sbc DIVISOR
+
with something like
            tax
            sbx #DIVISOR
            bcc +
            txa
+
might be possible, but this hardly looks like it would save cycles on average. =) |
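A small Python sketch can back this up. It models both sequences, assuming the commonly documented behaviour of the undocumented SBX opcode (X = (A AND X) - immediate, carry set as for cmp), and tallies cycles under the assumption that DIVISOR lives in zero page and no branch crosses a page boundary. The function names and the `CYCLES` table are purely illustrative.

```python
def cmp_sbc(a: int, divisor: int):
    """cmp DIVISOR / bcc + / sbc DIVISOR. Returns (A, carry)."""
    if a >= divisor:
        return a - divisor, 1
    return a, 0


def tax_sbx(a: int, divisor: int):
    """tax / sbx #DIVISOR / bcc + / txa. Returns (A, carry)."""
    x = a                               # tax
    diff = (a & x) - divisor            # sbx: X = (A AND X) - immediate
    carry = 1 if diff >= 0 else 0       # carry set like cmp
    x = diff & 0xFF
    if carry:
        a = x                           # txa
    return a, carry


# Cycle tallies (assumed: zero-page DIVISOR, branches within one page).
CYCLES = {
    "cmp/sbc, no subtract": 3 + 3,          # cmp zp, bcc taken
    "cmp/sbc, subtract": 3 + 2 + 3,         # cmp zp, bcc not taken, sbc zp
    "tax/sbx, no subtract": 2 + 2 + 3,      # tax, sbx #, bcc taken
    "tax/sbx, subtract": 2 + 2 + 2 + 2,     # tax, sbx #, bcc not taken, txa
}
```

Under these assumptions the sbx variant ties on the subtract path (8 cycles each) and loses one cycle on the no-subtract path (7 versus 6), which fits the "hardly saves cycles" estimate. Since sbx takes an immediate operand, it would need the same self-modification as plain `cmp #` / `sbc #`.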
| |
Krill
Registered: Apr 2002 Posts: 2982 |
That said, the long division as in the OP is quite swift, but the long multiplication i also need in the same context turned out to be considerably slower. :) (I guess mostly because it's executed a lot more often, though.) |
| |
Fred
Registered: Feb 2003 Posts: 287 |
Quote: At a quick glance, this seems to run in O(n^2) quadratic time rather than O(n) linear.
With a growing input dividend, both the inner and the outer loops take more iterations.
So i guess this will run slower.
Correct. I now see the beauty of your version :-)
One correction to your routine: I see that BIGNUMBYTES isn't taken into account correctly. If you set it to e.g. 20, the number of bytes processed is 19. See my correction below.
To speed up the algorithm, I think it is best to skip leading zero bytes before entering the loop, for the case where the number of bytes is always fixed and the value is low.
Replace:
            ldx #BIGNUMBYTES
with:
            ldx #BIGNUMBYTES + 1
-           ldy BIGNUM - 1,x
            bne longdivmod
            dex
            bne -
|
| |
Krill
Registered: Apr 2002 Posts: 2982 |
Quoting Fred: Some correction on your routine
With BIGNUMBYTES = 1
            ldx #2              ; BIGNUMBYTES + 1
-           ldy BIGNUM - 1,x    ; BIGNUM + 1, BIGNUM + 0
            bne longdivmod
            dex
            bne -
this now iterates over 2 bignum-bytes. Likewise one too many for all other settings.
So, no, i think my code was okay.
Quoting Fred: To speed up the algorithm, I think it is best to skip zeros before going into the loop in case the number of bytes is always fixed and the value is low.
Yes. =)
Quoting Krill: As i'm continuously extracting values, dividend and quotient are getting smaller and smaller. So the size of the byte array can be decreased whenever a most significant byte becomes zero. This should neatly halve overall execution time. But i didn't add this optimisation, as i need the bytes elsewhere and it's already quick enough for my purposes. :) |
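The indexing and the zero-skipping idea can be sketched in Python as follows. This is a byte-wise model, not the 6502 routine; `significant_length` and `extract` are hypothetical names, and the all-zero case simply yields length 0 here, which the thread does not cover for the assembly version. Starting at x = BIGNUMBYTES and reading index x - 1 visits exactly BIGNUMBYTES bytes.

```python
def significant_length(bignum: bytearray) -> int:
    """Count of bytes up to and including the most significant non-zero byte."""
    x = len(bignum)                         # ldx #BIGNUMBYTES
    while x > 0 and bignum[x - 1] == 0:     # ldy BIGNUM - 1,x / dex
        x -= 1
    return x


def extract(number: int, size: int, base: int, skip: bool):
    """Repeated division; returns (digits, total number of bytes processed)."""
    work = bytearray(number.to_bytes(size, "little"))
    digits = []
    processed = 0
    while any(work):
        n = significant_length(work) if skip else size
        remainder = 0
        for i in reversed(range(n)):        # most significant byte first
            work[i], remainder = divmod((remainder << 8) | work[i], base)
        digits.append(remainder)
        processed += n
    return digits, processed


number = (1 << 160) - 1                     # arbitrary 160-bit example
full_digits, full_bytes = extract(number, 20, 27, skip=False)
skip_digits, skip_bytes = extract(number, 20, 27, skip=True)
```

Comparing `skip_bytes` with `full_bytes` gives a feel for how much work the shrinking quotient saves over a full extraction run. The digits are the same either way.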
| |
Fred
Registered: Feb 2003 Posts: 287 |
OK, my conclusion was too hasty, sorry about that. I retested it and you're right, it works okay. |