Log inRegister an accountBrowse CSDbHelp & documentationFacts & StatisticsThe forumsAvailable RSS-feeds on CSDbSupport CSDb Commodore 64 Scene Database
You are not logged in - nap
CSDb User Forums


Forums > C64 Coding > sets of add/sub
2017-03-18 12:02
Repose

Registered: Oct 2010
Posts: 225
sets of add/sub

I want to do a-b+c-d+e-f. I wonder if you can save some cycles somewhere? I looked at the stats of the actual numbers, no hopes there, any pair can carry/borrow with about 50% chance.

First I'll look at adding 3x 16bit adds and generalizing

Standard version
-----------------
We will keep the running sum in a register
n1,n2,n3 - numbers to add
r, r+1,... - the sum

;assume n1 stored in r
2 clc
3 lda r
3 adc n2
2 tax;n2+n1 save low byte in X
2 lda #0;small optimization first time when you know r+1 is 0.
2 adc #0; basically convert carry to 1
2 tay;16c save high byte in Y
;-- repeat for n>=3 (n is number of numbers to add)
;clc not needed
2 txa
3 adc n3
3 sta r;r+n3=n1+n2+n3 or tax until final add
2 tya
2 adc #0
3 sta r+1;16+15=31 or tay until final add
rts

In general,
Time is 16+(n-2)*13+2=18+(n-2)*13

n time, cycles
2 18 =18
3 18+13 =31
4 18+13+13 =44

Fast Version 1
--------------

2 clc

3 lda n2
3 adc r;n2+n1
2 bcs s1;2+9=11 if taken
3 adc n3
3 sta r
2 lda #0
2 adc #0
3 sta r+1;2+8+13=23 if no carry in n1+n2
rts;or bcc s2 =26
;11c
2 s1 clc;there was a carry in first add, A=n1+n2
3 adc n3
3 sta r
2 lda #1;representing the carry from n1+n2>255
2 adc #0
3 sta r+1;11+15=26 if carry in n1+n2
s2 rts;23/26, average 24.5, saving 6.5 cycles from standard (31c)
realistically always 26 or saving 5 cycles

Conclusions
Adding 3 numbers, I went from 31 to 26 cycles.

The statistics of adding 3 numbers drawn from multiplies
--------------------------------------------------------

In a partial sums routine, only some bytes of the product are added

case carries % total
l l h 2908442244 67.7 4294967296
h h l 2044355014 47.6 4294967296

Conclusions
High bytes of products are always less likely to create a carry when added to other numbers. The differences are so little that it hardly makes any difference.

ps took a while to simulate 4 billion multiplies and gather stats. I'd hate to figure out stats for larger mults, would have to actually think through it instead. But I can tell you they change quite a bit, in some cases there's hardly two carries in a row for example.
 
... 45 posts hidden. Click here to view all posts....
 
2017-04-13 20:43
Repose

Registered: Oct 2010
Posts: 225
Btw, that test case isn't valid for the way I'm adding the f(x) values in a row first (f(x)=x^2/256). So I really need to construct better values.

Sure, I can just avoid the buggy addition routine, but there's a big point here - randomized testing is very unlikely to ever show up the bug.

What we need is 100% test coverage of all the branches/carries to avoid any such bugs in the future. For example, getting all those additions in the right order without making a typo, for some 72 of them (considering a 32x32), would be very difficult.

Now comes the interesting problem of which values cause carries in each sub-addition.

This is a very clear case where 100% testing is impossible, randomized testing will fail, and only careful analysis will catch bugs.

So I want to work on this and I believe CJ is as well.
I think it lies in finding factors and doing small sets of trials around that.
2017-04-13 20:48
ChristopherJam

Registered: Aug 2004
Posts: 1409
Oh, just tried ff+ff+01. Sums to 01ff as expected
2017-04-13 20:49
ChristopherJam

Registered: Aug 2004
Posts: 1409
Quoting Repose
getting all those additions in the right order without making a typo, for some 72 of them (considering a 32x32), would be very difficult.


..exactly why I went for a code generator.

But you're right, getting good test coverage for edge cases is hardÂ…
2017-04-13 20:50
Repose

Registered: Oct 2010
Posts: 225
Ok, seems I took a wrong turn somewhere, and that when you went down the optimization path with sbc tab,x you avoided my particular bug.

My point about test coverage is still valid though.

I'll have to grok your post later.
2017-04-13 20:54
ChristopherJam

Registered: Aug 2004
Posts: 1409
Quoting Repose
Ok, seems I took a wrong turn somewhere, and that when you went down the optimization path with sbc tab,x you avoided my particular bug.

The key is to not set carry before doing the sbc high - that SEC was throwing away information.
Quote:
My point about test coverage is still valid though.

Agreed.
2017-04-13 21:01
Repose

Registered: Oct 2010
Posts: 225
Ok quick post yes I realized I need to keep the last carry which is why I did adc #0 before the subtract, I think we're on the page.
2017-04-14 15:54
ChristopherJam

Registered: Aug 2004
Posts: 1409
So, for the NxN multiply, as Repose pointed out it's only worthwhile skipping the CLCs for columns with a reasonable number of addends. Coincidentally, all of those columns also require the addition of a non-zero offset, to account for my excess $4000 representation of the g() table.

I end each of those columns with the following sequence:
sbc id+($ff-Kn),x
sta res+n
txa

where Kn is the additional constant for that column (must be at least as high as the maximum value X can reach, lest the indexing could cross the page boundary)

I've just run an exhaustive test in VICE, comparing a more traditionally computed reference value with the optimised code above, and it matches for all 16842752 valid sets of inputs. I've included the test code below, eliding the loop iterator and display routines for brevity.

; test parameters are
;  A  (00..ff)
;  C  (00..01)
;  X  (00..ff)
;  K  (X..ff)   ; routine requires that X+(0xff-K)<= 0xff,  ie X<=K
;
; result should be A+C+K+255*X

;-----------------------------
;  calculate reference value
;-----------------------------

    lda tC
    lsr
    lda tA
    adc tK
    sta ref+0
    lda tX
    adc#0
    sta ref+1  ; ref is now A+C+K+256*X

    sec
    lda ref+0
    sbc tX
    sta ref+0
    lda ref+1
    sbc#0
    sta ref+1   ; ref is now A+C+K+255*X
;-----------------------------
; prepare registers for test
;-----------------------------
    lda tK
    eor#$ff
    sta rn+1
    lda tC
    lsr
    lda tA
    ldx tX
;-----------------------------
; run test code
;-----------------------------
rn  sbc id+$3f,x
    sta res+0
    txa
    adc#0
    sta res+1

;-----------------------------
; compare result to reference value
;-----------------------------
    lda res+0
    cmp ref+0
    bne fail
    lda res+1
    cmp ref+1
    bne fail
2017-04-15 06:39
Repose

Registered: Oct 2010
Posts: 225
I got it to work now thanks :)
And good news, I found something even faster now! It always beats the clc version, even for 3 adds. I might be able to beat whatever you have :)

See if you can generate 16x16 your way, that will be my first working example for now.
2017-04-15 06:42
ChristopherJam

Registered: Aug 2004
Posts: 1409
And now proof that the adc:bcc:inx sequence retains sufficient information to calculate the sum of the column, regardless of when carries occur.

If we define T to be equal to the current value of A+C+255*X,
it's possible to prove that after each adc:bcc:inx sequence, T increases by the value added.

; before:
; T=A+C+255*X, so A=T-C-255*X

then we execute
    adc #V
    bcc *+3
    inx

Let C', X' and A' be the values of carry, accumulator and X after running the code sequence.

Now, either the add didn't set C,
in which case
  C'=0
  X'=X
  A'=A+V+C
    =(T-C-255*X)+V+C
    =T+V-255*X

  T'=A'+C'+255*X'
    =(T+V-255*X)+0+255*X
    =T+V

or, the add did set C, in which case
  C'=1
  X'=X+1          # as the increment happens
  A'=A+V+C-256    # as the add has overflowed into C
    =(T-C-255*X)+V+C-256
    =T+V-255*X-256

  T'=A'+C'+255*X'
    =(T+V-255*X-256)+1+255*(X+1)
    = T+V-255*X-255+255*X+255
    = T+V

Either way, T has increased by the correct amount
2017-04-15 06:48
Repose

Registered: Oct 2010
Posts: 225
Nice proof! Couldn't have said it better myself.
Previous - 1 | 2 | 3 | 4 | 5 | 6 - Next
RefreshSubscribe to this thread:

You need to be logged in to post in the forum.

Search the forum:
Search   for   in  
All times are CET.
Search CSDb
Advanced
Users Online
csabanw
Twoflower/ΤRIΛD
Jangler/Artline Desi..
Chrx/Design/Chaos
Brataccas/HF
Genius/Xenon
Guests online: 115
Top Demos
1 Next Level  (9.7)
2 13:37  (9.7)
3 Mojo  (9.7)
4 Coma Light 13  (9.6)
5 Edge of Disgrace  (9.6)
6 What Is The Matrix 2  (9.6)
7 The Demo Coder  (9.6)
8 Uncensored  (9.6)
9 Comaland 100%  (9.6)
10 Wonderland XIV  (9.6)
Top onefile Demos
1 No Listen  (9.6)
2 Layers  (9.6)
3 Cubic Dream  (9.6)
4 Party Elk 2  (9.6)
5 Copper Booze  (9.6)
6 Dawnfall V1.1  (9.5)
7 Rainbow Connection  (9.5)
8 Onscreen 5k  (9.5)
9 Morph  (9.5)
10 Libertongo  (9.5)
Top Groups
1 Performers  (9.3)
2 Booze Design  (9.3)
3 Oxyron  (9.3)
4 Triad  (9.3)
5 Censor Design  (9.3)
Top Original Suppliers
1 Derbyshire Ram  (9.7)
2 Fungus  (9.3)
3 Black Beard  (9.2)
4 Baracuda  (9.2)
5 hedning  (9.1)

Home - Disclaimer
Copyright © No Name 2001-2024
Page generated in: 0.047 sec.