Log inRegister an accountBrowse CSDbHelp & documentationFacts & StatisticsThe forumsAvailable RSS-feeds on CSDbSupport CSDb Commodore 64 Scene Database
 Welcome to our latest new user maak ! (Registered 2024-04-18) You are not logged in - nap
CSDb User Forums


Forums > C64 Coding > sets of add/sub
2017-03-18 12:02
Repose

Registered: Oct 2010
Posts: 222
sets of add/sub

I want to do a-b+c-d+e-f. I wonder if you can save some cycles somewhere? I looked at the stats of the actual numbers, no hopes there, any pair can carry/borrow with about 50% chance.

First I'll look at adding 3x 16bit adds and generalizing

Standard version
-----------------
We will keep the running sum in a register
n1,n2,n3 - numbers to add
r, r+1,... - the sum

;assume n1 stored in r
2 clc
3 lda r
3 adc n2
2 tax;n2+n1 save low byte in X
2 lda #0;small optimization first time when you know r+1 is 0.
2 adc #0; basically convert carry to 1
2 tay;16c save high byte in Y
;-- repeat for n>=3 (n is number of numbers to add)
;clc not needed
2 txa
3 adc n3
3 sta r;r+n3=n1+n2+n3 or tax until final add
2 tya
2 adc #0
3 sta r+1;16+15=31 or tay until final add
rts

In general,
Time is 16+(n-2)*13+2=18+(n-2)*13

n time, cycles
2 18 =18
3 18+13 =31
4 18+13+13 =44

Fast Version 1
--------------

2 clc

3 lda n2
3 adc r;n2+n1
2 bcs s1;2+9=11 if taken
3 adc n3
3 sta r
2 lda #0
2 adc #0
3 sta r+1;2+8+13=23 if no carry in n1+n2
rts;or bcc s2 =26
;11c
2 s1 clc;there was a carry in first add, A=n1+n2
3 adc n3
3 sta r
2 lda #1;representing the carry from n1+n2>255
2 adc #0
3 sta r+1;11+15=26 if carry in n1+n2
s2 rts;23/26, average 24.5, saving 6.5 cycles from standard (31c)
realistically always 26 or saving 5 cycles

Conclusions
Adding 3 numbers, I went from 31 to 26 cycles.

The statistics of adding 3 numbers drawn from multiplies
--------------------------------------------------------

In a partial sums routine, only some bytes of the product are added

case carries % total
l l h 2908442244 67.7 4294967296
h h l 2044355014 47.6 4294967296

Conclusions
High bytes of products are always less likely to create a carry when added to other numbers. The differences are so little that it hardly makes any difference.

ps took a while to simulate 4 billion multiplies and gather stats. I'd hate to figure out stats for larger mults, would have to actually think through it instead. But I can tell you they change quite a bit, in some cases there's hardly two carries in a row for example.
 
... 52 posts hidden. Click here to view all posts....
 
2017-04-16 15:49
ChristopherJam

Registered: Aug 2004
Posts: 1370
Ok, I think I follow how that last one works now.

A slight optimisation first, going to replace
    stx r1
    sbc r1
with
    sbc id,x

It's two cycles faster, only needs as many bytes of table as you have INXs (so, under a dozen), and makes it easier to
present the proof in the rest of this comment. Don't need to touch r1, as it gets overwritten later anyway.

Now, again defining T to be A+C+255*X
    ldx #0
a)  sec
     ;  initialise T to 1

    lda n1
     ;        now T is A+C+255*X = n1+1+255*0 = n1+1

    adc n2
    bcc *+3
    inx
     ;        now T is 1+n1+n2, cf proof in comment about the adc:bcc:inx sequence earlier in this thread

    adc n3
    bcc *+3
    inx
     ;        now T is 1+n1+n2+n3

  ; so all that's required now is adding A+C+255*X-1 and storing them

b)  sbc id,x  ; now sbc nn is the same as adc $ff-nn, so here we are adding $ff and subtracting X
    sta r0

    txa
c)  sbc #0    ; again, equivalent to adc $ff-0, so here we are adding x, $ff, and any carry from the low byte
    sta r1    ; ie, adding 256*X and $ff00 to the total in r1:r0

So, between points (a) (b) and (c), you've added 0001, 00ff, and ff00 to your total, which should result in carry being set
at the end of the routine, but no other effect on the result. Lines (b) and (c) contribute 255*x, leaving the sum in r1:r0

Usefully, the set carry flag is just what's needed for the start of the next column. Overall, nice work!
2017-04-16 16:28
Fresh

Registered: Jan 2005
Posts: 101
Looks like you've already reached your target, just want to share a similar approach that would avoid messing with the carry.
This would clearly need to be executed on ZP to be really efficient but, in that case, the overhead would be (a bit) less than yours.

	lax eorval+1
	ldy #$00
val1:
	axs #$00
	
val2:
	axs #$00
	bcs val3
	iny
val3:
	axs #$00
	bcs val4
	iny
val4:
	axs #$00
	bcs storehi
	iny	
	
storehi:
	sty r1
	txa
eorval:
	eor #$ff
	sta r0
2017-04-17 02:20
ChristopherJam

Registered: Aug 2004
Posts: 1370
It's certainly a fast way of adding a bunch of constant values, but I fear any speed advantage would more than counterbalanced by having to fetch the addends in an earlier code segment and writing them into the routine.

There are no registers left for addressing (eg) table entries.
Previous - 1 | 2 | 3 | 4 | 5 | 6 | 7 - Next
RefreshSubscribe to this thread:

You need to be logged in to post in the forum.

Search the forum:
Search   for   in  
All times are CET.
Search CSDb
Advanced
Users Online
Mike
Didi/Laxity
Alakran_64
DJ Space
Mr. SID
rexbeng
Guests online: 112
Top Demos
1 Next Level  (9.8)
2 Mojo  (9.7)
3 Coma Light 13  (9.7)
4 Edge of Disgrace  (9.6)
5 Comaland 100%  (9.6)
6 No Bounds  (9.6)
7 Uncensored  (9.6)
8 Wonderland XIV  (9.6)
9 The Ghost  (9.6)
10 Bromance  (9.6)
Top onefile Demos
1 It's More Fun to Com..  (9.9)
2 Party Elk 2  (9.7)
3 Cubic Dream  (9.6)
4 Copper Booze  (9.5)
5 Rainbow Connection  (9.5)
6 Wafer Demo  (9.5)
7 TRSAC, Gabber & Pebe..  (9.5)
8 Onscreen 5k  (9.5)
9 Dawnfall V1.1  (9.5)
10 Quadrants  (9.5)
Top Groups
1 Oxyron  (9.3)
2 Nostalgia  (9.3)
3 Booze Design  (9.3)
4 Censor Design  (9.3)
5 Crest  (9.3)
Top Logo Graphicians
1 Sander  (10)
2 Facet  (9.7)
3 Mermaid  (9.4)
4 Pal  (9.4)
5 Shine  (9.3)

Home - Disclaimer
Copyright © No Name 2001-2024
Page generated in: 0.042 sec.