Log inRegister an accountBrowse CSDbHelp & documentationFacts & StatisticsThe forumsAvailable RSS-feeds on CSDbSupport CSDb Commodore 64 Scene Database
You are not logged in 
CSDb User Forums


Forums > C64 Coding > sets of add/sub
2017-03-18 13:02
Repose

Registered: Oct 2010
Posts: 75
sets of add/sub

I want to do a-b+c-d+e-f. I wonder if you can save some cycles somewhere? I looked at the stats of the actual numbers, no hopes there, any pair can carry/borrow with about 50% chance.

First I'll look at adding 3x 16bit adds and generalizing

Standard version
-----------------
We will keep the running sum in a register
n1,n2,n3 - numbers to add
r, r+1,... - the sum

;assume n1 stored in r
2 clc
3 lda r
3 adc n2
2 tax;n2+n1 save low byte in X
2 lda #0;small optimization first time when you know r+1 is 0.
2 adc #0; basically convert carry to 1
2 tay;16c save high byte in Y
;-- repeat for n>=3 (n is number of numbers to add)
;clc not needed
2 txa
3 adc n3
3 sta r;r+n3=n1+n2+n3 or tax until final add
2 tya
2 adc #0
3 sta r+1;16+15=31 or tay until final add
rts

In general,
Time is 16+(n-2)*13+2=18+(n-2)*13

n time, cycles
2 18 =18
3 18+13 =31
4 18+13+13 =44

Fast Version 1
--------------

2 clc

3 lda n2
3 adc r;n2+n1
2 bcs s1;2+9=11 if taken
3 adc n3
3 sta r
2 lda #0
2 adc #0
3 sta r+1;2+8+13=23 if no carry in n1+n2
rts;or bcc s2 =26
;11c
2 s1 clc;there was a carry in first add, A=n1+n2
3 adc n3
3 sta r
2 lda #1;representing the carry from n1+n2>255
2 adc #0
3 sta r+1;11+15=26 if carry in n1+n2
s2 rts;23/26, average 24.5, saving 6.5 cycles from standard (31c)
realistically always 26 or saving 5 cycles

Conclusions
Adding 3 numbers, I went from 31 to 26 cycles.

The statistics of adding 3 numbers drawn from multiplies
--------------------------------------------------------

In a partial sums routine, only some bytes of the product are added

case carries % total
l l h 2908442244 67.7 4294967296
h h l 2044355014 47.6 4294967296

Conclusions
High bytes of products are always less likely to create a carry when added to other numbers. The differences are so little that it hardly makes any difference.

ps took a while to simulate 4 billion multiplies and gather stats. I'd hate to figure out stats for larger mults, would have to actually think through it instead. But I can tell you they change quite a bit, in some cases there's hardly two carries in a row for example.
 
... 16 posts hidden. Click here to view all posts....
 
2017-03-22 07:33
ChristopherJam

Registered: Aug 2004
Posts: 530
Quoting Repose
The indexed addressing is doing the adds, and the timer is doing the carries, in parallel.

…which is exactly what Fresh is proposing \O/

@lft - all good, I missed that Repose was tracking the carry count in X in post #4…
2017-03-22 07:41
Repose

Registered: Oct 2010
Posts: 75
Even better:

clc
lda n1
adc n2
bcs *+2;basically do nothing as quickly as possible to make carry consume one more cycle
adc n3
bcs *+2
...
sta sum
;diff timer
sta sum+1

My first method was 6/7, this one is 5/6. The probability of a carry is just under half, so basically approaches 5.5 cycles per number added.

I love to see someone's face try to decode that though, confuse the @#@ out of them :)
2017-03-22 08:18
Repose

Registered: Oct 2010
Posts: 75
You can even add/sub in the same sequence

;now do subtractions
sec
sbc n5
bcs *+2; 'subtract' one cycle from timing if there's a borrow
sbc n6
bcs *+2
...
2017-03-22 08:21
Repose

Registered: Oct 2010
Posts: 75
Why stop there?

lda n
ror
bcs *+2
x8
;diff timer
sta popcnt

Population count in 4.5 cycles per bit. Of course a table would be much faster. What else useful could you do?
2017-03-22 09:29
Repose

Registered: Oct 2010
Posts: 75
bcs/bcc, bmi/bpl, bvc can all be used in this technique to accumulate the times those flags were in a certain state. By extension, you can accumulate times a certain bit was set or value was reached, like cmp #$40:bcc *+2:
So think of the timer as an extra register that you can increment or decrement (by using the opposite branch), using branches or page crossings, that is only fast to use over a longish sequence. It can do corrections after the fact or count things. There is also a free divide, IIRC you can set the speed the timer counts? Or is it fixed to decrement each clock cycle? No matter, use the SID voice 3 output and set the frequency for a free division.

Question, what use can this be put to? And can you use the flags as part of a calculation ,while using the accumulation of times the flags happened for a parallel computation using the timer (which has no sideeffects) to do double the work? (I mean, use the branch to do one thing, then at the end do another thing with the timer value).

In another direction, can you do some computation unrolled then interrupt it at a certain time to see the value and that's your result?

example,
rol
bcs +2
inx
... x8
interrupt after 24 cycles
X will be 4-(popcnt(A and $#f)/6) or something (the full formula is kinda long)

Another idea I've used to to make 'infinite' loops that can only be stopped by an interrupt
lda #0
sta $4000
sta $4001
..
decide which area of the bitmap you want to fill
jump into the code
stop it with an interrupt
of course you can 'plot' an RTS into the middle as well, then restore it.
Interrupt idea might turn out faster

This is getting off topic though.
2017-03-22 09:42
ChristopherJam

Registered: Aug 2004
Posts: 530
I've definitely considered using an interrupt to terminate an unrolled fill loop, but yes, plotting an RTS seemed like a saner option at the time.
2017-03-22 09:53
ChristopherJam

Registered: Aug 2004
Posts: 530
Could also use CIA to see count how many iterations a fractal takes to escape, freeing up registers for state :)
2017-03-22 09:55
Pex Mahoney Tufvesson

Registered: Sep 2003
Posts: 33
I love you guys! :-D
I've used timers a lot for stopping infinite loops. One example is the part in BitLive4-demo where I draw a full-screen all-border white/black line using $d020 and CPU only. No time for keeping track of where to stop. :)
---
Have a noise night!
http://mahoney.c64.org
2017-03-22 10:04
Repose

Registered: Oct 2010
Posts: 75
@cj Very useful idea, that would definitely save time. In the end, just use the Timer value as a colour and plot it. It doesn't really matter the values involved, you can always arrange them to be something different depending on escape "time" literally... then animate the 'palette' by starting one cycle later..
See what I mean? I'm saying if you plot a fractal in real time in one frame say, then the colours are determined by the times to iterate the formula, and that value from the timer directly translate to a colour, then starting later shifts the colour palette down one, then you have a free animated fractal.

@mahooney Nice, that's a very good use because it's the only way possible to create an effect. Will check out your demo thanks for the comment :)
2017-03-22 10:48
Repose

Registered: Oct 2010
Posts: 75
Another free calculation on the Timer, remember it's 16bit so you can cause a borrow from the high byte, so lda TimerA+1:bmi xx or bne is like asking was there at least n counts (carries, borrows, whatever you're doing).

Say my adds were 6 cycles each, I'm adding 4 numbers, I start timer at $124, then lda TimerA+1:bne means no carries. Don't know how it's useful either, just throwing out a lot of ideas. You gotta brainstorm and not hold yourself back, useful stuff will stick eventually.

There's also two timers, you can have them read different values.

You don't need the sbc #constant because you should set the timer to #constant + variation at the start, then at the end the timer reads exactly the high byte +1 (since Timer can't be zero)
so
ldx timerA
dex
stx sum+1
Previous - 1 | 2 | 3 - Next
RefreshSubscribe to this thread:

You need to be logged in to post in the forum.

Search the forum:
Search   for   in  
All times are CET.
Search CSDb
Advanced
Users Online
Jailbird/Booze Design
Mike
iAN CooG/HVSC
Raffox/ASURA/Magic C..
Fred/Channel Four
Didi/Laxity
Avalanche/Atlantis
Zyron/GeNos¤ProjecTary
Guests online: 41
Top Demos
1 Uncensored  (9.7)
2 The Shores of Reflec..  (9.7)
3 Edge of Disgrace  (9.7)
4 Coma Light 13  (9.6)
5 Lunatico  (9.6)
6 Comaland 100%  (9.6)
7 Incoherent Nightmare  (9.5)
8 Wonderland XII  (9.5)
9 Comaland  (9.5)
10 Wonderland XIII  (9.5)
Top onefile Demos
1 Dawnfall V1.1  (9.5)
2 Daah, Those Acid Pil..  (9.4)
3 Treu Love [reu]  (9.4)
4 Dawnfall  (9.3)
5 Tunnel Vision  (9.3)
6 One-Der  (9.2)
7 Goatbeard  (9.2)
8 Globe 2016 [reu]  (9.2)
9 Hardware Accelerated..  (9.2)
10 Safe VSP  (9.1)
Top Groups
1 Booze Design  (9.4)
2 Oxyron  (9.4)
3 Censor Design  (9.4)
4 Crest  (9.3)
5 SHAPE  (9.2)
Top Hardware-Gurus
1 Soci  (9.9)
2 Wiesel  (9.9)
3 Grue  (9.8)
4 Zer0-X  (9.8)
5 Lemming  (9.7)

Home - Disclaimer
Copyright © No Name 2001-2017
Page generated in: 0.138 sec.