Log inRegister an accountBrowse CSDbHelp & documentationFacts & StatisticsThe forumsAvailable RSS-feeds on CSDbSupport CSDb Commodore 64 Scene Database
You are not logged in - nap
CSDb User Forums


Forums > C64 Coding > 8 or 16bit muls/divs
2006-10-15 13:01
Luke

Registered: Dec 2004
Posts: 19
8 or 16bit muls/divs

About multiply:

Bad one:

a*b=((a+b)/2)^2-(((a+b)/2-b))^2 with x*x matrix

It's near 30-40 cycles, but unstable with (a+b)bit0 bcoz ror after adc.

Some ppl using "nybble tables" $0n x $xy with $1000 size of tables and swap+add, but more than 50 cycles left.

And last bad is old asl bcc adc ror routine, but very slow
like 80+ cycles.

Now I trying to do any 16x16bit signed multiply, but it's really hard to make "short time" routine. Anyone got any idea?


About Divu/Divs

Classical lsr bcs sbc rol too slow.

But a/b= e^(lna-lnb) looks quite fast with e^x matrix, but
quite inaccurate.

Somebody got other "faster" idea for it? Particular 16/16bit routines.


 
... 29 posts hidden. Click here to view all posts....
 
2006-10-17 16:32
Oswald

Registered: Apr 2002
Posts: 5094
ready, you need 256 different routines for something general this way.
2006-10-17 19:38
Danzig

Registered: Jun 2002
Posts: 440
@Oswald: but that 256 "unrolled" codesnippets would be smaller than a multiply-table-based routine I coded back in 1989 :D

Edit: Okay, that routine was 16bit (makes it not better at all ;) )
2006-10-17 21:23
Krill

Registered: Apr 2002
Posts: 2980
Ready: How would you divide a number, say 0..255, by 3 and get correct truncated or rounded integer results, using your method?
2006-10-17 22:01
ready.

Registered: Feb 2003
Posts: 441
My method does not always provide exact result. For example to divide by 3, you have to get a number as close as possible to 1/3=0.33333, by summing terms of 1/(2^n). So you don't get actually 1/3 but for example 1/4+1/16+1/64=0.328125

If you don't want to do too many ror (or lsr), just 1/4+1/16=0.3125

In my case I didn't need much accurancy, I just wanted to scale the original shape I had to do zoom effect, so I use 1/2,1/4,1/8, which gives me 7 possible levels of zoom:

1/8
1/8+1/4
1/8+1/4+1/2
1/4+1/2
1/8+1/2
1/2
1/4

Enogh to see the shape disappear into the screen.

Ready.
2006-10-17 22:51
Krill

Registered: Apr 2002
Posts: 2980
Yes, and the zooming was exceptionally smooth.
2006-10-18 00:38
White Flame

Registered: Sep 2002
Posts: 136
Quote: White Flame: check it out better, it's a division.

Each lsr divides by 2, right? Many by 2 divisions provide 1/2, 1/4, 1/8,.... then sum toghether the terms you need to get the percentage you want of your original number. Don't you agree?

Besides in this way, there's no need for tables.

ciao,
Ready.


Key word there is "percentage". You're multiplying a number by N/256, where N is 0-255, and the bits of N are lookups in the table. N is in the numerator of that ratio, so you're multiplying. Which table entries you add together come from the 0.8bit fixed point representation of N/256, and the number you LSRed into the table is the number you're multiplying by. It's an unrolling of a shift-add loop with one of the values hardcoded into the source.

Take the 'divide by 3' example. You add together the 1/2^n table portions to approximate 0.33333~. You can only do this because the percentage (1/3rd) was known when writing your source code, and you can add the bits from the table that add to 1/3. But what if you want to take 1/Nth of a number where N is only known at runtime? It takes a division to get the percentage, and your routine deals with only the percentage end. Sure, you can do an 8-bit table of 1/N and do a shift-add with your LSRed table, but that's horribly imprecise for general division. But this works very well for a zoomer where you're only dealing with percentages scaled to the range of 0-255.
2011-07-27 13:26
ready.

Registered: Feb 2003
Posts: 441
Today I had the need to zoom an 8-bit sampled waveform, with sampling rate around 1kHz, with real time scaling of each sample, so all calculations have to be done extremely fast for the powerful C64.
So I thought I could reuse the same concept used in my old zooming routine.
Using the well known fast multiplication routine, it doesn't take much for the C64 to execute:

K*L/16=M

where K is the original sample, L is the zooming factor (16=original waveform; ......; 1=fully reduced waveform; 0=no waveform), M is the reduced sample.
This code provides 16+1 levels of zoom, +1 is the "no waveform level":

lda K
ldy L
jsr mult ;K*L = (byte low, byte high) = (x reg, accu)
stx ZP
lsr ;/2
ror ZP
lsr ;/4
ror ZP
lsr ;/8
ror ZP
lsr ;/16
ror ZP

lda ZP
sta M


If more/less zooming levels are needed the general formula is K*L/2^n=M, giving 2^n+1 zooming levels.
2011-07-27 14:02
Oswald

Registered: Apr 2002
Posts: 5094
for 16+1 levels I'd use tables if there's enough memoriy.
2011-07-27 14:09
WVL

Registered: Mar 2002
Posts: 902
what oswald said :D

now i think of it..


another way is to add 2 sines and using the 'interference' to calculate the multiplication.

Basically what you're doing is multiplying your sample value (A) with a number (B) between 1/16 and 1. Result is C.

C = A * B

Now let's see what happens when we write this as sines ;)

A mathematical formula for multiplying sines is :

sin(X) * sin(Y) = 0.5 * (cos (x-y) - cos(x+y))

let's say that
A = sin (x)
B = sin (y)
C = sin (x) * sin(y)

now, instead of storing A in your sample data, we store x instead. Next, we have a small table (17 bytes) to get the value y if we want a multiplation of 0/16,1/16,...,16/16.

(this value only changes when you change the volume!)

Now, we can calculate the product very easily, since we know x (which is your sample data!) and y (we modify the code!)

C = 0,5 * (cos(x-y) - cos(x+y)

code for changing the volume level:
;this bit only when we change volume
  ldx #B  (volume 0-16)
  lda volumetable,x
  sta sbcadress+1
  eor #$ff
  clc
  adc #1
  sta ldaadress+1

this is the code for multiplying a sample value
;this is for multiplying one sample value
  ldx sampledata (stored as inverse sines)
ldaadress:
  lda COSTABLE+256-y,x (the -y is done by modifying the adress)
  sec
sbcadress:
  sbc COSTABLE+y,x (the +y is done by modifying the adress)
  ror

tadaa! multiplied :) The nice thing is that you can make almost 256 different volume levels ;)

Note : your COSTABLE table has to be twice as long. 2*256 bytes. But still a lot less memory use than 17 256-byte long tables for oswald's way.

The only 'problem' is that you lose around 1 bit of accuracy in the whole process, mainly because you cannot specify A exactly in terms of x (there's always numbers missing from your COSTABLE table, sometimes there's a gap of 2 or 3 instead of 1).

Damn.. what a long post.. I can has cookie?
2011-07-29 08:41
ready.

Registered: Feb 2003
Posts: 441
@WVL: I only saw your post now, you must have edited it after I read it the first time. Your method seems very interesting and for zooming it might be the fastest, if not using Oswald's tables.

Anyhow, just today I improved my routine greatly. I noticed that if I have 256 volume/zoom levels instead of 16, 32, 64,.. I can cut the code greately:

K*L/256=M
K: sampled data
L: volume/zoom level
M: scaled sample

lda K
ldy L
jsr mult ;K*L = (byte low, byte high) = (x reg, accu)
sta M


thus getting rid of all the ror lsr and having 256 levels of resolution! After all in the application I am working on 256 levels are better than 16.
Furthermore the X register, which is the least significant part of the multiplication, contains the reminder, which might come handy in some applications. In fact:

K*L=256*M+X

Still I think WVL's method is faster. I will give it a try sooner or later. Great idea!
Previous - 1 | 2 | 3 | 4 - Next
RefreshSubscribe to this thread:

You need to be logged in to post in the forum.

Search the forum:
Search   for   in  
All times are CET.
Search CSDb
Advanced
Users Online
Acidchild/Padua
MWR/Visdom
Mike
Peacemaker/CENSOR/Hi..
Alakran_64
bexxx
Slajerek/Samar
Andy/AEG
E$G/HF ⭐ 7
lotus_skylight
iceout/Avatar/HF
Guests online: 117
Top Demos
1 Next Level  (9.7)
2 13:37  (9.7)
3 Mojo  (9.7)
4 Coma Light 13  (9.6)
5 Edge of Disgrace  (9.6)
6 What Is The Matrix 2  (9.6)
7 The Demo Coder  (9.6)
8 Uncensored  (9.6)
9 Comaland 100%  (9.6)
10 Wonderland XIV  (9.6)
Top onefile Demos
1 No Listen  (9.6)
2 Layers  (9.6)
3 Cubic Dream  (9.6)
4 Party Elk 2  (9.6)
5 Copper Booze  (9.6)
6 Dawnfall V1.1  (9.5)
7 Rainbow Connection  (9.5)
8 Onscreen 5k  (9.5)
9 Morph  (9.5)
10 Libertongo  (9.5)
Top Groups
1 Performers  (9.3)
2 Booze Design  (9.3)
3 Oxyron  (9.3)
4 Censor Design  (9.3)
5 Triad  (9.3)
Top Graphicians
1 Mirage  (9.8)
2 Archmage  (9.7)
3 Pal  (9.6)
4 Carrion  (9.6)
5 Sulevi  (9.6)

Home - Disclaimer
Copyright © No Name 2001-2024
Page generated in: 0.057 sec.