| |
ws
Registered: Apr 2012 Posts: 251 |
Simple bilinear interpolation in assembler
Hello,
for some reason, with increasing age, i am no longer interested in spending much time on re-inventing the wheel.
Therefore, i'd like to ask if any of you wizards perhaps know of a solution close to this problem:
Let's say you have a "heatmap" in a matrix of 20x10 cells.
Does anyone perhaps already have a fast and simple routine for interpolating all the values within these cells, so that when some cells hold very high values and others very low ones, more or less smooth transitions (image blur) can be achieved in very few rasterlines?
Google just spat out hardcore math for me, which i feel unable to wrap my head around when attempting to translate it into a c64 assembler solution. (I code in assembly, directly, no c++ or the like, pretty please).
best regards
WS |
|
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Are you looking to interpolate between the values, to get higher resolution?
Or are you wanting to smooth the data without changing the number of cells?
Either way, a useful building block is to calculate a mean with
clc
lda value1
adc value2 ; computes a 9 bit sum in [carry, acc7..acc0]
lsr ; divides result by two, leaving 8 bit value in A
sta result
Unroll. |
| |
ws
Registered: Apr 2012 Posts: 251 |
I want to smooth the values, without changing the number of cells, preferably using a lookup table (that's always the simplest solution i have in my head), but it must be 2-dimensional, and that is actually what gives me some sort of headache..
do i first process all lines horizontally and then vertically? is that the way to go?
because from what i saw it looks like "they" (the math people) are using a 4 point sampling grid (or something like that).
EDIT: and i don't only want to have a result like this:
0000F00000F = 00028200028
but
01248421248
(for example) |
| |
ws
Registered: Apr 2012 Posts: 251 |
i made a graphical example of what i want to achieve.
http://www.wertstahl.de/example.jpg
maybe this clears things up a lot; please excuse that i find no other way to express what i am trying to do. |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Quoting wertstahl: do i first process all lines horizontally and then vertically? is that the way to go?
Yes. |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
If you want nice long tails without summing multiple elements per output value, you might need to use an infinite impulse response filter, and probably run it left to right then right to left to make it symmetric.
Maybe something like this? (untested)
.for i in 0,39
adc screen+i
lsr
tax
lda times_3_over_2,x
sta screen+i
lda times_1_over_2,x
.endfor
(basically moves a quarter of the contents of each cell into the next one along, so any peaks get smeared over the next several cells, getting weaker as it goes along) |
| |
ws
Registered: Apr 2012 Posts: 251 |
Thank you so much for the immediate responses!
I have been discussing and chewing on it a little bit and in addition with your thoughts (which all in all is interpolation of thoughts, right?) i'll be playing around with some code soon and i'll post my results here. |
| |
Oswald
Registered: Apr 2002 Posts: 5094 |
Quote: i made a graphical example of what i want to achieve.
http://www.wertstahl.de/example.jpg
maybe this clears things up a lot; please excuse that i find no other way to express what i am trying to do.
by the looks of this you rather need a blur routine. Simply average the 4 neighbour cells for each cell, or in other words, a fire effect.
The question is what you have in mind, how does that differ from blur ?
there have been "plasmas" out there which work by averaging (interpolating if you like) out between a few hotspots, but they do look ugly. |
| |
chatGPZ
Registered: Dec 2001 Posts: 11386 |
depending on what you are doing, you can also try randomly tweaking your own filters, perhaps throw some dithering into the mix, vary the number of taps you are using. (eg using 3 values only, with different factors). you can perhaps also speed it up a lot by limiting the range of the source values (again depends on what you are doing) so you can use table lookups for everything. |
| |
soci
Registered: Sep 2003 Posts: 480 |
Quote: Are you looking to interpolate between the values, to get higher resolution?
Or are you wanting to smooth the data without changing the number of cells?
Either way, a useful building block is to calculate a mean with
clc
lda value1
adc value2 ; computes a 9 bit sum in [carry, acc7..acc0]
lsr ; divides result by two, leaving 8 bit value in A
sta result
Unroll.
It's ROR and not LSR, right? Otherwise the usable value range is half. |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Quoting soci: It's ROR and not LSR, right? Otherwise the usable value range is half.
Argh! Yes, you're right of course. Same applies to my code in comment #6, I meant to use ROR there, too.
Thanks for catching that. |
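With soci's correction applied, the averaging building block would read roughly as follows. This is a minimal sketch in the `byte`/label style of ws's listings; the operand values in the comments and data bytes are illustrative and not from the thread.

```asm
        clc
        lda value1      ; e.g. $c8 (200)
        adc value2      ; + $64 (100) = $12c: A = $2c, carry = 1
        ror             ; carry rotates into bit 7: A = $96 (150)
        sta result      ; an lsr here would have stored $16 (22) instead
        rts

value1  byte $c8
value2  byte $64
result  byte $00
```

The only difference from the original snippet is the ROR: it shifts the ninth bit of the sum back into the accumulator, so the full 0..255 input range stays usable.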
| |
ws
Registered: Apr 2012 Posts: 251 |
i am currently like half out of the door for xyz but this came to mind, it is just code, not tested.
datafield byte $f,$8,$0,$0, $0,$0,$5,$0
byte $0,$f,$0,$0, $2,$0,$0,$0
byte $f,$0,$0,$0
byte $0,$0,$0 ;simple headroom
gamma byte $0,$0,$0,$1, $1,$1,$1,$1
byte $2,$2,$3,$4, $5,$6,$7,$8
byte $f,$f,$f,$f, $f,$f,$f,$f
byte $f,$f,$f,$f, $f,$f,$f,$f ; peak clipping for demonstration
stor byte $0
;================
blur ldx #$00
do_blur ldy datafield,x
lda gamma,y ;get target value for base cell
sta datafield,x
;----------------
tay ;remember last shade of blur
lda gamma,y ;and blur again
clc
adc datafield+1,x ;combine values
tay
lda gamma,y ;and blur the combined value
sta datafield+1,x ;store
;----------------
tay ;remember last shade of blur
lda gamma,y ;and blur again
clc
adc datafield+2,x ;combine values
tay
lda gamma,y ;and blur the combined value
sta datafield+2,x ;store
;----------------
tay ;remember last shade of blur
lda gamma,y ;and blur again
clc
adc datafield+3,x ;combine values
tay
lda gamma,y ;and blur the combined value
sta datafield+3,x ;store
;----------------
inx
cpx #$14
bne do_blur
rts
;================
please excuse the fucked-up formatting. |
| |
lft
Registered: Jul 2007 Posts: 369 |
Quoting ChristopherJam:
Quoting soci: It's ROR and not LSR, right? Otherwise the usable value range is half.
Argh! Yes, you're right of course. Same applies to my code in comment #6, I meant to use ROR there, too.
Thanks for catching that.
Although, if half the value range is acceptable, then one could use ASR #$fe to clear carry in preparation for the next computation. |
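A sketch of how lft's suggestion could look when several averages are chained. ASR (called ALR in some assemblers) is an undocumented opcode, so it is emitted here as raw bytes ($4b followed by the immediate operand); the labels and data values are assumptions made for illustration, and all inputs are assumed to stay within $00..$7f.

```asm
        clc                 ; only needed before the first average
        lda value1          ; assumption: all inputs are $00..$7f
        adc value2          ; sum is at most $fe, so adc leaves carry clear
        byte $4b,$fe        ; ASR #$fe: A = (A and $fe) >> 1, carry = 0
        sta result1
        lda value3          ; no clc needed: carry is already clear
        adc value4
        byte $4b,$fe        ; ASR #$fe again
        sta result2
        rts

value1  byte $7f
value2  byte $7f
value3  byte $10
value4  byte $30
result1 byte $00
result2 byte $00
```

Masking with $fe clears bit 0 before the shift, so the bit that LSR would have moved into carry is always zero.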
| |
ws
Registered: Apr 2012 Posts: 251 |
i'd rather have some sort of gamma curve that i can adjust via table. but thanks for clearing up ror vs lsr |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Nice approach :)
Small optimisation suggestion, which would also give you a little more control:
I notice you have snippets like
tay
lda gamma,y ; decay
sta datafield+2,x ; store
tay
lda gamma,y ; decay more
May I suggest instead:
tay
lda gamma,y ; decay
sta datafield+2,x ; store
lda gamma2,y ; decay more
where gamma2[i] = gamma[gamma[i]] for every index i, or some refinement thereof.
You can also drop all but the first CLC if your gamma/gamma2 values are all under 128. |
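The composed table does not have to be typed in by hand; it could be generated once at startup. A minimal sketch, assuming ws's 32-entry `gamma` table (copied from the listing above) and a reserved 32-byte `gamma2`; the labels `mkgamma2` and `mkloop` are hypothetical.

```asm
mkgamma2 ldx #$1f           ; 32 entries, index 31 down to 0
mkloop  ldy gamma,x         ; y = gamma[i], always $00..$0f for this table
        lda gamma,y         ; a = gamma[gamma[i]]
        sta gamma2,x
        dex
        bpl mkloop
        rts

gamma   byte $0,$0,$0,$1, $1,$1,$1,$1
        byte $2,$2,$3,$4, $5,$6,$7,$8
        byte $f,$f,$f,$f, $f,$f,$f,$f
        byte $f,$f,$f,$f, $f,$f,$f,$f
gamma2  byte $0,$0,$0,$0, $0,$0,$0,$0
        byte $0,$0,$0,$0, $0,$0,$0,$0
        byte $0,$0,$0,$0, $0,$0,$0,$0
        byte $0,$0,$0,$0, $0,$0,$0,$0
```

Because every `gamma` entry is at most $0f, the second lookup always stays inside the table. Adjusting the curve then only means editing `gamma` and calling the routine again.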
| |
ws
Registered: Apr 2012 Posts: 251 |
that is actually a very nice hint, using stairs of tables.
but i just implemented my first attempt and obviously it just dampens all the values instead of mixing them, so as a conclusion:
after the first load of a cell, it cannot just be destructively set to a fixed translation value without being compared to its neighbours.
the reduction value must be related to the neighbouring cells.
example
f000 --> destructive --> 8421 - ok
f200 --> destructive --> 8531 - not ok, must be A531
(just arbitrary example values)
workin on that.
[update]
by the way, this totally out of the blue, super far fetched noise generator which i utilize for creating data to be blurred, works surprisingly well (IN VICE!):
drawdom ldx #$00
paintle
lda $dc01
eor $d800
sta $db00
eor $db00
eor $0400,x
adc $d800,x
sta $0400,x
inx
bne paintle
rts |
| |
ws
Registered: Apr 2012 Posts: 251 |
[update 2 on noise generator]: works surprisingly well on a real c64, too. actually the best performance i have had with bus noise to this day. duh. (it needs to run twice for good entropy.) |
| |
chatGPZ
Registered: Dec 2001 Posts: 11386 |
i strongly advise against using open i/o space for "noise". it is _not_ noise in the first place (what you are reading is what was left on the bus by the previous vic fetch) and on some C64s you will see just zeros or $ff. |
| |
ws
Registered: Apr 2012 Posts: 251 |
You're probably right, i'd also confirm that a mechanism like this should not be used without prior testing on several "chipsets". i must admit that i am quite obsessed with that method, though :-) I only use it for testing purposes, because i am too lazy to set up the SID noise method (if i remember correctly that that was possible). |
| |
ws
Registered: Apr 2012 Posts: 251 |
okay. this really seems to be the fastest way:
do_blur clc
lda datafield,x
adc datafield+1,x
ror
sta datafield,x
inx
cpx #$00
bne do_blur
demo: http://dl.dataelephant.net/blur.zip (press space to...)
this is a 2 iterations blur, only thing i am going to add is looking back n pixels plus avoiding backshift.
and yes, i must really learn to think less wishful. took me quite some time to grasp the simpleness of the problem. too many images in my head.
ps: if anyone hints me towards formatting source code nicely in this csdb thing, i will happily comply. |
| |
ws
Registered: Apr 2012 Posts: 251 |
i am pretty satisfied already:
http://dl.dataelephant.net/blur_it4.zip
4 iterations simple 1 cell blur. i am impressed how easy this stuff is, compared to trying to imagine it.
(prg+src inside zip) |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Quoting wertstahl: ps: if anyone hints me towards formatting source code nicely in this csdb thing, i will happily comply.
Just follow the Read more link on the comment entry form.
But in short, wrap your code segments in [ code]/[ /code] (only without the spaces after the "["s) |
| |
ws
Registered: Apr 2012 Posts: 251 |
;-------------------------------------------------
linenums = #$05
linecount byte $05 ;(+1)
blur lda linenums ;---
sta linecount ;---
ldx #$00
do_blurX ldy #$26
x_blurX lda $0400+1,x
lsr
lsr
lsr
sta pot
lda $0400-1,x
lsr
lsr
lsr
clc
adc pot
sta pot
lda $0400,x
lsr
clc
adc pot
bcc noclip
lda #$ff
noclip sta $0400,x
inx
dey
bne x_blurX
inx ;---
inx ;--- these and this mechanism is
;--- just there to provide a visual gap
;--- ofcourse the scanning could be seamless
dec linecount ;---
bne do_blurX ;---
rts
;================
pot byte $0
needs deeper thought, but it works, horizontally, though. |
| |
Digger
Registered: Mar 2005 Posts: 437 |
.prg would be good :) |
| |
Rastah Bar Account closed
Registered: Oct 2012 Posts: 336 |
I can see some optimizations. For example, you can get rid of 2 LSRs by first adding ($0401,x)/2 and ($3ff,x)/2 and dividing the result by 4.
lda $401,x
lsr
sta pot
lda $03ff,x
lsr
clc
adc pot
lsr
lsr
sta pot
Also, does it ever clip? ($400,x)/2 <= 128 and ($3ff,x)/8+($401,x)/8 <= 64, so their sum will not exceed 192.
And I would use a ZP address for pot. |
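Putting these suggestions together (fewer LSRs, no clip test, a zero-page temporary), the inner loop of ws's routine might look like the sketch below. It is meant to replace the lines from `x_blurX` to `bne x_blurX`, with X and Y set up as in ws's listing. The address $fb for `pot` is an assumption (any free zero-page byte would do), and the rounding differs slightly from the original, which divides each neighbour by 8 separately.

```asm
pot     = $fb               ; assumption: a free zero-page byte

x_blurX lda $0401,x         ; right neighbour
        lsr                 ; /2, at most $7f
        sta pot
        lda $03ff,x         ; left neighbour
        lsr                 ; /2, at most $7f
        clc                 ; lsr may have set carry
        adc pot             ; at most $fe, so carry stays clear
        lsr
        lsr                 ; (left/2 + right/2)/4, at most $3f
        sta pot
        lda $0400,x         ; centre cell
        lsr                 ; /2, at most $7f
        clc                 ; lsr may have set carry
        adc pot             ; at most $be, so it cannot overflow
        sta $0400,x         ; no bcc/clip needed
        inx
        dey
        bne x_blurX
```

The worst case is $7f + $3f = $be, which is why the `bcc noclip` test can never trigger and can be dropped.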
| |
Sparta
Registered: Feb 2017 Posts: 49 |
clc ; (can usually be omitted)
lda $03ff,x
adc $0401,x
ror
adc $0400,x
ror
sta $0400,x
adc $0402,x
ror
adc $0401,x
ror
sta $0401,x
Ps. Check out my fire effect in Tunnel Vision.
Edit. If unrolled, you can omit ,x indexing. |
| |
ws
Registered: Apr 2012 Posts: 251 |
sparta, that looks very sleek. i was afraid that something like that, which i do not fully understand, might work. how did you come up with this solution? |
| |
Sparta
Registered: Feb 2017 Posts: 49 |
Many many years of code optimization. :))
(JK, I am just a hobby coder)
Actually, I used something similar in my fire effect in Tunnel Vision. Except, I did not have enough memory to fully unroll the loop and I also modified the result using a cosine tab. In the fire effect, the 3rd addition comes from the char line below. Something like this:
$0400 = ((($03ff + $0401)/2) + $0428)/2
When you add two 8-bit numbers, the result is a 9-bit number with the low 8 bits in A and the 9th bit in C. ROR divides this 9-bit number by 2, rotating C into the top bit of A. This also modifies C again (it receives the bit that was shifted out), however, CLC can safely be omitted in most cases, because the difference in your result is <=1, and the next ROR will halve that, too. |
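To make the carry handling concrete, here is one pass of the routine traced with sample values. The labels `left`, `centre`, `right` and `result` and the numbers are made up for illustration; the register contents in the comments follow from the ADC and ROR rules described above.

```asm
        clc
        lda left        ; $f0
        adc right       ; + $f0 = $1e0: A = $e0, carry = 1
        ror             ; 9-bit sum / 2: A = $f0, carry = 0 (old bit 0)
        adc centre      ; + $10 = $100: A = $00, carry = 1
        ror             ; A = $80, carry = 0
        sta result      ; $80 = ($f0 + 2*$10 + $f0) / 4
        rts

left    byte $f0
centre  byte $10
right   byte $f0
result  byte $00
```

Both additions overflow 8 bits here, and both times the ROR recovers the lost bit, so no clipping or wider arithmetic is needed.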
| |
ws
Registered: Apr 2012 Posts: 251 |
although the ror solution (above) also has this effect of smearing things in one direction, which i am currently trying to suppress so i can do everything in one go, using lookup tables: (having some brightness trouble currently, though)
way less easy than i thought.
(both images show a 4x3 cross and a 3x3 square blurred)
Oh! Thank you for your explanation Sparta! Much appreciated! |
| |
Sparta
Registered: Feb 2017 Posts: 49 |
Double buffering will avoid the skew as you will not overwrite your original values.
Also, keep in mind the above solution is not a true average of the 3 values; the centre cell is weighted twice:
b=(a+2b+c)/4 |
| |
ws
Registered: Apr 2012 Posts: 251 |
And, because i didn't yet say what this is for: it is for a battle tactics AI precalculation. Now i said it. |
| |
ws
Registered: Apr 2012 Posts: 251 |
Ah! I had a double buffering version before but for some reason fell back to single frame! Thanks a lot for the hint! That should do the trick! |
| |
Sparta
Registered: Feb 2017 Posts: 49 |
lda buffer1-1
adc buffer1+1
ror
adc buffer1
ror
sta buffer2
lda buffer1
adc buffer1+2
ror
adc buffer1+1
ror
sta buffer2+1
Put it in a loop if you want. Good luck! |
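A looped form of the same idea might look like this. It is a sketch for a single 20-cell row; `buffer1` and `buffer2` are the names from the post above, while `width`, `blurrow`, `rowloop`, the padding bytes and the sample data are assumptions made for illustration.

```asm
width   = 20                ; assumption: one 20-cell row of the heatmap

blurrow ldx #$00
        clc                 ; needed once: cpx below leaves carry clear
rowloop lda buffer1-1,x     ; left neighbour (padding byte when x = 0)
        adc buffer1+1,x     ; right neighbour (padding byte when x = 19)
        ror
        adc buffer1,x       ; centre cell, weighted twice overall
        ror
        sta buffer2,x       ; buffer1 stays untouched, so no skew
        inx
        cpx #width          ; carry = 0 as long as x < width
        bne rowloop
        rts

        byte $00            ; padding before the row
buffer1 byte $00,$00,$00,$00, $ff,$00,$00,$00
        byte $00,$00,$00,$00, $00,$00,$80,$00
        byte $00,$00,$00,$00
        byte $00            ; padding after the row
buffer2 byte $00,$00,$00,$00, $00,$00,$00,$00
        byte $00,$00,$00,$00, $00,$00,$00,$00
        byte $00,$00,$00,$00
```

Since CPX clears carry whenever X is below the compared value, the loop body starts every iteration with carry clear without spending a CLC per cell. For a second pass, the roles of the two buffers would be swapped.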
| |
Sparta
Registered: Feb 2017 Posts: 49 |
If speed is not an issue, you may also want to try something like this to get an even better average of 4 neighboring values:
lda buffer1-1
adc buffer1+1
ror
sta ZP
lda buffer1-$28
adc buffer1+$28
ror
adc ZP
ror
sta buffer2
|
| |
ws
Registered: Apr 2012 Posts: 251 |
Oh man! This is so beautiful! Thank you!!!
http://dl.dataelephant.net/spartablur.prg
(press space to move)
http://dl.dataelephant.net/spartablur.rar
(requires cbmprgstudio to compile:
http://www.ajordison.co.uk/download.html ) |