| |
TWW
Registered: Jul 2009 Posts: 545 |
SID to calculate line slope
To calculate a slope of a line you can use something like:
lda DeltaX
NoChange:
tax
(Plotpixel)
sec (Might not be necessary if done right...)
txa
sbc DeltaY (Immediate)
bcs NoChange
(Change shit)
adc DeltaX (Immediate)
jmp NoChange
Which gives 11/15 cycles for each calculation/pixel (Oswald, do you agree? ^^). If unrolling, one would have to add a cycle on the sbc/adc by using ZP instead of immediate values, which means 12/14 cycles (no jmp if unrolled).
This is pretty good. In fact, the two possible optimisations I see here are getting rid of the SEC by clever programming, and getting rid of the ADC DeltaX by scaling the relation between DeltaX and DeltaY so that the largest value = 256. That would require a division, and perhaps you get reduced accuracy (not confirmed though). Leaving the division aside (tables perhaps?!), you'd end up with 9/11 cycles fully optimised (10/9 if unrolled; you need to watch the branching here and build the routine so that you get 9/10 instead). (Am I missing something?)
However...
I read somewhere (can't remember where) that there is a technique where you can use the SID to calculate slopes. The SID is the chip I know the least about, so I figured I'd post the following questions here:
#1: Can it be done?
#2: If yes, How?
#3: How much time would the calc. use?
-TWW |
|
| |
Oswald
Registered: Apr 2002 Posts: 5094 |
:)
You don't need clever programming to remove the sec. If you do the Bresenham the way it is supposed to work, the adc will always overflow and so act as a sec (or vice versa; I usually do the adc first, and the sbc on overflow).
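A small Python model of that carry argument, in case it helps. It is only a sketch under assumptions: the helper names are made up, the error term starts at dx/2 with one SEC in the setup, and 0 <= dy <= dx <= 255. The inner assert checks that the adc really leaves the carry set every time.

```python
def sbc(a, operand, carry):
    """6502 SBC (binary mode): A - M - (1 - C); carry out is set when no borrow occurs."""
    result = a - operand - (1 - carry)
    return result & 0xFF, int(result >= 0)


def adc(a, operand, carry):
    """6502 ADC (binary mode): A + M + C; carry out is set on overflow past 255."""
    result = a + operand + carry
    return result & 0xFF, int(result > 0xFF)


def line_without_sec(dx, dy):
    """Points of a line from (0, 0) to (dx, dy), assuming 0 <= dy <= dx <= 255."""
    a, carry = dx // 2, 1                 # assumed setup: error = dx/2, one SEC
    x = y = 0
    points = [(x, y)]
    for _ in range(dx):
        a, carry = sbc(a, dy, carry)      # sbc DeltaY
        if not carry:                     # borrow: step the minor axis
            y += 1
            a, carry = adc(a, dx, carry)  # adc DeltaX
            assert carry == 1             # the overflow doubles as the SEC
        x += 1
        points.append((x, y))
    return points
```

Calling `line_without_sec(100, 37)` walks 100 steps along x without ever setting the carry by hand inside the loop.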
I've found that a logarithm/exponential division is accurate enough for these line slopes:
ldx nr1
ldy nr2
lda log,x
sec
sbc log,y
tax
lda exp,x
sta result
The SID could be used, but it introduces too many timing problems. You can read the waveform output of the 3rd channel from a register, so with the correct timing/frequency you can use the triangle/sawtooth waveform for this.
; setup SID here
ldx magicsidregister
lda bytecolumn,x
ora actmask
sta bytecolumn,x
ldx magicsidregister
Bad lines will most likely fuck the whole thing up, or even a simple music IRQ that comes in the middle of your line drawing routine. |
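To make the timing dependence concrete, here is a very rough Python model of the idea. It is a sketch under assumptions, not a tested SID routine: it assumes the voice 3 oscillator is a 24-bit accumulator that adds the 16-bit frequency value once per clock cycle, that the sawtooth readout (usually documented as $D41B) returns the top 8 bits, and that every pixel costs exactly the same number of cycles. Function names are hypothetical.

```python
def osc3_sawtooth(freq, cycles, phase=0):
    """Top 8 bits of a 24-bit phase accumulator after `cycles` clock ticks."""
    accumulator = (phase + freq * cycles) & 0xFFFFFF
    return accumulator >> 16


def freq_for_slope(minor, major, cycles_per_pixel):
    """Frequency value that advances the readout by minor/major per pixel."""
    return (minor * 65536) // (major * cycles_per_pixel)


# Example: slope 50/100 with a (hypothetical) 16-cycle pixel loop.
freq = freq_for_slope(50, 100, 16)
readout_after_line = osc3_sawtooth(freq, 100 * 16)
```

Any stolen cycle (bad line, IRQ) makes `cycles` differ from what the loop assumed, which is exactly why the readout drifts away from the intended slope.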
| |
Frantic
Registered: Mar 2003 Posts: 1648 |
Codebase64 lacks good info on fast implementations of line drawing. If someone is up to it, please go ahead...
http://codebase64.org/doku.php?id=base:lines
//FTC |
| |
TWW
Registered: Jul 2009 Posts: 545 |
I have found my own implementation (not the code but the table generation) of the log/exp tables, which seems to be accurate enough (tested/perfected in the immortal MS Excel ^^):
// Kickassembler format
.pseudocommand DivU8 teller;nevner {
ldx teller
ldy nevner
sec
lda LogTable,x
sbc LogTable,y
tax
lda ExpTable,x
}
.align $100
.pc = * "LogTable"
LogTable:
.fill 256, round(45.3*log(i))
.pc = * "ExpTable"
ExpTable:
.fill 256, round(exp(i/45.3))
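// A quick Python check of the same tables, for anyone who wants to test the accuracy outside the assembler. It is a sketch with assumptions: log() is taken as the natural log, entry 0 of the log table is a placeholder (log(0) is undefined), the exp table is clamped to a byte, and only numerator >= denominator >= 1 is handled. The names are made up.

```python
import math

SCALE = 45.3  # table scale factor from the post

# Entry 0 is a placeholder: log(0) is undefined (assumption of this sketch).
LOG_TABLE = [0] + [round(SCALE * math.log(i)) for i in range(1, 256)]
# Clamp to a byte, since exp(i / SCALE) exceeds 255 for the last few entries.
EXP_TABLE = [min(255, round(math.exp(i / SCALE))) for i in range(256)]


def div_u8(numerator, denominator):
    """Approximate numerator / denominator for 1 <= denominator <= numerator <= 255."""
    index = (LOG_TABLE[numerator] - LOG_TABLE[denominator]) & 0xFF
    return EXP_TABLE[index]
```

// For example, div_u8(200, 10) should land within one unit of 20; the error comes from rounding both table lookups.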
Slope calculations:
NoChange:
tax
// PLOT PIXEL
txa
sbc #magic_number // Immediate if looped or ZP if unrolled
bcs NoChange // If wraparound the correct number is already loaded so no need to add DX or similar.
tax
// CHANGE SHIT
unrolled / jmp NoChange+1....
Which means 9-10 cycles per pixel. I don't think there is any faster way to implement this, but hey, I might be wrong (as usual)!
edit: Frantic - yeah, yeah, when it's done I'll put it you know where :) |
| |
Oswald
Registered: Apr 2002 Posts: 5094 |
"If wraparound the correct number is already loaded so no need to add DX or similar."
Where is it already loaded? X contains the count value for the sbc? Also, will you have 256 line routines, one for each magic number?
txa
sbc
tax
bcs
will save you a tax I guess. |
| |
TWW
Registered: Jul 2009 Posts: 545 |
Quote: "If wraparound the correct number is already loaded so no need to add DX or similar."
Where is it already loaded? X contains the count value for the sbc? Also, will you have 256 line routines, one for each magic number?
txa
sbc
tax
bcs
will save you a tax I guess.
Yes. I had already changed the tax in the linecode.
The setup of the line algorithm should include:
* Gfx pointer ($fc:$fb) for the plotting
* Largest Delta/2 into X
* ZP value used for the sbc based on the following:
ZPv = Smallest Delta / Largest Delta * 255
Based on this relation (largest delta scaled up to 256):
Smallest Delta   Magic Number
-------------- = ------------
Largest Delta        255
which means you don't need to add/sub the smallest delta since the value wraps around 255 after the relation-scaling :)
The "CHANGE SHIT" part is then responsible for altering the gfx pointers / bitmask if necessary...
hope this is not half as confusing as I think it is :) |
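If it helps, here is a rough Python model of the scaled accumulator described in this post. It is a sketch, not the actual line routine. Assumptions: the magic number is rounded down from smallest/largest*255 as in the formula, the accumulator starts at 128 (half of the scaled range) rather than at largest delta / 2, the extra 1 that a 6502 sbc subtracts after a borrow is ignored, and all names are invented for the example.

```python
def magic_number(smallest, largest):
    """Scaled step for the 8-bit accumulator: smallest / largest * 255, rounded down."""
    return (smallest * 255) // largest


def scaled_line(dx, dy, start=128):
    """Points from (0, 0) along a line with x as major axis, 0 <= dy <= dx <= 255, dx > 0."""
    magic = magic_number(dy, dx)
    acc = start
    x = y = 0
    points = [(x, y)]
    for _ in range(dx):
        acc -= magic
        if acc < 0:          # borrow: the 8-bit value wraps, no re-adding of DX needed
            acc += 256
            y += 1           # the "change" part: step the minor axis
        x += 1
        points.append((x, y))
    return points
```

With `scaled_line(100, 50)` the magic number is 127 and the minor axis steps on every second pixel or so. Because the formula scales to 255 while the byte wraps at 256, very long lines can end one pixel short on the minor axis in this model.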
| |
Frantic
Registered: Mar 2003 Posts: 1648 |
Krill once told me that you can plot all the pixels that will go into the same byte at once, somehow. I mean, of course a whole byte can always be written at once, but the point here was rather to get the byte value "directly" somehow, rather than looping through each pixel that would go into the same byte. At least that is how I remember what he said. I never got around to figuring out how exactly that might be done because I didn't really need the line routine to be that fast at that time anyway, but since Mr. K said so, I guess there is some nice and clever way to do it. Probably even quite easy and simple once realized... |
| |
TWW
Registered: Jul 2009 Posts: 545 |
You're talking about lines where DX>DY.
I can however promise you that in vertical-to-diagonal lines (DY>DX) you only have 1(!) pixel to plot per byte :)
A horizontal line however is a different matter ^^ |
| |
Rastah Bar Account closed
Registered: Oct 2012 Posts: 336 |
I tabulated the length of the horizontal segments in lines for which DX>DY. Making use of that table also for lines where DY>DX, I can draw a cube (12 lines) on a 128x128 screen in slightly more than one frame. With tricks like letting the 1541 calculate the rotations, I think it is possible to show a rotating cube faster than 25 fps. |
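For readers wondering what such horizontal segment lengths look like, here is a small Python sketch that derives them for one line with a plain Bresenham-style error term. It is only an illustration under assumptions (error starts at dx/2, 0 <= dy <= dx, made-up function name) and not necessarily how the table mentioned above was built.

```python
def run_lengths(dx, dy):
    """Lengths of the horizontal pixel runs, one per row, for a line with 0 <= dy <= dx."""
    runs = []
    err = dx // 2
    length = 1                 # the first pixel at (0, 0)
    for _ in range(dx):
        err -= dy
        if err < 0:            # row change: close the current run
            err += dx
            runs.append(length)
            length = 1
        else:
            length += 1
    runs.append(length)
    return runs
```

`run_lengths(10, 2)` gives `[3, 5, 3]`: three rows, eleven pixels in total. A line routine can then write each run with a few byte-wide masks instead of stepping pixel by pixel.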
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
Set up dy as -dy and dx as dx; then err is kept in X automatically and you never have to bother about the state of the carry either. The code is best placed in zeropage.
-
;plot some pixel
;...
txa
dx=*+1
sbx #$00
bcs -
txa
dy=*+1
sbx #$00
;advance x direction
;...
jmp -
Also note that you can combine the upcoming subtraction of dx and addition of dy into a single addition/subtraction when you do a change in x direction. |
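A Python model of the loop above, as a sanity check of the -dy trick. It is a sketch with assumptions: SBX is modelled as X = (A AND X) - immediate with the carry set when no borrow occurs (the usual description of this undocumented opcode), the error term starts at dy/2, and the line is steep (0 <= dx <= dy <= 255). Names are hypothetical.

```python
def sbx(a, x, imm):
    """Model of SBX #imm: X = (A & X) - imm, carry set if no borrow. Carry-in is ignored."""
    diff = (a & x) - imm
    return diff & 0xFF, int(diff >= 0)


def steep_line(dx, dy):
    """Points from (0, 0) to (dx, dy) with y as major axis, assuming 0 <= dx <= dy <= 255."""
    neg_dy = (-dy) & 0xFF            # "dy as -dy": subtracting it adds dy
    x_reg = dy // 2                  # err lives in X (assumed start value)
    px = py = 0
    points = [(px, py)]
    for _ in range(dy):
        a = x_reg                    # txa
        x_reg, carry = sbx(a, x_reg, dx)
        if not carry:                # err went below zero
            a = x_reg                # txa
            x_reg, carry = sbx(a, x_reg, neg_dy)
            px += 1                  # advance x direction
        py += 1
        points.append((px, py))
    return points
```

Since SBX never looks at the incoming carry, no sec/clc is needed anywhere in the loop, which matches the remark about not bothering with the carry state.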
| |
SIDWAVE Account closed
Registered: Apr 2002 Posts: 2238 |
use a pulsewidth program, to stream into memory from sid code, then make whatever you want. structures. |
| |
Rastah Bar Account closed
Registered: Oct 2012 Posts: 336 |
So how many cycles per pixel are needed on average if you include EVERYTHING (initialization, slope calculation, changing bitmask, changing gfx pointer for plotting)?
I guess I need about 30-35 cycles per pixel in TOTAL when drawing the 12 lines of a cube on a single-color 128x128 pixel character screen. I do not unroll the loops and my code does not run in ZP. (For slope calculations I do not use the log/exp method, but just a plain divide. I should change that :-) ). |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
Hard to give exact values, but for a cube of said size it is possible to render it at > 50 fps while even moving the 16x16 grid across the screen.
With all overhead I need around 9900 cycles per frame (without clear). If you know the average line length you can find out how many cycles I need per pixel :-) |
| |
Rastah Bar Account closed
Registered: Oct 2012 Posts: 336 |
Sorry, but I don't understand how 9900 cycles is possible.
ora (ZP),y
sta (ZP),y
for the plotting is already 11 cycles. With the 9/10 cycles mentioned above we have at least 20 cycles per pixel. And this does not include any overhead, changing of bitmask etc.
12 lines times 50 pixels per line times 20 cycles is already 12000 cycles. |
| |
chatGPZ
Registered: Dec 2001 Posts: 11386 |
so you just found out the trick is not doing that per pixel =P |
| |
Bitbreaker
Registered: Oct 2002 Posts: 508 |
There are also horizontal lines, and hidden surface removal is done. The latter is something I include, but reading again that you draw 12 lines, it seems you draw all lines of the cube.
Also, one can drop the ora if it is certain that the line is out of reach of other lines (which would cause clashes). |
| |
Rastah Bar Account closed
Registered: Oct 2012 Posts: 336 |
I made some improvements to my code and can now draw and rotate 12 lines ("a cube") at about 30-35 fps.
Rotations about the x, y, or z axes are by a fixed small angle though (about 7 degrees), which makes the rotations fast. The angle is chosen such that sin(phi)=1/8; then cos(phi) is very close to 1-1/128-1/32768. Further, suppose you rotate first around the x-axis and then around the z-axis; then you can write:
x(i+2)=z(i)+(x(i+1)-z(i+1))*cos(phi)
y(i+2)=y(i)+(x(i+1)-z(i+1))*sin(phi)
where e.g. y(i) is the y-coord after the i-th rotation.
Now you need to calculate only 2 cos/sin terms, while with the rotation matrix there are 4 such terms.
Similar equations can be derived for 2 consecutive rotations around any 2 axes. |
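A short Python check of the fixed-angle idea, for anyone who wants to play with the numbers. It is a sketch under assumptions: it only verifies the cos(phi) approximation and shows a single rotation about the z-axis done with shifts on integer coordinates; it does not reproduce the two-rotation shortcut from the equations above. Function names and the rotation direction are choices made for the example.

```python
import math

PHI = math.asin(1 / 8)                    # sin(phi) = 1/8, roughly 7.18 degrees
COS_APPROX = 1 - 1 / 128 - 1 / 32768      # approximation quoted in the post


def mul_sin(v):
    """v * sin(phi) as a single shift (v / 8)."""
    return v >> 3


def mul_cos(v):
    """v * cos(phi) using the shift approximation v - v/128 - v/32768."""
    return v - (v >> 7) - (v >> 15)


def rotate_z(x, y):
    """One rotation by phi about the z-axis, using shifts only."""
    return mul_cos(x) - mul_sin(y), mul_sin(x) + mul_cos(y)
```

The difference between `math.cos(PHI)` and `COS_APPROX` is on the order of 1e-7, so on 16-bit coordinates the shift version is about as exact as a table-based multiply would be.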