[CSDb] - User Forums - Your favorite 3d tricks

2008-04-17 20:36

doynax
Account closed

Registered: Oct 2004
Posts: 212

Your favorite 3d tricks

In the interest of educating newbies like me who don't yet know all the tricks, and diverting attention from that ridiculous multithreading "debate", I thought we could share some tips on how to write a speedy vector part.
Because the depressing fact is that aside from this very forum there seems to be about zero information on how to do 3D on 8-bit systems, unless it's all been hiding in some old diskmag somewhere (and as far as I'm concerned information that isn't indexed by Google might as well not exist).

Now, to start things off..

You can get rid of all those nasty multiplications when building a rotation matrix from Euler angles by exploiting the trigonometric product rules ( 2*cos x*cos y = cos(x - y) + cos(x + y) and friends ). I realize that this is probably common knowledge, but it's far from obvious, and without it those multiplications are real cycle eaters. Oh, and thanks for the tip WVL :)
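As a rough illustration of the product rule (a Python sketch rather than anyone's actual demo code; the 256-step angle format and the table name are assumptions of mine): with one table holding half-cosines, a product of two cosines or two sines turns into two lookups and an add or subtract.

```python
import math

STEPS = 256  # assumed: angles stored as one byte, 256 steps per full turn
# The table holds cos/2, so the sum of two entries is already cos(x)*cos(y).
HALF_COS = [math.cos(2 * math.pi * i / STEPS) / 2 for i in range(STEPS)]

def cos_cos(x, y):
    """cos(x)*cos(y) via 2*cos x*cos y = cos(x - y) + cos(x + y)."""
    return HALF_COS[(x - y) % STEPS] + HALF_COS[(x + y) % STEPS]

def sin_sin(x, y):
    """sin(x)*sin(y) via 2*sin x*sin y = cos(x - y) - cos(x + y)."""
    return HALF_COS[(x - y) % STEPS] - HALF_COS[(x + y) % STEPS]
```

On the real machine the table would of course hold fixed-point values and the index arithmetic would wrap for free in an 8-bit register.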

Another method is an alternative to Bresenham, or fixed point slopes, for deciding whether to continue on the major axis or move diagonally when drawing lines. The idea here is simply that those decisions can easily be precalculated and stuffed into bit arrays. The tricky part is that you can use SBX to test these bits, while the same mask used to figure out which pixels to plot in a horizontal run in turn masks our decision bits in X.

In other words:

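	lda #%11111111	; A = pixel mask for the current horizontal run
	sbx #%10000000	; X = (A & X) - $80; carry set if decision bit 7 was set
	bcs .x1		; bit set: keep extending the run

	eor #%01111111	; bit clear: keep only the pixels of the run so far
	eor pixels,y	; plot the run
	sta pixels,y
	iny		; advance to the next byte (the diagonal step)

	lda #%01111111	; fresh run mask; it also hides the stale bit 7 in X
.x1	sbx #%01000000	; same test for decision bit 6
	bcs .x2

	eor #%00111111
	eor pixels,y
	sta pixels,y
	iny

	lda #%00111111
.x2	sbx #%00100000	; and so on down to bit 0
	bcs .x3
	.
	.
	.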

So please post any interesting suggestions you've got. Any efficient and/or elegant methods for doing clipping, transformations, line drawing, lighting, dithering and all the other black arts..

2008-04-17 21:01

Skate

Registered: Jul 2003
Posts: 494

read c=hacking magazine issues #8 and #9 first. that should help a lot.

2008-04-17 21:23

doynax
Account closed

Registered: Oct 2004
Posts: 212

Quote: read c=hacking magazine issues #8 and #9 first. that should help a lot.

Thank you.. Those articles are surprisingly complete.

2008-04-17 21:28

Oswald

Registered: Apr 2002
Posts: 5094

hmm this sbx trick is nifty, I dont even understand it :) apart from the sine theorems I'm afraid there are not many tricks left, except the x*y=((x+y)^2/4)-((x-y)^2/4) which you probably already know. and stuff like getting all cube coordinates with additions/subtractions of the matrix elements, using all the symmetry you can get, etc.
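For anyone who hasn't seen the squares trick, here is a minimal Python sketch of it (the table name and the unsigned 8-bit range are my assumptions). Because x+y and x-y always have the same parity, the rounding in the two table entries cancels and the result is exact.

```python
# Table of floor(n*n/4) for n = 0..510, the kind of table a 6502 routine keeps.
SQR4 = [n * n // 4 for n in range(511)]

def mul8(x, y):
    """Unsigned 8x8 multiply using x*y = (x+y)^2/4 - (x-y)^2/4."""
    return SQR4[x + y] - SQR4[abs(x - y)]
```

So a multiply costs two table lookups and one subtraction, at the price of the table memory.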

there's graham's method to rotate 192 dots in one-der: you build a rotation matrix, then precalculate (adc speedcode) the multiples of the x,y,z elements of your unit vector. then you can calculate the 3d positions of a lot of dots based on those tables with additions/subtractions. to speed up address calculations you can lay out 3 32 char wide char matrices so the start of each char row will be aligned to a page. then to speed up dot clearing you can store the calculated address into an sta xxxx speedcode after having the dot plotted. 32 char wide char matrices are ideal for plotters.

another trick is to use a character screen. linedrawing will be very costly as you have to plot chars as well etc, but you gain a lot when it comes to eorfill/clr. Desert Dream uses this method to plot the lines in the linetunel&lineball part, and the rotating "1024" text & circle saw & at the pyramids. (melon and spaceship are sprites)

speaking of dithering, natural wonders bugs me a lot: looking at the code it appears to me it shouldnt be able to do what it does but it still works :)

2008-04-17 21:48

Martin Piper

Registered: Nov 2007
Posts: 722

SNES Star Fox (Star Wing in Europe) used the character screen for its 3D rendering.

2008-04-17 21:54

doynax
Account closed

Registered: Oct 2004
Posts: 212

Quote:

hmm this sbx trick is nifty, I dont even understand it :)

All it really does is use subtraction and SBX's masking effect to test and clear the bits in order (i.e. msb to lsb).
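To make that concrete, here is a small Python emulation of what the unrolled snippet in the first post does with the decision byte. It is only a sketch: `sbx` models the undocumented opcode as X = (A & X) - imm with carry meaning "no borrow", and `walk_decisions` is a hypothetical helper name, not part of any real routine.

```python
def sbx(a, x, imm):
    """Emulate the undocumented 6502 SBX: X = (A & X) - imm, C = no borrow."""
    t = (a & x) - imm
    return t & 0xFF, t >= 0

def walk_decisions(bits):
    """Test the 8 decision bits msb first, the way the unrolled code does."""
    a, x, taken = 0xFF, bits, []
    for i in range(8):
        mask = 0x80 >> i
        x, carry = sbx(a, x, mask)
        taken.append(carry)   # carry set: the bit was 1, the run continues
        if not carry:
            a = mask - 1      # like "lda #%01111111" after a diagonal step
    return taken
```

When the subtraction borrows, X ends up with garbage in the bits already tested, but the fresh pixel mask loaded into A hides them on the next SBX, which is why one mask can serve both purposes.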

Quote:

there's graham's method to rotate 192 dots in one-der: you build a rotation matrix, then precalculate (adc speedcode) the multiples of the x,y,z elements of your unit vector. then you can calculate the 3d positions of a lot of dots based on those tables with additions/subtractions.

I've been trying to do something similar, that is transform vertices manually from prescaled matrices, but keeping track of things is tricky. I mean a cube is fine but try hand-rolling a dodecahedron..

Quote:

another trick is to use a character screen. linedrawing will be very costly as you have to plot chars as well etc, but you gain a lot when it comes to eorfill/clr. Desert Dream uses this method to plot the lines in the linetunel&lineball part, and the rotating "1024" text & circle saw & at the pyramids. (melon and spaceship are sprites)

You know, this is actually what I've been doing =)
Not out of choice really, as it was a damned mess to implement, but I've been working on the NES which lacks a bitmap mode or fast VRAM access (here's a preview which only works in Nestopia).

Quote:

speaking of dithering, natural wonders bugs me a lot: looking at the code it appears to me it shouldnt be able to do what it does but it still works :)

I would have assumed it was only a matter of EOR filling alternate scanlines separately. Are you saying there are more efficient alternatives, or is there just more to it that I've missed?

By the way is there any viable way of rendering concave models with EOR filling? That is methods to efficiently figure out which polygons might need to be clipped against each other and clever ways of doing the actual clipping.

2008-04-17 22:21

Oswald

Registered: Apr 2002
Posts: 5094

here's an idea I havent used so far: if you have a kind of small amount of vertices like in case of a dodecahedron, one could use precalculated tables to scale the unit vector to match the needed coordinates, then its all adcs and table lookups again, except for the perspective.

dithering: well maybe its just me being dumb.. :/ but I can not visualise how graham can do it with just one eorbuffer, and plotting each time 2 pixels under each other in the liner. for the dither not to be messed up the line's pixel pattern must be aligned vertically, but the lineroutine seems not to care about that (ie seemingly no anding of lowmost bit of y coord). also plotting 2 pixels each time would mean a loss of resolution which is not visible on the result :)

the only viable way to do concave models with eor filling is presented in altered states imho :)

btw your NES routine is _very_ impressive :)

2008-04-17 23:04

doynax
Account closed

Registered: Oct 2004
Posts: 212

Quote:

here's an idea I havent used so far: if you have a kind of small amount of vertices like in case of a dodecahedron, one could use precalculated tables to scale the unit vector to match the needed coordinates, then its all adcs and table lookups again, except for the perspective.

That's *exactly* what I've been doing, except even the perspective is based on logarithm tables. Granted, the precision is a bit crappy but I can live with it.
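For readers wondering what logarithm-table perspective looks like, here is a rough Python sketch. The scale factor `S`, the table names and the unsigned 1..255 input range are all assumptions of mine, not a description of the actual NES routine. The division x*d/z becomes log lookups, an add, a subtract and one exp lookup, and the coarse rounding of the log entries is where the precision goes.

```python
import math

S = 16  # assumed log scale: table entries are round(S * log2(n))
LOG = [0] + [round(S * math.log2(n)) for n in range(1, 256)]
EXP = [round(2 ** (i / S)) for i in range(2 * S * 8 + 1)]

def log_div(x, z, d):
    """Approximate x * d / z for 1 <= x, z, d <= 255 with lookups only."""
    idx = LOG[x] + LOG[d] - LOG[z]
    return EXP[idx] if idx >= 0 else 0   # negative index: result below 1
```

With this scale the relative error stays within a few percent, which matches the "a bit crappy" verdict above.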

Quote:

dithering: well maybe its just me being dumb.. :/ but I can not visualise how graham can do it with just one eorbuffer, and plotting each time 2 pixels under each other in the liner. for the dither not to be messed up the line's pixel pattern must be aligned vertically, but the lineroutine seems not to care about that (ie seemingly no anding of lowmost bit of y coord). also plotting 2 pixels each time would mean a loss of resolution which is not visible on the result :)

Okay.. Well, you have fun disassembling that one and I'll let you get back to me when you've figured it out ;)

Quote:

the only viable way to do concave models with eor filling is presented in altered states imho :)

Huh. The only thing I noticed was a three second sequence with a cube and a skewed lid which overlapped the base. But from the looks of it they simply assign two bit patterns to the same color so the overlap doesn't matter.
I still think it should be possible to handle real concave objects without precalculating everything. Just imagine the sheer awesomeness of a 3D text scroller.

Quote:

btw your NES routine is _very_ impressive :)

The limited RAM space and crappy graphics chip may be real problems but you can still do a lot more with that 1.7 MHz processor and large ROMs than people seem to realize. I mean good luck fitting 32k of unrolled line drawing code or a two-dimensional atan2 table into a C64 demo.

2008-04-17 23:08

Cruzer

Registered: Dec 2001
Posts: 1048

My favorite trick is to precalc the data, since doing realtime math on 1MHz/8bits seems a bit too ambitious for me. The usual argument against precalculation is of course that it takes up too much memory, but what I have learned is that it can be packed enough to leave room for everything you need, including unrolled code.

E.g. face hidden status, or any other status that typically stays the same for a number of frames, like which direction a texture mapped face is drawn, can be packed down to an initial state and a count for each time it changes, as well as some bits for the new value. Or if it's binary, like face hidden status, the latter can be omitted, since there is always only one other available option to switch to. So for an object with 20 faces that each change hidden state on average 10 times over the duration of the effect, the total memory consumption would be about 220 bytes (20 faces * (1 initial state + 10 counts)).
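A minimal Python sketch of that packing for a binary status, assuming a non-empty per-frame list; the function names are made up for illustration and a real version would keep each count in a byte.

```python
def pack_toggles(states):
    """Pack a per-frame boolean list as (initial state, run lengths)."""
    runs, count = [], 1
    for prev, cur in zip(states, states[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)   # frames spent in the state that just ended
            count = 1
    return states[0], runs       # the final run needs no count

def unpack_toggles(initial, runs, total):
    """Rebuild the per-frame list; the state simply flips after each run."""
    out, state = [], initial
    for n in runs:
        out += [state] * n
        state = not state
    return out + [state] * (total - len(out))
```

Only the initial state and one count per change are stored, which is where the "faces * (1 + changes)" estimate comes from.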

Coords can be packed pretty good too with delta-packing. I'm working on some vectorballs where each coord only takes about 4-5 bits for both X and Y values. So for the 49 bobs that I have currently, it takes about 27 bytes/frame, which means I can have 30 seconds of unique movement running in 25 fps with about 20K of tables.

Other favorites are some routines I developed for filled vector some years ago, like this one for line drawing, using precalculated y-coords for each pixel, which is of course faster than any iterative algorithm (Bresenham go home:)...

ldy yCoords,x
lda #pixel
eor (fillerPnt),y
sta (fillerPnt),y

Which happens to work very well with this filler (notice immediate mode - two cycles faster than normal)

eor #byte
sta gfx

The drawback is that it only works if the lines are max 51 pixels high, but I have a new version on the way that tackles this by dividing the line into smaller segments that can be drawn in "parallel", which in return makes the whole thing even faster, since the coord lookup, and in some cases the pixel value, can be reused for several pixels.

Another idea which I haven't used yet is to make the filler skip the unnecessary parts. Most of the time a typical eor filler is just wasting time on eor'ing with 0 and storing the same byte that was there before, so if you could make it skip at least a good bit of these wastelands of nothingness, there would be a lot of cycles to gain, especially for big flatshaded objects. I have some ideas on how to do this without adding any overhead to the linedrawer/filler, and without using too much memory or rastertime for administration, but I'm afraid I can't go into details.
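For reference, this is the plain EOR fill the paragraph above is talking about, as a Python sketch over one byte column (a hypothetical list of bytes stands in for bitmap memory). Every row does an eor and a store, even though only the rows holding an outline byte change the running value.

```python
def eor_fill(column):
    """EOR-fill one byte column top to bottom, in place, returning it."""
    acc = 0
    for i, edge in enumerate(column):
        acc ^= edge        # the "eor" step: toggle at every outline byte
        column[i] = acc    # the "sta" step: write the running value
    return column
```

In the example column `[0, 0x18, 0, 0, 0x18, 0]` four of the six eor operations are with zero, which is the wasted work a skipping filler would try to avoid.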

2008-04-17 23:31

Oswald

Registered: Apr 2002
Posts: 5094

imho you cant do better than altered states' trick or precalculating the lines. doing non-convex objects with eorfiller would need a very complex routine. its not only clipping the lines, but theres even a need to mess with the colors :(

2008-04-18 06:52

doynax
Account closed

Registered: Oct 2004
Posts: 212

Quoting Cruzer

Coords can be packed pretty good too with delta-packing. I'm working on some vectorballs where each coord only takes about 4-5 bits for both X and Y values. So for the 49 bobs that I have currently, it takes about 27 bytes/frame, which means I can have 30 seconds of unique movement running in 25 fps with about 20K of tables.

And I'd wager that floppy streaming comes in handy here too.

Quote:

Other favorites are some routines I developed for filled vector some years ago, like this one for line drawing, using precalculated y-coords for each pixel, which is of course faster than any iterative algorithm (Bresenham go home:)...

Hey, that's pretty clever :)
Y-major lines work out particularly nicely as they'll skip right over the invisible parts and you'll automatically hit the right X coordinate.
Honestly, I wouldn't have thought there'd be enough memory for this kind of scheme. But as you say, with a bit of subdivision (or just plain short lines) it ought to work out. Too bad it won't work in char mode though.

Quote:

Another idea which I haven't used yet is to make the filler skip the unnecessary parts. Most of the time a typical eor filler is just wasting time on eor'ing with 0 and storing the same byte that was there before, so if you could make it skip at least a good bit of these wastelands of nothingness, there would be a lot of cycles to gain, especially for big flatshaded objects. I have some ideas on how to do this without adding any overhead to the linedrawer/filler, and without using too much memory or rastertime for administration, but I'm afraid I can't go into details.

Then allow me to speculate ;)
How about keeping a separate bit array with one element per screen tile. Then when drawing a line you'd plot a separate low-resolution outline to the bit array, i.e. a conservative estimate of which cells in the real bitmap the line might cover. The standard line algorithm should work if modified to draw both tiles when moving diagonally. At least that's a scheme I tinkered with, without getting it to work I might add..
But I guess setting up and clipping all those extra lines would be a tad too costly to be practical.

2008-04-18 10:59

The Phantom

Registered: Jan 2004
Posts: 360

Doynax,

Here's a collection of all the source code I have, from various people around the world.

There is a source code from Cruzer/CML that goes into detail on a 12 faces glenz vector. Should be in the source code (txt) section of the zip.

There are a lot of other sources in this collection (demos, game, magazine, multiplexing, etc).

The link is limited to so many MB a day, so if it doesn't work, try again tomorrow.

http://h1.ripway.com/ThePhantomFOE/SourceCode.rar

The file should be 828KB..

2008-04-18 11:47

Graham
Account closed

Registered: Dec 2002
Posts: 990

Quoting Cruzer

My favorite trick is to precalc the data, since doing realtime math on 1MHz/8bits seems a bit too ambitious for me.

The math is very simple and can be very fast. Natural Wonders is doing everything 100% realtime, still most of the objects run in 25 fps. And that with far more than 8 bit math (matrix is 24 bit, rotation and z-scaling 16 bit etc).

Quote:

E.g. face hidden status, or any other status that typically stays the same for a number of frames

In 50% of the cases you just need to EOR all signs of the deltas and you know the hidden status. In the other 50% you can try to simply choose other vectors so that EOR-sign-check works again, or simply do those two muls which are also quite fast. But for platonic objects like in Natural Wonders you can do an even far easier check: Simply check the Z-value of the face midpoint against a visibility threshold.

2008-04-18 11:54

Oswald

Registered: Apr 2002
Posts: 5094

some words about why the dither works master? :)

2008-04-18 12:39

Graham
Account closed

Registered: Dec 2002
Posts: 990

Modified EOR fill code and special line routine :)

2008-04-18 13:14

Axis/Oxyron
Account closed

Registered: Apr 2007
Posts: 91

@Doynax:

The technique you describe (a bitarray keeping track of which block of the screen must be xor filled and which not) is very close to what i have done for the big cube viewed from the inside in natural wonders. It doesnt use a bitarray but one extra byte per char on the screen, which makes the line drawing and filling faster (you dont need any and/ora/xor for manipulations and testing) at the cost of <1kb. I think using char resolution for this extra data is obvious, because so you can use precalculated chars for the areas that dont need to be xor filled. I remember TTS, Graham and Me discussing about this technique back in 1994 as if it was yesterday.

2008-04-18 13:23

Oswald

Registered: Apr 2002
Posts: 5094

Quote: Modified EOR fill code and special line routine :)

hmm drew some drafts, looks like your method works auto-magically. I always thought lines should be drawn into 2 buffers twice.

2008-04-18 14:59

doynax
Account closed

Registered: Oct 2004
Posts: 212

Quoting Axis/Oxyron

The technique you describe (a bitarray keeping track of which block of the screen must be xor filled and which not) is very close to what i have done for the big cube viewed from the inside in natural wonders. It doesnt use a bitarray but one extra byte per char on the screen, which makes the line drawing and filling faster (you dont need any and/ora/xor for manipulations and testing) at the cost of <1kb. I think using char resolution for this extra data is obvious, because so you can use precalculated chars for the areas that dont need to be xor filled. I remember TTS, Graham and Me discussing about this technique back in 1994 as if it was yesterday.

In that case it's exactly what I'm currently doing, what with the NES not having a bitmap mode and very little RAM.
The messy bit is crossing the char boundaries inline when drawing the lines, as it eats about as many cycles as the blitting itself, besides which it doesn't work with Cruzer's y-tables at all. So it'd be nice if you could do an optimized preparation pass instead, that is something to preallocate the line's characters in the video-matrix/bit-array. You'd still keep a full bitmap for the lines even if the output would be in char mode, so the lines could be addressed easily. But on the plus side inline allocation allows you to have special-case code for fresh characters and to clear only the undrawn bytes of the chars in-line.

Quoting Graham

The math is very simple and can be very fast. Natural Wonders is doing everything 100% realtime, still most of the objects run in 25 fps. And that with far more than 8 bit math (matrix is 24 bit, rotation and z-scaling 16 bit etc).

I just don't see how you do it. I've only got 16-bit precision in the matrix and 8-bits (9 really..) for the transformations, plus lookup tables for everything, manually built vertices and so forth, yet it *still* takes over 2500 cycles to process a damned cube. Yet you're throwing around dodecahedrons with twice my precision without taking a noticeable hit.
See why I think you're all holding out on the really good techniques? ;)

Quoting Graham

In 50% of the cases you just need to EOR all signs of the deltas and you know the hidden status. In the other 50% you can try to simply choose other vectors so that EOR-sign-check works again, or simply do those two muls which are also quite fast. But for platonic objects like in Natural Wonders you can do an even far easier check: Simply check the Z-value of the face midpoint against a visibility threshold.

But.. That would be cheating.. =)

By the way is there any particularly good method for working out the lighting? I've tried using values you get as a byproduct of back-face culling but they're in screen coordinates so the precision is already shot to hell (having the surfaces flicker between two colors isn't all that appealing). Another idea I had was to store normals in spherical coordinates, which you could rotate by tweaking the angles, but I couldn't work out a corresponding regular rotation matrix (in fact I doubt whether it's even possible in the first place).
Then there's always the option of quantizing the rotation along the X and Y axis down to (say) four or five bits with which to index into a set of precalculated color tables, that is one such table per normal minus whatever you can reuse with symmetry. Not my idea of an elegant solution perhaps but it would undoubtedly work.

2008-04-18 19:53

Graham
Account closed

Registered: Dec 2002
Posts: 990

Quoting doynax

I just don't see how you do it. I've only got 16-bit precision in the matrix and 8-bits (9 really..) for the transformations, plus lookup tables for everything, manually built vertices and so forth, yet it *still* takes over 2500 cycles to process a damned cube.

There is a transformation routine for each object. For the octahedron it simply uses the matrix vectors as coordinates, for the cube it adds 3 matrix vectors for each coordinate. Also a lot of mirroring etc is hardcoded.
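A small Python sketch of the cube case, assuming the three rotated and pre-scaled matrix vectors are available as tuples (the function name and the data layout are mine, not how Natural Wonders stores things). Four corners are built with additions and subtractions, and the other four are their mirror images.

```python
def cube_corners(a, b, c):
    """Return the 8 corners (+-a +-b +-c) of a cube from three axis vectors.

    a, b, c are the rotated, pre-scaled axis vectors (e.g. matrix columns).
    Only 4 corners are summed; the other 4 are their mirror images.
    """
    add = lambda u, v: tuple(p + q for p, q in zip(u, v))
    sub = lambda u, v: tuple(p - q for p, q in zip(u, v))
    ab_plus, ab_minus = add(a, b), sub(a, b)   # shared partial sums
    half = [add(ab_plus, c), sub(ab_plus, c),
            add(ab_minus, c), sub(ab_minus, c)]
    return half + [tuple(-p for p in v) for v in half]
```

No multiplications are needed per vertex at all; the matrix itself already holds everything the object needs.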

Quoting doynax

But.. That would be cheating.. =)

For surfaces where the midpoint builds a vector towards the origin which is orthogonal to the surface, this Z-value is as good as a cross product for hidden surface calculation. In that case it was even far easier because there isn't even an explicit hidden surface check. Z is also used for the surface shading, so if the visibility threshold is passed, it simply sets the surface color to 0 which automatically lets the surface disappear.

The other check (50% sign check) is also quite simple, look at the cross product:

z = dx1*dy2 - dx2*dy1

Now if dx1*dy2 is negative and dx2*dy1 is positive, you know that the result will ALWAYS be negative. Same for (pos)-(neg)=(always pos). So with a simple EOR of all 4 signs you can avoid the calculation in 50% of all hidden surface checks. And if you also try a few other surface vector combinations, you might be able to avoid the multiplications in far more than 50%.
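The sign shortcut can be sketched in Python like this. The function name and the returned "needed multiplies" flag are only for illustration, and operands equal to zero are sent down the slow path to keep the sketch simple.

```python
def cross_sign(dx1, dy1, dx2, dy2):
    """Return (sign of dx1*dy2 - dx2*dy1, whether multiplies were needed)."""
    if 0 not in (dx1, dy1, dx2, dy2):
        neg_a = (dx1 < 0) != (dy2 < 0)   # sign of dx1*dy2 from an EOR of signs
        neg_b = (dx2 < 0) != (dy1 < 0)   # sign of dx2*dy1 likewise
        if neg_a != neg_b:
            # (neg) - (pos) is always negative, (pos) - (neg) always positive
            return (-1 if neg_a else 1), False
    z = dx1 * dy2 - dx2 * dy1            # fall back to the two multiplies
    return (z > 0) - (z < 0), True
```

For example `cross_sign(3, 2, -4, 5)` is decided from the signs alone, while `cross_sign(3, 2, 4, 5)` has to multiply.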

2008-04-19 08:45

JackAsser

Registered: Jun 2002
Posts: 2014

I won't delve into implementation details or even algorithms, since the most common ones have already been mentioned in this thread and in this thread: 3D projection on the C=64 (or...how do I divide?)

What I like to add to the discussion in the more general field:

* Don't be afraid to do real time filled vectors. The math, if done right, is minimal compared to line drawing and EOR-filling (assuming typical platonic solids demo parts).

* Implement a good frame rate counter early in the process. Many "optimizations" really aren't any optimizations because the overhead simply is too great. For example, an unrolled EOR-filler takes approx (4+4)*128 = 1024 cycles to fill a column. A potential speed up would be only to EOR-fill the chars that contain lines and simply STA for the rest. At most you can gain 4*128 = 512 cycles (per column). Any extra overhead in the linedrawer, in the code modifier (JSR+RTS) into the EOR-areas and STA areas quickly eats up those cycles. So DO add a frame rate counter FIRST so that you see that you really get bang for your bucks.

* Think out of the box and forget about doing everything so damn general (vertex coords, object data, dot products, matrix multiplications etc.) Start with the normal 3d-calcs, limit the degree of freedom etc. Take apart the problem and implement it smart. F.e. to calculate all 8 corners of a cube, use the fact that each corner simply is a simple combination of the world axes and the fact that 4 of the corners are mirrored versions of the other 4. Another example that Graham mentioned is backface culling for platonic solids, it's extremely simple and requires only a CMP: assume the eye is at the origin looking at Z=1, then do your favorite back face culling calcs and remove all multiplications with zero. You'll see that any method will result in a Z<e comparison where e is constant.

* Try doing something more than everybody else. Natural Wonders and Panta Rhei are both fine examples. First he implemented sub-pixel precision, then in NW he implemented dithering which I have to say uses one of the most elegant solutions for the odd/even fill problem with dither patterns. It is so simple it's hard to explain. ;D

* Release your stuff!

Thank you, that's all I think! :)

2008-04-19 18:03

A Life in Hell
Account closed

Registered: May 2002
Posts: 204

Quote:

* Implement a good frame rate counter early in the process. Many "optimizations" really aren't any optimizations because the overhead simply is too great. For example, an unrolled EOR-filler takes approx (4+4)*128 = 1024 cycles to fill a column. A potential speed up would be only to EOR-fill the chars that contain lines and simply STA for the rest. At most you can gain 4*128 = 512 cycles (per column). Any extra overhead in the linedrawer, in the code modifier (JSR+RTS) into the EOR-areas and STA areas quickly eats up those cycles. So DO add a frame rate counter FIRST so that you see that you really get bang for your bucks.

This one can't be emphasised enough... I'd suggest taking it even lower level than that though. Three minutes spent patching vice (consider a one line patch in cpu.c that printf's cycle/current PC address) and a few scripts to sift through it (try http://artificial-stupidity.net/~alih/ , process-log.c and profiler.py) can help you a lot. Or at least helped me a lot. YMMV.

EDIT: the patch is line #2130 in 6510core.c in viceplus, done like:

trap_skipped:
+#ifndef DRIVE_CPU
+ printf("profile %d %d\n", CLK, reg_pc);
+#endif
SET_LAST_OPCODE(p0);
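To give an idea of the sifting step, here is a hypothetical Python sketch that totals cycles per PC from log lines shaped like the printf above. It is not the process-log.c or profiler.py mentioned in the post, and it assumes each line is printed before the instruction at that PC executes, so the clock delta to the next line is charged to the previous PC.

```python
from collections import Counter

def cycles_per_pc(log_lines):
    """Sum cycles per PC from lines shaped like 'profile <clk> <pc>'."""
    totals, last = Counter(), None
    for line in log_lines:
        parts = line.split()
        if len(parts) != 3 or parts[0] != "profile":
            continue                          # ignore unrelated emulator output
        clk, pc = int(parts[1]), int(parts[2])
        if last is not None:
            prev_clk, prev_pc = last
            totals[prev_pc] += clk - prev_clk   # time spent at previous PC
        last = (clk, pc)
    return totals
```

Calling `cycles_per_pc(open("vice.log"))` (the file name is just an example) and sorting the counter by value gives a crude hot-spot list.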

2008-04-19 20:39

doynax
Account closed

Registered: Oct 2004
Posts: 212

Quoting \^_^/

This one can't be emphasised enough... I'd suggest taking it even lower level than that though. Three minutes spent patching vice (consider a one line patch in cpu.c that printf's cycle/current PC address) and a few scripts to sift through it (try http://artificial-stupidity.net/~alih/ , process-log.c and profiler.py) can help you a lot. Or at least helped me a lot. YMMV.

Hey, that's pretty cool.

Now you've got me thinking about my wish-list of features for the ideal profiler. In addition to just measuring the number of cycles in a straight piece of code you'd want to know things like how long the line setup time takes compared to the plotting code, how often a branch is taken, how often each variable is accessed, the frequency of page crossing penalties and so forth.

The annoying thing is that the critical code is as often as not spread out over 10k of generated code, and what label lists you may have are usually quite lacking (mixing constants and addresses, missing local labels, no bindings to the sourcecode, etc). A truly excellent profiler would require integration with the assembler and be controlled by a powerful scripting language.

2008-04-20 10:22

Skate

Registered: Jul 2003
Posts: 494

About lighting and dithering, I'd like to remind you of two demos and their open-source engine.

Water/Aesrude
Water 90%

Mist/Civitas
Mist

Dalaman 3D Engine
Dalaman 3D Engine

All coded by Nightlord for your pleasure.

2008-04-21 14:18

Cruzer

Registered: Dec 2001
Posts: 1048

Quote:

Then allow me to speculate ;)
How about keeping a separate bit array with one element per screen tile. Then when drawing a line you'd plot a separate low-resolution outline to the bit array, i.e. a conservative estimate of which cells in the real bitmap the line might cover. The standard line algorithm should work if modified to draw both tiles when moving diagonally. At least that's a scheme I tinkered with, without getting it to work I might add..
But I guess setting up and clipping all those extra lines would be a tad too costly to be practical.

You were right about dividing it into some cells, but as usual my philosophy is "why do it in realtime if it can be precalculated without taking up too much memory?"

One idea I have is to analyze in advance which cells needed to be filled when, and save this as a packed list that could be depacked in realtime w/o adding too much overhead. Kinda an extension on how I optimized the funfiller in YKTR2, except that only worked for start/end for each column, and not for "holes" inside.

The number of cells could be a round number like 256, which means each cell will be 1/256th the size of the filled area. So if it's using one charset, each cell will be a char big, but they could be any other size depending on the effect's size, and no one says 256 is the optimal number either. For each of them we need to predetermine whether it can be skipped for each frame, which could be done with some conditions like that all pixels must be the same as the previous frame, and that the initial value before any eor'ing has been done is the same as the value after the last eor. If this is the case there is no need to fill this cell at all, and both eor'ing and sta'ing can be skipped.

To make the filler skip a cell, the first command must be changed from eor #$00 (assuming that's how the filler works, but it would work for absolute mode as well with a few modifications) to some branch, e.g. bcc, which skips to the next cell. This can be done by eor'ing the opcode with #EOR_IMM^BCC and the argument byte with the branch length. To switch it back to fill-mode, it just needs to be eor'ed with the same values again. This code modding could for instance be done with 256 specialized routines, which I think would be pretty quick.
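The opcode toggling can be tried out in Python on a bytearray standing in for the filler code. The opcode values are the standard 6502 ones ($49 for EOR immediate, $90 for BCC); the function name and the example layout are assumptions for illustration only.

```python
EOR_IMM = 0x49  # 6502 opcode for EOR #imm
BCC = 0x90      # 6502 opcode for BCC rel

def toggle_cell(code, addr, skip_len):
    """Flip a cell's first instruction between 'eor #$00' and 'bcc +skip_len'."""
    code[addr] ^= EOR_IMM ^ BCC   # opcode byte: EOR #imm <-> BCC
    code[addr + 1] ^= skip_len    # operand byte: $00 <-> branch offset
```

Applying the same call twice restores the original bytes, so the precalculated list only has to say which cell to flip. Note that the BCC variant relies on the carry being clear when the filler reaches the cell.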

Storing which cells to change can be done with one byte/frame for each cell that must swap status. Since there is always only one option on what the state should be (the opposite of the current one) we just need the cell number, thus one byte.

The size of the look-up tables depends on how many cells would change state/frame. E.g. if we have a cube, and it rotates from one face facing the eye to another, probably most of the cells would have to change state about two times => 512 bytes used. And if this happens 10 times during the effect, we would need 5K. So probably not a big deal as long as the effect is pretty simple and shortlived. But I could be wrong, need to make some proof-of-concept test to be sure.

The advantage of this method is that it adds very little administration overhead. No need for additional lowres lines or anything like that, and the filler itself doesn't get any slower at all. It doesn't even know anything about that it's optimized, it just gets modded from the outside, which I think should be possible to do pretty fast.

Guess the only problem I can see (apart from the agony of implementing it) would be if it turns out to require more memory than expected. But I have some schemes for this as well. E.g. to add some rigidity, so only cells which have a certain number of fill-free frames in a row gets switched to skip-mode, otherwise they just stay in fill mode. Another idea could be to add some additional "layers" of bigger cells to make it possible to switch big areas faster and with lesser memory consumption. And maybe it's more efficient to do the start/end of each column more like I did for the YKTR2 effect, which as far as I remember didn't consume very much.

2008-04-22 07:18

HCL

Registered: Feb 2003
Posts: 728

@Oswald, it's interesting to see that you hardly believe the dither-shit in NaturalWonders is possible. Especially after seeing your large dithered vectors in Real.. It's obvious how you made that one, and NW is just one step ahead. I don't know if Graham was inspired by your vector, but i was when i made my dithered vector, and my line-routine looks much more like yours.. drawing in two buffers.

Well, i think you have figured out how Graham does his dithered line-drawing by now, and you see it really is using two buffers, only they are aligned 1 byte from each other.. the line routine adjusts (with the eor(zp),y) to only draw one bit-pattern into one buffer and another bit pattern into the other. The eor-filling is then made with two buffers. The gain: only draw one line (two lines at the same time), the loss: always draw both lines, even if one is invisible.

2008-04-22 18:35

Krill

Registered: Apr 2002
Posts: 2980

Quote:

but as usual my philosophy is "why do it in realtime if it can be precalculated without taking up too much memory?"

Sigh. Looks like i'll always stay the king of the c64-4k hill. :\

2008-04-22 19:11

Oswald

Registered: Apr 2002
Posts: 5094

HCL, ah I understand now. (forgot meanwhile the eor (),y stuff, now rechecked). Im surprised now I didnt get it the first time... :) btw glad that real inspired you :) that vector in vector stuff was inspired in turn by some classic amiga demo. cant recall its name atmo tho.

2008-04-24 15:45

doynax
Account closed

Registered: Oct 2004
Posts: 212

I just did some back-of-the-envelope calculations and it seems like I could optimize the transformation code by a factor of two, save lots of memory on code and tables as well as increase the precision by dropping the perspective divisions.

The depressing part is that the only object which actually seems to benefit from it is the cube (and even there I got a lot of complaints about the exaggerated perspective), more complex models just look plain weird if you let the Z depths have more than about a 25% effect. Sure, if you had a real scene to place the camera within, with Z-clipping and everything, then it'd be invaluable but for a single object spinning around in the middle of the screen there isn't much you can do with it..

Does anyone actually ever bother with proper perspective projection?

2008-04-24 16:09

Oswald

Registered: Apr 2002
Posts: 5094

perspective projection is just a table and a multiplication.

x=(x/z)*n
x=x*((1/z)*n)

n is constant. you'll have a table of (1/z)*n. look up the value for Z, use a 8x8=16 bit mul, throw away the low8 bits. done.

afaik everyone does it like that. or similarly.
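A small C sketch of the table-plus-multiply approach, under some assumptions: the constant n is picked as 64, the table entry is stored as 256*n/z clamped to one byte, and the sign of x is handled separately so the multiply itself stays unsigned 8x8=16. The names (PERSP_N, persp_tab, build_persp_table, project) are made up.

```c
#include <stdint.h>

#define PERSP_N 64   /* hypothetical projection constant n */

static uint8_t persp_tab[256];   /* persp_tab[z] ~ 256 * n / z, clamped to 8 bits */

void build_persp_table(void)
{
    persp_tab[0] = 255;          /* z = 0 is invalid, just clamp */
    for (int z = 1; z < 256; z++) {
        int v = (256 * PERSP_N) / z;
        persp_tab[z] = (uint8_t)(v > 255 ? 255 : v);
    }
}

/* Project one coordinate: look up (1/z)*n, do an 8x8 = 16-bit multiply
   and keep only the high byte of the product. */
int8_t project(int8_t x, uint8_t z)
{
    uint8_t  mag  = (uint8_t)(x < 0 ? -x : x);
    uint16_t prod = (uint16_t)(mag * persp_tab[z]);
    int8_t   hi   = (int8_t)(prod >> 8);    /* throw away the low 8 bits */
    return x < 0 ? (int8_t)-hi : hi;
}
```

With these assumed values, x=100 at z=128 projects to 50. For z at or below n the table entry clamps at 255, so coordinates that close get slightly too small.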

2008-04-24 16:30

doynax
Account closed

Registered: Oct 2004
Posts: 212

And that's just what I've been doing myself but with a 16x8-bit multiplication instead. Well, in the old demo I used logarithmic division but the precision was horrible.

But without perspective there's no need to work out the Z value of the vertices, or the Z vector of the matrices or store all those multiplication tables. Plus you don't need to work out the low-bytes of X and Y coordinates when building the vertices, only propagate carry from the additions/subtractions (this saves a surprising amount of cycles). Or at least that's what I think should happen *if* I dropped Z support, but I might have missed something.

It seems like the only reason I'm doing the perspective right now is because it wouldn't be "real" 3d without it..

2008-04-24 18:04

Oswald

Registered: Apr 2002
Posts: 5094

you can only get away with dots/bobs without perspective imho. I dont understand why it is such a big problem to you tho. and 3d rotations: by using symmetries and thinking of the 3d matrix as vectors you can boil down vertex calcs into a series of additions of 3d vectors, after you have built your matrix with a series of additions.
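As an illustration of the symmetry idea for the simplest case, a cube, here is a C sketch. It assumes a, b and c are the three axis vectors of the rotation matrix, already scaled to half the edge length; the Vec3 type and function names are invented for the example. Four corners come from additions and subtractions, the other four are just negations.

```c
#include <stdint.h>

typedef struct { int16_t x, y, z; } Vec3;

static Vec3 vadd(Vec3 a, Vec3 b)
{
    Vec3 r;
    r.x = (int16_t)(a.x + b.x);
    r.y = (int16_t)(a.y + b.y);
    r.z = (int16_t)(a.z + b.z);
    return r;
}

static Vec3 vsub(Vec3 a, Vec3 b)
{
    Vec3 r;
    r.x = (int16_t)(a.x - b.x);
    r.y = (int16_t)(a.y - b.y);
    r.z = (int16_t)(a.z - b.z);
    return r;
}

static Vec3 vneg(Vec3 a)
{
    Vec3 r;
    r.x = (int16_t)-a.x;
    r.y = (int16_t)-a.y;
    r.z = (int16_t)-a.z;
    return r;
}

/* All eight cube corners are (+-a) + (+-b) + (+-c).
   out[i] and out[7 - i] are opposite corners. No multiplications. */
void cube_vertices(Vec3 a, Vec3 b, Vec3 c, Vec3 out[8])
{
    Vec3 ab_plus  = vadd(a, b);     /* +a +b */
    Vec3 ab_minus = vsub(a, b);     /* +a -b */

    out[0] = vadd(ab_plus,  c);     /* +a +b +c */
    out[1] = vsub(ab_plus,  c);     /* +a +b -c */
    out[2] = vadd(ab_minus, c);     /* +a -b +c */
    out[3] = vsub(ab_minus, c);     /* +a -b -c */

    out[4] = vneg(out[3]);          /* -a +b +c */
    out[5] = vneg(out[2]);          /* -a +b -c */
    out[6] = vneg(out[1]);          /* -a -b +c */
    out[7] = vneg(out[0]);          /* -a -b -c */
}
```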

I think you are aiming simply too high, your methods of doing 3d actually represent the state of the art way of doing it on the c64, you cant do it much faster. 98% of c64 demos doing 3d do it slower than how you already do it.

2008-04-24 18:46

doynax
Account closed

Registered: Oct 2004
Posts: 212

Quoting Oswald

you can only get away with dots/bobs without perspective imho.

I've been trying out a few things in my C prototype and the whole thing is kind of weird.
For low-polygon objects (say a cube or a tetrahedron) you truly *need* a small bit of perspective or the illusion of 3d falls apart, it doesn't have to be much mind you but it has to be there (about 10% of scaling difference between nearest and farthest points is sufficient). On the other hand if you try to let the Z value have more than a 30% effect on more complex objects (an icosahedron or a dodecahedron in my case) then it just looks *wrong*. And here I see no problems whatsoever when dropping Z altogether.
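For anyone wanting to put numbers on those percentages, a small C sketch, assuming that "scaling difference" means how much larger the nearest point is drawn than the farthest one, and that the projected size goes as 1/z. For an object of radius r centred at distance d the ratio is (d + r) / (d - r), so a difference p needs d = r * (2 + p) / p. The function names are made up.

```c
/* Ratio between the projected scale of the nearest and the farthest
   point of an object with the given radius, centred at `distance`. */
double scale_ratio(double radius, double distance)
{
    return (distance + radius) / (distance - radius);
}

/* Camera distance at which the nearest point is drawn `strength`
   (e.g. 0.10 for 10%) larger than the farthest point. */
double camera_distance(double radius, double strength)
{
    return radius * (2.0 + strength) / strength;
}
```

Under these assumptions a 10% effect puts the camera 21 radii away from the object's centre, and a 30% effect about 7.7 radii away.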

Yet I don't particularly feel like maintaining separate code paths for them. Perhaps the simple objects would look better with a bit of lighting?

Quoting Oswald

I think you are aiming simply too high, your methods of doing 3d actually represent the state of the art way of doing it on the c64, you cant do it much faster. 98% of c64 demos doing 3d do it slower than how you already do it.

Trying to optimize this bloody program has become a matter of diminishing returns, and the transformation code just happens to be the easiest thing to attack right now.
I've just run out of practical ideas on how to squeeze cycles out of the line drawing or filling code and I'm not quite ready to give up yet, which is why I created this thread in the first place I suppose.
