[CSDb] - User Forums - Your favorite 3d tricks

You are not logged in - nap

CSDb User Forums

Forums > C64 Coding > Your favorite 3d tricks

2008-04-17 20:36

doynax
Account closed

Registered: Oct 2004
Posts: 212

Your favorite 3d tricks

In the interest of educating newbies like me who don't yet know all the tricks, and detracting attention from that ridiculous multithreading "debate" I thought we could share some tips on how to write a speedy vector part.
Because the depressing fact is that aside from this very forum there seems to be about zero information on how to do 3D on 8-bit systems, unless it's all been hiding in some old diskmag somewhere (and as far as I'm concerned information that isn't indexed by Google might as well not exist).

Now, to start things off..

You can get rid of all those nasty multiplications when building a rotation matrix from Euler angles by exploiting the trigonometric product rules ( 2*cos x*cos y = cos(x - y) + cos(x + y) and friends ). I realize that this is probably common knowledge but its far from obvious and without it those multiplications are real cycle eaters. Oh, and thanks for the tip WVL :)

Another method is an alternative to Bresenham, or fixed point slopes, for deciding whether to continue on the major axis or move diagonally when drawing lines. The idea here is simply that those decisions can easily be precalculated and stuffed into bit arrays. The tricky part here is that you can use SBX to test these bits while letting the same mask for figuring out what pixels to plot in a horizontal run in turn mask our decision bits in X.

In other words:

	lda #%11111111
	sbx #%10000000
	bcs .x1

	eor #%01111111
	eor pixels,y
	sta pixels,y
	iny

	lda #%01111111
.x1	sbx #%01000000
	bcs .x2

	eor #%00111111
	eor pixels,y
	sta pixels,y
	iny

	lda #%00111111
.x2	sbx #%00100000
	bcs .x3
	.
	.
	.

So please post any interesting suggestions you've got. Any efficient and/or elegant methods for doing clipping, transformations, line drawing, lighting, dithering and all the other black arts..

... 21 posts hidden. Click here to view all posts....

2008-04-18 11:47

Graham
Account closed

Registered: Dec 2002
Posts: 990

Quoting Cruzer

My favorite trick is to precalc the data, since doing realtime math on 1MHz/8bits seems a bit too ambitious for me.

The math is very simple and can be very fast. Natural Wonders is doing everything 100% realtime, still most of the objects run in 25 fps. And that with far more than 8 bit math (matrix is 24 bit, rotation and z-scaling 16 bit etc).

Quote:

E.g. face hidden status, or any other status that typically stays the same for a number of frames

In 50% of the cases you just need to EOR all signs of the deltas and you know the hidden status. In the other 50% you can try to simply choose other vectors so that EOR-sign-check works again, or simply do those two muls which are also quite fast. But for platonic objects like in Natural Wonders you can do an even far easier check: Simply check the Z-value of the face midpoint against a visibility threshold.

2008-04-18 11:54

Oswald

Registered: Apr 2002
Posts: 5094

some words about why the dither works master? :)

2008-04-18 12:39

Graham
Account closed

Registered: Dec 2002
Posts: 990

Modified EOR fill code and special line routine :)

2008-04-18 13:14

Axis/Oxyron
Account closed

Registered: Apr 2007
Posts: 91

@Doynax:

The technique you describe (a bitarray keeping track of which block of the screen must be xor filled and which not) is very close to what i have done for the big cube viewed from the inside in natural wonders. It doesnt use a bitarray but one extra byte per char on the screen, which makes the line drawing and filling faster (you dont need any and/ora/xor for manipulations and testing) at the cost of <1kb. I think using char resolution for this extra data is obvious, because so you can use precalculated chars for the areas that dont need to be xor filled. I remember TTS, Graham and Me discussing about this technique back in 1994 as if it was yesterday.

2008-04-18 13:23

Oswald

Registered: Apr 2002
Posts: 5094

Quote: Modified EOR fill code and special line routine :)

hmm drew some drafts, looks like your method works auto-magically. I always thought lines should be drawn into 2 bbuffers twice.

2008-04-18 14:59

doynax
Account closed

Registered: Oct 2004
Posts: 212

Quoting Axis/Oxyron

The technique you describe (a bitarray keeping track of which block of the screen must be xor filled and which not) is very close to what i have done for the big cube viewed from the inside in natural wonders. It doesnt use a bitarray but one extra byte per char on the screen, which makes the line drawing and filling faster (you dont need any and/ora/xor for manipulations and testing) at the cost of <1kb. I think using char resolution for this extra data is obvious, because so you can use precalculated chars for the areas that dont need to be xor filled. I remember TTS, Graham and Me discussing about this technique back in 1994 as if it was yesterday.

In that case it's exactly what I'm currently doing, what with the NES not having a bitmap mode and very little RAM.
The messy bit is that crossing the char boundaries inline when drawing the lines as it eats about as many cycles as the blitting itself, besides which it doesn't work with Cruzer's y-tables at all. So it'd be nice if you could do an optimized preparation pass instead, that is something to preallocate the line's characters in the video-matrix/bit-array. You'd still keep a full bitmap for the lines even if the output would be in char mode, so the lines could be addressed easily. But on the plus side inline allocation allows you to have special-case code for fresh characters and clearing only the undrawn bytes of the chars in-line.

Quoting Graham

The math is very simple and can be very fast. Natural Wonders is doing everything 100% realtime, still most of the objects run in 25 fps. And that with far more than 8 bit math (matrix is 24 bit, rotation and z-scaling 16 bit etc).

I just don't see how you do it. I've only got 16-bit precision in the matrix and 8-bits (9 really..) for the transformations, plus lookup tables for everything, manually built vertices and so forth, yet it *still* takes over 2500 cycles to process a damned cube. Yet you're throwing around dodecahedrons with twice my precision without taking a noticeable hit.
See why I think you're all holding out on the really good techniques? ;)

Quoting Graham

In 50% of the cases you just need to EOR all signs of the deltas and you know the hidden status. In the other 50% you can try to simply choose other vectors so that EOR-sign-check works again, or simply do those two muls which are also quite fast. But for platonic objects like in Natural Wonders you can do an even far easier check: Simply check the Z-value of the face midpoint against a visibility threshold.

But.. That would be cheating.. =)

By the way is there any particularly good method for working out the lighting? I've tried using values you get as a byproduct of back-face culling but they're in screen coordinates so the precision is already shot to hell (having the surfaces flicker between two colors isn't all that appealing). Another idea I had was to store normals in spherical coordinates and which you could rotate by tweaking the angles, but I couldn't work out a corresponding regular rotation matrix (in fact I doubt whether it's even possible in the first place).
Then there's always the option of quantizing the rotation along the X and Y axis down to (say) four or five bits with which to indexing into a set of precalculated colors tables, that is one such table per normal minus whatever you can reuse with symmetry. Not my idea of an elegant solution perhaps but it would undoubtedly work.

2008-04-18 19:53

Graham
Account closed

Registered: Dec 2002
Posts: 990

Quoting doynax

I just don't see how you do it. I've only got 16-bit precision in the matrix and 8-bits (9 really..) for the transformations, plus lookup tables for everything, manually built vertices and so forth, yet it *still* takes over 2500 cycles to process a damned cube.

There is a transformation routine for each object. For the octahedron it simply uses the matrix vectors as coordinates, for the cube it adds 3 matrix vectors for each coordinate. Also a lot of mirroring etc is hardcoded.

Quoting doynax

But.. That would be cheating.. =)

For surfaces were the midpoint builds a vector towards the origin which is orthogonal to the surface, this Z-value is as good as a cross product for hidden surface calculation. In that case it was even far easier because there even is no explicit hidden surface check. Z is also used for the surface shading, so if the visibility threshold is passed, it simply sets the surface color to 0 which automatically let's the surface disappear.

The other check (50% sign check) is also quite simple, look at the cross product:

z = dx1*dy2 - dx2*dy1

Now if dx1*dy2 is negative and dx2*dy1 is positive, you know that the result will ALWAYS be negative. Same for (pos)-(neg)=(always pos). So with a simple EOR of all 4 signs you can avoid the calculation in 50% of all hidden surface checks. And if you also try a few other surface vector combinations, you might be able to avoid the multiplications in far more than 50%.

2008-04-19 08:45

JackAsser

Registered: Jun 2002
Posts: 2014

I won't dwell into implementation details and not even algorithms since the most common already have been mentioned in this thread and in this thread: 3D projection on the C=64 (or...how do I divide?)

What I like to add to the discussion in the more general field:

* Don't be afraid to do real time filled vectors. The math, if done right, is minimal compared to line drawing and EOR-filling (assuming typical platonic soilds demo parts).

* Implement a good frame rate counter early in the process. Many "optimizations" really arn't any optimizantions because the overhead simply is too great. For example, an unrolled EOR-filler takes approx (4+4)*128 = 1024 cycles to fill a column. A potential speed up would be only to EOR-fill the chars that contain lines and simply STA for the rest. At most you can gain 4*128 = 512 cycles (per column). Any extra overhead in the linedrawer, in the code modifier (JSR+RTS) into the EOR-areas and STA areas quicky eats up those cycles. So DO add a frame rate counter FIRST so that you see that you really get bang for your bucks.

* Think out of the box and forget about doing everything so damn general (vertex coords, object data, dot procuts, matrix multiplications etc.) Start with the normal 3d-calcs, limit the degree of freedom etc. Take apart the problem and implement it smart. F.e. to calculate all 8 corners of a cube, use the fact that each corner simply is a simple combination of the world axices and the fact that 4 of the corners are mirrored verion of the other 4. Another example that Graham mention is backface culling for platonic solids, it's extremly simple and requiers only a CMP, assume eye in origo looking at Z=1, then do your favorite back face culling calcs and remove all multiplications with zero. You'll see that any method will result in a Z<e comparison where e is constant.

* Try doing something more than everybody else. Natural Wonders and Panta Rhei are both fine examples. First he implemented sub-pixel precision, then in NW he implemented dithering which I have to say uses one of the most elegant solutions for the odd/even fill problem with dither patterns. It is so simple it's hard to explain. ;D

* Release your stuff!

Thank you, that's all I think! :)

2008-04-19 18:03

A Life in Hell
Account closed

Registered: May 2002
Posts: 204

Quote:

* Implement a good frame rate counter early in the process. Many "optimizations" really arn't any optimizantions because the overhead simply is too great. For example, an unrolled EOR-filler takes approx (4+4)*128 = 1024 cycles to fill a column. A potential speed up would be only to EOR-fill the chars that contain lines and simply STA for the rest. At most you can gain 4*128 = 512 cycles (per column). Any extra overhead in the linedrawer, in the code modifier (JSR+RTS) into the EOR-areas and STA areas quicky eats up those cycles. So DO add a frame rate counter FIRST so that you see that you really get bang for your bucks.

This one can't be emphisised enough... I'd suggest taking it even lower level than that though. Three minutes spent patching vice (Consider a one line patch in cpu.c that printf's cycle/current PC address) and a few scripts to sift through it (try http://artificial-stupidity.net/~alih/ , process-log.c and profiler.py) can help you a lot. Or at least helped me a lot. YMMV.

EDIT: the patch is line #2130 in 6510core.c in viceplus, done like:

trap_skipped:
+#ifndef DRIVE_CPU
+ printf("profile %d %d\n", CLK, reg_pc);
+#endif
SET_LAST_OPCODE(p0);

2008-04-19 20:39

doynax
Account closed

Registered: Oct 2004
Posts: 212

Quoting \^_^/

This one can't be emphisised enough... I'd suggest taking it even lower level than that though. Three minutes spent patching vice (Consider a one line patch in cpu.c that printf's cycle/current PC address) and a few scripts to sift through it (try http://artificial-stupidity.net/~alih/ , process-log.c and profiler.py) can help you a lot. Or at least helped me a lot. YMMV.

Hey, that's pretty cool.

Now you've got me thinking about my wish-list of features for the ideal profiler. In addition to just measuring the number of cycles in a straight piece of code you'd want to know things like how long the line setup time takes compared to the plotting code, how often a branch is taken, how often each variable is accessed, the frequency of page crossing penalties and so forth.

The annoying thing is that the critical code is as often as not split out over 10k of generated code, and what label lists you may have are usually quite lacking (mixing constants and addressed, missing local labels, no bindings to the sourcecode, etc). A truly excellent profiler would require integration with the assembler and be controlled by a powerful scripting language.

Previous - 1 | 2 | 3 | 4 - Next

Refresh

Subscribe to this thread: