        lda #%11111111
        sbx #%10000000
        bcs .x1
        eor #%01111111
        eor pixels,y
        sta pixels,y
        iny
        lda #%01111111
    .x1
        sbx #%01000000
        bcs .x2
        eor #%00111111
        eor pixels,y
        sta pixels,y
        iny
        lda #%00111111
    .x2
        sbx #%00100000
        bcs .x3
        . . .
hmm, this sbx trick is nifty, I don't even understand it :)
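For anyone else puzzling over it: SBX (also listed as AXS) is one of the undocumented 6502 opcodes. It computes X = (A AND X) - #imm, leaves A alone, and sets the carry the way CMP does, so carry set means "no borrow". In the snippet, `bcs` therefore branches when (A AND X) is at least the immediate bit, and X keeps the remainder for the next step. The Python below is a minimal sketch that emulates only that part (N and Z are not modelled), which is handy for tracing the routine by hand. The function name `sbx` is just for illustration.

```python
def sbx(a, x, imm):
    """Emulate the undocumented 6502 opcode SBX #imm (aka AXS).

    X := (A AND X) - imm, modulo 256; A is not changed.
    Carry is set like CMP: set when (A AND X) >= imm, i.e. no borrow.
    N and Z flags are not modelled in this sketch.
    Returns (new_x, carry).
    """
    t = a & x
    return (t - imm) & 0xFF, t >= imm


# First step of the snippet: A = %11111111, so A AND X is simply X,
# and "bcs .x1" is taken exactly when X >= %10000000.
new_x, carry = sbx(0b11111111, 0x90, 0b10000000)
print(hex(new_x), carry)
```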
there's Graham's method for rotating 192 dots in one-der: you build a rotation matrix, then precalculate (with adc speedcode) the multiples of the x, y, z elements of your unit vector. then you can calculate the 3d positions of a lot of dots from those tables using only additions/subtractions.
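A rough sketch of the idea in Python, assuming integer dot coordinates in -64..63 and an 8.8 fixed-point matrix (both assumptions, as are all the names). Each of the nine matrix elements gets a table of its multiples, filled by repeated addition the way the adc speedcode would do it; after that, every dot costs three lookups and two additions per axis, with no multiplies.

```python
import math

LO, HI = -64, 63  # assumed coordinate range of the dots
ONE = 256         # 8.8 fixed point: 1.0 == 256


def rotation_matrix(ax, ay, az):
    """Rz * Ry * Rx for angles in radians, rounded to 8.8 fixed point."""
    cx, sx = math.cos(ax), math.sin(ax)
    cy, sy = math.cos(ay), math.sin(ay)
    cz, sz = math.cos(az), math.sin(az)
    rx = [[1, 0, 0], [0, cx, -sx], [0, sx, cx]]
    ry = [[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]
    rz = [[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]]

    def mul(p, q):
        return [[sum(p[i][k] * q[k][j] for k in range(3)) for j in range(3)]
                for i in range(3)]

    return [[round(v * ONE) for v in row] for row in mul(rz, mul(ry, rx))]


def multiple_table(m):
    """table[k - LO] == k * m, built by repeated addition from the start
    value LO * m (the 'adc speedcode' part)."""
    table, acc = [], LO * m
    for _ in range(LO, HI + 1):
        table.append(acc)
        acc += m
    return table


def build_tables(matrix):
    """Nine tables per frame, one per matrix element."""
    return [[multiple_table(m) for m in row] for row in matrix]


def rotate_dot(tables, dot):
    """Rotated (x, y, z) in 8.8 fixed point: 3 lookups + 2 adds per axis."""
    x, y, z = (c - LO for c in dot)
    return tuple(t[0][x] + t[1][y] + t[2][z] for t in tables)
```

The table build is the only per-frame cost that scales with the coordinate range, so it pays off once the number of dots is large compared with the 128 entries per table.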
another trick is to use a character screen. line drawing becomes very costly since you have to plot chars as well etc., but you gain a lot when it comes to eor filling/clearing. Desert Dream uses this method to plot the lines in the line tunnel & lineball part, and for the rotating "1024" text, the circle saw and the pyramids (the melon and the spaceship are sprites).
speaking of dithering, Natural Wonders bugs me a lot: looking at the code, it seems to me it shouldn't be able to do what it does, but it still works :)
here's an idea I haven't used so far: if you have a fairly small number of vertices, like in the case of a dodecahedron, you could use precalculated tables to scale the unit vector to match the needed coordinates; then it's all adcs and table lookups again, except for the perspective.
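Sketched under assumptions (the scale of 40, the index/sign vertex encoding and all names are mine): every dodecahedron vertex coordinate is 0, ±1/φ, ±1 or ±φ, so each matrix element only ever needs to be scaled by three non-zero magnitudes. That is 27 products per frame; each rotated vertex is then a sum or difference of three lookups per axis, and the perspective divide is still done separately. The matrix passed in stands in for whatever rotation matrix you build each frame.

```python
import math

PHI = (1 + math.sqrt(5)) / 2
SCALE = 40  # assumed: 1.0 in object space == 40 units
# Every dodecahedron coordinate is 0, +-1/phi, +-1 or +-phi,
# so only these four magnitudes (index 0..3) ever occur.
MAGS = [0, round(SCALE / PHI), SCALE, round(SCALE * PHI)]


def dodecahedron_vertices():
    """20 vertices, each coordinate encoded as (magnitude_index, sign)."""
    verts = []
    for sx in (1, -1):
        for sy in (1, -1):
            for sz in (1, -1):
                verts.append(((2, sx), (2, sy), (2, sz)))  # (+-1, +-1, +-1)
    for s1 in (1, -1):
        for s2 in (1, -1):
            verts.append(((0, 1), (1, s1), (3, s2)))  # (0, +-1/phi, +-phi)
            verts.append(((1, s1), (3, s2), (0, 1)))  # (+-1/phi, +-phi, 0)
            verts.append(((3, s1), (0, 1), (1, s2)))  # (+-phi, 0, +-1/phi)
    return verts


def scaled_axes(matrix):
    """prod[i][j][k] = matrix[i][j] * MAGS[k]; 27 non-zero products per frame."""
    return [[[m * mag for mag in MAGS] for m in row] for row in matrix]


def transform(prod, vert):
    """Rotated vertex from lookups only; s is +-1, i.e. add or subtract.
    Perspective is not included."""
    return tuple(
        sum(s * prod[i][j][k] for j, (k, s) in enumerate(vert)) for i in range(3)
    )
```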
dithering: well, maybe it's just me being dumb.. :/ but I cannot visualise how Graham can do it with just one eor buffer while plotting 2 pixels under each other each time in the line routine. for the dither not to get messed up, the line's pixel pattern must be aligned vertically, but the line routine doesn't seem to care about that (i.e. there's seemingly no anding of the lowest bit of the y coord). also, plotting 2 pixels each time would mean a loss of resolution, which isn't visible in the result :)
the only viable way to do concave models with eor filling is the one presented in Altered States imho :)
btw your NES routine is _very_ impressive :)
Coords can be packed pretty well with delta packing, too. I'm working on some vectorballs where each coord takes only about 4-5 bits for X and Y together. So for the 49 bobs that I currently have, it takes about 27 bytes/frame, which means I can have 30 seconds of unique movement running at 25 fps with about 20K of tables.
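The numbers check out: 49 bobs at about 4.5 bits each is roughly 27.6 bytes per frame, and 30 s at 25 fps is 750 frames, so about 20K. The packer itself isn't described, so here is a hypothetical fixed format just to show the principle: each bob's per-frame dx and dy are stored as 2-bit signed values (-2..1) in one nibble, two bobs per byte, i.e. 25 bytes for 49 bobs. A real packer would need an escape or variable-length code for larger jumps, which this sketch doesn't handle.

```python
def pack_frame(prev, cur):
    """Pack per-bob deltas (cur - prev) as nibbles: 2 bits dx, 2 bits dy,
    each a signed value in -2..1. Two bobs per byte (hypothetical format)."""
    nibbles = []
    for (px, py), (cx, cy) in zip(prev, cur):
        dx, dy = cx - px, cy - py
        if not (-2 <= dx <= 1 and -2 <= dy <= 1):
            raise ValueError("delta out of range for a 2-bit field")
        nibbles.append(((dx & 3) << 2) | (dy & 3))
    if len(nibbles) % 2:
        nibbles.append(0)  # pad the last byte
    return bytes((nibbles[i] << 4) | nibbles[i + 1]
                 for i in range(0, len(nibbles), 2))


def sign2(v):
    """2-bit two's complement field -> int (-2..1)."""
    return v - 4 if v & 2 else v


def unpack_frame(prev, data):
    """Apply one packed frame of deltas to the previous positions."""
    out = []
    for i, (px, py) in enumerate(prev):
        byte = data[i // 2]
        nib = byte >> 4 if i % 2 == 0 else byte & 0x0F
        out.append((px + sign2(nib >> 2), py + sign2(nib & 3)))
    return out
```

At 25 bytes per frame, the same 750 frames would take 18750 bytes.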
Other favorites are some routines I developed for filled vectors some years ago, like this one for line drawing, which uses precalculated y-coords for each pixel and is of course faster than any iterative algorithm (Bresenham go home :)...
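The routine itself isn't included in the post, so this is only one possible reading of "precalculated y-coords for each pixel", sketched with assumed parameters: slopes quantized to 1/64, and a table per slope holding the rounded y offset of every pixel, so the inner loop is one lookup per pixel instead of a Bresenham error term. The quantization means the last pixel can end up one row off for some (dx, dy) pairs; for example (0,0)-(100,99) ends at y = 98. Only the 0 <= dy <= dx octant is shown; the others follow by symmetry.

```python
FRAC = 6      # assumed: slope s stands for s / 64
MAXLEN = 128  # assumed longest line in pixels

# YTAB[s][i]: rounded y offset of pixel i on a line of slope s / 64.
# Built once; drawing then needs no error-term arithmetic at all.
YTAB = [[(i * s + (1 << (FRAC - 1))) >> FRAC for i in range(MAXLEN + 1)]
        for s in range((1 << FRAC) + 1)]


def flat_line(x0, y0, x1, y1):
    """Pixels of a line with 0 <= dy <= dx, one table lookup per pixel."""
    dx, dy = x1 - x0, y1 - y0
    if not (0 < dx <= MAXLEN and 0 <= dy <= dx):
        raise ValueError("only the 0 <= dy <= dx octant is handled here")
    row = YTAB[(dy << FRAC) // dx]  # one divide per line (could be a table too)
    return [(x0 + i, y0 + row[i]) for i in range(dx + 1)]
```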
Another idea which I haven't used yet is to make the filler skip the unnecessary parts. Most of the time a typical eor filler is just wasting time on eor'ing with 0 and storing the same byte that was there before, so if you could make it skip at least a good bit of these wastelands of nothingness, there would be a lot of cycles to gain, especially for big flatshaded objects. I have some ideas on how to do this without adding any overhead to the linedrawer/filler, and without using too much memory or rastertime for administration, but I'm afraid I can't go into details.
My favorite trick is to precalc the data, since doing realtime math on 1MHz/8bits seems a bit too ambitious for me.
E.g. face hidden status, or any other status that typically stays the same for a number of frames
The technique you describe (a bit array keeping track of which blocks of the screen must be xor filled and which not) is very close to what I did for the big cube viewed from the inside in Natural Wonders. It doesn't use a bit array but one extra byte per char on the screen, which makes the line drawing and filling faster (you don't need any and/ora/xor for manipulation and testing) at the cost of <1kb. I think using char resolution for this extra data is obvious, because that way you can use precalculated chars for the areas that don't need to be xor filled. I remember TTS, Graham and me discussing this technique back in 1994 as if it was yesterday.
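A small model of the per-char-flag idea on a single bitmap column, assuming 8-byte char cells and one flag per cell. In the scheme described, the line drawer would set the flag as it plots; here `char_flags` derives it from the column only so that the result can be checked against a plain eor fill. Inside flagged cells the filler eors as usual; in unflagged cells nothing needs to be read or eor'ed, each byte simply repeats the running state. All names are hypothetical.

```python
CHAR_H = 8  # assumed bytes per char cell in a bitmap column


def eor_fill_full(column):
    """Reference: classic eor fill of one column (list of byte values)."""
    out, state = [], 0
    for b in column:
        state ^= b
        out.append(state)
    return out


def char_flags(column):
    """One flag per char cell: does it contain any line pixels?
    (In the described scheme the line drawer sets this while plotting.)"""
    return [any(column[c:c + CHAR_H]) for c in range(0, len(column), CHAR_H)]


def eor_fill_flagged(column, flags):
    """Eor only inside flagged cells; elsewhere just store the running state."""
    out, state = [], 0
    for c, flagged in enumerate(flags):
        cell = column[c * CHAR_H:(c + 1) * CHAR_H]
        if flagged:
            for b in cell:
                state ^= b
                out.append(state)
        else:
            # no line bits in this cell: every byte equals the current state
            out.extend([state] * CHAR_H)
    return out
```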
The math is very simple and can be very fast. Natural Wonders does everything 100% in realtime, and still most of the objects run at 25 fps. And that with far more than 8-bit math (the matrix is 24-bit, rotation and z-scaling 16-bit, etc.).
In 50% of the cases you just need to EOR all signs of the deltas and you know the hidden status. In the other 50% you can try to simply choose other vectors so that EOR-sign-check works again, or simply do those two muls which are also quite fast. But for platonic objects like in Natural Wonders you can do an even far easier check: Simply check the Z-value of the face midpoint against a visibility threshold.
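The sign trick can be sketched like this (names hypothetical; which sign counts as "visible" depends on your vertex order). With non-zero deltas, the sign of dx1*dy2 is just sign(dx1) XOR sign(dy2), and likewise for dy1*dx2. When the two products have opposite signs, their difference takes the sign of the first one, so no multiply is needed; otherwise it falls back to the two muls. Zero deltas are sent to the slow path here to keep the sign logic exact.

```python
def cross_z(dx1, dy1, dx2, dy2):
    """z component of the 2D cross product of two edge vectors."""
    return dx1 * dy2 - dy1 * dx2


def neg(v):
    """The sign bit of a signed byte: True for negative values."""
    return v < 0


def cross_sign_fast(dx1, dy1, dx2, dy2):
    """Sign of cross_z from sign bits alone, or None if the muls are needed.

    a = dx1*dy2 is negative iff neg(dx1) XOR neg(dy2) (non-zero factors),
    likewise b = dy1*dx2. If a and b have opposite signs, a - b has the
    sign of a.
    """
    if 0 in (dx1, dy1, dx2, dy2):
        return None  # zero products: let the slow path handle them
    a_neg = neg(dx1) ^ neg(dy2)
    b_neg = neg(dy1) ^ neg(dx2)
    if a_neg == b_neg:
        return None  # same signs: the sign bits alone can't decide
    return -1 if a_neg else 1


def face_sign(dx1, dy1, dx2, dy2):
    """-1, 0 or 1: fast sign check first, the two muls only if needed."""
    fast = cross_sign_fast(dx1, dy1, dx2, dy2)
    if fast is not None:
        return fast
    c = cross_z(dx1, dy1, dx2, dy2)
    return (c > 0) - (c < 0)
```

Over all non-zero deltas in -3..3, exactly half of the 1296 combinations (648) take the fast path, which is where the 50% comes from.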
I just don't see how you do it. I've only got 16-bit precision in the matrix and 8-bits (9 really..) for the transformations, plus lookup tables for everything, manually built vertices and so forth, yet it *still* takes over 2500 cycles to process a damned cube.
But.. That would be cheating.. =)
* Implement a good frame rate counter early in the process. Many "optimizations" really aren't optimizations because the overhead is simply too great. For example, an unrolled EOR filler takes approx. (4+4)*128 = 1024 cycles to fill a column. A potential speedup would be to EOR-fill only the chars that contain lines and simply STA the rest. At most you can gain 4*128 = 512 cycles (per column). Any extra overhead in the line drawer, and in the code modifier (JSR+RTS) into the EOR areas and STA areas, quickly eats up those cycles. So DO add a frame rate counter FIRST so that you can see you really get bang for your buck.
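A back-of-the-envelope model of that trade-off, using the 4-cycle EOR abs / STA abs costs and the 128-byte columns from the post; `overhead_per_char` is a hypothetical lump for whatever the line drawer and code modifier add per 8-byte char. It only counts the filler's own cycles, so treat it as a sanity check next to the frame counter, not a replacement for it.

```python
ROWS = 128         # bytes per column, as in the post
CHARS = ROWS // 8  # 8-byte char cells per column
EOR_ABS = 4        # cycles for EOR absolute
STA_ABS = 4        # cycles for STA absolute


def full_fill_cycles():
    """Unrolled EOR filler: EOR abs + STA abs for every byte."""
    return ROWS * (EOR_ABS + STA_ABS)


def selective_fill_cycles(line_chars, overhead_per_char=0):
    """EOR+STA in chars containing lines, STA only elsewhere, plus a
    hypothetical fixed overhead per char (bookkeeping, code modification,
    JSR/RTS...)."""
    eor_bytes = line_chars * 8
    sta_bytes = ROWS - eor_bytes
    return (eor_bytes * (EOR_ABS + STA_ABS)
            + sta_bytes * STA_ABS
            + CHARS * overhead_per_char)


def break_even_overhead(line_chars):
    """Largest per-char overhead at which the selective filler isn't slower."""
    saving = (ROWS - line_chars * 8) * EOR_ABS
    return saving // CHARS
```

For instance, with 4 of the 16 chars in a column containing lines, the selective filler breaks even at 24 cycles of overhead per char; a JSR+RTS pair alone costs 12.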
This one can't be emphasised enough... I'd suggest taking it to an even lower level than that, though. Three minutes spent patching VICE (consider a one-line patch in cpu.c that printf's the cycle count and current PC address) and a few scripts to sift through the output (try http://artificial-stupidity.net/~alih/ , process-log.c and profiler.py) can help you a lot. Or at least they helped me a lot. YMMV.
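For sifting through such a trace, a minimal script could look like the following. It assumes a hypothetical log format of one `<cycle> <pc>` line per executed instruction (decimal cycle count, hex PC); the tools linked above may well use a different format. Each instruction is charged the cycles until the next logged one, so an interrupt that starts between two lines gets billed to the interrupted instruction.

```python
from collections import Counter


def profile(log_lines):
    """Cycles spent per PC from lines of the form '<cycle> <pc>'
    (decimal cycle, hex PC), one per executed instruction, in order."""
    samples = []
    for line in log_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip anything that isn't a trace line
        samples.append((int(parts[0]), int(parts[1], 16)))
    spent = Counter()
    # cost of an instruction = cycle of the next one minus its own cycle
    for (cyc, pc), (next_cyc, _) in zip(samples, samples[1:]):
        spent[pc] += next_cyc - cyc
    return spent


def report(spent, top=10):
    """Print the hottest addresses with their share of all cycles."""
    total = sum(spent.values()) or 1
    for pc, cycles in spent.most_common(top):
        print(f"${pc:04X}  {cycles:8d}  {100 * cycles / total:5.1f}%")
```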
Then allow me to speculate ;) How about keeping a separate bit array with one element per screen tile? When drawing a line, you'd also plot a low-resolution outline into the bit array, i.e. a conservative estimate of which cells in the real bitmap the line might cover. The standard line algorithm should work if modified to mark both tiles when moving diagonally. At least that's a scheme I tinkered with, without getting it to work, I might add.. But I guess setting up and clipping all those extra lines would be a tad too costly to be practical.
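Here is a variant that sidesteps the diagonal problem, under assumptions (8x8 tiles, lines in the 0 <= dy <= dx octant, rounding as in `line_y`; all names hypothetical). Rather than running a second Bresenham at tile resolution, it evaluates the pixel line's y at the first and last pixel of each tile column. Because y never decreases and steps at most one row per pixel, every tile row between those two values really is hit, so the marked set is exact rather than merely conservative, at the cost of two y evaluations per tile column. Whether that is cheap enough on a 6502 is another matter.

```python
TILE = 8  # assumed tile (char) size in pixels


def line_y(x, x0, y0, dx, dy):
    """Pixel row for column x on a line with 0 <= dy <= dx (round half up)."""
    return y0 + ((x - x0) * 2 * dy + dx) // (2 * dx)


def line_pixels(x0, y0, x1, y1):
    """All pixels of the line, for reference."""
    dx, dy = x1 - x0, y1 - y0
    return [(x, line_y(x, x0, y0, dx, dy)) for x in range(x0, x1 + 1)]


def mark_tiles(x0, y0, x1, y1):
    """Set of (tile_x, tile_y) cells the line touches, computed from two
    y evaluations per tile column instead of one test per pixel."""
    dx, dy = x1 - x0, y1 - y0
    if not (dx > 0 and 0 <= dy <= dx):
        raise ValueError("only the 0 <= dy <= dx octant is handled here")
    marked = set()
    for tx in range(x0 // TILE, x1 // TILE + 1):
        xa = max(x0, tx * TILE)             # first line pixel in this tile column
        xb = min(x1, tx * TILE + TILE - 1)  # last line pixel in this tile column
        ya = line_y(xa, x0, y0, dx, dy)
        yb = line_y(xb, x0, y0, dx, dy)
        for ty in range(ya // TILE, yb // TILE + 1):
            marked.add((tx, ty))
    return marked
```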
but as usual my philosophy is "why do it in realtime if it can be precalculated without taking up too much memory?"
you can only get away with dots/bobs without perspective imho.
I think you are simply aiming too high; your methods of doing 3d actually represent the state-of-the-art way of doing it on the c64, and you can't do it much faster. 98% of c64 demos doing 3d do it slower than you already do.