lda #%11111111 sbx #%10000000 bcs .x1 eor #%01111111 eor pixels,y sta pixels,y iny lda #%01111111 .x1 sbx #%01000000 bcs .x2 eor #%00111111 eor pixels,y sta pixels,y iny lda #%00111111 .x2 sbx #%00100000 bcs .x3 . . .
Then allow me to speculate ;) How about keeping a separate bit array with one element per screen tile. Then when drawing a line you'd plot a separate low-resolution outline to the bit array, i.e. a conservative estimate of which cells in the real bitmap the line might cover. The standard line algorithm should work if modified to draw both tiles when moving diagonally. At least that's a scheme I tinkered with, without getting it to work I might add.. But I guess setting up and clipping all those extra lines would be a tad too costly to be practical.
but as usual my philosophy is "why do it in realtime if it can be precalculated without taking up too much memory?"
you can only get away with dots/bobs without perspective imho.
I think you are aiming simply too high, your methods of doing 3d actually represent the state of the art way of doing it on the c64, you cant do it much faster. 98% of c64 demos doing 3d do it slower than how you already do it.