[CSDb] - User Forums - Sprite multiplexing *with* priorities

2009-01-24 14:40

Bregalad
Account closed

Registered: Jul 2008
Posts: 42

Sprite multiplexing *with* priorities

I'd like to port a game to the C64, but I'll certainly need more than 8 sprites per screen. The game uses a top-down perspective, so sprites that are further south on the screen must show above sprites that are further north, or else it will look weird.
Unfortunately, most sprite multiplexing techniques I've seen around the net seem to kill any priorities, and that's not good.

I guess I can come up with a solution, but I'm not sure how good it is:
- The screen is separated into 10 areas, each 20 pixels tall
- There is a raster interrupt every 20 pixels that will handle the multiplexing, and the interrupt at line 250 will also be used for the other usual stuff (screen updates, sound code, etc.)
- Each sprite always crosses 2 areas no matter where it is positioned (this assumes Y-expansion is never used)

Now during the frame:
In the first area (lines 0-20), the sprites are mapped to hardware sprites according to their sorted priorities. If there are more than 8, the lowest-priority ones are discarded.

In the second area, the hardware sprites already used in the first area cannot be reused, so I map the sprites starting in the second area to the hardware sprites that are still unused, according to their priorities.

In the third area, all sprites that started in the first area are done being displayed, so I can re-map their hardware sprites to sprites starting in the third area according to their priorities, etc.

Now the priorities are not matched between areas, but at least they are within each area. Also the number of interrupts is constant, and everything can be pre-calculated during the non-display time, so that the interrupts just do a few writes every 20 scanlines, which is not too bothersome and allows it to be mixed with other raster splits without too much trouble.

I know Cadaver said it was a bad idea to split the screen into areas, but I can't come up with anything else. I could use the generic method in such a way that top sprites have lower priority than bottom sprites, and hope it won't look too weird.
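
A rough model of the allocation described above, written in Python rather than C64 code, purely to make the bookkeeping explicit. The function name, the tuple layout and the convention that a lower priority number means "more in front" (and so gets the lowest-numbered free hardware sprite, since sprite 0 is displayed in front of sprite 7) are assumptions for illustration. A real implementation would precompute this outside the display area and leave only the register writes to the raster interrupts.

```python
AREA_HEIGHT = 20   # scanlines per area, as in the post
NUM_AREAS = 10
NUM_HW = 8         # hardware sprites on the VIC-II

def assign_by_area(sprites):
    """Map software sprites to hardware sprites area by area.

    sprites: iterable of (name, y, priority); y is the top line relative to
    the first area, and a lower priority number means "drawn in front".
    Returns {name: hardware sprite number}; rejected sprites are absent.
    """
    starting = [[] for _ in range(NUM_AREAS)]
    for name, y, priority in sprites:
        area = y // AREA_HEIGHT
        if 0 <= area < NUM_AREAS:
            starting[area].append((priority, name))

    assignment = {}
    busy = set()   # hardware sprites still showing a sprite from the previous area
    for area in range(NUM_AREAS):
        free = [hw for hw in range(NUM_HW) if hw not in busy]
        taken = set()
        # Best priority gets the lowest free number; zip() drops the overflow.
        for (priority, name), hw in zip(sorted(starting[area]), free):
            assignment[name] = hw
            taken.add(hw)
        busy = taken   # these overlap into the next area, then become free
    return assignment
```

With nine sprites starting in the first area, the ninth is dropped; a sprite starting in the second area then finds no free hardware sprite at all, while one starting in the third area gets hardware sprite 0 again.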

... 61 posts hidden ...

2009-02-04 14:47

doynax
Account closed

Registered: Oct 2004
Posts: 212

Quote: Nobody uses the decimal mode so no need to clear the decimal flag.

What.. No one has figured out a way to save cycles by using BCD arithmetic in a bithack?

Anyway, I managed to get bitten by this myself a while back.
Once in a blue moon my game would glitch for no apparent reason. Eventually it turned out (after *many* hours of debugging) to be caused by an interrupt being triggered during the dozen or so cycles it took to update the player's score counter. Making things worse was the fact that the collision detection logic would normally have finished running long before the visible display area, so it only ever happened during the busiest screens.

2009-02-04 15:32

Bregalad
Account closed

Registered: Jul 2008
Posts: 42

Well, that decimal flag is really evil, but I never use it for now, and I believe the Kernal clears it for me.
Quote:

Or you could look at the whole timing problem from a slightly different perspective. That is you might opt to program the sprites as early as possible rather than as late as possible, e.g. right after the previous shape has finished with the sprite channel you're attempting to reuse. That way there's no need to estimate raster timing at all, with the drawback that some sprite which would otherwise be displayed imperfectly simply won't show up at all.

I've considered that, but it only works if sprites are never Y-expanded. If some sprites are Y-expanded and some aren't, the order in which sprites end may differ from the order in which they start. As a result you'd have to resort to 2 sorting routines: one to sort the start coordinates as usual, and a second to sort the end coordinates, which are the start coordinates plus 21 or 42. So it would become significantly more CPU-intensive during VBlank, with more complicated logic overall. Although you'd save the mess with $d015, as all sprites could always be enabled, and you'd write the first 8 active sprites during VBlank.

Quote:

As for speeding up the interrupt handler itself about the best you can do is to unroll it completely and keep a separate interrupt handler per-sprite. That way you can poke sprite values directly into LDA immediates and STA to absolute addresses instead of indexing anything. Whether or not the cycle savings are worth the space depends on your game and how tightly you'll want to pack the sprites, but I've found it a useful method.

I don't plan to pack sprites a lot, but if it happens I'd just like the program not to crash or slow down; mere graphic glitches are acceptable.
I don't know how you could "unroll" anything since there is no loop. The only "loop" there is finds the next valid IRQ position, which normally is the next one unless more than 8 sprites tried to coexist on one line and some were rejected.

Again I don't see how I could have one separate IRQ per sprite and gain any advantage from that. You'll have to show me some example code. Having separate code for each of the 8 hardware sprites would only save 1 cycle each when storing the frame number, color, X_Lo pos and Y pos, making a grand total of 4 cycles. The time lost in changing the IRQ vector for each sprite would probably be much greater (about 12-20 cycles), so I really don't see the point. And this would eat up so much space that you'd have to change $ffff each time as well.

Having one IRQ per software sprite could possibly save some time, as you could do lda immediate, *but* you'd have to change the IRQ vector each time or end up using a jump table or something like that, which would also slow things down. Not to mention the argument of the lda immediate would be complicated to change inside the VBlank, and you'd have to have almost exactly the same code about 16-20 times in RAM. I think it would be a waste to reserve up to 2-3 KB of memory for IRQ code only to save a couple of cycles.

2009-02-04 16:07

doynax
Account closed

Registered: Oct 2004
Posts: 212

Quote:

I've considered that, but it only works if sprites are never Y-expanded. If some sprites are Y-expanded and some aren't, the order in which sprites end may differ from the order in which they start. As a result you'd have to resort to 2 sorting routines: one to sort the start coordinates as usual, and a second to sort the end coordinates, which are the start coordinates plus 21 or 42. So it would become significantly more CPU-intensive during VBlank, with more complicated logic overall. Although you'd save the mess with $d015, as all sprites could always be enabled, and you'd write the first 8 active sprites during VBlank.

General-purpose multiplexers rarely bother with any of the bit-packed attributes other than the x bits.
To handle y-expansion effectively I think you'll need to sort by both the top y coordinate and the bottom y coordinate anyway regardless of the interrupt method. At any rate the hardware sprites won't be reprogrammed in strict sequential order anymore.

Quote:

Having one IRQ per software sprite could possibly save some time, as you could do lda immediate, *but* you'd have to change the IRQ vector each time or end up using a jump table or something like that, which would also slow things down. Not to mention the argument of the lda immediate would be complicated to change inside the VBlank, and you'd have to have almost exactly the same code about 16-20 times in RAM. I think it would be a waste to reserve up to 2-3 KB of memory for IRQ code only to save a couple of cycles.

I meant using one interrupt handler per software sprite. Otherwise it might be hard to use immediate LDAs ;)

In my current project the multiplexer interrupts essentially look like this:

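	;roughly 75 cycles and 50 bytes per IRQ
irq0	sta irq_save_a	;save A (presumably a 2-byte zero-page STA, hence irq1+2 below)

mux_x0	lda #$00	;x position, low 8 bits (operand patched by the writer)
	sta $d000
mux_c0	lda #$00	;colour
	sta $d027
mux_p0	lda #$00	;sprite pointer (frame)
	sta $07f8
mux_m0	lda #$00	;compiled x MSBs of all eight sprites
	sta $d010
mux_y0	lda #$00	;y position
	sta $d001

	asl $d019	;acknowledge the raster IRQ

mux_l0	lda #$00	;raster line for the next interrupt
	sta $d012
	cmp $d012	;already at or past that line?
	bcc irq1+2	;yes: run the next handler right away,
	beq irq1+2	;skipping its "sta irq_save_a"

	lda #<irq1	;otherwise point the IRQ vector at the next handler
	sta $fffe
;	inc $ffff ;every fifth time or so..

	lda irq_save_a	;restore A
	rti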

irq1	sta irq_save_a

mux_x1	lda #$00
	sta $d002
mux_c1	lda #$00
	sta $d028
	.
	.
	.

On top of that there's a matching (unrolled) writer "loop" which goes through the sorted sprite list and pokes the sprite attributes into the immediate constants, compiles the most-significant x bytes, and a few other things. The nice thing is that it doesn't really cost me any extra cycles since I had to preserve the sprite data somewhere anyway.

Having lots of sprites flying around happens to be a core aspect of my game so it's easily worth the cost, but I can see how your project might be different.
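
To make the "poke the attributes into the immediate constants" idea concrete, here is a small Python model (not the actual writer from the game, which isn't shown in the thread). It assembles a cut-down handler body as raw 6502 bytes and patches the operand byte that follows each `lda #` opcode. The function names and the dictionary layout are made up for illustration, and the handler is reduced to four of the stores.

```python
LDA_IMM = 0xA9   # 6502 opcode: LDA #immediate
STA_ABS = 0x8D   # 6502 opcode: STA absolute

def build_handler(hw):
    """Assemble a cut-down handler body for hardware sprite `hw`.

    Returns (code, operand_offset), where operand_offset[field] is the index
    of the immediate byte that the writer must patch for that field.
    """
    targets = [
        ("x", 0xD000 + 2 * hw),        # x position, low 8 bits
        ("colour", 0xD027 + hw),
        ("pointer", 0x07F8 + hw),      # assumes the default screen at $0400
        ("y", 0xD001 + 2 * hw),
    ]
    code = bytearray()
    operand_offset = {}
    for field, address in targets:
        operand_offset[field] = len(code) + 1   # operand follows the opcode
        code += bytes([LDA_IMM, 0x00, STA_ABS, address & 0xFF, address >> 8])
    return code, operand_offset

def write_sprite(code, operand_offset, sprite):
    """Poke one sprite's attributes into the handler's immediate operands."""
    for field, offset in operand_offset.items():
        code[offset] = sprite[field] & 0xFF
```

On the real machine the same thing is a plain `sta mux_x0+1` and so on, one store per attribute, which is why the writer costs little more than storing the sorted sprite data anywhere else would.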

2009-02-04 16:35

Bregalad
Account closed

Registered: Jul 2008
Posts: 42

Quote:

General-purpose multiplexers rarely bother with any of the bit-packed attributes other than the x bits.

Then I guess I made one of the first truly general multiplexers. How can you call it a general multiplexer if it imposes restrictions on your flags? I have to admit that in-game sprites will mostly be multicolor, will almost never be X- or Y-expanded and will mostly appear in front of the background, but it's nice to have an engine that isn't that restrictive, especially since I plan to have it enabled at all times.
PS: Commodore were really stupid; they should have implemented X and Y *inversion* instead of expansion, which would be MUCH more useful for in-game use. However I guess I can do inversion by software without having to store the graphics twice.
Quote:

To handle y-expansion effectively I think you'll need to sort by both the top y coordinate and the bottom y coordinate anyway regardless of the interrupt method. At any rate the hardware sprites won't be reprogrammed in strict sequential order anymore.

I don't see the problem as long as I make the interrupt before the sprite, since the starting coordinate is the same regardless if the sprite is Y-expanded. I only regard whether Y-expansion is enabled when computing if sprites overlap, and if there is more than 8 on a scanline.
Quote:

On top of that there's a matching (unrolled) writer "loop" which goes through the sorted sprite list and pokes the sprite attributes into the immediate constants, compiles the most-significant x bytes, and a few other things. The nice thing is that it doesn't really cost me any extra cycles since I had to preserve the sprite data somewhere anyway.

Oh I see, this is really fast. For now I guess I'll stick with the current version for a while; if I start to have problems because it's too slow I may consider unrolling things and/or not rewriting the flags that don't need it.

2009-02-04 16:58

doynax
Account closed

Registered: Oct 2004
Posts: 212

Quote:

I don't see the problem as long as I make the interrupt before the sprite, since the starting coordinate is the same regardless if the sprite is Y-expanded. I only regard whether Y-expansion is enabled when computing if sprites overlap, and if there is more than 8 on a scanline

Imagine that you put one y-expanded sprite on the first scanline. Then starting one line below that you add seven non-expanded sprites. Finally let's add yet another sprite 24 lines below that.
Now, the point here is that if you simply sorted by the topmost y coordinate and tried to use the sprites in sequential order then the multiplexer would try to use the same hardware sprite for both the topmost (y-expanded) software sprite and the final bottommost sprite. Clearly not optimal.

This might not be a problem if you only rarely use y-expansion, but then sorting by the bottom y coordinate and triggering interrupts according to that ought to give you much the same results as for the reuse-as-early-as-possible IRQ scheme.
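
The scenario can be played through in a small Python model (a sketch only; the names are invented, hardware sprites are numbered from 0 upwards, and the time needed to reprogram a sprite is ignored). It compares strict sequential reuse with picking whichever hardware sprite finishes displaying first, which is what ordering by the bottom y coordinate amounts to.

```python
NUM_HW = 8

def height(sprite):
    return 42 if sprite["y_expanded"] else 21

def allocate(sprites, pick_slot):
    """Assign hardware sprites in order of top y coordinate.

    pick_slot(n, free_at) returns the hardware sprite to try for the n-th
    sprite; free_at[hw] is the first line on which hw is no longer displaying.
    Returns a list of (name, hw, accepted).
    """
    free_at = [0] * NUM_HW
    result = []
    for n, sprite in enumerate(sorted(sprites, key=lambda s: s["y"])):
        hw = pick_slot(n, free_at)
        accepted = sprite["y"] >= free_at[hw]
        if accepted:
            free_at[hw] = sprite["y"] + height(sprite)
        result.append((sprite["name"], hw, accepted))
    return result

def sequential(n, free_at):
    return n % NUM_HW   # strict round-robin reuse

def earliest_free(n, free_at):
    return min(range(NUM_HW), key=lambda hw: free_at[hw])   # reuse by bottom edge

# One y-expanded sprite on line 0, seven normal ones on line 1, one more 24 lines later.
scene = [{"name": "big", "y": 0, "y_expanded": True}]
scene += [{"name": "s%d" % i, "y": 1, "y_expanded": False} for i in range(7)]
scene.append({"name": "late", "y": 25, "y_expanded": False})
```

`allocate(scene, sequential)` has to reject the last sprite because its hardware sprite is still busy with the expanded one, while `allocate(scene, earliest_free)` places it on a hardware sprite that finished on line 22.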

2009-02-04 17:04

Graham
Account closed

Registered: Dec 2002
Posts: 990

Quoting Bregalad

Then I guess I made one of the first truly general multiplexers. How can you call it a general multiplexer if it imposes restrictions on your flags? I have to admit that in-game sprites will mostly be multicolor, will almost never be X- or Y-expanded and will mostly appear in front of the background, but it's nice to have an engine that isn't that restrictive, especially since I plan to have it enabled at all times.

On C64 you can hardly afford the luxury of "general purpose".

Quote:

PS: Commodore were really stupid; they should have implemented X and Y *inversion* instead of expansion, which would be MUCH more useful for in-game use.

Without the expansion registers: sprite stretching, all border sprites, a lot of gfx modes etc etc wouldn't have been possible. X/Y inversion on the other hand only saves a bit of memory. I agree that expansion isn't often used on game sprites, but that doesn't make it useless.

Quote:

However I guess I can do inversion by software without having to store the graphics twice.

I guess the routine is longer than those extra sprite gfx.

2009-02-04 21:53

Bregalad
Account closed

Registered: Jul 2008
Posts: 42

Quote:

Imagine that you put one y-expanded sprite on the first scanline. Then starting one line below that you add seven non-expanded sprites. Finally let's add yet another sprite 24 lines below that.
Now, the point here is that if you simply sorted by the topmost y coordinate and tried to use the sprites in sequential order then the multiplexer would try to use the same hardware sprite for both the topmost (y-expanded) software sprite and the final bottommost sprite. Clearly not optimal.

Well, in that case my algorithm would work like this:
- The expanded sprite is the first, so it is assigned to sprite #7
- The following 7 sprites overlap with their respective previous sprites, so they are assigned to sprites #6 down to #0
- The next sprite, 24 lines later, does not overlap with the previous one, so it is assigned to sprite #7
- It checks whether it overlaps with the previous user of sprite #7; the answer is yes, so this sprite is rejected.

So the sprite is rejected when it would in fact have been possible to draw it, but no graphic glitches should happen.
I'm pretty sure I will rarely use Y or X expansion anyway, but I see no reason to forbid myself from using it.
Quote:

I guess the routine is longer than those extra sprite gfx.

Certainly not. In fact in most games most objects can be seen facing 2 directions, and one is the mirror of the other. It's a complete waste to store the same thing twice, only bit-inverted. You could use a 256-byte table if you want to be fast, and that is the same amount of data as 4 inverted sprites, when you'll probably need many dozens. A routine that manually shifts the data would probably take about 30 bytes, which is less than one single sprite.
In my case I will have 4 directions, Up, Down, Left and Right, and only Left and Right are mirrors of each other, so not all graphics will need to be mirrored, but quite a few.

2009-02-05 10:15

Jetboy

Registered: Jul 2006
Posts: 337

Quote:

However I guess I can do inversion by software without having to store the graphics twice.

You keep forgetting the C64 is no speed demon.
Usually having precalculated graphics, using some extra memory, is the way to go. Oftentimes it makes the difference between your game running smoothly or not. Or should I say, between your game running at all or not.

Manipulating your graphics data in real time is something you want to avoid at all costs.

An unrolled routine to flip the sprite would take about (if my memory serves me right, I haven't been coding for years) 13*63 cycles; that's 13 full rasterlines (in the border; a few more in the screen area, especially with sprites turned on)! You cannot afford that! Plus unrolled code would be much longer than 64 bytes, or if it wasn't unrolled it would take much more time. Plus there is the 256-byte table you need to store somewhere.
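
One way to arrive at a figure of this size, as a back-of-the-envelope calculation in Python. The instruction mix is an assumption (a table-driven copy of one byte using indexed absolute addressing with no page crossings), as are the PAL timing of 63 cycles per line and the absence of badlines and sprite DMA; the post itself does not say which routine was being counted.

```python
CYCLES_PER_LINE = 63      # PAL C64, no badlines or sprite DMA
SPRITE_BYTES = 63         # 21 rows of 3 bytes

# (mnemonic, size in bytes, cycles) for one table-driven byte copy
PER_BYTE = [
    ("LDY abs,X", 3, 4),  # fetch the source byte
    ("LDA abs,Y", 3, 4),  # look up its mirrored value
    ("STA abs,X", 3, 5),  # store it in the destination sprite
]

cycles_per_byte = sum(cycles for _, _, cycles in PER_BYTE)
total_cycles = cycles_per_byte * SPRITE_BYTES
raster_lines = total_cycles / CYCLES_PER_LINE
unrolled_size = sum(size for _, size, _ in PER_BYTE) * SPRITE_BYTES
```

Under these assumptions the flip costs 13 cycles per byte, 819 cycles per sprite, and a fully unrolled version occupies 567 bytes, which is nine times the size of the sprite it processes.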

2009-02-05 12:05

Bregalad
Account closed

Registered: Jul 2008
Posts: 42

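It can be pretty fast that way:

ldy Data,X           ;leftmost byte of the row
lda HInversionTbl,Y  ;reverse its bit order
sta Data2+2,X        ;becomes the rightmost byte
ldy Data+1,X         ;the middle byte stays in the middle
lda HInversionTbl,Y
sta Data2+1,X
ldy Data+2,X         ;rightmost byte
lda HInversionTbl,Y
sta Data2,X          ;becomes the leftmost byte

HInversionTbl
.db $00, $80, $40, $c0, $20, etc...

The HInversionTbl could be generated by software when the game starts, so that it doesn't have to be loaded from disk.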
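
A Python sketch of what generating the table and mirroring one sprite amounts to (a model of the logic, not C64 code; the function names are invented). The `multicolour` option is an addition that is not in the post: the table listed above reverses single bits, which suits hires sprites, whereas multicolour sprites store each pixel as a pair of bits, so the pairs have to swap places without being reversed internally.

```python
def build_flip_table(multicolour=False):
    """Return a 256-entry table that mirrors the pixels within one byte.

    Hires sprites have 8 one-bit pixels per byte; multicolour sprites have
    4 two-bit pixels, so the pairs are reordered but each pair is kept intact.
    """
    bits = 2 if multicolour else 1
    mask = (1 << bits) - 1
    table = []
    for value in range(256):
        mirrored = 0
        for _ in range(8 // bits):
            mirrored = (mirrored << bits) | (value & mask)
            value >>= bits
        table.append(mirrored)
    return table

def flip_sprite(data, table):
    """Mirror a 63-byte sprite (21 rows of 3 bytes) left to right."""
    out = bytearray(len(data))
    for row in range(0, len(data), 3):
        out[row] = table[data[row + 2]]       # right byte moves to the left
        out[row + 1] = table[data[row + 1]]   # middle byte stays in place
        out[row + 2] = table[data[row]]       # left byte moves to the right
    return out
```

The hires table starts $00, $80, $40, $c0, $20 like the one above, and flipping a sprite twice gives back the original data.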

2009-02-05 12:29

Frantic

Registered: Mar 2003
Posts: 1648

This code is bigger than the data for a sprite anyway and would have to be duplicated for each sprite since the addresses are hard-coded. Perhaps you meant to use something like lda (source_zp),y instead? I have to agree with others here that you seem to underestimate how slow the C64 really is, assuming that you are supposed to do this stuff at the same time as a lot of other game logic and graphics stuff is running.

I'd say, either you have time for this stuff, and in that case you would probably do a loop anyway, to save RAM, or you don't have time, and in that case you would use pre-flipped sprite data and just change sprite pointers instead of doing the flipping in real time.
