[CSDb] - User Forums

2007-10-08 16:08

Oswald

Registered: Apr 2002
Posts: 5127

Sorting

Are sorters really so slow in games? :) I have made my unrolled version for my theoretical game (;), and it takes 132 raster lines to sort 32 numbers o_O The worst case is ~200 lines. Though when only 4 numbers have to be swapped, it does the job in ~10 lines. It wastes a lot of memory, but I like it :)

[... 193 posts hidden ...]

2007-10-08 17:45

doynax
Account closed

Registered: Oct 2004
Posts: 212

Quote: you might miss sprites otherwise possible to display with the bucket sort.

spr1=70
spr2=74

bucket sort result:

spr2
spr1

then you fire an IRQ for the next sprite, which will be spr2, and you are already too late for spr1
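A minimal Python sketch of the situation in the quote. The bucket width (16 lines, i.e. `y >> 4`) and the last-in-first-out emptying of each bucket are assumptions made for illustration; the quote does not say how the buckets were built.

```python
def coarse_bucket_order(ypos, shift=4):
    """Return sprite indices grouped by y >> shift (coarse buckets).

    Assumption: each bucket is emptied last-in-first-out, so sprites
    that share a bucket come out in reverse insertion order.
    """
    buckets = {}
    for index, y in enumerate(ypos):
        buckets.setdefault(y >> shift, []).append(index)
    order = []
    for key in sorted(buckets):
        order.extend(reversed(buckets[key]))
    return order


spry = [70, 74]                    # index 0 = spr1, index 1 = spr2
order = coarse_bucket_order(spry)  # both land in bucket 4
print(["spr%d" % (i + 1) for i in order])
```

Both sprites fall into the same bucket, so the result lists spr2 before spr1 even though spr1 is higher up on the screen.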

The way I do it is to attach the IRQ to the end of the last use of the new sprite, and thus try to reprogram the "channel" as soon as possible. Then all you have to do is check whether you're too late to generate the next IRQ (i.e. it would have to run before the current scanline) and if so jump straight to the next handler.
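The "too late, so jump straight to the next handler" check can be modelled roughly like this in Python. The function name, the fixed per-handler cost in raster lines, and treating "requested line equal to the current line" as too late are all assumptions for the sketch, not details from the post.

```python
def schedule(irq_lines, handler_cost, start_line=0):
    """Simulate a chain of raster IRQ handlers.

    irq_lines: raster line each handler would like to run on, in order.
    handler_cost: hypothetical number of raster lines one handler takes.
    Returns (requested_line, actual_line, how) tuples, where how is
    "irq" (a new raster IRQ was set up) or "chained" (we were already
    too late, so the handler ran immediately after the previous one).
    """
    log = []
    now = start_line
    for line in irq_lines:
        if line > now:
            now = line        # still ahead of the beam: wait for the IRQ
            how = "irq"
        else:
            how = "chained"   # too late: run the next handler right away
        log.append((line, now, how))
        now += handler_cost
    return log


for entry in schedule([50, 52, 60], handler_cost=3):
    print(entry)
```

With these numbers the second handler wanted line 52 but the first one only finishes on line 53, so it is chained instead of waiting for an interrupt that would never fire in this frame.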

2007-10-08 17:46

cadaver

Registered: Feb 2002
Posts: 1163

If you preprocess the sprite IRQs, you can take care to always take the topmost Y-coordinate as the basis for the IRQ, even if it isn't the first sprite in sort order. Did this in MW1-3, which had inexact sprite sorting. Brr, never again! :)

2007-10-08 17:46

chatGPZ

Registered: Dec 2001
Posts: 11523

Now you are assuming too much :) Of course the displayer code must be written in such a way that this cannot happen.

2007-10-08 18:30

Oswald

Registered: Apr 2002
Posts: 5127

Quote: If you preprocess the sprite IRQs, you can take care to always take the topmost Y-coordinate as the basis for the IRQ, even if it isn't the first sprite in sort order. Did this in MW1-3, which had inexact sprite sorting. Brr, never again! :)

What was so bad about it, when you could get it 'right' after all?

2007-10-08 18:33

Oswald

Registered: Apr 2002
Posts: 5127

Quote: The way I do it is to attach the IRQ to the end of the last use of the new sprite, and thus try to reprogram the "channel" as soon as possible. Then all you have to do is check whether you're too late to generate the next IRQ (i.e. it would have to run before the current scanline) and if so jump straight to the next handler.

Hmm, the 'before the new sprite' approach allows tighter packing ;) Clever solution nevertheless.

2007-10-08 18:38

cadaver

Registered: Feb 2002
Posts: 1163

Oswald: well, it was never completely "right" in those games; you could create unnecessary artifacts, for example when several motorcycles (2x2 sprites) were coming at you and you jumped. But if, for example, you have mostly airborne enemies coming at you from several heights and you're mostly on the ground, it doesn't matter.

2007-10-09 10:22

Oswald

Registered: Apr 2002
Posts: 5127

Radix sort is a very interesting approach :) It offers a constant sort time, but sadly that time is much worse than progressive insertion sort. When fully unrolled, I estimate a running time of ~55 raster lines.

2007-10-09 10:31

doynax
Account closed

Registered: Oct 2004
Posts: 212

Quote: Radix sort is a very interesting approach :) It offers a constant sort time, but sadly that time is much worse than progressive insertion sort. When fully unrolled, I estimate a running time of ~55 raster lines.

The bucket sort we talked about earlier is just a special case of the radix sort, except that it sorts in one step, with some loss of precision to reduce the number of buckets.

Anyway an optimized and unrolled two-step radix sort shouldn't be quite as bad as that. It ought to be possible to get it well below 55 lines. In fact it'd probably be faster than a bucket sort combined with a bubble sort fix-up stage.
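To make the relationship concrete, here is a small Python sketch, assuming 8-bit Y coordinates split into two 4-bit digits (16 buckets per pass). The function names are made up for the example; nothing here is the unrolled 6502 version being discussed.

```python
def bucket_pass(order, ypos, digit):
    """One stable distribution pass: group indices by digit(y), 16 buckets."""
    buckets = [[] for _ in range(16)]
    for i in order:
        buckets[digit(ypos[i])].append(i)
    return [i for bucket in buckets for i in bucket]


def low_nibble(y):
    return y & 0x0F


def high_nibble(y):
    return y >> 4


def bucket_sort(ypos):
    """Single pass on the high nibble only: fast, but only 16-line precision."""
    return bucket_pass(range(len(ypos)), ypos, high_nibble)


def radix_sort(ypos):
    """Two passes, least significant digit first: exact order for 0..255."""
    order = bucket_pass(range(len(ypos)), ypos, low_nibble)
    return bucket_pass(order, ypos, high_nibble)


ypos = [74, 70, 200, 33, 70, 5]
print(bucket_sort(ypos))  # indices 0, 1, 4 share a bucket and stay unsorted
print(radix_sort(ypos))
```

The one-pass version leaves the sprites at 74 and 70 in input order because they share a bucket; the second pass of the radix sort is what removes that loss of precision.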

Dammit.. Now I have to implement one just to see how it turns out ;)

2007-10-09 11:05

Oswald

Registered: Apr 2002
Posts: 5127

Here is my implementation:

There are 2 bucket arrays, each 256 bytes. Each bucket can hold at most 15 elements; the 16th byte is the element counter. The address of, for example, bucket 3's element 5 is 3*16+5.

bucket1 is used for the first pass
bucket2 for the 2nd.
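The bucket layout described above can be modelled in a few lines of Python. This is only a sketch of the addressing scheme (16 bytes per bucket, counter in the last byte); the helper names and the overflow check are assumptions added for the example.

```python
BUCKET_SIZE = 16   # bytes per bucket
COUNT_SLOT = 15    # the 16th byte of each bucket holds the element count


def slot_address(bucket, element):
    """Offset of an element inside the flat 256-byte bucket array."""
    return bucket * BUCKET_SIZE + element


def push(buckets, bucket, value):
    """Append value to a bucket and bump its counter."""
    count_addr = slot_address(bucket, COUNT_SLOT)
    count = buckets[count_addr]
    if count >= COUNT_SLOT:
        raise OverflowError("bucket already holds 15 elements")
    # The start address is a multiple of 16 and count < 16, so OR-ing
    # them (as the assembly does with ora) is the same as adding them.
    buckets[(bucket * BUCKET_SIZE) | count] = value
    buckets[count_addr] = count + 1


bucket1 = bytearray(256)
push(bucket1, 3, 7)
push(bucket1, 3, 9)
print(slot_address(3, 5), bucket1[48], bucket1[49], bucket1[63])
```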

Each code snippet is a segment out of an unrolled loop (i.e. stuff is missing, like lda bucket1+bucketnr*16,x).

I don't see how this could be much faster :) Someone prove me wrong :)

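;pass1 (one sprite; repeated for each entry of the sort list)
	ldx sort+0
	ldy spry,x        ;get sprite y coord
	lax and#0fmul16,y ;get bucket start address: table of (y and $0f)*16; lax loads both A and X
	ora bucket1+15,x  ;get address inside the bucket (start + nr of elements)
	inc bucket1+15,x  ;inc nr of elements
	tax
smod	sty bucket1,x     ;store spr to bucket (as posted, Y holds the y coord here, not the sprite nr)



;pass2 (one bucket of bucket1; repeated for each bucket)

	ldx bucket1+15    ;nr of elements in this bucket
	beq next          ;empty bucket, don't bother
blp1

	ldy bucket1,x      ;get sprite nr
	lda spry,y         ;get spr y
	and #%11110000     ;upper 4 bits only this time
	tay                ;which is exactly our pointer
	ora bucket2+15,y   ;address inside bucket
	inc bucket2+15,y   ;inc bucket counter
	tay                ;final bucket address
	sta bucket2,y      ;store sprite (as posted, the sprite nr in Y has been overwritten by now)
	dex
	bne blp1           ;any more in bucket1?

next

;pass3 (one bucket of bucket2; repeated for each bucket)
	ldy #$00             ;nr of sprites counter, this is done only once

	ldx bucket2+15       ;nr of elements in this bucket
	beq next             ;empty bucket?
blp2
	lda bucket2,x        ;get sprite nr
smod	sta final,y          ;store to final list
	iny                  ;final spr count
	dex
	bne blp2             ;any more in curr. bucket?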

Edit: OK, there are some bugs in there (registers get destroyed), but assume there weren't. Fixing those would make it even slower, though.

2007-10-09 11:37

doynax
Account closed

Registered: Oct 2004
Posts: 212

I'm attempting something like this:

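	;; initialize the buckets
	ldx #$81		; negative value = empty list
	!for i,0,16 {
	stx lsd_bucket+i	; was sta in the original post; A is not loaded yet, so stx is assumed
	}
	lda #$fe
	!for i,0,6 {
	sax msd_bucket+i*2+0	; sax stores A and X: $80, $82, ...
	stx msd_bucket+i*2+1	; $81, $83, ...
	sbx #-3			; X = (A and X) + 3, i.e. the next odd value
	}			; result: msd_bucket+n holds the sentinel $80+n

	;; lsd sort
	lda #$0f
	!for i,0,32 {
	ldx actor_ypos+i
	sbx #$00		; X = ypos and $0f (sbx computes (A and X) - imm)
	ldy lsd_bucket,x	; old head of this bucket's list
	sty actor_link+i	; actor i links to the old head
	ldy #i
	sty lsd_bucket,x	; actor i becomes the new head
	}

	;; msd sort
	!for i,0,13 {
	ldx lsd_bucket+i	; first actor in this lsd list
	bmi .next		; negative: list is empty

.msd	ldy actor_ypos,x
	lda msd_table,y
	tay			; Y = msd bucket for actor X

	lda msd_bucket,y	; A = old head
	stx msd_bucket,y	; actor X becomes the new head

	ldy actor_link,x	; Y = next actor in the lsd list
	sta actor_link,x	; actor X links to the old head
	bmi .next		; negative: end of the lsd list

	;; same again with the roles of X and Y swapped
	ldx actor_ypos,y
	lda msd_table,x
	tax			; X = msd bucket for actor Y

	lda msd_bucket,x
	sty msd_bucket,x

	ldx actor_link,y	; X = next actor in the lsd list
	sta actor_link,y
	bpl .msd		; was bne in the original post; bpl assumed, since the end marker is negative and actor 0 is valid

.next	}

	;; finally in the mux writer.
	;; for each sprite, alternating x and y
	ldx actor_link,y	; next actor in the list
	bpl .ok			; a valid actor index
.bucket	lda msd_bucket-$80,x	; negative link $80+n: fetch msd_bucket+n
	tax
	bmi .bucket		; still a sentinel: follow it again
.ok	...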

The idea is to use linked lists for the buckets. Sort all possible actors in the first pass (invalid ones having high y values); you can optimize it by trying to keep the maximum actor number as low as possible and skipping those high entries. And we can link the actors together in the actual multiplexer instead of in a separate pass. Also, the sentinels in the msd buckets can help us skip buckets easily.
Finally, we don't need more than about 13 buckets since not all y coordinates are valid, and as a bonus we can collect the invalid ones into a single "death" list automatically by tweaking the division table. Another bonus is that by sticking a store in the bucket-skipping code of the multiplexer you can easily link together a complete list of actors, which IMO is more convenient to work with than an order list (i.e. you only need to know the current actor's index to move to the next one, no need to keep track of an order index). Furthermore, just about everything is kept on the zeropage, though only temporarily of course.

All in all it seems to work out to about 28 raster lines. Except I'm not at all certain whether this will actually work yet.. ;)
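For readers who want to see the linked-list idea without the 6502 details, here is a rough Python model. It is a sketch under several assumptions that differ from the post: it uses `y >> 4` instead of a division table, a plain `-1` end marker instead of negative sentinels that chain the buckets, 16 buckets in both passes, and it walks the lsd buckets from high to low so that pushing onto list heads ends up in ascending order. None of that is confirmed as what the assembly does.

```python
END = -1  # end-of-list marker (the post uses negative sentinel bytes)


def linked_radix_sort(ypos):
    """Two-pass radix sort with per-bucket linked lists (head insertion)."""
    link = [END] * len(ypos)

    # Pass 1: push every actor onto the list for its low nibble.
    lsd_head = [END] * 16
    for i, y in enumerate(ypos):
        digit = y & 0x0F
        link[i] = lsd_head[digit]
        lsd_head[digit] = i

    # Pass 2: walk the lsd lists and push each actor onto the list for
    # its high nibble. Pushing reverses order, so the lsd buckets are
    # visited from 15 down to 0 to get ascending lists at the end.
    msd_head = [END] * 16
    for digit in range(15, -1, -1):
        i = lsd_head[digit]
        while i != END:
            next_i = link[i]          # remember the rest of the lsd list
            bucket = ypos[i] >> 4
            link[i] = msd_head[bucket]
            msd_head[bucket] = i
            i = next_i

    # Walk the msd buckets in order, following the links.
    order = []
    for head in msd_head:
        i = head
        while i != END:
            order.append(i)
            i = link[i]
    return order


ypos = [74, 70, 200, 33, 70, 5]
print(linked_radix_sort(ypos))
```

Only the current actor's index is needed to reach the next one inside a bucket, which is the convenience described above; the model keeps a separate walk over the bucket heads where the post folds that step into the multiplexer.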