| |
Oswald
Registered: Apr 2002 Posts: 5086 |
Sorting
are sorters really so slow in games? :) I have made an unrolled version for my theoretical game (;), and it takes 132 rasterlines to sort 32 numbers o_O worst case is ~200 lines. tho when only 4 numbers have to be swapped it does the job in ~10 lines. wastes a lot of memory but I like it :) |
| |
Oswald
Registered: Apr 2002 Posts: 5086 |
Quote: If you preprocess the sprite-IRQs, you can take care to always take the upmost Y-coordinate as the basis for the IRQ, even if it isn't the first sprite in sort order. Did this in MW1-3 which had inexact sprite sorting. Brr, never again! :)
what was so bad about it, when you could get it 'right' after all? |
| |
Oswald
Registered: Apr 2002 Posts: 5086 |
Quote: The way I do it is to attach the IRQ to the end of the last use of the new sprite, and thus try to reprogram the "channel" as soon as possible. Then all you have to do is check if you're too late to generate next IRQ (if it is to run before the current scanline) and if so jump straight to the next handler.
hmm, the 'before the new sprite' approach allows tighter packing ;) clever solution nevertheless. |
| |
cadaver
Registered: Feb 2002 Posts: 1160 |
Oswald: well it was never completely "right" in those games, you could create unnecessary artifacts for example when several motorcycles (2x2 sprites) were coming at you and you jumped. But for example, if you have mostly airborne enemies coming at you from several heights and you're mostly on ground, it doesn't matter. |
| |
Oswald
Registered: Apr 2002 Posts: 5086 |
radix sort is a very interesting approach :) offers a constant sort time, but sadly that is much worse than progressive insertion sort. when fully unrolled I estimate a running time of ~55 rasterlines. |
| |
doynax Account closed
Registered: Oct 2004 Posts: 212 |
Quote: radix sort is a very interesting approach :) offers a constant sort time, but sadly that is much worse than progressive insertion sort. when fully unrolled I estimate a running time of ~55 rasterlines.
The bucket sort we talked about earlier is just a special case of the radix sort. Except it sorts in one step and with some loss of precision to reduce the number of buckets.
Anyway an optimized and unrolled two-step radix sort shouldn't be quite as bad as that. It ought to be possible to get it well below 55 lines. In fact it'd probably be faster than a bucket sort combined with a bubble sort fix-up stage.
Dammit.. Now I have to implement one just to see how it turns out ;) |
| |
Oswald
Registered: Apr 2002 Posts: 5086 |
here is my implementation:
there are 2 bucket arrays, each 256 bytes. each bucket can hold max 15 elements, the 16th byte is the element counter. the address of e.g. bucket 3's element 5 is: 3*16+5.
bucket1 is used for the first pass
bucket2 for the 2nd.
each code snippet is a segment out of an unrolled loop (i.e. stuff is missing, like lda bucket1+bucketnr*16,x).
I don't see how this could be much faster :) someone prove me wrong :)
;pass1
ldx sort+0
ldy spry,x ; get sprite y coord
lax and#0fmul16,y ;get bucket start address
ora bucket1+15,x ;get address inside the bucket
inc bucket1+15,x ;inc nr of elements
tax
smod sty bucket1,x ;store spr to bucket
;pass2
ldx bucket1+15
beq next ;empty bucket dont bother
blp1
ldy bucket1,x
lda spry,y ;get spr y
and #%11110000 ;upper 4 bits only this time
tay ;which is exactly our pointer
ora bucket2+15,y ;addy inside bucket
inc bucket2+15,y ;inc bucket counter
tay ;final bucket addy
sta bucket2,y ;store sprite
dex
bne blp1 ;any more in bucket1?
next
;pass3
ldy #$00 ; nr of sprites counter this is done only once
ldx bucket2+15
beq next ;empty bucket?
blp2
lda bucket2,x ;get sprite nr
smod sta final,y ;store to final list
iny ;final spr count
dex
bne blp2 ;any more in curr. bucket?
edit: ok, well, there are some bugs there (destroying regs), but assuming there weren't.. fixing those would make it even slower tho |
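For readers following along, here is a rough high-level model of the two-pass idea described in the post above: one pass over the low 4 bits of the Y coordinate, then one over the high 4 bits, with 16 buckets per pass. It is only a sketch in Python, not 6502 code. The function name `radix_sort_sprites` is made up for illustration, and it uses growable lists rather than the fixed 15-slot buckets, so it says nothing about cycle counts.

```python
def radix_sort_sprites(spry):
    """Return sprite indices ordered by ascending Y, using two 4-bit passes."""
    order = list(range(len(spry)))          # plays the role of the input order list
    for shift in (0, 4):                    # pass 1: low nibble, pass 2: high nibble
        buckets = [[] for _ in range(16)]   # 16 buckets per pass
        for spr in order:
            digit = (spry[spr] >> shift) & 0x0F
            buckets[digit].append(spr)      # appending keeps each pass stable
        # Reading buckets 0..15 front to back gives the input for the next pass.
        order = [spr for bucket in buckets for spr in bucket]
    return order
```

The second pass only works because the first pass's order is preserved inside each bucket, which is why the buckets are filled and read in the same direction here.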
| |
doynax Account closed
Registered: Oct 2004 Posts: 212 |
I'm attempting something like this:
;; initialize the buckets
ldx #$81
!for i,0,16 {
sta lsd_bucket+i
}
lda #$fe
!for i,0,6 {
sax msd_bucket+i*2+0
stx msd_bucket+i*2+1
sbx #-3
}
;; lsd sort
lda #$0f
!for i,0,32 {
ldx actor_ypos+i
sbx #$00
ldy lsd_bucket,x
sty actor_link+i
ldy #i
sty lsd_bucket,x
}
;; msd sort
!for i,0,13 {
ldx lsd_bucket+i
bmi .next
.msd ldy actor_ypos,x
lda msd_table,y
tay
lda msd_bucket,y
stx msd_bucket,y
ldy actor_link,x
sta actor_link,x
bmi .next
ldx actor_ypos,y
lda msd_table,x
tax
lda msd_bucket,x
sty msd_bucket,x
ldx actor_link,y
sta actor_link,y
bne .msd
.next }
;; finally in the mux writer.
;; for each sprite, alternating x and y
ldx actor_link,y
bpl .ok
.bucket lda msd_bucket-$80,x
tax
bmi .bucket
.ok ...
The idea is to use linked lists for the buckets. Sort all possible actors in the first pass (invalid ones having high y values); you can optimize it by trying to keep the maximum actor number as low as possible and skipping those high entries. And we can link the actors together in the actual multiplexer instead of in a separate pass. Also the sentinels in the msd buckets can help us to skip buckets easily.
Finally we don't need more than about 13 buckets since not all y coordinates are valid, and as a bonus we can collect the invalid ones into a single "death" list automatically by tweaking the division table. Another bonus is that by sticking a store in the bucket skipping code of the multiplexer you can easily link together a complete list of actors, which IMO is more convenient to work with than an order list (i.e. you only need to know the current actor's index to move to the next one, no need to keep track of an order index). Furthermore just about everything is kept on the zeropage, though only temporarily of course.
All in all it seems to work out to about 28 raster lines. Except I'm not at all certain whether this will actually work yet.. ;) |
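The linked-list bucket idea from the post above can be modelled roughly as follows. This is a Python sketch under assumptions, not a translation of the assembly. It uses 16 buckets on the plain high nibble instead of a ~13-entry division table, a single `END` marker instead of the negative sentinels, and it collects the result into a list instead of linking actors inside the multiplexer. The names `linked_radix_sort` and `END` are hypothetical.

```python
END = -1  # hypothetical end-of-list marker

def linked_radix_sort(ypos):
    """Return actor indices in ascending Y order using linked-list buckets."""
    n = len(ypos)
    link = [END] * n            # link[i] = next actor in the same bucket
    lsd_bucket = [END] * 16     # head of each low-nibble list
    msd_bucket = [END] * 16     # head of each high-nibble list

    # LSD pass: push every actor onto the head of its low-nibble bucket.
    for i in range(n):
        d = ypos[i] & 0x0F
        link[i] = lsd_bucket[d]
        lsd_bucket[d] = i

    # MSD pass: drain the LSD buckets from 15 down to 0 and push each actor
    # onto the head of its high-nibble bucket. Since pushes go to the head,
    # the smallest low nibble ends up first in every MSD list.
    for d in range(15, -1, -1):
        i = lsd_bucket[d]
        while i != END:
            nxt = link[i]       # save the old link before overwriting it
            m = ypos[i] >> 4
            link[i] = msd_bucket[m]
            msd_bucket[m] = i
            i = nxt

    # Final walk: visit MSD buckets in ascending order, following the links.
    order = []
    for m in range(16):
        i = msd_bucket[m]
        while i != END:
            order.append(i)
            i = link[i]
    return order
```

Pushing at the head reverses the order within a list, so this sketch drains the first-pass buckets from high to low to compensate. Actors with equal Y may come out in either order, which does not matter for a multiplexer.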
| |
Oswald
Registered: Apr 2002 Posts: 5086 |
wow, very creative! :) 28 lines sounds very good :) tho I don't yet have an idea how a progressive sort performs when there are few changes ;) |
| |
doynax Account closed
Registered: Oct 2004 Posts: 212 |
Quote: wow, very creative! :) 28 lines sounds very good :) tho I don't yet have an idea how a progressive sort performs when there are few changes ;)
It depends really. A progressive sort is faster in 95% of the cases but it's that twentieth missed frame which fucks up the flow in an action game. Then again I'm writing a twitch game which aspires to be a bullet hell shooter, but a platformer (say) would obviously have other requirements.
The radix sort is only something like 15 cycles (per actor) slower than my current bucket sort. So if I can get it to work I'll probably switch, and I'd be happy to share the final code if I do. Weighing the extra cycles vs. the extra sprites makes it a tough choice. |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1408 |
I just used insertion sort starting with the ordering from the previous frame in Teradyne, setting up the first 8 sprites in the end-of-frame IRQ, and then moving each sprite down with an IRQ at the end of its previous use. If the next sprite needs moving before I return from the IRQ, I do it then instead of setting up another.
I stagger creation of all the sprites in a new formation over several frames to spread the workload, then there's never too much work for the sorter to do. |
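The "insertion sort starting from last frame's ordering" approach mentioned in the thread can be sketched as below. This is a generic Python illustration, not the Teradyne code. The function name `progressive_sort` and the shift counter are assumptions added to show why the cost depends on how much the order changed between frames.

```python
def progressive_sort(order, ypos):
    """Insertion-sort `order` in place by ypos; return the number of shifts done."""
    shifts = 0
    for i in range(1, len(order)):
        spr = order[i]
        j = i - 1
        # Move entries with a larger Y one slot down until spr fits.
        while j >= 0 and ypos[order[j]] > ypos[spr]:
            order[j + 1] = order[j]
            j -= 1
            shifts += 1
        order[j + 1] = spr
    return shifts
```

A possible usage pattern is to keep `order` alive between frames and call the function once per frame after the Y coordinates have been updated:

```python
ypos = [10, 40, 20, 30]
order = [0, 1, 2, 3]           # kept between frames
progressive_sort(order, ypos)  # first frame: some shifting needed
ypos[3] = 18                   # one sprite moves a little
progressive_sort(order, ypos)  # next frame: only that sprite is shifted
```

When nothing changes order, the inner loop never runs, which matches the cheap common case discussed above. A frame where many sprites swap places is the expensive one.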