| |
Oswald
Registered: Apr 2002 Posts: 5086 |
Sorting
are sorters really so slow in games? :) I have made my unrolled version for my theoretical game (;), and it takes 132 rlines to sort 32 numbers o_O worst case is ~200 lines. tho when its the case of 4 numbers have to be swapped only it does the job in ~10 lines. wastes a lot of memory but I like it :) |
|
... 193 posts hidden. Click here to view all posts.... |
| |
lft
Registered: Jul 2007 Posts: 369 |
I wrote a post about how JSR/RTS would interfere with pushing the result to the stack. But that is only a problem if you do it the wrong way, putting JSR in the field and RTS in the emptying routine. So it was a non-issue after all. |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1408 |
Ah!
Well, FWIW I forgot the results were being pushed to the stack when I was wondering if a TXS:JMP would be a faster way to return to the fieldÂ… (ie, just keep start of empty routine at top of stack) |
| |
JackAsser
Registered: Jun 2002 Posts: 2014 |
Just some thoughts...
Multiplexers in game scenarios, is the sorting normally done in the vertical blank or is the sorting performed in the main loop bur for the next frame's sprite setup?
Was just thinking of the ideas of not sorting at all and the fact that the vertical blank period is a bit crowded. A lot of other stuff must be updated there such as the scrolling etc.
What if the multiplexer simply scans the remaining actors for the next entry? This is in total of course a O(n^2/2) algorithm and dead slow. Worst case to handle would be 8 sprites having to be multiplexed directly below. So you have 21/8 raster lines to find the lowest index in the remaining actors.
Think I'll test this approach some day soon.. |
| |
cadaver
Registered: Feb 2002 Posts: 1160 |
There are games that do both. I remember at least Midnight Resistance *not* doublebuffering the sorted sprites, so it was doing the sort in the vblank / scorepanel area.
Generally I'd recommend not making something timecritical that absolutely doesn't need to be, therefore rather pre-calculate the sorted sprites anywhere when the main program has time.
If you do the sorting "on the fly", you can't take advantage of last frame's sorting result. In a tight sprite formation, you barely have enough time to load the sprite registers from pre-sorted data, so would imagine you would run into trouble with a "find the next sprite" approach, even with unrolled code. |
| |
JackAsser
Registered: Jun 2002 Posts: 2014 |
Quote: There are games that do both. I remember at least Midnight Resistance *not* doublebuffering the sorted sprites, so it was doing the sort in the vblank / scorepanel area.
Generally I'd recommend not making something timecritical that absolutely doesn't need to be, therefore rather pre-calculate the sorted sprites anywhere when the main program has time.
If you do the sorting "on the fly", you can't take advantage of last frame's sorting result. In a tight sprite formation, you barely have enough time to load the sprite registers from pre-sorted data, so would imagine you would run into trouble with a "find the next sprite" approach, even with unrolled code.
Ok, thanks for the insights. |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1408 |
For applications where border time is a lot scarcer than display time, I suspect I'd lean towards a nice simple linked list based bucket sort; the times it takes a while to find next sprite are precisely when the next sprite is a fair way down the screen, so the time taken wouldn't be so critical. If the next sprite is in the next line or three there'd only be a few links to follow, which shouldn't take too long.. |
| |
Repose
Registered: Oct 2010 Posts: 225 |
Just wanted to mention an O(n) sort I invented long ago I call Fibonacci sort, for how one step does a running sum.
Now I realize it's been invented long ago and is called the counting sort.
I think it takes about 50 cycles per number.
#Fibonacci Sort
unsorted = [3, 1, 4, 1, 5, 9, 2, 6, 0]
counts = [0] * 10
fib = [0] * (10 + 1)
sortd = [0] * len(unsorted)
#First pass: find the counts
for n in unsorted:
counts[n]+=1
#Second pass: do the Fibonacci magic
i = 0
total = 0
for n in counts:
fib[i] = total
total += n
i+=1
#Third pass: output sorted
for n in unsorted:
sortd[fib[n]] = n
fib[n] += 1
#Results
print(sortd)
a very buggy version in assembler:
;count each number
ldx #$ff
count ldy unsort,x
inc count,y
dex
bne count;15/ea
;Fibonacci step
clc
lda count
inx
fib adc count,x
sta sta count,x
inx
bne fib;13/ea
;Copy sorted
sort ldy unsort
ldx count,y
sty sort,x
inc count,y
inc sort+1
bne sort/27/ea
|
| |
Oswald
Registered: Apr 2002 Posts: 5086 |
this is one case for (,x) if you want the usual indice indirection too. |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1408 |
Nice work independently discovering counting sort, Repose
It's worth noting that this one is O(n)+O(m), where m is the number of buckets.
Pass 2 for a "perfect" sort of n sprites would take 220 iterations (256 in your example code)
That's around 220*10 cycles if you unroll the loop a little, so about 35 lines of overhead. Of course, you can cut that down considerably if you don't need as much accuracy or range.
Clearing the counts array also takes time, though you can drop that back from O(m) to O(n) by keeping it seperate to your "fib" array, persisting it frame to frame, and only clearing the entries you incremented in the first place. |
| |
Oswald
Registered: Apr 2002 Posts: 5086 |
found out this one, posting if someone finds a flaw, somehow I am unsure if I'm overlooking something, because in speed its close to the pha pha method, which feels unbeatable.
phase1 bucketsort, 4bit per buckets:
;this code repeats * numofsprites
ldy yvalues+spriteindex
ldx table,y ;x = y / 16 *2
lda #spriteindex
sta (bucket,x) ;needs some prep on zp
inc bucket,x ;every 2nd zp points to a buckettab
phase2 sorting network:
ldy bucket+z ;19 such compare snippets per bucket
lda yvalues,y ;can sort max 8 sprite per bucket
ldx bucket+q ;z and q is precalculated
cmp yvalues,x
bcc + ; or bcs?
sty bucket+q
stx bucket+z ;26 cycles worst case
with $20 sprites phase1 is 768 cycles, 1976 cycles for phase 2 at worst case (highly unlikely) , overall ~43 rasterlines.
probably will use different sorting networks based on nr of sprites in bucket. that will reduce nr of compare/swap snippets per bucket. |
Previous - 1 | ... | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 - Next |