| |
Oswald
Registered: Apr 2002 Posts: 5086 |
Sorting
are sorters really so slow in games? :) I have made my unrolled version for my theoretical game (;), and it takes 132 rlines to sort 32 numbers o_O worst case is ~200 lines. tho when its the case of 4 numbers have to be swapped only it does the job in ~10 lines. wastes a lot of memory but I like it :) |
|
... 193 posts hidden. Click here to view all posts.... |
| |
Trash
Registered: Jan 2002 Posts: 122 |
I have made a mistake in my earlier posting about how to reduce the accuracy of the sorter...
Imagine you have 32 actors you want to sort and that K is 200, what you need is to reduce K so it holds the upper 5 bits (%11111 = 31).
The result will be a fast sorter that is inaccurate but accurate enough for sprite sorting.
64tass-style:
; YSortPositions contains the reduced spritepositions in Y
NUMBEROFBYTESTOBESORTED = 32
; With tables on zero-page
; 2 + ((3 + 9 + 6 + 19) * NUMBEROFBYTESTOBESORTED) - 3 + 2 + 3 + 12 = 1200
; With tables not on zero-page
; 2 + ((4 + 11 + 8 + 22) * NUMBEROFBYTESTOBESORTED) - 4 + 2 + 3 + 12 = 1455
DoSort
lda #0 ; 2
.for i = 0, i < NUMBEROFBYTESTOBESORTED, i = i + 1
sta SortTable + i ; 4 / 3
.next
.for i = 0, i < NUMBEROFBYTESTOBESORTED, i = i + 1
ldx YSortPositions + i ; 4 / 3
inc YSortPositions,x ; 7 / 6
;--------
; 11 / 9
.next
clc ; 2
.for i = 0, i < NUMBEROFBYTESTOBESORTED, i = i + 1
sta SortOrder + i ; 4 / 3
.if i < (NUMBEROFBYTESTOBESORTED - 1)
adc SortTable + i ; 4 / 3
.fi
;--------
; 8 / 6
.next
.for i = 0, i < NUMBEROFBYTESTOBESORTED, i = i + 1
ldx ySortPos + i ; 4 / 3
ldy SortOrder,x ; 4
inc SortOrder,x ; 7 / 6
lda #i ; 2
sta Sorted,y ; 5 / 4
;--------
; 22 / 19
.next
rts ; 12 (jsr + rts)
|
| |
lft
Registered: Jul 2007 Posts: 369 |
Typo on line 4, should increment at SortTable,x.
The problem with this approach is that you have to compute the upper 5 bits of the y-position somehow, and that will take time. It can be done during sorting, at the cost of an additional 2 cycles per actor, like this:
DoSort
lda #0
.for i = 0, i < NUMBEROFBYTESTOBESORTED, i = i + 1
sta SortTable + 8*i
.next
lda #$f8
.for i = 0, i < NUMBEROFBYTESTOBESORTED, i = i + 1
ldx YSortPositions + i
sbx #0
inc SortTable,x
.next
...
But I'm not convinced that the quality of this "rough sorting" is comparable with the real thing. Consider for instance the case where all actors are positioned 3 rasterlines apart. Whenever a hardware sprite becomes free, the multiplexer must take the next sprite according to the true sorting order, otherwise all sprite units will be busy eight actors later, so an actor must be dropped.
Nevertheless, it is nice to know that this option is available, in situations where the sprites aren't packed too closely but rastertime is tight. |
| |
Trash
Registered: Jan 2002 Posts: 122 |
Quoting lftTypo on line 4, should increment at SortTable,x.
Thanks, missed that..
Quoting lftThe problem with this approach is that you have to compute the upper 5 bits of the y-position somehow, and that will take time. It can be done during sorting, at the cost of an additional 2 cycles per actor, like this:
Or it could be done in a precalculated table, whatever fits your needs..
Quoting lftNevertheless, it is nice to know that this option is available, in situations where the sprites aren't packed too closely but rastertime is tight.
Yes, but if you have small amounts of rastertime you either should pre-sort your data or have data that fits with a bubblesort like the one THCM implemented...
I am (like you) not convinced that this sorter is the best for all use-cases but for some it's certainly the best option. |
| |
The Human Code Machine
Registered: Sep 2005 Posts: 112 |
In most game scenarios an optimized bubble sort with special routines to add or remove an actor should be faster. In my example the sorter takes ~20 raster lines most of the time with sprites moving up and down. For a game, accurate sorting is important, to get an optimal multiplexing result with reusing a free sprite slot as soon as possible. (as lft said above) |
| |
Digger
Registered: Mar 2005 Posts: 427 |
btw. Nice tool for visualising various sorting algos, Counting Sort included: https://visualgo.net/en/sorting |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1408 |
Well, an unrolled bubble iteration should only take about 8 lines for 32 actors if there's no change, 10ish if there are four or five swaps.
I do like Falco Paul's suggestion of kicking off an Ocean Sort but aborting and switching to bucket sort (or some other constant time algorithm) if the number of swaps passes a certain threshold.
(cf http://www.codebase64.org/doku.php?id=base:sprite_multiplexer_s.. )
As for making lft's suggestion a little more memory flexible, I'm wondering if writing an RTS into the increment field could be cost effective? |
| |
lft
Registered: Jul 2007 Posts: 369 |
I have thought of a different way to relax a memory constraint. Consider this variant of the bucket-emptying routine:
sty endptr+1
lax ylink,y
pha
lda actorlink,x
sta ylink,y
bpl endptr
sta inyfield,y
endptr jmp inyfield
The worst-case execution time is the same, while the case of multiple actors in the same bucket is now slower. But now the location of the link table is unconstrained. |
| |
lft
Registered: Jul 2007 Posts: 369 |
Quoting ChristopherJamI'm wondering if writing an RTS into the increment field could be cost effective?
That would imply a JSR out of the increment field, right? That still means you have to put the bucket-emptying routine in four different, fixed places. Meanwhile the execution time increases by a rasterline.
Perhaps you meant the other way around, JSR into the increment field and RTS to the bucket-emptying routine. That would indeed make the memory layout more flexible, but at the cost of three rasterlines.
(Edit: I see now that I misread your post. Of course you mean the second thing.) |
| |
lft
Registered: Jul 2007 Posts: 369 |
(removed) |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1408 |
(intrigued) |
Previous - 1 | ... | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 - Next |