| |
Oswald
Registered: Apr 2002 Posts: 5086 |
Sorting
are sorters really so slow in games? :) I have made my unrolled version for my theoretical game (;), and it takes 132 rlines to sort 32 numbers o_O worst case is ~200 lines. tho when its the case of 4 numbers have to be swapped only it does the job in ~10 lines. wastes a lot of memory but I like it :) |
|
... 193 posts hidden. Click here to view all posts.... |
| |
Rastah Bar Account closed
Registered: Oct 2012 Posts: 336 |
Copyfault, that is a very interesting approach. But to halve the Y position resolution an AND #$fe has to be applied, which costs an additional 2 cycles per actor. And if you want to preserve the full resolution values as well (for use in the Y updating routine for example), you will have to store them somewhere, adding more to the cycle count. |
| |
Copyfault
Registered: Dec 2001 Posts: 474 |
Quote: Copyfault, that is a very interesting approach. But to halve the Y position resolution an AND #$fe has to be applied, which costs an additional 2 cycles per actor. And if you want to preserve the full resolution values as well (for use in the Y updating routine for example), you will have to store them somewhere, adding more to the cycle count.
Ofcourse this works only if the game concept allows for all objects moving with suitable speed (all values hsve to be a multiple of two).
Otherwise you're right, all the necessary additional position handling will most probably eat up too much cycles. What's the current best case cycle count?? |
| |
Rastah Bar Account closed
Registered: Oct 2012 Posts: 336 |
See posts #91 and #101. It is about 3*k+43*n.
For lft's Field Sort (post #126) it is 2*k+55*n. |
| |
Trash
Registered: Jan 2002 Posts: 122 |
I have made a mistake in my earlier posting about how to reduce the accuracy of the sorter...
Imagine you have 32 actors you want to sort and that K is 200, what you need is to reduce K so it holds the upper 5 bits (%11111 = 31).
The result will be a fast sorter that is inaccurate but accurate enough for sprite sorting.
64tass-style:
; YSortPositions contains the reduced spritepositions in Y
NUMBEROFBYTESTOBESORTED = 32
; With tables on zero-page
; 2 + ((3 + 9 + 6 + 19) * NUMBEROFBYTESTOBESORTED) - 3 + 2 + 3 + 12 = 1200
; With tables not on zero-page
; 2 + ((4 + 11 + 8 + 22) * NUMBEROFBYTESTOBESORTED) - 4 + 2 + 3 + 12 = 1455
DoSort
lda #0 ; 2
.for i = 0, i < NUMBEROFBYTESTOBESORTED, i = i + 1
sta SortTable + i ; 4 / 3
.next
.for i = 0, i < NUMBEROFBYTESTOBESORTED, i = i + 1
ldx YSortPositions + i ; 4 / 3
inc YSortPositions,x ; 7 / 6
;--------
; 11 / 9
.next
clc ; 2
.for i = 0, i < NUMBEROFBYTESTOBESORTED, i = i + 1
sta SortOrder + i ; 4 / 3
.if i < (NUMBEROFBYTESTOBESORTED - 1)
adc SortTable + i ; 4 / 3
.fi
;--------
; 8 / 6
.next
.for i = 0, i < NUMBEROFBYTESTOBESORTED, i = i + 1
ldx ySortPos + i ; 4 / 3
ldy SortOrder,x ; 4
inc SortOrder,x ; 7 / 6
lda #i ; 2
sta Sorted,y ; 5 / 4
;--------
; 22 / 19
.next
rts ; 12 (jsr + rts)
|
| |
lft
Registered: Jul 2007 Posts: 369 |
Typo on line 4, should increment at SortTable,x.
The problem with this approach is that you have to compute the upper 5 bits of the y-position somehow, and that will take time. It can be done during sorting, at the cost of an additional 2 cycles per actor, like this:
DoSort
lda #0
.for i = 0, i < NUMBEROFBYTESTOBESORTED, i = i + 1
sta SortTable + 8*i
.next
lda #$f8
.for i = 0, i < NUMBEROFBYTESTOBESORTED, i = i + 1
ldx YSortPositions + i
sbx #0
inc SortTable,x
.next
...
But I'm not convinced that the quality of this "rough sorting" is comparable with the real thing. Consider for instance the case where all actors are positioned 3 rasterlines apart. Whenever a hardware sprite becomes free, the multiplexer must take the next sprite according to the true sorting order, otherwise all sprite units will be busy eight actors later, so an actor must be dropped.
Nevertheless, it is nice to know that this option is available, in situations where the sprites aren't packed too closely but rastertime is tight. |
| |
Trash
Registered: Jan 2002 Posts: 122 |
Quoting lftTypo on line 4, should increment at SortTable,x.
Thanks, missed that..
Quoting lftThe problem with this approach is that you have to compute the upper 5 bits of the y-position somehow, and that will take time. It can be done during sorting, at the cost of an additional 2 cycles per actor, like this:
Or it could be done in a precalculated table, whatever fits your needs..
Quoting lftNevertheless, it is nice to know that this option is available, in situations where the sprites aren't packed too closely but rastertime is tight.
Yes, but if you have small amounts of rastertime you either should pre-sort your data or have data that fits with a bubblesort like the one THCM implemented...
I am (like you) not convinced that this sorter is the best for all use-cases but for some it's certainly the best option. |
| |
The Human Code Machine
Registered: Sep 2005 Posts: 112 |
In most game scenarios an optimized bubble sort with special routines to add or remove an actor should be faster. In my example the sorter takes ~20 raster lines most of the time with sprites moving up and down. For a game, accurate sorting is important, to get an optimal multiplexing result with reusing a free sprite slot as soon as possible. (as lft said above) |
| |
Digger
Registered: Mar 2005 Posts: 427 |
btw. Nice tool for visualising various sorting algos, Counting Sort included: https://visualgo.net/en/sorting |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1408 |
Well, an unrolled bubble iteration should only take about 8 lines for 32 actors if there's no change, 10ish if there are four or five swaps.
I do like Falco Paul's suggestion of kicking off an Ocean Sort but aborting and switching to bucket sort (or some other constant time algorithm) if the number of swaps passes a certain threshold.
(cf http://www.codebase64.org/doku.php?id=base:sprite_multiplexer_s.. )
As for making lft's suggestion a little more memory flexible, I'm wondering if writing an RTS into the increment field could be cost effective? |
| |
lft
Registered: Jul 2007 Posts: 369 |
I have thought of a different way to relax a memory constraint. Consider this variant of the bucket-emptying routine:
sty endptr+1
lax ylink,y
pha
lda actorlink,x
sta ylink,y
bpl endptr
sta inyfield,y
endptr jmp inyfield
The worst-case execution time is the same, while the case of multiple actors in the same bucket is now slower. But now the location of the link table is unconstrained. |
Previous - 1 | ... | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 - Next |