[CSDb] - User Forums

Welcome to our latest new user megasoftargentina ! (Registered 2024-04-16)

You are not logged in - nap

CSDb User Forums

Forums > C64 Coding > Sorting

2007-10-08 16:08

Oswald

Registered: Apr 2002
Posts: 5017

Sorting

are sorters really so slow in games? :) I have made my unrolled version for my theoretical game (;), and it takes 132 rlines to sort 32 numbers o_O worst case is ~200 lines. tho when its the case of 4 numbers have to be swapped only it does the job in ~10 lines. wastes a lot of memory but I like it :)

... 193 posts hidden. Click here to view all posts....

2017-08-13 19:26

doynax
Account closed

Registered: Oct 2004
Posts: 212

Very neat. I never would have considered doing a sparse bucket sort :)

Oh, and you probably shouldn't take my stated radix sort timings at face-value. The sparse sort implementation may easily be faster already.
Plus counting sorting cycles in isolation is a bit of a limited model. A major part depends on whether the sorting steps slot into the game's actor and multiplexer updates such that the loops may be folded and loaded data reused.

(Off-topic, but the coarse bucket sort method can be improved for less frantic applications by using a stable sort while cycling through the remainders as offsets for the division table lookup each frame, eventually reaching the correct order for static displays. It helps if the permutation is chosen to minimize settling errors, and perhaps using a prime divisor or randomized shuffle to avoid interference against sprite movement.)

2017-08-14 07:08

ChristopherJam

Registered: Aug 2004
Posts: 1370

Quoting doynax

Very neat. I never would have considered doing a sparse bucket sort :)

Thanks!

Quote:

Oh, and you probably shouldn't take my stated radix sort timings at face-value. The sparse sort implementation may easily be faster already.
Plus counting sorting cycles in isolation is a bit of a limited model. A major part depends on whether the sorting steps slot into the game's actor and multiplexer updates such that the loops may be folded and loaded data reused.

Yes, I wasn't sure how best to benchmark. Code as posted has an unrolled loop for pulling out the indices into a linear list, so the time for subsequent 'fetch next index' operations is just 10 cycles per actor (inc *+4:ldy $xxxx:bmi noneleft), for a total of 2742 cycles per frame (43.5 raster lines) to sort and then fetch 32 indices.

FWIW most of the time is "per used bucket", so if there are only 24(?) distinct Y values, I suspect I'd come out ahead, but if there are that many horizontally aligned enemies a zone sort's likely to be sufficient anyway.

I did experiment with just inserting into the sparse bucket list in the sort call (34 cycles per actor, 17.2 raster lines), then only pulling them out as required, but that basically requires either coroutines or a lot of loading and storing of sentinels, so my best fetch time was up around 62 cycles per actor, for a fairly unappealing total of (34+62)*32=3072 cycles per frame (48.7 raster lines)

I suspect that if one just wanted to minimise CPU time expended by the sorter in in the border, a plain bucket sort would be better; insertion's only 14 cycles per actor (7 rasters!), and the times it takes a while for 'fetch next' are precisely when the next actor is a fair way down the screen from the one you just dealt with, so timing wouldn't be quite so critical anyway.

Quote:

(Off-topic, but the coarse bucket sort method can be improved for less frantic applications by using a stable sort while cycling through the remainders as offsets for the division table lookup each frame, eventually reaching the correct order for static displays. It helps if the permutation is chosen to minimize settling errors, and perhaps using a prime divisor or randomized shuffle to avoid interference against sprite movement.)

Interesting point. At first I was thinking that was kind of shell sort related, but the latter is intrinsically unstable. Might have to run some simulations

2017-08-14 12:18

doynax
Account closed

Registered: Oct 2004
Posts: 212

Quoting ChristopherJam

Code as posted has an unrolled loop for pulling out the indices into a linear list, so the time for subsequent 'fetch next index' operations is just 10 cycles per actor (inc *+4:ldy $xxxx:bmi noneleft), for a total of 2742 cycles per frame (43.5 raster lines) to sort and then fetch 32 indices.

Well, `PLA : TAX` with the branch and sentinel baked into the actor class dispatch might work. And if the update happens to leave the Y-coordinate lying around then you might as well make use of it.

Quoting ChristopherJam

I did experiment with just inserting into the sparse bucket list in the sort call (34 cycles per actor, 17.2 raster lines), then only pulling them out as required, but that basically requires either coroutines or a lot of loading and storing of sentinels, so my best fetch time was up around 62 cycles per actor

Yeah, the bitset introduces state which is difficult to get rid whereas the radix sorts gets away with it by chaining the buckets together into a unified list.

Quote:

Interesting point. At first I was thinking that was kind of shell sort related, but the latter is intrinsically unstable. Might have to run some simulations

Oh, nothing remotely fancy. I just meant that a bucket sort need not necessarily to use the same partition every frame. So with bucket=(y+k)/N then by cycling through 0≤k<N each frame a stable sort produces the correct ordering in at most N frames for static scenery. This is effectively free and with a carefully permutation the error tends to decay exponentially even for moving objects.

2017-08-14 13:34

ChristopherJam

Registered: Aug 2004
Posts: 1370

Quoting doynax

Well, `PLA : TAX` with the branch and sentinel baked into the actor class dispatch might work. And if the update happens to leave the Y-coordinate lying around then you might as well make use of it.

Indeed. And for the first 8 sprites, you can just LDX immediate directly from $017f down.

Quote:

Yeah, the bitset introduces state which is difficult to get rid whereas the radix sorts gets away with it by chaining the buckets together into a unified list.

Exactly. The plain bucket sort has the same advantage (TBH I pinched the idea from Playstation order-tables - it's how they z-sort the texture mapped triangles, by inserting them into a linked list of null graphics primitives)

Quote:

Oh, nothing remotely fancy. I just meant that a bucket sort need not necessarily to use the same partition every frame. So with bucket=(y+k)/N then by cycling through 0≤k<N each frame a stable sort produces the correct ordering in at most N frames for static scenery. This is effectively free and with a carefully permutation the error tends to decay exponentially even for moving objects.

Yes; I got there eventually :)

2017-09-04 16:19

Rastah Bar

Registered: Oct 2012
Posts: 336

A brute force approach seems already very efficient. I think that can be done in 35.6 lines or less for 32 sprite positions and 220 buckets.

;Fill Buckets
ldx input
inc Buckets,x
...
ldx input+31
inc Buckets,x
-------------
10*32=320 cycles assuming input is in ZP.

;Collect sorted values
ldy Buckets
beq next1:   ;Skip if empty.
lda #value1  ;Value corresponding to this bucket.       
pha
dey
bne *-2      ;In case not unique.
sty Buckets  ;Empty the buckets.
next1:
ldy Buckets+1
beq next2:
lda #value2
pha
dey
bne *-2
sty Buckets
next2:
;etc.

If all input numbers are unique the collection stage costs 7 cycles for empty buckets and 19 cycles for the buckets with 1 input value for a total of 7*(220-32)+19*32+320 = 35.6 lines.

If not all input numbers are unique, the cost for a bucket with k numbers is 12+(k-1)*8+7 cycles. Adding one more to a non-empty bucket costs 8 cycles, but since there is then one bucket less with one contribution, and one more empty bucket, one gains 4 cycles overall.

But I could have made a mistake in these calculations or missed something else (initialization, overhead ...).

2017-09-04 20:56

ChristopherJam

Registered: Aug 2004
Posts: 1370

Color Bar, that is indeed a fast way to get a sorted list of Y values, but what is needed is a sorted list of the indices of the sprites that have those Y positions.

(eg if sprites 1, 2, 3 and 4 are at positions 10, 46, 73 and 32, the required output is 1,4,2,3, not 10,32,46,73 )

2017-09-05 04:53

Oswald

Registered: Apr 2002
Posts: 5017

35-40 lines for 32 sprites is crazy good guys 1.x line per sprite.

2017-09-05 11:52

Rastah Bar

Registered: Oct 2012
Posts: 336

Quote: Color Bar, that is indeed a fast way to get a sorted list of Y values, but what is needed is a sorted list of the indices of the sprites that have those Y positions.

(eg if sprites 1, 2, 3 and 4 are at positions 10, 46, 73 and 32, the required output is 1,4,2,3, not 10,32,46,73 )

Oh, I see, thanks.

The fact that multiple sprites can have the same Y values makes the problem considerably harder, it seems.

2017-09-06 21:34

Perplex

Registered: Feb 2009
Posts: 254

Here is some old sort code I have used, I think it's originally from Armalyte, or at least based on that code (by Gary Liddon and Dan Phillips.)

It uses the stack for buckets, so lower your stack pointer to somewhere below 50 (or whatever is your lowest Y-value for sorting), then initialize the stack area above that with $ff values.


NOBJECTS  = 24	; Number of objects to be sorted

OBJ_YPOS  = $80	; Y-values of all objects
OBJ_SORT  = $A0	; Indices of all objects sorted by Y-value
TEMP_SP   = $FF	; Temporary stack pointer

StackSort:

	; Save stack pointer
	tsx		
	stx TEMP_SP

	; Phase 1 - Store Y-values in buckets
	.repeat NOBJECTS, i
	  lda OBJ_YPOS + i
	  ;lsr
	  tax
	  txs
	  pla
	  bpl *-1 ; Already value here, try next
	  lda #i
	  pha
	.endrep

	; Phase 2 - Gather (indices of) sorted values
	;ldx #(50/2)-1
	ldx #50-1
	txs
	.repeat NOBJECTS, i
	  pla
	  bmi *-1
	  sta OBJ_SORT + i
	  lda #$ff ; Clean up after ourselves
	  pha
	  pla
	.endrep

	; Restore stack pointer
	ldx TEMP_SP
	txs
	rts

The two commented out lines are for speeding up the gathering phase by halving each Y-value. Depending on how your multiplexer works, you may not need exact order for very close objects if you handle those at the same time.

In circumstances where limiting the amount of available stack is not viable, it can easily be modified to use some other memory area at the cost of a few cycles per object.

2017-09-07 07:40

The Human Code Machine

Registered: Sep 2005
Posts: 109

While working on Vincenct 2 in the nineties I had the same problem finding a fast solution for sorting sprites and for most purposes this sorter here: http://codebase64.org/doku.php?id=base:flexible_32_sprite_multi..
was the best I came up with. Inspired by Rune Gram-Madsen's Bubble a la Rune sorting routine, published at the Asm-One manual, I wrote a fast implementation for the c64. It's a modified bubble sort working bidirectional and for presorted arrays it's very fast.

Previous - 1 | ... | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | ... | 21 - Next

Refresh

Subscribe to this thread:

You need to be logged in to post in the forum.

Search the forum:
Search for in
All times are CET.

Search CSDb

Advanced

Users Online

zscs
Guests online: 131

Top Demos

1 Next Level  (9.8)
2 Mojo  (9.7)
3 Coma Light 13  (9.7)
4 Edge of Disgrace  (9.6)
5 Comaland 100%  (9.6)
6 No Bounds  (9.6)
7 Uncensored  (9.6)
8 Wonderland XIV  (9.6)
9 The Ghost  (9.6)
10 Bromance  (9.6)

Top onefile Demos

1 It's More Fun to Com..  (9.9)
2 Party Elk 2  (9.7)
3 Cubic Dream  (9.6)
4 Copper Booze  (9.5)
5 Rainbow Connection  (9.5)
6 TRSAC, Gabber & Pebe..  (9.5)
7 Onscreen 5k  (9.5)
8 Dawnfall V1.1  (9.5)
9 Quadrants  (9.5)
10 Daah, Those Acid Pil..  (9.5)

Top Groups

1 Oxyron  (9.3)
2 Nostalgia  (9.3)
3 Booze Design  (9.3)
4 Censor Design  (9.3)
5 Crest  (9.3)

Top Graphicians

1 Sulevi  (10)
2 Mirage  (9.8)
3 Lobo  (9.7)
4 Mikael  (9.7)
5 Archmage  (9.7)

Page generated in: 0.066 sec.