[CSDb] - User Forums

You are not logged in - nap

CSDb User Forums

Forums > C64 Coding > Sorting

2007-10-08 16:08

Oswald

Registered: Apr 2002
Posts: 5127

Sorting

are sorters really so slow in games? :) I have made my unrolled version for my theoretical game (;), and it takes 132 rlines to sort 32 numbers o_O worst case is ~200 lines. tho when its the case of 4 numbers have to be swapped only it does the job in ~10 lines. wastes a lot of memory but I like it :)

... 193 posts hidden. Click here to view all posts....

2017-09-06 21:34

Perplex

Registered: Feb 2009
Posts: 257

Here is some old sort code I have used, I think it's originally from Armalyte, or at least based on that code (by Gary Liddon and Dan Phillips.)

It uses the stack for buckets, so lower your stack pointer to somewhere below 50 (or whatever is your lowest Y-value for sorting), then initialize the stack area above that with $ff values.

NOBJECTS  = 24	; Number of objects to be sorted

OBJ_YPOS  = $80	; Y-values of all objects
OBJ_SORT  = $A0	; Indices of all objects sorted by Y-value
TEMP_SP   = $FF	; Temporary stack pointer

StackSort:

	; Save stack pointer
	tsx		
	stx TEMP_SP

	; Phase 1 - Store Y-values in buckets
	.repeat NOBJECTS, i
	  lda OBJ_YPOS + i
	  ;lsr
	  tax
	  txs
	  pla
	  bpl *-1 ; Already value here, try next
	  lda #i
	  pha
	.endrep

	; Phase 2 - Gather (indices of) sorted values
	;ldx #(50/2)-1
	ldx #50-1
	txs
	.repeat NOBJECTS, i
	  pla
	  bmi *-1
	  sta OBJ_SORT + i
	  lda #$ff ; Clean up after ourselves
	  pha
	  pla
	.endrep

	; Restore stack pointer
	ldx TEMP_SP
	txs
	rts

The two commented out lines are for speeding up the gathering phase by halving each Y-value. Depending on how your multiplexer works, you may not need exact order for very close objects if you handle those at the same time.

In circumstances where limiting the amount of available stack is not viable, it can easily be modified to use some other memory area at the cost of a few cycles per object.

2017-09-07 07:40

The Human Code Machine

Registered: Sep 2005
Posts: 114

While working on Vincenct 2 in the nineties I had the same problem finding a fast solution for sorting sprites and for most purposes this sorter here: http://codebase64.org/doku.php?id=base:flexible_32_sprite_multi..
was the best I came up with. Inspired by Rune Gram-Madsen's Bubble a la Rune sorting routine, published at the Asm-One manual, I wrote a fast implementation for the c64. It's a modified bubble sort working bidirectional and for presorted arrays it's very fast.

2017-09-07 09:17

Rastah Bar
Account closed

Registered: Oct 2012
Posts: 336

Perplex, that is interesting, thanks for sharing the code. Do you know of an efficient modification that ensures exact ordering?

2017-09-07 12:53

Perplex

Registered: Feb 2009
Posts: 257

If you need exact ordering even with many values clustered close together, then simply moving on to the next unoccupied bucket won't really work, so you'd need a quite different approach, I think.

2017-09-07 20:19

Hein

Registered: Apr 2004
Posts: 965

Using bucket sort ($10 size) and insertion sort to sort the indices of a list of y-values. In this case it takes around 40 rasterlines for 32 sprites when placed on zp. It assumes there's no need for more than 8 sprites per bucket, but 15 are possible.

The multiplexer checks per bucket based on the counter at the end of each bucket.

	*=$10
clear_bucket_counter

	lda #0
	sta buckets+15
	sta buckets+15+16*1
	sta buckets+15+16*2
	sta buckets+15+16*3
	sta buckets+15+16*4
	sta buckets+15+16*5
	sta buckets+15+16*6
	sta buckets+15+16*7
	sta buckets+15+16*8
	sta buckets+15+16*9
	sta buckets+15+16*10
	sta buckets+15+16*11
	sta buckets+15+16*12
	sta buckets+15+16*13
	sta buckets+15+16*14
	sta buckets+15+16*15

	
	ldx #32
set_element_index
	stx element_index
	
	lda sprite_y,x
	ldx #$f0
	sbx #$00	;select right bucket

	ldy buckets+15,x	;get bucket counter (# elements in bucket)
	inc buckets+15,x	;increase bucket counter

	stx set_2+1	;set high nybble bucket index
	dey
	bmi place_current_element	;element count=0 skip sort
			
	stx get_1+1	
	stx get_2+1	
	inx
	stx set_1+1
	
			
	sty slot_hi+1	;highest index
	
compare_element_y
get_1	ldx buckets,y
	cmp sprite_y,x
	dey
	bcc insert_lower_element	;woohoo, found a spot
	bpl compare_element_y
	
insert_lower_element
	sty slot_lo+1	

slot_hi	ldy #0

move_higher_elements

get_2	lda buckets,y		;move higher elements up
set_1	sta buckets,y
	dey
slot_lo	cpy #$ff		
 	bcc move_higher_elements

place_current_element		;finally, insert the lower element index in the right place
	iny
	lax element_index
set_2	sta buckets,y
	dex
	bne set_element_index

	rts

element_index
	.byte 0	


	*=$1000
buckets
	.fill 256,0

2017-09-08 11:16

Fresh

Registered: Jan 2005
Posts: 101

I use a slightly modified version of "Ocean" sorter, with a (somewhat) optimized backward sorting routine.
Its worst case is significantly better than plain Ocean.
Code in kickass syntax.

.const	NSPRITES  	= 32
.const	indexes		= $60
.const	ypos		= $80

// SORT FORWARD
// ************
// Check if values are sorted, if necessary call sortbw routine to
// put an unsorted value in the correct position
sortfw:	
	.for(var i=0;i<NSPRITES;i++) {
		over:
			.if (i!=NSPRITES-1) {
				ldy indexes+1+i
				ldx indexes+i
				lda ypos,y
				cmp ypos,x
				bcs sortfw[i+1].over								
				jsr sortbw[NSPRITES-2-i].back				
			}
			else	rts			
	}

// SORT BACKWARD
// *************
// Move backward a value in .A which is less than the one in current position.
// All values at the left of the current position are ordered so, once we found
// the correct spot for .A, we don't need to check forward, we can get back
// to where the function was called
sortbw:
	.for(var i=0;i<NSPRITES-1;i++)	{
		back:
			stx indexes+NSPRITES-1-i
			.if (i!=NSPRITES-2) {
				ldx indexes+NSPRITES-3-i
				cmp ypos,x						
				bcc sortbw[i+1].back		
			}
			sty indexes+NSPRITES-2-i
			rts			
	}

2017-09-10 22:29

Rastah Bar
Account closed

Registered: Oct 2012
Posts: 336

Quote: Color Bar, that is indeed a fast way to get a sorted list of Y values, but what is needed is a sorted list of the indices of the sprites that have those Y positions.

(eg if sprites 1, 2, 3 and 4 are at positions 10, 46, 73 and 32, the required output is 1,4,2,3, not 10,32,46,73 )

I think that can be done in less than 37 rasterlines.

Consider the following unrolled code for outputting sprite indices, sorted according to Y-value (phase 2)

.bucketK
bcc .nextBucket		;Branches always: decrease this value by 3 with selfmodifying code in phase 1 for every new sprite in the bucket.
lda #sprite_index	;index of last sprite
pha		
...	
lda #sprite_index  	;index of first sprite.
pha		
sty .bucketK+1	;Restore branch (to skip all sprites for empty buckets). Call with y=$1b for max 8 sprites per bucket. 

.nextBucket	;unrolled
-------------
3 cycles for an empty bucket, 12 voor buckets with 1 sprite => 3*(220-32)+12*32 = 948 cycli

Phase1: The following unrolled code jumps to the code that modifies the above phase 2 code for the right bucket:

dex   ;counts sprite index. Initialized with $20.
bmi .phase2
lda Yvalues,x   
asl
bcs *+8   ;There are more than 128 buckets, so I use 2 jump tables
sta *+4
jmp ($xxxx) ;jump table for buckets <128	;jumps to code that fills out phase2 Bucket code (see below)
sta *+4
jmp ($xxxx) ;jump table for buckets >=128	;21/22 cycles sofar
.phase2
clc	;to branch always (either to next bucket or to fetch the sprite indices in the current bucket).
ldy #27		;to restore the branch value of the bucket
jmp .Bucket0
           

;code to fill out bucketK
ldy .bucketK+1   ;The trick is that I use this as an index for the number of sprites already in the bucket
lda newBranchValue,y		;fetch value to branch to the code for one more sprite 
sta .bucketK+1   ;since this is unrolled code, the address of .bucketK is known
tay
txa
sta .bucketK+3,y  ;Store new sprite index. This works since the branch value decreases by 3 for each new sprite, but the address where to store the sprite index as well!
...	;Unrolled => repeat all the above code starting with dex.
--------------
41/42 cycles ->41.5*32 = 1328

In total 1328+948 =2276 -> 36.1 lines.
But there will be at least 25 additional cycles from page boundary crossings. Slightly less than 37 lines I think if I did not overlook something.

Table with branch values (for max 8 sprites per bucket) looks like (x=don't care)

newBranchValue
 BYTE 0 x x 0 x x 3 x x 6 x x 9 x x 12 x x 15 x x 18 x x x x x 21 x

The first zero makes that a full bucket replaces the last sprite with the new one.

2017-09-12 09:28

ChristopherJam

Registered: Aug 2004
Posts: 1424

Excellent work, Color Bar.

If every sprite lands in the second jump table, that pushes you up to 42 cycles for every insertion, but total time in that case is still only
3*(220-32)+12*32 + 42*32 = 2292 cycles, or 36.4 raster lines.

The 25 additional cycles from page boundary crossings you mention can be avoided by padding the output routine out to 32 bytes by adding three bytes of padding after the BCC .nextBucket, and starting bucket0 at $xxff

It takes up another 660 bytes, but the routine is already occupying most of a bank as it is, so that's small change at this point ;)

2017-09-12 10:36

ChristopherJam

Registered: Aug 2004
Posts: 1424

Should be able to save about 1760 bytes by only including the last 12 bytes of the insertion code in every third routine; the preceding and succeeding routine for each complete routine can just jump to the nearest complete one (the part from the second sta *+4 onward).

Timing's unchanged, but drags it back down to around 14k including jump tables.

2017-09-12 11:48

Rastah Bar
Account closed

Registered: Oct 2012
Posts: 336

Quoting ChristopherJam

Excellent work, Color Bar.
The 25 additional cycles from page boundary crossings you mention can be avoided by padding the output routine out to 32 bytes by adding three bytes of padding after the BCC .nextBucket, and starting bucket0 at $xxff

It takes up another 660 bytes, but the routine is already occupying most of a bank as it is, so that's small change at this point ;)

Those 3 padding bytes can be put to good use by allowing 9 sprites in a bucket.

The routine probably uses too much memory for many applications.

Previous - 1 | ... | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | ... | 21 - Next

Refresh

Subscribe to this thread:

You need to be logged in to post in the forum.

Search the forum:
Search for in
All times are CET.

Search CSDb

Advanced

Users Online

Guests online: 533

Top Demos

1 Next Level  (9.7)
2 13:37  (9.7)
3 Codeboys & Endians  (9.7)
4 Mojo  (9.6)
5 Coma Light 13  (9.6)
6 Edge of Disgrace  (9.6)
7 Signal Carnival  (9.6)
8 Wonderland XIV  (9.5)
9 Uncensored  (9.5)
10 Comaland 100%  (9.5)

Top onefile Demos

1 Nine  (9.7)
2 Layers  (9.6)
3 Cubic Dream  (9.6)
4 Party Elk 2  (9.6)
5 Copper Booze  (9.5)
6 Scan and Spin  (9.5)
7 Onscreen 5k  (9.5)
8 Grey  (9.5)
9 Dawnfall V1.1  (9.5)
10 Rainbow Connection  (9.5)

Top Groups

1 Artline Designs  (9.3)
2 Booze Design  (9.3)
3 Performers  (9.3)
4 Oxyron  (9.3)
5 Censor Design  (9.3)

Top Cover Designers

1 Duce  (9.8)
2 Electric  (9.8)
3 Junkie  (9.6)
4 The Elegance  (9.5)
5 Mermaid  (9.3)

Page generated in: 0.066 sec.