NOBJECTS = 24 ; Number of objects to be sorted OBJ_YPOS = $80 ; Y-values of all objects OBJ_SORT = $A0 ; Indices of all objects sorted by Y-value TEMP_SP = $FF ; Temporary stack pointer StackSort: ; Save stack pointer tsx stx TEMP_SP ; Phase 1 - Store Y-values in buckets .repeat NOBJECTS, i lda OBJ_YPOS + i ;lsr tax txs pla bpl *-1 ; Already value here, try next lda #i pha .endrep ; Phase 2 - Gather (indices of) sorted values ;ldx #(50/2)-1 ldx #50-1 txs .repeat NOBJECTS, i pla bmi *-1 sta OBJ_SORT + i lda #$ff ; Clean up after ourselves pha pla .endrep ; Restore stack pointer ldx TEMP_SP txs rts
*=$10 clear_bucket_counter lda #0 sta buckets+15 sta buckets+15+16*1 sta buckets+15+16*2 sta buckets+15+16*3 sta buckets+15+16*4 sta buckets+15+16*5 sta buckets+15+16*6 sta buckets+15+16*7 sta buckets+15+16*8 sta buckets+15+16*9 sta buckets+15+16*10 sta buckets+15+16*11 sta buckets+15+16*12 sta buckets+15+16*13 sta buckets+15+16*14 sta buckets+15+16*15 ldx #32 set_element_index stx element_index lda sprite_y,x ldx #$f0 sbx #$00 ;select right bucket ldy buckets+15,x ;get bucket counter (# elements in bucket) inc buckets+15,x ;increase bucket counter stx set_2+1 ;set high nybble bucket index dey bmi place_current_element ;element count=0 skip sort stx get_1+1 stx get_2+1 inx stx set_1+1 sty slot_hi+1 ;highest index compare_element_y get_1 ldx buckets,y cmp sprite_y,x dey bcc insert_lower_element ;woohoo, found a spot bpl compare_element_y insert_lower_element sty slot_lo+1 slot_hi ldy #0 move_higher_elements get_2 lda buckets,y ;move higher elements up set_1 sta buckets,y dey slot_lo cpy #$ff bcc move_higher_elements place_current_element ;finally, insert the lower element index in the right place iny lax element_index set_2 sta buckets,y dex bne set_element_index rts element_index .byte 0 *=$1000 buckets .fill 256,0
.const NSPRITES = 32 .const indexes = $60 .const ypos = $80 // SORT FORWARD // ************ // Check if values are sorted, if necessary call sortbw routine to // put an unsorted value in the correct position sortfw: .for(var i=0;i<NSPRITES;i++) { over: .if (i!=NSPRITES-1) { ldy indexes+1+i ldx indexes+i lda ypos,y cmp ypos,x bcs sortfw[i+1].over jsr sortbw[NSPRITES-2-i].back } else rts } // SORT BACKWARD // ************* // Move backward a value in .A which is less than the one in current position. // All values at the left of the current position are ordered so, once we found // the correct spot for .A, we don't need to check forward, we can get back // to where the function was called sortbw: .for(var i=0;i<NSPRITES-1;i++) { back: stx indexes+NSPRITES-1-i .if (i!=NSPRITES-2) { ldx indexes+NSPRITES-3-i cmp ypos,x bcc sortbw[i+1].back } sty indexes+NSPRITES-2-i rts }
.bucketK bcc .nextBucket ;Branches always: decrease this value by 3 with selfmodifying code in phase 1 for every new sprite in the bucket. lda #sprite_index ;index of last sprite pha ... lda #sprite_index ;index of first sprite. pha sty .bucketK+1 ;Restore branch (to skip all sprites for empty buckets). Call with y=$1b for max 8 sprites per bucket. .nextBucket ;unrolled ------------- 3 cycles for an empty bucket, 12 voor buckets with 1 sprite => 3*(220-32)+12*32 = 948 cycli
dex ;counts sprite index. Initialized with $20. bmi .phase2 lda Yvalues,x asl bcs *+8 ;There are more than 128 buckets, so I use 2 jump tables sta *+4 jmp ($xxxx) ;jump table for buckets <128 ;jumps to code that fills out phase2 Bucket code (see below) sta *+4 jmp ($xxxx) ;jump table for buckets >=128 ;21/22 cycles sofar .phase2 clc ;to branch always (either to next bucket or to fetch the sprite indices in the current bucket). ldy #27 ;to restore the branch value of the bucket jmp .Bucket0 ;code to fill out bucketK ldy .bucketK+1 ;The trick is that I use this as an index for the number of sprites already in the bucket lda newBranchValue,y ;fetch value to branch to the code for one more sprite sta .bucketK+1 ;since this is unrolled code, the address of .bucketK is known tay txa sta .bucketK+3,y ;Store new sprite index. This works since the branch value decreases by 3 for each new sprite, but the address where to store the sprite index as well! ... ;Unrolled => repeat all the above code starting with dex. -------------- 41/42 cycles ->41.5*32 = 1328
newBranchValue BYTE 0 x x 0 x x 3 x x 6 x x 9 x x 12 x x 15 x x 18 x x x x x 21 x
Excellent work, Color Bar. The 25 additional cycles from page boundary crossings you mention can be avoided by padding the output routine out to 32 bytes by adding three bytes of padding after the BCC .nextBucket, and starting bucket0 at $xxff It takes up another 660 bytes, but the routine is already occupying most of a bank as it is, so that's small change at this point ;)