Try reducing the precision of K by the right amount and in the right way, and it will improve without losing too much accuracy...
        ldx #offset
for SPRITE = 0 to n-1
        dcp (ypos(SPRITE),x)    ;operand byte is also memory for ypos
next

        lda #$ff
        ldx #$00
for LINE = 0 to 2*(floor((k/2)-2)) STEP 2
        stx start+LINE
        sbx #$00                ;operand byte is memory for the count-array
next
        stx start+2*(floor((k/2)-1))

for SPRITE = 0 to n-1
        ldx ypos(SPRITE)
        ldy start,x
        lda #SPRITE
        sta output,y
        inc start,x
        lda #0
        sta (offset,x)
next
        ...
Sprite0_ypos = *+1
        dcp (ypos(Sprite0),x)
Sprite1_ypos = *+1
        dcp (ypos(Sprite1),x)
        ...
        lda #$ff
        ldx #$00
        stx start+0
count_ypos0 = *+1
        sbx #$00
        stx start+2
count_ypos2 = *+1
        sbx #$00
        stx start+4
        ...
ypos0   .byte <count_ypos0, >count_ypos0
ypos2   .byte <count_ypos2, >count_ypos2
        ...
; YSortPositions contains the reduced sprite positions in Y
NUMBEROFBYTESTOBESORTED = 32

; With tables on zero-page
; 2 + ((3 + 9 + 6 + 19) * NUMBEROFBYTESTOBESORTED) - 3 + 2 + 3 + 12 = 1200
; With tables not on zero-page
; 2 + ((4 + 11 + 8 + 22) * NUMBEROFBYTESTOBESORTED) - 4 + 2 + 3 + 12 = 1455

DoSort
        lda #0                          ; 2
.for i = 0, i < NUMBEROFBYTESTOBESORTED, i = i + 1
        sta SortTable + i               ; 4 / 3
.next

.for i = 0, i < NUMBEROFBYTESTOBESORTED, i = i + 1
        ldx YSortPositions + i          ; 4 / 3
        inc SortTable,x                 ; 7 / 6
                                        ;--------
                                        ; 11 / 9
.next

        clc                             ; 2
.for i = 0, i < NUMBEROFBYTESTOBESORTED, i = i + 1
        sta SortOrder + i               ; 4 / 3
.if i < (NUMBEROFBYTESTOBESORTED - 1)
        adc SortTable + i               ; 4 / 3
.fi
                                        ;--------
                                        ; 8 / 6
.next

.for i = 0, i < NUMBEROFBYTESTOBESORTED, i = i + 1
        ldx YSortPositions + i          ; 4 / 3
        ldy SortOrder,x                 ; 4
        inc SortOrder,x                 ; 7 / 6
        lda #i                          ; 2
        sta Sorted,y                    ; 5 / 4
                                        ;--------
                                        ; 22 / 19
.next
        rts                             ; 12 (jsr + rts)
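For readers who want to check the logic without an assembler, here is a minimal Python model of the same three passes (count, running sum, place). It is only a sketch of the algorithm, not C64 code: the function name counting_sort_order and the parameter k (number of distinct reduced Y values) are made up for illustration, and the list names in the comments just point back to the tables in the listing above.

```python
def counting_sort_order(ypos, k):
    """Return sprite indices ordered by their reduced Y key.

    ypos -- one reduced Y key per sprite, each in range(k)
    k    -- number of distinct keys (hypothetical parameter name)
    """
    counts = [0] * k                     # plays the role of SortTable
    for y in ypos:                       # pass 1: count sprites per key
        counts[y] += 1

    start = [0] * k                      # plays the role of SortOrder
    total = 0
    for key in range(k):                 # pass 2: running sum of counts
        start[key] = total
        total += counts[key]

    output = [0] * len(ypos)             # plays the role of Sorted
    for sprite, y in enumerate(ypos):    # pass 3: place index, bump slot
        output[start[y]] = sprite
        start[y] += 1
    return output


order = counting_sort_order([5, 1, 3, 1, 0], k=8)
print(order)
```

Sprites that share a key come out in their original order, so the sort is stable.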
DoSort
        lda #0
.for i = 0, i < NUMBEROFBYTESTOBESORTED, i = i + 1
        sta SortTable + 8*i
.next

        lda #$f8
.for i = 0, i < NUMBEROFBYTESTOBESORTED, i = i + 1
        ldx YSortPositions + i
        sbx #0
        inc SortTable,x
.next
        ...
Typo on line 4, should increment at SortTable,x.
The problem with this approach is that you have to compute the upper 5 bits of the y-position somehow, and that will take time. It can be done during sorting, at the cost of an additional 2 cycles per actor, like this:
Nevertheless, it is nice to know that this option is available, in situations where the sprites aren't packed too closely but rastertime is tight.
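To see what keeping only the upper 5 bits costs in accuracy, here is a small Python sketch. It assumes the reduction is a plain AND with $f8, which is what sbx #0 computes when A holds $f8; the function names and the sample Y values are invented for the example.

```python
def reduced_key(y, mask=0xF8):
    """Keep only the upper 5 bits of an 8-bit Y position."""
    return y & mask


def order_by_reduced_key(ypos, mask=0xF8):
    """Sprite indices ordered by reduced key; ties keep input order."""
    # sorted() is stable, like the counting sort it stands in for.
    return sorted(range(len(ypos)), key=lambda i: reduced_key(ypos[i], mask))


def order_by_exact_y(ypos):
    """Sprite indices ordered by the full 8-bit Y position."""
    return sorted(range(len(ypos)), key=lambda i: ypos[i])


ypos = [50, 49, 120, 48, 57]             # made-up sample positions
coarse = order_by_reduced_key(ypos)
exact = order_by_exact_y(ypos)
print(coarse, exact)
```

Sprites 0, 1 and 3 all land in the same 8-line bucket, so the coarse order leaves them in input order rather than true Y order. That is harmless as long as sprites in the same bucket are not packed too closely.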