| |
Oswald
Registered: Apr 2002 Posts: 5017 |
Sorting
are sorters really so slow in games? :) I have made my unrolled version for my theoretical game (;), and it takes 132 rlines to sort 32 numbers o_O worst case is ~200 lines. tho when its the case of 4 numbers have to be swapped only it does the job in ~10 lines. wastes a lot of memory but I like it :) |
|
... 193 posts hidden. Click here to view all posts.... |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1370 |
Got a new record now, I think.
We can pack 221 addresses into a 222 byte table by using the high byte of each address as the low byte of the next address.
If we use something like this:
for i in 0..73:
.byt 64+(i%74)*(1+i//74*24)%74)
.. there are no two addresses less than 24 bytes apart.
The fill routine then drops down to this:
;code to fill out bucketK. 24 bytes, 36 cycles
fillK
ldy .bucketK+1
lda newBranchValue,y
sta .bucketK+1 ; 12
tay
txa
sta .bucketK+3,y ; 9
dex
lda Yvalues,x
sta *+4
jmp ($xxxx) ; 15
; 36*32 = 1152 cycle, 36 per actor
Just 24 bytes long :)
We avoid the check for x becoming negative by placing actor $ff at y=0, and having the 0 entry of the branch table point to a routine
that checks X to determine whether we've reached the end of the actor list, or are just dealing with an actor that has gone offscreen.
The bucket emptying code remains the same as in Color Bar's original:
.bucketK ;32 bytes
bcc .nextBucket ;Branches always: decrease this value by 3 with selfmodifying code in phase 1 for every new sprite in the bucket.
!byt 0,0,0
lda #sprite_index ; repeat lda/pha eight times
pha
...
sty .bucketK+1 ;Restore branch (to skip all sprites for empty buckets). Call with y=$1e for max 8 sprites per bucket.
.nextBucket ;unrolled
; 3 cycles for an empty bucket, 12 voor buckets with 1 sprite => 3*(220-32)+12*32 = 948 cycli , 29.6 per actor
The bucket filling routines are packed three to a page over 74 pages, so around 19kb. Each page has an average of 184 bytes empty, so we can probably pack the bucket emptying code into the gaps. Note that some of the buckets would then need to branch past a group of bucket filling routines, so we'd need to use both X and Y for the two different values to restore the branches to "this bucket is empty"
So - total time, 36*32+3*(220-32)+12*32= 220*3+45*32 = 2100 cycles. 65.6 per sprite, 33.33 rasters total.
For 37 sprites or more, we're down to under a raster per sprite :D |
| |
Rastah Bar
Registered: Oct 2012 Posts: 336 |
Great! I still have to study your post in detail, but maybe another 64 cycles gain be gained by adding 3 NOPs before the first sprite in the bucket, like this:
.bucketK
bcc .nextBucket
lda #sprite_index8
pha
...
nop
nop
nop
lda #sprite_index1
pha
sty .bucketK+1
Then the TAY can be removed from the fill code:
ldy .bucketK+1
lda newBranchValue,y
sta .bucketK+1
txa
sta .bucketK,y
and the branch value table changes to:
newBranchValue
BYTE 0 x x 0 x x 3 x x 6 x x 9 x x 12 x x 15 x x 18 x x 21 x x x x x 27 x
Worst case for the phase 2 code is still 948 cycles, I think. |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1370 |
Ah, so you're always storing 3 bytes earlier than the previous branch target?
I like it, but I'm not sure if that works for the very first sprite inserted into that bucket, unless you have the branch starting out by landing on the sty .bucketK+1
..which would then cost you an extra four cycles per empty bucket. |
| |
Rastah Bar
Registered: Oct 2012 Posts: 336 |
Quoting ChristopherJamGot a new record now, I think.
We can pack 221 addresses into a 222 byte table by using the high byte of each address as the low byte of the next address.
Clever idea!
I am not familiar with this notation
for i in 0..73:
.byt 64+(i%74)*(1+i//74*24)%74)
What do i%74 and i//74 mean? |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1370 |
Thanks!
Mod and divide (TBH I should have dropped from // back to / when I switched from python to pseudo asm). Oh, and that loop should have been to 74*3
Really, better expressed as
for i in 0..73:
.byt 64+(i*1)%74
for i in 0..73:
.byt 64+(i*25)%74
for i in 0..73:
.byt 64+(i*49)%74
|
| |
Rastah Bar
Registered: Oct 2012 Posts: 336 |
Quote: Ah, so you're always storing 3 bytes earlier than the previous branch target?
I like it, but I'm not sure if that works for the very first sprite inserted into that bucket, unless you have the branch starting out by landing on the sty .bucketK+1
..which would then cost you an extra four cycles per empty bucket.
Yes, you are right. |
| |
Rastah Bar
Registered: Oct 2012 Posts: 336 |
Quoting ChristopherJam
The bucket emptying code remains the same as in Color Bar's original:
.bucketK ;32 bytes
bcc .nextBucket ;Branches always: decrease this value by 3 with selfmodifying code in phase 1 for every new sprite in the bucket.
!byt 0,0,0
lda #sprite_index ; repeat lda/pha eight times
pha
...
sty .bucketK+1
!byt 0,0,0 could be replaced by LDA #value + PHA, i.e., allow a maximum of 9 sprites per bucket, but I guess it doesn't really matter. |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1370 |
The Human Code Machine, thanks for that codebase link. Must admit, I didn't find that area when I added the flagged bucket sort to the maths and algorithms section. Might need to add some crosslinks..
FWIW though, looks like that one has the same worst case as the one Oswald referenced in the opening remark to this thread. (32*31/2)=496 swaps, 26 cycles per swap, 496*26/63=204 raster lines. |
| |
Frantic
Registered: Mar 2003 Posts: 1627 |
I added some crosslinks @ codebase to make people aware about this partial overlap in contents in the two different sections.
Preferably generic sorting algorithms should be placed in the maths and algorithms section, and sorting tailored specifically towards fast sorting of sprites/multiplexing could be kept on the sprites page, if someone feels like cleaning this up. |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1370 |
Thanks Frantic, I'll move my post. |
Previous - 1 | ... | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | ... | 21 - Next |