| |
Oswald
Registered: Apr 2002 Posts: 5032 |
Sorting
are sorters really so slow in games? :) I have made my unrolled version for my theoretical game (;), and it takes 132 rlines to sort 32 numbers o_O worst case is ~200 lines. tho when its the case of 4 numbers have to be swapped only it does the job in ~10 lines. wastes a lot of memory but I like it :) |
|
... 193 posts hidden. Click here to view all posts.... |
| |
Frantic
Registered: Mar 2003 Posts: 1635 |
I added some crosslinks @ codebase to make people aware about this partial overlap in contents in the two different sections.
Preferably generic sorting algorithms should be placed in the maths and algorithms section, and sorting tailored specifically towards fast sorting of sprites/multiplexing could be kept on the sprites page, if someone feels like cleaning this up. |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1386 |
Thanks Frantic, I'll move my post. |
| |
Rastah Bar
Registered: Oct 2012 Posts: 336 |
Quoting ChristopherJam
;code to fill out bucketK. 24 bytes, 36 cycles
fillK
ldy .bucketK+1
lda newBranchValue,y
sta .bucketK+1 ; 12
tay
txa
sta .bucketK+3,y ; 9
dex
lda Yvalues,x
sta *+4
jmp ($xxxx) ; 15
; 36*32 = 1152 cycle, 36 per actor
I think another 2 cycles per actor can be gained by using A as the actor counter:
ldy .bucketK+1
ldx newBranchValue,y
stx .bucketK+1
sta .bucketK+3,x
sbc #$01
.start
tax
ldy Yvalues,x
sty *+4
jmp ($xxxx)
Initialize with SEC, LDA #$1f, JMP .start
Total time: 220*3+43*32 = 2036 cycles or 32.3 rasters! |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1386 |
Quoting Color Bar
I think another 2 cycles per actor can be gained by using A as the actor counter
Sweet! Yes, that should work. Should probably do a test implementation and upload this beast somewhere. Now if we could just shave off one more cycle per actor... |
| |
Rastah Bar
Registered: Oct 2012 Posts: 336 |
Perhaps not practical, but maybe interesting to discuss:
If you are sure the maximum number of actors in a bucket is never exceeded, 2 more cycles per actor (and 2 bytes) can be shaved off the bucket filling routines by using some BOCs (bonus opcodes = illegals)
lax .bucketK+1
sbx #$06
stx .bucketK+1
tya
sta .bucketK+3,x
dey
lda Yvalues,y
sta *+4
jmp ($xxxx)
where Y is now the actor counter. The bucket emptying code should then be modified according to
bcc .nextBucket
lda #sprite_index
pha
skw $abcd ;opcode $0c
lda #sprite_index ;Always 6 bytes for every sprite.
pha ;Therefore sbx #$06 can be used in the fill routines.
skw $abcd
...
lda #sprite_index
pha
sty .bucketK+1
If we allow up to 10 actors and patch the filling routines with 2 bytes to 64 bytes total, the page crossing penalties can be avoided. Although this algorithm uses a LOT of memory, it may be worth it in some demo effects if speed is the bottleneck. And besides, the absolute minimum possible CPU time is a worthy goal by itself.
As Christopher Jam suggested, we can have the 0 entry of the branch table point to a routine that checks whether we've reached the end of the actor list, or are just dealing with an actor that has gone offscreen.
I think we can then shave off 2 more bytes if the Y values are stored on the stack (terminated by 0) and LDA Yvalues,Y is replaced by PLA. |
| |
Trash
Registered: Jan 2002 Posts: 122 |
I dont think this is really exactly what you guys are discussing but I suggest you all to have a look at the sortingroutines used in Time Machine, I have been given an explanation but I don't really understand how it works. I have code that sorts 32 elements just shy of 1400 cycles (if I remember correctly) with a constant speed using the explanation I was given, since it's HCLs I wont share it without his explicit permission but dropping a hint on where to look must be forgiven. |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1386 |
re. SKW etc: Ooh, dangerous! So I think we're up to around 10k for the PHA routines now?
Will have to see if that can be sanely(!) combined with my thinking from today:
I managed to trim two cycles off the populate routine by switching to just using even numbers for the actor IDs (can interleave most of the data tables), and using a section of the stack to store Y values interleaved with actor IDs.
ldx bucketK+1
ldy newBranchTableValue,x
sty bucketK+1
pla ; grab this actor's index
sta bucketK+3,y ;21
pla ; grab next Yvalue
sta *+4
jmp (jumptable) ;11
(I was going to read the actor ID out of one of the CIA timers while pulling Y value off the stack, but realised that the timer will be counting down while the stack address increases, so had to abandon that idea).
so, 220*3+41*32=1972 cycles, 61.6 per actor, 31.3 raster lines :) |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1386 |
Quoting Trash...1400 cycles...
!!! |
| |
Rastah Bar
Registered: Oct 2012 Posts: 336 |
Quoting ChristopherJam
I managed to trim two cycles off the populate routine by switching to just using even numbers for the actor IDs (can interleave most of the data tables), and using a section of the stack to store Y values interleaved with actor IDs.
so, 220*3+41*32=1972 cycles, 61.6 per actor, 31.3 raster lines :)
Awesome! Less than 1 raster per actor :-)
I actually thought of that as well, but won't that cost you somewhere else? I mean you have to write updated Y values to the stack in that form and maybe that is more costly than writing them non-interleaved? But perhaps not when you use unrolled code ... |
| |
Rastah Bar
Registered: Oct 2012 Posts: 336 |
Quote: I dont think this is really exactly what you guys are discussing but I suggest you all to have a look at the sortingroutines used in Time Machine, I have been given an explanation but I don't really understand how it works. I have code that sorts 32 elements just shy of 1400 cycles (if I remember correctly) with a constant speed using the explanation I was given, since it's HCLs I wont share it without his explicit permission but dropping a hint on where to look must be forgiven.
Since it is Time Machine, it must be ahead of us :-)
Very likely I won't understand that code without any thorough explanation, but is it for the same range of Y values (number of buckets)? And can multiple actors have the same Y values?
Maybe HCL wants to join the discussion? |
Previous - 1 | ... | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | ... | 21 - Next |