| |
Oswald
Registered: Apr 2002 Posts: 5017 |
Sorting
are sorters really so slow in games? :) I have made my unrolled version for my theoretical game (;), and it takes 132 rlines to sort 32 numbers o_O worst case is ~200 lines. tho when its the case of 4 numbers have to be swapped only it does the job in ~10 lines. wastes a lot of memory but I like it :) |
|
... 193 posts hidden. Click here to view all posts.... |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1378 |
Thanks Frantic, I'll move my post. |
| |
Rastah Bar
Registered: Oct 2012 Posts: 336 |
Quoting ChristopherJam
;code to fill out bucketK. 24 bytes, 36 cycles
fillK
ldy .bucketK+1
lda newBranchValue,y
sta .bucketK+1 ; 12
tay
txa
sta .bucketK+3,y ; 9
dex
lda Yvalues,x
sta *+4
jmp ($xxxx) ; 15
; 36*32 = 1152 cycle, 36 per actor
I think another 2 cycles per actor can be gained by using A as the actor counter:
ldy .bucketK+1
ldx newBranchValue,y
stx .bucketK+1
sta .bucketK+3,x
sbc #$01
.start
tax
ldy Yvalues,x
sty *+4
jmp ($xxxx)
Initialize with SEC, LDA #$1f, JMP .start
Total time: 220*3+43*32 = 2036 cycles or 32.3 rasters! |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1378 |
Quoting Color Bar
I think another 2 cycles per actor can be gained by using A as the actor counter
Sweet! Yes, that should work. Should probably do a test implementation and upload this beast somewhere. Now if we could just shave off one more cycle per actor... |
| |
Rastah Bar
Registered: Oct 2012 Posts: 336 |
Perhaps not practical, but maybe interesting to discuss:
If you are sure the maximum number of actors in a bucket is never exceeded, 2 more cycles per actor (and 2 bytes) can be shaved off the bucket filling routines by using some BOCs (bonus opcodes = illegals)
lax .bucketK+1
sbx #$06
stx .bucketK+1
tya
sta .bucketK+3,x
dey
lda Yvalues,y
sta *+4
jmp ($xxxx)
where Y is now the actor counter. The bucket emptying code should then be modified according to
bcc .nextBucket
lda #sprite_index
pha
skw $abcd ;opcode $0c
lda #sprite_index ;Always 6 bytes for every sprite.
pha ;Therefore sbx #$06 can be used in the fill routines.
skw $abcd
...
lda #sprite_index
pha
sty .bucketK+1
If we allow up to 10 actors and patch the filling routines with 2 bytes to 64 bytes total, the page crossing penalties can be avoided. Although this algorithm uses a LOT of memory, it may be worth it in some demo effects if speed is the bottleneck. And besides, the absolute minimum possible CPU time is a worthy goal by itself.
As Christopher Jam suggested, we can have the 0 entry of the branch table point to a routine that checks whether we've reached the end of the actor list, or are just dealing with an actor that has gone offscreen.
I think we can then shave off 2 more bytes if the Y values are stored on the stack (terminated by 0) and LDA Yvalues,Y is replaced by PLA. |
| |
Trash
Registered: Jan 2002 Posts: 122 |
I dont think this is really exactly what you guys are discussing but I suggest you all to have a look at the sortingroutines used in Time Machine, I have been given an explanation but I don't really understand how it works. I have code that sorts 32 elements just shy of 1400 cycles (if I remember correctly) with a constant speed using the explanation I was given, since it's HCLs I wont share it without his explicit permission but dropping a hint on where to look must be forgiven. |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1378 |
re. SKW etc: Ooh, dangerous! So I think we're up to around 10k for the PHA routines now?
Will have to see if that can be sanely(!) combined with my thinking from today:
I managed to trim two cycles off the populate routine by switching to just using even numbers for the actor IDs (can interleave most of the data tables), and using a section of the stack to store Y values interleaved with actor IDs.
ldx bucketK+1
ldy newBranchTableValue,x
sty bucketK+1
pla ; grab this actor's index
sta bucketK+3,y ;21
pla ; grab next Yvalue
sta *+4
jmp (jumptable) ;11
(I was going to read the actor ID out of one of the CIA timers while pulling Y value off the stack, but realised that the timer will be counting down while the stack address increases, so had to abandon that idea).
so, 220*3+41*32=1972 cycles, 61.6 per actor, 31.3 raster lines :) |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1378 |
Quoting Trash...1400 cycles...
!!! |
| |
Rastah Bar
Registered: Oct 2012 Posts: 336 |
Quoting ChristopherJam
I managed to trim two cycles off the populate routine by switching to just using even numbers for the actor IDs (can interleave most of the data tables), and using a section of the stack to store Y values interleaved with actor IDs.
so, 220*3+41*32=1972 cycles, 61.6 per actor, 31.3 raster lines :)
Awesome! Less than 1 raster per actor :-)
I actually thought of that as well, but won't that cost you somewhere else? I mean you have to write updated Y values to the stack in that form and maybe that is more costly than writing them non-interleaved? But perhaps not when you use unrolled code ... |
| |
Rastah Bar
Registered: Oct 2012 Posts: 336 |
Quote: I dont think this is really exactly what you guys are discussing but I suggest you all to have a look at the sortingroutines used in Time Machine, I have been given an explanation but I don't really understand how it works. I have code that sorts 32 elements just shy of 1400 cycles (if I remember correctly) with a constant speed using the explanation I was given, since it's HCLs I wont share it without his explicit permission but dropping a hint on where to look must be forgiven.
Since it is Time Machine, it must be ahead of us :-)
Very likely I won't understand that code without any thorough explanation, but is it for the same range of Y values (number of buckets)? And can multiple actors have the same Y values?
Maybe HCL wants to join the discussion? |
| |
Rastah Bar
Registered: Oct 2012 Posts: 336 |
Quoting ChristopherJam
(I was going to read the actor ID out of one of the CIA timers while pulling Y value off the stack, but realised that the timer will be counting down while the stack address increases, so had to abandon that idea).
I remember this post:
SID to calculate line slope
Maybe that could work? |
Previous - 1 | ... | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | ... | 21 - Next |