| |
Oswald
Registered: Apr 2002 Posts: 5017 |
Sorting
are sorters really so slow in games? :) I have made my unrolled version for my theoretical game (;), and it takes 132 rlines to sort 32 numbers o_O worst case is ~200 lines. tho when its the case of 4 numbers have to be swapped only it does the job in ~10 lines. wastes a lot of memory but I like it :) |
|
... 193 posts hidden. Click here to view all posts.... |
| |
chatGPZ
Registered: Dec 2001 Posts: 11114 |
thats not really useful for a game - in a demo of course that is always preferred - but in a game it would be SO much better if you could move around a large amount of objects without having to care about multiplexer limitations in the game logic. (if you look at gameboy or NES games for example - thats how a lot of them work) |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1378 |
or then there was Alien Syndrome, where you could never have more than one alien on any given raster 😂 |
| |
lft
Registered: Jul 2007 Posts: 369 |
Quoting TrashThe sorting algorithm implemented by HCL is actually called Counting Sort (I just found that out..) and it is considered to be O(n).
It should be competive to the discussed code both in size and speed.
Thanks for the link! That does indeed seem useful.
Here is a straighforward implementation:
for each actor i
ldx ypos+i
inc count,x
lda #0
clc
for each count k
sta start+k
adc count+k
for each actor i
ldx ypos+i
ldy start,x
lda #i
sta output,y
inc start,x
lda #0
sta count,x
Total number of cycles is 10*i + 4 + 8*k + 28*i, so for our benchmark case with 32 actors and 220 y-positions that would be 2980 cycles, or 47.3 rasterlines. So the above implementation is not competitive.
So I suppose the game is on to try to improve it! |
| |
Trash
Registered: Jan 2002 Posts: 122 |
Quoting lftSo I suppose the game is on to try to improve it!
Try losing precision on K by the right amount and in the right way and it will improve without losing to much accuracy.... |
| |
Fresh
Registered: Jan 2005 Posts: 101 |
A less "aggressive" approach to lft's solution can be obtained by using brk instead of jmp.
It's clearly a bit slower but the emptying routine is only one and be put everywhere: a bit less messy and more "game-friendly".
irq:
sty $ff+stack // Fix stack for RTI
lda buckethead,y
irqloop:
sta output,x
tay
inx
lda list,y
bpl irqloop
ldy $ff+stack
sta table,y
sta sectable,y
rti
|
| |
lft
Registered: Jul 2007 Posts: 369 |
Fresh, that is true, and you even get to place the INY field on any page you like. That's because the opcode is 0, so it doesn't matter that SHX randomly masks it with the destination address + 1.
I would suggest that you store-absolute the output, perhaps to the zero-page, and self-modify the instruction to increment the pointer. That'd free up X as a scratch register, and you wouldn't have to restore Y from the stack area.
But there are drawbacks with this approach. As you mention, it is a bit slower: 15 cycles per actor, after implementing my suggestion above, so that'd give us a total of 43 rasterlines instead of 35. And you cannot do the sorting in main context while interrupts are running, because you'd lose even more cycles distinguishing between BRK/IRQ. More subtly, if a BRK coincides with an interrupt, the interrupt is lost, although the peripheral will typically keep asserting the (maskable) interrupt until it is eventually serviced. |
| |
HCL
Registered: Feb 2003 Posts: 716 |
Oh.. i'm late into the discussion again :P
First, i didn't invent the kind of sorting that is used in TimeMachine (and many other Booze-demos). It seems that i am not very good in inventing things. Perhaps i'm better at looking at other's code, understanding it, and making benefit to my own needs. I think i stole it from some demo from the late-80:s..
The nice thing about it is that i wrote a code-generator for it. Perhaps it was for the 3d-Bob-part in TimeMachine.. When i change 3d-bob-object it generates sorting code for the number of bobs that i currently have. Neither that being rocket science, but it helped making up to 32 bobs update each frame :). |
| |
Fresh
Registered: Jan 2005 Posts: 101 |
@lft: perfect analysis, no surprise from you! :) |
| |
lft
Registered: Jul 2007 Posts: 369 |
@HCL: Could you comment on how to reduce the number of buckets without losing too much accuracy? Trash is being deliberately enigmatic for fear of breaking a confidence, so we could really use a hint. Or would you rather have us dig into the code and see for ourselves? |
| |
Trash
Registered: Jan 2002 Posts: 122 |
Quote: @HCL: Could you comment on how to reduce the number of buckets without losing too much accuracy? Trash is being deliberately enigmatic for fear of breaking a confidence, so we could really use a hint. Or would you rather have us dig into the code and see for ourselves?
I do dare to tell you now when I know he borrowed the solution...
This is for sprites (eight per row):
If you remove the last three bits by using lsr you have your new K-value BUT you have to pair it with the original value. So what you are doing is that you assign your actors a value remove lose accuracy (lsr, lsr, lsr or a LUT (K = max value of that table)) store that value in the table you actually sort.
I might be wrong but this works for me using almost exactly your example code... |
Previous - 1 | ... | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | ... | 21 - Next |