| |
Oswald
Registered: Apr 2002 Posts: 5086 |
Sorting
are sorters really so slow in games? :) I have made my unrolled version for my theoretical game (;), and it takes 132 rlines to sort 32 numbers o_O worst case is ~200 lines. tho when its the case of 4 numbers have to be swapped only it does the job in ~10 lines. wastes a lot of memory but I like it :) |
|
... 193 posts hidden. Click here to view all posts.... |
| |
lft
Registered: Jul 2007 Posts: 369 |
Fresh, that is true, and you even get to place the INY field on any page you like. That's because the opcode is 0, so it doesn't matter that SHX randomly masks it with the destination address + 1.
I would suggest that you store-absolute the output, perhaps to the zero-page, and self-modify the instruction to increment the pointer. That'd free up X as a scratch register, and you wouldn't have to restore Y from the stack area.
But there are drawbacks with this approach. As you mention, it is a bit slower: 15 cycles per actor, after implementing my suggestion above, so that'd give us a total of 43 rasterlines instead of 35. And you cannot do the sorting in main context while interrupts are running, because you'd lose even more cycles distinguishing between BRK/IRQ. More subtly, if a BRK coincides with an interrupt, the interrupt is lost, although the peripheral will typically keep asserting the (maskable) interrupt until it is eventually serviced. |
| |
HCL
Registered: Feb 2003 Posts: 727 |
Oh.. i'm late into the discussion again :P
First, i didn't invent the kind of sorting that is used in TimeMachine (and many other Booze-demos). It seems that i am not very good in inventing things. Perhaps i'm better at looking at other's code, understanding it, and making benefit to my own needs. I think i stole it from some demo from the late-80:s..
The nice thing about it is that i wrote a code-generator for it. Perhaps it was for the 3d-Bob-part in TimeMachine.. When i change 3d-bob-object it generates sorting code for the number of bobs that i currently have. Neither that being rocket science, but it helped making up to 32 bobs update each frame :). |
| |
Fresh
Registered: Jan 2005 Posts: 101 |
@lft: perfect analysis, no surprise from you! :) |
| |
lft
Registered: Jul 2007 Posts: 369 |
@HCL: Could you comment on how to reduce the number of buckets without losing too much accuracy? Trash is being deliberately enigmatic for fear of breaking a confidence, so we could really use a hint. Or would you rather have us dig into the code and see for ourselves? |
| |
Trash
Registered: Jan 2002 Posts: 122 |
Quote: @HCL: Could you comment on how to reduce the number of buckets without losing too much accuracy? Trash is being deliberately enigmatic for fear of breaking a confidence, so we could really use a hint. Or would you rather have us dig into the code and see for ourselves?
I do dare to tell you now when I know he borrowed the solution...
This is for sprites (eight per row):
If you remove the last three bits by using lsr you have your new K-value BUT you have to pair it with the original value. So what you are doing is that you assign your actors a value remove lose accuracy (lsr, lsr, lsr or a LUT (K = max value of that table)) store that value in the table you actually sort.
I might be wrong but this works for me using almost exactly your example code... |
| |
Copyfault
Registered: Dec 2001 Posts: 474 |
This Countingsort got me thinking...
So I had a closer look at lft's straight forward implementation given before and remembered
Quoting TrashTry losing precision on K by the right amount and in the right way and it will improve without losing to much accuracy....
What about doing it as follows:
ldx #offset
for SPRITE = 0 to n-1
dcp (ypos(SPRITE),x) ;operand byte is also memory for ypos
next
lda #$ff
ldx #$00
for LINE = 0 to 2*(floor((k/2)-2)) STEP 2
stx start+LINE
sbx #$00 ;operand byte is memory for the count-array
next
stx start+2*(floor((k/2)-1))
for SPRITE = 0 to n-1
ldx ypos(SPRITE)
ldy start,x
lda #SPRITE
sta output,y
inc start,x
lda #0
sta (offset,x)
next
Remarks:
- all loops have to be unrolled
- instead of incrementing the count-array entries for every occurance of the corresponding LINE-value they are decreased, giving [$00-#occurances of LINE] in each array position count[LINE]. But as the adc-instruction in the 2nd loop is also substituted by an sbx-command, this leads to the same result without having to care about the state of carry.
- 2nd byte of each dec-cmp-instruction suits also as memory where the ypos-val for the corresponding sprite-number is stored, i.e.
...
Sprite0_ypos = *+1
dcp (ypos(Sprite0),x)
Sprite1_ypos = *+1
dcp (ypos(Sprite1),x)
...
- only evenly spaced values for the ypositions admitted, i.e. consecutive values must have distance=2
- each ypos corresponds to a pointer in zp pointing to the corresponding operand byte of the sbx#$..-instructions. The 2nd loop would look like
lda #$ff
ldx #$00
stx start+0
count_ypos0 = *+1
sbx #$00
stx start+2
count_ypos2 = *+1
sbx #$00
stx start+4
...
with the corresponding zp partypos0 .byte <count_ypos0, >count_ypos0
ypos2 .byte <count_ypos2, >count_ypos2
...
- "offset" is introduced to be able to keep away from the $00/$01-adresses
Beside the ram usage for the start- and output-table the given approach needs a fair amount of zp-memory (one vector per possible sprite-ypos!). This most probably makes it not very usable for real in-game use; what's more: due to the fact that a vector pointer always consists of two consecutive bytes only even-numbered y-positions are handled (alternatively only odd-numbered, just one type...). This could be seen as "loosing precision on the ypos-values";)
On the positive side, counting cylces the approach gets 2 + n*8 + 4 + ((k/2)-1)*6 + 4 + n*29, so for n=32 and k=220 the routine needs 1848 cycles in total -> 29+1/3 rasterlines.
RAM-usage is at $2 + n*$2 + $4 + ((k/2)-1)*$5 + $3 + n*$12 = $4aa
(plus 220 bytes in zp plus 110 bytes for the "starting" table plus 32 bytes for "output")
Mind that it would be unfair to compare this to the cycle count lft gave for the direct approach as the count-array has been halved. But a notable gain in cycle consumption comes from cutting the inner part of the 2nd loop down to 6 cycles which would also make a full loop over 220 array entries considerably faster. Maybe there's a clever way to handle full precision on the ypos-values by appropriately pre-conditioning the ypos-values plus doing a sensible correction loop afterwards... remarks and additions highly welcome;) |
| |
Rastah Bar Account closed
Registered: Oct 2012 Posts: 336 |
Quote: The sorting algorithm implemented by HCL is actually called Counting Sort (I just found that out..) and it is considered to be O(n).
It should be competive to the discussed code both in size and speed.
Thanks! I found a short article about Counting Sort in C=hacking Issue #18:
http://www.ffd2.com/fridge/chacking/c=hacking18.txt |
| |
Rastah Bar Account closed
Registered: Oct 2012 Posts: 336 |
Copyfault, that is a very interesting approach. But to halve the Y position resolution an AND #$fe has to be applied, which costs an additional 2 cycles per actor. And if you want to preserve the full resolution values as well (for use in the Y updating routine for example), you will have to store them somewhere, adding more to the cycle count. |
| |
Copyfault
Registered: Dec 2001 Posts: 474 |
Quote: Copyfault, that is a very interesting approach. But to halve the Y position resolution an AND #$fe has to be applied, which costs an additional 2 cycles per actor. And if you want to preserve the full resolution values as well (for use in the Y updating routine for example), you will have to store them somewhere, adding more to the cycle count.
Ofcourse this works only if the game concept allows for all objects moving with suitable speed (all values hsve to be a multiple of two).
Otherwise you're right, all the necessary additional position handling will most probably eat up too much cycles. What's the current best case cycle count?? |
| |
Rastah Bar Account closed
Registered: Oct 2012 Posts: 336 |
See posts #91 and #101. It is about 3*k+43*n.
For lft's Field Sort (post #126) it is 2*k+55*n. |
Previous - 1 | ... | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 - Next |