| |
Oswald
Registered: Apr 2002 Posts: 5086 |
Sorting
are sorters really so slow in games? :) I have made my unrolled version for my theoretical game (;), and it takes 132 rlines to sort 32 numbers o_O worst case is ~200 lines. tho when its the case of 4 numbers have to be swapped only it does the job in ~10 lines. wastes a lot of memory but I like it :) |
|
... 193 posts hidden. Click here to view all posts.... |
| |
Bezerk Account closed
Registered: Mar 2020 Posts: 5 |
You should Google "radix sort" for a thorough explanation but I can try to explain how the posted code works.
It's a two pass sort.
In the first (LSD) pass you assign each actor to a bucket based on the least significant digit (LSD) of the actor's y-position ($0-$f). Each bucket has a linked list of the actors whose y-position LSD is the same as the bucket index. For example, LSD bucket number $7 has a linked list of actors with y-position LSD = $7 (i.e. $07, $17, ..., $d7). The order in which the actors are processed doesn't matter in this pass.
In the second (MSD) pass you assign each actor to a bucket based on the most significant digit (MSD) of the y-position ($0-$d). In this pass the actors are processed in LSD bucket order (from $0 to $f) and each actor is prepended to the MSD bucket list. The end result is that each MSD bucket list contains actors whose y-position MSD is the same as the bucket index and sorted based on LSD. For example, MSD bucket number $d will have a linked list of actors with y-position MSD = $d (e.g. $da, $d3, $d0). Again, the important point is that each MSD bucket list of actors is sorted in descending y-position order because the actors were added in LSD bucket order.
The final sorted order is then output to the stack by outputting the MSD bucket lists from index $d to $0.
Hopefully this helps enough to let you decipher the code. |
| |
Frantic
Registered: Mar 2003 Posts: 1646 |
@bezerk: Don't hesitate to put the radix sort implementation on codebase64 :) |
| |
Krill
Registered: Apr 2002 Posts: 2968 |
Finally, the C-64 scene delivers an answer to the age-old question of which sorting algorithm actually is best. |
| |
Oswald
Registered: Apr 2002 Posts: 5086 |
thanks, now I get it :) |
| |
lft
Registered: Jul 2007 Posts: 369 |
I was able to improve the radix sort.
Writeup at codebase64.
This version can sort 32 actors in 1970 cycles. That's 61.6 cycles per actor.
Quoting ChristopherJamExtraordinarily close to one raster line per actor! Boom! |
| |
Frantic
Registered: Mar 2003 Posts: 1646 |
@lft: Much appreciated! Thank you! :) |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1408 |
Excellent work, both of you!
I've been meaning to write up a comprehensive comparison of all the approaches mentioned in this post since sometime in 2017, but as that's clearly not going to happen soon, here's a quick table comparing several of the 'sub 2500 cycle worst case' implementations; 32 actors in each instance.
+---------------+------------------+---------+--------+-----------+------+
| author | algorithm | worst | code+ | zeropage | year |
| | | cycles | data | | |
+---------------+------------------+---------+--------+-----------+------+
| cjam | flagged buckets | 2422 | ~3kb | 59b | 2017 |
| lft | field sort | 2200 | sparse | 64b | 2017 |
| colorbar+cjam | inline buckets | 2086 | 19kb | 0b | 2017 |
| bezerk | radix | 2044 | ~2kb | 94/185b | 2020 |
| bezerk+lft | radix | 1970 | ~2kb | 32b | 2020 |
+---------------+------------------+---------+--------+-----------+------+
(cycle counts do not include jsr/rts, or decanting results from index lists in stack)
Bezerk had already just scraped in below my and Colorbar's approach, with a relatively miniscule memory footprint - and now lft has improved that to hit the holy grail of sub-raster performance in the 32 actor category :D |
| |
Bezerk Account closed
Registered: Mar 2020 Posts: 5 |
That's brilliant @Lft, well done!
@ChristopherJam, no need to put me + Lft on this one in the comparison table. |
| |
Rastah Bar Account closed
Registered: Oct 2012 Posts: 336 |
Lovely! Question: if one would sort according to high nybble first, what would be the complexity?
And could someone post an equation for the the latest approach that shows the number of cycles as a function of Y-range and number of actors, please? |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1408 |
I should have included an e&oe really :)
ok, next draft just lft for final radix entry. Any comments on fieldsort memory usage will be integrated too, I just didn't manage to look that up before dinner.
Rastah Bar - Going by lft's writeup at codebase, it's an extra 51 cycles per actor regardless of order.
Flagged bucket sort's a little messier as it depends how many are on the same line or in the same eight line group.
graphs one of these days.. |
Previous - 1 | ... | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 - Next |