| |
Oswald
Registered: Apr 2002 Posts: 5034 |
Sorting
are sorters really so slow in games? :) I have made my unrolled version for my theoretical game (;), and it takes 132 rlines to sort 32 numbers o_O worst case is ~200 lines. tho when its the case of 4 numbers have to be swapped only it does the job in ~10 lines. wastes a lot of memory but I like it :) |
|
... 193 posts hidden. Click here to view all posts.... |
| |
CyberBrain Administrator
Posts: 392 |
Cool routine!! Thanx for sharing! Analyzing how it works was pretty illuminating (aka. fun) |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1390 |
Nice work, Bezerk. Good to finally have a comparable radix sort implementation, and a well optimised one at that. Extraordinarily close to one raster line per actor!
The flagged bucket sort I put up at https://www.codebase64.org/doku.php?id=base:flagged_bucket_sort takes around 2423 cycles - significantly slower than field sort or radix.
Somewhat less onerous memory requirements at least; no magic pages, and the zero page use is contiguous :) |
| |
Oswald
Registered: Apr 2002 Posts: 5034 |
Bezerk, can you explain it in detail, frankly I dont understand anything of it :) |
| |
Bezerk Account closed
Registered: Mar 2020 Posts: 5 |
You should Google "radix sort" for a thorough explanation but I can try to explain how the posted code works.
It's a two pass sort.
In the first (LSD) pass you assign each actor to a bucket based on the least significant digit (LSD) of the actor's y-position ($0-$f). Each bucket has a linked list of the actors whose y-position LSD is the same as the bucket index. For example, LSD bucket number $7 has a linked list of actors with y-position LSD = $7 (i.e. $07, $17, ..., $d7). The order in which the actors are processed doesn't matter in this pass.
In the second (MSD) pass you assign each actor to a bucket based on the most significant digit (MSD) of the y-position ($0-$d). In this pass the actors are processed in LSD bucket order (from $0 to $f) and each actor is prepended to the MSD bucket list. The end result is that each MSD bucket list contains actors whose y-position MSD is the same as the bucket index and sorted based on LSD. For example, MSD bucket number $d will have a linked list of actors with y-position MSD = $d (e.g. $da, $d3, $d0). Again, the important point is that each MSD bucket list of actors is sorted in descending y-position order because the actors were added in LSD bucket order.
The final sorted order is then output to the stack by outputting the MSD bucket lists from index $d to $0.
Hopefully this helps enough to let you decipher the code. |
| |
Frantic
Registered: Mar 2003 Posts: 1635 |
@bezerk: Don't hesitate to put the radix sort implementation on codebase64 :) |
| |
Krill
Registered: Apr 2002 Posts: 2870 |
Finally, the C-64 scene delivers an answer to the age-old question of which sorting algorithm actually is best. |
| |
Oswald
Registered: Apr 2002 Posts: 5034 |
thanks, now I get it :) |
| |
lft
Registered: Jul 2007 Posts: 369 |
I was able to improve the radix sort.
Writeup at codebase64.
This version can sort 32 actors in 1970 cycles. That's 61.6 cycles per actor.
Quoting ChristopherJamExtraordinarily close to one raster line per actor! Boom! |
| |
Frantic
Registered: Mar 2003 Posts: 1635 |
@lft: Much appreciated! Thank you! :) |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1390 |
Excellent work, both of you!
I've been meaning to write up a comprehensive comparison of all the approaches mentioned in this post since sometime in 2017, but as that's clearly not going to happen soon, here's a quick table comparing several of the 'sub 2500 cycle worst case' implementations; 32 actors in each instance.
+---------------+------------------+---------+--------+-----------+------+
| author | algorithm | worst | code+ | zeropage | year |
| | | cycles | data | | |
+---------------+------------------+---------+--------+-----------+------+
| cjam | flagged buckets | 2422 | ~3kb | 59b | 2017 |
| lft | field sort | 2200 | sparse | 64b | 2017 |
| colorbar+cjam | inline buckets | 2086 | 19kb | 0b | 2017 |
| bezerk | radix | 2044 | ~2kb | 94/185b | 2020 |
| bezerk+lft | radix | 1970 | ~2kb | 32b | 2020 |
+---------------+------------------+---------+--------+-----------+------+
(cycle counts do not include jsr/rts, or decanting results from index lists in stack)
Bezerk had already just scraped in below my and Colorbar's approach, with a relatively miniscule memory footprint - and now lft has improved that to hit the holy grail of sub-raster performance in the 32 actor category :D |
Previous - 1 | ... | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 - Next |