CSDb User Forums

Forums > C64 Coding > Sorting
2007-10-08 16:08

Registered: Apr 2002
Posts: 4507

are sorters really so slow in games? :) I have made my unrolled version for my theoretical game (;), and it takes 132 raster lines to sort 32 numbers o_O worst case is ~200 lines. though when only 4 numbers have to be swapped it does the job in ~10 lines. wastes a lot of memory but I like it :)
... 161 posts hidden ...
2018-08-07 07:27

Registered: Aug 2004
Posts: 1001
Nice work independently discovering counting sort, Repose

It's worth noting that this one is O(n)+O(m), where m is the number of buckets.

Pass 2 for a "perfect" sort of n sprites would take 220 iterations (256 in your example code)

That's around 220*10 cycles if you unroll the loop a little, so about 35 lines of overhead. Of course, you can cut that down considerably if you don't need as much accuracy or range.

Clearing the counts array also takes time, though you can drop that back from O(m) to O(n) by keeping it separate from your "fib" array, persisting it frame to frame, and only clearing the entries you incremented in the first place.
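
To make the O(n)+O(m) split and the clearing trick concrete, here is a minimal sketch modelled in Python rather than 6502. The names `counts`, `starts` and `counting_sort_sprites` are made up for illustration, and `starts` only stands in for the "fib" array from the hidden posts, whose exact layout is not shown here.

```python
MAX_Y = 256  # m: number of buckets (assumed: one per possible Y value)

counts = [0] * MAX_Y  # persists between frames; all zero between calls
starts = [0] * MAX_Y  # fully overwritten every frame, so never cleared


def counting_sort_sprites(yvalues):
    """Return sprite indices ordered by ascending Y (stable)."""
    # Pass 1, O(n): histogram of the Y values.
    for y in yvalues:
        counts[y] += 1
    # Pass 2, O(m): running total gives the first output slot per Y value.
    total = 0
    for y in range(MAX_Y):
        starts[y] = total
        total += counts[y]
    # Pass 3, O(n): drop each sprite index into its slot.
    order = [0] * len(yvalues)
    for sprite, y in enumerate(yvalues):
        order[starts[y]] = sprite
        starts[y] += 1
    # Cleanup, O(n) instead of O(m): clear only the entries pass 1 touched.
    for y in yvalues:
        counts[y] = 0
    return order
```

Pass 2 is the fixed overhead mentioned above: roughly 220 iterations at about 10 cycles each is 2200 cycles, which at 63 cycles per PAL raster line comes to about 35 lines.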
2019-12-10 21:40

Registered: Apr 2002
Posts: 4507
came up with this one, posting in case someone finds a flaw. somehow I am unsure if I'm overlooking something, because in speed it's close to the pha pha method, which feels unbeatable.

phase1 bucketsort, 4bit per buckets:

;this code repeats * numofsprites

ldy yvalues+spriteindex
ldx table,y ;x = y / 16 *2

lda #spriteindex
sta (bucket,x) ;needs some prep on zp
inc bucket,x ;every 2nd zp points to a buckettab

phase2 sorting network:

ldy bucket+z ;19 such compare snippets per bucket
lda yvalues,y ;can sort max 8 sprite per bucket
ldx bucket+q ;z and q is precalculated
cmp yvalues,x
bcc + ; or bcs?

sty bucket+q
stx bucket+z ;26 cycles worst case
+ ;branch target when no swap is needed

with $20 sprites phase 1 is 768 cycles, phase 2 is 1976 cycles in the worst case (highly unlikely), overall ~43 raster lines.

probably will use different sorting networks based on nr of sprites in bucket. that will reduce nr of compare/swap snippets per bucket.
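
The two phases can be modelled in Python to check the idea. This is a sketch under assumptions: the post does not say which 19-comparator network is used, so Batcher's odd-even merge network for 8 inputs (which has 19 compare/swap pairs) is substituted, and `bucket_network_sort` is a made-up name. Comparators that touch an empty slot are skipped, which is one simple way to adapt the network to the number of sprites in a bucket.

```python
# Batcher's odd-even merge sort network for 8 inputs: 19 compare/swap pairs.
# Assumption: stands in for the unspecified network from the post.
NETWORK_8 = [
    (0, 1), (2, 3), (4, 5), (6, 7),
    (0, 2), (1, 3), (4, 6), (5, 7),
    (1, 2), (5, 6),
    (0, 4), (1, 5), (2, 6), (3, 7),
    (2, 4), (3, 5),
    (1, 2), (3, 4), (5, 6),
]

BUCKET_SHIFT = 4      # 16 Y values per bucket, as in "x = y / 16"
MAX_PER_BUCKET = 8    # the network above handles at most 8 entries


def bucket_network_sort(yvalues):
    """Return sprite indices ordered by ascending Y."""
    buckets = [[] for _ in range(256 >> BUCKET_SHIFT)]
    # Phase 1: append each sprite index to the bucket for its Y value.
    for sprite, y in enumerate(yvalues):
        bucket = buckets[y >> BUCKET_SHIFT]
        if len(bucket) == MAX_PER_BUCKET:
            raise ValueError("more than 8 sprites in one bucket")
        bucket.append(sprite)
    # Phase 2: run the fixed compare/swap sequence on every bucket.
    order = []
    for bucket in buckets:
        used = len(bucket)
        for lo, hi in NETWORK_8:
            if hi >= used:
                continue  # comparator touches an empty slot, nothing to do
            a, b = bucket[lo], bucket[hi]
            if yvalues[a] > yvalues[b]:
                bucket[lo], bucket[hi] = b, a
        order.extend(bucket)
    return order
```

On the "bcc or bcs?" question: CMP sets the carry when A >= operand, so after `lda yvalues,y` / `cmp yvalues,x` a `bcc` skips the swap only when the first value is strictly smaller. If slot z is meant to end up with the smaller Y, `bcc` gives that ordering; equal values still get swapped, which is harmless but costs the extra cycles.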
2019-12-11 15:27

Registered: Apr 2002
Posts: 1333
Something i keep thinking briefly about whenever sprite sorting pops up is... somehow having hardware-assisted constant-time sorting.

Like, when you have buckets of 4 or 8 coordinates, which are quickly generated as seen in Oswald's post, how about sorting the coordinates within a bucket using, say, CIA timer interrupts or VIC collision detection? :)

For timer interrupts, one could set up 4 timers and then sample the interrupt flags, which would give results according to the values to be sorted.

Collision detection could also help for buckets of 8 coordinates, with some smart sprite-char collision scheme, albeit at the expense of some memory.

But... i never came far with these ideas, but then again i haven't really tried so far. :)

Edit: Of course, with 4 timer interrupts and a raster interrupt, one could sort 5 coordinates on the fly and have the multiplexer interrupt handler triggered in sorted order. But i guess this is out of scope for these discussions. =)
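
As a thought experiment only, the timer idea can be simulated in Python: load one countdown timer per coordinate, let them all run, and read off the order in which they fire. `timer_sort` is a hypothetical name and nothing here touches real CIA registers; it just shows that the firing order is the sorted order.

```python
def timer_sort(values):
    """Simulate one countdown timer per value; return indices in firing order."""
    timers = list(values)              # each simulated timer starts at its value
    pending = set(range(len(values)))  # timers that have not fired yet
    order = []
    while pending:
        # Poll the simulated interrupt flags: timers at zero fire now.
        fired = sorted(i for i in pending if timers[i] == 0)
        order.extend(fired)
        pending.difference_update(fired)
        for i in pending:              # one clock tick for the remaining timers
            timers[i] -= 1
    return order
```

The CPU work per element is constant, but the wall-clock time grows with the largest value, so on real hardware the coordinates would presumably need scaling to keep the wait short.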
2019-12-11 16:07

Registered: Aug 2004
Posts: 1001
Nice one Oswald.

As is, it's a little slower than https://www.codebase64.org/doku.php?id=base:flagged_bucket_sort (38.5 rasters worst case) but you have got me thinking about sorting networks again.

Could perhaps dispatch to the sorting network routines by spacing the zero page pointers slightly further apart, so that one could jmp($xxxx) to one byte earlier, using the low byte of the storage address as the high byte of the sort routine address

Can also combine pairs of sorting network swap macros to avoid some register spilling, saving about five or six cycles off the pair

You've also reminded me that sometime in the last few years I finally realised that sorting network based methods might be able to take advantage of separating out the player sprite, so that each bucket then only needs to deal with at most seven sprites, or six if there's a second player or a multiple in one player mode.

A six element sorting network is only 12 swaps, a good 15% less per sprite than the 19 for an eight element network.
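
For reference, here is one 12-comparator network for six elements, modelled in Python together with a brute-force check and the per-sprite arithmetic. It sorts two triples and then merges them; whether this is the exact network meant above is an assumption, and the helper names are made up.

```python
# One 12-comparator sorting network for 6 inputs (assumed choice).
NETWORK_6 = [
    (1, 2), (4, 5),
    (0, 2), (3, 5),
    (0, 1), (3, 4), (2, 5),
    (0, 3), (1, 4),
    (2, 4), (1, 3),
    (2, 3),
]


def apply_network(network, values):
    """Run every compare/swap in order; the smaller value ends up at the lower index."""
    out = list(values)
    for lo, hi in network:
        if out[lo] > out[hi]:
            out[lo], out[hi] = out[hi], out[lo]
    return out


def sorts_everything(network, n):
    """0-1 principle: a network that sorts all 2**n bit vectors sorts any input."""
    for mask in range(1 << n):
        bits = [(mask >> i) & 1 for i in range(n)]
        if apply_network(network, bits) != sorted(bits):
            return False
    return True


swaps_per_sprite_6 = len(NETWORK_6) / 6   # 12 / 6
swaps_per_sprite_8 = 19 / 8               # 19 comparators for 8 inputs
saving = 1 - swaps_per_sprite_6 / swaps_per_sprite_8
```

With 2 swaps per sprite against 2.375, the saving works out to a little under 16%.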
2019-12-11 20:01

Registered: Apr 2002
Posts: 4507
thanks! in the meantime I have implemented it, and I am at 46 lines with an avg case, using adaptive sort network. (something off with my speed calcs earlier?)

the swap macro combine is nice I can shave off 8 cycles with that :)

to save memory I'm using a different variant from the one posted:

ldy buckets+q,x
lda yvalues,y
ldy buckets+z,x
cmp yvalues,y
bcs skipswap

lda buckets+q,x
sty buckets+q,x
sta buckets+z,x
skipswap:

4 cycles slower but 1/16th the mem usage. no need for different code for each bucket.
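
A Python model of this shared snippet, as a sketch: `x` plays the role of the X register selecting the bucket, and `q` and `z` are the slot offsets baked into each snippet. The flat `buckets` layout, the 4-slot buckets and the 5-comparator network are assumptions chosen to keep the example short.

```python
NETWORK_4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]  # 5 compare/swaps for 4 slots


def compare_swap(buckets, yvalues, x, q, z):
    """One shared snippet: afterwards slot z holds the sprite with the smaller Y."""
    first = buckets[x + q]                 # ldy buckets+q,x
    second = buckets[x + z]                # ldy buckets+z,x
    if yvalues[first] >= yvalues[second]:  # cmp yvalues,y / bcs skipswap
        return
    buckets[x + q] = second                # sty buckets+q,x
    buckets[x + z] = first                 # sta buckets+z,x


def sort_bucket(buckets, yvalues, x):
    """Run the same code for any bucket; only the base offset x changes."""
    for lo, hi in NETWORK_4:
        compare_swap(buckets, yvalues, x, q=hi, z=lo)
```

For example, with `ys = [40, 10, 30, 20, 90, 70, 80, 60]` and `buckets = [0, 1, 2, 3, 4, 5, 6, 7]`, calling `sort_bucket(buckets, ys, 0)` and then `sort_bucket(buckets, ys, 4)` orders each group of four by Y with the same code.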

Gunnar I dont understand how you could do that, not even the timer method :P start the timers and poll the flag register to see which one runs out first, then 2nd etc ?

the player sprite, why would one handle it differently ? it just reduces the flexibility of displayable sprites?
2019-12-12 02:04

Registered: Aug 2004
Posts: 1001
Regarding the player sprite, if it’s free roaming then the multiplexer can only guarantee seven free sprites for antagonists on any given line. So, you get the same capabilities from {putting the player into the actor list and letting the multiplexer drive eight sprites} as you do from {putting the player into sprite zero directly and driving the remaining seven from the multiplexer}.

Same logic applies to a second player, or an orbiting multiple. And if the sorter only needs to deal with at most six sprites per bucket, then the worst case sorting network only has 12 swap macros.

Speaking of buckets, if you’re determining which bucket with a table lookup, you only need 11 of them, each being 21px high.
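
A sketch of such a lookup table, built in Python. `FIRST_Y` is a hypothetical value for the first Y coordinate the multiplexer handles and is not taken from the thread; anything above it is clamped into bucket 0.

```python
SPRITE_HEIGHT = 21   # a sprite covers 21 raster lines
NUM_BUCKETS = 11
FIRST_Y = 30         # hypothetical: lowest Y coordinate that gets its own band

# One entry per possible Y value, holding the bucket number 0..10.
bucket_table = [
    min(max(y - FIRST_Y, 0) // SPRITE_HEIGHT, NUM_BUCKETS - 1)
    for y in range(256)
]
```

The reason 21 pixels is a natural bucket height: sprites whose Y values fall in the same 21 line band all cover the last line of that band, so if the frame is displayable at all there can be at most eight of them in one bucket.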
2019-12-12 02:07

Registered: Aug 2004
Posts: 1001
Reusable sort code sounds good btw. It’d be nice to have a clearer view of code size/speed trade offs.
2019-12-12 07:35

Registered: Apr 2002
Posts: 4507
btw one thing I could never grasp, how would sprite plexer work with bucket sort, ie not fully sorted but accurate enough ? order of irq doesnt matter for 8 sprites ? just use 1 irq for all 8 ?
2019-12-12 18:33

Registered: Dec 2004
Posts: 801
Quoting Oswald
btw one thing I could never grasp, how would sprite plexer work with bucket sort, ie not fully sorted but accurate enough ? order of irq doesnt matter for 8 sprites ? just use 1 irq for all 8 ?
I'd go for buckets of 4 sprites I think.
2019-12-13 14:52

Registered: Dec 2014
Posts: 42
I know I'm late to the thread and apologies if I repeat what others have said. I spent a long time optimizing the sort for AAII (probably too long, but it is a nostalgic hobby after all). A couple of comments on that basis:

- I found using an insertion sort was faster than bubble; much faster and for several reasons
- When you reuse the sorting order from frame to frame you are almost never worst case and it is pretty quick to change the order of only a couple of (neighboring) elements
- You can help it by e.g. only introducing 1 new object per frame. So rather than e.g. 10 objects in a formation coming in at once, spread them over 10 frames. Helps in general with performance as well (not instantiating 10 new objects in one frame)
- You have to optimize equally for the implementation in 6502 machine code. Some algorithms are theoretically faster, but with only 3 registers you might end up doing a lot of temp storing that kills the theory. On modern processors that is of course different
- We are working with small numbers (relative to general sorting). The algorithm for thousands of elements would be different
- In terms of the player I check specially for that sprite to see that a physical one is free. If not I override the most previous free one. Better to have an enemy or bullet flicker than the player
- As Oswald hits on, the IRQs are equally important. They take up about as long as the sort (rough order of magnitude). So as always there are tradeoffs

I'll dig out my performance stats when I get home from work and can also share code if that is helpful.
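
The frame-to-frame reuse described in the list above can be sketched in Python as an insertion sort over a persistent index list. This is an illustration under assumptions, not the AAII code; `resort` and its move counter are made up.

```python
def resort(order, yvalues):
    """Insertion-sort the persistent index list in place; return how many moves it took."""
    moves = 0
    for i in range(1, len(order)):
        sprite = order[i]
        y = yvalues[sprite]
        j = i
        # Shift entries with a larger Y one slot down until the gap fits.
        while j > 0 and yvalues[order[j - 1]] > y:
            order[j] = order[j - 1]
            j -= 1
            moves += 1
        order[j] = sprite
    return moves
```

Because `order` is kept between frames, a frame where nothing overtakes anything costs one comparison per sprite and no moves, and a frame where two neighbours swap places costs a single move.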
All times are CET.