You are not logged in -
nap
CSDb User Forums
Forums
>
C64 Coding
>
Sorting
2007-10-08
16:08
Oswald
Registered: Apr 2002
Posts: 5017
Sorting
are sorters really so slow in games? :) I have made my unrolled version for my theoretical game (;), and it takes 132 rlines to sort 32 numbers o_O worst case is ~200 lines. tho when its the case of 4 numbers have to be swapped only it does the job in ~10 lines. wastes a lot of memory but I like it :)
... 193 posts hidden. Click
here
to view all posts....
2007-10-13
08:14
ChristopherJam
Registered: Aug 2004
Posts: 1378
This one's has taken up residence in my brain..
Now I've gone back and read the thread properly instead of skimming the first three quarters before reading the rest, I've actually noticed the earlier points about a danmaku shooter being the aim well before the Ikaruga subthread, and that the whole idea was to minimise worst case in a very frame incoherent shooter - methinks my insertion sort commentary was a little out of place.
I did have some inspiration last night when I realised that for any displayable set of actors there will never be more than eight sprites in any 21 line bucket (else the first and last will be too close together to display), so then I went off and did some research on optimal sorting for small sets of numbers. I eventually found
this page
on Bose-Nelson sorting networks, got a swap macro down to 22 cycles ( ldx a
ldy b
lda actor_y,x
cmp actor_y,y
bge noswap
stx b
sty a
), decided that was too slow and started handcoding cases for smaller lists of sprites in a bucket (eg 70 cycles to take 3 zero page locations containing actor indices, sort them by actor_y
and add them to the actor_next list),
but even then, I lose about 26 cycles just to placing each actor onto the end of the zero page array of actors-in-that bucket, which only leaves me an average of 19 cycles per actor for the sorting and adding to the actor_next list if I'm to beat the routine above.
It looks like @doynax' two pass radix sort still wins - I'm impressed!
2007-10-13
09:00
doynax
Account closed
Registered: Oct 2004
Posts: 212
ChristopherJam: In practice that radix-sort ended up needing 34 scanlines, for 32-sprites, rather than 28. It just wasn't practical to keep keep the set of sprites compacted like that. Not when combined with certain other features anyway.
So maybe a Bose-Nelson sorting network, whatever that is, could still be competitive?
2007-10-14
13:29
ChristopherJam
Registered: Aug 2004
Posts: 1378
hmm - so I'd need to manage less than (34*63)/32=67 cycles per actor.
I've still got some optimising to do, but at the moment the worst case cost per actor ranges from 46 to 87 cycles depending on the number of other actors are in the same bucket (1 to 7 others) plus a small overhead for each of the 10 buckets.
There are a lot of unnecessary loads and stores in the more expensive cases though, so I'll see how I go over this week.
A sorting network is basically a list of optional exchanges of predetermined pairs, that after they are complete guarantee the list is sorted. It avoids computing array indices and a lot of the branching - eg
#define SWAP(a,b) .(:ldx zpt+a:ldy zpt+b:lda act_y,x:cmp act_y,y:bcs noswap:stx zpt+b:sty zpt+a:noswap:.)
SWAP(0,1)
SWAP(2,3)
SWAP(0,2)
SWAP(1,3)
SWAP(1,2)
will ensure that four actor indices stored in zpt..zpt+3 will be sorted by y position. For an 8 element list, you only need 19 swaps.
2007-10-14
17:31
Oswald
Registered: Apr 2002
Posts: 5017
lda (),y
cmp (),y
bcc
is faster when there's no need to swap than
ldx pointer
ldy pointer
lda spry,x
cmp spry,y
2007-10-14
18:04
ChristopherJam
Registered: Aug 2004
Posts: 1378
Presumably you are proposing I keep y=0 and interleave my actor indices with the high byte of the actor_y table?
Good thinking, except I am attempting to optimise the worst case, so I do not wish to sacrifice that for a faster average case.
2007-10-14
18:47
Oswald
Registered: Apr 2002
Posts: 5017
it should be noted tho, that worst case sorting never happens with progressive sort, especially if you optimize for only adding/removing 1 sprite to the sort per frame. I'm also realizing that the zp method presented above is not good, as the rare case with insertion sort (my current approach) is when there's no need to swap.
2007-10-14
19:29
JackAsser
Registered: Jun 2002
Posts: 1989
Quote:
hmm - so I'd need to manage less than (34*63)/32=67 cycles per actor.
I've still got some optimising to do, but at the moment the worst case cost per actor ranges from 46 to 87 cycles depending on the number of other actors are in the same bucket (1 to 7 others) plus a small overhead for each of the 10 buckets.
There are a lot of unnecessary loads and stores in the more expensive cases though, so I'll see how I go over this week.
A sorting network is basically a list of optional exchanges of predetermined pairs, that after they are complete guarantee the list is sorted. It avoids computing array indices and a lot of the branching - eg
#define SWAP(a,b) .(:ldx zpt+a:ldy zpt+b:lda act_y,x:cmp act_y,y:bcs noswap:stx zpt+b:sty zpt+a:noswap:.)
SWAP(0,1)
SWAP(2,3)
SWAP(0,2)
SWAP(1,3)
SWAP(1,2)
will ensure that four actor indices stored in zpt..zpt+3 will be sorted by y position. For an 8 element list, you only need 19 swaps.
Hmmm... I was sure sorting networks were O(n log n) but apparantly Ajtai, Komlós and Szemerédi have O(log n) sorting network. I.e. the maximum depth (or comparators executed) is log n. It feels highly counter intuitive imo but I won't question it. Anyways, point is that while radix-sort is O(n*m) (where n is # of input and m is # of digits), an optimal sorting network is O(log n).
Now, this O() notation always hides huges constants for a c64 but for a f.e. 48 actor sorter it should require at most 48*log2(48) comparators statically in memory and at most log2(48) of those are executed at any given input. => NICE!
2007-10-15
00:01
ChristopherJam
Registered: Aug 2004
Posts: 1378
Ajtai, Komlós and Szemerédi's network is O(log N) depth, but still has O(N log N) comparitors - their timings are for parallel hardware, where you can swap elements 0 and 1 at the same time as 2 and 3, but swapping 0 with 2 would have to wait until the next cycle. The sample network I gave above they would categorise has having depth 3.
I was looking at something that on serial hardware like our humble c64 would be O(N log k), where k is the maximum number of sprites in any given bucket, which for buckets less than 22 lines high will be 8.
Oswald: I know the stuff I'm playing with this week will not give optimal results for a relatively stable arrangement - it's why I usually use insertion sort on a linked list of actors. I believe Doynax is attempting to avoid the huge spikes of CPU time a progressive sort would take whenever there is a massive change in order, like when two multi-sprite bosses are crossing vertically while you shoot at them with vertically fast moving sprite bullets ;)
2007-10-15
06:45
Oswald
Registered: Apr 2002
Posts: 5017
actually I have just arrived to what you're talking about: insertion sort on a linked list of actors :) as one of the drawbacks of insertion sort is the need to move many elements, but with a linked list you can insert the actor into the right position without shuffling bytes around.
ldx actorlink,x
cmp actory,x
bcc nextelement
- insert actor into chain & return to the outer unrolled loop-
nextelement
ldy actorlink,x
cmp actory,y
bcc nextelement
only 11 cycles per comparison. you even have the prev actor's link in one of the registers, so it can be a one way chain easily. guess you already do it like this ;)
2007-10-15
08:53
ChristopherJam
Registered: Aug 2004
Posts: 1378
Exactly :)
I traverse a singly linked actor_next list in the end-of-screen interrupt running their movement scripts, and after each has been moved (unless it has just died) I insert it into a singly linked actor_prev list.
Then once all the actors have been processed, I reverse the list.
ldx actor_prev,y
tay
sta actor_next,x
beq done
...
Previous
-
1
|
2
|
3
|
4
|
5
|
6
| 7 |
8
|
9
|
10
|
11
|
12
| ... |
22
-
Next
Refresh
Subscribe to this thread:
You need to be logged in to post in the forum.
Search the forum:
Search
All forums
C64 Coding
C64 Composing
C64 Pixeling
C64 Productions
CSDb Bug Reports
CSDb Discussions
CSDb Entries
CSDb Feedback
CSDb Info
CSDb moderators
CSDb Questions
CSDb V2 development
Messages to moderators
Requests
for
in
Writer & text
Text
Writer
All times are CET.
Search CSDb
All
Releases
Groups
Sceners
Events
BBS
SIDs
-------
Forum
Comments
Advanced
Users Online
AlexC
Airwolf/F4CG
CreaMD/React
d0c
rexbeng
Guests online: 86
Top Demos
1
Next Level
(9.8)
2
Mojo
(9.7)
3
Coma Light 13
(9.7)
4
Edge of Disgrace
(9.6)
5
Comaland 100%
(9.6)
6
No Bounds
(9.6)
7
Uncensored
(9.6)
8
Wonderland XIV
(9.6)
9
Bromance
(9.6)
10
Memento Mori
(9.6)
Top onefile Demos
1
It's More Fun to Com..
(9.7)
2
Party Elk 2
(9.7)
3
Cubic Dream
(9.6)
4
Copper Booze
(9.5)
5
TRSAC, Gabber & Pebe..
(9.5)
6
Rainbow Connection
(9.5)
7
Onscreen 5k
(9.5)
8
Wafer Demo
(9.5)
9
Dawnfall V1.1
(9.5)
10
Quadrants
(9.5)
Top Groups
1
Oxyron
(9.3)
2
Nostalgia
(9.3)
3
Booze Design
(9.3)
4
Censor Design
(9.3)
5
Crest
(9.3)
Top Graphicians
1
Sulevi
(10)
2
Mirage
(9.8)
3
Lobo
(9.7)
4
Mikael
(9.7)
5
Archmage
(9.7)
Home
-
Disclaimer
Copyright © No Name 2001-2024
Page generated in: 0.074 sec.