[CSDb] - User Forums

You are not logged in - nap

CSDb User Forums

Forums > C64 Coding > Sorting

2007-10-08 16:08

Oswald

Registered: Apr 2002
Posts: 5126

Sorting

are sorters really so slow in games? :) I have made my unrolled version for my theoretical game (;), and it takes 132 rlines to sort 32 numbers o_O worst case is ~200 lines. tho when its the case of 4 numbers have to be swapped only it does the job in ~10 lines. wastes a lot of memory but I like it :)

... 193 posts hidden. Click here to view all posts....

2017-09-19 06:04

ChristopherJam

Registered: Aug 2004
Posts: 1423

Quoting Color Bar

I'm sorry, but I think you missed 2 cycles: PLA / STA *+4 / JMP (jumptable) takes 13 cycles. This means that the approach in post #103 is 2 cycles faster.

Damn, you're right. The SBX#$06 takes two cycles less than the LDY newBranchTableValue,x

FWIW, the 'more than one sprites on a line' case could be improved by replacing the SKW instructions with JMP *+3; it saves a cycle. And you're still using SBX at least :)

2017-09-19 06:07

ChristopherJam

Registered: Aug 2004
Posts: 1423

Quoting Trash

. The drawback is that every value you want to sort has to be reduced in accuracy in order for it to work

Ah, it's not a perfect sort of eight bit values? Comparing apples and oranges in that case..

2017-09-19 06:44

Axis/Oxyron
Account closed

Registered: Apr 2007
Posts: 91

Quote: So i examined the tickcounts of the code I've got, with heavy use of zeropage it can sort 32 elements in 1188 cycles, without zp it does the same job in 1448 cycles (unless my calculations are incorrect). The drawback is that every value you want to sort has to be reduced in accuracy in order for it to work (eg two lda, sta in two different tables)...'

*edit*

HCL actually drops a hint for this sorter in the commentsection of Classics

That ADC STA sequence mentioned in the comments of classics looks like 1 of the passes of a radix-sort. But in pure speed that wont be competitive to the ideas already posted here. Atleast as long as you dont reduce accuracy and really work with 200 different y-positions. My latest version has around 2400 cycles for the given use-case, if I remember right.

But the radix approach has some really nice properties compared to the bucket-sort mostly used here. It sorts in perfect 8 bit accuracy and sorts stable. That makes it more generally usable (e.g. for sorting 3D Bobs and polygons without any flickery). And in critical cases it can even avoid bugs in multiplexing. E.g. when big groups of sprites batch in small areas.

2017-09-19 08:47

Rastah Bar
Account closed

Registered: Oct 2012
Posts: 336

Quote: Quoting Color Bar
I'm sorry, but I think you missed 2 cycles: PLA / STA *+4 / JMP (jumptable) takes 13 cycles. This means that the approach in post #103 is 2 cycles faster.

Damn, you're right. The SBX#$06 takes two cycles less than the LDY newBranchTableValue,x

FWIW, the 'more than one sprites on a line' case could be improved by replacing the SKW instructions with JMP *+3; it saves a cycle. And you're still using SBX at least :)

Yes, I was a bit too eager to use illegals. Nevertheless, what I don't like about post #103 is that it will crash when there are more than 10 sprites on a line.

So I lean to sticking to the combination of posts #91 and #101.

2017-09-19 08:59

Rastah Bar
Account closed

Registered: Oct 2012
Posts: 336

Quoting Axis/Oxyron

That ADC STA sequence mentioned in the comments of classics looks like 1 of the passes of a radix-sort. But in pure speed that wont be competitive to the ideas already posted here. Atleast as long as you dont reduce accuracy and really work with 200 different y-positions. My latest version has around 2400 cycles for the given use-case, if I remember right.

But the radix approach has some really nice properties compared to the bucket-sort mostly used here. It sorts in perfect 8 bit accuracy and sorts stable. That makes it more generally usable (e.g. for sorting 3D Bobs and polygons without any flickery). And in critical cases it can even avoid bugs in multiplexing. E.g. when big groups of sprites batch in small areas.

The approach proposed here (the combination of posts #91 and #101 or #103) counts down the actor list, so when there are multiple actors with the same Y value, they are added to the bucket in reverse order, but in the second pass the order is restored. So I think this approach is stable as well.

If think the method also has full 8 bit accuracy. The numbers of cycles mentioned are for visible sprite positions and that means 220 buckets. This can be increased to 255 buckets.

2017-09-20 15:41

lft

Registered: Jul 2007
Posts: 369

Field Sort

2017-09-20 16:10

Rastah Bar
Account closed

Registered: Oct 2012
Posts: 336

Nice!

I think this 32K of ram usage you refer to applies to combining posts #91 and #103. When combining #91 and #101 it can be considerably lower (but it will still be a lot).

Btw, how many sprites with the same Y value can your routine handle?

2017-09-20 16:46

ChristopherJam

Registered: Aug 2004
Posts: 1423

Yes, nice work! This will take some investigating to unravel.

SHX $FE00,Y

^ is basically stx $fe00,y on account of the mask being $fe+1, yes?

2017-09-20 17:39

lft

Registered: Jul 2007
Posts: 369

Color Bar, yes I realise that you went through a number of different versions, trading memory for cycles. But I thought it fair to mention the version that beats mine in running time.

There can be any number of sprites on the same line; They are linked. Of course, the multiplexer drops them if there are too many, as is evident from the flickering in the demo.

This is a generic full random sorting routine for 8-bit values, although in the demo it is limited to 220 different y-values to make it comparable with your benchmarks.

Christopher Jam: That is correct. It's nice to find a use for this opcode where it actually improves performance, because it's quite constrained: The masking drops off if the instruction is suspended by DMA, and page crossings cause the wrong value to be written. So the best bet is indeed to stick to the operand $fe00.

2017-09-20 17:42

Rastah Bar
Account closed

Registered: Oct 2012
Posts: 336

Quote: Yes, nice work! This will take some investigating to unravel.

SHX $FE00,Y
^ is basically stx $fe00,y on account of the mask being $fe+1, yes?

Yes, check out this document (pages 38 and 41):
No More Secrets v0.91

Previous - 1 | ... | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | ... | 21 - Next

Refresh

Subscribe to this thread:

You need to be logged in to post in the forum.

Search the forum:
Search for in
All times are CET.

Search CSDb

Advanced

Users Online

GI-Joe/MYD!
tlr
bugjam
TheRyk/MYD!
Brittle/Dentifrice^(?)
CreaMD/React
anonym/padua
iAN CooG/HVSC
grass/LETHARGY
Guests online: 286

Top Demos

1 Next Level  (9.7)
2 13:37  (9.7)
3 Codeboys & Endians  (9.7)
4 Mojo  (9.6)
5 Coma Light 13  (9.6)
6 Edge of Disgrace  (9.6)
7 Signal Carnival  (9.6)
8 Uncensored  (9.5)
9 Wonderland XIV  (9.5)
10 No Bounds  (9.5)

Top onefile Demos

1 Nine  (9.7)
2 Layers  (9.6)
3 Cubic Dream  (9.6)
4 Party Elk 2  (9.6)
5 Copper Booze  (9.5)
6 Scan and Spin  (9.5)
7 Onscreen 5k  (9.5)
8 Grey  (9.5)
9 Dawnfall V1.1  (9.5)
10 Rainbow Connection  (9.5)

Top Groups

1 Artline Designs  (9.3)
2 Booze Design  (9.3)
3 Oxyron  (9.3)
4 Performers  (9.3)
5 Censor Design  (9.3)

Top Crackers

1 Mr. Z  (9.9)
2 OTD  (9.8)
3 Antitrack  (9.8)
4 Fungus  (9.8)
5 S!R  (9.8)

Page generated in: 0.051 sec.