Log inRegister an accountBrowse CSDbHelp & documentationFacts & StatisticsThe forumsAvailable RSS-feeds on CSDbSupport CSDb Commodore 64 Scene Database
You are not logged in - nap
CSDb User Forums


Forums > C64 Coding > Sorting
2007-10-08 16:08
Oswald

Registered: Apr 2002
Posts: 5007
Sorting

are sorters really so slow in games? :) I have made my unrolled version for my theoretical game (;), and it takes 132 rlines to sort 32 numbers o_O worst case is ~200 lines. tho when its the case of 4 numbers have to be swapped only it does the job in ~10 lines. wastes a lot of memory but I like it :)
 
... 193 posts hidden. Click here to view all posts....
 
2017-09-18 16:16
ChristopherJam

Registered: Aug 2004
Posts: 1359
Ok, a few random thoughts.


I see now that the routine in comment #103 takes the same number of cycles as that in #105; it just fetches an actor ID with TYA:DEY instead of PLA, and fetches the next Y coordinate with lda YV,Y instead of PLA.

Even if the Y coordinates are on the stack, you still need the DEY to update the actor index.

I can't see a way to combine them, so suspect I'd lean towards my routine (same speed, 5k less unrolled code, 64 bytes more stack usage, faster performance when many actors are on the one line).

Rather than copying the Y coordinates to the stack, I'd make that their home, and also only write the actor IDs once at startup. You would of course have to be very careful to ensure no interrupts while running that section of code, NMI included


Quoting Color Bar

SID to calculate line slope
Maybe that could work?


Oh god. Quite possibly, but I'd rather avoid hamstringing music/effects.

lft; sadly no cycles left for masking :)

*downloads Time Machine to examine later*
2017-09-18 16:54
Rastah Bar

Registered: Oct 2012
Posts: 336
Quoting ChristopherJam
Ok, a few random thoughts.


I see now that the routine in comment #103 takes the same number of cycles as that in #105; it just fetches an actor ID with TYA:DEY instead of PLA, and fetches the next Y coordinate with lda YV,Y instead of PLA.

Even if the Y coordinates are on the stack, you still need the DEY to update the actor index.

I can't see a way to combine them, so suspect I'd lean towards my routine (same speed, 5k less unrolled code, 64 bytes more stack usage, faster performance when many actors are on the one line).

Rather than copying the Y coordinates to the stack, I'd make that their home, and also only write the actor IDs once at startup. You would of course have to be very careful to ensure no interrupts while running that section of code, NMI included


Yes, your approach has some advantages, including not having to assume that the maximum number of actors per bucket is never exceeded. Unfortunately it does not use any illegals though :-(
2017-09-18 18:57
Trash

Registered: Jan 2002
Posts: 122
So i examined the tickcounts of the code I've got, with heavy use of zeropage it can sort 32 elements in 1188 cycles, without zp it does the same job in 1448 cycles (unless my calculations are incorrect). The drawback is that every value you want to sort has to be reduced in accuracy in order for it to work (eg two lda, sta in two different tables)...'

*edit*

HCL actually drops a hint for this sorter in the commentsection of Classics
2017-09-18 21:22
Rastah Bar

Registered: Oct 2012
Posts: 336
Quote: re. SKW etc: Ooh, dangerous! So I think we're up to around 10k for the PHA routines now?

Will have to see if that can be sanely(!) combined with my thinking from today:

I managed to trim two cycles off the populate routine by switching to just using even numbers for the actor IDs (can interleave most of the data tables), and using a section of the stack to store Y values interleaved with actor IDs.
    ldx bucketK+1
    ldy newBranchTableValue,x
    sty bucketK+1
    pla ; grab this actor's index
    sta bucketK+3,y       ;21

    pla ; grab next Yvalue
    sta *+4
    jmp (jumptable)       ;11

(I was going to read the actor ID out of one of the CIA timers while pulling Y value off the stack, but realised that the timer will be counting down while the stack address increases, so had to abandon that idea).

so, 220*3+41*32=1972 cycles, 61.6 per actor, 31.3 raster lines :)


I'm sorry, but I think you missed 2 cycles: PLA / STA *+4 / JMP (jumptable) takes 13 cycles. This means that the approach in post #103 is 2 cycles faster.
2017-09-19 06:04
ChristopherJam

Registered: Aug 2004
Posts: 1359
Quoting Color Bar
I'm sorry, but I think you missed 2 cycles: PLA / STA *+4 / JMP (jumptable) takes 13 cycles. This means that the approach in post #103 is 2 cycles faster.


Damn, you're right. The SBX#$06 takes two cycles less than the LDY newBranchTableValue,x

FWIW, the 'more than one sprites on a line' case could be improved by replacing the SKW instructions with JMP *+3; it saves a cycle. And you're still using SBX at least :)
2017-09-19 06:07
ChristopherJam

Registered: Aug 2004
Posts: 1359
Quoting Trash
. The drawback is that every value you want to sort has to be reduced in accuracy in order for it to work


Ah, it's not a perfect sort of eight bit values? Comparing apples and oranges in that case..
2017-09-19 06:44
Axis/Oxyron

Registered: Apr 2007
Posts: 91
Quote: So i examined the tickcounts of the code I've got, with heavy use of zeropage it can sort 32 elements in 1188 cycles, without zp it does the same job in 1448 cycles (unless my calculations are incorrect). The drawback is that every value you want to sort has to be reduced in accuracy in order for it to work (eg two lda, sta in two different tables)...'

*edit*

HCL actually drops a hint for this sorter in the commentsection of Classics


That ADC STA sequence mentioned in the comments of classics looks like 1 of the passes of a radix-sort. But in pure speed that wont be competitive to the ideas already posted here. Atleast as long as you dont reduce accuracy and really work with 200 different y-positions. My latest version has around 2400 cycles for the given use-case, if I remember right.

But the radix approach has some really nice properties compared to the bucket-sort mostly used here. It sorts in perfect 8 bit accuracy and sorts stable. That makes it more generally usable (e.g. for sorting 3D Bobs and polygons without any flickery). And in critical cases it can even avoid bugs in multiplexing. E.g. when big groups of sprites batch in small areas.
2017-09-19 08:47
Rastah Bar

Registered: Oct 2012
Posts: 336
Quote: Quoting Color Bar
I'm sorry, but I think you missed 2 cycles: PLA / STA *+4 / JMP (jumptable) takes 13 cycles. This means that the approach in post #103 is 2 cycles faster.


Damn, you're right. The SBX#$06 takes two cycles less than the LDY newBranchTableValue,x

FWIW, the 'more than one sprites on a line' case could be improved by replacing the SKW instructions with JMP *+3; it saves a cycle. And you're still using SBX at least :)


Yes, I was a bit too eager to use illegals. Nevertheless, what I don't like about post #103 is that it will crash when there are more than 10 sprites on a line.

So I lean to sticking to the combination of posts #91 and #101.
2017-09-19 08:59
Rastah Bar

Registered: Oct 2012
Posts: 336
Quoting Axis/Oxyron
That ADC STA sequence mentioned in the comments of classics looks like 1 of the passes of a radix-sort. But in pure speed that wont be competitive to the ideas already posted here. Atleast as long as you dont reduce accuracy and really work with 200 different y-positions. My latest version has around 2400 cycles for the given use-case, if I remember right.

But the radix approach has some really nice properties compared to the bucket-sort mostly used here. It sorts in perfect 8 bit accuracy and sorts stable. That makes it more generally usable (e.g. for sorting 3D Bobs and polygons without any flickery). And in critical cases it can even avoid bugs in multiplexing. E.g. when big groups of sprites batch in small areas.


The approach proposed here (the combination of posts #91 and #101 or #103) counts down the actor list, so when there are multiple actors with the same Y value, they are added to the bucket in reverse order, but in the second pass the order is restored. So I think this approach is stable as well.

If think the method also has full 8 bit accuracy. The numbers of cycles mentioned are for visible sprite positions and that means 220 buckets. This can be increased to 255 buckets.
2017-09-20 15:41
lft

Registered: Jul 2007
Posts: 369
Field Sort
Previous - 1 | ... | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | ... | 21 - Next
RefreshSubscribe to this thread:

You need to be logged in to post in the forum.

Search the forum:
Search   for   in  
All times are CET.
Search CSDb
Advanced
Users Online
6R6/shape/[n0]
csabanw
hedning/G★P
Exile/Anubis
kbs/Pht/Lxt
Isildur/Samar
TCE/Hokuto Force
Guests online: 213
Top Demos
1 Next Level  (9.8)
2 Mojo  (9.7)
3 Coma Light 13  (9.7)
4 Edge of Disgrace  (9.6)
5 No Bounds  (9.6)
6 Comaland 100%  (9.6)
7 Uncensored  (9.6)
8 The Ghost  (9.6)
9 Wonderland XIV  (9.6)
10 Bromance  (9.6)
Top onefile Demos
1 Party Elk 2  (9.7)
2 Cubic Dream  (9.6)
3 Copper Booze  (9.5)
4 Rainbow Connection  (9.5)
5 TRSAC, Gabber & Pebe..  (9.5)
6 Onscreen 5k  (9.5)
7 Dawnfall V1.1  (9.5)
8 Quadrants  (9.5)
9 Daah, Those Acid Pil..  (9.5)
10 Birth of a Flower  (9.5)
Top Groups
1 Booze Design  (9.3)
2 Nostalgia  (9.3)
3 Oxyron  (9.3)
4 Censor Design  (9.3)
5 Crest  (9.3)
Top Swappers
1 Derbyshire Ram  (10)
2 Jerry  (9.8)
3 Violator  (9.8)
4 Acidchild  (9.7)
5 Starlight  (9.6)

Home - Disclaimer
Copyright © No Name 2001-2024
Page generated in: 0.049 sec.