[CSDb] - User Forums

You are not logged in - nap

CSDb User Forums

Forums > C64 Coding > Sorting

2007-10-08 16:08

Oswald

Registered: Apr 2002
Posts: 5126

Sorting

are sorters really so slow in games? :) I have made my unrolled version for my theoretical game (;), and it takes 132 rlines to sort 32 numbers o_O worst case is ~200 lines. tho when its the case of 4 numbers have to be swapped only it does the job in ~10 lines. wastes a lot of memory but I like it :)

... 193 posts hidden. Click here to view all posts....

2017-09-18 12:25

ChristopherJam

Registered: Aug 2004
Posts: 1423

re. SKW etc: Ooh, dangerous! So I think we're up to around 10k for the PHA routines now?

Will have to see if that can be sanely(!) combined with my thinking from today:

I managed to trim two cycles off the populate routine by switching to just using even numbers for the actor IDs (can interleave most of the data tables), and using a section of the stack to store Y values interleaved with actor IDs.

    ldx bucketK+1
    ldy newBranchTableValue,x
    sty bucketK+1
    pla ; grab this actor's index
    sta bucketK+3,y       ;21

    pla ; grab next Yvalue
    sta *+4
    jmp (jumptable)       ;11

(I was going to read the actor ID out of one of the CIA timers while pulling Y value off the stack, but realised that the timer will be counting down while the stack address increases, so had to abandon that idea).

so, 220*3+41*32=1972 cycles, 61.6 per actor, 31.3 raster lines :)

2017-09-18 12:26

ChristopherJam

Registered: Aug 2004
Posts: 1423

Quoting Trash

...1400 cycles...

!!!

2017-09-18 13:27

Rastah Bar
Account closed

Registered: Oct 2012
Posts: 336

Quoting ChristopherJam

I managed to trim two cycles off the populate routine by switching to just using even numbers for the actor IDs (can interleave most of the data tables), and using a section of the stack to store Y values interleaved with actor IDs.
so, 220*3+41*32=1972 cycles, 61.6 per actor, 31.3 raster lines :)

Awesome! Less than 1 raster per actor :-)
I actually thought of that as well, but won't that cost you somewhere else? I mean you have to write updated Y values to the stack in that form and maybe that is more costly than writing them non-interleaved? But perhaps not when you use unrolled code ...

2017-09-18 13:41

Rastah Bar
Account closed

Registered: Oct 2012
Posts: 336

Quote: I dont think this is really exactly what you guys are discussing but I suggest you all to have a look at the sortingroutines used in Time Machine, I have been given an explanation but I don't really understand how it works. I have code that sorts 32 elements just shy of 1400 cycles (if I remember correctly) with a constant speed using the explanation I was given, since it's HCLs I wont share it without his explicit permission but dropping a hint on where to look must be forgiven.

Since it is Time Machine, it must be ahead of us :-)

Very likely I won't understand that code without any thorough explanation, but is it for the same range of Y values (number of buckets)? And can multiple actors have the same Y values?

Maybe HCL wants to join the discussion?

2017-09-18 14:30

Rastah Bar
Account closed

Registered: Oct 2012
Posts: 336

Quoting ChristopherJam

(I was going to read the actor ID out of one of the CIA timers while pulling Y value off the stack, but realised that the timer will be counting down while the stack address increases, so had to abandon that idea).

I remember this post:
SID to calculate line slope
Maybe that could work?

2017-09-18 15:25

lft

Registered: Jul 2007
Posts: 369

Quoting ChristopherJam

I was going to read the actor ID out of one of the CIA timers while pulling Y value off the stack, but realised that the timer will be counting down while the stack address increases, so had to abandon that idea

I haven't been following this discussion in detail, so I don't know if it helps, but: If your routine takes e.g. 31 cycles, then you can mask the timer value by 31, and it will appear to increment.

2017-09-18 16:16

ChristopherJam

Registered: Aug 2004
Posts: 1423

Ok, a few random thoughts.

I see now that the routine in comment #103 takes the same number of cycles as that in #105; it just fetches an actor ID with TYA:DEY instead of PLA, and fetches the next Y coordinate with lda YV,Y instead of PLA.

Even if the Y coordinates are on the stack, you still need the DEY to update the actor index.

I can't see a way to combine them, so suspect I'd lean towards my routine (same speed, 5k less unrolled code, 64 bytes more stack usage, faster performance when many actors are on the one line).

Rather than copying the Y coordinates to the stack, I'd make that their home, and also only write the actor IDs once at startup. You would of course have to be very careful to ensure no interrupts while running that section of code, NMI included

Quoting Color Bar

SID to calculate line slope
Maybe that could work?

Oh god. Quite possibly, but I'd rather avoid hamstringing music/effects.

lft; sadly no cycles left for masking :)

*downloads Time Machine to examine later*

2017-09-18 16:54

Rastah Bar
Account closed

Registered: Oct 2012
Posts: 336

Quoting ChristopherJam

Ok, a few random thoughts.

I see now that the routine in comment #103 takes the same number of cycles as that in #105; it just fetches an actor ID with TYA:DEY instead of PLA, and fetches the next Y coordinate with lda YV,Y instead of PLA.

Even if the Y coordinates are on the stack, you still need the DEY to update the actor index.

I can't see a way to combine them, so suspect I'd lean towards my routine (same speed, 5k less unrolled code, 64 bytes more stack usage, faster performance when many actors are on the one line).

Rather than copying the Y coordinates to the stack, I'd make that their home, and also only write the actor IDs once at startup. You would of course have to be very careful to ensure no interrupts while running that section of code, NMI included

Yes, your approach has some advantages, including not having to assume that the maximum number of actors per bucket is never exceeded. Unfortunately it does not use any illegals though :-(

2017-09-18 18:57

Trash
Account closed

Registered: Jan 2002
Posts: 122

So i examined the tickcounts of the code I've got, with heavy use of zeropage it can sort 32 elements in 1188 cycles, without zp it does the same job in 1448 cycles (unless my calculations are incorrect). The drawback is that every value you want to sort has to be reduced in accuracy in order for it to work (eg two lda, sta in two different tables)...'

*edit*

HCL actually drops a hint for this sorter in the commentsection of Classics

2017-09-18 21:22

Rastah Bar
Account closed

Registered: Oct 2012
Posts: 336

Quote: re. SKW etc: Ooh, dangerous! So I think we're up to around 10k for the PHA routines now?

Will have to see if that can be sanely(!) combined with my thinking from today:

I managed to trim two cycles off the populate routine by switching to just using even numbers for the actor IDs (can interleave most of the data tables), and using a section of the stack to store Y values interleaved with actor IDs.

ldx bucketK+1 ldy newBranchTableValue,x sty bucketK+1 pla ; grab this actor's index sta bucketK+3,y ;21 pla ; grab next Yvalue sta *+4 jmp (jumptable) ;11

(I was going to read the actor ID out of one of the CIA timers while pulling Y value off the stack, but realised that the timer will be counting down while the stack address increases, so had to abandon that idea).

so, 220*3+41*32=1972 cycles, 61.6 per actor, 31.3 raster lines :)

I'm sorry, but I think you missed 2 cycles: PLA / STA *+4 / JMP (jumptable) takes 13 cycles. This means that the approach in post #103 is 2 cycles faster.

Previous - 1 | ... | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | ... | 21 - Next

Refresh

Subscribe to this thread:

You need to be logged in to post in the forum.

Search the forum:
Search for in
All times are CET.

Search CSDb

Advanced

Users Online

Feltzkrone
Matt
megasoftargentina
rambo/Therapy/ Resou..
Guests online: 360

Top Demos

1 Next Level  (9.7)
2 13:37  (9.7)
3 Codeboys & Endians  (9.7)
4 Mojo  (9.6)
5 Coma Light 13  (9.6)
6 Edge of Disgrace  (9.6)
7 Signal Carnival  (9.6)
8 Uncensored  (9.5)
9 Wonderland XIV  (9.5)
10 No Bounds  (9.5)

Top onefile Demos

1 Nine  (9.7)
2 Layers  (9.6)
3 Cubic Dream  (9.6)
4 Party Elk 2  (9.6)
5 Copper Booze  (9.5)
6 Scan and Spin  (9.5)
7 Onscreen 5k  (9.5)
8 Grey  (9.5)
9 Dawnfall V1.1  (9.5)
10 Rainbow Connection  (9.5)

Top Groups

1 Artline Designs  (9.3)
2 Booze Design  (9.3)
3 Oxyron  (9.3)
4 Performers  (9.3)
5 Censor Design  (9.3)

Top Coders

1 Serato  (9.9)
2 Axis  (9.8)
3 Sailor  (9.8)
4 Graham  (9.8)
5 Lft  (9.8)

Page generated in: 0.079 sec.