[CSDb] - User Forums

You are not logged in - nap

CSDb User Forums

Forums > C64 Coding > Sorting

2007-10-08 16:08

Oswald

Registered: Apr 2002
Posts: 5086

Sorting

are sorters really so slow in games? :) I have made my unrolled version for my theoretical game (;), and it takes 132 rlines to sort 32 numbers o_O worst case is ~200 lines. tho when its the case of 4 numbers have to be swapped only it does the job in ~10 lines. wastes a lot of memory but I like it :)

... 193 posts hidden. Click here to view all posts....

2017-09-21 05:28

lft

Registered: Jul 2007
Posts: 369

Quoting Trash

The sorting algorithm implemented by HCL is actually called Counting Sort (I just found that out..) and it is considered to be O(n).

It should be competive to the discussed code both in size and speed.

Thanks for the link! That does indeed seem useful.

Here is a straighforward implementation:

for each actor i
   ldx ypos+i
   inc count,x

   lda #0
   clc
for each count k
   sta start+k
   adc count+k

for each actor i
   ldx ypos+i
   ldy start,x
   lda #i
   sta output,y
   inc start,x
   lda #0
   sta count,x

Total number of cycles is 10*i + 4 + 8*k + 28*i, so for our benchmark case with 32 actors and 220 y-positions that would be 2980 cycles, or 47.3 rasterlines. So the above implementation is not competitive.

So I suppose the game is on to try to improve it!

2017-09-21 06:21

Trash

Registered: Jan 2002
Posts: 122

Quoting lft

So I suppose the game is on to try to improve it!

Try losing precision on K by the right amount and in the right way and it will improve without losing to much accuracy....

2017-09-21 21:55

Fresh

Registered: Jan 2005
Posts: 101

A less "aggressive" approach to lft's solution can be obtained by using brk instead of jmp.
It's clearly a bit slower but the emptying routine is only one and be put everywhere: a bit less messy and more "game-friendly".

irq:
	sty $ff+stack	// Fix stack for RTI
	lda buckethead,y
irqloop:	
	sta output,x
	tay
	inx
	lda list,y
	bpl irqloop
	ldy $ff+stack
	sta table,y
	sta sectable,y
	rti

2017-09-22 05:06

lft

Registered: Jul 2007
Posts: 369

Fresh, that is true, and you even get to place the INY field on any page you like. That's because the opcode is 0, so it doesn't matter that SHX randomly masks it with the destination address + 1.

I would suggest that you store-absolute the output, perhaps to the zero-page, and self-modify the instruction to increment the pointer. That'd free up X as a scratch register, and you wouldn't have to restore Y from the stack area.

But there are drawbacks with this approach. As you mention, it is a bit slower: 15 cycles per actor, after implementing my suggestion above, so that'd give us a total of 43 rasterlines instead of 35. And you cannot do the sorting in main context while interrupts are running, because you'd lose even more cycles distinguishing between BRK/IRQ. More subtly, if a BRK coincides with an interrupt, the interrupt is lost, although the peripheral will typically keep asserting the (maskable) interrupt until it is eventually serviced.

2017-09-22 07:48

HCL

Registered: Feb 2003
Posts: 727

Oh.. i'm late into the discussion again :P

First, i didn't invent the kind of sorting that is used in TimeMachine (and many other Booze-demos). It seems that i am not very good in inventing things. Perhaps i'm better at looking at other's code, understanding it, and making benefit to my own needs. I think i stole it from some demo from the late-80:s..

The nice thing about it is that i wrote a code-generator for it. Perhaps it was for the 3d-Bob-part in TimeMachine.. When i change 3d-bob-object it generates sorting code for the number of bobs that i currently have. Neither that being rocket science, but it helped making up to 32 bobs update each frame :).

2017-09-22 10:11

Fresh

Registered: Jan 2005
Posts: 101

@lft: perfect analysis, no surprise from you! :)

2017-09-22 19:29

lft

Registered: Jul 2007
Posts: 369

@HCL: Could you comment on how to reduce the number of buckets without losing too much accuracy? Trash is being deliberately enigmatic for fear of breaking a confidence, so we could really use a hint. Or would you rather have us dig into the code and see for ourselves?

2017-09-22 20:17

Trash

Registered: Jan 2002
Posts: 122

Quote: @HCL: Could you comment on how to reduce the number of buckets without losing too much accuracy? Trash is being deliberately enigmatic for fear of breaking a confidence, so we could really use a hint. Or would you rather have us dig into the code and see for ourselves?

I do dare to tell you now when I know he borrowed the solution...

This is for sprites (eight per row):

If you remove the last three bits by using lsr you have your new K-value BUT you have to pair it with the original value. So what you are doing is that you assign your actors a value remove lose accuracy (lsr, lsr, lsr or a LUT (K = max value of that table)) store that value in the table you actually sort.

I might be wrong but this works for me using almost exactly your example code...

2017-09-23 23:38

Copyfault

Registered: Dec 2001
Posts: 474

This Countingsort got me thinking...

So I had a closer look at lft's straight forward implementation given before and remembered
Quoting Trash

Try losing precision on K by the right amount and in the right way and it will improve without losing to much accuracy....

What about doing it as follows:

   ldx #offset
for SPRITE = 0 to n-1
   dcp (ypos(SPRITE),x)  ;operand byte is also memory for ypos
next
   lda #$ff
   ldx #$00
for LINE = 0 to 2*(floor((k/2)-2)) STEP 2
   stx start+LINE
   sbx #$00              ;operand byte is memory for the count-array
next
   stx start+2*(floor((k/2)-1))
for SPRITE = 0 to n-1
   ldx ypos(SPRITE)
   ldy start,x
   lda #SPRITE
   sta output,y
   inc start,x
   lda #0
   sta (offset,x)
next

Remarks:
- all loops have to be unrolled
- instead of incrementing the count-array entries for every occurance of the corresponding LINE-value they are decreased, giving [$00-#occurances of LINE] in each array position count[LINE]. But as the adc-instruction in the 2nd loop is also substituted by an sbx-command, this leads to the same result without having to care about the state of carry.
- 2nd byte of each dec-cmp-instruction suits also as memory where the ypos-val for the corresponding sprite-number is stored, i.e.

   ...
Sprite0_ypos = *+1
   dcp (ypos(Sprite0),x)
Sprite1_ypos = *+1
   dcp (ypos(Sprite1),x)
   ...

- only evenly spaced values for the ypositions admitted, i.e. consecutive values must have distance=2
- each ypos corresponds to a pointer in zp pointing to the corresponding operand byte of the sbx#$..-instructions. The 2nd loop would look like

   lda #$ff
   ldx #$00
   stx start+0
count_ypos0 = *+1
   sbx #$00
   stx start+2
count_ypos2 = *+1
   sbx #$00
   stx start+4
   ...

with the corresponding zp part

ypos0 .byte <count_ypos0, >count_ypos0
ypos2 .byte <count_ypos2, >count_ypos2
...

- "offset" is introduced to be able to keep away from the $00/$01-adresses

Beside the ram usage for the start- and output-table the given approach needs a fair amount of zp-memory (one vector per possible sprite-ypos!). This most probably makes it not very usable for real in-game use; what's more: due to the fact that a vector pointer always consists of two consecutive bytes only even-numbered y-positions are handled (alternatively only odd-numbered, just one type...). This could be seen as "loosing precision on the ypos-values";)

On the positive side, counting cylces the approach gets 2 + n*8 + 4 + ((k/2)-1)*6 + 4 + n*29, so for n=32 and k=220 the routine needs 1848 cycles in total -> 29+1/3 rasterlines.
RAM-usage is at $2 + n*$2 + $4 + ((k/2)-1)*$5 + $3 + n*$12 = $4aa
(plus 220 bytes in zp plus 110 bytes for the "starting" table plus 32 bytes for "output")

Mind that it would be unfair to compare this to the cycle count lft gave for the direct approach as the count-array has been halved. But a notable gain in cycle consumption comes from cutting the inner part of the 2nd loop down to 6 cycles which would also make a full loop over 220 array entries considerably faster. Maybe there's a clever way to handle full precision on the ypos-values by appropriately pre-conditioning the ypos-values plus doing a sensible correction loop afterwards... remarks and additions highly welcome;)

2017-09-24 08:58

Rastah Bar
Account closed

Registered: Oct 2012
Posts: 336

Quote: The sorting algorithm implemented by HCL is actually called Counting Sort (I just found that out..) and it is considered to be O(n).

It should be competive to the discussed code both in size and speed.

Thanks! I found a short article about Counting Sort in C=hacking Issue #18:
http://www.ffd2.com/fridge/chacking/c=hacking18.txt

Previous - 1 | ... | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | ... | 21 - Next

Refresh

Subscribe to this thread:

You need to be logged in to post in the forum.

Search the forum:
Search for in
All times are CET.

Search CSDb

Advanced

Users Online

Gregfeel/Lepsi De, S..
t0m3000/hf^boom!^ibx
Alias Medron/Padua
serato/Finnish Gold
Sychamis
The Syndrom/TIA/Pret..
MWR/Visdom
Adder/Triad
Copyfault/TOM/tsn
WVL/Xenon
nucleus/TempesT
mutetus/Ald ^ Ons
Electric/Extend
Freeze/Blazon
CopAss/Leader
rambo/Therapy/ Resou..
Hairdog/BOOM!^Dream
da Blondie/Resource
CA$H/TRiAD
haschpipan
zzarko/Avatar
Falcon soft
Jammer
Guests online: 136

Top Demos

1 Next Level  (9.7)
2 13:37  (9.7)
3 Coma Light 13  (9.7)
4 Edge of Disgrace  (9.6)
5 Mojo  (9.6)
6 Uncensored  (9.6)
7 Wonderland XIV  (9.6)
8 Comaland 100%  (9.6)
9 No Bounds  (9.6)
10 Christmas Megademo  (9.5)

Top onefile Demos

1 Layers  (9.6)
2 Party Elk 2  (9.6)
3 Cubic Dream  (9.6)
4 Copper Booze  (9.6)
5 Libertongo  (9.5)
6 Rainbow Connection  (9.5)
7 Onscreen 5k  (9.5)
8 Morph  (9.5)
9 Dawnfall V1.1  (9.5)
10 It's More Fun to Com..  (9.5)

Top Groups

1 Performers  (9.3)
2 Booze Design  (9.3)
3 Oxyron  (9.3)
4 Nostalgia  (9.3)
5 Censor Design  (9.3)

Top Webmasters

1 Slaygon  (9.6)
2 Perff  (9.6)
3 Morpheus  (9.5)
4 Sabbi  (9.5)
5 CreaMD  (9.1)

Page generated in: 0.046 sec.