CSDb User Forums


2007-10-08 16:08
Oswald

Registered: Apr 2002
Posts: 5007
Sorting

Are sorters really so slow in games? :) I have made an unrolled version for my theoretical game (;), and it takes 132 raster lines to sort 32 numbers o_O. Worst case is ~200 lines, though when only 4 numbers have to be swapped it does the job in ~10 lines. It wastes a lot of memory, but I like it :)
 
... 193 posts hidden ...
 
2017-09-21 03:27
ChristopherJam

Registered: Aug 2004
Posts: 1359
…or then there was Alien Syndrome, where you could never have more than one alien on any given raster 😂
2017-09-21 05:28
lft

Registered: Jul 2007
Posts: 369
Quoting Trash
The sorting algorithm implemented by HCL is actually called Counting Sort (I just found that out..) and it is considered to be O(n).

It should be competitive with the discussed code in both size and speed.


Thanks for the link! That does indeed seem useful.

Here is a straightforward implementation:

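; pass 1: count how many actors sit on each y-position
for each actor i
   ldx ypos+i
   inc count,x

; pass 2: a running sum turns the counts into start offsets
   lda #0
   clc
for each count k
   sta start+k
   adc count+k

; pass 3: place each actor, then clear its count for the next run
for each actor i
   ldx ypos+i
   ldy start,x
   lda #i
   sta output,y
   inc start,x
   lda #0
   sta count,x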


Total number of cycles is 10*n + 4 + 8*k + 28*n for n actors and k y-positions, so for our benchmark case with 32 actors and 220 y-positions that would be 2980 cycles, or 47.3 rasterlines. So the above implementation is not competitive.
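For readers who want to check the logic off-target, here is a minimal Python sketch of the same three passes, together with the cycle formula from the post. The function names are hypothetical, and the figure of 63 cycles per rasterline is an assumption (a PAL machine) that happens to reproduce the 47.3 quoted above.

```python
def counting_sort_actors(ypos, k):
    """Return actor indices ordered by y-position, using k buckets."""
    count = [0] * k
    # Pass 1: histogram of y-positions (the "inc count,x" loop).
    for y in ypos:
        count[y] += 1
    # Pass 2: a running total gives each y-position its first output slot
    # (the "sta start+k / adc count+k" loop).
    start = [0] * k
    total = 0
    for y in range(k):
        start[y] = total
        total += count[y]
    # Pass 3: place each actor and advance the slot of its bucket.
    output = [0] * len(ypos)
    for i, y in enumerate(ypos):
        output[start[y]] = i
        start[y] += 1
    return output


def cycle_estimate(n, k):
    # Cost model quoted in the post: 10 per actor, 4 setup,
    # 8 per y-position, 28 per actor.
    return 10 * n + 4 + 8 * k + 28 * n


CYCLES_PER_LINE = 63  # assumption: PAL timing

order = counting_sort_actors([50, 10, 50, 3], 220)
cycles = cycle_estimate(32, 220)
lines = cycles / CYCLES_PER_LINE
```

Actors that share a y-position keep their original relative order, since pass 3 walks the actors in index order.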

So I suppose the game is on to try to improve it!
2017-09-21 06:21
Trash

Registered: Jan 2002
Posts: 122
Quoting lft
So I suppose the game is on to try to improve it!


Try losing precision on K by the right amount and in the right way, and it will improve without losing too much accuracy....
2017-09-21 21:55
Fresh

Registered: Jan 2005
Posts: 101
A less "aggressive" approach to lft's solution can be obtained by using brk instead of jmp.
It's clearly a bit slower, but there is only one emptying routine and it can be placed anywhere: a bit less messy and more "game-friendly".

irq:
	sty $ff+stack	// Fix stack for RTI
	lda buckethead,y
irqloop:	
	sta output,x
	tay
	inx
	lda list,y
	bpl irqloop
	ldy $ff+stack
	sta table,y
	sta sectable,y
	rti	
2017-09-22 05:06
lft

Registered: Jul 2007
Posts: 369
Fresh, that is true, and you even get to place the INY field on any page you like. That's because the opcode is 0, so it doesn't matter that SHX randomly masks it with the destination address + 1.

I would suggest that you store-absolute the output, perhaps to the zero-page, and self-modify the instruction to increment the pointer. That'd free up X as a scratch register, and you wouldn't have to restore Y from the stack area.

But there are drawbacks with this approach. As you mention, it is a bit slower: 15 cycles per actor, after implementing my suggestion above, so that'd give us a total of 43 rasterlines instead of 35. And you cannot do the sorting in main context while interrupts are running, because you'd lose even more cycles distinguishing between BRK/IRQ. More subtly, if a BRK coincides with an interrupt, the interrupt is lost, although the peripheral will typically keep asserting the (maskable) interrupt until it is eventually serviced.
2017-09-22 07:48
HCL

Registered: Feb 2003
Posts: 716
Oh.. i'm late into the discussion again :P

First, i didn't invent the kind of sorting that is used in TimeMachine (and many other Booze-demos). It seems that i am not very good at inventing things. Perhaps i'm better at looking at others' code, understanding it, and adapting it to my own needs. I think i stole it from some demo from the late 80s..

The nice thing about it is that i wrote a code-generator for it. Perhaps it was for the 3d-Bob-part in TimeMachine.. When i change the 3d-bob-object it generates sorting code for the number of bobs that i currently have. That's not rocket science either, but it helped make it possible to update up to 32 bobs each frame :).
2017-09-22 10:11
Fresh

Registered: Jan 2005
Posts: 101
@lft: perfect analysis, no surprise from you! :)
2017-09-22 19:29
lft

Registered: Jul 2007
Posts: 369
@HCL: Could you comment on how to reduce the number of buckets without losing too much accuracy? Trash is being deliberately enigmatic for fear of breaking a confidence, so we could really use a hint. Or would you rather have us dig into the code and see for ourselves?
2017-09-22 20:17
Trash

Registered: Jan 2002
Posts: 122
Quote: @HCL: Could you comment on how to reduce the number of buckets without losing too much accuracy? Trash is being deliberately enigmatic for fear of breaking a confidence, so we could really use a hint. Or would you rather have us dig into the code and see for ourselves?

I dare tell you, now that I know he borrowed the solution...

This is for sprites (eight per row):

If you remove the last three bits by using lsr you have your new K-value, BUT you have to pair it with the original value. So what you do is: take each actor's value, reduce its accuracy (lsr, lsr, lsr, or a LUT, where K = the max value of that table), and store the reduced value in the table you actually sort.

I might be wrong but this works for me using almost exactly your example code...
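One possible reading of this hint, sketched in Python: sort on a coarse key (y shifted right by three bits) while the full y-position stays with the actor. The function name and the `shift` parameter are hypothetical, and this is only an interpretation of the description above, not the code from the demo.

```python
def coarse_counting_sort(ypos, shift=3):
    """Counting sort on y >> shift; returns actor indices in bucket order."""
    keys = [y >> shift for y in ypos]      # the lsr, lsr, lsr step
    k = max(keys) + 1                      # reduced number of buckets
    start = [0] * (k + 1)
    for key in keys:
        start[key + 1] += 1                # histogram, offset by one slot
    for b in range(k):
        start[b + 1] += start[b]           # start[b] = number of keys below b
    output = [0] * len(ypos)
    for i, key in enumerate(keys):
        output[start[key]] = i             # the index still refers to ypos[i]
        start[key] += 1
    return output


ypos = [100, 20, 97, 21, 200]
order = coarse_counting_sort(ypos)
```

With 220 y-positions and a shift of 3 the second pass covers 28 buckets instead of 220. The cost is that actors within the same 8-line bucket come out in index order, not y order: above, actor 0 (y=100) precedes actor 2 (y=97).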
2017-09-23 23:38
Copyfault

Registered: Dec 2001
Posts: 466
This Counting Sort got me thinking...

So I had a closer look at lft's straightforward implementation given before and remembered
Quoting Trash
Try losing precision on K by the right amount and in the right way and it will improve without losing too much accuracy....

What about doing it as follows:
   ldx #offset
for SPRITE = 0 to n-1
   dcp (ypos(SPRITE),x)  ;operand byte is also memory for ypos
next
   lda #$ff
   ldx #$00
for LINE = 0 to 2*(floor((k/2)-2)) STEP 2
   stx start+LINE
   sbx #$00              ;operand byte is memory for the count-array
next
   stx start+2*(floor((k/2)-1))
for SPRITE = 0 to n-1
   ldx ypos(SPRITE)
   ldy start,x
   lda #SPRITE
   sta output,y
   inc start,x
   lda #0
   sta (offset,x)
next

Remarks:
- all loops have to be unrolled
- instead of incrementing the count-array entries for every occurrence of the corresponding LINE-value, they are decremented, giving [$00-#occurrences of LINE] in each array position count[LINE]. But as the adc-instruction in the 2nd loop is also replaced by an sbx-instruction, this leads to the same result without having to care about the state of carry.
- 2nd byte of each dec-cmp-instruction suits also as memory where the ypos-val for the corresponding sprite-number is stored, i.e.
   ...
Sprite0_ypos = *+1
   dcp (ypos(Sprite0),x)
Sprite1_ypos = *+1
   dcp (ypos(Sprite1),x)
   ...
- only evenly spaced values for the y-positions are allowed, i.e. consecutive values must be 2 apart
- each ypos corresponds to a pointer in zp pointing to the corresponding operand byte of the sbx#$..-instructions. The 2nd loop would look like
   lda #$ff
   ldx #$00
   stx start+0
count_ypos0 = *+1
   sbx #$00
   stx start+2
count_ypos2 = *+1
   sbx #$00
   stx start+4
   ...
with the corresponding zp part
ypos0 .byte <count_ypos0, >count_ypos0
ypos2 .byte <count_ypos2, >count_ypos2
...

- "offset" is introduced to be able to keep away from the $00/$01-addresses
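The decrement-then-subtract idea from the remarks can be checked with a small Python model. It assumes the usual description of the undocumented SBX #imm opcode, X = (A AND X) - imm modulo 256, and uses hypothetical names; the zero-page vectors and self-modifying operand bytes are replaced by a plain dictionary.

```python
def sbx(a, x, imm):
    """Model of SBX #imm: X = (A AND X) - imm, modulo 256."""
    return ((a & x) - imm) & 0xFF


def starts_via_negated_counts(ypos, k):
    # One count byte per even y-position (odd positions have no slot).
    neg_count = {line: 0 for line in range(0, k, 2)}
    for y in ypos:
        # DCP decrements the operand byte, leaving $00 - occurrences.
        neg_count[y] = (neg_count[y] - 1) & 0xFF
    a, x = 0xFF, 0x00                      # lda #$ff / ldx #$00
    start = {}
    for line in range(0, k, 2):
        start[line] = x                    # stx start+LINE
        x = sbx(a, x, neg_count[line])     # subtracting -count adds count
    return start


start = starts_via_negated_counts([4, 0, 4, 2], 8)
```

Because A is $ff, the AND leaves X unchanged, so each SBX simply adds the number of sprites on that line, and the carry flag never enters the calculation.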


Besides the RAM usage for the start- and output-table, the given approach needs a fair amount of zp-memory (one vector per possible sprite-ypos!). This most probably makes it not very usable for real in-game use. What's more, because a vector always consists of two consecutive bytes, only even-numbered y-positions are handled (or alternatively only odd-numbered ones, but just one type...). This could be seen as "losing precision on the ypos-values";)

On the positive side, counting cycles, the approach needs 2 + n*8 + 4 + ((k/2)-1)*6 + 4 + n*29, so for n=32 and k=220 the routine needs 1848 cycles in total -> 29+1/3 rasterlines.
RAM-usage is at $2 + n*$2 + $4 + ((k/2)-1)*$5 + $3 + n*$12 = $4aa
(plus 220 bytes in zp plus 110 bytes for the "starting" table plus 32 bytes for "output")
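As a worked check of the two totals above, here is the arithmetic in Python. The function names are hypothetical, and the 63 cycles per rasterline is an assumption (PAL timing) that matches the 29+1/3 figure.

```python
def copyfault_cycles(n, k):
    # 2 setup, 8 per sprite, 4 setup, 6 per count entry, 4, 29 per sprite
    return 2 + n * 8 + 4 + (k // 2 - 1) * 6 + 4 + n * 29


def copyfault_code_bytes(n, k):
    # Same structure, with the byte sizes given in the post (hex values).
    return 0x2 + n * 0x2 + 0x4 + (k // 2 - 1) * 0x5 + 0x3 + n * 0x12


CYCLES_PER_LINE = 63  # assumption: PAL timing

cycles = copyfault_cycles(32, 220)
code_bytes = copyfault_code_bytes(32, 220)
lines = cycles / CYCLES_PER_LINE
```

For n=32 and k=220 this gives 1848 cycles and $4aa (1194) bytes of code, before the zero-page vectors and the start and output tables are added.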

Mind that it would be unfair to compare this to the cycle count lft gave for the direct approach as the count-array has been halved. But a notable gain in cycle consumption comes from cutting the inner part of the 2nd loop down to 6 cycles which would also make a full loop over 220 array entries considerably faster. Maybe there's a clever way to handle full precision on the ypos-values by appropriately pre-conditioning the ypos-values plus doing a sensible correction loop afterwards... remarks and additions highly welcome;)
All times are CET.