Log inRegister an accountBrowse CSDbHelp & documentationFacts & StatisticsThe forumsAvailable RSS-feeds on CSDbSupport CSDb Commodore 64 Scene Database
You are not logged in - nap
CSDb User Forums


Forums > C64 Coding > Sorting
2007-10-08 16:08
Oswald

Registered: Apr 2002
Posts: 5086
Sorting

are sorters really so slow in games? :) I have made my unrolled version for my theoretical game (;), and it takes 132 rlines to sort 32 numbers o_O worst case is ~200 lines. tho when its the case of 4 numbers have to be swapped only it does the job in ~10 lines. wastes a lot of memory but I like it :)
 
... 193 posts hidden. Click here to view all posts....
 
2017-09-22 05:06
lft

Registered: Jul 2007
Posts: 369
Fresh, that is true, and you even get to place the INY field on any page you like. That's because the opcode is 0, so it doesn't matter that SHX randomly masks it with the destination address + 1.

I would suggest that you store-absolute the output, perhaps to the zero-page, and self-modify the instruction to increment the pointer. That'd free up X as a scratch register, and you wouldn't have to restore Y from the stack area.

But there are drawbacks with this approach. As you mention, it is a bit slower: 15 cycles per actor, after implementing my suggestion above, so that'd give us a total of 43 rasterlines instead of 35. And you cannot do the sorting in main context while interrupts are running, because you'd lose even more cycles distinguishing between BRK/IRQ. More subtly, if a BRK coincides with an interrupt, the interrupt is lost, although the peripheral will typically keep asserting the (maskable) interrupt until it is eventually serviced.
2017-09-22 07:48
HCL

Registered: Feb 2003
Posts: 727
Oh.. i'm late into the discussion again :P

First, i didn't invent the kind of sorting that is used in TimeMachine (and many other Booze-demos). It seems that i am not very good in inventing things. Perhaps i'm better at looking at other's code, understanding it, and making benefit to my own needs. I think i stole it from some demo from the late-80:s..

The nice thing about it is that i wrote a code-generator for it. Perhaps it was for the 3d-Bob-part in TimeMachine.. When i change 3d-bob-object it generates sorting code for the number of bobs that i currently have. Neither that being rocket science, but it helped making up to 32 bobs update each frame :).
2017-09-22 10:11
Fresh

Registered: Jan 2005
Posts: 101
@lft: perfect analysis, no surprise from you! :)
2017-09-22 19:29
lft

Registered: Jul 2007
Posts: 369
@HCL: Could you comment on how to reduce the number of buckets without losing too much accuracy? Trash is being deliberately enigmatic for fear of breaking a confidence, so we could really use a hint. Or would you rather have us dig into the code and see for ourselves?
2017-09-22 20:17
Trash

Registered: Jan 2002
Posts: 122
Quote: @HCL: Could you comment on how to reduce the number of buckets without losing too much accuracy? Trash is being deliberately enigmatic for fear of breaking a confidence, so we could really use a hint. Or would you rather have us dig into the code and see for ourselves?

I do dare to tell you now when I know he borrowed the solution...

This is for sprites (eight per row):

If you remove the last three bits by using lsr you have your new K-value BUT you have to pair it with the original value. So what you are doing is that you assign your actors a value remove lose accuracy (lsr, lsr, lsr or a LUT (K = max value of that table)) store that value in the table you actually sort.

I might be wrong but this works for me using almost exactly your example code...
2017-09-23 23:38
Copyfault

Registered: Dec 2001
Posts: 474
This Countingsort got me thinking...

So I had a closer look at lft's straight forward implementation given before and remembered
Quoting Trash
Try losing precision on K by the right amount and in the right way and it will improve without losing to much accuracy....

What about doing it as follows:
   ldx #offset
for SPRITE = 0 to n-1
   dcp (ypos(SPRITE),x)  ;operand byte is also memory for ypos
next
   lda #$ff
   ldx #$00
for LINE = 0 to 2*(floor((k/2)-2)) STEP 2
   stx start+LINE
   sbx #$00              ;operand byte is memory for the count-array
next
   stx start+2*(floor((k/2)-1))
for SPRITE = 0 to n-1
   ldx ypos(SPRITE)
   ldy start,x
   lda #SPRITE
   sta output,y
   inc start,x
   lda #0
   sta (offset,x)
next

Remarks:
- all loops have to be unrolled
- instead of incrementing the count-array entries for every occurance of the corresponding LINE-value they are decreased, giving [$00-#occurances of LINE] in each array position count[LINE]. But as the adc-instruction in the 2nd loop is also substituted by an sbx-command, this leads to the same result without having to care about the state of carry.
- 2nd byte of each dec-cmp-instruction suits also as memory where the ypos-val for the corresponding sprite-number is stored, i.e.
   ...
Sprite0_ypos = *+1
   dcp (ypos(Sprite0),x)
Sprite1_ypos = *+1
   dcp (ypos(Sprite1),x)
   ...
- only evenly spaced values for the ypositions admitted, i.e. consecutive values must have distance=2
- each ypos corresponds to a pointer in zp pointing to the corresponding operand byte of the sbx#$..-instructions. The 2nd loop would look like
   lda #$ff
   ldx #$00
   stx start+0
count_ypos0 = *+1
   sbx #$00
   stx start+2
count_ypos2 = *+1
   sbx #$00
   stx start+4
   ...
with the corresponding zp part
ypos0 .byte <count_ypos0, >count_ypos0
ypos2 .byte <count_ypos2, >count_ypos2
...

- "offset" is introduced to be able to keep away from the $00/$01-adresses


Beside the ram usage for the start- and output-table the given approach needs a fair amount of zp-memory (one vector per possible sprite-ypos!). This most probably makes it not very usable for real in-game use; what's more: due to the fact that a vector pointer always consists of two consecutive bytes only even-numbered y-positions are handled (alternatively only odd-numbered, just one type...). This could be seen as "loosing precision on the ypos-values";)

On the positive side, counting cylces the approach gets 2 + n*8 + 4 + ((k/2)-1)*6 + 4 + n*29, so for n=32 and k=220 the routine needs 1848 cycles in total -> 29+1/3 rasterlines.
RAM-usage is at $2 + n*$2 + $4 + ((k/2)-1)*$5 + $3 + n*$12 = $4aa
(plus 220 bytes in zp plus 110 bytes for the "starting" table plus 32 bytes for "output")

Mind that it would be unfair to compare this to the cycle count lft gave for the direct approach as the count-array has been halved. But a notable gain in cycle consumption comes from cutting the inner part of the 2nd loop down to 6 cycles which would also make a full loop over 220 array entries considerably faster. Maybe there's a clever way to handle full precision on the ypos-values by appropriately pre-conditioning the ypos-values plus doing a sensible correction loop afterwards... remarks and additions highly welcome;)
2017-09-24 08:58
Rastah Bar
Account closed

Registered: Oct 2012
Posts: 336
Quote: The sorting algorithm implemented by HCL is actually called Counting Sort (I just found that out..) and it is considered to be O(n).

It should be competive to the discussed code both in size and speed.


Thanks! I found a short article about Counting Sort in C=hacking Issue #18:
http://www.ffd2.com/fridge/chacking/c=hacking18.txt
2017-09-24 10:00
Rastah Bar
Account closed

Registered: Oct 2012
Posts: 336
Copyfault, that is a very interesting approach. But to halve the Y position resolution an AND #$fe has to be applied, which costs an additional 2 cycles per actor. And if you want to preserve the full resolution values as well (for use in the Y updating routine for example), you will have to store them somewhere, adding more to the cycle count.
2017-09-24 15:54
Copyfault

Registered: Dec 2001
Posts: 474
Quote: Copyfault, that is a very interesting approach. But to halve the Y position resolution an AND #$fe has to be applied, which costs an additional 2 cycles per actor. And if you want to preserve the full resolution values as well (for use in the Y updating routine for example), you will have to store them somewhere, adding more to the cycle count.

Ofcourse this works only if the game concept allows for all objects moving with suitable speed (all values hsve to be a multiple of two).

Otherwise you're right, all the necessary additional position handling will most probably eat up too much cycles. What's the current best case cycle count??
2017-09-24 16:22
Rastah Bar
Account closed

Registered: Oct 2012
Posts: 336
See posts #91 and #101. It is about 3*k+43*n.
For lft's Field Sort (post #126) it is 2*k+55*n.
Previous - 1 | ... | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 - Next
RefreshSubscribe to this thread:

You need to be logged in to post in the forum.

Search the forum:
Search   for   in  
All times are CET.
Search CSDb
Advanced
Users Online
Didi/Laxity
sln.pixelrat
iceout/Avatar/HF
mutetus/Ald ^ Ons
Frost/Triad
Sychamis
bugjam
soci/Singular
Airwolf/F4CG
redcrab/G★P
hedning/G★P
DuncanTwain
Acidchild/Padua
Guests online: 111
Top Demos
1 Next Level  (9.7)
2 13:37  (9.7)
3 Coma Light 13  (9.7)
4 Edge of Disgrace  (9.6)
5 Mojo  (9.6)
6 Uncensored  (9.6)
7 Wonderland XIV  (9.6)
8 Comaland 100%  (9.6)
9 No Bounds  (9.6)
10 Christmas Megademo  (9.5)
Top onefile Demos
1 Layers  (9.6)
2 Party Elk 2  (9.6)
3 Cubic Dream  (9.6)
4 Copper Booze  (9.6)
5 Libertongo  (9.5)
6 Rainbow Connection  (9.5)
7 Onscreen 5k  (9.5)
8 Morph  (9.5)
9 Dawnfall V1.1  (9.5)
10 It's More Fun to Com..  (9.5)
Top Groups
1 Performers  (9.3)
2 Booze Design  (9.3)
3 Oxyron  (9.3)
4 Nostalgia  (9.3)
5 Censor Design  (9.3)
Top Diskmag Editors
1 Magic  (9.8)
2 hedning  (9.6)
3 Jazzcat  (9.5)
4 Elwix  (9.1)
5 Remix  (9.1)

Home - Disclaimer
Copyright © No Name 2001-2024
Page generated in: 0.05 sec.