Log inRegister an accountBrowse CSDbHelp & documentationFacts & StatisticsThe forumsAvailable RSS-feeds on CSDbSupport CSDb Commodore 64 Scene Database
You are not logged in - nap
CSDb User Forums


Forums > C64 Coding > Sorting
2007-10-08 16:08
Oswald

Registered: Apr 2002
Posts: 5017
Sorting

are sorters really so slow in games? :) I have made my unrolled version for my theoretical game (;), and it takes 132 rlines to sort 32 numbers o_O worst case is ~200 lines. tho when its the case of 4 numbers have to be swapped only it does the job in ~10 lines. wastes a lot of memory but I like it :)
 
... 193 posts hidden. Click here to view all posts....
 
2017-09-10 22:29
Rastah Bar

Registered: Oct 2012
Posts: 336
Quote: Color Bar, that is indeed a fast way to get a sorted list of Y values, but what is needed is a sorted list of the indices of the sprites that have those Y positions.

(eg if sprites 1, 2, 3 and 4 are at positions 10, 46, 73 and 32, the required output is 1,4,2,3, not 10,32,46,73 )


I think that can be done in less than 37 rasterlines.

Consider the following unrolled code for outputting sprite indices, sorted according to Y-value (phase 2)

.bucketK
bcc .nextBucket		;Branches always: decrease this value by 3 with selfmodifying code in phase 1 for every new sprite in the bucket.
lda #sprite_index	;index of last sprite
pha		
...	
lda #sprite_index  	;index of first sprite.
pha		
sty .bucketK+1	;Restore branch (to skip all sprites for empty buckets). Call with y=$1b for max 8 sprites per bucket. 

.nextBucket	;unrolled
-------------
3 cycles for an empty bucket, 12 voor buckets with 1 sprite => 3*(220-32)+12*32 = 948 cycli 


Phase1: The following unrolled code jumps to the code that modifies the above phase 2 code for the right bucket:
dex   ;counts sprite index. Initialized with $20.
bmi .phase2
lda Yvalues,x   
asl
bcs *+8   ;There are more than 128 buckets, so I use 2 jump tables
sta *+4
jmp ($xxxx) ;jump table for buckets <128	;jumps to code that fills out phase2 Bucket code (see below)
sta *+4
jmp ($xxxx) ;jump table for buckets >=128	;21/22 cycles sofar
.phase2
clc	;to branch always (either to next bucket or to fetch the sprite indices in the current bucket).
ldy #27		;to restore the branch value of the bucket
jmp .Bucket0
           

;code to fill out bucketK
ldy .bucketK+1   ;The trick is that I use this as an index for the number of sprites already in the bucket
lda newBranchValue,y		;fetch value to branch to the code for one more sprite 
sta .bucketK+1   ;since this is unrolled code, the address of .bucketK is known
tay
txa
sta .bucketK+3,y  ;Store new sprite index. This works since the branch value decreases by 3 for each new sprite, but the address where to store the sprite index as well!
...	;Unrolled => repeat all the above code starting with dex.
--------------
41/42 cycles ->41.5*32 = 1328

In total 1328+948 =2276 -> 36.1 lines.
But there will be at least 25 additional cycles from page boundary crossings. Slightly less than 37 lines I think if I did not overlook something.

Table with branch values (for max 8 sprites per bucket) looks like (x=don't care)
newBranchValue
 BYTE 0 x x 0 x x 3 x x 6 x x 9 x x 12 x x 15 x x 18 x x x x x 21 x 

The first zero makes that a full bucket replaces the last sprite with the new one.
2017-09-12 09:28
ChristopherJam

Registered: Aug 2004
Posts: 1377
Excellent work, Color Bar.

If every sprite lands in the second jump table, that pushes you up to 42 cycles for every insertion, but total time in that case is still only
3*(220-32)+12*32 + 42*32 = 2292 cycles, or 36.4 raster lines.

The 25 additional cycles from page boundary crossings you mention can be avoided by padding the output routine out to 32 bytes by adding three bytes of padding after the BCC .nextBucket, and starting bucket0 at $xxff

It takes up another 660 bytes, but the routine is already occupying most of a bank as it is, so that's small change at this point ;)
2017-09-12 10:36
ChristopherJam

Registered: Aug 2004
Posts: 1377
Should be able to save about 1760 bytes by only including the last 12 bytes of the insertion code in every third routine; the preceding and succeeding routine for each complete routine can just jump to the nearest complete one (the part from the second sta *+4 onward).

Timing's unchanged, but drags it back down to around 14k including jump tables.
2017-09-12 11:48
Rastah Bar

Registered: Oct 2012
Posts: 336
Quoting ChristopherJam
Excellent work, Color Bar.
The 25 additional cycles from page boundary crossings you mention can be avoided by padding the output routine out to 32 bytes by adding three bytes of padding after the BCC .nextBucket, and starting bucket0 at $xxff

It takes up another 660 bytes, but the routine is already occupying most of a bank as it is, so that's small change at this point ;)

Those 3 padding bytes can be put to good use by allowing 9 sprites in a bucket.

The routine probably uses too much memory for many applications.
2017-09-12 11:49
Rastah Bar

Registered: Oct 2012
Posts: 336
Quoting ChristopherJam
Should be able to save about 1760 bytes by only including the last 12 bytes of the insertion code in every third routine; the preceding and succeeding routine for each complete routine can just jump to the nearest complete one (the part from the second sta *+4 onward).

Not sure I understand this. You need a separate fill routine for every possible Y-value/bucket.
2017-09-12 12:10
Rastah Bar

Registered: Oct 2012
Posts: 336
Btw, if you just want to sort 32 random numbers, the phase 2 code could be modified into
.bucketK
bcc .nextBucket
lda #K
pha
lda #K
pha
...
pha
sty .BucketK+1

and the fill code can be limited to
ldy .bucketK+1 
lda newBranchValue,y		
sta .bucketK+1


Resulting in 3*(220-32)+12*32 + 33.5*32 -> 32 lines. Put perhaps that application requires 256 buckets and then it takes 33.8 lines.
2017-09-12 16:19
ChristopherJam

Registered: Aug 2004
Posts: 1377
Quoting Color Bar
Not sure I understand this. You need a separate fill routine for every possible Y-value/bucket.


True, but every one of the 220 routines ends with
.hitableN
  sta *+4
  jmp ($xxxx) ;jump table for buckets >=128
.phase2N
  clc	                ;to branch always.
  ldy #27	   	;to restore the branch value of the bucket
  jmp .Bucket0


You can leave that tail off two thirds of them, eg only keeping it for buckets 1,4,7,10 etc.
The routine for bucket 0 would replace the bcs *+8 with a branch to the hitable lable in the the routine for bucket 1, as would the routine for bucket 2
2017-09-12 16:23
ChristopherJam

Registered: Aug 2004
Posts: 1377
Quoting Color Bar
The routine probably uses too much memory for many applications.


A quarter of all RAM? Yes, that's probably a tad excessive in practice :)

It'd be interesting to have a map of best worst case routine for different numbers of actors, and different limits on memory use... could even have CPU time as altitude…
2017-09-12 17:30
Rastah Bar

Registered: Oct 2012
Posts: 336
Quoting ChristopherJam
Quoting Color Bar
Not sure I understand this. You need a separate fill routine for every possible Y-value/bucket.


True, but every one of the 220 routines ends with
.hitableN
  sta *+4
  jmp ($xxxx) ;jump table for buckets >=128
.phase2N
  clc	                ;to branch always.
  ldy #27	   	;to restore the branch value of the bucket
  jmp .Bucket0


You can leave that tail off two thirds of them, eg only keeping it for buckets 1,4,7,10 etc.
The routine for bucket 0 would replace the bcs *+8 with a branch to the hitable lable in the the routine for bucket 1, as would the routine for bucket 2


Yes, I understand it now. Thanks!
2017-09-12 17:41
Rastah Bar

Registered: Oct 2012
Posts: 336
Quote: Quoting Color Bar
The routine probably uses too much memory for many applications.


A quarter of all RAM? Yes, that's probably a tad excessive in practice :)

It'd be interesting to have a map of best worst case routine for different numbers of actors, and different limits on memory use... could even have CPU time as altitude…


I am wondering if there aren't any theoretical bounds known in computer science, about memory usage and/or/vs CPU time. But I guess these would have to be C64 specific to be of any use.
Previous - 1 | ... | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | ... | 21 - Next
RefreshSubscribe to this thread:

You need to be logged in to post in the forum.

Search the forum:
Search   for   in  
All times are CET.
Search CSDb
Advanced
Users Online
Andy/AEG
Peacemaker/CENSOR/Hi..
kbs/Pht/Lxt
Marq/Fit^Lieves!Tuor..
tlr
El Gato
Britelite/Dekadence
Guests online: 67
Top Demos
1 Next Level  (9.8)
2 Mojo  (9.7)
3 Coma Light 13  (9.7)
4 Edge of Disgrace  (9.6)
5 Comaland 100%  (9.6)
6 No Bounds  (9.6)
7 Uncensored  (9.6)
8 The Ghost  (9.6)
9 Wonderland XIV  (9.6)
10 Bromance  (9.6)
Top onefile Demos
1 It's More Fun to Com..  (9.8)
2 Party Elk 2  (9.7)
3 Cubic Dream  (9.6)
4 Copper Booze  (9.5)
5 Rainbow Connection  (9.5)
6 TRSAC, Gabber & Pebe..  (9.5)
7 Onscreen 5k  (9.5)
8 Wafer Demo  (9.5)
9 Dawnfall V1.1  (9.5)
10 Quadrants  (9.5)
Top Groups
1 Oxyron  (9.3)
2 Nostalgia  (9.3)
3 Booze Design  (9.3)
4 Censor Design  (9.3)
5 Crest  (9.3)
Top Crackers
1 Mr. Z  (9.9)
2 S!R  (9.9)
3 Mr Zero Page  (9.8)
4 Antitrack  (9.8)
5 OTD  (9.8)

Home - Disclaimer
Copyright © No Name 2001-2024
Page generated in: 0.048 sec.