[CSDb] - User Forums

You are not logged in - nap

CSDb User Forums

Forums > C64 Coding > Sorting

2007-10-08 16:08

Oswald

Registered: Apr 2002
Posts: 5127

Sorting

are sorters really so slow in games? :) I have made my unrolled version for my theoretical game (;), and it takes 132 rlines to sort 32 numbers o_O worst case is ~200 lines. tho when its the case of 4 numbers have to be swapped only it does the job in ~10 lines. wastes a lot of memory but I like it :)

... 193 posts hidden. Click here to view all posts....

2017-09-12 09:28

ChristopherJam

Registered: Aug 2004
Posts: 1424

Excellent work, Color Bar.

If every sprite lands in the second jump table, that pushes you up to 42 cycles for every insertion, but total time in that case is still only
3*(220-32)+12*32 + 42*32 = 2292 cycles, or 36.4 raster lines.

The 25 additional cycles from page boundary crossings you mention can be avoided by padding the output routine out to 32 bytes by adding three bytes of padding after the BCC .nextBucket, and starting bucket0 at $xxff

It takes up another 660 bytes, but the routine is already occupying most of a bank as it is, so that's small change at this point ;)

2017-09-12 10:36

ChristopherJam

Registered: Aug 2004
Posts: 1424

Should be able to save about 1760 bytes by only including the last 12 bytes of the insertion code in every third routine; the preceding and succeeding routine for each complete routine can just jump to the nearest complete one (the part from the second sta *+4 onward).

Timing's unchanged, but drags it back down to around 14k including jump tables.

2017-09-12 11:48

Rastah Bar
Account closed

Registered: Oct 2012
Posts: 336

Quoting ChristopherJam

Excellent work, Color Bar.
The 25 additional cycles from page boundary crossings you mention can be avoided by padding the output routine out to 32 bytes by adding three bytes of padding after the BCC .nextBucket, and starting bucket0 at $xxff

It takes up another 660 bytes, but the routine is already occupying most of a bank as it is, so that's small change at this point ;)

Those 3 padding bytes can be put to good use by allowing 9 sprites in a bucket.

The routine probably uses too much memory for many applications.

2017-09-12 11:49

Rastah Bar
Account closed

Registered: Oct 2012
Posts: 336

Quoting ChristopherJam

Should be able to save about 1760 bytes by only including the last 12 bytes of the insertion code in every third routine; the preceding and succeeding routine for each complete routine can just jump to the nearest complete one (the part from the second sta *+4 onward).

Not sure I understand this. You need a separate fill routine for every possible Y-value/bucket.

2017-09-12 12:10

Rastah Bar
Account closed

Registered: Oct 2012
Posts: 336

Btw, if you just want to sort 32 random numbers, the phase 2 code could be modified into

.bucketK
bcc .nextBucket
lda #K
pha
lda #K
pha
...
pha
sty .BucketK+1

and the fill code can be limited to

ldy .bucketK+1 
lda newBranchValue,y		
sta .bucketK+1

Resulting in 3*(220-32)+12*32 + 33.5*32 -> 32 lines. Put perhaps that application requires 256 buckets and then it takes 33.8 lines.

2017-09-12 16:19

ChristopherJam

Registered: Aug 2004
Posts: 1424

Quoting Color Bar

Not sure I understand this. You need a separate fill routine for every possible Y-value/bucket.

True, but every one of the 220 routines ends with

.hitableN
  sta *+4
  jmp ($xxxx) ;jump table for buckets >=128
.phase2N
  clc	                ;to branch always.
  ldy #27	   	;to restore the branch value of the bucket
  jmp .Bucket0

You can leave that tail off two thirds of them, eg only keeping it for buckets 1,4,7,10 etc.
The routine for bucket 0 would replace the bcs *+8 with a branch to the hitable lable in the the routine for bucket 1, as would the routine for bucket 2

2017-09-12 16:23

ChristopherJam

Registered: Aug 2004
Posts: 1424

Quoting Color Bar

The routine probably uses too much memory for many applications.

A quarter of all RAM? Yes, that's probably a tad excessive in practice :)

It'd be interesting to have a map of best worst case routine for different numbers of actors, and different limits on memory use... could even have CPU time as altitude

2017-09-12 17:30

Rastah Bar
Account closed

Registered: Oct 2012
Posts: 336

Quoting ChristopherJam

Quoting Color Bar
Not sure I understand this. You need a separate fill routine for every possible Y-value/bucket.

True, but every one of the 220 routines ends with

.hitableN sta *+4 jmp ($xxxx) ;jump table for buckets >=128 .phase2N clc ;to branch always. ldy #27 ;to restore the branch value of the bucket jmp .Bucket0

You can leave that tail off two thirds of them, eg only keeping it for buckets 1,4,7,10 etc.
The routine for bucket 0 would replace the bcs *+8 with a branch to the hitable lable in the the routine for bucket 1, as would the routine for bucket 2

Yes, I understand it now. Thanks!

2017-09-12 17:41

Rastah Bar
Account closed

Registered: Oct 2012
Posts: 336

Quote: Quoting Color Bar
The routine probably uses too much memory for many applications.

A quarter of all RAM? Yes, that's probably a tad excessive in practice :)

It'd be interesting to have a map of best worst case routine for different numbers of actors, and different limits on memory use... could even have CPU time as altitude

I am wondering if there aren't any theoretical bounds known in computer science, about memory usage and/or/vs CPU time. But I guess these would have to be C64 specific to be of any use.

2017-09-13 06:58

ChristopherJam

Registered: Aug 2004
Posts: 1424

Got a new record now, I think.

We can pack 221 addresses into a 222 byte table by using the high byte of each address as the low byte of the next address.
If we use something like this:

for i in 0..73:
    .byt 64+(i%74)*(1+i//74*24)%74)

.. there are no two addresses less than 24 bytes apart.

The fill routine then drops down to this:

    ;code to fill out bucketK.   24 bytes, 36 cycles
fillK
    ldy .bucketK+1
    lda newBranchValue,y
    sta .bucketK+1         ; 12
    tay
    txa
    sta .bucketK+3,y       ;  9

    dex
    lda Yvalues,x
    sta *+4
    jmp ($xxxx)            ; 15

    ; 36*32 =  1152 cycle, 36 per actor

Just 24 bytes long :)
We avoid the check for x becoming negative by placing actor $ff at y=0, and having the 0 entry of the branch table point to a routine
that checks X to determine whether we've reached the end of the actor list, or are just dealing with an actor that has gone offscreen.

The bucket emptying code remains the same as in Color Bar's original:

.bucketK ;32 bytes
    bcc .nextBucket        ;Branches always: decrease this value by 3 with selfmodifying code in phase 1 for every new sprite in the bucket.
    !byt 0,0,0
    lda #sprite_index   ; repeat lda/pha eight times
    pha
...
    sty .bucketK+1    ;Restore branch (to skip all sprites for empty buckets). Call with y=$1e for max 8 sprites per bucket.
.nextBucket    ;unrolled

    ; 3 cycles for an empty bucket, 12 voor buckets with 1 sprite => 3*(220-32)+12*32 = 948 cycli , 29.6 per actor

The bucket filling routines are packed three to a page over 74 pages, so around 19kb. Each page has an average of 184 bytes empty, so we can probably pack the bucket emptying code into the gaps. Note that some of the buckets would then need to branch past a group of bucket filling routines, so we'd need to use both X and Y for the two different values to restore the branches to "this bucket is empty"

So - total time, 36*32+3*(220-32)+12*32= 220*3+45*32 = 2100 cycles. 65.6 per sprite, 33.33 rasters total.

For 37 sprites or more, we're down to under a raster per sprite :D

Previous - 1 | ... | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | ... | 21 - Next

Refresh

Subscribe to this thread:

You need to be logged in to post in the forum.

Search the forum:
Search for in
All times are CET.

Search CSDb

Advanced

Users Online

eater
Airwolf/F4CG
algorithm
Guests online: 452

Top Demos

1 Next Level  (9.7)
2 13:37  (9.7)
3 Codeboys & Endians  (9.7)
4 Mojo  (9.6)
5 Coma Light 13  (9.6)
6 Edge of Disgrace  (9.6)
7 Signal Carnival  (9.6)
8 Wonderland XIV  (9.5)
9 Uncensored  (9.5)
10 Comaland 100%  (9.5)

Top onefile Demos

1 Nine  (9.7)
2 Layers  (9.6)
3 Cubic Dream  (9.6)
4 Party Elk 2  (9.6)
5 Copper Booze  (9.5)
6 Scan and Spin  (9.5)
7 Onscreen 5k  (9.5)
8 Grey  (9.5)
9 Dawnfall V1.1  (9.5)
10 Rainbow Connection  (9.5)

Top Groups

1 Artline Designs  (9.3)
2 Booze Design  (9.3)
3 Performers  (9.3)
4 Oxyron  (9.3)
5 Censor Design  (9.3)

Top Crackers

1 Mr. Z  (9.9)
2 OTD  (9.8)
3 Antitrack  (9.8)
4 Fungus  (9.8)
5 S!R  (9.8)

Page generated in: 0.064 sec.