Log inRegister an accountBrowse CSDbHelp & documentationFacts & StatisticsThe forumsAvailable RSS-feeds on CSDbSupport CSDb Commodore 64 Scene Database
You are not logged in - nap
CSDb User Forums


Forums > C64 Coding > Sparse Bubble Sort
2020-01-06 04:52
DanPhillips

Registered: Jan 2003
Posts: 39
Sparse Bubble Sort

Hey,

Over the holidays I was messing with a "get as many sprites around with a proper multiplexor" background for a boot menu.

Had some success, unrolling and using some illegal opcodes, bit packing some data to reduce the amount of data read from memory and some algorithmic re working.

Final attempt sees 82 sprites bouncing (They all move at 50fps, each allows x,y,msb,color,definition and bounce off the left/right border and scanline 0 and $FD.

<Not sure if this is original, but thought it was worth pointing out)
While looking at the simple bubble sort and doing the obvious unrolling I noticed something about the data being read/re-read.

The project is in DASM, so DASM macro syntax...

The original:

MAC BubbleYSortMacro ;param {1} index, param {2} source offset
LDY YORDER+1+{1}
LDX YORDER+{1}
LDA YTAB+{2},X
CMP YTAB+{2},Y
BCS .NoNeedToSwap
STX YORDER+1+{1}
STY YORDER+{1}
.NoNeedToSwap
ENDM

The YTAB is double buffered.

New macros (probably only need 2, but 3 for added complexity)

MAC BubbleYSortMacro0 ;param 1 index, param 2 source offset
LDY YORDER+1+{1}
LDX YORDER+{1}
LDA YTAB+{2},X
CMP YTAB+{2},Y
BCS .NoNeedToSwap1
STX YORDER+1+{1}
STY YORDER+{1}
.if {1} > 0
LDX YORDER+{1}
LDA YTAB+{2},X
.endif
.NoNeedToSwap1
ENDM

; x needs to be {1}+1 a needs to be YTAB+{2},x
MAC BubbleYSortMacro1 ;param 1 index, param 2 source offset
LDY YORDER+{1}
CMP YTAB+{2},Y
BCC .NoNeedToSwap2
STY YORDER+1+{1}
STX YORDER+{1}
.if {1} > 0
LDY YORDER+{1}
.endif
.NoNeedToSwap2
ENDM

; y needs to be {1}+1
MAC BubbleYSortMacro2 ;param 1 index, param 2 source offset
LDX YORDER+{1}
LDA YTAB+{2},X
CMP YTAB+{2},Y
BCS .NoNeedToSwap3
STX YORDER+1+{1}
STY YORDER+{1}
.if {1} > 0
LDX YORDER+{1}
LDA YTAB+{2},X
.endif
.NoNeedToSwap3
ENDM

You use the 1st macro, then alternate the 2nd and 3rd.

The savings come from not reloading and alternating the comparison.
If no swapping is needed this saves 7 cycles for the 2nd macro and 3 for the 3rd macro.

In my boot menu background with 82 sprites this saves 360 cycles (average of 10 swaps needed per frame)

Amyway, not sure if this is new, but kinda funny looking at the same simple bubble sort for 30 odd years and not spotting this pattern :D

Cheers

Dan
 
... 2 posts hidden. Click here to view all posts....
 
2020-01-07 13:09
Oswald

Registered: Apr 2002
Posts: 5094
why not use the (),y to make the indirect step directly ?

lda (sprindex1),y
cmp (sprindex2),y
bcs noswap

lda sprindex1
ldx sprindex2
sta sprindex2
stx sprindex1
2020-01-07 16:48
DanPhillips

Registered: Jan 2003
Posts: 39
Quote: why not use the (),y to make the indirect step directly ?

lda (sprindex1),y
cmp (sprindex2),y
bcs noswap

lda sprindex1
ldx sprindex2
sta sprindex2
stx sprindex1


That would work (Assuming YTable is page aligned)

It would save 1 cycle per 2 sprites if no swaps were needed.
Would be 1 cycle better for even swaps and 3 cycles worse for odd.

But in this particular case I'd need 164 bytes of zero page.
Which would mean either the XSpeed or the YSpeed tables would need to move out of zero page, both of which would add at least a cycle per sprite overhead.


Cheers

Dan
2020-01-08 01:17
DanPhillips

Registered: Jan 2003
Posts: 39
And coding this up there's an overhead if your YTable is double buffered you need to update the (ZP) pairs, all of them, 3 cycles per sprite.

Cheers

Dan
2020-01-08 01:37
DanPhillips

Registered: Jan 2003
Posts: 39
Of could use the Y register :)

Cheers

Dan
2020-01-08 08:06
Oswald

Registered: Apr 2002
Posts: 5094
what is xspeed / yspeed table ? I think its enough to be able to adress this way only the Y tables. the per swap/noswap gain is not much sadly.
2020-01-08 09:09
ChristopherJam

Registered: Aug 2004
Posts: 1409
Speaking of making the swaps faster:

I did wonder about using the Y values to bucket the IDs, then just sorting the list of Y values to avoid all that indirection, but I think the overhead is way higher than any time likely to be saved in the sort routine proper.

nactors* {
    ldy sorted_indices+i ;3
    lax actorY,y         ;4
    sta sort_table,y     ;5
    lda bucket,x         ;4
    sta bucketnext,y     ;5
    tya                  ;2
    sta bucket,x         ;5   total 28*na, assuming last frame's sort indices are in zp
}
..then sort the sort_table (which just contains Y values), then..
nactors * {
    ldy sort_table+i    ; 3
    ldx bucket,y        ; 4
    lda bucket_next,x   ; 4
    sta bucket,y        ; 5
    sta sort_indices+i  ; 3  total 19*na
}

19+28=47 cycles of overhead. Need a lot of savings on the sort to justify that...

Nice that the linked list at each bucket doesn't need to be terminated. Each bucket is mentioned as many times in sort_table as there are entries in that bucket, so the correct number of indices will be pulled out post sort.
2020-01-08 21:28
Pararaum

Registered: Sep 2018
Posts: 11
You should try the Gnome Sort it is perfect for your task...
2020-01-09 04:27
ChristopherJam

Registered: Aug 2004
Posts: 1409
Hah! I've been thinking about trying to optimise a gnome sort (in particular, not putting the pot down while moving it left), but I didn't know it had a name.

Thanks, Pararaum :)


edit: oh wait... I just realised that the difference between gnome sort and insertion sort is that the gnome sort "forgets" the high water mark, and does redundant comparisons on its way back to the next unseen pot. Back to insertion sort it is, probably.
2020-01-09 06:46
Oswald

Registered: Apr 2002
Posts: 5094
Quote: Hah! I've been thinking about trying to optimise a gnome sort (in particular, not putting the pot down while moving it left), but I didn't know it had a name.

Thanks, Pararaum :)


edit: oh wait... I just realised that the difference between gnome sort and insertion sort is that the gnome sort "forgets" the high water mark, and does redundant comparisons on its way back to the next unseen pot. Back to insertion sort it is, probably.


you can remember where you started and skip those checks. read wiki as I did :)
2020-01-09 09:26
ChristopherJam

Registered: Aug 2004
Posts: 1409
Yes, but then (as the wikipedia article itself concedes) it's just insertion sort with a different name :P
Previous - 1 | 2 - Next
RefreshSubscribe to this thread:

You need to be logged in to post in the forum.

Search the forum:
Search   for   in  
All times are CET.
Search CSDb
Advanced
Users Online
Dr. Doom/RAD
DanPhillips
Guests online: 79
Top Demos
1 Next Level  (9.7)
2 13:37  (9.7)
3 Mojo  (9.7)
4 Coma Light 13  (9.6)
5 Edge of Disgrace  (9.6)
6 What Is The Matrix 2  (9.6)
7 The Demo Coder  (9.6)
8 Uncensored  (9.6)
9 Comaland 100%  (9.6)
10 Wonderland XIV  (9.6)
Top onefile Demos
1 Layers  (9.6)
2 No Listen  (9.6)
3 Party Elk 2  (9.6)
4 Cubic Dream  (9.6)
5 Copper Booze  (9.6)
6 Rainbow Connection  (9.5)
7 Dawnfall V1.1  (9.5)
8 Onscreen 5k  (9.5)
9 Morph  (9.5)
10 Libertongo  (9.5)
Top Groups
1 Performers  (9.3)
2 Booze Design  (9.3)
3 Oxyron  (9.3)
4 Triad  (9.3)
5 Censor Design  (9.3)
Top Crackers
1 Mr. Z  (9.9)
2 Antitrack  (9.8)
3 OTD  (9.8)
4 Fungus  (9.8)
5 S!R  (9.8)

Home - Disclaimer
Copyright © No Name 2001-2024
Page generated in: 0.052 sec.