[CSDb] - User Forums - Sparse Bubble Sort

You are not logged in - nap

CSDb User Forums

Forums > C64 Coding > Sparse Bubble Sort

2020-01-06 04:52

DanPhillips

Registered: Jan 2003
Posts: 30

Sparse Bubble Sort

Hey,

Over the holidays I was messing with a "get as many sprites around with a proper multiplexor" background for a boot menu.

Had some success, unrolling and using some illegal opcodes, bit packing some data to reduce the amount of data read from memory and some algorithmic re working.

Final attempt sees 82 sprites bouncing (They all move at 50fps, each allows x,y,msb,color,definition and bounce off the left/right border and scanline 0 and $FD.

<Not sure if this is original, but thought it was worth pointing out)
While looking at the simple bubble sort and doing the obvious unrolling I noticed something about the data being read/re-read.

The project is in DASM, so DASM macro syntax...

The original:

MAC BubbleYSortMacro ;param {1} index, param {2} source offset
LDY YORDER+1+{1}
LDX YORDER+{1}
LDA YTAB+{2},X
CMP YTAB+{2},Y
BCS .NoNeedToSwap
STX YORDER+1+{1}
STY YORDER+{1}
.NoNeedToSwap
ENDM

The YTAB is double buffered.

New macros (probably only need 2, but 3 for added complexity)

MAC BubbleYSortMacro0 ;param 1 index, param 2 source offset
LDY YORDER+1+{1}
LDX YORDER+{1}
LDA YTAB+{2},X
CMP YTAB+{2},Y
BCS .NoNeedToSwap1
STX YORDER+1+{1}
STY YORDER+{1}
.if {1} > 0
LDX YORDER+{1}
LDA YTAB+{2},X
.endif
.NoNeedToSwap1
ENDM

; x needs to be {1}+1 a needs to be YTAB+{2},x
MAC BubbleYSortMacro1 ;param 1 index, param 2 source offset
LDY YORDER+{1}
CMP YTAB+{2},Y
BCC .NoNeedToSwap2
STY YORDER+1+{1}
STX YORDER+{1}
.if {1} > 0
LDY YORDER+{1}
.endif
.NoNeedToSwap2
ENDM

; y needs to be {1}+1
MAC BubbleYSortMacro2 ;param 1 index, param 2 source offset
LDX YORDER+{1}
LDA YTAB+{2},X
CMP YTAB+{2},Y
BCS .NoNeedToSwap3
STX YORDER+1+{1}
STY YORDER+{1}
.if {1} > 0
LDX YORDER+{1}
LDA YTAB+{2},X
.endif
.NoNeedToSwap3
ENDM

You use the 1st macro, then alternate the 2nd and 3rd.

The savings come from not reloading and alternating the comparison.
If no swapping is needed this saves 7 cycles for the 2nd macro and 3 for the 3rd macro.

In my boot menu background with 82 sprites this saves 360 cycles (average of 10 swaps needed per frame)

Amyway, not sure if this is new, but kinda funny looking at the same simple bubble sort for 30 odd years and not spotting this pattern :D

Cheers

Dan

2020-01-07 04:23

ChristopherJam

Registered: Aug 2004
Posts: 1378

Hi fellow Phillips.

Your holiday project is certainly the first I've heard of someone actually finishing something that used that insight.

I have the following snippet in some notes I made back in 2017 when I was making a start on a hybrid sorter, but I never completed that implementation.

    ldy INDICES+0

    ldx INDICES+1
    lda SORT_BY,x     ; SORT_BY[INDICES[1]]
    cmp SORT_BY,y     ; SORT_BY[INDICES[0]]
    bge next

next
    ldy INDICES+2
    cmp SORT_BY,y     ; SORT_BY[INDICES[2]]
    ble next2

next2
    ldx INDICES+3
    lda SORT_BY,x     ; SORT_BY[INDICES[3]]
    cmp SORT_BY,y     ; SORT_BY[INDICES[2]]
    bge next3

next3
    ldy INDICES+4
    cmp SORT_BY,y     ; SORT_BY[INDICES[4]]
    bge next4

next4

(It was going to start with a quick ocean pass but switch to something smarter as soon as it needed more than a couple of swaps for any given actor; Falco Paul suggested something similar a while back, but it's unclear whether he ever translated from java to 6502)

2020-01-07 08:01

Fungus

Registered: Sep 2002
Posts: 616

I've seen before a "look ahead" which checks subsequent y positions and keeps a count if it's more than one, and shifts the table down by the count and then puts the element in the shifted position where it goes. I suppose this is only really an optimization if you are having a non-static amount of sprites though and need to insert/remove an element.

2020-01-07 13:09

Oswald

Registered: Apr 2002
Posts: 5017

why not use the (),y to make the indirect step directly ?

lda (sprindex1),y
cmp (sprindex2),y
bcs noswap

lda sprindex1
ldx sprindex2
sta sprindex2
stx sprindex1

2020-01-07 16:48

DanPhillips

Registered: Jan 2003
Posts: 30

Quote: why not use the (),y to make the indirect step directly ?

lda (sprindex1),y
cmp (sprindex2),y
bcs noswap

lda sprindex1
ldx sprindex2
sta sprindex2
stx sprindex1

That would work (Assuming YTable is page aligned)

It would save 1 cycle per 2 sprites if no swaps were needed.
Would be 1 cycle better for even swaps and 3 cycles worse for odd.

But in this particular case I'd need 164 bytes of zero page.
Which would mean either the XSpeed or the YSpeed tables would need to move out of zero page, both of which would add at least a cycle per sprite overhead.

Cheers

Dan

2020-01-08 01:17

DanPhillips

Registered: Jan 2003
Posts: 30

And coding this up there's an overhead if your YTable is double buffered you need to update the (ZP) pairs, all of them, 3 cycles per sprite.

Cheers

Dan

2020-01-08 01:37

DanPhillips

Registered: Jan 2003
Posts: 30

Of could use the Y register :)

Cheers

Dan

2020-01-08 08:06

Oswald

Registered: Apr 2002
Posts: 5017

what is xspeed / yspeed table ? I think its enough to be able to adress this way only the Y tables. the per swap/noswap gain is not much sadly.

2020-01-08 09:09

ChristopherJam

Registered: Aug 2004
Posts: 1378

Speaking of making the swaps faster:

I did wonder about using the Y values to bucket the IDs, then just sorting the list of Y values to avoid all that indirection, but I think the overhead is way higher than any time likely to be saved in the sort routine proper.

nactors* {
    ldy sorted_indices+i ;3
    lax actorY,y         ;4
    sta sort_table,y     ;5
    lda bucket,x         ;4
    sta bucketnext,y     ;5
    tya                  ;2
    sta bucket,x         ;5   total 28*na, assuming last frame's sort indices are in zp
}

..then sort the sort_table (which just contains Y values), then..

nactors * {
    ldy sort_table+i    ; 3
    ldx bucket,y        ; 4
    lda bucket_next,x   ; 4
    sta bucket,y        ; 5
    sta sort_indices+i  ; 3  total 19*na
}

19+28=47 cycles of overhead. Need a lot of savings on the sort to justify that...

Nice that the linked list at each bucket doesn't need to be terminated. Each bucket is mentioned as many times in sort_table as there are entries in that bucket, so the correct number of indices will be pulled out post sort.

2020-01-08 21:28

Pararaum

Registered: Sep 2018
Posts: 11

You should try the Gnome Sort it is perfect for your task...

2020-01-09 04:27

ChristopherJam

Registered: Aug 2004
Posts: 1378

Hah! I've been thinking about trying to optimise a gnome sort (in particular, not putting the pot down while moving it left), but I didn't know it had a name.

Thanks, Pararaum :)

edit: oh wait... I just realised that the difference between gnome sort and insertion sort is that the gnome sort "forgets" the high water mark, and does redundant comparisons on its way back to the next unseen pot. Back to insertion sort it is, probably.

2020-01-09 06:46

Oswald

Registered: Apr 2002
Posts: 5017

Quote: Hah! I've been thinking about trying to optimise a gnome sort (in particular, not putting the pot down while moving it left), but I didn't know it had a name.

Thanks, Pararaum :)

edit: oh wait... I just realised that the difference between gnome sort and insertion sort is that the gnome sort "forgets" the high water mark, and does redundant comparisons on its way back to the next unseen pot. Back to insertion sort it is, probably.

you can remember where you started and skip those checks. read wiki as I did :)

2020-01-09 09:26

ChristopherJam

Registered: Aug 2004
Posts: 1378

Yes, but then (as the wikipedia article itself concedes) it's just insertion sort with a different name :P

Refresh

Subscribe to this thread: