| |
DanPhillips
Registered: Jan 2003 Posts: 39 |
Sparse Bubble Sort
Hey,
Over the holidays I was messing with a "get as many sprites around with a proper multiplexor" background for a boot menu.
Had some success, unrolling and using some illegal opcodes, bit packing some data to reduce the amount of data read from memory and some algorithmic re working.
Final attempt sees 82 sprites bouncing (They all move at 50fps, each allows x,y,msb,color,definition and bounce off the left/right border and scanline 0 and $FD.
<Not sure if this is original, but thought it was worth pointing out)
While looking at the simple bubble sort and doing the obvious unrolling I noticed something about the data being read/re-read.
The project is in DASM, so DASM macro syntax...
The original:
MAC BubbleYSortMacro ;param {1} index, param {2} source offset
LDY YORDER+1+{1}
LDX YORDER+{1}
LDA YTAB+{2},X
CMP YTAB+{2},Y
BCS .NoNeedToSwap
STX YORDER+1+{1}
STY YORDER+{1}
.NoNeedToSwap
ENDM
The YTAB is double buffered.
New macros (probably only need 2, but 3 for added complexity)
MAC BubbleYSortMacro0 ;param 1 index, param 2 source offset
LDY YORDER+1+{1}
LDX YORDER+{1}
LDA YTAB+{2},X
CMP YTAB+{2},Y
BCS .NoNeedToSwap1
STX YORDER+1+{1}
STY YORDER+{1}
.if {1} > 0
LDX YORDER+{1}
LDA YTAB+{2},X
.endif
.NoNeedToSwap1
ENDM
; x needs to be {1}+1 a needs to be YTAB+{2},x
MAC BubbleYSortMacro1 ;param 1 index, param 2 source offset
LDY YORDER+{1}
CMP YTAB+{2},Y
BCC .NoNeedToSwap2
STY YORDER+1+{1}
STX YORDER+{1}
.if {1} > 0
LDY YORDER+{1}
.endif
.NoNeedToSwap2
ENDM
; y needs to be {1}+1
MAC BubbleYSortMacro2 ;param 1 index, param 2 source offset
LDX YORDER+{1}
LDA YTAB+{2},X
CMP YTAB+{2},Y
BCS .NoNeedToSwap3
STX YORDER+1+{1}
STY YORDER+{1}
.if {1} > 0
LDX YORDER+{1}
LDA YTAB+{2},X
.endif
.NoNeedToSwap3
ENDM
You use the 1st macro, then alternate the 2nd and 3rd.
The savings come from not reloading and alternating the comparison.
If no swapping is needed this saves 7 cycles for the 2nd macro and 3 for the 3rd macro.
In my boot menu background with 82 sprites this saves 360 cycles (average of 10 swaps needed per frame)
Amyway, not sure if this is new, but kinda funny looking at the same simple bubble sort for 30 odd years and not spotting this pattern :D
Cheers
Dan |
|
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Hi fellow Phillips.
Your holiday project is certainly the first I've heard of someone actually finishing something that used that insight.
I have the following snippet in some notes I made back in 2017 when I was making a start on a hybrid sorter, but I never completed that implementation.
ldy INDICES+0
ldx INDICES+1
lda SORT_BY,x ; SORT_BY[INDICES[1]]
cmp SORT_BY,y ; SORT_BY[INDICES[0]]
bge next
next
ldy INDICES+2
cmp SORT_BY,y ; SORT_BY[INDICES[2]]
ble next2
next2
ldx INDICES+3
lda SORT_BY,x ; SORT_BY[INDICES[3]]
cmp SORT_BY,y ; SORT_BY[INDICES[2]]
bge next3
next3
ldy INDICES+4
cmp SORT_BY,y ; SORT_BY[INDICES[4]]
bge next4
next4
(It was going to start with a quick ocean pass but switch to something smarter as soon as it needed more than a couple of swaps for any given actor; Falco Paul suggested something similar a while back, but it's unclear whether he ever translated from java to 6502) |
| |
Fungus
Registered: Sep 2002 Posts: 686 |
I've seen before a "look ahead" which checks subsequent y positions and keeps a count if it's more than one, and shifts the table down by the count and then puts the element in the shifted position where it goes. I suppose this is only really an optimization if you are having a non-static amount of sprites though and need to insert/remove an element. |
| |
Oswald
Registered: Apr 2002 Posts: 5094 |
why not use the (),y to make the indirect step directly ?
lda (sprindex1),y
cmp (sprindex2),y
bcs noswap
lda sprindex1
ldx sprindex2
sta sprindex2
stx sprindex1 |
| |
DanPhillips
Registered: Jan 2003 Posts: 39 |
Quote: why not use the (),y to make the indirect step directly ?
lda (sprindex1),y
cmp (sprindex2),y
bcs noswap
lda sprindex1
ldx sprindex2
sta sprindex2
stx sprindex1
That would work (Assuming YTable is page aligned)
It would save 1 cycle per 2 sprites if no swaps were needed.
Would be 1 cycle better for even swaps and 3 cycles worse for odd.
But in this particular case I'd need 164 bytes of zero page.
Which would mean either the XSpeed or the YSpeed tables would need to move out of zero page, both of which would add at least a cycle per sprite overhead.
Cheers
Dan |
| |
DanPhillips
Registered: Jan 2003 Posts: 39 |
And coding this up there's an overhead if your YTable is double buffered you need to update the (ZP) pairs, all of them, 3 cycles per sprite.
Cheers
Dan |
| |
DanPhillips
Registered: Jan 2003 Posts: 39 |
Of could use the Y register :)
Cheers
Dan |
| |
Oswald
Registered: Apr 2002 Posts: 5094 |
what is xspeed / yspeed table ? I think its enough to be able to adress this way only the Y tables. the per swap/noswap gain is not much sadly. |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Speaking of making the swaps faster:
I did wonder about using the Y values to bucket the IDs, then just sorting the list of Y values to avoid all that indirection, but I think the overhead is way higher than any time likely to be saved in the sort routine proper.
nactors* {
ldy sorted_indices+i ;3
lax actorY,y ;4
sta sort_table,y ;5
lda bucket,x ;4
sta bucketnext,y ;5
tya ;2
sta bucket,x ;5 total 28*na, assuming last frame's sort indices are in zp
}
..then sort the sort_table (which just contains Y values), then..
nactors * {
ldy sort_table+i ; 3
ldx bucket,y ; 4
lda bucket_next,x ; 4
sta bucket,y ; 5
sta sort_indices+i ; 3 total 19*na
}
19+28=47 cycles of overhead. Need a lot of savings on the sort to justify that...
Nice that the linked list at each bucket doesn't need to be terminated. Each bucket is mentioned as many times in sort_table as there are entries in that bucket, so the correct number of indices will be pulled out post sort. |
| |
Pararaum
Registered: Sep 2018 Posts: 11 |
You should try the Gnome Sort it is perfect for your task... |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Hah! I've been thinking about trying to optimise a gnome sort (in particular, not putting the pot down while moving it left), but I didn't know it had a name.
Thanks, Pararaum :)
edit: oh wait... I just realised that the difference between gnome sort and insertion sort is that the gnome sort "forgets" the high water mark, and does redundant comparisons on its way back to the next unseen pot. Back to insertion sort it is, probably. |
| |
Oswald
Registered: Apr 2002 Posts: 5094 |
Quote: Hah! I've been thinking about trying to optimise a gnome sort (in particular, not putting the pot down while moving it left), but I didn't know it had a name.
Thanks, Pararaum :)
edit: oh wait... I just realised that the difference between gnome sort and insertion sort is that the gnome sort "forgets" the high water mark, and does redundant comparisons on its way back to the next unseen pot. Back to insertion sort it is, probably.
you can remember where you started and skip those checks. read wiki as I did :) |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1409 |
Yes, but then (as the wikipedia article itself concedes) it's just insertion sort with a different name :P |