Bitbreaker
Optimizing span filler
To prove a quote recently made in another thread by Skid Row wrong, I am going to ask in public :-)
"Average programmers do ask friends or in forums (like Lemon64, Forum64, CSDB Forum...) and experienced programmers... well,they just don't need to ask! ;)"
I optimized my span filler some time ago and have used it on various occasions with different effects. In the meantime I have shrunk it down in size, but I would love to make a few things even faster, most of all the inner loop, which looks like the following:
* = $0010
fill
;x = x2
;y = y2
lda #$f8
sax <f_jmp+1 ;set initially, as it is not set on every turn later on
f_back ;common entry point where all code segments reenter when done
dey
f_yend cpy #$00 ;forces carry to be set \o/
bcc f_end
f_err lda #$00 ;restore error
f_dx1 sbc #$00 ;do that bresenhamthingy for xend, code will be setup for either flat or steep slope
f_code bcs + ;inx ;in case of flat slopes
bcs_start dex ;bcs * - 3 ;
f_dx2 adc #$00
sta <f_err+1
lda #$f8
sax <f_jmp+1 ;update start of span, depending on bit 7 stuff is rendered to buffer 1 or 2 (offset of $80 in the table)
bne ++ ;so buttugly, but need to skip
bcs_end
+
sta <f_err+1 ;save error
++
lda xstart,y ;load previously calced x1
sta <f_msk+1 ;setup mask without tainting X
arr #$78 ;-> carry is still set, bit 7 always cleared. This way we generate values from $80 .. $bc, a range to which we adopt the memory layout of the row tables
sta <f_jmp+2 ;update byte of jump responsible to select all code-segments that start with xstart
f_patt lda patt_0,y ;fetch pattern
f_msk and maskl ;apply mask for left edge
f_jmp jmp ($1000) ;do it! \o/
f_end
rts
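To spell out what the arr #$78 does there (a worked example; the exact table layout is my reading of the comments):

;say xstart,y = $53 = %01010011, carry is set as noted above
;and #$78 -> %01010000 ($50), keeping bits 3..6
;ror      -> %10101000 ($a8), the carry rotating into bit 7

So f_jmp+2 becomes $80 + (x1 & $78)/2, one of $80/$84/../$bc, selecting the pointer page derived from x1, while the sax earlier placed x2 & $f8 (with bit 7 selecting buffer 1 or 2) into f_jmp+1. Together they address the vector pointing at the matching speedcode chunk.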
Keep in mind that this loop shall not be unrolled or even duplicated, as the entry point f_back is fixed and used by all speedcode chunks that are entered through the indirect jump. Doing so would mean multiplying the speedcode chunks or giving them a variable entry point by introducing another indirect jump (which wastes another 2 cycles + setup).
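Just to put numbers on that alternative: with a variable entry point, each chunk would have to end in something like

       jmp (f_ret)   ;5 cycles instead of 3 for the plain jmp f_back

with f_ret being a hypothetical vector that would also need to be rewritten every line, hence the 2 cycles + setup mentioned above.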
The code at f_dx1 (3 bytes) is self-modifying, depending on whether a steep or a flat slope is generated. Patterns alternate each line, so the additional lookup (lda patt_0,y) is somewhat necessary.
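For those puzzled by the commented-out opcodes: the 3 bytes at f_code/bcs_start hold one of two shapes, patched in by the setup code. A sketch of the two variants as I read them (inx/dex swapped depending on the x direction):

;steep slopes - at most one x step per line
f_code     bcs +        ;error still >= 0, no step this line
bcs_start  dex          ;error underflowed, step once, f_dx2 restores the error

;flat slopes - one or more x steps per line
f_code     inx          ;step
bcs_start  bcs * - 3    ;error still >= 0 -> back to f_dx1 for the next step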
One of the many speedcode chunks could look like:
sta .addr,y ;write through, smart poly order avoids clashes
lda (f_patt+1),y ;refetch pattern, expensive, but at least less than sta patt, lda patt
sta .addr + $080,y
sta .addr + $100,y
sta .addr + $180,y
sta .addr + $200,y
sta .addr + $280,y
sta .addr + $300,y
and maskr,x ;right edge
ora .addr + $380,y ;need to ora here
sta .addr + $380,y
jmp f_back
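For a rough cycle budget of that chunk, ignoring page crossings: the eight sta abs,y take 5 cycles each, the lda (zp),y takes 5, the and abs,x and ora abs,y take 4 each, and the jmp takes 3 - roughly 56 cycles for a chunk of this width, on top of the per-line loop above.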
So, any suggestions on how to save further cycles? As you can see, there are a few awkward and painful spots: creating the xstart table in a separate step, refetching the pattern again via lda (zp),y, the pattern lookup, the bne ++, and the fact that I cannot use any register without facing a register store-and-load galore that would slow things down to death.
Or is this already the ultimate, hats-off ("Hut ab!") optimum?