[CSDb] - User Forums - problems w. opening sidebordes+sprites


2011-08-30 22:02

Norrland

Registered: Aug 2011
Posts: 14


Hi there! First post here, but bear with me, I'm gonna spam you with some questions on a semi-newbie level for a while.

I'll start with my problems with opening the sideborder. I'm showing a bitmap picture and the plan is to have 4 sprites in the sideborder, in the middle of the screen, all at the same y-pos. I have managed to make a stable IRQ routine and have set $d015 to #$f0 to enable sprites 4-7 (sprites in order, the ones with lowest priority). I've also made sure that sprites 0-3 have a totally different y-pos than sprites 4-7.
Later down the screen, at the same position as the sprites, I open the border with dec/inc $d016 (cycle 56), and everything goes well on the normal lines, but on bad lines I'm 3 cycles late (if I've understood everything right), even though I do the dec/inc $d016 right after the previous line's dec/inc.
I've tried to follow Christian Bauer's texts about the VIC (plus Codebase, C=Hacking, posts here and others) on raster timing and opening borders, but I don't understand what I'm doing wrong. In my code I do 20 NOPs between the dec/inc pairs, which in my head equals 52 cycles including the dec/inc, and if that is right, it suggests that the VIC uses 11 cycles (63-52=11) (2 per sprite and 3 for the BA signal??) for fetching sprite data for 4 sprites on non-bad lines.
If my calculations are right, I don't understand why I have problems on bad lines, where I should have 23 cycles(?). Even if I skip one sprite, I'm still one cycle late, and I've read that 4 sprites should be possible.

The code inside irq, and screenshot showing my timing:

dec $d021 ;row1
inc $d021

.byte $ea, $ea, $ea, $ea, $ea, $ea, $ea ;20 NOPs (40 cycles)
.byte $ea, $ea, $ea, $ea, $ea, $ea, $ea
.byte $ea, $ea, $ea, $ea, $ea, $ea

dec $d021 ;row2 6 cycles
inc $d021 ; 6 cycles,tot 52

.byte $ea, $ea, $ea, $ea, $ea, $ea, $ea ;20 NOPs (40 cycles)
.byte $ea, $ea, $ea, $ea, $ea, $ea, $ea
.byte $ea, $ea, $ea, $ea, $ea, $ea

dec $d021 ;row3 6 cycles
inc $d021 ; 6 cycles,tot 52

dec $d020 ;row 4 BADLINE hell breaks loose... (well, sort of)
inc $d020

http://i1120.photobucket.com/albums/l491/lordborak/kastabort.png

Is my understanding right? Do I need to consider/setup anything else than described above?
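The cycle budget in the post above can be checked with a few lines of Python. This is only a sketch of the arithmetic: it assumes a PAL machine with 63 cycles per raster line, and the constant and function names are made up for illustration.

```python
# Cycle counts for the 6510 instructions used in the post.
NOP_CYCLES = 2            # nop (implied addressing)
RMW_ABS_CYCLES = 6        # dec/inc with absolute addressing

PAL_CYCLES_PER_LINE = 63  # assumption: PAL VIC-II


def cpu_cycles_per_line(nops):
    """Cycles the CPU executes per line: NOP padding plus one dec/inc pair."""
    return nops * NOP_CYCLES + 2 * RMW_ABS_CYCLES


used = cpu_cycles_per_line(20)
stolen = PAL_CYCLES_PER_LINE - used
print("CPU cycles per line:", used)
print("cycles taken by the VIC:", stolen)
```

The result matches the 52 and 11 quoted in the post, so the count for the normal lines adds up; the open question is only what happens on bad lines.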

2011-08-30 23:17

TWW

Registered: Jul 2009
Posts: 545

sty $d016
sta $d016,x
sty $d016
sta $d016

Just those few cycles less that you need, I reckon :-)
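To see how many cycles the store-based version saves, here is a small sketch comparing the instruction costs. It assumes that A and Y have been loaded beforehand with the two $d016 values (that setup is not shown in the post), and the table only lists the addressing modes that appear in this thread.

```python
# Documented 6510 cycle counts for the instructions in question.
CYCLES = {
    "dec abs": 6,
    "inc abs": 6,
    "sta abs": 4,
    "sty abs": 4,
    "sta abs,x": 5,   # indexed stores always take the extra cycle
}


def cost(*instructions):
    """Total cycles for a sequence of instructions."""
    return sum(CYCLES[name] for name in instructions)


rmw_pair = cost("dec abs", "inc abs")          # original approach
store_pair = cost("sty abs", "sta abs")        # register-based approach
stretched_pair = cost("sty abs", "sta abs,x")  # same, one cycle longer

print(rmw_pair, store_pair, stretched_pair)
```

The indexed store is a handy way to nudge the timing by a single cycle without adding another instruction.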

2011-08-31 06:21

Mr. SID

Registered: Jan 2003
Posts: 424

What TWW said.
Also a little tip for timing. Put this in your code somewhere:

delay64:	nop
delay62:	nop
delay60:	nop
delay58:	nop
delay56:	nop
delay54:	nop
delay52:	nop
delay50:	nop
delay48:	nop
delay46:	nop
delay44:	nop
delay42:	nop
delay40:	nop
delay38:	nop
delay36:	nop
delay34:	nop
delay32:	nop			
delay30:	nop
delay28:	nop
delay26:	nop
delay24:	nop
delay22:	nop
delay20:	nop
delay18:	nop
delay16:	nop
delay14:	nop
delay12:	rts


delay63:	nop
delay61:	nop
delay59:	nop
delay57:	nop
delay55:	nop
delay53:	nop
delay51:	nop
delay49:	nop
delay47:	nop
delay45:	nop
delay43:	nop
delay41:	nop
delay39:	nop
delay37:	nop
delay35:	nop			
delay33:	nop
delay31:	nop
delay29:	nop
delay27:	nop
delay25:	nop
delay23:	nop
delay21:	nop
delay19:	nop
delay17:	nop
delay15:	.byte $04, $00	; NOOP $00 = 3 cycles
		rts

Then just do a jsr delay40 in your code instead of putting 20 nops into it. That way you'll make sure that changing your timing delays is not going to push your code around. Otherwise a branch might end up crossing a page border and you'll lose a cycle somewhere which usually takes a long time to find.
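A table like this is easy to generate and to double-check with a script. The following is a sketch under the assumption that the label number counts the calling jsr as well (6 for jsr, 2 per nop, 6 for rts, 3 for the two-byte NOP); the helper names are hypothetical.

```python
JSR = 6     # cycles for the jsr that enters the table
RTS = 6
NOP = 2
NOP_ZP = 3  # the undocumented two-byte NOP ($04 $00) used for odd delays


def delay_chain(top, tail, tail_lines):
    """Return assembler lines: one NOP per label from delay<top> down to
    delay<tail+2>, then the tail instructions under delay<tail>."""
    lines = ["delay%d:\tnop" % n for n in range(top, tail, -2)]
    lines.append("delay%d:\t%s" % (tail, tail_lines[0]))
    lines.extend("\t\t" + text for text in tail_lines[1:])
    return lines


def cycles_for(n, tail, tail_cycles):
    """Cycles spent by `jsr delay<n>`, counting the jsr itself."""
    nops = (n - tail) // 2
    return JSR + nops * NOP + tail_cycles


even_table = delay_chain(64, 12, ["rts"])
odd_table = delay_chain(63, 15, [".byte $04, $00", "rts"])

print("\n".join(even_table + [""] + odd_table))
```

With these assumptions every entry point costs exactly the number in its label, so `jsr delay40` replaces 20 NOPs cycle for cycle.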

2011-08-31 08:01

MagerValp

Registered: Dec 2001
Posts: 1078

Quoting Mr. SID

Otherwise a branch might end up crossing a page border and you'll lose a cycle somewhere which usually takes a long time to find.

.assert >* = >delay, error, "delay loop crosses page boundary"

Though I wish ca65 had the ability to warn on branches across a page boundary.
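The check that such a warning would perform is simple. As a sketch (function names are made up): a taken branch costs one extra cycle when its target lies on a different page than the instruction following the branch.

```python
def branch_crosses_page(branch_addr, target_addr):
    """True if a taken branch at branch_addr pays the page-crossing penalty.

    The 6502 compares the page of the target with the page of the address
    right after the two-byte branch instruction.
    """
    next_pc = (branch_addr + 2) & 0xFFFF
    return (next_pc >> 8) != (target_addr >> 8)


def branch_cycles(branch_addr, target_addr, taken):
    """Cycles used by a relative branch instruction."""
    if not taken:
        return 2
    return 4 if branch_crosses_page(branch_addr, target_addr) else 3


print(branch_cycles(0x1003, 0x1000, True))   # same page
print(branch_cycles(0x1000, 0x0FFF, True))   # target on the previous page
```

Note that a branch sitting in the last two bytes of a page pays the penalty even for a short backwards jump, because the address after the branch is already on the next page.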

2011-08-31 12:50

TWW

Registered: Jul 2009
Posts: 545

OR you could be cool and use Kickass and the following Pseudo-Commands (Unless RAM is hurting):

irq2:
    :IRQ_LeadIn #1
    ldx #$07
    dex
    bne *-1

    bit $ffff
    nop
    ldx #$00
d16:
    ldy #$00
    :SET_GRAPHICS_BANKS($4400,$5800)
    lda #$1b
    sta $d011
    
    lda #$0f

    sty x
    sta x,x  //32
    sty x
    sta x    //33
    :timer1  //34
    :timer1  //35
    :timer1  //36
    :timer1  //37
    :timer1  //38
    :timer1  //39
    :timer2  //3a & 3b
    :timer1  //3c
    :timer1  //3d
    :timer1  //3e
    :timer1  //3f
    :timer1  //40
    :timer1  //41
    :timer2  //42 & 43
    :timer1  //44
    :timer1  //45
    :timer1  //46
    ldx #106+21
    stx $d009
    stx $d00b
    stx $d00d
    stx $d00f
    ldx #$41
    stx $47fc
    ldx #$45
    stx $47fd
    ldx #$50
    stx $47fe
    ldx #$50
    stx $47ff
    ldx #$00    
    sty x
    sta x
    :timer1
    :timer1
    :timer2
    :timer1
    :timer1
    :timer1
    :timer1
    :timer1
    :timer1
    :timer2
    :timer1
    :timer1
    :timer1
    :timer1
    :timer1
    :timer1
    :timer2
    ldx #106+42
    stx $d009
    stx $d00b
    stx $d00d
    stx $d00f
    ldx #$42
    stx $47fc
    ldx #$46
    stx $47fd
    ldx #$50
    stx $47fe
    ldx #$50
    stx $47ff
    ldx #$00    
    sty x
    sta x
    :timer1
    :timer1
    :timer1
    :timer1
    :timer1



.pseudocommand timer1 {

    bit $ffff
    bit $ffff
    bit $ffff
    bit $ffff
    bit $ffff
    bit $ffff
    bit $ffff
    bit $ffff
    bit $ffff
    bit $ffff
    bit $ffff
    
    sty x
    sta x
}
.pseudocommand timer2 {
    bit $ffff
    bit $ffff
    bit $ffff
    bit $ffff
    bit $ffff
    bit $ffff
    bit $ffff
    bit $ffff
    bit $ffff
    bit $ffff
    bit $ffff
    sty x
    sta x,x
    sty x
    sta x

}

This should open 6 chars with sprites, with the initial raster triggering before the 1st bad line (adjust timing as necessary!). Here you can also control the $d016 HW scrolling.

Cheers. Buy me a beer anytime!

2011-08-31 13:28

Mr. SID

Registered: Jan 2003
Posts: 424

I think this is cooler:

#pybegin
badline_offset = 2
for i in range(16):
    if i%8 == badline_offset:
        # bad line
        print("sta $d016,y")
    else:
        print("jsr delay44")
        print("sta $d016")
    print("stx $d016")
#pyend

2011-08-31 14:22

JackAsser

Registered: Jun 2002
Posts: 2014

Using 3 bytes to waste 4 cycles is just wrong, when you can use 2 bytes to waste 7 cycles. :) (bit $ffff, vs pha+pla)

2011-08-31 15:05

Frantic

Registered: Mar 2003
Posts: 1648

Yes, it is important to waste cycles with style.

2011-08-31 16:14

TWW

Registered: Jul 2009
Posts: 545

Alright, then this is the coolest :-)

    :open_border (2,16)

.macro open_border(badline_offset,total_lines) {
  .for (i=0;i<total_lines;i++) {
    jsr delay44
    .if i&7 == badline_offset:
      stx $d016
      sta $d016,y
    }
    stx $d016
    sta $d016
  }
}

Delay44:
    .for (i=0;i<4;i++) { pha pla }
    nop
    nop
    rts

+ it wastes cycles with style^^

ps. Untested and might be bugged!

2011-08-31 16:17

Mr. SID

Registered: Jan 2003
Posts: 424

Now we're getting somewhere... ;)

2011-08-31 19:44

TWW

Registered: Jul 2009
Posts: 545

Minor Bugfix 8-D

.var i = 0

    :open_border(2,16)

.macro open_border(badline_offset,total_lines) {
  .for (i=0;i<total_lines;i++) {
    jsr Delay44
    stx $d016
    .if([i&7] == badline_offset) {
      sta $d016,y
      stx $d016
    }
    sta $d016
  }
}

Delay44:
    .for (i=0;i<4;i++) { pha pla }
    nop
    nop
    rts

Looks ok when compiled. Not done any real test though so might need some cycle adjustment if my math is off.
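The cycle math for this version can be checked quickly. A sketch, assuming the standard documented cycle counts and that the calling jsr is counted as part of the 44:

```python
JSR, RTS, NOP = 6, 6, 2
PHA, PLA = 3, 4
STORE_ABS = 4   # sta/stx/sty with absolute addressing


def delay44_cycles():
    """jsr Delay44 + 4 x (pha, pla) + 2 x nop + rts."""
    return JSR + 4 * (PHA + PLA) + 2 * NOP + RTS


def delay44_bytes():
    """Size of the routine itself: pha, pla, nop and rts are one byte each."""
    return 4 * 2 + 2 + 1


normal_line = delay44_cycles() + 2 * STORE_ABS   # jsr Delay44 / stx / sta

print(delay44_cycles(), delay44_bytes(), normal_line)
```

The 52 cycles per normal line agree with the figure from the first post (20 NOPs plus dec/inc).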

2011-08-31 19:50

Norrland

Registered: Aug 2011
Posts: 14

Damn it!! I knew I did something wrong, I just didn't realize that I wasn't wasting the cycles with enough style... I'll probably get better at that in the future, but meanwhile, I'll stick to the NOPs...

Big thanks for the solution, works great! And I'll be using your tip with the "delay table" in the future when I need careful timing.

You talk about beer, I want beer, you write crazy sideborder-opening routines, my program works, everyone is happy! But I still don't _understand_ why I don't have enough cycles left on bad lines; shouldn't 23 cycles be enough for sprite fetching & dec/inc? Does anyone have a suggestion?

2011-08-31 20:13

Radiant

Registered: Sep 2004
Posts: 639

H Macaroni: Well, how many cycles you have available on a bad line depends on what your code looks like, especially if you also have sprites enabled. :-) The key is the BA line on the VIC-II; BA goes low three cycles before the VIC needs exclusive memory access. If the CPU executes a read cycle while BA is low, it stalls, and AEC is held low until the VIC is finished with its business, after which the CPU continues from where it was. Therefore you have 20-23 cycles available on a bad line with no sprites, depending on your code.
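The 20-23 range follows directly from the numbers in the explanation above. A back-of-the-envelope sketch, assuming a PAL line of 63 cycles and 40 character fetches on a bad line; the last figure simply reuses the 11 cycles estimated for four sprites in the first post and is not a cycle-exact model:

```python
PAL_CYCLES_PER_LINE = 63   # assumption: PAL VIC-II
BADLINE_DMA_CYCLES = 40    # character fetches on a bad line
BA_LEAD_CYCLES = 3         # BA goes low this many cycles before the DMA


def badline_cpu_cycles(usable_lead_cycles):
    """CPU cycles left on a bad line without sprites.

    usable_lead_cycles: how many of the three BA lead-in cycles the CPU can
    still use (0..3); it only keeps running during them while it is writing.
    """
    if not 0 <= usable_lead_cycles <= BA_LEAD_CYCLES:
        raise ValueError("usable_lead_cycles must be between 0 and 3")
    return (PAL_CYCLES_PER_LINE - BADLINE_DMA_CYCLES
            - BA_LEAD_CYCLES + usable_lead_cycles)


worst = badline_cpu_cycles(0)
best = badline_cpu_cycles(3)

SPRITE_COST_FROM_THREAD = 11          # estimate from the first post
rough_with_sprites = best - SPRITE_COST_FROM_THREAD

print(worst, best, rough_with_sprites)
```

Seen this way, the 23 cycles only hold without sprites; with four sprites on the line the budget shrinks to roughly a dozen cycles, which leaves very little room for a dec/inc pair that starts with several read cycles.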

2011-08-31 20:21

MagerValp

Registered: Dec 2001
Posts: 1078

delay44:
        nop
        jsr :+
:       jsr :+
:       rts

2011-08-31 21:03

Norrland

Registered: Aug 2011
Posts: 14

radiantx: thx

2011-09-01 06:48

JackAsser

Registered: Jun 2002
Posts: 2014

Quote:
delay44:
        nop
        jsr :+
:       jsr :+
:       rts

Cool! Removed my post regarding how to calc when I realized my mistake...

2011-09-01 08:11

Radiant

Registered: Sep 2004
Posts: 639

MagerValp: Mind = blown

2011-09-01 08:33

Frantic

Registered: Mar 2003
Posts: 1648

@Magervalp: Nice, yes! Stylish, yes! I think I have seen this kind of delay code before somewhere though. Am I right? Is it in some routine in C=Hacking or something?

Anyway.. I don't think your particular little routine is correct. :) Unless you call the routine with a JSR, there will be one RTS too many and you'll end up in stack space when RTS'ing to Eternia. ...but if you DO call the routine with a JSR it will waste a total of 44+6 cycles (with the initial JSR to the routine included) rather than 44 cycles. ...or did I miss something? Maybe this is actually how it was supposed to be?

Not as elegant, but unless I made a mistake, this routine should provide a relatively stylish wasting of 44 cycles (with the initial 6 cycles of the JSR to the waste44 routine included):

		[SOME CODE HERE]
		jsr waste44
		[SOME CODE HERE]


waste44:	;the routine is 9 bytes
		nop
		jsr :+
		jsr :+
:		nop
		rts

2011-09-01 09:58

WVL

Registered: Mar 2002
Posts: 902

check!

6 jsr delay44

2 nop
6 jsr 1
2 nop
6 rts

6 jsr 2
2 nop
6 rts

2 nop
6 rts

total : 44
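The same trace can be produced mechanically. Below is a toy sketch, not a 6502 emulator: it only knows nop, jsr and rts, uses list indexes as addresses, and models the return address as "the next instruction". The two program listings are transcriptions of the routines posted above.

```python
CYCLES = {"nop": 2, "jsr": 6, "rts": 6}
RETURN_TO_CALLER = -1


def count_cycles(program, include_call=True):
    """Run `program` (a list of (op, target) tuples) until it returns to the
    caller and report the total number of cycles."""
    stack = [RETURN_TO_CALLER]
    total = CYCLES["jsr"] if include_call else 0
    pc = 0
    while pc != RETURN_TO_CALLER:
        op, target = program[pc]
        total += CYCLES[op]
        if op == "jsr":
            stack.append(pc + 1)   # resume after the jsr
            pc = target
        elif op == "rts":
            pc = stack.pop()
        else:
            pc += 1
    return total


# MagerValp's routine: nop / jsr :+ / : jsr :+ / : rts
magervalp = [("nop", None), ("jsr", 2), ("jsr", 3), ("rts", None)]

# Frantic's routine: nop / jsr :+ / jsr :+ / : nop / rts
frantic = [("nop", None), ("jsr", 3), ("jsr", 3), ("nop", None), ("rts", None)]

print(count_cycles(magervalp), count_cycles(magervalp, include_call=False))
print(count_cycles(frantic))
```

Both readings from the discussion come out: the first routine is 44 cycles without the calling jsr and 50 with it, while the second is 44 with the jsr included.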

2011-09-01 11:31

MagerValp

Registered: Dec 2001
Posts: 1078

Yes, it depends on if you count 6 cycles of the calling jsr or not. I couldn't be arsed to count the cycles of the previous examples :P

Don't know if I found it somewhere or if I thought of it myself.

2011-09-01 14:35

TWW

Registered: Jul 2009
Posts: 545

Delay44:
    lda #%00010000
    sec
    ror
    // PAGEBREAK HERE
    bcc *-1
    rts  // A exits with #8 and done in 7 bytes.

Another variant with 1 byte less ;-)
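Why the page break matters can be seen by stepping through the loop. A small sketch that models only A, the carry flag and the cycle count; the calling jsr is assumed to be counted, and the function names are hypothetical:

```python
def ror(a, carry):
    """6502 ROR on the accumulator: returns (new_a, new_carry)."""
    new_carry = a & 1
    return (a >> 1) | (carry << 7), new_carry


def run(page_cross):
    """Return (cycles, final_a) for jsr / lda #$10 / sec / ror / bcc / rts."""
    cycles = 6                    # jsr Delay44
    a, carry = 0b00010000, 1      # lda #%00010000, sec
    cycles += 2 + 2
    while True:
        a, carry = ror(a, carry)
        cycles += 2
        if carry == 0:            # bcc taken, back to the ror
            cycles += 4 if page_cross else 3
        else:                     # bcc not taken
            cycles += 2
            break
    cycles += 6                   # rts
    return cycles, a


print(run(page_cross=True))
print(run(page_cross=False))
```

In this model the routine only reaches 44 cycles when every taken branch pays the page-crossing penalty; without it the loop is 4 cycles short. A leaves with the value 8 either way.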

2011-09-01 15:41

Oswald

Registered: Apr 2002
Posts: 5094

you are making it overcomplicated. 1 byte less and primitive:

$10=5

6 jsr delay44

delay44
3 ldx $10
2 dex     ;
3 bpl *-1 ;x5=25 cycles +when exits 4 cycles -> 29
6 rts
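The cycle comments in this listing can be verified with a short simulation. This sketch assumes that the branch loops back to the dex (the only reading that matches the 5 cycles per iteration in the comments), that no page boundary is crossed, and that $10 already holds 5:

```python
def oswald_delay(counter):
    """Return (cycles, final_x) for: jsr / ldx $10 / dex / bpl <dex> / rts."""
    cycles = 6 + 3              # jsr delay44, ldx $10 (zero page)
    x = counter & 0xFF
    while True:
        x = (x - 1) & 0xFF      # dex
        cycles += 2
        if x < 0x80:            # bpl is taken while bit 7 is clear
            cycles += 3         # assumes no page crossing
        else:
            cycles += 2         # branch not taken, fall through to rts
            break
    cycles += 6                 # rts
    return cycles, x


ROUTINE_BYTES = 2 + 1 + 2 + 1   # ldx zp, dex, bpl, rts

print(oswald_delay(5), ROUTINE_BYTES)
```

Under these assumptions the routine is 6 bytes and 44 cycles, and X leaves the loop as $ff, which is the register side effect discussed in the replies.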

2011-09-01 15:49

JackAsser

Registered: Jun 2002
Posts: 2014

Quote: you are making it overcomplicated. 1 byte less and primitive:

$10=5

6 jsr delay44

delay44
3 ldx $10
2 dex     ;
3 bpl *-1 ;x5=25 cycles +when exits 4 cycles -> 29
6 rts

Yes, but you destroys teh regiztorz! == CHEAT! :D

2011-09-01 15:57

Oswald

Registered: Apr 2002
Posts: 5094

oh, didnt think about that :)

2011-09-01 18:13

TWW

Registered: Jul 2009
Posts: 545

plus you need to set $10 which will set you back 4 bytes ;-)

Trust me I gave it some thought^^

2011-09-01 18:17

Mace

Registered: May 2002
Posts: 1799

CYC #$2c

Super illegal opcode.

2011-09-01 21:25

Slammer

Registered: Feb 2004
Posts: 416

:pause #$2c (From codebase, a bit down on the page)
Just put in your own optimizations.

2011-09-02 04:57

Oswald

Registered: Apr 2002
Posts: 5094

Quote: plus you need to set $10 which will set you back 4 bytes ;-)

Trust me I gave it some thought^^

yeah, that's why you destroy A, plus $10 doesn't need to be set each time, so in the long run it uses less mem anyway :)

2011-09-02 05:43

Martin Piper

Registered: Nov 2007
Posts: 722

I've often thought some kind of external tool that generates ASM would be good here. You tell the tool which registers/memory locations you want set to which values, and the raster line and cycle on which each should be set. Then it figures out the necessary optimally timed code, taking bad lines, sprite DMA etc. into account.

2011-09-02 06:23

Perplex

Registered: Feb 2009
Posts: 255

If you write a tool in some high level language to generate the timed code for you, better make it waste cycles in between the timing critical instructions by interleaving it with other useful code instead of just nops and the like.

2011-09-02 06:37

JackAsser

Registered: Jun 2002
Posts: 2014

Quote: If you write a tool in some high level language to generate the timed code for you, better make it waste cycles in between the timing critical instructions by interleaving it with other useful code instead of just nops and the like.

That is exactly what the code multiplexer in S:T Lars Meeting III - Invite does. It has two code snippets, one that updates the 4x4-effect using A,X and Y registers. And one that opens the border. The multiplexer then replaces all NOPs in the timing critical code with the code from the 4x4-updater, keeping track of A,X and Y usage etc.
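The core of such a tool, filling a timing gap with useful instructions and padding the rest, could look roughly like the sketch below. It is purely illustrative and not how the mentioned multiplexer works internally: it keeps the useful instructions in order, ignores register and flag conflicts entirely (the `bit $00` padding changes flags), and all names and the example instructions are made up.

```python
def fill_gap(gap, useful):
    """Fill `gap` CPU cycles with instructions from `useful` (in order),
    then pad the rest. `useful` is a list of (text, cycles) tuples with at
    least 2 cycles each. Returns (emitted, leftover_useful)."""
    if gap < 0 or gap == 1:
        raise ValueError("a gap of %d cycles cannot be filled" % gap)
    emitted = []
    pending = list(useful)
    remaining = gap
    while pending:
        text, cost = pending[0]
        left = remaining - cost
        if left < 0 or left == 1:   # would overshoot or leave one lone cycle
            break
        emitted.append(pending.pop(0))
        remaining = left
    if remaining % 2:               # an odd remainder is at least 3 here
        emitted.append(("bit $00", 3))
        remaining -= 3
    emitted.extend([("nop", 2)] * (remaining // 2))
    return emitted, pending


updater = [("lda $fb", 3), ("clc", 2), ("adc #$01", 2), ("sta $fb", 3)]
code, rest = fill_gap(44, updater)
for text, cycles in code:
    print("%-10s ; %d" % (text, cycles))
```

A real tool would also have to track which of A, X and Y the timing-critical stores rely on, as described in the post above.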

2011-09-02 11:47

TWW

Registered: Jul 2009
Posts: 545

Quote: yeah, thats why you destroy A, plus $10 doesnt needs to be set each time, so in the long run it uses less mem anyway :)

A was 8 upon entry and is 8 upon exit.

lda #$08 <----- see, 8!
ldy #$00
OPEN DA BOOORDEEEERRZZZZZZZ!!!!!
Still 8 here!

You however, destroy X and add bytes ;-)

Come to think of it, one could probably use X as a counter to shorten this even further...

<-entrypoint (X set with badline offset $02 set with quantities of linesX8)
!: jsr delayXX
sta $d016
sty $d016
dex
bne !-
jsr delayXX-1
sta $d016
sty $d016
sta $d016,x
sty $d016
ldx #$07
jsr delayZZ
dec $02
bne !-+3
rts

Then again this is straight out of my ass^^

2011-09-02 12:59

Oswald

Registered: Apr 2002
Posts: 5094

cool, then I win by simply modifying my code to:

$10=6

delay44
3 ldx $10
2 dex ;
3 bne *-1 ;x5=25 cycles +when exits 4 cycles -> 29
6 rts

x needs to be 0, and my x will be 0 :)

edit: you also need to be on a pagebreak, which is not a nice thing either :)

2011-09-02 13:50

Frantic

Registered: Mar 2003
Posts: 1648

Quote: cool, then I win by simply modifying my code to:

$10=6

delay44
3 ldx $10
2 dex ;
3 bne *-1 ;x5=25 cycles +when exits 4 cycles -> 29
6 rts

x needs to be 0, and my x will be 0 :)

edit: you also need to be on a pagebreak, which is not a nice thing either :)

...but is it stylish?

2011-09-02 15:36

Oswald

Registered: Apr 2002
Posts: 5094

it does achieve what it was intended to: do it in less bytes. changing the goal later, isn't that unfair ? :)

2011-09-02 17:10

TWW

Registered: Jul 2009
Posts: 545

Fair enough. A pagebreak isn't cool, but it still is, somewhat, if someone rips your code and doesn't think about it (hehe).

You still need to do: sta/y/x $10 "somewhere" to ensure $10 = 6. Thus adding 2 bytes (maybe 2 more unless you have a state of 6 in one of your regs) and yielding a total consumption of 8 bytes vs. my 7 bytes.

I guess you would use 3 cycles more overall due to the sta/sty/stx $10 as well ;-D
