2007-09-21 20:17
johncl

Registered: Aug 2007
Posts: 37
Screen copy CPU intensive

I am working on a little game for the C64 and have stumbled upon a slight problem: a smooth scroller is darn CPU intensive! It's a simple sidescroller where I want to scroll the upper 18 lines, and the copy required to move the character and color RAM is very expensive. I use Kick Assembler, and at first I tried this:

.macro ScrollLines(screen,from,to) {
// Rows are split into two loops so each bne stays within branch range
.var half = round([to-from]/2)
ldx #0
jmp !loop+
.align $100
!loop:
// First half of the rows: move char and color one position to the left
.for (var i=from;i<from+half;i++) {
lda screen+1+[i*40],x
sta screen+[i*40],x
lda SCREEN_COLOR+1+[i*40],x
sta SCREEN_COLOR+[i*40],x
}
inx
cpx #39
bne !loop-
ldx #0
jmp !loop+
.align $100
!loop:
// Second half of the rows, same copy
.for (var i=from+half;i<=to;i++) {
lda screen+1+[i*40],x
sta screen+[i*40],x
lda SCREEN_COLOR+1+[i*40],x
sta SCREEN_COLOR+[i*40],x
}
inx
cpx #39
bne !loop-
}

So here I unroll a full column copy in two parts (because the bne can't reach back across the whole column if I unroll it in a single loop). The two loops are also aligned so that the bne doesn't cross a page boundary and end up costing another cycle. But still, this routine eats up my CPU big time.

I then tried a complete unroll of a pure copy:

.macro ScrollLines2(screen,from,to) {
.for (var i=from*40;i<=to*40;i++) {
lda screen+1+i
sta screen+i
lda SCREEN_COLOR+1+i
sta SCREEN_COLOR+i
}
}

This also copies the unnecessary last column, which I need to fill in from my map data anyway. But this complete unroll uses almost 2/3 of the screen raster time! Even with this I have to split up my copy, so that the top part is copied while the bottom of the screen is being drawn and the bottom part at the start of the next screen refresh.
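For a rough sanity check of the "almost 2/3" figure, here is a small sketch that adds up the documented 6502 instruction timings for both routines. It assumes a PAL machine (312 lines of 63 cycles), 18 rows split as 9 + 9 in the looped version, and it ignores badlines, sprites and page-crossing penalties, so treat the numbers as estimates only.

```python
# Cycle estimates for the two copy routines, using documented 6502 timings.
LDA_ABS, STA_ABS = 4, 4      # lda $nnnn / sta $nnnn
LDA_ABX, STA_ABX = 4, 5      # lda $nnnn,x (no page cross) / sta $nnnn,x
INX, CPX_IMM = 2, 2
BNE_TAKEN, BNE_NOT_TAKEN = 3, 2

# Assumption: PAL timing, cycles stolen by badlines and sprites are ignored.
PAL_FRAME = 312 * 63

def unrolled_cycles(rows, cols=40):
    """Fully unrolled copy: one screen byte and one color byte per cell."""
    return rows * cols * 2 * (LDA_ABS + STA_ABS)

def looped_cycles(rows, cols=39):
    """One x-indexed loop whose body is unrolled over `rows` rows."""
    body = rows * 2 * (LDA_ABX + STA_ABX)
    per_iteration = body + INX + CPX_IMM + BNE_TAKEN
    # The final bne falls through, which is one cycle cheaper.
    return cols * per_iteration - (BNE_TAKEN - BNE_NOT_TAKEN)

full = unrolled_cycles(18)
loop = 2 * looped_cycles(9)   # the macro uses two loops of 9 rows each

print("unrolled:", full, "cycles,", round(full / PAL_FRAME, 2), "of a frame")
print("looped:  ", loop, "cycles,", round(loop / PAL_FRAME, 2), "of a frame")
```

Under these assumptions the full unroll is the cheaper of the two, and both land somewhere between a half and two thirds of a frame before any cycles are lost to badlines.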

Am I doing something wrong here?
 
... 25 posts hidden ...
 
2007-09-22 09:46
JackAsser

Registered: Jun 2002
Posts: 2014
Quote: Thanks for the tips. I managed to make it work in my game with 17 lines scrolling by splitting the copy in two, top half at the bottom of the raster redraw and the bottom half in the beginning of next screen. But it eats a lot of the available CPU time indeed.

Just before I read the answers here I had also thought about what you have written, to split up the copy in two halves and do double buffering by being a scroll screen ahead in time (depending on the direction the player is moving). Since I'll do at most 4 pixel scrolls (more likely 2-3 pixels) I should be fine.

Oswald, thanks for the code reference too, I must study that in detail to understand it as I am not very educated in the tricks that are used here. Does it do anything special to increase the number of available cycles to the CPU for the copy?


The trick is called VSP. Both VSP and normal double-buffered scrolling need a double buffer.

But for normal scrolling you have 8 pixels to complete the other buffer, that is, you need to paste 1000/8 chars every frame (assuming 1 pixel / frame scrolling).

With VSP you only need to update one column per 8 pixels. So you only need to paste 25/8 chars. This has to be done twice because you must update the other screen which will replace the current one as soon as you have scrolled 40 chars. So all in all 25/8*2 chars per frame.

STILL! None of these techniques eliminates the update of the $d800 area. At some point all 1000 $d800 nybbles have to be updated in just one frame.
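The per-frame workloads quoted above can be written out as a quick calculation. This is only the arithmetic from the post, assuming a full 40x25 screen and 1 pixel per frame scrolling; the variable names are made up for illustration.

```python
# Per-frame workload at 1 pixel/frame, following the figures in the post.
CHARS = 40 * 25            # 1000 cells in the video matrix
ROWS = 25
FRAMES_PER_CHAR = 8        # 8 pixels per char, 1 pixel scrolled per frame

# Normal double buffering: the hidden buffer must be complete after 8 frames.
normal_chars_per_frame = CHARS / FRAMES_PER_CHAR

# VSP: one new column per 8 pixels, written to both screens.
vsp_chars_per_frame = ROWS / FRAMES_PER_CHAR * 2

# Color RAM at $d800 cannot be double buffered on the C64,
# so all of it moves within a single frame.
color_nybbles_in_one_frame = CHARS

print(normal_chars_per_frame, vsp_chars_per_frame, color_nybbles_in_one_frame)
```

So the screen data can be spread thin either way, while the color RAM move stays a single 1000-byte burst.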

If you code for the C128 you can double buffer $d800 as well, though, which significantly increases the scrolling speed.

If you wanna learn more about VSP, read about bad lines and the VSP trick in general in the VIC article. It's explained quite well there, and it's not as mysterious as many try to make it.

/Andreas
2007-09-22 11:50
Graham
Account closed

Registered: Dec 2002
Posts: 990
VSP is a bad idea since it crashes on 50% of the machines.
2007-09-22 12:03
JackAsser

Registered: Jun 2002
Posts: 2014
Quote: VSP is a bad idea since it crashes on 50% of the machines.

@Graham: U have to really believe in it, otherwise VSP will never work. It's like it knows u.
2007-09-22 13:45
doynax
Account closed

Registered: Oct 2004
Posts: 212
The standard approach for games is to define color attributes in large blocks instead of on a per-tile basis. So if you've got 4x4 blocks of equal color values for instance then you'd only have to update every fourth column when shifting the screen horizontally. Double buffered video matrix scrolling is important as well so you can split the work across several frames.
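One way to read the 4x4 block idea is that after a one-char shift only the cells sitting on a block edge change color, which is one column in four. The sketch below checks that on a made-up map; `color_at` is a hypothetical map function chosen so that horizontally adjacent blocks always differ, and a full 40x25 window is assumed.

```python
ROWS, COLS, BLOCK = 25, 40, 4

def color_at(row, map_col):
    """Hypothetical level map: one color per 4x4 block of cells."""
    return ((map_col // BLOCK) + 3 * (row // BLOCK)) % 16

def visible(offset):
    """Color RAM contents when the window starts at map column `offset`."""
    return [[color_at(r, offset + c) for c in range(COLS)] for r in range(ROWS)]

def columns_to_rewrite(offset):
    """Screen columns whose color changes when scrolling one char left."""
    before, after = visible(offset), visible(offset + 1)
    return sorted({c for r in range(ROWS) for c in range(COLS)
                   if before[r][c] != after[r][c]})

def cells_to_rewrite(offset):
    before, after = visible(offset), visible(offset + 1)
    return sum(before[r][c] != after[r][c]
               for r in range(ROWS) for c in range(COLS))

print(columns_to_rewrite(0))
print(cells_to_rewrite(0), "of", ROWS * COLS, "color cells")
```

In this model each shift touches 10 of the 40 columns, so the color RAM work per char step drops to a quarter of a full copy.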

There are certain alternative hardware tricks too with better compatibility than VSP. I hacked together a charmode with 16-pixel tall tiles for a game project, which then only requires half as much effort to scroll.

All things considered though efficient scrolling is damned hard on the C64, especially when you want to combine both horizontal and vertical scrolling.
2007-09-22 18:09
PopMilo

Registered: Mar 2004
Posts: 146
@doynax: "There are certain alternative hardware tricks..."

Could you be more specific? :)
2007-09-22 18:33
doynax
Account closed

Registered: Oct 2004
Posts: 212
Quote: @doynax: "There are certain alternative hardware tricks..."

Could you be more specific? :)


Well.. To be honest I can't think of anything else that'd be useful off hand. But the hack I was talking about ought to qualify.

Basically I abort every other badline and switch between a pair of charsets every eight lines, which essentially creates a charmode with twice the normal tile height (i.e. half as many char indices/colors to manage and twice as much graphics data). By using the NMI timers kind of like in Ninja's FLI display it cost more than about 500 cycles per frame, what with the time regained from the badlines. And with the IRQ vector free for sprite multiplexing it's actually quite usable in a game, although screen splits and other cycle-exact effects are a mess.
A neat point is that the missed badlines still increase the video matrix pointer so by starting the display a row late you get color ram double buffering for vertical scrolling :)

Anyway, here's the proof-of-concept demo: http://www.minoan.ath.cx/~doynax/6502/Balls%20of%20the%20Scroll..
(I know it isn't all that fancy but I'm a novice when it comes to VIC hacking so I'd like to pretend I invented a new graphics mode ;)
2007-09-22 18:35
JackAsser

Registered: Jun 2002
Posts: 2014
Quote: Well.. To be honest I can't think of anything else that'd be useful off hand. But the hack I was talking about ought to qualify.

Basically I abort every other badline and switch between a pair of charsets every eight lines, which essentially creates a charmode with twice the normal tile height (i.e. half as many char indices/colors to manage and twice as much graphics data). By using the NMI timers kind of like in Ninja's FLI display it cost more than about 500 cycles per frame, what with the time regained from the badlines. And with the IRQ vector free for sprite multiplexing it's actually quite usable in a game, although screen splits and other cycle-exact effects are a mess.
A neat point is that the missed badlines still increase the video matrix pointer so by starting the display a row late you get color ram double buffering for vertical scrolling :)

Anyway, here's the proof-of-concept demo: http://www.minoan.ath.cx/~doynax/6502/Balls%20of%20the%20Scroll..
(I know it isn't all that fancy but I'm a novice when it comes to VIC hacking so I'd like to pretend I invented a new graphics mode ;)


IMO that IS a perfect use of the C64 HW capabilities. Good work! I'm impressed.
2007-09-22 19:13
doynax
Account closed

Registered: Oct 2004
Posts: 212
Quote: IMO that IS a perfect use of the C64 HW capabilities. Good work! I'm impressed.

Thank you :)

Hopefully I'll manage to finish that game sometime..
2007-09-22 20:26
johncl

Registered: Aug 2007
Posts: 37
That is a pretty nice demo of C64 trickery indeed! But for this game I will stick with more simple coding atm.

Even though my horizontal scroller works fine with 17 lines now, it seems I have to go for a double-buffered solution, simply because unrolling the screen and color RAM copies in both directions is eating up all my memory. Double buffering only requires me to unroll a color RAM copy (6 bytes per char * 39 * 17 lines * 2 directions = 7956 bytes), and chances are that I don't need a full unroll but can do one using absolute,x indexing and still have enough CPU time for the rest. For the screen char data I will be fine, since I will do at most 4 pixel scrolls, leaving a complete frame free for the char copy (where I don't have to worry about where the raster is).
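The memory figure can be checked with a few lines, and the same arithmetic shows what the absolute,x version would cost. This is a sketch using the standard 6502 instruction sizes; the loop layout (ldx, row-unrolled body, inx, cpx, bne) is assumed to match the earlier macro, minus the screen copy.

```python
# Code size of the color RAM copy, using standard 6502 instruction sizes.
ABS_BYTES = 3                        # lda/sta with a 16-bit address, plain or ,x
LDX_IMM, INX, CPX_IMM, BNE = 2, 1, 2, 2

def unrolled_bytes(cols, rows, directions):
    """Full unroll: one lda/sta pair per cell, per scroll direction."""
    return 2 * ABS_BYTES * cols * rows * directions

def looped_bytes(rows):
    """One absolute,x loop with the body unrolled over `rows` rows."""
    return LDX_IMM + rows * 2 * ABS_BYTES + INX + CPX_IMM + BNE

def branch_fits(rows):
    """A backward bne can reach at most 128 bytes from the byte after it."""
    return rows * 2 * ABS_BYTES + INX + CPX_IMM + BNE <= 128

print(unrolled_bytes(39, 17, 2))     # full unroll, both directions
print(2 * looped_bytes(17))          # indexed loops, both directions
print(branch_fits(17))               # does 17 rows fit in a single loop?
```

By this count a color-only loop over 17 rows stays inside branch range, so it would not need to be split in two the way the combined screen and color loop was.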

It's strange that this thing turned out to be a challenge after all. Which makes it all the more interesting a project. :)
2007-09-22 21:03
cadaver

Registered: Feb 2002
Posts: 1160
I don't know how much leftover rastertime you need, but I just want to mention that games like Turrican I/II/III do up-to-4-pixels-per-frame scrolling without unrolling either the screen or color copy routine. An example:

frame 1: shift screen data in hidden buffer
frame 2: shift color data, split in two halves
frame 3: repeat above...
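
As an illustration only, the two-frame cycle above can be modelled like this for a leftward scroll. The `plan` function, the task strings and the choice of flipping buffers on the frame where the fine scroll wraps are all assumptions made for this sketch, not details taken from the games mentioned.

```python
def plan(frames, speed=4):
    """Return (frame, xscroll, task) tuples for a leftward scroll.

    Hypothetical model: the hidden screen buffer is shifted on a frame
    where the fine scroll does not wrap, and the buffers are flipped
    (with the color RAM shifted) on the frame where it does.
    """
    if not 1 <= speed <= 4:
        raise ValueError("this sketch only covers 1 to 4 pixels per frame")
    pos, hidden_ready, out = 0, False, []
    for frame in range(1, frames + 1):
        old_coarse = pos // 8
        pos += speed
        xscroll = 7 - pos % 8            # value for the low 3 bits of $d016
        if pos // 8 != old_coarse:
            task = "flip buffers, shift color RAM"
            hidden_ready = False
        elif not hidden_ready:
            task = "shift screen data in hidden buffer"
            hidden_ready = True
        else:
            task = "idle"
        out.append((frame, xscroll, task))
    return out

for step in plan(4):
    print(step)
```

At 4 pixels per frame the two tasks simply alternate; at lower speeds the model leaves spare frames between them.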