[CSDb] - User Forums - Faster charmap scrolling


2016-07-14 07:48

ChristopherJam

Registered: Aug 2004
Posts: 1409

Faster charmap scrolling

Some interesting asides over in the Pixeling forum about speeding up charmaps (cf Graphician for intense EF game). As Oswald pointed out, switching from tiles to a straight unpacked charmap doesn't really save you much, as you can avoid dealing with tile indices for most of the screen by just copying most of the chars from within VM. Besides, even tile index reads can be amortised over multiple VM writes.

However, there are other possibilities. If you've got a little RAM to spare (eg because all your level data is in EF), then why not unroll the update loop into one hardcoded routine per column?

Could easily dedicate 5k to

	lda#$xx
	sta vm,x
	lda#$xx
	sta vm+40,x
	lda#$xx
	sta vm+2*40,x
	..
	lda#$xx
	sta vm+24*40,x

which gets you down to 7 cycles per char (14 if you also do video ram).
You only need to update one column of source each time you scroll one char, and call the columns in sequence with increasing values of X.
Might have to divide it into upper/lower halves of the screen to avoid tearing.
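
A minimal Python sketch of the bookkeeping, under one reading of the scheme above: 40 routines, one per map column currently on screen, each a run of 25 `lda #imm / sta vm+row*40,x` pairs. The names `ColumnRing`, `scroll_one_char`, `draw` and `footprint` are hypothetical. The model only tracks the immediate operands and the order of the calls, not real 6502 execution. Sizes and timings are the standard 6502 figures (`lda #` is 2 bytes and 2 cycles, `sta abs,x` is 3 bytes and 5 cycles).

```python
ROWS, COLS = 25, 40
BYTES_PER_PAIR = 2 + 3      # lda #imm (2 bytes) + sta abs,x (3 bytes)
CYCLES_PER_PAIR = 2 + 5     # lda #imm (2 cycles) + sta abs,x (5 cycles)


def footprint():
    """Code size and cycles per full screen, ignoring jsr/rts overhead."""
    code_bytes = ROWS * COLS * BYTES_PER_PAIR
    cycles_per_screen = ROWS * COLS * CYCLES_PER_PAIR
    return code_bytes, cycles_per_screen


class ColumnRing:
    """Models the immediates held by the 40 per-column routines."""

    def __init__(self, level, start=0):
        # level[c] is a list of ROWS char codes for map column c
        self.level = level
        self.next_col = start + COLS   # next map column to bring in
        self.head = 0                  # routine holding the leftmost column
        self.routines = [list(level[start + i]) for i in range(COLS)]

    def scroll_one_char(self):
        """Rewrite the 25 immediates of the routine that just left the screen."""
        self.routines[self.head] = list(self.level[self.next_col])
        self.head = (self.head + 1) % COLS
        self.next_col += 1

    def draw(self):
        """Call the routines in ring order with X = 0..39; return the screen."""
        vm = [[0] * COLS for _ in range(ROWS)]
        for x in range(COLS):
            imms = self.routines[(self.head + x) % COLS]
            for row in range(ROWS):
                vm[row][x] = imms[row]     # models sta vm+row*40,x
        return vm
```

`footprint()` gives 5000 bytes and 7000 cycles, which matches the "5k" and "7 cycles per char" figures. For scale, a PAL frame is 19656 cycles before badline losses, so one screen's worth of stores is a bit over a third of a frame.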

Of course, if you want to be really extravagant, you could generate a routine per column of level data, and skip any redundant loads by grouping identical indices; kind of like compiled sprites on PC.

That would eat shedloads of flash if you stored them all in advance of course (a tad less with duplicate removal), or you could try generating them on the fly
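
A rough sketch of what such a per-column generator might look like, written as a Python build-time tool rather than on-the-fly 6502. `compile_column` and the `vm` label are hypothetical names. It groups rows that share a char index so each distinct value is loaded once, and it counts cycles with the same 2 + 5 timings as above, excluding the final `rts`.

```python
def compile_column(chars, vm="vm"):
    """Emit 6502 source for one level column, loading each distinct char once.

    chars is a list of char codes, one per screen row, top to bottom.
    Returns (source_lines, cycles), where cycles excludes the trailing rts.
    """
    rows_by_char = {}
    for row, ch in enumerate(chars):
        rows_by_char.setdefault(ch, []).append(row)

    lines, cycles = [], 0
    for ch, rows in rows_by_char.items():
        lines.append(f"\tlda #${ch:02x}")
        cycles += 2                           # lda #imm
        for row in rows:
            lines.append(f"\tsta {vm}+{row}*40,x")
            cycles += 5                       # sta abs,x
    lines.append("\trts")
    return lines, cycles
```

For a column such as `[0x20, 0x20, 0x41, 0x20]` this emits two loads and four stores, 24 cycles instead of 28. Note that the stores no longer run strictly top to bottom, which may matter if the update is racing the raster.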

2016-07-14 07:59

JackAsser

Registered: Jun 2002
Posts: 2014

My latest full screen scroll code involves:

stx SCREEN1+$0000
stx SCREEN2+$0000
inx
stx SCREEN1+$0028
stx SCREEN2+$0028
inx
stx SCREEN1+$0050
stx SCREEN2+$0050
inx
stx SCREEN1+$0078
stx SCREEN2+$0078
inx
stx SCREEN1+$00a0
stx SCREEN2+$00a0
stx SCREEN1+$0001
stx SCREEN2+$0001
inx
stx SCREEN1+$00c8
stx SCREEN2+$00c8
stx SCREEN1+$0029
stx SCREEN2+$0029
inx
stx SCREEN1+$00f0
stx SCREEN2+$00f0
stx SCREEN1+$0051
stx SCREEN2+$0051
inx
stx SCREEN1+$0118
stx SCREEN2+$0118
stx SCREEN1+$0079
stx SCREEN2+$0079
inx
.
.
.

Segment (start, stop, size):
SCROLLER              007C00  0094CF  0018D0

What it's for and how it's used is a secret and will be revealed at a future demo party. :)

2016-07-14 08:01

ChristopherJam

Registered: Aug 2004
Posts: 1409

Haha, nice. I can think of a few things you could do with that..

2016-07-14 08:48

Oswald

Registered: Apr 2002
Posts: 5094

jackie, twister with half chars ? :)

2016-07-14 08:54

Oswald

Registered: Apr 2002
Posts: 5094

CJ, took me a minute until I got it, that's an awesome idea :) tho the usage is limited to horizontal scrolling.

2016-07-14 08:55

JackAsser

Registered: Jun 2002
Posts: 2014

Quote: CJ, took me a minute until I got it, that's an awesome idea :) tho the usage is limited to horizontal scrolling.

Free directional (mine that is). CJ's I dunno.

2016-07-14 09:20

ChristopherJam

Registered: Aug 2004
Posts: 1409

Quoting Oswald

CJ, took me a minute until I got it, that's an awesome idea :) tho the usage is limited to horizontal scrolling.

Thanks! Yes, I probably should have mentioned that limitation.

2016-07-14 14:45

oziphantom

Registered: Oct 2014
Posts: 490

Given you have your chars mapped out like so
$8000 row 1 of chars
$8100 row 2 of chars
$8200 row 3 of chars
....

$a000 row 1 of colours
$a100 row 2 of colours
.....

And you are double buffered. Screen 1 at $4400 and Screen 2 at $4800

You have a window defined as Lindex and Rindex which start at 0 and 39.
Let's also assume that you are viewing Screen 1 to start with.
When you scroll left, such that it appears the player has moved right,
you use an unrolled loop to copy
4401 -> 4800
4402 -> 4801
.....
You then need to plot your new column, in this case in the right edge of screen 2.
so
ldx Rindex
lda $8000,x
sta $4827
lda $8100,x
sta $484f
lda $8200,x
...

then you need to do the CRAM so again unrolled loop
d801 -> d800
d802 -> d801
.....

and plot the CRAM side
ldx Rindex
lda $A000,x
sta $d827
lda $A100,x
sta $d84f
lda $A200,x
...

So you need 4 unrolled Screen copy routines
Screen1 to Screen2 forwards
Screen2 to Screen1 forwards
Screen1 to Screen2 backwards
Screen2 to Screen1 backwards
and 2 CRAM copy routines
CRAM+1 to CRAM
CRAM to CRAM+1

And 4 column routines
LeftEdge Screen1
LeftEdge Screen2
RightEdge Screen1
RightEdge Screen2

Now to move, you move your "window": to get to the next char column you inc Rindex and Lindex, to go back you dec them.

But this only gets you a 256 char wide map. So you need to add in a RindexBank and LindexBank as well. Once you roll over you either inc/dec the bank, or even look up what the next bank is in a bank map, allowing you to repeat banks for extra length, or wrap-around maps. Since the banks are always at $8000 you don't need to use any pointers or indexes. Your unrolled loops can now service up to 1MB worth of map. Eats a good 24-25K, but with all the map data and other code being able to be stored in ROM, I don't think it is going to matter, and it gives top speed with free range "boundless" map size, with unique colour per 1x1 map tile, and you don't need any timing critical VIC tricks to worry about patching to support NTSC. However you can't modify the map, so dynamic parts of the map must become "entities". Could also be modified to support up/down and even possibly 8 way scrolling versions.
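
The screen half of the scheme above can be modelled in a few lines of Python. This is a sketch under assumptions, not oziphantom's code: `scroll_left`, `advance` and `run` are hypothetical names, each bank is modelled as 25 rows of 256 char codes (one page per row, as in the $8000/$8100/... layout), and the window index is advanced before the new column is plotted. Colour RAM would follow the same pattern with a second set of rows.

```python
ROWS, COLS = 25, 40


def scroll_left(src, dst, bank_rows, rindex):
    """Fill dst with src shifted one char left, then plot the right edge.

    src, dst  : 25 lists of 40 char codes (the two screen buffers)
    bank_rows : 25 lists of 256 char codes for the currently mapped bank
    rindex    : column within that bank to show at the right edge
    """
    for row in range(ROWS):
        # models the unrolled copy: src+1 -> dst, src+2 -> dst+1, ...
        dst[row][0:COLS - 1] = src[row][1:COLS]
        # models: ldx Rindex / lda $8000+row*$100,x / sta dst+row*40+39
        dst[row][COLS - 1] = bank_rows[row][rindex]


def advance(index, bank):
    """inc the 8-bit index; on rollover step to the next bank."""
    index = (index + 1) & 0xFF
    if index == 0:
        bank += 1
    return index, bank


def run(banks, steps):
    """Scroll `steps` chars left through `banks`, flipping buffers each step."""
    screen = [[banks[0][r][c] for c in range(COLS)] for r in range(ROWS)]
    other = [[0] * COLS for _ in range(ROWS)]
    rindex, rbank = COLS - 1, 0
    for _ in range(steps):
        rindex, rbank = advance(rindex, rbank)
        scroll_left(screen, other, banks[rbank], rindex)
        screen, other = other, screen      # show the freshly built buffer
    return screen
```

A bank map, as suggested in the post, would replace the `bank += 1` in `advance` with a table lookup.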

2016-07-14 16:37

Oswald

Registered: Apr 2002
Posts: 5094

you're better off with reu, esp. since 1541u supports it so big user base. now reu can copy memory at 1 cycle / byte speed for scrolling.
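
For a sense of scale, a small Python sketch of the arithmetic. The function names are hypothetical. The CPU figure assumes a fully unrolled `lda abs / sta abs` copy (4 + 4 cycles per byte); the REU figure takes the 1 cycle per byte from the post and ignores register setup and cycles stolen by the VIC. The `passes` parameter is an assumption for the case where data has to go C64 to REU and back, since the REU transfers between C64 memory and its own memory.

```python
SCREEN_BYTES = 25 * 40


def cpu_copy_cycles(n_bytes):
    """Fully unrolled lda abs / sta abs: 4 + 4 cycles per byte."""
    return n_bytes * (4 + 4)


def reu_copy_cycles(n_bytes, passes=1):
    """REU DMA at 1 cycle per byte per pass, setup overhead ignored."""
    return n_bytes * passes


if __name__ == "__main__":
    print("CPU copy :", cpu_copy_cycles(SCREEN_BYTES), "cycles")
    print("REU fetch:", reu_copy_cycles(SCREEN_BYTES), "cycles")
    print("REU stash+fetch:", reu_copy_cycles(SCREEN_BYTES, passes=2), "cycles")
```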

2016-07-14 18:41

cadaver

Registered: Feb 2002
Posts: 1160

Oziphantom: Agreed with Oswald, I believe you can not get appreciably faster whole screen or color-RAM update with just the CPU no matter what you're doing.

Unrolling will help, as will writing the same value to both screen & color-RAM (basically you could have 16 chars of the same color, Quod Init Exit IIm does this), but still the load/store operations dominate the load.

However you're probably not going to do the screen update every frame, so use that to your advantage, e.g. instead of waiting, you can have the main program always calculate at least 1 frame ahead, while IRQs perform the screen update. This means that the large chunk of CPU time taken by it isn't as devastating, as your next frame may already be half ready when interrupted by the screen update, which hopefully leaves enough time to finish.

2016-07-14 19:17

Compyx

Registered: Jan 2005
Posts: 631

Seems like VSP would not be a bad idea, seeing how all scrolling is only horizontal. Saves a shitload of raster time, except for the one time you have to scroll colorram up.

But like Cadaver said, you can 'cheat' updating the colorram by carefully timing when it happens. But that might screw with any multiplexer, so perhaps move $d800-updating out of IRQ.
