ChristopherJam
Registered: Aug 2004 Posts: 1409
Faster charmap scrolling
Some interesting asides came up over in the Pixeling forum about speeding up charmap scrolling (cf Graphician for intense EF game). As Oswald pointed out, switching from tiles to a straight unpacked charmap doesn't really save you much, as you can avoid dealing with tile indices for most of the screen by just copying most of the chars within VM. Besides, even tile index reads can be amortised over multiple VM writes.
However, there are other possibilities. If you've got a little RAM to spare (e.g. because all your level data is in EF, i.e. on an EasyFlash cartridge), then why not unroll the update loop into one hardcoded routine per column?
You could easily dedicate 5K to
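lda #$xx        ; char for row 0 of this column, patched in when the column is generated
sta vm,x        ; x = destination screen column
lda #$xx        ; row 1
sta vm+40,x
lda #$xx        ; row 2
sta vm+2*40,x
..              ; rows 3 to 23 follow the same pattern
lda #$xx        ; row 24
sta vm+24*40,x
rts             ; assumes each column routine is called with jsr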
That gets you down to 7 cycles per char (2 for lda #imm plus 5 for sta abs,x), or 14 if you also update colour RAM.
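To sanity-check the 7-cycles-per-char and roughly 5K figures, here is a small Python sketch that assembles one unrolled column routine into raw 6502 bytes. The screen base `vm=$0400` and the helper names are assumptions for illustration; the opcodes and cycle counts are the standard NMOS 6502 ones.

```python
# Standard NMOS 6502 opcodes and timings used by the unrolled column routine.
LDA_IMM = 0xA9   # lda #imm : 2 bytes, 2 cycles
STA_ABSX = 0x9D  # sta abs,x: 3 bytes, always 5 cycles (stores pay the index cycle unconditionally)
RTS = 0x60       # rts      : 1 byte
ROWS, COLS = 25, 40


def column_routine(chars, vm=0x0400):
    """Assemble 'lda #ch / sta vm+row*40,x' for each of the 25 rows, then rts."""
    if len(chars) != ROWS:
        raise ValueError("expected one char per screen row")
    code = bytearray()
    for row, ch in enumerate(chars):
        addr = vm + row * COLS
        code += bytes([LDA_IMM, ch, STA_ABSX, addr % 256, addr // 256])
    code.append(RTS)
    return bytes(code)


def column_cycles(rows=ROWS):
    """Cycles spent in the lda/sta pairs of one column (jsr/rts overhead excluded)."""
    return rows * (2 + 5)


if __name__ == "__main__":
    routine = column_routine([0x20] * ROWS)
    print(len(routine), len(routine) * COLS)        # 126 5040
    print(column_cycles(), column_cycles() * COLS)  # 175 7000
```

At 126 bytes per column, 40 columns come to 5040 bytes, in line with the 5K estimate, and a full-screen update costs 40 × 175 = 7000 cycles before call overhead.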
You only need to rewrite one column's routine each time you scroll by one char; then call the routines in sequence with increasing values of X.
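One way to picture the reuse is a ring of 40 routines: after each char scroll, only the slot whose column just left the screen is regenerated, and the ring is called starting one slot later with X = 0, 1, ..., 39. The Python model below is a sketch under that assumption (each "routine" is represented simply by the column of chars it would store); the function names are hypothetical.

```python
ROWS, COLS = 25, 40


def render(routines, start):
    """Call the ring of column routines in order with X = 0, 1, ..., 39.

    Calling slot (start + x) % COLS with X = x puts that slot's column
    at screen column x.
    """
    screen = [None] * COLS
    for x in range(COLS):
        screen[x] = routines[(start + x) % COLS]
    return screen


def scroll(level, steps):
    """Scroll left one char at a time, rewriting exactly one routine per step."""
    routines = [level[c] for c in range(COLS)]  # slot k starts out holding level column k
    for s in range(steps):
        # level column s has just left the screen: reuse its slot for incoming column s + 40
        routines[s % COLS] = level[s + COLS]
        yield render(routines, (s + 1) % COLS)


if __name__ == "__main__":
    level = [tuple([c % 256] * ROWS) for c in range(100)]  # toy level: column c is all char c
    for s, screen in enumerate(scroll(level, 5)):
        assert screen == [level[s + 1 + x] for x in range(COLS)]
```

Seen from a single slot, the X it is called with drops by one after every scroll, so the same code keeps moving left across the screen until it is overwritten.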
You might have to divide the update into upper and lower halves of the screen to avoid tearing.
Of course, if you want to be really extravagant, you could generate a routine per column of level data and skip any redundant loads by grouping identical indices, kind of like compiled sprites on PC.
That would eat shedloads of flash if you stored them all in advance (a tad less with duplicate removal), or you could try generating them on the fly.
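A minimal sketch of the "group identical indices" idea: the generator below emits 6502 source for one level column, with a single `lda #` per distinct char followed by every store that needs it. The `vm` label is a placeholder and the function name is hypothetical; the cycle and size counts assume `lda #imm` (2 cycles, 2 bytes) and `sta abs,x` (5 cycles, 3 bytes).

```python
def compiled_column(chars, vm="vm"):
    """Emit 6502 source for one level column, loading each distinct char only once.

    Rows sharing a char value are grouped behind a single 'lda #', in the spirit
    of compiled sprites; 'vm' is a placeholder label for the screen base.
    """
    rows_by_char = {}
    for row, ch in enumerate(chars):
        rows_by_char.setdefault(ch, []).append(row)  # dicts preserve first-seen order

    lines = []
    for ch, rows in rows_by_char.items():
        lines.append(f"lda #${ch:02x}")
        for row in rows:
            lines.append(f"sta {vm}+{row}*40,x" if row else f"sta {vm},x")
    lines.append("rts")

    loads = len(rows_by_char)
    cycles = 2 * loads + 5 * len(chars)    # lda #imm = 2, sta abs,x = 5
    size = 2 * loads + 3 * len(chars) + 1  # + 1 for the rts
    return lines, cycles, size


if __name__ == "__main__":
    sky_and_ground = [0x20] * 20 + [0x41] * 5  # hypothetical column: 20 rows of sky, 5 of ground
    lines, cycles, size = compiled_column(sky_and_ground)
    print(cycles, size)  # 129 80, versus 175 cycles / 126 bytes without grouping
```

The savings depend entirely on how repetitive the level columns are; a column with 25 different chars gains nothing over the plain unrolled routine.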
chatGPZ
Registered: Dec 2001 Posts: 11386
The whole "trick" is to run only the critical stuff in the IRQ and the rest in the main loop, which then may not always run at 50 Hz. But that doesn't really matter unless you have a *lot* of *fast*-moving objects and/or very complicated AI (and most games neither have nor need either).
cadaver
Registered: Feb 2002 Posts: 1160
Some of the info in the past rants may amount to bullshit; for example, a "lazy" round-robin AI update can make the game behave unpredictably differently in different situations. Now I'd just advise:
- Don't wait in the main program unless you're one frame ahead and can't buffer any more stuff for the IRQs to show.
- Game entity logic can be run at half frame rate, with sprite movements interpolated.
- O(n^2) algorithms like collision detection can be the most painful for the frame rate, so optimize them or try to avoid them.
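The half-frame-rate point can be illustrated with a tiny model: entity positions are produced by the logic at 25 Hz, and the in-between display frames show the midpoint. This is only a sketch of one way to interpolate; it assumes the logic is already one tick ahead of the display (so the next position is known), and the function name is hypothetical.

```python
def interpolate_half_rate(logic_positions):
    """Expand entity positions computed at 25 Hz into a 50 Hz display sequence.

    Even frames show the logic result; odd frames show the midpoint towards the
    next result. This assumes the logic runs one tick ahead of the display.
    """
    frames = []
    for prev, nxt in zip(logic_positions, logic_positions[1:]):
        frames.append(prev)
        frames.append((prev + nxt) // 2)  # interpolated in-between frame
    frames.append(logic_positions[-1])
    return frames


if __name__ == "__main__":
    print(interpolate_half_rate([0, 4, 8]))  # [0, 2, 4, 6, 8]
```

On the real machine the midpoint would just be an add and a shift per sprite coordinate, far cheaper than a second pass of the full entity logic.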
ChristopherJam
Registered: Aug 2004 Posts: 1409
OK, so running with the "take EF as read" assumption, and also sticking with horizontal scrolling as per the OP, I still think there's room for a code generator that builds speedcode to do the d800 update.
Oziphantom, is it safe to assume you're using pure MCM, hence only needing colour values 8 to 15?
Peacemaker
Registered: Sep 2004 Posts: 275
I have recently been working on a bitmap scroller and had an idea which I guess fits in here very well.
Some say it's a problem to update the colour RAM every, let's say, 8th frame, as it eats a lot of raster time. That is true, but here is a trick you can use if your scroller is not very fast (it fine-scrolls every frame, but only needs a colour RAM shift every 4th or 8th frame; even every 3rd will work).
Double buffer if you prefer. Scroll the screen as usual, and while doing the screen calculations, store the upcoming d800 (colour RAM) colours into a routine, one chunk of data per frame (1000/8 values), to keep the raster time down. Once the routine is filled with the new values for the next colour RAM update, stop, and execute it when the colour RAM update is due, for example on the 8th frame.
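; update the routine every frame with a chunk of data (1000/8 values per frame)
lda colorramsource,x
sta storehere1+1      ; +1 = the immediate operand of the lda # at storehere1
lda colorramsource+1,x
sta storehere2+1
; etc.
..................
call_this_routine_at_colorram_update:
storehere1:
lda #$00              ; operand patched by the update code above
sta $d800
storehere2:
lda #$00
sta $d801
; etc.
rts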
When the screen is updated (the d018/dd00 switch if you use double buffering), you just call the colour RAM routine, which puts the new colours in place in the same frame.
=)
I hope this helps.
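Peacemaker's scheme can be modelled as a 1000-entry straight-line routine whose immediate operands are patched 125 at a time over 8 frames, then executed in one go. The byte layout, the `$d800` base, the toy colour data and all function names below are assumptions for illustration; the cycle constants use standard 6502 timings (`lda abs,x` is 4 cycles plus 1 on a page crossing, which the fill estimate ignores).

```python
LDA_IMM, STA_ABS, RTS = 0xA9, 0x8D, 0x60  # lda #imm (2 cyc), sta abs (4 cyc), rts
CELLS, FRAMES = 1000, 8
RUN_CYCLES = CELLS * (2 + 4)   # executing the routine on the update frame
FILL_CYCLES = CELLS * (4 + 4)  # lda src,x + sta storehereN+1, ignoring page crossings


def build_display_routine(base=0xD800):
    """'storehereN: lda #$00 / sta $d800+N' for every cell, then rts."""
    code = bytearray()
    for i in range(CELLS):
        addr = base + i
        code += bytes([LDA_IMM, 0x00, STA_ABS, addr % 256, addr // 256])
    code.append(RTS)
    return code


def fill_chunk(code, source, frame):
    """Patch one frame's share (1000/8 = 125) of the immediate operands."""
    per_frame = CELLS // FRAMES
    for i in range(frame * per_frame, (frame + 1) * per_frame):
        code[5 * i + 1] = source[i]  # offset of 'storehere_i + 1'


def run(code):
    """Interpret the straight-line routine; return the colour RAM it writes."""
    cram = {}
    for i in range(CELLS):
        pc = 5 * i
        cram[code[pc + 3] + code[pc + 4] * 256] = code[pc + 1]
    return cram


if __name__ == "__main__":
    colours = [8 + (i % 8) for i in range(CELLS)]  # toy MCM colour data
    code = build_display_routine()
    for frame in range(FRAMES):  # spread over the frames between colour RAM updates
        fill_chunk(code, colours, frame)
    print(len(code), RUN_CYCLES, FILL_CYCLES)  # 5001 6000 8000
```

The model reproduces the roughly 5K size, the 6000-cycle runtime and the 8000 cycles of patching spread over the other frames that come up later in the thread.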
oziphantom
Registered: Oct 2014 Posts: 490
ChristopherJam: That would be a question for the artist ;) I would think it is not, though. I was also thinking that it might be possible to either have a 64-char set or just use the first 64 chars of the main set, with the top few rows in ECBM mode. This way the far background could be in hires, with smaller pixels to make things look further away, and the extra colours would help give it depth. Not sure it would be useful, though.
Doing the speedcode doesn't really save raster time, though: it saves you 1000 clocks on the CRAM frame at the cost of 8000 clocks spread over the other frames, right?
Perplex
Registered: Feb 2009 Posts: 255
Peacemaker: Nice if you have 5KB to spare for speedcode and need lots of time for other stuff besides D800-copying during the crucial frame(s). On the other hand, it wastes a lot of cycles modifying the speedcode if you are doing other stuff, like loading new bitmap data from disk in the background. I guess it all depends on what you'll be using it for.
Peacemaker
Registered: Sep 2004 Posts: 275
Perplex:
Sure, this method is of course most useful for "D800-copying during the crucial frame(s)". It works even better if you are using VSP, since then you actually have a lot of frames to fill the speedcode with new values, and a loader won't suffer that much =)
ChristopherJam
Registered: Aug 2004 Posts: 1409
Oziphantom: OK, I will redo my numbers allowing for 16 possible colours; I think that means I'll have to switch to a section of speedcode per pair of columns, otherwise I lose 600 cycles of my savings.
I'll do that tomorrow, but for now here are my 8-colour numbers:
At eight possible colours with one section of speedcode per character column, it only takes me around 5700 cycles to update all of d800, a saving of 2300 cycles over the direct copy within d800 plus the column fetch from EF.
Building a section of speedcode only takes around 2200 cycles, including fetching new values from EF; an easy cost to bear even if you were scrolling four pixels per frame. (Hah, just noticed the build cost is similar to the savings on the update frames; all I've done is balance the load a little.)
ChristopherJam
Registered: Aug 2004 Posts: 1409
Sorry, just realised where your 1000 cycles came from. Yes, it's true, my first suggestion above takes 7000 cycles for a full update, but you also only need a very quick update of one of the 39 code segments every 8 pixels, which should take around 300 cycles to fill from EF. The rest are reused by calling them with progressively lower values of X to select the destination column.
Peacemaker's solution has a lower runtime (6000 cycles), but you cannot reuse the speedcode as the destinations are hardcoded and unindexed.
The 5700-cycle version I was referring to in my last comment only performs eight immediate loads per column, but to get down to that I have to use a counting sort to reorder the stores, which makes generating each segment considerably slower. At least I get to use sta (zp,x) in the speedcode generator :D
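A counting sort is a natural fit here because pure MCM colours only take the eight values 8 to 15. The Python sketch below buckets the 25 rows of a column by colour, then plans one `lda #` per non-empty bucket followed by that bucket's stores. The function names are hypothetical. On the 6502 side, keeping one write pointer per bucket in zero page and storing through it with `sta (zp,x)` (X selecting the bucket) is one plausible reason that addressing mode comes in handy; that is a guess, not something the post spells out.

```python
def counting_sort_rows(colours):
    """Counting-sort row indices by MCM colour (values 8..15 -> buckets 0..7)."""
    counts = [0] * 8
    for c in colours:
        counts[c & 7] += 1
    starts, total = [], 0
    for n in counts:  # prefix sums: first output slot of each bucket
        starts.append(total)
        total += n
    order = [0] * len(colours)
    pos = starts[:]  # one write pointer per bucket
    for row, c in enumerate(colours):
        order[pos[c & 7]] = row
        pos[c & 7] += 1
    return counts, order


def column_store_plan(colours):
    """One 'lda #colour' per non-empty bucket, followed by that bucket's stores."""
    counts, order = counting_sort_rows(colours)
    plan, i = [], 0
    for bucket, n in enumerate(counts):
        if n:
            plan.append(("lda", 8 + bucket))
            plan.extend(("sta", row) for row in order[i:i + n])
        i += n
    return plan


def plan_cycles(plan):
    """lda #imm = 2 cycles, sta abs,x = 5 cycles."""
    return sum(2 if op == "lda" else 5 for op, _ in plan)


if __name__ == "__main__":
    worst = [8 + r % 8 for r in range(25)]  # all eight colours present
    print(plan_cycles(column_store_plan(worst)))  # 141
```

At 141 cycles per column in the worst case, 40 columns come to 5640 cycles before per-column call overhead, which is in the same ballpark as the ~5700 quoted above.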
ChristopherJam
Registered: Aug 2004 Posts: 1409
Slack bastard time: generate a new segment of speedcode every two char scrolls, and use the hidden char column so that the same set of 20 double-width columns can be used twice.
; 20*50*5 = 5000 cycles for storing values
; 20*16*2 = 640 cycles for loading values to write
; 20* 7 = 140 cycles for double-decrementing x and skipping to next routine
; TOTAL 5780 cycles
It'd be about 90 cycles less if only 19 routines were called and special-case code handled the first/last column, but I'm heading back to drive coding for now. I'll write up the speedcode generator another day.
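The totals in the comment block above can be reproduced with a few lines of Python; the parameter names are ours, and the 16 loads per routine is the worst case of all 16 colours appearing in a pair of columns. Comparing against one routine per single column at the same worst case shows where the roughly 600-cycle loss mentioned earlier in the thread comes from.

```python
def pair_budget(routines=20, stores=50, loads=16, step=7):
    """Cycle budget for the double-width-column speedcode, using the post's figures."""
    store_cycles = routines * stores * 5  # sta abs,x
    load_cycles = routines * loads * 2    # lda #imm, worst case 16 colours per pair
    step_cycles = routines * step         # double-decrementing X and moving to the next routine
    return store_cycles, load_cycles, step_cycles, store_cycles + load_cycles + step_cycles


def single_column_load_cycles(columns=40, loads=16):
    """Worst-case lda #imm cost if every single column had its own routine."""
    return columns * loads * 2


if __name__ == "__main__":
    print(pair_budget())  # (5000, 640, 140, 5780)
    print(single_column_load_cycles() - pair_budget()[1])  # 640
```

Pairing columns halves the number of routines that each need up to 16 immediate loads, which is the whole saving; the store cost is the same either way.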