[CSDb] - User Forums - Release id #150179 : Differential Scrolling

You are not logged in - nap

CSDb User Forums

Forums > CSDb Entries > Release id #150179 : Differential Scrolling

2016-08-28 23:19

Compyx

Registered: Jan 2005
Posts: 631

Release id #150179 : Differential Scrolling

Guess I'll start the discussion then,

I don't see any advantage of using this technique over traditional ways of scrolling, and I'll leave VSP out of the equation for now.

Seems to me that when using actual 'level' data using a tile based approach would be faster and a lot less memory hungry. Not to mention it would allow scrolling backwards, something this differential scrolling would have a hard time to do.

Also I don't get the 'full screen' update when scrolling left-to-right at one pixel at a time, seems incredibly wasteful.

Well, I said my piece, now for the actual coders ;)

2016-08-28 23:30

chatGPZ

Registered: Dec 2001
Posts: 11119

i can see the idea, but i cant see a useful application :)

2016-08-28 23:41

Fungus

Registered: Sep 2002
Posts: 618

Seems like a c64 implementation of the PC idea used for Commander Keen by John Carmack. I can see the use of it there, but not on C64 where it had hardware scroll capabilities and tiles.

None the less, interesting.

2016-08-28 23:49

Compyx

Registered: Jan 2005
Posts: 631

I once tried an RLE approach to bitmap scrolling, compressing all bitmap data with RLE and decompressing a column during scrolling. Sucked. Just encoding all bitmaps into a 'charset' with 16-bit indici/pointers was much faster, and simpler.

2016-08-29 00:40

CRT

Registered: Oct 2012
Posts: 87

I don't think it's something new. Delta-compression, delta-updates is something I like to use often when it fits my needs. Still a really nice example.

Edit: The most obvious day to day example that comes to mind is mpeg based video in one form or another.

This works nicely on C64 for small differences. But it's good to have an escape backup routine to replace the entire screen if need be.

2016-08-29 02:21

White Flame

Registered: Sep 2002
Posts: 136

If I understand the concept correctly, this only really works when you have large areas of a single character, which isn't going to be applied to very good looking scenes.

However, I found it worked well for scrolling colorram in Hawkeye 2. There, you're going to be much more likely to have large swaths of the same byte, and only updating the edges between colors can save a lot of time chasing the raster. Plus, depending on how you implement it, it can scroll bidirectionally.

2016-08-29 06:23

Mr. SID

Registered: Jan 2003
Posts: 421

Yes nothing new here. I also used the same technique for color ram scrolling in Canabalt. For actual background graphics with lots of details this does not provide a benefit, except if you have large areas of sky which is all the same character.

2016-08-29 07:53

Carrion

Registered: Feb 2009
Posts: 317

Quote: i can see the idea, but i cant see a useful application :)

And it was my question too...

How can we exploit that? What gfx modes (hires? multi?) can a graphician use and how big it can be?

This looks tempting especially because of the parallax effect :)

2016-08-29 19:31

White Flame

Registered: Sep 2002
Posts: 136

The parallax doesn't have anything to do with the char scrolling routine, though.

2016-08-30 19:41

Sokrates

Registered: Jun 2014
Posts: 7

I just had this idea and found no information that this was done on the c64 before, so I wrote this little demo.

Now I see that this idea was implemented before, at least for color ram scrolling.

But beside the technique is not completely new, I think this demo is still useful to share this knowledge with a broader audience than before.

So maybe we'll see more fancy use cases in the future :-)

2016-08-31 01:26

White Flame

Registered: Sep 2002
Posts: 136

Oh, I just remembered another time I used this technique: BASIC animations! :-D Move the cursor, PRINT the animation item which also prints enough spaces beside it to erase the prior frame (or leave smears to animatedly draw a border or smoke trails or whatever). It makes fluid-enough large PETSCII animations quite feasible when you're an 8 year old who doesn't know asm yet. :-)

2016-08-31 08:03

ChristopherJam

Registered: Aug 2004
Posts: 1378

Of course, now to combine this technique with the LDA# optimised speedcode from Faster charmap scrolling

2016-08-31 21:46

Claus_2015

Registered: Oct 2012
Posts: 53

While this has surely been made before, there seem to be some possible applications:

1. 8 pixel/frame scrolling (or even 16 pixel), which is hardly possible using traditional screen copy

2. Anything that is not traditional horizontal/vertical scrolling, e.g. 3d tunnel stuff etc.

I also thought about something like this in the past, but quickly assumed that it would take too much memory. But when looking at the numbers, it is actually comparable to what you need for tile based maps.

2016-09-01 08:11

enthusi

Registered: May 2004
Posts: 675

It is not too uncommon to directly write map-data into screen for games. No copy to *-1 but a full offset store.
Sometimes even lda map,x sta screen,y
In particular for horizontal scrollers. You dont have any limitations in scrolling speed then.

2016-09-02 09:49

Fungus

Registered: Sep 2002
Posts: 618

We've discussed such things before too.

If you have a tile map and use character sets with even offsets you can use bitwise manipulations to speed up plotting and just plot the whole screen. The bigger the tiles the more you save. With splits you can have multiple tiles sets too and even do it dynamically as long as the region doesn't share tiles with another, that will require duplication.

I'd like to see an implementation of the delta scrolling for color memory however, just to get an idea of it.

2016-09-02 10:48

oziphantom

Registered: Oct 2014
Posts: 478

My Squid Jump basically does the delta for CRAM that way I can actually scroll fast enough. Their is a preview on here if you want to have a look.

2019-04-24 06:59

TBH

Registered: Mar 2010
Posts: 20

I've been working on a similar scroll routine and popped over to CSDB to see if anyone had already discussed the technique. Of course they had!

Anyway, here's my take on it, for a single direction horizontal scroller (R-Type, etc) that scrolls chars and colour data.

Each column (scrolling right to left) can be represented by the barest minimum of code where there is a difference from the column on its left:
1) One LDA# per unique character
2) One indexed STA$ per unique char
3) One indexed STA$ per unique colour

Where colours share the same lower 4 bits as a char, that LDA is shared (up to 16 char/colour can be mapped this way) to avoid frequent loads of colour values.

The writes are indexed. The absolute value is always the address of Column 0 ($0400, $0428, etc). the index register (assuming .X here) indicates the column to write. To run the screen shift, X is set to 0 and the correct place in the buffer is JSR'd.

So, during spare cycle time, the information for a column is streamed to a code generator. The generator emits the absolute fastest code to generate the column.

The code to draw the rightmost two columns would look something like this, assuming two 2x2 block tiles are vertically stacked at the top of the screen, with a clear blue sky char in the column immediately left:

...
lda #TopLeftBrickChar ;(lower 4 bits is 9 - brown)
sta CharRow0,X
sta ColorRow0,X
sta CharRow2,X
sta ColorRow2,X

lda #BottomLeftBrickChar ;(lower 4 bits is 9 - brown)
sta CharRow1,X
sta ColorRow1,X
sta CharRow3,X
sta ColorRow3,X

inx ; next column

lda #TopRightBrickChar ; No need to change colour
sta CharRow0,X
sta CharRow2,X

lda #BottomRightBrickChar ; No need to change colour
sta CharRow1,X
sta CharRow3,X

rts

The code is written to a circular buffer. The address of each column code (immediately proceeding the INX) is stored in LO/HI tables (pointer, circular buffer). If the end of the code buffer is encountered while writing code then a JMP StartOfBuffer is written and the code is written from the start of the buffer. This can also be precomputed and added in the data stream as a control code.

To scroll, load the index register (.X) with 0, and JSR to the address of the column that is currently leftmost (ie pointer in the address table). This will shift the screen as fast as technically possible.

Next, update the address pointer and add another column-worth of code to the code buffer (this operation can be spread across hardware-scroll frames). When the new column code is appended, the existing RTS is replaced with INX.

The number of columns in the code buffer can be more than 40, of course. It depends on how much space is available.

If changes are to be made to specific locations on the video or colour matrix, such as changing a char, then use an indexed source (such as the screen matrix itself):

lda CharRow0 + 1,X
sta CharRow0,X

The size of the code buffer can be programmatically determined by a simple parse of the map data.

The data for the column needs to convey information to the code generator such as the char/colour and the leftmost column's row addresses (use an index and lookup table, or a bitfield). This information can be assembled from tile data or some other compression technique. It's all deterministic, so there is a lot of leeway for storage, as well as deferring the actually code construction across several screen frames that have cycles to spare.

The screen shift's cycle usage is dependent on the complexity of the screen data, so there will be constraints on level design. But that's always been the case.

2019-04-24 10:02

ChristopherJam

Registered: Aug 2004
Posts: 1378

Very nice, TBH. Related ideas were also discussed at Faster charmap scrolling, albeit just for colour ram updates.

Are you currently leaning toward generating the speedcode directly from a traditional tilemap, or are you using some other kind of compressed hints?

2019-04-24 11:37

Oswald

Registered: Apr 2002
Posts: 5020

this could be extended to - not to reload A if the same value has to be writen to other rows aswell - which I prefer, it could restrict the gfx having the color info aswell in lower nibble? not being able to reuse chars with different colors. maybe its not an issue.

2019-04-24 15:49

TBH

Registered: Mar 2010
Posts: 20

@ ChristopherJam, regarding speedcode:

I think it's best the routine not care how the map is stored. It should only require (repeatedly) a column of chars and colours. I'm leaning toward a hybrid solution though, which analyses the game map during initialisation.

For example, at the start of a game level it would request each of the columns from the tile routine and use that data to generate and store hints for each of those columns. During the actual game the hints would be accessed so each tile column could be quickly sorted into to a "one-to-many" relationship between the loads and stores.

Alternatively, if there was enough spare RAM, the raw column data for loads (values) and stores (row indices) could be used: TopLeftBrickChar, 0, 2, BottomLeftBrickChar, 1, 3 ... and so on.
This is where a compressed bitfield would be useful: TopLeftBrickChar, %10100000, BottomLeftBrickChar, %01010000 ... and so on.

@ Oswald, regarding colour info:

The char/colour alignment allows up to 16 chars per colour - which although restrictive, is also optional.

The code generator would still have the option of emitting a separate unshared LDA operation for the colour value. I see such an optimised arrangement of char definitions done per-level, so the char definitions would quite likely be rearranged differently for each level of a game.

However, such optimisation may negatively impact other char-based optimisations - such as collision range tests (eg where platform chars are from 10 to 50, wall chars are from 51 to 80, etc).

I think some precalculation would have to be used, mostly an analyse of the map data to produce intermediate values useful for the code generation stage.

I've some spare time tomorrow, so will try to cobble my code together with some real world map data, and post the results.

2019-04-25 00:39

ChristopherJam

Registered: Aug 2004
Posts: 1378

Ah, so the speedcode would be generated from the chars and colours, but possibly with some initial processing when the level is loaded.

As for the char/colour alignment, I'm wondering if that could be fruitfully avoided by for each column
- initialising a colour->charcode mapping to [0..15]
- then reading through the character codes and overwriting mapping[charcode&15] with charcode
- then using mapping[colour] instead of colour when subsequently building the speedcode for the column.

That way the speedcode could make use of any charcodes that happen to line up with colour codes, but without constraining the artists to a set ratio.

2019-04-25 12:11

Oswald

Registered: Apr 2002
Posts: 5020

for this "one to many" problem I found lda ($00,x) adressing useful.

set up the zeropage so that each consecutive zp pointer points to a table.

like: $10 -> $1000, $12 -> $1100, etc.

then when you scan a column of colors simply load the (asl-ed) color value into the X register which will select you the table where you note down where that color was used. do an inc $xx,x aswell to increase the pointer for the next entry.

with this you can scan the column fast to find where same color / char was used.

2019-04-25 13:14

ChristopherJam

Registered: Aug 2004
Posts: 1378

Yup, I was using sta(zp,x) for the code generator for my d800 scroller; one pointer for each chunk of code, one chunk of code for each colour.

2019-04-26 03:37

TBH

Registered: Mar 2010
Posts: 20

I tried a different method to handle the one-to-many problem. Basically, an inelegant iteration through a constantly reducing search-space.

Below is the routine I wrote to handle it. It is in C64 Prg Studio format.

The new columns of chars and colours are output to the NewChars and NewCols tables respectively.

The tables of 25 chars and 25 colours are adjacent, and the routine treats them as a single 50 element table for the purpose of ignoring unrequired writes. A table of AND values (25 values of 255 and 25 values of 15) is used to avoid conditional logic for the char-colour value comparison. The table OldValues maintains the previous columns of chars and colours that are compared to find differences.

Basically, it walks through the list of values that are to be written to text or colour matrices, compares each with those proceeding it, marks matches as "handled" so they are not checked again, and copies new values into OldValues. If a matched colour is found, the index (25-49) is placed in the sequence of indices for that value.

The chars and colours are treated as a single unit as much as possible (hence the use of the ValueAnd table).

Where possible, char values will be co-opted for writes to colour memory. If none is found for a colour then a separate LDA for the color value is used.

The resulting values and row indices (text row indices are 0-24, colour rows are 25-49) are placed into the Scheduler table in the format:
Char/Colour value, Row Index [, Row Index...], Index_Sequence_Terminator [...], Value_Sequence_Terminator

For example:
Char_TopBrick, Row_0, Row_2, Terminator, Char_BottomBrick, Row_1, Row_3, Terminator, Colour_Red, Row_25, Row_26, Row_27, Row_28, Terminator, Terminator

Note that the Terminator value is 0, and indexing starts at 0. But in the Scheduler table, a row index of 0 will always be the first element in any row index sequence proceeding a value, so will never be tested as a potential termination by the Code Generator.

The Code Generator (not listed here) sequentially reads the Scheduler table (which contains the most recent set of formatted column information), and constructs code using opcode constants, the Scheduler's values and addresses from lookup tables retrieved using the Scheduler's row indices.

; ----------
NewValues
NewChrs bytes cRows                 ; 25
NewCols bytes cRows

OldValues bytes 255*cCombinedRows   ; 50
ValueAnd bytes 255*cRows, 15*cRows

Scheduler bytes cTotalUniqueRW      ; 92
cRowCnt = cCombinedRows - 1

ScanValues
         ; Init Scheduler address
         lda #<Scheduler
         sta StoreToScheduler + 1
         lda #>Scheduler
         sta StoreToScheduler + 2

         ldx #0
@Loop    lda NewValues,X
         beq @Next                  ; Marked as handled, so skip
         cmp OldValues,X
         beq @Next                  ; No difference, so skip

         ; Add to scheduler
         sta OldValues,X
         jsr StoreToScheduler       ; Add the Load Value to Scheduler

         ; Search for duplicate Char and Colour Load Values
         txa
         tay
         jmp @Index                 ; Store index of New Value before search

@SLoop   lda NewValues,Y
         beq @Cont                  ; Marked as handled, so skip
         cmp OldValues,Y
         beq @Cont                  ; No difference, so skip
                  
         ; Compare value with original
         and ValueAnd,Y
         cmp NewValues,X
         bne @Cont
         
         ; Mark NewValue as having been handled
         lda NewValues,Y
         sta OldValues,Y
         lda #0
         sta NewValues,Y

         ; Add the Index Value to Scheduler
         tya
@Index   jsr StoreToScheduler       

@Cont    iny
         cpy #cCombinedRows
         bne @SLoop

         ; Add Index Terminator to Scheduler
         lda #0
         jsr StoreToScheduler

@Next    inx
         cpx #cCombinedRows
         bne @Loop

         ; Add Value Terminator to Scheduler
         lda #0
         jmp StoreToScheduler

StoreToScheduler
@SMC     sta $FFFF
         inc @SMC + 1
         bne @Exit
         inc @SMC + 2
@Exit    rts

2019-04-26 07:05

Oswald

Registered: Apr 2002
Posts: 5020

that seems super complicated. this is the loop with (,x)

lda colorvalues,y
asl
tax ;can be just lax with pre asl
tya
sta (occurancetables,x)
inc occurancecounter,x
iny
cpy #25
bne -

in the end each table belongs to a color value, each has a counter that says that color needs to be sta'-ed how many times and each entry in the table is the Y value the color was seen at.

2019-05-03 05:25

TBH

Registered: Mar 2010
Posts: 20

Quote:

that seems super complicated. this is the loop with (,x)

It does seem complicated, but I haven't found a better solution. It's doing a lot in there:
- selecting only chars and colours that need to be written to the column
- combining char and colour LDA if lower nybbles match

I couldn't find a way to apply the ZP vector bucket method used for chars. It needs either 256 vectors and tables to handle all potential 256 chars (plus another 16 for colours) or else reuse a limited amount of vectors, which means loops and vector rewrites.

So I simplified the differential routine by removing the schedule table and embedded the code generation in its place.

This works surprisingly well. With the test data (in my earlier post) used, a full-screen differential screen shift for both char and colour is completed in less than 2800 CPU cycles and the maximum code length is about 1800 bytes.

The code generation for the columns (using the test data) never exceeds 3500 cycles and I think that speed can be optimised by a further 50%. For a 1px/frame scroll the code generation can be split across the 7 fine-scroll frames (about 500 CPU cycles/frame before optimisation).

Even without splitting the code generation across frames, it significantly lessens the CPU load for full-colour scrolling, and removes the need for a second screen buffer.

One thing though - if there is a practical way to extend the 16-colour sort (zp,X) method to all characters then could someone enlighten me? I might have overlooked some blindingly obvious method.

2019-05-03 14:01

Oswald

Registered: Apr 2002
Posts: 5020

one way would be doing 2 pass, skipping half of the values with bmi bpl, and forbidding use of chars #$00 and #$80 to avoid $00/$01. + and #$7f for the > $7f bytes.

2019-05-03 14:04

Oswald

Registered: Apr 2002
Posts: 5020

can go as far as storing the map twice, so the BMI BPL AND #$7f and the ASL can be all skipped. Because the 2 versions have the char values already asl-ed and stored twice for =<$7f and >=$80 values. Going this far the differential checking could be also precomputed.

...aand one more edit: instead of char values and 2 maps, 16bit pointers could be stored that are pointing to the occurence tables directly + the precomputed differential info stored somehow.

...and one more: the whole map could be just info on how to generate those speedcodes.

2019-05-04 00:50

TBH

Registered: Mar 2010
Posts: 20

Thanks, Oswald. Those comments clarify a few things.

It now seems obvious there are two ways to approach this:

1) Just-In-Time:
Low-friction. Generate a chunk of speedcode from raw char and colour values for each column prior to the text/colour matrix shift. It doesn't care how the screen map is stored.

This is what my code currently does.

Regarding 2-pass sort, since this would prevent a program using zeropage for other things, then perhaps half of ZP could be made available by using 4 passes, with BMI and BVS to test bits 7 and 6.

2) Precalculate speedcode info:
Use the screen map data to precalculate info on how to generate the speedcode.

This info would either replace the screen map statically (ie done before or during compilation/assembly), probably only if it didn't require more RAM than the stored map (compressed/tiles/etc). Or else it could be done at runtime before scrolling begins and temporarily stored in spare RAM until no longer needed (eg generate this info once at the start of each game level).

2019-05-09 17:02

TBH

Registered: Mar 2010
Posts: 20

This little coding exercise has turned out quite interesting.

I realised that much of the task was simply one of grouping, not sorting, so tried a variation of the previously mentioned zp vector and table method: generate a linked list of indexes for each set of matching values.

The snippet of code below is for the linked-list routine. It groups chars and colours together for use by the code generation routine.

The routine examines each member of the combined char and colour column once only, discards unchanging chars and colours , inserts a terminator or index in a 256-byte master table, and updates an 8-bit link in another 50-byte link table at the index of the char or colour.

The code generation loop just rolls backward through the links, fetches the index for use in a CPU index register to access the various tables and spits out code in the form LDA #First_Occurence_Of_Value : STA $First_Row_In_List : STA $Second_Row_In_List, etc..

Even unoptimised, the CPU time to do this is negligible. The grouping routine is about half a raster per char and colour, and a little bit over half a raster line to output each STA for those chars and colours. The test data averages around 400 char and colour changes onscreen, and uses about 40 rasters to examine and generate a full column-worth of code (keep in mind this work can be split across non-shift frames), and 50 raster lines to execute the shift-code, which is effectively a full screen shift (40x25 chars and colours).

GroupValues
         ldx #255
         stx GroupCount
         ldx #49

@Loop1   lda NewValues,X   ; Get a value
         cmp OldValues,X   ; Compare with old values
         beq @Next
         sta OldValues,X   ; Modify OldValues

         tay
         lda GroupLinkPtr,Y         ; 
         sta GroupLinks,X           ; Terminator or index
         bpl @Another               ; Has it been encountered before? No = Bit7
         
         ; store Current rX Index to GroupLinkStart. This is copied to the next group member's link table
@New     inc GroupCount
         ldy GroupCount
         txa
         sta GroupLinkStart,Y
         ldy NewValues,X

         ; Change Link at LinkPtr then put new index to LinkPtr
@Another txa
         sta GroupLinkPtr,Y

@Next    dex
         bpl @Loop1

This piece of code generates the shift-code for the column (some of the support stuff wrapping it has been removed for clarity):

         ldx #0
         stx zCodePtr

@Loop2   ldy GroupCount
         bmi @Cleanup               ; No new chars
         dec GroupCount
                  
         ldx GroupLinkStart,Y
         ldy NewValues,X
         ldx GroupLinkPtr,Y         ; Most recent Link Index found for that group

         lda #255
         sta GroupLinkPtr,Y         ; Cleanup as we go

         ; --------------------
         ; Write lda #nn
         ;
         ldy zCodePtr

         lda #cLDA_imm
         sta (zVecCode),Y
         iny
         lda NewValues,X
         sta (zVecCode),Y
         iny
         sty zCodePtr

         ;TestEndOfBuf prior to each STA 
@Loop2I  ldy zCodePtr
         lda zCodeEOB + 1
         bne @Cont         ; HI > 0, so skip rest of test
         cpy zCodeEOB
         bmi @Cont         ; LO < BufEnd, so exit (eg $4000 < $40FF)
@Reset   jsr CodeBufReset
         ldy zCodePtr

         ; --------------------
         ; Write sta $nnnn
         ;
@Cont    lda #cSTA_x
         sta (zVecCode),Y
         iny
         lda RowLO,X
         sta (zVecCode),Y
         iny
         lda RowHI,X
         sta (zVecCode),Y
         iny
         sty zCodePtr

         lda GroupLinks,X  ; Get index of next group element 
         tax
         bpl @Loop2I
         jmp @Loop2

         ; Cleanup
@Cleanup ldy zCodePtr
         lda #cINX
         sta (zVecCode),Y
         iny

         lda #cRTS
         sta (zVecCode),Y
         sty zCodePtr

Refresh

Subscribe to this thread:

You need to be logged in to post in the forum.

Search the forum:
Search for in
All times are CET.

Search CSDb

Advanced

Users Online

LDX#40
deetsay
Mike
Acidchild/Padua
tlr
Case/Padua
Oswald/Resource
icon/The Silents, Sp..
Guests online: 142

Top Demos

1 Next Level  (9.8)
2 Mojo  (9.7)
3 Coma Light 13  (9.7)
4 Edge of Disgrace  (9.6)
5 Comaland 100%  (9.6)
6 No Bounds  (9.6)
7 Uncensored  (9.6)
8 Wonderland XIV  (9.6)
9 Memento Mori  (9.6)
10 Bromance  (9.5)

Top onefile Demos

1 It's More Fun to Com..  (9.7)
2 Party Elk 2  (9.7)
3 Cubic Dream  (9.6)
4 Copper Booze  (9.5)
5 TRSAC, Gabber & Pebe..  (9.5)
6 Rainbow Connection  (9.5)
7 Wafer Demo  (9.5)
8 Dawnfall V1.1  (9.5)
9 Quadrants  (9.5)
10 Daah, Those Acid Pil..  (9.5)

Top Groups

1 Nostalgia  (9.3)
2 Oxyron  (9.3)
3 Booze Design  (9.3)
4 Censor Design  (9.3)
5 Crest  (9.3)

Top Diskmag Editors

1 Jazzcat  (9.4)
2 Magic  (9.4)
3 hedning  (9.2)
4 Elwix  (9.1)
5 A Life in Hell  (9.1)

Page generated in: 0.072 sec.