CSDb User Forums


2017-02-28 07:21
lft

Registered: Jul 2007
Posts: 369
Improved clock-slide

If you use timer-based jitter correction, or just VSP, here's a way to shave off one cycle:

http://codebase64.org/doku.php?id=base:improved_clockslide
 
[... 16 posts hidden ...]
 
2017-02-28 13:49
JackAsser

Registered: Jun 2002
Posts: 1989
Quote: I'm afraid that would leave the page-boundary in the wrong place.

True. At $00fx then. But this of course applies to the standard method as well.
2017-02-28 14:58
Copyfault

Registered: Dec 2001
Posts: 466
So simple, so beautiful ;)

Now that I come to think of it, it should be possible to shave off another byte by utilising that other additional branch cycle (in case of a taken branch).

Assuming that the accu holds the no. of bytes to skip (which is common for this approach) we could do the following:
;-----------------------------
;A=0..n-1=no.of bytes to skip
;must have been calculated
;directly before to ensure
;correct setting of the z-flag
;-----------------------------
     sta bra+1
bra  bne *
     nop       ;2 cycles
     lda #$a9  ;2
     lda #$a9  ;2
     lda #$a9  ;2
     lda $ea   ;3
;-----------------------------
;page break here
;-----------------------------
code ...

If A=0, the branch is not taken, so the slide itself runs for a total of 11 cycles. For a nonzero A, the NOP instruction is skipped (-2 cycles) but the additional "branch taken" cycle comes in. Mind that if you want to slide down to a 2-cycle delay, the page break is mandatory! A one-cycle delay is not possible, but this is the same with the "BPL" instruction, which always comes with that additional cycle.
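As a sanity check, the cycle counts for this slide can be reproduced with a small Python model. This is only a sketch: it models just the three opcodes that occur in the slide, and the function names and the `page_break` flag are made up for illustration.

```python
# Opcode -> (size in bytes, cycles). Only the opcodes used in the slide.
CYCLES = {
    0xEA: (1, 2),  # nop
    0xA9: (2, 2),  # lda #imm
    0xA5: (2, 3),  # lda zp
}

# nop / lda #$a9 / lda #$a9 / lda #$a9 / lda $ea, as raw bytes
SLIDE = [0xEA, 0xA9, 0xA9, 0xA9, 0xA9, 0xA9, 0xA9, 0xA5, 0xEA]


def slide_cycles(entry):
    """Cycles spent in the slide when execution enters at byte offset `entry`."""
    pc, total = entry, 0
    while pc < len(SLIDE):
        size, cycles = CYCLES[SLIDE[pc]]
        pc, total = pc + size, total + cycles
    assert pc == len(SLIDE), "must fall out of the slide exactly at 'code'"
    return total


def branch_cycles(a, page_break=True):
    """BNE cost: 2 if not taken, 3 if taken, 4 if taken across a page."""
    if a == 0:
        return 2
    # Offset 9 is 'code' itself, which sits just past the page break.
    return 4 if (page_break and a == len(SLIDE)) else 3


def total_delay(a, page_break=True):
    return branch_cycles(a, page_break) + slide_cycles(a)


delays = [total_delay(a) for a in range(10)]
print(delays)
print(total_delay(9, page_break=False))
```

With the page break in place, each step of A shortens the delay by exactly one cycle. Without it, the last state jumps by two cycles, which is why the page break is mandatory.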
2017-02-28 15:07
Frantic

Registered: Mar 2003
Posts: 1627
I think the best part is that LFT wrote a Codebase article about it *before* I had to ask him about it.
2017-02-28 19:56
lft

Registered: Jul 2007
Posts: 369
Copyfault, that is an excellent improvement!
2017-03-02 10:35
ChristopherJam

Registered: Aug 2004
Posts: 1377
Oh that's gorgeous. Nice work, both of you!
2017-03-02 12:08
Frantic

Registered: Mar 2003
Posts: 1627
@Copyfault: Don't hesitate to write about that improvement on Codebase. If LFT doesn't mind, perhaps it can be written as an extension of his article?
2017-03-02 22:38
Copyfault

Registered: Dec 2001
Posts: 466
Reading the reactions to my idea makes me happy&smile :))

But to be fair: my idea is not as much of an optimization as it looks at first glance!

Sticking to lft's example, the routine can cope with a jitter of 10, which means 11 different latencies (or cycle delay states, as I prefer to call them). Applying my "optimization", the number of delay states drops by one.
;-----------------------
;cycle no. taken from
;lft's example
;-----------------------
                 ;32..41 (31 not poss. with the opt)
                 ;A=0..9 (10 not poss.)
     sta bra+1   
                 ;36..45
bra  bne *       
                 ;38 (only for A=0)
     nop         
                 ;40 (branch taken, nop skipped)
     lda #$a9    
                 ;42
     lda #$a9    
                 ;44
     lda #$a9    
                 ;46
     lda $ea     
                 ;49
;-----------------------------
;page break here
;-----------------------------
code ...

You can easily see that the delay state "A=10" is no longer handled after the optimization (it would most probably lead to a crash due to a branch to "code+1"), whereas it is fully handled in lft's approach.

Thus the idea I had is more of a "cosmetic kind". In order to fully take advantage of that extra "branch taken" cycle, one would have to ensure that the very first byte of the clock slide code is reached by a taken branch (for the "A=0"-case, i.e. the one which has to compensate the most cycles) and also just by passing that branch instruction (usually the "A=1"-case), but this would require some more touch-up of the accu before starting the actual dejitter part.

So before feeding the codebase I better ask here if you still want the idea to be added there.
2017-03-02 23:18
Copyfault

Registered: Dec 2001
Posts: 466
Speaking of that touch-up of the accu in my previous post, one could do it using a table lookup. So the code would be something like
;---------------------------------
;trying to stick to lft's example
;with all the cycle numbers
;---------------------------------
                 ;23..33
     ldx timer   
                 ;27..37
     lda table,x
                 ;31..41
                 ;A=0..10
     sta bra+1   
                 ;35..45
bra  bpl *       
                 ;38 (35+3 for A=0 or 36+2 for A=$ff)
     nop         ;40
     lda #$a9    ;42
     lda #$a9    ;44
     lda #$a9    ;46
     lda $ea     ;49
code ...

table
     !by $09,$08,$07,$06,$05,$04,$03,$02,$01,$ff,$00         
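
The table-driven listing above can be checked the same way with a small Python model. This is a sketch under stated assumptions: the timer value is used directly as the table index, timer=10 corresponds to the earliest arrival (cycle 23), 'code' starts a new page as in the earlier listings, and all names are hypothetical.

```python
# Opcode -> (size in bytes, cycles). Only the opcodes used in the slide.
CYCLES = {0xEA: (1, 2), 0xA9: (2, 2), 0xA5: (2, 3)}

# nop / lda #$a9 / lda #$a9 / lda #$a9 / lda $ea, as raw bytes
SLIDE = [0xEA, 0xA9, 0xA9, 0xA9, 0xA9, 0xA9, 0xA9, 0xA5, 0xEA]

# Branch operands from the listing, indexed by timer value 0..10
TABLE = [0x09, 0x08, 0x07, 0x06, 0x05, 0x04, 0x03, 0x02, 0x01, 0xFF, 0x00]


def slide_cycles(entry):
    """Cycles spent in the slide when execution enters at byte offset `entry`."""
    pc, total = entry, 0
    while pc < len(SLIDE):
        size, cycles = CYCLES[SLIDE[pc]]
        pc, total = pc + size, total + cycles
    return total


def end_cycle(timer):
    """Cycle at which 'code' is reached for a given timer value."""
    start = 23 + (10 - timer)   # assumption: timer=10 arrives first, at cycle 23
    t = start + 4 + 4 + 4       # ldx timer / lda table,x / sta bra+1
    a = TABLE[timer]
    if a >= 0x80:               # negative flag set: BPL not taken, 2 cycles
        return t + 2 + slide_cycles(0)
    # Taken branch: 3 cycles, or 4 when the target is 'code' on the next page.
    branch = 4 if a == len(SLIDE) else 3
    return t + branch + slide_cycles(a)


ends = [end_cycle(timer) for timer in range(11)]
print(ends)
```

In this model the $00 entry buys the extra "branch taken" cycle and the $ff entry falls through the branch, so both of the two earliest arrivals reach the start of the slide on cycle 38.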

This way all 11 different delay states can be coped with, but at the cost of extra bytes for the table. Now if I want to be smart, I'd align the table to also have a page break for the "A=0"-case ;))
;---------------------------------
                 ;23..33
     ldx timer   
                 ;27..37
     lda table,x ;if timer holds the max-val, the table access reads above the page end -> extra cycle!
                 ;32..41
                 ;A=0..9
     sta bra+1   
                 ;36..45
bra  bpl *       
                 ;39 (36+3 for A=0 or 37+2 for A=$ff)
     lda #$a9    ;41
     lda #$a9    ;43
     lda #$a5    ;45
     nop         ;48 (ends one cycle earlier as the first dejitter cycle is the lookup-table penalty cycle)
code ...

table
     !by $08,$07,$06,$05,$04,$03,$02,$01,$ff,$00
;-------------------------------------------------
;page break here
;-------------------------------------------------
     !by $00         

Needs even one byte less for the clock slide part... but of course, any advantage is eaten up by all the drawbacks like page-break requirements (now for that table also!), need for an index register, higher "minimum overhead cost" (lda #const: sbc timer is cheaper in this respect!), etc.

But maybe this idea qualifies a little better for a contribution to the mighty codebase?!

[Edit]
Oops, that was too optimistic ;)) Of course it must be
;---------------------------------
                 ;23..33
     ldx timer   
                 ;27..37
     lda table,x ;if timer holds the max-val, the table access reads above the page end -> extra cycle!
                 ;32..41
                 ;A=0,0,$ff,1,..,8 (see table)
     sta bra+1   
                 ;36..45
bra  bpl *       
                 ;39 (36+3 for A=0 or 37+2 for A=$ff)
     lda #$a9    ;41
     lda #$a9    ;43
     lda #$a9    ;45
     lda $ea     ;48 (ends one cycle earlier as the first dejitter cycle is the lookup-table penalty cycle)
code ...

table
     !by $08,$07,$06,$05,$04,$03,$02,$01,$ff,$00
;-------------------------------------------------
;page break here
;-------------------------------------------------
     !by $00         

The clock slide part is of course _one_ byte less, not two ;p
2017-03-04 06:11
ChristopherJam

Registered: Aug 2004
Posts: 1377
Mind like a sieve. Look what I found on an old disk image from somewhere around 1989-1992.

Pretty sure I got the BPL from John West, after he independently discovered VSP some time around 1989-90




edit: argh, all this proves is that we *didn't* discover copyfault's improvement. I need some more sleep.

Also, welcome to my horrible source code from before I recanted my 'all cross developing is cheating' stance.
2017-03-04 07:53
oziphantom

Registered: Oct 2014
Posts: 478
6510+?