[CSDb] - User Forums - SID envelope rate counter phase alignment


2015-05-05 00:55

ChristopherJam

Registered: Aug 2004
Posts: 1370

SID envelope rate counter phase alignment

<Post edited by moderator on 17/3-2020 08:25>

Continuing the discussion that forked off Lft's post about Avoiding the ADSR bug in the decay phase.

I've been refining the part of the SID envelope reset that gets the internal rate counter into one of a number of states that are all equal modulo 9. LFT's state table was highly informative, but I still couldn't quite see what I was doing, so I wrote a script to generate diagrams from rate counter limit sequences.

In each of the images below, the horizontal axis is time, and the vertical has a pixel set for each possible rate counter value at that time. They are colour coded with (rate counter - time)%9, so any potential values that are equal modulo nine show up as streams of equal colour. Horizontal pink bars show the counter limit values. All but the last two have had the tops cut off so we can focus on the interesting bit, but you can see the highest rate counter reached in the annotations above the images.
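The diagrams can be approximated with a small simulation. What follows is a minimal sketch, not the original script: it assumes a simplified rate counter that increments once per cycle, resets to zero on reaching the current limit, and otherwise wraps as a plain 15-bit value. The names `step`, `run` and `colours` are made up for illustration.

```python
def step(states, limit):
    """Advance every possible rate counter value by one cycle."""
    out = set()
    for rc in states:
        rc = (rc + 1) & 0x7FFF      # assumption: plain 15-bit wrap
        if rc == limit:             # counter reached the limit: reset
            rc = 0
        out.add(rc)
    return out


def run(states, schedule):
    """schedule is a list of (limit, cycles) pairs; returns (states, time)."""
    t = 0
    for limit, cycles in schedule:
        for _ in range(cycles):
            states = step(states, limit)
            t += 1
    return states, t


def colours(states, t):
    """The colour key used in the diagrams: (rate counter - time) % 9."""
    return {(rc - t) % 9 for rc in states}


# At a limit of 9 a stream keeps its colour.
s, t = run({3}, [(9, 20)])
print(colours(s, t))    # {3}

# One pass through the 32 limit moves a stream to a different colour.
s, t = run({0}, [(32, 32)])
print(colours(s, t))    # {4}
```

Feeding `run` the full set `set(range(0x8000))` and a limit sequence gives the set of surviving counter values, which is what each column of the images plots.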

original: 567 cycles, highest rc is 535 (runlog)

This was the code I initially contributed to the discussion. It takes forever! Only harvesting a result every two cycles through the 32 rate limit is extraordinarily wasteful (slaps self). This image is half the scale of the others just to avoid it breaking the page format.

lft: 342 cycles, highest rc is 324 (runlog)

LFT's contribution. Vastly improved.

lft_compacted: 279 cycles, highest rc is 261 (runlog)

I then compacted this a bit, by dropping a redundant nine cycles from each iteration.

sieve: 248 cycles, highest rc is 252 (runlog)

First attempt at a different approach. The single cycles at Attack=0 should be doable by using an INC instruction, but that relies on SID reading zero. Not sure if this is safe?
Only way it could be faster would be to use the 220 cycle limit for the stream that would otherwise require eight 32 cycle resets.

sieve2: 280 cycles, highest rc is 288 (runlog)

A safer sieve, that only uses 4 cycle writes. Downside of the sieve is it still lets the rate count get pretty high.

bottle: 280 cycles, highest rc is 99 (runlog)

LFT's comment about recapturing got me thinking. The next phase of the reset would run faster if we could bottle as many streams as possible into the 63 cycle limit (attack=2).
We can only manage seven of them, but that still brings down max rc by a factor of three!

bottle2: 280 cycles, highest rc is 90 (runlog)

..a slight reshuffle of the last couple of iterations, and we save enough cycles to use a rate limit of 95 for whatever comes next

bottle3: 375 cycles, highest rc is 72 (runlog)

This last one's definitely more of theoretical interest, but for another 95 cycles we can group all the possible states into a single tight packet.

The benefit of all of the above is less than I thought when I first set out. It was only a day or two ago that I finally looked into the envelope overflow/underflow and made sense of LFT's remarks about using env3=0xff; hence only a couple of loops are required at a rate limit of 95: one to drop back down to env3=0xfe, and another to recapture at decay=0.
I'd initially thought I was saving thousands of cycles as we rose from env3=0xee, but it's not to be. C'est la vie!

Still, onward to implementation; bottle2 should still save a few raster lines at the point in time that there's work to be done by CPU.

2015-05-05 09:29

Mixer

Registered: Apr 2008
Posts: 421

Table from the script source. This table is also in the reSID sources and is quoted elsewhere.

I never figured out why the following timing is calculated with 1.0 MHz instead of 985248 Hz, or whether it makes any difference, since it is the full cycles that count. There are some curious roundings, though.

rcp=[
9, # 2ms*1.0MHz/256 = 7.81 # rate 0
32, # 8ms*1.0MHz/256 = 31.25
63, # 16ms*1.0MHz/256 = 62.50 # rate 2
95, # 24ms*1.0MHz/256 = 93.75
149, # 38ms*1.0MHz/256 = 148.44 # rate 4
220, # 56ms*1.0MHz/256 = 218.75
267, # 68ms*1.0MHz/256 = 265.63 # rate 6
313, # 80ms*1.0MHz/256 = 312.50
392, # 100ms*1.0MHz/256 = 390.63 # rate 8
977, # 250ms*1.0MHz/256 = 976.56
1954, # 500ms*1.0MHz/256 = 1953.13
3126, # 800ms*1.0MHz/256 = 3125.00
3907, # 1 s*1.0MHz/256 = 3906.25
11720, # 3 s*1.0MHz/256 = 11718.75
19532, # 5 s*1.0MHz/256 = 19531.25
31251 # 8 s*1.0MHz/256 = 31250.00
]
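To see the roundings at a glance, here is a small sketch that recomputes the comment column from the nominal attack times and also evaluates it at the PAL clock of 985248 Hz. The list `attack_ms` simply restates the times from the comments above; nothing here is taken from the reSID source itself.

```python
rcp = [9, 32, 63, 95, 149, 220, 267, 313, 392, 977,
       1954, 3126, 3907, 11720, 19532, 31251]
attack_ms = [2, 8, 16, 24, 38, 56, 68, 80, 100, 250,
             500, 800, 1000, 3000, 5000, 8000]


def nominal(ms, clock_hz):
    """Cycles per envelope step if the attack takes `ms` over 256 steps."""
    return ms / 1000 * clock_hz / 256


excess = []   # how far each limit sits above the 1.0 MHz figure
for rate, (limit, ms) in enumerate(zip(rcp, attack_ms)):
    at_1mhz = nominal(ms, 1_000_000)
    at_pal = nominal(ms, 985_248)
    excess.append(limit - at_1mhz)
    print(f"rate {rate:2d}: limit {limit:5d}  1.0MHz {at_1mhz:9.2f}  "
          f"PAL {at_pal:9.2f}  excess {limit - at_1mhz:5.2f}")
```

Under this calculation every limit sits between roughly 0.4 and 1.4 cycles above the 1.0 MHz figure, so the table is neither a plain rounding nor a plain round-up of that column.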

Also, any thoughts on verifying "the method of getting to known state" with real sid(s)? Test/Measure.

2015-05-05 09:41

ChristopherJam

Registered: Aug 2004
Posts: 1370

The counter limits have all been verified by both SID measurements and die photographs; I suspect the comments are just an attempt to see how they line up with the numbers from the datasheet.

As for verifying "the method of getting to known state", I implemented the one labelled "original" a few years back, and it works!

"bottle2" combined with LFT's epilogue I'll hopefully get to in a few days. I partly published the above just so I'd stop tinkering with graphs and get on with writing some actual code..

2015-05-05 13:41

lft

Registered: Jul 2007
Posts: 369

This is highly encouraging! Looking forward to the first actual implementation of Perfect Restart now. Keep up the good work!

2015-05-06 06:50

Frantic

Registered: Mar 2003
Posts: 1627

In a way it is kinda funny how people/we still struggle to this day with the bugs in the sid that once came about in a period of rushed hardware development. :)

2015-05-06 13:40

ChristopherJam

Registered: Aug 2004
Posts: 1370

Indeed. If only they'd loaded the limit value and compared to zero instead of the other way round, eh?

2015-05-11 17:51

ChristopherJam

Registered: Aug 2004
Posts: 1370

OK, core of an implementation's now up at Codebase64 (cf A new kind of hard-restart )

Test harness attached, feedback welcome.

I just used bottle1 in the end; debugging the timing of the epilogue kept me more than busy enough. Thanks for all your help and encouragement, lft and Frantic!

2015-05-12 14:02

Frantic

Registered: Mar 2003
Posts: 1627

Nothing less than a milestone!

2015-05-13 07:52

Pex Mahoney Tufvesson

Registered: Sep 2003
Posts: 50

Cool! I love this! / Pex
---
Have a noise night!
http://mahoney.c64.org

2015-05-13 10:43

Stainless Steel

Registered: Mar 2003
Posts: 966

Ok now everybody implement this into their player/editor (GLENN, GEIR!! SDI!!!) :D

2015-05-13 12:29

Frantic

Registered: Mar 2003
Posts: 1627

Now I didn't analyze this in full detail yet but if I understand correctly the whole algorithm would have to be repeated 3 times (~30 raster lines) in order to perform this kind of stabilization on all three channels of the SID, right? I mean, that 63 cycle bottle isn't "wide" enough to allow stabilization of all three channels (or any combination of two channels) in the same loop, because there is not enough time to write to all three sid channels inside of it, right? If so, it would be cool to have a version of this thing that allows more than one channel to be stabilized at the same time. Should be possible to do that in less than ~30 lines, right? And best of all would of course be if it was possible to specify flexibly which of the three channels that should be stabilized... :)

2015-05-13 18:09

Mixer

Registered: Apr 2008
Posts: 421

Voice 1 and Voice 2 need that lengthy manipulation.

Voice 3 is a special case because ENV can be read. The LFSR is at its start when the ENV value changes, so there is a short window right after an ENV change in which it is safe to make changes.

A smart implementation of the above would be a nice addition to the general case (the kind that can be implemented in a play routine).

2015-05-13 18:56

lft

Registered: Jul 2007
Posts: 369

Mixer, that's a good point! In my original code in the other thread (Avoiding the ADSR bug in the decay phase.), it should be possible to interleave code for restarting two voices. So in my original estimate of 10 rasterlines overhead, I was taking this into account (i.e. 5 rasterlines for the speedcode, times two because of the three voices). Now, that was an estimate, and ChristopherJam's figure is based on facts. But it would be really nice indeed if this could be squeezed down into 5 lines in total by using the ENV3 register as you suggest. Hmm...

2015-05-13 22:15

ChristopherJam

Registered: Aug 2004
Posts: 1370

Excellent points all, especially about voice 3 being readable. Should even be able to use that information to adjust the phase, just by letting it run at a limit of 32 for a number of loops dependent on the current phase.

Voices 1 and 2 should be doable in parallel to some extent, as per lft's original. The bottle I used is a 36 cycle loop with writes at cycles 3, 27 and 35, so interleaving additional writes to a second voice at cycles 19, 7 and 15 is pretty easy.
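The interleaving amounts to shifting the first voice's write slots by 16 cycles modulo the 36 cycle loop. A quick sketch to check that the two sets of writes never collide, assuming each write is a 4 cycle absolute store (the helper name `min_gap` is made up here):

```python
LOOP = 36
voice_a = [3, 27, 35]                        # write cycles quoted above
voice_b = sorted((c + 16) % LOOP for c in voice_a)


def min_gap(slots, loop):
    """Smallest distance between consecutive writes, including the wrap."""
    s = sorted(slots)
    gaps = [b - a for a, b in zip(s, s[1:])]
    gaps.append(s[0] + loop - s[-1])         # last write to first of next loop
    return min(gaps)


print(voice_b)                               # [7, 15, 19]
print(min_gap(voice_a + voice_b, LOOP))      # 4
```

A minimum gap of 4 leaves just enough room for back-to-back absolute stores, with no slack for anything else between those particular writes.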

However, the implementation of new hard-restart (NHR? Need a better name for what it does..) I posted to codebase then spends another five lines on the recapture and overflow, largely because bottle1 required a rate limit of 149 (ADSR=4x44).

That last phase might be doable at speed 3 (rate limit 95) if one switched to bottle2, but implementation becomes harder, and the bottles for the two voices would then have to be only partially overlapped, as there are some fairly dense updates towards the end (cf diagram and log above).

bottle3 may become worthwhile after all, as at least then there's a luxurious 23 cycles between the arrival of each chain of potential rc=0 events :)

2015-05-13 22:26

ChristopherJam

Registered: Aug 2004
Posts: 1370

Helpfully, if the voice is reset from a stable raster interrupt, it doesn't matter which raster line the reset is done on - the seven 9 cycle loops fit happily into a 63 cycle raster line! (sorry NTSC..)

Switching between rate limits of 9 and 63 (speeds 0 and 2) would then be safe as long as it's always done in the first nine cycles of a raster line.

Using the other speeds safely would be harder; you'd have to count frames at each rate to calculate the phase shift, then time the rate limit change accordingly..
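A rough way to convince oneself of the 9/63 case is to compare what the counter would hold at each cycle of the raster line under either limit. The sketch below assumes the simplified model (increment each cycle, reset on reaching the limit) and ignores the exact cycle on which a register write takes effect; `rc_at` and `aligned_switch_cycles` are hypothetical names.

```python
LINE = 63   # PAL cycles per raster line


def rc_at(line_cycle, limit):
    """Rate counter value at a given cycle of the raster line, assuming the
    counter was zero at cycle 0 of some earlier line and has run at `limit`
    ever since."""
    assert LINE % limit == 0    # only valid when the limit divides the line
    return line_cycle % limit


def aligned_switch_cycles(limit_a, limit_b):
    """Line cycles where both limits give the same counter value, so that
    switching between them leaves the raster alignment intact."""
    return [c for c in range(LINE)
            if rc_at(c, limit_a) == rc_at(c, limit_b)]


print(aligned_switch_cycles(9, 63))   # [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(65 % 9)                         # 2: a 65 cycle NTSC line is not a multiple of 9
```

For limits that do not divide the line length the `assert` fails, which is the point made above: the phase then drifts from line to line and has to be tracked separately.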

2015-05-14 05:41

Oswald

Registered: Apr 2002
Posts: 5017

Fast Restart? :) btw no badlines and sprites allowed over that routine, right?

2015-05-14 07:26

ChristopherJam

Registered: Aug 2004
Posts: 1370

Quote: Fast Restart? :) btw no badlines and sprites allowed over that routine, right?

Sadly it's anything but fast - it still needs an ordinary hard-restart (OHR) before that last ten rasters' worth of speedcode.

There may be circumstances when the rate counter can be recaptured more efficiently, mind.

And yes, no DMA or interrupts allowed during the last 600 cycles.

Perhaps Stabilized Hard-Restart?

2015-05-14 08:13

Frantic

Registered: Mar 2003
Posts: 1627

...or simply "stable hard restart"?

2015-05-15 00:19

Mixer

Registered: Apr 2008
Posts: 421

Would this be what they call 'pushing the envelope?'

2015-05-15 01:39

ChristopherJam

Registered: Aug 2004
Posts: 1370

@Mixer: Hah! Yes, I guess it is :D

@Frantic: "stable hard restart" works for me.

2015-05-16 10:47

Laurent

Registered: Apr 2004
Posts: 40

mmmmh I am still missing the point, and sorry if the following is off-topic :)

I thought that when you set R to 0 in the release state, there was a high probability of having to wait about 0x8000 cycles before the RC gets caught between values 0 and 8. Setting attack to any value and gate to 1 should then start the attack "instantly", with a jitter of 9 cycles, shouldn't it? I wonder why that isn't good enough as a "known state".

However, I have seen many players that seem to deliberately trigger a second delay bug before the attack restarts. This can be especially good for kick/drum instruments, which mostly have a 0 attack and start with a noise waveform.
To do this, after catching RC between 0 and 8 in the release state, they set R to a high value, wait more than 9 cycles so that RC > 9, set the AD value (with A = 0), and set gate to 1, hence always triggering another delay bug.

My guess at the motivation (plain speculation): there is always a 33ms delay before the attack starts, so the noise waveform, held for 2 frames in the instrument table, is only heard for 7ms, which sounds much better than 20ms of noise held for one full frame (as in AHX tunes, for example).
To avoid the 33ms delay, we would need 2x tune and AD+GATE updated before SR, so that noise would be heard for 10ms, or 3x tune so it would be heard for 6.7ms...
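For reference, these figures follow from the length of a full rate counter wrap. A back-of-envelope sketch, assuming a PAL clock of 985248 Hz, a frame of 312 lines by 63 cycles, and a wrap of roughly 0x8000 cycles:

```python
CLOCK_HZ = 985_248          # PAL system clock
FRAME_CYCLES = 312 * 63     # PAL frame length in cycles
WRAP_CYCLES = 0x8000        # approximate length of a rate counter wrap


def ms(cycles):
    """Convert a cycle count to milliseconds at the PAL clock."""
    return cycles / CLOCK_HZ * 1000


delay_ms = ms(WRAP_CYCLES)                    # the "33ms" delay
noise_ms = ms(2 * FRAME_CYCLES) - delay_ms    # noise left over two frames

print(f"wrap delay   {delay_ms:.1f} ms")
print(f"frame        {ms(FRAME_CYCLES):.1f} ms")
print(f"noise heard  {noise_ms:.1f} ms")
```

This comes out a little under 7 ms of audible noise, consistent with the estimate above.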

I am certainly missing something big here! Please explain the overall goal :)

2015-05-16 13:33

Laurent

Registered: Apr 2004
Posts: 40

To illustrate the "virtuous" side of the ADSR delay bug for the attack, here's voice #2 of /MUSICIANS/P/Prosonix/Hoff_Lars/Cowshit_Jam.sid

How it's originally playing thanks to the 33ms delay (7ms noise)
https://dl.dropboxusercontent.com/u/55933213/cowshit_jam_7ms_no..

If ADSR delay bug was not forced (at best we would have 20ms noise, like in this example):
https://dl.dropboxusercontent.com/u/55933213/cowshit_jam_20ms_n..

This is not an isolated tune; many exploit this feature to shorten the noise time at attack, and it was a major problem for most emulators before we finally got cycle-based ones, like reSID.

2015-05-16 16:06

ChristopherJam

Registered: Aug 2004
Posts: 1370

Well, I originally had no intention of using stable hard restart (SHR) directly in music players; I developed the original 80ms routine so I could do some cycle-exact reads of env3, with the aim of developing a more accurate envelope model than reSID provided. There are still un-emulated bugs that result in some envelope settings seeming safe under emulation that nonetheless occasionally fail or glitch on the real hardware. A faster SHR means I can measure envelopes more quickly, and gather data to compare emulation with hardware in less time per run.

That said, having a player that performed an SHR even once per pattern, when combined with calling the music from a stable raster interrupt, would at least give the musician the possibility to experiment with 'unsafe' envelope settings confident that any envelope glitches will be identical every time the track is played.

SHR could also be used to develop a player that avoids the ADSR bug altogether even with currently 'unsafe' values, by tracking the exact state of RC and only switching the RC limit when RC is below it (of course, it could also trigger the bug on demand :D ) Such a player however, would be a pretty major undertaking for which SHR is but one component.
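The core check such a player would need can be sketched as follows. This is only an illustration under the simplified model used in this thread (the counter increments each cycle and resets on reaching the limit, otherwise wrapping after about 0x8000 cycles); `cycles_to_next_step` and `safe_to_switch` are made-up names, not part of any existing player.

```python
WRAP = 0x8000   # approximate wrap length of the 15-bit rate counter


def safe_to_switch(rc, new_limit):
    """True if the counter is still below the limit about to be selected."""
    return rc < new_limit


def cycles_to_next_step(rc, new_limit):
    """Cycles until the envelope next steps if the limit becomes new_limit
    while the rate counter holds rc."""
    if safe_to_switch(rc, new_limit):
        return new_limit - rc           # normal case
    return WRAP - rc + new_limit        # ADSR bug: counter has to wrap first


print(cycles_to_next_step(5, 9))        # 4
print(cycles_to_next_step(20, 9))       # 32757
```

Triggering the bug on demand is then just a matter of choosing a moment when `safe_to_switch` is false.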

As for the virtuous ADSR at start of note, yes I believe you are correct; it allows one to have a half frame of a given waveform without resorting to using a 2x player.

2016-02-19 10:00

lft

Registered: Jul 2007
Posts: 369

Quoting ChristopherJam

sieve: 248 cycles, highest rc is 252
...
First attempt at a different approach. The single cycles at Attack=0 should be doable by using an INC instruction, but that relies on SID reading zero. Not sure if this is safe?

No, unfortunately not. Reading the SID will get you the last value that was on the bus, which is a VIC fetch. You could try to time the code so the VIC fetches a zero at the right moment.

However, the real reason I'm responding to this now is that I figured out a different approach that could allow this to be optimised even further: Exploit the bug that selects the decay rate for one cycle when enabling the gate. This allows us to briefly open the bottle, as it were, and in a very clean way allow each possible phase to slip out at the right moment.

So, for instance, if you start with a normal hard restart (clearing ADSR and the control register for two frames) and then do:

lda #$0f
sta $d405
ldx #$00

sta $d404
stx $d404
sta $d404
stx $d404
sta $d404
stx $d404
sta $d404
stx $d404
sta $d404
stx $d404
sta $d404
stx $d404
sta $d404
stx $d404
sta $d404
stx $d404
sta $d404

...then, in just 76 cycles you've distributed the phases into nine possible locations, exactly eight cycles apart.

That's just an illustration, of course, because we need to get them nine cycles apart. But I'm thinking that if we do something similar using rate 1 (32 cycles) as base, we should be able to reach that goal in no more than 32*8 cycles.
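The cycle count quoted for the unrolled sequence can be checked with standard 6502 timings. A small sketch, taking the end of each store as the moment of the write (an assumption made here for simplicity):

```python
# Standard 6502 cycle counts for the addressing modes used in the listing.
CYCLES = {"lda #": 2, "ldx #": 2, "sta abs": 4, "stx abs": 4}

# Setup: lda #$0f / sta $d405 / ldx #$00
program = ["lda #", "sta abs", "ldx #"]
# Then 17 alternating stores to $d404, starting and ending with sta.
for i in range(17):
    program.append("sta abs" if i % 2 == 0 else "stx abs")

t = 0
gate_on = []    # cycle on which each sta $d404 completes
for n, op in enumerate(program):
    t += CYCLES[op]
    if op == "sta abs" and n >= 3:      # skip the sta $d405 in the setup
        gate_on.append(t)

total = t
spacing = {b - a for a, b in zip(gate_on, gate_on[1:])}
print(total, len(gate_on), spacing)     # 76 9 {8}
```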

2016-02-19 11:40

ChristopherJam

Registered: Aug 2004
Posts: 1370

Quoting lft

That's just an illustration, of course, because we need to get them nine cycles apart.

No no, this is excellent. Eight cycles apart is exactly what we need, because each hole has to catch a different phase. Should be able to shave a good 200 cycles off the SHR this way. Nice work!

Must be said, I've not yet tested to see if there's a similar nybble selection bug with gate off; that might become pertinent at this point, but at worst that would just necessitate setting sustain to a magic number during the bottling.

2016-02-19 13:13

lft

Registered: Jul 2007
Posts: 369

But the point of having them nine cycles apart is that we can then set the decay rate to 0, make use of the synchronous transition from attack to decay, and in that way bring every phase down to the same value. If they are spaced eight cycles apart, they'll still be in different locations once we bottle them up again.
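The arithmetic behind this fits in a couple of lines. Assuming that at decay rate 0 the counter resets every nine cycles, so that only its value modulo 9 matters, compare nine candidate phases spaced nine cycles apart with nine spaced eight apart (`residues` is a name invented for this sketch):

```python
def residues(spacing, count=9, limit=9):
    """Distinct phases modulo the rate limit for evenly spaced candidates."""
    return {(k * spacing) % limit for k in range(count)}


print(residues(9))          # {0}: all nine collapse to a single phase
print(len(residues(8)))     # 9: every candidate stays in its own phase
```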

2016-02-19 13:32

ChristopherJam

Registered: Aug 2004
Posts: 1370

Right you are. I realised that a few minutes ago over the washing up and came back here to admit I'm an idiot :)

Yes; as you pointed out, the holes need to be in the 32 cycle ceiling.

2020-03-17 12:28

ChristopherJam

Registered: Aug 2004
Posts: 1370

Quoting lft

Reading the SID will get you the last value that was on the bus, which is a VIC fetch. You could try to time the code so the VIC fetches a zero at the right moment.

Apparently the bus in question is the SID's internal bus (cf (Ab)use of dummy accesses), not the system bus - so, much more controllable! But yes, using the single cycle at decay rate is still a saner option.

(also, cheers to the mods for letting me fix the image links at the start of this topic!)
