[CSDb] - User Forums - New VSP discovery

You are not logged in - nap

CSDb User Forums

Forums > C64 Coding > New VSP discovery

2013-07-19 21:01

lft

Registered: Jul 2007
Posts: 369

New VSP discovery

First off, this is what we already knew: VSP causes the VIC chip to briefly
place a logically undefined value on the DRAM address lines during the
halfcycle following the write to d011. If the undefined value coincides with
the RAS signal, every memory cell with an xxx7 or xxxf address is at risk of
getting corrupted. The relative timing of the undefined value and RAS depends
on several factors including temperature.

We also knew that the undefined value could be delayed slightly if VSP was
triggered by setting the DEN bit instead of modifying YSCROLL. This was enough
to avoid a crash on some machines.

I wanted to investigate whether there were other ways of controlling the timing
of the undefined value. Based on a combination of educated guesswork, luck and
plenty of trial-and-error, I could observe the following: The timing depends on
the specific 3-bit value that is written to YSCROLL, as well as the 3-bit value
that was stored in YSCROLL previously.

This means that we can trigger VSP using one of 56 methods (eight different
YSCROLL values for various rasterlines, seven non-matching YSCROLL values to
switch from), each with slightly different timing.

Using the techniques from my Safe VSP demo, I created a tool that would trigger
VSP many times, check if memory got corrupted, and keep track of the number of
crashes caused by each of the 56 methods. I then looked for a pattern in these
statistics.

Intriguingly, if I arranged the 56 crash counters in a grid with the vertical
axis corresponding to the rasterline and the horizontal axis corresponding to
the exclusive-or between the rasterline and the dummy value that was stored in
d011 prior to the VSP, then the crashes would tend to occur only in a subset of
the columns. When my crash prone c64 is powered on, the VIC chip is cold, and
there are no crashes. Within a minute, crashes start to appear in column 7
(meaning that all three bits of YSCROLL were flipped). As the chip heats up,
more crashes begin to appear in columns 3, 5 and 6 (two bits flipped). After
several more minutes, crashes show up also in columns 1, 2 and finally 4 (a
single bit flipped), but by this time, there are no longer any crashes in
columns 5, 6 or 7. Finally, when the VIC chip has reached a stable working
temperature, my machine no longer crashes.

This is what it might look like four minutes after power-on:

Now, let me stress that I only have one VSP-crashing c64, and these results
might not carry over to other machines. I hope they do, though. I would very
much like you (yes, you!) to run VSP Lab (described below) on your crash prone
machines and report what happens.

Is this useful? Short answer: Yes, very. But it hinges on whether the behaviour
of my c64 is typical. Even without the mentioned regularity in the columns, it
would be possible to find a few safe combinations for a given machine and a
given temperature. But the regularity makes it so much more practical and also
easier to explain to all C64 users, not just coders.

Let's refer to the seven columns as "VSP channels". For a given machine at a
given temperature, some of these channels are safe, and possibly some of them
are unsafe. It takes about 5-10 minutes for the VIC chip to reach its working
temperature. If you know that e.g. VSP channel #5 is safe on your machine, and
you can somehow tell a demo or game to use that specific channel, then VSP
won't crash.

My measurement tool evolved into a program called VSP Lab, depicted above,
which you can use to find out which VSP channels are safe to use on your
machine. It triggers a lot of VSP operations and visualises the crashes in a
grid, where each column corresponds to a VSP channel. Remember that a cold and
a hot VIC behave differently, so don't trust the measurements until about ten
minutes after power-on. You can reset the grid highlights using F1 to see if
channels which were unsafe before have become safe.

Demos and games could prompt the user for a VSP channel to use, or try to
determine it automatically using the same technique that VSP Lab is based on.

From a coding point of view, all you then have to do in order to implement
crash-free VSP, is to prepare the value X that you'll write to d011 to trigger
VSP, and the value Y which is X ^ vsp_channel. Then, on the rasterline where
you want to do VSP, you just wait until the time is right and do:

        sty $d011
        stx $d011

On the VSP Lab disk image, there's a small demo effect that you can run. It
will ask you for a VSP channel to use, and if you give it a safe number, it
should not crash.

This technique is so simple and non-intrusive that it's quite feasible to patch
existing games and demos, VSP-fixing them.

Also, this discovery explains the old wisdom that if you attempt VSP more than
once per frame, the routine will be more likely to crash. Here's why: In a demo
effect, you typically perform VSP on a fixed rasterline, so the value you write
to d011 will be constant. It is reasonable to assume that the old value of
YSCROLL will also be constant. Therefore, a given VSP effect will consistently
end up in the same VSP channel. On a machine with N safe VSP channels, the
probability of survival is therefore p = N / 7. If you do VSP on two different
rasterlines, each VSP will likewise end up in a channel, but not necessarily
the same one. The probability that both end up in a safe channel is p*p. If we
assume that most crash prone machines have at least one safe channel, we have
0 < p < 1 and therefore p*p < p. Q.E.D. To verify this, I patched vice to
report the channel every time VSP was performed. Sure enough, VSP&IK+
consistently uses VSP channel 1, as does Royal Arte. Krestage 3 uses VSP
channel 2. The intro of Tequila Sunrise, which performs VSP twice per frame,
uses VSP channels 1 and 3, and so does Safe VSP.

Finally, I will attempt to explain the observed behaviour at the electronical
level. Suppose each bit of YSCROLL is continually compared to the corresponding
bit in the Y raster counter, using XOR gates. The outputs of the XOR gates are
routed to a triple-input NOR gate, the output of which is therefore high if and
only if the three bits match. A triple-input NOR gate in NMOS would consist of
a pull-up resistor and three pull-down transistors. But the output of the NOR
gate is not a perfect boolean signal, because the transistors are not ideal.
When they are closed, they act like small-valued resistors, pulling the output
towards -- but not all the way down to -- ground potential. When YSCROLL
differs from the raster position by three bits, all three transistors
contribute, and the output reaches a low voltage. When the difference is two
bits, only two transistors pull, so the output voltage is slightly higher. For
a one-bit difference, the voltage is even higher (but still a logic zero, of
course). When we trigger VSP, all transistors stop pulling the voltage down,
and because of the resistor, the output voltage will begin to rise. But the
time it needs in order to rise to a logic one depends on the voltage at which
it begins. Thus, the more bits that change in YSCROLL, the longer it takes
until the match signal is asserted.

I have a fair amount of confidence in this theory, but need more data to
confirm it. And, once again, this is only of practical use if the average crash
prone machine has safe channels, like mine does. So please check your
equipment! I'm looking forward to your reports.

... 43 posts hidden. Click here to view all posts....

2013-07-21 12:27

ChristopherJam

Registered: Aug 2004
Posts: 1378

What am I talking about, lft has already included such a pattern in the gap. edit to suggestion- keep an eye out for the phase of the red/blue bars :)

2013-07-25 20:20

lft

Registered: Jul 2007
Posts: 369

Time to back up my claims with more data.

I added a logging feature to VSP Lab (VSP Lab V1.1), that counts the number of crashes
on each VSP channel for one hour and plots a histogram. One minute corresponds
to five horizontal pixels, and the start of each minute is indicated with a
small dot below the chart. There is no built-in function to save the chart to
disk, but with a monitor cartridge it's just a matter of:

S"FILENAME",8,A000,AA00

Hedning gallantly lent me five machines, which I have measured several times.
The results largely confirm my theories, but the following additional facts
have surfaced:

1. The working temperature rises quickly for approximately ten minutes, then it
keeps rising slowly for at least an hour, probably more. This can be seen in
the charts where the machine was started cold: Changes in the set of safe
channels are close together towards the left end of the graph and more
drawn out towards the right.

The machine also appears to cool down rather quickly when you switch off the
power.

2. At power-on, the machine selects randomly among a handful of "behaviours".
It will stick with this behaviour across resets (neither the VIC chip, the RAM
nor the system clock generator are connected to the reset signal), but can
change if you do a power-cycle. Remember that the 1541 Ultimate reset button by
default also resets the emulated 1541, so there's no need to power-cycle a C64
unless you're going to plug/unplug hardware.

Here are the logs from two of Hedning's machines that we can call H4 (serial
number 491239) and H5 (311520). They are not displayed in chronological order.
Rather, I have identified two behaviours for each machine, and sorted the
charts according to these.

H5, behaviour 1:

Cold start:

Warm start:

H5, behaviour 2:

Cold start:

Warm start:

The last one of those is possibly a third behaviour for this machine.

H4, behaviour 1:

Cold start:

Warm start:

To quote Devia in a different thread:

"in my experience a c64 running VSP code with no problems, might fail after a
power cycle while continuing to run stable between resets."

Clearly, behaviour 1 of machine H4 (once it's warmed up) is precisely such a
stable mode, but a power-cycle might drop us into:

H4, behaviour 2:

Cold start:

(not captured)

Warm start:

Both machines H4 and H5 use the new VIC chip and the short motherboard.
However, Soundemon has reported VSP crashes on a 5-luma VIC in a breadbox.

Five bits are flipping as the address bus goes from $ff to $07. Three major
factors contribute to the timing of the five bit transitions with respect to
the RAS signal: The power-on behaviour, the number of set bits in the VSP
channel number, and the temperature. Apart from this, the rasterline and the
specific VSP channel will also affect the timing due to process variations, but
this is less pronounced.

The five bits don't necessarily flip at the same time. The relative timing of
the bit-flips appears to be stable for a given machine. If any of these
coincide with RAS, there is risk for a crash.

As you can see from the charts, there is a general pattern of progressing from
the 3-bit channel (7) via the 2-bit channels (3, 5, 6) to the 1-bit channels
(1, 2, 4). On channel 7, the bit-flips are late. As temperature goes up, they
get even later, and the bit-flip that coincided with RAS gets pushed out of the
critical zone. Meanwhile, if we perform VSP on channel 3, 5 or 6, the bit-flips
are a bit earlier and don't coincide with RAS. As temperature goes up, a
bit-flip gets pushed into the critical zone. Sometimes, like in the first two
charts, we see crashes disappearing from channels 1, 2 and 4 only to appear
later in channel 7. What we see is the next of the five bit transitions
approaching RAS.

Depending on the relative timing of the five bit-flips, there could be machines
which consistently crash on all channels. If you have such a machine, I'm
afraid these techniques will not help you.

Otherwise, the practical method for avoiding VSP crashes goes something like
this: Power on the machine. Run VSP Lab for ten minutes and look at the log.
Note which channels are unsafe; try to predict the next thing that will happen,
based on the ordering described above. E.g. if some 1-bit channels recently
became unsafe, then the remaining 1-bit channels might also become unsafe
within the next hours. On the other hand, if some 1-bit channels recently
became safe, then you should be more worried about channel 7 becoming unsafe.
Note down what VSP channel is safe, and select this channel when watching demos
or playing games. Don't power-cycle the machine, but use the reset button!
Every time you power-cycle, you have to do the whole procedure again, running
VSP Lab and looking at the log, although you won't have to wait the full ten
minutes if the machine is already warm.

If your machine crashes on all channels when warm, but not when cold, then you
might be able to fix it by mounting a fan over one of the ports at the back of
the machine.

If a machine survives several hours of VSP Lab without a single crash, and you
can repeat this several times with power-cycles in between, then you can be
fairly confident that the machine will never crash on VSP.

I propose the following:

1. Demo/game coders should include a VSP channel selector (if VSP is used). We
should add options in Vice to crash on a subset of the channels, to help with
debugging.

2. For home use, a machine only needs to be safe on one channel. This will
finally allow many people to watch the latest demos on real hardware.

3. Compo machines should be safe on all channels. This can be verified using
VSP Lab.

The great benefit is #2, which requires some effort on the part of the coders
(#1). A side-benefit of VSP Lab is that party organisers now have an easy way
to verify #3.

2013-07-25 22:46

hedning

Registered: Mar 2009
Posts: 4598

Very interesting! Thanx to lft I now know that I have a VSP-safe C64C ready for the BFP and future Gubbdata compos. ;)

2013-07-26 02:07

chatGPZ

Registered: Dec 2001
Posts: 11121

Quote:

Demo/game coders should include a VSP channel selector (if VSP is used).

i honestly doubt anyone would bother. while the whole issue is very interesting by itself - i really dont see how it is a major problem in practise.

being able to find vsp-crashing machines is nice though. and then using the good old hammer method to fix them =P

2013-07-26 04:51

ChristopherJam

Registered: Aug 2004
Posts: 1378

@lft, is there a difference in colour phase displayed in the $3fff area when H5 is exhibiting behaviour 1 to when it is exhibiting behaviour 2?

2013-07-26 05:10

soci

Registered: Sep 2003
Posts: 473

Now that we have this nice testing tool it would be interesting to see what happens when RAS and other signals are manipulated in hardware ;)

2013-07-26 08:03

Frantic

Registered: Mar 2003
Posts: 1627

Quote: Very interesting! Thanx to lft I now know that I have a VSP-safe C64C ready for the BFP and future Gubbdata compos. ;)

Classy! :)

2013-07-26 20:27

lft

Registered: Jul 2007
Posts: 369

Quoting ChristopherJam

@lft, is there a difference in colour phase displayed in the $3fff area when H5 is exhibiting behaviour 1 to when it is exhibiting behaviour 2?

To some extent.

I increased the contrast by changing to white background, then wrote down the colour carrier phase in terms of the order of the artefact colour bars (magenta/cyan, and how much of the first one is visible), along with the observed behaviour. I did this about ten times, and for some phases I would always see the same behaviour, whereas for other phases the behaviour was unpredictable. With such a low number of samples, the predictable ones might of course in fact be unpredictable.

2013-08-02 18:42

Zer0-X

Registered: Aug 2008
Posts: 78

Despite taking numerous screenshots of the clock phases still ended up with one gap. But 16 powerup states can be identified for the clocks.

2013-08-04 11:28

WVL

Registered: Mar 2002
Posts: 886

Interesting screenshots (is it really called that?). I wonder if there's a simple way to reliably detect what powerup-state you're in. I know you can't determine with software (or is it?) and you would need at least some extra hardware. I can imagine some simple piece of addon electronics that measure the powerup state and put the result on some register that the ROM can read and display on the screen on powerup.

Like

**** commodore 64 basic v2 ****

64K ram system 38911 basic bytes free

powerup state : 12

ready.

Then the user could decide to powerup again to get to a stable state for his/her machine.

Previous - 1 | 2 | 3 | 4 | 5 | 6 - Next

Refresh

Subscribe to this thread: