| |
Zer0-X Account closed
Registered: Aug 2008 Posts: 78 |
VSP crash (not solved yet)
I recently found my C64C to be very prone to crashing with certain demos, and I finally managed to create a reproducible crash by banging the $d011 register. I hooked up my logic analyzer, and here are some logs of the event taking place.
http://oms.wmhost.com/misc/VSP_Crash_100MHz.zip
A test program was looped at address $0ff0. It could have been placed at pretty much any $xxf0 address and it would still crash within a few seconds. Running the code at a lower offset within the memory page quite effectively prevented it from crashing on the machine used for testing. A shorter version with only inc/dec/inc/jmp, not crossing the page boundary, crashes as well.
The symptoms were always the same: the low address byte of the 2nd inc at $xxf7 and/or the opcode of the jmp at $xxff is suddenly trashed. The byte at $xxf7 ends up being 0x00, 0x01, 0x10 or 0x1d. The byte at $xxff ends up being 0x0c, 0x40, 0x48 or 0x4d.
As follow-up work, the decoupling caps of the memory chips in the C64C were replaced, and a new thick wire delivering power directly to the memory chips was soldered in place. This had no effect: the machine still crashes with this code, as well as with the Booze Design demos Royal Arte and Tsunami, for example. The power supply used is a C128 PSU with a C64 power cable soldered next to the C128 power cable.
The log files VSP Crash 100MHz 3_31.csv/txt capture the actual crash event.
|
|
... 98 posts hidden ... |
| |
JackAsser
Registered: Jun 2002 Posts: 2014 |
Quote: Royal Arte has been running for the past 16 hours now and still no visible glitches. Previously that demo (or rather the Royal Arte logopart) would've worked for 10 minutes at best.
And the fix is of such a generic type that it can be applied to the C64, C64C and C128, all with minimum effort. Should any side effect whatsoever be observed, removing it will be just as easy. Of course, then we would have to find yet another way to fix the bug.
WVL: Wish I had more power supplies so I could test more than one machine at a time. Though if you can work on some of your old sources, I can give it a test drive.
Now to test this on my other C64C, which is less crashy, and on one of the C128s that trashes a whole lot of memory when the crash happens.
What is the current fix you're using? (You said you removed the caps). |
| |
Zer0-X Account closed
Registered: Aug 2008 Posts: 78 |
Better not spread too much false information.
Yesterday I "fixed" another C64C and had two machines that wouldn't crash. Then I cleaned up some solder work, and I now have two machines that crash just as they did before. I'll have to spend some time with them during the weekend and figure out what went wrong.
At first it was so promising... and then it VSP'd. |
| |
Zer0-X Account closed
Registered: Aug 2008 Posts: 78 |
Current situation:
- Royal Arte works just fine.
- Dutch Breeze works just fine.
- Tequila Sunrise crashes after a few minutes.
On the C64C:
- Separate cable delivering 5V for the DRAMs.
- Separate cable connecting grounds between VIC and DRAMs.
- Tripled the capacitance of the DRAM decoupling caps.
- Added terminating resistor on RAS.
- Added pull-up resistor on RAS.
Previously the ripple on the DRAM power lines was around 600mV; now it is around 250mV. The NEC datasheets say that the ripple needs to be kept below 500mV to ensure proper operation of their DRAMs.
Before the pull-up was added, RAS was only climbing to around 3.5V; now it gets all the way to 5V.
But as Tequila Sunrise still crashes, something doesn't add up...
|
| |
lft
Registered: Jul 2007 Posts: 369 |
After staring at those logs for an unhealthy amount of time, I think I see it.
Here's the relevant part of "VSP Crash 100MHz 3_31.txt". We enter at the end of
the CPU halfcycle where $41 was written to d011 to enable the badline condition
in the middle of a line.
Time Addr Data R/W RAS CAS BA PH0
-60.01us | 0XD0 | 0X41 0 0 1 1 1
-60us | 0XD0 | 0X41 0 0 1 1 1
-59.99us | 0XD0 | 0X41 0 0 1 1 1
-59.98us | 0XD0 | 0X41 0 0 1 1 0 VIC halfcycle begins.
-59.97us | 0XD0 | 0X41 0 0 1 1 0
-59.96us | 0XD0 | 0X41 1 1 1 1 0
-59.95us | 0XD0 | 0X41 1 1 1 1 0
-59.94us | 0X10 | 0X41 1 1 1 1 0
-59.93us | 0X10 | 0X41 1 1 1 1 0
-59.92us | 0X10 | 0X41 1 1 1 1 0
-59.91us | 0XD0 | 0X01 1 1 1 1 0
-59.9us | 0XD0 | 0X41 1 1 1 1 0
-59.89us | 0XFF | 0X41 1 1 1 1 0 VIC prepares to open row ff to
-59.88us | 0XFF | 0X41 1 1 1 1 0 perform an idle fetch at 39ff.
-59.87us | 0XFF | 0X41 1 1 1 1 0
-59.86us | 0XFF | 0X41 1 1 1 1 0
-59.85us | 0XFF | 0X41 1 1 1 1 0
-59.84us | 0XFF | 0X41 1 1 1 1 0
-59.83us | 0XFF | 0X41 1 1 1 1 0
-59.82us | 0XFF | 0X41 1 1 1 1 0
-59.81us | 0XFF | 0X41 1 1 1 0 0 Badline detected! BA pulled low.
-59.8us | 0XD7 | 0X41 1 0 1 0 0 RAS pulled while address lines
-59.79us | 0XC7 | 0X41 1 0 1 0 0 transition from ff to 00.
-59.78us | 0X07 | 0X41 1 0 1 0 0
-59.77us | 0X00 | 0X41 1 0 1 0 0
-59.76us | 0X00 | 0X41 1 0 1 0 0
-59.75us | 0X38 | 0X41 1 0 1 0 0
-59.74us | 0X78 | 0X41 1 0 1 0 0
-59.73us | 0X78 | 0X41 1 0 1 0 0
-59.72us | 0X78 | 0X41 1 0 1 0 0
-59.71us | 0X78 | 0X41 1 0 1 0 0
-59.7us | 0X38 | 0X41 1 0 1 0 0
-59.69us | 0X38 | 0X41 1 0 0 0 0 It seems that VIC suddenly decided
-59.68us | 0X38 | 0X41 1 0 0 0 0 to read from 3800 instead.
-59.67us | 0X38 | 0X41 1 0 0 0 0
-59.66us | 0X38 | 0X00 1 0 0 0 0
-59.65us | 0X38 | 0X00 1 0 0 0 0 Data at 3800 is 00.
What happens is that VIC is about to do an idle fetch, but suddenly decides to
read from another address instead. This causes RAS to transition to a low level
while the address lines are unstable, so the row multiplexer inside the DRAM
enters a metastable state. This means that it will flicker rapidly between
several rows, connecting and disconnecting them from the column lines. Every
involved row will bleed out some (or all) of its charge, and the column lines
end up containing a mixture of the bits involved. Eventually, the row
multiplexer probably decides on one of the rows, and this one will get
refreshed (with the garbage on the column lines) when RAS is released. The
other rows may or may not have been corrupted in the process.
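The hazard described above can be looked for mechanically in the capture. What follows is only a rough Python sketch, not something taken from the thread: the function names, the dictionary layout and the two-sample settle window (about 20 ns at the sample spacing of these logs) are all assumptions, and the embedded rows are copied from the excerpt above. It flags every falling edge of RAS where the neighbouring samples show more than one value on the address lines.

```python
# Sketch: flag RAS falling edges that coincide with changing address lines.
# Assumed row format: "time | addr | data R/W RAS CAS BA PH0 [comment]".

SAMPLE_ROWS = """\
-59.82us | 0XFF | 0X41 1 1 1 1 0
-59.81us | 0XFF | 0X41 1 1 1 0 0 Badline detected! BA pulled low.
-59.8us | 0XD7 | 0X41 1 0 1 0 0 RAS pulled while address lines
-59.79us | 0XC7 | 0X41 1 0 1 0 0 transition from ff to 00.
-59.78us | 0X07 | 0X41 1 0 1 0 0
-59.77us | 0X00 | 0X41 1 0 1 0 0
-59.76us | 0X00 | 0X41 1 0 1 0 0
"""


def parse_row(line):
    """Turn one log row into a dict; any trailing comment is ignored."""
    time_s, addr_s, rest = (part.strip() for part in line.split("|", 2))
    fields = rest.split()
    rw, ras, cas, ba, ph0 = (int(f) for f in fields[1:6])
    return {
        "t": float(time_s.replace("us", "")),
        "addr": int(addr_s, 16),
        "data": int(fields[0], 16),
        "rw": rw, "ras": ras, "cas": cas, "ba": ba, "ph0": ph0,
    }


def find_unstable_ras_edges(samples, settle=2):
    """Return RAS falling edges whose surrounding samples show >1 address."""
    hazards = []
    for i in range(1, len(samples)):
        if samples[i - 1]["ras"] == 1 and samples[i]["ras"] == 0:
            # One sample before the edge, the edge itself, `settle` after it.
            window = samples[i - 1 : i + settle + 1]
            seen = []
            for s in window:
                if s["addr"] not in seen:
                    seen.append(s["addr"])
            if len(seen) > 1:
                hazards.append({"t": samples[i]["t"], "addrs": seen})
    return hazards


samples = [parse_row(row) for row in SAMPLE_ROWS.splitlines() if row.strip()]
for hazard in find_unstable_ras_edges(samples):
    print(hazard["t"], [hex(a) for a in hazard["addrs"]])
```

Run over a whole log file, a scan like this would show whether every memory corruption is preceded by such an edge, or whether unstable edges also occur without any visible damage.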
I can think of two ways forward now. One is to try to come up with a hardware
fix. For instance, it seems that RAS is always pulled low at a certain time
with respect to the current halfcycle. Could we perhaps move it a bit earlier
during each VIC halfcycle? (Note: This might cause problems if the address
lines are changed later under some circumstances, so it calls for extensive
testing.)
The other approach is more software based. Where does the address 3800 come
from? Can we manipulate the VIC state somehow to cause it to read other
addresses here? The other two flibug fetches read from 11ff (as would be
expected from the graphics mode used when the trace was taken). The goal would
be to figure out a way to always make this address have an LSB that matches the
LSB of what VIC would have fetched in that halfcycle if the badline weren't
triggered (which would be ff during refresh and idle mode). That way, no
address lines would change during the RAS strobe, and the metastability would
be avoided. It remains to be seen whether it's possible to achieve a stable VSP
this way, and existing demos would still crash, of course.
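To make the "matching LSB" condition concrete, here is a small Python sketch. It is not from the logs, and the helper name is made up. It assumes, as the trace suggests, that the low byte of the address is what sits on the multiplexed lines when RAS falls, and it counts how many of those lines would have to flip between the planned fetch and the substituted one.

```python
def row_lines_that_flip(planned_addr, actual_addr):
    """Count the low-byte (row) address lines that differ between two fetches."""
    diff = (planned_addr ^ actual_addr) & 0xFF
    return bin(diff).count("1")


# Crash case from the trace: idle fetch at $39ff replaced by a fetch at $3800.
print(row_lines_that_flip(0x39FF, 0x3800))

# Case where the low bytes already agree, e.g. $39ff versus $11ff.
print(row_lines_that_flip(0x39FF, 0x11FF))
```

In the crash case all eight lines have to change while RAS is falling; in the second case none do, which is the situation the software approach would try to force.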
Finally, a remark on the phenomenon where a crash prone machine can sometimes
power on in a stable mode. This is a complete shot in the dark, but here's my
guess: The system clock oscillator starts at a random phase with respect to the
power supply ripple. Since there is an integer number of clock cycles per 50 Hz
frame, this phase remains constant (or drifts slowly, depending on the accuracy
of the crystal) once the machine is running. Since the supply voltage affects
the fall and (particularly) rise times of the signals in the c64, the timing
around the critical RAS strobe moment would be slightly different depending on
this phase relationship, and thus affect whether metastability occurs. This
also fits with the observation that better power supplies (with less ripple)
remove the bug in some cases.
|
| |
Frantic
Registered: Mar 2003 Posts: 1646 |
I've said it before, but now I have to say it again: I love this thread! lft may definitely be onto something! |
| |
Kabuto Account closed
Registered: Sep 2004 Posts: 58 |
The frame rate of a PAL C64 is 50.12 Hz, so it should quickly desync from the (expected) 50 Hz supply voltage.
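The 50.12 Hz figure can be reproduced with some quick arithmetic. The numbers below are not from this thread but are the commonly quoted PAL values (985248 Hz system clock, 63 cycles per line, 312 lines per frame), so treat them as assumptions; the 50 Hz mains frequency is nominal.

```python
PAL_CLOCK_HZ = 985_248      # assumed PAL system clock
CYCLES_PER_LINE = 63        # assumed cycles per raster line
LINES_PER_FRAME = 312       # assumed raster lines per frame
MAINS_HZ = 50.0             # nominal mains frequency

frame_hz = PAL_CLOCK_HZ / (CYCLES_PER_LINE * LINES_PER_FRAME)

# Time for the frame phase to slip by one full mains cycle.
slip_period_s = 1.0 / abs(frame_hz - MAINS_HZ)

print(f"frame rate: {frame_hz:.2f} Hz")
print(f"phase slips one full mains cycle every {slip_period_s:.1f} s")
```

With these values the phase relationship walks through a full mains cycle in roughly eight seconds, so it would not stay constant after power-on.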
Another little-known fact about the VIC that *might* contribute in some cases:
The VIC uses a shift register for addressing its 40 12-bit buffer cells that are fetched during bad lines. At the start of a non-idle line (wherever it starts - i.e. at row start or later in case of VSP) a set bit gets shifted in and is shifted by 1 step in every cycle (also during idle lines and in sideborders).
However, when the VSP occurs near the right edge there's still a set bit in the shift register left from previous line when the next line starts, leading to 2 set bits at once.
What happens now depends on circumstances:
* When the next line is an idle line (possible on C128 via $D030 tricks), it won't shift in another set bit at line start (since it's idle), but since there's still a bit set in idle mode it'll apply buffer contents to the idle pattern.
* When this next line is a bad line, it'll have 2 set bits at once and thus write to 2 buffers near the left edge but correct that by overwriting just the right-edge part near the right edge.
* When the next line is a non-bad line (the default), strange things happen. When the VIC reads from both buffer cells at once it mangles both contents more or less. It's AND most of the time but it varies a lot, depending on buffer cell, time (= flickering in some cells), temperature (when the VIC heats up the flickering changes noticeably), model (C128 just ANDs much more often than C64) and individual VIC. And interestingly the result of that is written back into both cells so the bug is also apparent in further lines of the same char row.
This circuitry is atypical - instead of the usual NMOS logic stuff (1 by default, 0 wins in case of multiple conflicting writes) this is 0 by default (idle) and in case of conflicts sometimes 0 wins but most of the time 1.
Maybe this causes more power than usual to be drained.
[Edit: clarified a bit]
Something different: someone claimed that VSP crashes are due to refresh being messed up. But that doesn't sound that logical to me. The VIC refreshes the same memory cell every 3.3 milliseconds, so even when disturbed a few times per frame the refresh cycles should still be well within limits of the RAMs being used. What lft wrote above makes much more sense IMO. |
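As a sanity check on the 3.3 ms figure, here is a short Python calculation. It relies on figures that are not stated in this thread and should be read as assumptions: five refresh accesses per raster line, an 8-bit refresh counter (256 row addresses), and PAL timing of 63 cycles per line at 985248 Hz.

```python
REFRESHES_PER_LINE = 5      # assumed refresh accesses per raster line
REFRESH_ROWS = 256          # assumed 8-bit refresh counter
CYCLES_PER_LINE = 63        # assumed PAL cycles per raster line
PAL_CLOCK_HZ = 985_248      # assumed PAL system clock

line_time_s = CYCLES_PER_LINE / PAL_CLOCK_HZ
lines_per_pass = REFRESH_ROWS / REFRESHES_PER_LINE

# Time between two refreshes of the same row address.
refresh_interval_ms = lines_per_pass * line_time_s * 1e3

print(f"{refresh_interval_ms:.1f} ms")
```

Under these assumptions a row that misses one refresh pass waits about twice that interval, which is the scale to compare against the refresh limit in the DRAM datasheet.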
| |
Zer0-X Account closed
Registered: Aug 2008 Posts: 78 |
I hadn't even realised earlier that the address lines weren't stable at that moment!
All I paid attention to was how there was ALWAYS a badline just before the memory got trashed.
|
| |
Kabuto Account closed
Registered: Sep 2004 Posts: 58 |
There's a mistake in my previous post: I meant "and in case of conflicts sometimes 1 wins but most of the time 0" and not the other way round as I accidentally wrote. |
| |
lft
Registered: Jul 2007 Posts: 369 |
We don't know yet whether the VIC always switches from the expected bitmap fetch to some dummy address that ends with $00. Hopefully Zer0-X will provide more data to back it up. But *IF* that is true, then here's a suggestion for a stable VSP routine:
At cycle 54..57 of line $30, create a badline condition (as if you wanted to redraw the last character row of the previous frame). Also make sure to select a character based graphics mode. Furthermore, on the character line in the VIC line buffer (e.g. the last character line from the previous frame), make sure that every character code ends with 5 zero bits. This way, at line $31 RC will be zero, and if we don't do anything, VIC will fetch bitmap data from font locations that end with $00. But if we do VSP somewhere on line $31 (possibly selecting a bitmap mode at the same time), the address lines will already be $00 when VIC changes its mind and tries to read from the dummy address. Hence, there won't be a metastability hazard anymore.
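The "5 zero bits" requirement follows from how a character-mode graphics fetch address is put together. The Python sketch below assumes the usual layout (character code shifted left by three bits, RC in the low three bits, character base above that, so it never reaches the low byte); the function names are illustrative only. It lists the character codes whose fetch address ends in $00 when RC is zero.

```python
def font_fetch_low_byte(char_code, rc):
    """Low byte of the font fetch address for a character code and row counter."""
    return (((char_code & 0xFF) << 3) | (rc & 0x07)) & 0xFF


def safe_char_codes(rc=0):
    """Character codes whose font fetch address ends in $00 for the given RC."""
    return [c for c in range(256) if font_fetch_low_byte(c, rc) == 0x00]


print([f"${c:02x}" for c in safe_char_codes()])
```

Only eight codes qualify, the multiples of $20, so a spare video matrix filled with any one of them (for example all zero bytes) would satisfy the condition. For any nonzero RC the list is empty, which is why RC has to be zero on the line where the VSP is done.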
Anyway, I'm just jotting this down quickly before lunch in a feeble attempt to get it off my mind, and I haven't tried it (I don't even have a VSP crashing machine). If you want to be the first to code a stable VSP, here's your opportunity. =) If it doesn't work, this still tells us that the dummy address may not predictably end in $00, which is useful information for further research. |
| |
lft
Registered: Jul 2007 Posts: 369 |
Addendum: Of course, you have to do this on every frame. So you have to somehow fill the VIC line buffer with safe character codes at the end of each frame. I suggest you use a spare video matrix filled with zero bytes or something, and switch to it before the last badline. |