| |
Zer0-X Account closed
Registered: Aug 2008 Posts: 78 |
VSP crash (not solved yet)
I recently found my C64C to be very prone to crashing with certain demos, and finally managed to create a reproducible crash while banging the $d011 register. So I hooked up my logic analyzer, and here are some logs of the event taking place.
http://oms.wmhost.com/misc/VSP_Crash_100MHz.zip
A test program was looped at address $0ff0. It could have been placed at pretty much any $xxf0 address and it would still crash within a few seconds. Running the code at a lower offset within the memory page quite effectively prevented it from crashing on the machine used for testing. A shorter version with only inc/dec/inc/jmp, not crossing the page boundary, still crashes.
The symptoms were always the same: the low address byte of the 2nd inc at $xxf7 and/or the opcode of the jmp at $xxff is suddenly trashed. The byte at $xxf7 ends up being 0x00, 0x01, 0x10 or 0x1d. The byte at $xxff ends up being 0x0c, 0x40, 0x48 or 0x4d.
As follow-up work, the decoupling caps of the memory chips in this C64C were replaced, and a new thick wire delivering power directly to the memory chips was soldered in place. This had no effect: the machine still crashes with this code, as well as with the Booze Design demos Royal Arte and Tsunami, for example. The power supply used is a C128 PSU with a C64 power cable soldered in next to the C128 power cable.
The log files VSP Crash 100MHz 3_31.csv/txt contain the actual crash event.
|
|
... 98 posts hidden ... |
| |
Zer0-X Account closed
Registered: Aug 2008 Posts: 78 |
Current situation:
- Royal Arte works just fine.
- Dutch Breeze works just fine.
- Tequila Sunrise crashes after a few minutes.
On the C64C:
- Separate cable delivering 5V for the DRAMs.
- Separate cable connecting grounds between VIC and DRAMs.
- Tripled the capacitance of the DRAM decoupling caps.
- Added terminating resistor on RAS.
- Added pull-up resistor on RAS.
Previously the ripple on the DRAM power lines was around 600mV; now it is around 250mV. NEC datasheets say that the ripple needs to be kept below 500mV to ensure proper operation of their DRAMs.
Before the pull-up was added, RAS was only climbing to around 3.5V; now it gets all the way to 5V.
But as Tequila Sunrise still crashes, something doesn't add up...
|
| |
lft
Registered: Jul 2007 Posts: 369 |
After staring at those logs for an unhealthy amount of time, I think I see it.
Here's the relevant part of "VSP Crash 100MHz 3_31.txt". We enter at the end of
the CPU halfcycle where $41 was written to d011 to enable the badline condition
in the middle of a line.
Time Addr Data R/W RAS CAS BA PH0
-60.01us | 0XD0 | 0X41 0 0 1 1 1
-60us | 0XD0 | 0X41 0 0 1 1 1
-59.99us | 0XD0 | 0X41 0 0 1 1 1
-59.98us | 0XD0 | 0X41 0 0 1 1 0 VIC halfcycle begins.
-59.97us | 0XD0 | 0X41 0 0 1 1 0
-59.96us | 0XD0 | 0X41 1 1 1 1 0
-59.95us | 0XD0 | 0X41 1 1 1 1 0
-59.94us | 0X10 | 0X41 1 1 1 1 0
-59.93us | 0X10 | 0X41 1 1 1 1 0
-59.92us | 0X10 | 0X41 1 1 1 1 0
-59.91us | 0XD0 | 0X01 1 1 1 1 0
-59.9us | 0XD0 | 0X41 1 1 1 1 0
-59.89us | 0XFF | 0X41 1 1 1 1 0 VIC prepares to open row ff to
-59.88us | 0XFF | 0X41 1 1 1 1 0 perform an idle fetch at 39ff.
-59.87us | 0XFF | 0X41 1 1 1 1 0
-59.86us | 0XFF | 0X41 1 1 1 1 0
-59.85us | 0XFF | 0X41 1 1 1 1 0
-59.84us | 0XFF | 0X41 1 1 1 1 0
-59.83us | 0XFF | 0X41 1 1 1 1 0
-59.82us | 0XFF | 0X41 1 1 1 1 0
-59.81us | 0XFF | 0X41 1 1 1 0 0 Badline detected! BA pulled low.
-59.8us | 0XD7 | 0X41 1 0 1 0 0 RAS pulled while address lines
-59.79us | 0XC7 | 0X41 1 0 1 0 0 transition from ff to 00.
-59.78us | 0X07 | 0X41 1 0 1 0 0
-59.77us | 0X00 | 0X41 1 0 1 0 0
-59.76us | 0X00 | 0X41 1 0 1 0 0
-59.75us | 0X38 | 0X41 1 0 1 0 0
-59.74us | 0X78 | 0X41 1 0 1 0 0
-59.73us | 0X78 | 0X41 1 0 1 0 0
-59.72us | 0X78 | 0X41 1 0 1 0 0
-59.71us | 0X78 | 0X41 1 0 1 0 0
-59.7us | 0X38 | 0X41 1 0 1 0 0
-59.69us | 0X38 | 0X41 1 0 0 0 0 It seems that VIC suddenly decided
-59.68us | 0X38 | 0X41 1 0 0 0 0 to read from 3800 instead.
-59.67us | 0X38 | 0X41 1 0 0 0 0
-59.66us | 0X38 | 0X00 1 0 0 0 0
-59.65us | 0X38 | 0X00 1 0 0 0 0 Data at 3800 is 00.
What happens is that VIC is about to do an idle fetch, but suddenly decides to
read from another address instead. This causes RAS to transition to a low level
while the address lines are unstable, so the row multiplexer inside the DRAM
enters a metastable state. This means that it will flicker rapidly between
several rows, connecting and disconnecting them from the column lines. Every
involved row will bleed out some (or all) of its charge, and the column lines
end up containing a mixture of the bits involved. Eventually, the row
multiplexer probably decides on one of the rows, and this one will get
refreshed (with the garbage on the column lines) when RAS is released. The
other rows may or may not have been corrupted in the process.
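The condition described above (RAS falling while the address lines are still moving) can be searched for mechanically in the log files. The following is only a rough sketch in Python, assuming the row layout shown in the excerpt (time, address, data, then the R/W, RAS, CAS, BA and PH0 bits); the names `parse_rows` and `unstable_ras_edges` and the `window` parameter are made up for this illustration and are not part of any existing tool.

```python
import re

# One log row: "-59.8us | 0XD7 | 0X41 1 0 1 0 0 <optional annotation>"
ROW = re.compile(
    r"^(-?[\d.]+)us \| (0X[0-9A-F]+) \| (0X[0-9A-F]+) ([01]) ([01]) ([01]) ([01]) ([01])"
)

def parse_rows(text):
    """Parse log rows into dicts; lines that do not match are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            t, addr, data, rw, ras, cas, ba, ph0 = m.groups()
            rows.append({"t": float(t), "addr": int(addr, 16),
                         "ras": int(ras), "ba": int(ba)})
    return rows

def unstable_ras_edges(rows, window=3):
    """Return (time, addresses) for each RAS 1->0 edge where the address
    bus shows more than one value from the sample before the edge up to
    window - 1 samples after it (window is a hypothetical tuning knob)."""
    hits = []
    for i in range(1, len(rows)):
        if rows[i - 1]["ras"] == 1 and rows[i]["ras"] == 0:
            seen = {r["addr"] for r in rows[i - 1:i + window]}
            if len(seen) > 1:
                hits.append((rows[i]["t"], sorted(seen)))
    return hits

# A few rows copied from the excerpt above.
SAMPLE = """\
-59.82us | 0XFF | 0X41 1 1 1 1 0
-59.81us | 0XFF | 0X41 1 1 1 0 0 Badline detected! BA pulled low.
-59.8us | 0XD7 | 0X41 1 0 1 0 0 RAS pulled while address lines
-59.79us | 0XC7 | 0X41 1 0 1 0 0 transition from ff to 00.
-59.78us | 0X07 | 0X41 1 0 1 0 0
-59.77us | 0X00 | 0X41 1 0 1 0 0
"""

hits = unstable_ras_edges(parse_rows(SAMPLE))
for t, addrs in hits:
    print(f"RAS fell at {t}us with address bus values {[hex(a) for a in addrs]}")
```

Running something like this over a whole capture would show whether every trashed byte is preceded by such an edge, which is the kind of extra evidence the hypothesis needs.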
I can think of two ways forward now. One is to try to come up with a hardware
fix. For instance, it seems that RAS is always pulled low at a certain time
with respect to the current halfcycle. Could we perhaps move it a bit earlier
during each VIC halfcycle? (Note: This might cause problems if the address
lines are changed later under some circumstances, so it calls for extensive
testing.)
The other approach is more software based. Where does the address 3800 come
from? Can we manipulate the VIC state somehow to cause it to read other
addresses here? The other two FLI bug fetches read from 11ff (as would be
expected from the graphics mode used when the trace was taken). The goal would
be to figure out a way to always make this address have a low byte that matches
the low byte of what VIC would have fetched in that halfcycle if the badline weren't
triggered (which would be ff during refresh and idle mode). That way, no
address lines would change during the RAS strobe, and the metastability would
be avoided. It remains to be seen if it's possible to achieve a stable VSP this
way, and existing demos would still crash, of course.
Finally, a remark on the phenomenon where a crash prone machine can sometimes
power on in a stable mode. This is a complete shot in the dark, but here's my
guess: The system clock oscillator starts at a random phase with respect to the
power supply ripple. Since there is an integer number of clock cycles per 50 Hz
frame, this phase remains constant (or drifts slowly, depending on the accuracy
of the crystal) once the machine is running. Since the supply voltage affects
the fall and (particularly) rise times of the signals in the c64, the timing
around the critical RAS strobe moment would be slightly different depending on
this phase relationship, and thus affect whether metastability occurs. This
also fits with the observation that better power supplies (with less ripple)
remove the bug in some cases.
|
| |
Frantic
Registered: Mar 2003 Posts: 1647 |
I've said it before, but now I have to say it again: I love this thread! lft may definitely be onto something! |
| |
Kabuto Account closed
Registered: Sep 2004 Posts: 58 |
The frame rate of a PAL C64 is 50.12 Hz, so it should quickly drift out of sync with the (nominally) 50 Hz supply voltage.
Another little-known fact about the VIC that *might* contribute in some cases:
The VIC uses a shift register for addressing its 40 12-bit buffer cells that are fetched during bad lines. At the start of a non-idle line (wherever it starts - i.e. at row start or later in case of VSP) a set bit gets shifted in, and it is shifted by 1 step in every cycle (also during idle lines and in the side borders).
However, when the VSP occurs near the right edge, there's still a set bit in the shift register left over from the previous line when the next line starts, leading to 2 set bits at once.
What happens now depends on circumstances:
* When the next line is an idle line (possible on C128 via $D030 tricks), it won't shift in another set bit at line start (since it's idle), but since there's still a bit set in idle mode it'll apply buffer contents to the idle pattern.
* When this next line is a bad line, it'll have 2 set bits at once and thus write to 2 buffers near the left edge but correct that by overwriting just the right-edge part near the right edge.
* When the next line is a non-bad line (the default), strange things happen. When the VIC reads from both buffer cells at once it mangles both contents more or less. It's AND most of the time but it varies a lot, depending on buffer cell, time (= flickering in some cells), temperature (when the VIC heats up the flickering changes noticeably), model (C128 just ANDs much more often than C64) and individual VIC. And interestingly the result of that is written back into both cells so the bug is also apparent in further lines of the same char row.
This circuitry is atypical - instead of the usual NMOS logic stuff (1 by default, 0 wins in case of multiple conflicting writes) this is 0 by default (idle) and in case of conflicts sometimes 0 wins but most of the time 1.
Maybe this causes more power than usual to be drained.
[Edit: clarified a bit]
Something different: someone claimed that VSP crashes are due to refresh being messed up. But that doesn't sound that logical to me. The VIC refreshes the same memory cell every 3.3 milliseconds, so even when disturbed a few times per frame the refresh cycles should still be well within limits of the RAMs being used. What lft wrote above makes much more sense IMO. |
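The numbers quoted in this post (50.12 Hz and 3.3 milliseconds) can be reproduced with a little arithmetic. This is a back-of-the-envelope sketch in Python; the constants are assumed nominal values for a PAL machine (system clock 985248 Hz, 63 cycles per line, 312 lines per frame, 5 refresh accesses per line, 256 refresh rows) and are not taken from the thread.

```python
# Assumed nominal PAL values, not measured on the machine in question.
PHI2_HZ = 985_248        # system clock in Hz
CYCLES_PER_LINE = 63     # clock cycles per raster line
LINES_PER_FRAME = 312    # raster lines per frame
REFRESH_PER_LINE = 5     # DRAM refresh accesses VIC performs per line
DRAM_ROWS = 256          # rows covered by the 8-bit refresh counter
MAINS_HZ = 50.0          # nominal mains frequency

# Frame rate: one frame takes 63 * 312 clock cycles.
frame_hz = PHI2_HZ / (CYCLES_PER_LINE * LINES_PER_FRAME)

# Time until the same row is refreshed again: 256 rows at 5 rows per line.
line_s = CYCLES_PER_LINE / PHI2_HZ
refresh_period_ms = (DRAM_ROWS / REFRESH_PER_LINE) * line_s * 1e3

# How long the frame takes to slip by one full mains cycle.
beat_period_s = 1.0 / (frame_hz - MAINS_HZ)

print(f"frame rate     : {frame_hz:.2f} Hz")
print(f"refresh period : {refresh_period_ms:.2f} ms")
print(f"slip vs. mains : one full cycle every {beat_period_s:.1f} s")
```

Under these assumptions the frame phase slips against the mains by a full cycle roughly every 8 seconds, and a row refresh interval of about 3.3 ms leaves plenty of margin even if a few refresh accesses per frame are disturbed.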
| |
Zer0-X Account closed
Registered: Aug 2008 Posts: 78 |
I hadn't even realised earlier that the address lines weren't stable at that moment!
All I paid attention to was how there was ALWAYS a badline just before the memory got trashed.
|
| |
Kabuto Account closed
Registered: Sep 2004 Posts: 58 |
There's a mistake in my previous post: I meant "and in case of conflicts sometimes 1 wins but most of the time 0" and not the other way round as I accidentally wrote. |
| |
lft
Registered: Jul 2007 Posts: 369 |
We don't know yet whether the VIC always switches from the expected bitmap fetch to some dummy address that ends with $00. Hopefully Zer0-X will provide more data to back it up. But *IF* that is true, then here's a suggestion for a stable VSP routine:
At cycle 54..57 of line $30, create a badline condition (as if you wanted to redraw the last character row of the previous frame). Also make sure to select a character based graphics mode. Furthermore, on the character line in the VIC line buffer (e.g. the last character line from the previous frame), make sure that every character code ends with 5 zero bits. This way, at line $31 RC will be zero, and if we don't do anything, VIC will fetch bitmap data from font locations that end with $00. But if we do VSP somewhere on line $31 (possibly selecting a bitmap mode at the same time), the address lines will already be $00 when VIC changes its mind and tries to read from the dummy address. Hence, there won't be a metastability hazard anymore.
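To see why character codes ending in 5 zero bits give fetch addresses ending in $00, here is a small Python sketch of the text-mode graphics fetch address. It assumes the usual layout (character base, then the character code shifted left by 3, then the 3-bit row counter RC); the function name `char_fetch_addr` and the example base $1000 are only for illustration.

```python
def char_fetch_addr(char_base, code, rc):
    """14-bit VIC address of a text-mode graphics fetch (assumed layout):
    bits 13..11 from the character base, bits 10..3 from the character
    code, bits 2..0 from the row counter RC."""
    return (char_base & 0x3800) | ((code & 0xFF) << 3) | (rc & 7)

# Character codes whose low 5 bits are zero.
safe_codes = [c for c in range(256) if c & 0x1F == 0]

for code in safe_codes:
    addr = char_fetch_addr(0x1000, code, rc=0)
    print(f"code ${code:02x} -> fetch address ${addr:04x}")
```

Only eight codes qualify ($00, $20, $40, ... $e0), and with RC = 0 each of them produces an address whose low byte is $00. Any other code, or any other RC value, leaves at least one of the low 8 address bits set.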
Anyway, I'm just jotting this down quickly before lunch in a feeble attempt to get it off my mind, and I haven't tried it (I don't even have a VSP crashing machine). If you want to be the first to code a stable VSP, here's your opportunity. =) If it doesn't work, this still tells us that the dummy address may not predictably end in $00, which is useful information for further research. |
| |
lft
Registered: Jul 2007 Posts: 369 |
Addendum: Of course, you have to do this on every frame. So you have to somehow fill the VIC line buffer with safe character codes at the end of each frame. I suggest you use a spare video matrix filled with zero bytes or something, and switch to it before the last badline. |
| |
Kabuto Account closed
Registered: Sep 2004 Posts: 58 |
@lft: that won't work, unfortunately.
VSP only works by leaving _idle mode_ in the middle of a line by causing a bad line condition. The VIC doesn't increment its character pointer in idle mode, so by the end of the line it has been incremented by less than 40 chars.
It won't work if the line starts non-bad non-idle, because in that mode the character pointer is also incremented, so by the end of the line the VIC has incremented it by the usual 40 chars.
What your suggestion does: it causes a character row starting at line $31 but without a bad line. So it'll re-use the characters and colors still in its buffers from the last row of the previous frame. Activating a bad line condition in the middle of row $31 now just makes the VIC fetch the remainder of the first row (after 3 "FLI bug" chars of course).
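The argument can be put into a toy model. This Python sketch is a deliberate simplification (it ignores sprites, the exact cycle numbers and the three FLI bug characters) and `vc_advance` is a made-up name: it only counts how far the character pointer moves on a line depending on the state in which the line starts.

```python
COLUMNS = 40  # character columns per line

def vc_advance(starts_in_display, badline_column=None):
    """Number of characters the character pointer advances on one line.

    starts_in_display: True if the line begins in display (non-idle) state.
    badline_column:    number of columns that pass in idle state before a
                       bad line condition is created, or None for no bad line.
    """
    if starts_in_display:
        return COLUMNS                 # pointer counts all 40 columns
    if badline_column is None:
        return 0                       # the whole line stays idle
    return COLUMNS - badline_column    # only the columns after leaving idle

# VSP: the line starts idle, bad line forced after 10 columns.
print(vc_advance(False, 10))   # 30, so the screen is shifted by 10 chars
# The suggestion above: line $31 already starts in display state.
print(vc_advance(True, 10))    # 40, so nothing is shifted
```

In this model the shift is simply 40 minus the advance, which is why a line that already starts in display state cannot produce a VSP shift.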
Edit:
It looks like it's impossible to secure VSP. The low address byte of idle fetch is always $FF but the low address byte of a normal fetch won't ever reach that value - unless of course you've used VSP before on the same frame :D |
| |
DeeKay
Registered: Nov 2002 Posts: 363 |
This being-inbetween-states thingy kinda reminds me of what Jens and Ninja found out when they investigated the bug that the MMC Replay would not load properly and crash when some sprite (I think it was Sprite 6) was turned on. Maybe they can share what they found out when measuring the address lines on the expansion port? ;-) |