| |
chatGPZ
Registered: Dec 2001 Posts: 11136 |
(Ab)use of dummy accesses
For the next release of my "No more Secrets" doc I am preparing a chapter about the dummy accesses that happen when the CPU performs an internal operation. Once again I am looking for some examples of how to (ab)use them :) I guess everyone knows "inc $d019" - but I am sure there is more than this. And not only with RMW instructions. So if you have anything in mind - just drop it here!
Here are some related notes which I pasted together. Feel free to proofread and point out mistakes :) |
|
| |
JackAsser
Registered: Jun 2002 Posts: 1990 |
Lovely stuff!!!
"Most 1-Byte instructions will fetch PC+1 after the opcode fetch"
All 1-byte, right? |
| |
Fred
Registered: Feb 2003 Posts: 284 |
A good example of the usage of the next opcode fetch (NewPC) is acknowledging the NMI with the RTI instruction itself:
JMP $DD0C
;DD0C 40 RTI   ($DD0C, the CIA2 serial data register, holds $40 = RTI)
This executes the RTI instruction at $DD0C, but since the CPU also reads the byte after the opcode, it performs a read at $DD0D, which acknowledges the NMI.
Some music routines (like the 8-bit digi routine from THCM) make use of this to save some cycles when ending the NMI routine. |
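To make the bus activity concrete, here is a small Python model (a sketch, not an emulator) that only lists the addresses an NMOS 6502 puts on the bus during the six cycles of RTI, following the usual cycle-by-cycle tables. The stack pointer value $F6 is an arbitrary example; for the trick only cycle 2, the discarded read of PC+1, matters.

```python
def rti_bus_trace(pc, sp):
    """Return (cycle, description, address) for the 6 cycles of RTI.

    pc: address of the RTI opcode; sp: stack pointer before RTI runs.
    Only addresses are modelled, not the data on the bus.
    """
    def stack(offset):
        return 0x0100 | ((sp + offset) & 0xFF)

    return [
        (1, "fetch opcode $40", pc),
        (2, "dummy read of PC+1", (pc + 1) & 0xFFFF),  # reading $DD0D acks CIA2's NMI
        (3, "dummy stack read", stack(0)),
        (4, "pull P", stack(1)),
        (5, "pull PCL", stack(2)),
        (6, "pull PCH", stack(3)),
    ]


# JMP $DD0C lands on the $40 stored in CIA2's serial data register.
for cycle, what, addr in rti_bus_trace(pc=0xDD0C, sp=0xF6):
    print(f"cycle {cycle}: ${addr:04X}  {what}")
```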
| |
Fred
Registered: Feb 2003 Posts: 284 |
The music routine from Fred Gray performs a read and write on IO with:
LDA #$40
STA $D404
INC $D404
This toggles the gate bit (bit 0) of the voice 1 control register of the SID chip. |
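The sequence behind this snippet can be sketched with a small Python model of the six bus cycles of INC abs on an NMOS 6502 (read, write back the unmodified value, write the result). It assumes, as discussed further down in the thread, that reading the write-only $D404 returns the $40 that STA $D404 has just written; the PC value $1000 is arbitrary.

```python
def inc_abs_trace(pc, addr, read_value):
    """Bus cycles of INC abs as (cycle, 'R'/'W', address, data or None)."""
    result = (read_value + 1) & 0xFF
    return [
        (1, "R", pc, None),                 # opcode $EE
        (2, "R", (pc + 1) & 0xFFFF, None),  # address low byte
        (3, "R", (pc + 2) & 0xFFFF, None),  # address high byte
        (4, "R", addr, read_value),         # read the operand
        (5, "W", addr, read_value),         # dummy write: unmodified value
        (6, "W", addr, result),             # final write: incremented value
    ]


trace = inc_abs_trace(pc=0x1000, addr=0xD404, read_value=0x40)
writes = [(cycle, data) for cycle, rw, _addr, data in trace if rw == "W"]
print(writes)  # -> [(5, 64), (6, 65)], i.e. $40 then $41, one cycle apart
```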
| |
chatGPZ
Registered: Dec 2001 Posts: 11136 |
JA: yeah, dunno why I wrote "some" :) (the byte after the opcode is simply always fetched)
Fred: ok, this saves one cycle :) |
| |
Fred
Registered: Feb 2003 Posts: 284 |
Another usage of a dummy read cycle is the following code:
LDX #$F0
LDA $DC1D,x
This will do a dummy read at $DC0D and a normal read at $DD0D ($DC1D + $F0). This way you can acknowledge the interrupts of both CIAs (e.g. both timers) in one instruction.
It works because $1D + $F0 crosses a page boundary. In cycle T3 the CPU first reads from the address whose high byte has not been corrected yet, and only in T4 does it read from the correct (next) page:
T0 Fetch opcode
T1 Fetch low order byte of Base Address
T2 Fetch high order byte of Base Address
T3 Fetch data from the uncorrected address (the real read if no page is crossed)
T4 Fetch data from the next page (only on a page crossing)
T0 Next Instruction |
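The address arithmetic can be checked with a short Python sketch of LDA abs,X on an NMOS 6502: the first data read uses the base high byte without the carry from the low-byte addition; only if that carry occurred is a second read from the corrected address needed.

```python
def lda_abs_x_reads(base, x):
    """Return the data-read addresses performed by LDA abs,X, in order."""
    low_sum = (base & 0xFF) + x
    uncorrected = (base & 0xFF00) | (low_sum & 0xFF)  # high byte not yet fixed
    corrected = (base + x) & 0xFFFF
    if low_sum > 0xFF:
        # Page crossed: T3 reads the wrong page, T4 reads the right one.
        return [uncorrected, corrected]
    # No page crossing: the T3 read is already the real one.
    return [corrected]


print([f"${a:04X}" for a in lda_abs_x_reads(0xDC1D, 0xF0)])  # ['$DC0D', '$DD0D']
print([f"${a:04X}" for a in lda_abs_x_reads(0xDC0D, 0x00)])  # ['$DC0D']
```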
| |
Copyfault
Registered: Dec 2001 Posts: 466 |
Feeding data to the sprite pattern pipe for a sprite that is displayed "far out right" and did not get its DMA cycles before comes to mind.
IIRC, to get all three pattern bytes filled correctly you needed a STA VIC_REG,X at the correct position in the rasterline, so that its 4th cycle (the dummy read) falls on the first sprite DMA cycle and its 5th (the write cycle) on the 2nd sprite DMA cycle. This way, the sprite pattern bytes were filled with: byte read in the 4th cycle from the (uncorrected!) VIC address | ghostbyte | byte stored in the 5th cycle. So the internal operation cycle was mandatory to achieve this. |
| |
Compyx
Registered: Jan 2005 Posts: 631 |
Copyfault: are you talking about the 'x-stretch' effect you get with dysp's when the low-index sprites go too far into the right border? |
| |
chatGPZ
Registered: Dec 2001 Posts: 11136 |
Quote: The music routine from Fred Gray performs a read and write on IO
that's not abusing the dummy accesses though.... it relies on the floating bus value (what a terrible idea =P) |
| |
tlr
Registered: Sep 2003 Posts: 1723 |
Quote: Copyfault: are you talking about the 'x-stretch' effect you get with dysp's when the low-index sprites go too far into the right border?
It refers to the long-mysterious $FF glitches usually appearing at the top of sprite #0 when it is moved far right (they appear for all sprites, but are mostly not visible). They can be controlled by placing the right values on the internal VIC bus in two adjacent cycles.
There's a lengthy discussion in a thread somewhere. |
| |
chatGPZ
Registered: Dec 2001 Posts: 11136 |
it would be great if you'd post an actual working example, that would save a lot of time =) |
| |
tlr
Registered: Sep 2003 Posts: 1723 |
Quote: it would be great if you'd post an actual working example, that would save a lot of time =)
Copyfault implemented that here: Sideborder Sprite Data Fetch TestProg, presumably based on my suggestion in post #12 of this thread: Sprite data fetch in sideborder |
| |
Copyfault
Registered: Dec 2001 Posts: 466 |
Sorry for not posting the links.
Thanks a bunch, tlr! And of course, the basic idea for getting the pattern bytes at the right positions of the sprite data pipe was yours!!! |
| |
Fred
Registered: Feb 2003 Posts: 284 |
Quote: The music routine from Fred Gray performs a read and write on IO
that's not abusing the dummy accesses though.... it relies on the floating bus value (what a terrible idea =P)
The dummy write of the INC instruction is still executed, but in this example it perhaps has no audible impact on the SID output, since the dummy access writes the same value to the SID again. I can imagine, though, that if the gate bit was already set and the INC instruction sets it again in its first write, this will have an impact on the ADSR state.
Also, it writes to the SID twice, 1 cycle apart, which is not possible with any non-RMW instruction. This 1-cycle spacing can be abused with any I/O register.
An example is my CIA test routine, which makes use of the dummy access:
https://sourceforge.net/p/vice-emu/bugs/740/ |
| |
chatGPZ
Registered: Dec 2001 Posts: 11136 |
Ah yes, it would indeed introduce a small difference in timing. ok :) |
| |
chatGPZ
Registered: Dec 2001 Posts: 11136 |
come on coronapeoples, this can't be all! :) |
| |
tlr
Registered: Sep 2003 Posts: 1723 |
You could always use RMW instructions to make double grey dots. |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1380 |
Quoting Groepaz: Quote: The music routine from Fred Gray performs a read and write on IO
that's not abusing the dummy accesses though.... it relies on the floating bus value (what a terrible idea =P)
I thought reads from SID were explicitly zero, rather than the floating bus value?
I'm fairly sure one of the iterations in developing a stable hard restart used one-cycle blips of the gate bit to allow RC to escape at known times, but it didn't turn out to be optimal. I'll have a rummage. |
| |
Fred
Registered: Feb 2003 Posts: 284 |
Quoting ChristopherJam: Quoting Groepaz: Quote: The music routine from Fred Gray performs a read and write on IO
I thought reads from SID were explicitly zero, rather than the floating bus value?
From the documentation of resid-fp:
Reading a write only register returns the last char written to any SID register. The individual bits in this value start to fade down towards zero after a few cycles. All bits reach zero within approximately $2000 - $4000 cycles. It has been claimed that this fading happens in an orderly fashion, however sampling of write only registers reveals that this is not the case. NB! This is not correctly modeled. The actual use of write only registers has largely been made in the belief that all SID registers are readable. To support this belief the read would have to be done immediately after a write to the same register (remember that an intermediate write to another register would yield that value instead). With this in mind we return the last value written to any SID register for $2000 cycles without modeling the bit fading. |
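A toy Python model of the read-back behaviour described in that quote may help when reasoning about the INC $D404 examples in this thread. It is a sketch under simplifying assumptions: only the write-only registers $D400-$D418 are handled, the value is held unchanged for $2000 cycles as resid-fp does, and afterwards the model simply returns 0 instead of fading individual bits.

```python
class SidReadBack:
    """Simplified model: reading a write-only SID register returns the
    last byte written to ANY SID register, for up to `hold_cycles` cycles."""

    WRITE_ONLY = range(0x00, 0x19)  # $D400-$D418

    def __init__(self, hold_cycles=0x2000):
        self.hold_cycles = hold_cycles
        self.last_value = 0
        self.last_write_cycle = None

    def write(self, cycle, reg, value):
        # Every write, whatever the register `reg`, refreshes the read-back value.
        self.last_value = value & 0xFF
        self.last_write_cycle = cycle

    def read_write_only(self, cycle, reg):
        if reg not in self.WRITE_ONLY:
            raise ValueError("only write-only registers $00-$18 are modelled")
        if self.last_write_cycle is None:
            return 0
        if cycle - self.last_write_cycle > self.hold_cycles:
            return 0  # real chips fade bit by bit; not modelled here
        return self.last_value


sid = SidReadBack()
sid.write(cycle=0, reg=0x04, value=0x40)             # STA $D404 with $40
print(hex(sid.read_write_only(cycle=7, reg=0x04)))   # 0x40
sid.write(cycle=10, reg=0x02, value=0x08)            # write to $D402 instead
print(hex(sid.read_write_only(cycle=14, reg=0x04)))  # 0x8, not $D404's own value
```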
| |
ChristopherJam
Registered: Aug 2004 Posts: 1380 |
Oh, damn. Thanks for that, Fred.
I guess you could safely write 0 then 1 if you first wrote a zero to some other SID register before INC $d404, but this is all sounding a bit flaky now :) |
| |
tlr
Registered: Sep 2003 Posts: 1723 |
Are we counting things like inc $d016;dec $d016 in this btw? More related to BA i guess but if there weren't so many dummy cycles it wouldn't work. |
| |
Fred
Registered: Feb 2003 Posts: 284 |
Quote: Oh, damn. Thanks for that, Fred.
I guess you could safely write 0 then 1 if you first wrote a zero to some other SID register before INC $d404, but this is all sounding a bit flaky now :)
That's why the music routine of Fred Gray first writes to $D404 and then immediately increases it to toggle the gate bit like:
STA $D404
INC $D404 |
| |
CyberBrain Administrator
Posts: 392 |
Quote: That's why the music routine of Fred Gray first writes to $D404 and then immediately increases it to toggle the gate bit like:
STA $D404
INC $D404
I wouldn't say so, since that snippet writes to the same register that it addresses with the INC afterwards, which doesn't really abuse the dummy-write.
In the snippet, the INC dummy-write just writes whatever was already there, which to my knowledge doesn't cause any side effect for $D404. (Am I wrong?)
But according to the resid-fp documentation, an INC doesn't necessarily have to set the register to the same value that the register already had in its first write cycle.
It can set the register to any value V you want in the first write cycle, and then set the register to V+1 one cycle later (at the second write cycle).
This could for example be used to toggle the gate and then toggle it again the next cycle. (Not sure if that's useful)
// Example - assume the gate-bit is 1 here
lda #%xxxxxxx0 // <- Select whatever waveform, etc, bits you want here, but keep bit 0 zero.
sta $D4xx // <- Some SID-register we don't use and is not audible (pulse-width for example)
inc $D404 // <- Cycle 5: set gate=0. Cycle 6: set gate=1.
Normally the minimum delay between toggling would be 4 cycles. Same could be done for any other write-only SID register (+ with ROL/ROR/ASL/LSR).
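A small Python sketch of the $D404 contents across the INC in the example above, assuming the resid-fp read-back behaviour (the read in cycle 4 returns the last byte written to any SID register, here the $40 just stored to the unused register). $41 for "gate on" and $40 as the stored value are only example choices.

```python
def inc_d404_timeline(reg_before, readback):
    """Content of $D404 at the end of cycles 4-6 of INC $D404.

    reg_before: current (write-only) register content
    readback:   what the cycle-4 read returns, i.e. the last value
                written to any SID register
    """
    return {
        4: reg_before,             # read cycle: register untouched
        5: readback,               # dummy write of the value read
        6: (readback + 1) & 0xFF,  # final write of the incremented value
    }


# Gate currently on ($41); "lda #$40 / sta $D4xx" left $40 as read-back value.
timeline = inc_d404_timeline(reg_before=0x41, readback=0x40)
gate = {cycle: value & 1 for cycle, value in timeline.items()}
print(gate)  # {4: 1, 5: 0, 6: 1}: gate off for exactly one cycle
```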
I wonder how reliable it is reading the write-only registers - it sounds like it is pretty reliable when done within "a few cycles" judging from the wording from resid-fp. |
| |
chatGPZ
Registered: Dec 2001 Posts: 11136 |
it's pretty reliable even with a surprising number of cycles in between... see the "bitfade" tests :) |
| |
CyberBrain Administrator
Posts: 392 |
Really interesting! There must be some use-case for this (perhaps controlling the internal counters as ChristopherJam mentioned)
An update to the example i posted: Since reading a write-only register returns the last written value to ANY register (not just to write-only registers), you don't even have to trash the value of a register for this trick to work. Just write to one of the read-only registers. |
| |
chatGPZ
Registered: Dec 2001 Posts: 11136 |
Also a good way to show the finger to shitty replacements =P |
| |
Frantic
Registered: Mar 2003 Posts: 1628 |
Quote: Also a good way to show the finger to shitty replacements =P
Ah.. yes! I'll remember that! :) |
| |
Compyx
Registered: Jan 2005 Posts: 631 |
Quoting tlr: Are we counting things like inc $d016;dec $d016 in this btw? More related to BA i guess but if there weren't so many dummy cycles it wouldn't work.
I wouldn't count that under the dummy writes/reads. You're just wasting cycles with a RMW instruction to inhibit sprite 0 DMA screwing with opening the border. |
| |
ChristopherJam
Registered: Aug 2004 Posts: 1380 |
Quoting ChristopherJam: I'm fairly sure one of the iterations in developing a stable hard restart used one-cycle blips of the gate bit to allow RC to escape at known times, but it didn't turn out to be optimal. I'll have a rummage.
"sieve" at SID envelope rate counter phase alignment - which just reminds me I need to fix the hosting of those images and runlogs..
But yes, interesting that the readback is perfectly reliable if you're safely under 2000 cycles. Presumably one could first write to one of the undefined registers between $1d and $1f to much the same effect. |
| |
chatGPZ
Registered: Dec 2001 Posts: 11136 |
Hey, is that all? *push* :) |
| |
S.E.S.
Registered: Apr 2010 Posts: 19 |
If you want to have raster splits that are exactly 5 cycles wide, you can use:
ldx #$ff
ldy #$05
lda #$00
sty $d021     ; background colour = green
sta $cf22,x   ; $cf22 + $ff = $d021, written 5 cycles after the sty
I don't know if anybody actually used that in an intro or a demo, though :) |
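Where the 5 cycles come from can be sketched in Python from the standard 6502 cycle counts: STY abs writes in its 4th (last) cycle, while STA abs,X always spends an extra cycle on a read from the not-yet-corrected address before writing in its 5th cycle. With X=$FF that dummy read goes to $CF21 (RAM), so it is harmless. The tuple layout is only an illustration.

```python
# (total cycles, cycle in which the write happens)
STY_ABS = (4, 4)
STA_ABS = (4, 4)
STA_ABS_X = (5, 5)  # cycle 4 is the dummy read, always done for stores


def cycles_between_writes(first, second):
    """Distance in cycles between the writes of two back-to-back stores."""
    first_total, first_write = first
    _second_total, second_write = second
    return (first_total + second_write) - first_write


def sta_abs_x_addresses(base, x):
    """(dummy read address, write address) of STA abs,X."""
    uncorrected = (base & 0xFF00) | ((base + x) & 0xFF)
    return uncorrected, (base + x) & 0xFFFF


print(cycles_between_writes(STY_ABS, STA_ABS_X))  # 5 -> split 5 cycles wide
print(cycles_between_writes(STY_ABS, STA_ABS))    # 4 with a plain STA $d021
print([f"${a:04X}" for a in sta_abs_x_addresses(0xCF22, 0xFF)])  # ['$CF21', '$D021']
```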
| |
Oswald
Registered: Apr 2002 Posts: 5025 |
make em 3 cycles wide :) |
| |
Hoogo
Registered: Jun 2002 Posts: 102 |
What about the color of the colorram in multicolor FLI? |
| |
chatGPZ
Registered: Dec 2001 Posts: 11136 |
what about it? |
| |
Hoogo
Registered: Jun 2002 Posts: 102 |
Is it an example for "side effect of fetching next opcode" for your purpose? |
| |
chatGPZ
Registered: Dec 2001 Posts: 11136 |
yes and no - this particular case is already described in detail in the pdf ("Blackmail FLI") :) |
| |
chatGPZ
Registered: Dec 2001 Posts: 11136 |
Quote: Lovely stuff!!!
"Most 1-Byte instructions will fetch PC+1 after the opcode fetch"
All 1-byte, right?
btw, it's not all of them .... the various JAM opcodes will stall before that happens. not that it matters :) |
| |
JackAsser
Registered: Jun 2002 Posts: 1990 |
Quote: btw, its not all of them .... the various JAM opcodes will stall before that happens. not that it matters :)
Haha wtf! That doesn’t count!! :D |
| |
CyberBrain Administrator
Posts: 392 |
Ok, let me take a stab at abusing the dummy write-cycle of RMW instructions:
$3FFF (ghostbyte) under ROM
As we know, when reading a byte from ROM, the CPU reads from the ROM, but when writing to a byte in the ROM, the write falls through to the RAM beneath it.
So with an RMW instruction you can actually write 2 values to a byte in RAM, 1 cycle apart, where neither of the two written values has to be the value that was already present.
Usually not a useful thing to do, but together with the VIC we could exploit this:
Put the VIC in bank 2 or 3 and enable the KERNAL/BASIC ROM. Then an INC (for example) can write to the ghostbyte twice, 1 cycle apart - and the first write doesn't necessarily have to write what was already there!
Unfortunately what you can write at the first dummy cycle is limited to what is in the ROM at the chosen ghostbyte address (4 possibilities).
What you can write at the 2nd write cycle also depends on that value as well as which RMW-instruction you use (so we have 6 possibilities per ghostbyte address for the second write-cycle).
Let's look at which possibilities of pixels we have:
First write cycle (= ROM value):          Second write cycle:
                                          INC        DEC        ASL        ROL (C=1)  LSR        ROR (C=1)
- $B7FF in ROM contains $B0 = %10110000   %10110001  %10101111  %01100000  %01100001  %01011000  %11011000
- $BFFF in ROM contains $E0 = %11100000   %11100001  %11011111  %11000000  %11000001  %01110000  %11110000
- $F7FF in ROM contains $D1 = %11010001   %11010010  %11010000  %10100010  %10100011  %01101000  %11101000
- $FFFF in ROM contains $FF = %11111111   %00000000  %11111110  %11111110  %11111111  %01111111  %11111111
                                          |
                                          +----> (*)
The choice marked with (*) ($FFFF and INC) might be useful in practice to create a *single cycle* wide $FF pattern! ($00 -> $FF -> $00)
Just do an INC $FFFF somewhere the ghostbyte is visible. (And init the ghostbyte to $00 in advance)
(This can of course be repeated all the way throughout the border, the x-pos can be changed, can be done multiple times per rasterline, etc etc)
Perhaps one can even be creative and use the other patterns for something too...
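The table above can be recomputed with a short Python sketch: the first (dummy) write stores the byte read from ROM into the RAM underneath, and the second write stores the RMW result. ROL and ROR assume C=1 as in the table; the ROM bytes are the ones listed in the post and are not independently verified here.

```python
# ROM contents at the candidate addresses, as listed in the post.
ROM = {0xB7FF: 0xB0, 0xBFFF: 0xE0, 0xF7FF: 0xD1, 0xFFFF: 0xFF}


def rmw_second_write(value, carry=1):
    """Value written in the final cycle by each RMW instruction."""
    return {
        "INC": (value + 1) & 0xFF,
        "DEC": (value - 1) & 0xFF,
        "ASL": (value << 1) & 0xFF,
        "ROL": ((value << 1) | carry) & 0xFF,
        "LSR": value >> 1,
        "ROR": (value >> 1) | (carry << 7),
    }


for addr, rom_byte in ROM.items():
    second = rmw_second_write(rom_byte)
    row = "  ".join(f"{op} %{val:08b}" for op, val in second.items())
    print(f"${addr:04X}: 1st write %{rom_byte:08b} | {row}")
```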
Charset/Bitmap
Instead of the ghostbyte, the same could be done for charset/bitmaps (but not sprites or the screen) - maybe there is an application there too.
For example (using precise timing) a charset byte could be set to $FF at the exact time it is read by the VIC, using the dummy write of an INC, and then to $00 immediately after, at the second write cycle, so that it is $00 the next time it is rendered by the VIC (instead of LDA #$FF; STA $xxxx; LDA #$00; STA $xxxx).
When repeating the 7th pixel-line of a text-line using linecrunch, for example, this could make the charset-byte $FF on one raster line and $00 on the next with only one INC $xxxx instruction.
That could of course be repeated again and again, every 2nd line, so that the charset-byte alternates between $00 and $FF every rasterline...
(Of course, this requires that there is a $FF or $00 byte in the ROM at that location - but other values might be nice too)
But I haven't really found a totally perfect use-case for this yet, so not sure if it's useful in practice. Ideas? |
| |
tlr
Registered: Sep 2003 Posts: 1723 |
_Very_ cool! We approves. |
| |
chatGPZ
Registered: Dec 2001 Posts: 11136 |
YES! keep it coming! :) |
| |
Compyx
Registered: Jan 2005 Posts: 631 |
Perhaps I'm missing something, but aren't the 'alternate' ghostbytes at $B9FF and $F9FF (when using ECM)? |
| |
CyberBrain Administrator
Posts: 392 |
Yes, you're absolutely right of course. Their values unfortunately don't get much nicer:
$B9FF = $A0
$F9FF = $D2 |
| |
Compyx
Registered: Jan 2005 Posts: 631 |
Nope. But the 'trick' is very interesting though, never would have thought of it :) |
| |
Oswald
Registered: Apr 2002 Posts: 5025 |
nice trick, but 3fff can be set to any byte in a 8 pixel wide area with another trick. |
| |
Compyx
Registered: Jan 2005 Posts: 631 |
Pray tell. |
| |
tlr
Registered: Sep 2003 Posts: 1723 |
Quote: nice trick, but 3fff can be set to any byte in a 8 pixel wide area with another trick.
I assume you are referring to exploiting the difference in pipeline delay between changing the video mode and changing the graphics data to be fetched?
That is a neat trick, but it depends on the type of VIC-II (old/new) and it takes more cycles per instance.
The inc $ffff trick is only 6 cycles per instance. |
| |
Hoogo
Registered: Jun 2002 Posts: 102 |
Dec $dc0d after a Timer A IRQ occurred would acknowledge that IRQ and stop further ones. But I don't see a good reason to do that. |
| |
Rastah Bar
Registered: Oct 2012 Posts: 336 |
Hoogo, perhaps when you no longer need that timer interrupt, f.e. as an exit strategy to move on to a new demo part?
Oswald, are you referring to the vertical border scrollers in Krestage? But aren't these 7 pixels wide?
CyberBrain, great idea! |
| |
chatGPZ
Registered: Dec 2001 Posts: 11136 |
reminds me of a subtle bug i created long ago which made me pull my hair out. somehow i thought it was a good idea to ACK the timer irq by
lda $dc0d
sta $dc0d
aaaargs |
| |
Oswald
Registered: Apr 2002 Posts: 5025 |
Quote: Hoogo, perhaps when you no longer need that timer interrupt, f.e. as an exit strategy to move on to a new demo part?
Oswald, are you referring to the vertical border scrollers in Krestage? But aren't these 7 pixels wide?
CyberBrain, great idea!
maybe 7, don't know :)
interesting, it doesn't work on the old VIC? why did they change it? why is it needed? interesting details here :) |
| |
chatGPZ
Registered: Dec 2001 Posts: 11136 |
"They" didn't change it - it's a side effect of the signal propagation delays in the chip, so it changed when they moved from one process to another. |
| |
lft
Registered: Jul 2007 Posts: 369 |
It is possible to start a REU transfer by writing to address $ff00, which is useful when you want to transfer to or from memory in the $d000-$dfff range. But sometimes you don't want to trash the byte at $ff00, so you end up starting the transfer like this:
lda $ff00
sta $ff00
However, it turns out you can use any RMW instruction:
inc $ff00
The dummy write causes the REU to immediately take over the bus, so the second write-request from the CPU doesn't reach the memory chips. The incremented value never gets written into RAM. Three cycles saved. |
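The claimed saving can be illustrated with a rough Python model. It counts, from the first opcode fetch, the cycle of the first write to $FF00 for LDA/STA versus INC, and models lft's description of what happens next: the REU takes the bus right after the dummy write, so the CPU's final write never reaches RAM. It is a sketch of the described behaviour, not of real REU internals; the instruction tuples are only for illustration.

```python
# (cycles, cycles within the instruction that write to $FF00)
LDA_ABS = (4, [])
STA_ABS = (4, [4])
INC_ABS = (6, [5, 6])  # 5 = dummy write of the unmodified value, 6 = result


def first_write_cycle(program):
    """Cycle (counted from 1) of the first write to $FF00, or None."""
    elapsed = 0
    for cycles, write_cycles in program:
        if write_cycles:
            return elapsed + write_cycles[0]
        elapsed += cycles
    return None


def inc_ff00_with_reu(ram_value):
    """RAM at $FF00 after INC $FF00 with a transfer armed on $FF00 writes."""
    writes = [ram_value, (ram_value + 1) & 0xFF]  # cycle 5, cycle 6
    reu_has_bus = False
    for value in writes:
        if reu_has_bus:
            continue  # /DMA -> /AEC: the CPU is cut off from the buses
        ram_value = value
        reu_has_bus = True  # the write to $FF00 starts the transfer at once
    return ram_value


print(first_write_cycle([LDA_ABS, STA_ABS]))  # 8
print(first_write_cycle([INC_ABS]))           # 5, i.e. three cycles earlier
print(hex(inc_ff00_with_reu(0x37)))           # 0x37: the increment is lost
```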
| |
chatGPZ
Registered: Dec 2001 Posts: 11136 |
woa, that's... unexpected (to me at least). nice! |
| |
MagerValp
Registered: Dec 2001 Posts: 1059 |
That’s certainly unexpected. So what happens during the last cycle, bus conflict? |
| |
lft
Registered: Jul 2007 Posts: 369 |
The 6502 has two inputs, /RDY (Ready) and /AEC (Address Enable Control). RDY tells the CPU to pause execution, but it is only obeyed during read cycles. AEC immediately disconnects the CPU from the buses (address, data, and the read/write signal).
The VIC chip has two outputs, BA (Bus Available) and AEC (Address Enable Control). During normal operation, VIC asserts AEC (which is connected to AEC on the CPU) on every other half-cycle in order to read e.g. font bits. It has to work immediately, i.e. asynchronously, because it needs to be fast enough for half-cycle operations.
When VIC needs to halt the CPU, it first pulls BA low for three cycles, to ensure that the CPU is on a read cycle. Then it asserts AEC in order to access memory on both half-cycles.
The expansion port has an output, BA, and an input, /DMA. BA comes from the VIC. But /DMA is connected to both /RDY and /AEC. That is, it tells the CPU to pause, but it also immediately disconnects the CPU from the buses.
The REU monitors BA so it can pause an ongoing transfer during badlines and sprite fetches. But otherwise, it pulls /DMA and just assumes that the bus is free. The engineers must have assumed (wrongly) that the CPU will always trigger a transfer on the last cycle of an instruction, so that the next cycle is guaranteed to be a read (to fetch the next instruction).
Instead, due to the double-write of our RMW instruction, part of the CPU will attempt to place an address and data value on the buses, and set the read/write line to write. But the CPU is disconnected from the buses because /DMA is held low, and therefore /AEC. The bits never reach the actual bus lines; they dissipate into a small amount of heat. |
| |
Zyron
Registered: Jan 2002 Posts: 2381 |
:o |
| |
MagerValp
Registered: Dec 2001 Posts: 1059 |
Neat! |
| |
chatGPZ
Registered: Dec 2001 Posts: 11136 |
that calls for a test program that breaks emulation :) |
| |
oziphantom
Registered: Oct 2014 Posts: 478 |
Quoting lft: The engineers must have assumed (wrongly) that the CPU will always trigger a transfer on the last cycle of an instruction, so that the next cycle is guaranteed to be a read (to fetch the next instruction).
Or explicitly put in the programming guide/datasheet.
'Thou shall not write to thing with anything else other than ABS.' |
| |
Oswald
Registered: Apr 2002 Posts: 5025 |
I don't get it: if the REU starts to use the bus as soon as the VIC signals the processor on BA to stop, won't that lead to a bus conflict? |
| |
JackAsser
Registered: Jun 2002 Posts: 1990 |
Quote: I dont get it, if REU starts to use bus as soon as VIC signals on BA to processor to stop, then it will lead to a bus conflict ?
Nobody said that... :) It monitors BA... I don't know, but I would assume the REU will not transfer when BA is low, even though technically could use cycles 0..2 there. It might do, I dunno. |
| |
chatGPZ
Registered: Dec 2001 Posts: 11136 |
but where is the test program? :) |
| |
Walt
Registered: May 2004 Posts: 47 |
I am working on a REU demo and I had to do some test code because I experienced different behavior when using REU to magic byte on top of 8 sprites in a row.
Timing was different on VICE 2.4, VICE 3.x, real REU (Thanks Hedning :)), 64 Ultimate and 1541 Ultimate 2+...
The C64 Ultimate and VICE 2.4 behaved the same and real REU and VICE 3.x behaved the same (Nice work on VICE :))
So yeah, a test program for BA and REU would be nice... |
| |
chatGPZ
Registered: Dec 2001 Posts: 11136 |
there are related test programs, but not for the ff00 trigger with inc as lft described :) see https://sourceforge.net/p/vice-emu/code/HEAD/tree/testprogs/REU/ |
| |
chatGPZ
Registered: Dec 2001 Posts: 11136 |
.... so i wrote one -> https://sourceforge.net/p/vice-emu/code/HEAD/tree/testprogs/REU..
works as LFT described on real hardware. fails in VICE :) |