| |
JackAsser
Registered: Jun 2002 Posts: 2014
6502 instruction decoding
Why do STA opcodes always take the page-crossing penalty, compared to their LDA counterparts?
For example:
lda abs,x takes 4 or 5 cycles, depending on whether a page is crossed
sta abs,x always takes 5 cycles, whether or not a page is crossed
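The two timings above can be written down as a tiny function. This is a sketch covering only the two instructions in the question; the name `abs_x_cycles` is made up for illustration.

```python
def abs_x_cycles(mnemonic: str, base: int, x: int) -> int:
    """Cycle count for LDA/STA absolute,X on an NMOS 6502."""
    # A page is crossed when adding X to the low byte carries into the high byte.
    crossed = (base & 0xFF) + x > 0xFF
    if mnemonic == "lda":
        return 5 if crossed else 4   # reads pay the extra cycle only on a page cross
    if mnemonic == "sta":
        return 5                     # stores always take the extra cycle
    raise ValueError(f"unsupported mnemonic: {mnemonic}")


print(abs_x_cycles("lda", 0x1234, 0x01))
print(abs_x_cycles("lda", 0x12F0, 0x20))
print(abs_x_cycles("sta", 0x1234, 0x01))
```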
| |
Graham Account closed
Registered: Dec 2002 Posts: 990
LDA $1234,X
1. fetch opcode
2. fetch low byte of absolute address
3. fetch high byte of absolute address, add X to low byte
4. read byte from calculated address; increment high byte of address if X + LO overflowed
5. if X + LO overflowed, read byte again from the corrected address
If STA did the same, the 6502 would sometimes write the byte to the wrong address in step 4. So STA never writes in step 4 and always waits for step 5.
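The five steps above can be replayed as a bus trace. This is a simplified sketch that models only the cycles listed in the post. `abs_x_trace` is a hypothetical helper, and `pc` is an assumed address where the instruction starts. In cycle 4 it does the same read for STA as for LDA, which Graham confirms further down the thread.

```python
from typing import List, Tuple

Access = Tuple[int, str, int]  # (cycle, "R" or "W", address)


def abs_x_trace(mnemonic: str, pc: int, base: int, x: int) -> List[Access]:
    lo, hi = base & 0xFF, (base >> 8) & 0xFF
    partial = (hi << 8) | ((lo + x) & 0xFF)  # high byte not yet fixed up
    final = (base + x) & 0xFFFF              # fully corrected address
    crossed = partial != final               # differs only if lo + x carried
    trace = [
        (1, "R", pc),                        # fetch opcode
        (2, "R", (pc + 1) & 0xFFFF),         # fetch low byte of address
        (3, "R", (pc + 2) & 0xFFFF),         # fetch high byte, add X to low byte
        (4, "R", partial),                   # read from the possibly wrong address
    ]
    if mnemonic == "lda":
        if crossed:
            trace.append((5, "R", final))    # read again from the corrected address
    elif mnemonic == "sta":
        trace.append((5, "W", final))        # the write always waits for cycle 5
    else:
        raise ValueError(f"unsupported mnemonic: {mnemonic}")
    return trace


for access in abs_x_trace("sta", 0xC000, 0x12F0, 0x20):
    print(access)
```

The STA trace shows the cycle-4 access going to $1210 (wrong page) as a read, and the only write going to $1310 in cycle 5.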
| |
Fresh
Registered: Jan 2005 Posts: 101
What Graham said.
The CPU cannot check the carry into the high byte in time to decide whether to write in that cycle. With LDA, by the time it checks, it has already loaded a value that may be the correct one. But then again, that's just reading.
| |
Slammer
Registered: Feb 2004 Posts: 416
Interesting. Do you have any documentation describing the internal workings of the processor?
| |
Fresh
Registered: Jan 2005 Posts: 101
The good old c64doc
http://www.viceteam.org/plain/64doc.txt
Or you can try the 'real' thing:
http://www.visual6502.org
| |
JackAsser
Registered: Jun 2002 Posts: 2014
Quote: What Graham said.
The CPU cannot check the carry into the high byte in time to decide whether to write in that cycle. With LDA, by the time it checks, it has already loaded a value that may be the correct one. But then again, that's just reading.
Stupid me, it's quite obvious... :) So those extra reads actually happen? Not that it matters on the C64, but on another platform with full I/O address decoding, those reads could then cause implicit IRQ acknowledges, for example?
| |
Copyfault
Registered: Dec 2001 Posts: 478
It always made me wonder why, with STA ind,x, the chip developers went for a rather consistent solution (only ONE write cycle, at calculation cycle 5), whereas it seems somewhat unpolished for the RMW opcodes (TWO write cycles: the unaltered value at calc cycle 5, the changed value at calc cycle 6).
Most probably they had no other choice; writing back the same value that was just read from a register should not really cause trouble.
AAY64 by Ninja should be good documentation. At least the internal processor states are listed there too.
| |
Graham Account closed
Registered: Dec 2002 Posts: 990
It should be mentioned that STA $1234,X reads from the same address as LDA $1234,X in step 4.
And yes, this spurious read can acknowledge IRQs. Example:
LDX #$FF
LDA $DC0E,X
This would read from $DC0D in cycle 4 and from $DD0D in cycle 5, acknowledging both IRQs and NMIs :)
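The address arithmetic in this example can be checked directly. On the C64, $DC0D is the interrupt control register of CIA1 (wired to IRQ) and $DD0D that of CIA2 (wired to NMI). `indexed_read_addresses` is a hypothetical helper written for this sketch.

```python
def indexed_read_addresses(base: int, x: int) -> list:
    """Addresses read by LDA base,X: cycle 4 always, cycle 5 only on a page cross."""
    partial = (base & 0xFF00) | ((base + x) & 0x00FF)  # low byte wrapped, high byte stale
    final = (base + x) & 0xFFFF
    return [partial] if partial == final else [partial, final]


CIA_ICR = {0xDC0D: "CIA1 ICR (IRQ)", 0xDD0D: "CIA2 ICR (NMI)"}
for addr in indexed_read_addresses(0xDC0E, 0xFF):
    print(f"${addr:04X}", CIA_ICR.get(addr, "?"))
```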
| |
Graham Account closed
Registered: Dec 2002 Posts: 990
Quoting Copyfault: It always made me wonder why, with STA ind,x, the chip developers went for a rather consistent solution (only ONE write cycle, at calculation cycle 5), whereas it seems somewhat unpolished for the RMW opcodes (TWO write cycles: the unaltered value at calc cycle 5, the changed value at calc cycle 6).
It needs one cycle to actually perform the ALU operation.
Quote: Most probably they had no other choice; writing back the same value that was just read from a register should not really cause trouble.
It does have its effects. Acknowledging raster IRQs with DEC $D019 only works because RMW writes back the same value it has read. A clean way to acknowledge VIC-II IRQs would be LDA $D019 : STA $D019.
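To see why DEC $D019 acknowledges the interrupt, here is a toy model of a register where writing a 1 to a bit clears that latched flag, as the VIC-II does for $D019. `AckLatch` and `dec_rmw` are hypothetical names; reads are simplified to return only the latched flags (a real $D019 read includes other bits as well). The model compares the real NMOS RMW sequence with an imagined one that writes only the modified value.

```python
class AckLatch:
    """Toy interrupt latch: writing 1 to a bit clears that pending flag."""

    def __init__(self, flags: int) -> None:
        self.flags = flags

    def read(self) -> int:
        return self.flags  # simplification: only the latched flags

    def write(self, value: int) -> None:
        self.flags &= ~value & 0xFF  # each 1 bit acknowledges that flag


def dec_rmw(reg: AckLatch, dummy_write: bool = True) -> None:
    value = reg.read()
    if dummy_write:
        reg.write(value)             # NMOS 6502: unaltered value is written first
    reg.write((value - 1) & 0xFF)    # then the decremented value


with_dummy = AckLatch(0x01)          # raster flag pending
dec_rmw(with_dummy)
without = AckLatch(0x01)
dec_rmw(without, dummy_write=False)  # hypothetical CPU without the extra write
print(with_dummy.flags, without.flags)
```

With the dummy write the flag is cleared by the first write. Without it, only $00 would be written and the flag would stay pending.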
| |
JackAsser
Registered: Jun 2002 Posts: 2014
RMW has two write cycles? Wow. So on the Nintendo, where the VRAM pointer auto-increments on writes to $2007, doing, say, INC $2007 would increment the pointer TWO times. Or rather THREE times, because of the read cycle as well?
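The question can be checked by counting the accesses each instruction makes to its target address. This sketch assumes, as the question does, that every access to $2007 (read or write) bumps the pointer; the table and `pointer_increments` are made up for illustration.

```python
# Accesses (R = read, W = write) each instruction makes to its target address
# on an NMOS 6502; the first RMW write is the unaltered value.
TARGET_ACCESSES = {
    "lda abs": ["R"],
    "sta abs": ["W"],
    "inc abs": ["R", "W", "W"],
}


def pointer_increments(instr: str) -> int:
    """Assumption: every read or write of $2007 advances the VRAM pointer once."""
    return len(TARGET_ACCESSES[instr])


for instr in TARGET_ACCESSES:
    print(instr, pointer_increments(instr))
```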
| |
TWW
Registered: Jul 2009 Posts: 545
Quote: It should be mentioned that STA $1234,X reads from the same address as LDA $1234,X in step 4.
And yes, this wrong read can acknowledge IRQs. Example:
LDX #$FF
LDA $DC0E,X
This would read from $DC0D in cycle 4 and from $DD0D in cycle 5, acknowledging both IRQs and NMIs :)
That is amazing! 8-D
So no more:
lda $dc0d
lda $dd0d
just:
ldx #$ff
lda $dc0e,x
Cool!
| |
MagerValp
Registered: Dec 2001 Posts: 1078
Quoting Graham: A clean way to acknowledge VIC-II IRQs would be LDA $D019 : STA $D019.
Or rather LDA #$FF : STA $d019.
| |
iAN CooG
Registered: May 2002 Posts: 3194
I find asl $d019 much simpler: all bits get read and written back.
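The acknowledge variants mentioned in the last few posts can be compared on a toy write-1-to-clear latch. Everything here is hypothetical and simplified: reads return only the pending flags, and the ASL model follows the NMOS RMW pattern of writing the unaltered value before the shifted one.

```python
class AckLatch:
    """Toy latch: writing 1 to a bit clears that pending flag (reads simplified)."""

    def __init__(self, flags: int) -> None:
        self.flags = flags

    def read(self) -> int:
        return self.flags

    def write(self, value: int) -> None:
        self.flags &= ~value & 0xFF


def lda_sta(reg: AckLatch) -> None:      # LDA $D019 : STA $D019
    reg.write(reg.read())


def lda_imm_sta(reg: AckLatch) -> None:  # LDA #$FF : STA $D019
    reg.write(0xFF)


def asl_rmw(reg: AckLatch) -> None:      # ASL $D019 on an NMOS 6502
    value = reg.read()
    reg.write(value)                     # unaltered value first: acknowledges here
    reg.write((value << 1) & 0xFF)       # shifted value second


results = {}
for name, ack in [("lda/sta", lda_sta), ("lda #$ff/sta", lda_imm_sta), ("asl", asl_rmw)]:
    reg = AckLatch(0x05)                 # two interrupt sources pending
    ack(reg)
    results[name] = reg.flags
print(results)
```

In this model all three leave no flag pending. ASL gets there through its first (unaltered) write, like DEC.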
| |
JackAsser
Registered: Jun 2002 Posts: 2014
Quote: I find asl $d019 much simpler: all bits get read and written back.
Checking out visual6502.org and the forums there, it became clear that the actual instruction decoding is just an internal PLA in the CPU. It feeds the current opcode and cycle into an array and gets an operation value out, which it then executes.
I want a dump of this ROM. There is one online, but it's obviously badly flawed, and the guy who made it also said he didn't know which lines were which.
Using visual6502, one can transcribe it properly, but it's a pain to do. Has anybody done it already?
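The idea of "opcode and cycle in, operations out" can be sketched as a toy PLA. Each row matches some opcode bits (the rest are don't-cares) plus a timing state, and every matching row fires at once. The row names and patterns below are invented for illustration and are not a transcription of the real 6502 decode ROM.

```python
from typing import NamedTuple, Set


class Term(NamedTuple):
    name: str   # hypothetical control-signal name
    mask: int   # opcode bits this row cares about
    value: int  # required values of those bits
    t: int      # timing state (cycle) in which the row fires; -1 = any cycle


# Illustration only, NOT the real decode table.
TERMS = [
    Term("hyp_fetch_opcode", 0x00, 0x00, 0),    # every opcode at T0
    Term("hyp_store_group", 0xE0, 0x80, -1),    # $80-$9F: includes STA/STX/STY, but also e.g. TXA
    Term("hyp_abs_x_group", 0x1F, 0x1D, 3),     # xxx11101: ORA/AND/.../STA/LDA/CMP/SBC abs,X
]


def decode(opcode: int, t: int) -> Set[str]:
    """Return the names of all rows that match this opcode in timing state t."""
    return {term.name for term in TERMS
            if (opcode & term.mask) == term.value and term.t in (-1, t)}


print(sorted(decode(0x9D, 3)))  # STA abs,X in T3
print(sorted(decode(0xBD, 3)))  # LDA abs,X in T3
```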
| |
MagerValp
Registered: Dec 2001 Posts: 1078
Quoting JackAsser: Using visual6502, one can transcribe it properly, but it's a pain to do. Has anybody done it already?
Not me, but I'm also interested in it. It would make a pretty awesome cycle-exact 6502 emulation core.
| |
JackAsser
Registered: Jun 2002 Posts: 2014
Quote: Quoting JackAsser: Using visual6502, one can transcribe it properly, but it's a pain to do. Has anybody done it already?
Not me, but I'm also interested in it. It would make a pretty awesome cycle-exact 6502 emulation core.
Exactly, and probably less code to implement, given that you implement the ~130 sub-operations (compared to the more or less 256 special-case instructions we implement today).
| |
Zer0-X Account closed
Registered: Aug 2008 Posts: 78
So you're after a table like this (from the "wrong" CPU, though)?
http://oms.wmhost.com/misc/6502_inst.png
| |
chatGPZ
Registered: Dec 2001 Posts: 11386
Quote: Exactly, and probably less code to implement, given that you implement the ~130 sub-operations (compared to the more or less 256 special-case instructions we implement today).
Not really... if you look at a typical cycle-exact CPU core, it already works with a lookup table and sub-operations very similar to what the CPU does :)
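For comparison, here is a minimal sketch of the lookup-table-of-sub-operations style, covering only LDA abs,X ($BD) and STA abs,X ($9D). The `Cpu` class, the micro-op names, and the convention that a micro-op returns True to end the instruction early are assumptions made for this example. They are not taken from any particular emulator.

```python
from typing import Callable, Dict, List


class Cpu:
    def __init__(self, mem: Dict[int, int]) -> None:
        self.mem = mem
        self.pc = 0
        self.a = 0
        self.x = 0
        self.addr = 0             # effective address under construction
        self.carry = False        # did (low byte + X) overflow?
        self.bus: List[str] = []  # log of every bus access

    def read(self, addr: int) -> int:
        self.bus.append(f"R ${addr:04X}")
        return self.mem.get(addr, 0)

    def write(self, addr: int, value: int) -> None:
        self.bus.append(f"W ${addr:04X}")
        self.mem[addr] = value


# Each micro-op is one cycle; returning True ends the instruction early.
def fetch_lo(c: Cpu) -> bool:
    c.addr = c.read(c.pc)
    c.pc += 1
    return False


def fetch_hi_add_x(c: Cpu) -> bool:
    hi = c.read(c.pc)
    c.pc += 1
    total = (c.addr & 0xFF) + c.x
    c.carry = total > 0xFF
    c.addr = (hi << 8) | (total & 0xFF)  # high byte not fixed up yet
    return False


def lda_read_maybe_done(c: Cpu) -> bool:
    value = c.read(c.addr)
    if c.carry:
        c.addr = (c.addr + 0x100) & 0xFFFF  # wrong page: fix up, read again next cycle
        return False
    c.a = value
    return True


def lda_read_final(c: Cpu) -> bool:
    c.a = c.read(c.addr)
    return True


def sta_dummy_read(c: Cpu) -> bool:
    c.read(c.addr)                          # same possibly-wrong read as LDA, discarded
    if c.carry:
        c.addr = (c.addr + 0x100) & 0xFFFF
    return False                            # a store never finishes here


def sta_write(c: Cpu) -> bool:
    c.write(c.addr, c.a)
    return True


MicroOp = Callable[[Cpu], bool]
TABLE: Dict[int, List[MicroOp]] = {
    0xBD: [fetch_lo, fetch_hi_add_x, lda_read_maybe_done, lda_read_final],  # LDA abs,X
    0x9D: [fetch_lo, fetch_hi_add_x, sta_dummy_read, sta_write],            # STA abs,X
}


def step(c: Cpu) -> int:
    """Run one instruction and return the number of cycles it took."""
    opcode = c.read(c.pc)
    c.pc += 1
    cycles = 1
    for micro_op in TABLE[opcode]:
        cycles += 1
        if micro_op(c):
            break
    return cycles


cpu = Cpu({0x0000: 0x9D, 0x0001: 0xF0, 0x0002: 0x12})  # STA $12F0,X
cpu.x, cpu.a = 0x20, 0x42
print(step(cpu), cpu.bus)
```

Both instructions share the fetch micro-ops, and the 4-or-5 versus always-5 timing falls out of which cycle-4 micro-op each table entry uses.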
| |
JackAsser
Registered: Jun 2002 Posts: 2014
Quote: So you're after a table like this (from the "wrong" CPU, though)?
http://oms.wmhost.com/misc/6502_inst.png
Yes, but I have now deciphered the die scan into a proper table. There are still errors, though, that I must fix first.