| | Sasq
Registered: Apr 2004 Posts: 157 |
Talk on emulating 6502 (and also targeting it using C++)
Held this talk a while ago, in front of an audience that mostly hadn't even heard of the 6502 :)
https://www.youtube.com/watch?v=ZSwl4UEBFss |
|
| | Repose
Registered: Oct 2010 Posts: 227 |
Interesting, I want to listen to that!
I wrote an unusual emulator. I didn't want to make an assembler and emulator just for microbenchmarking, so I cheated - all instructions are functions. (This is rather bad, but shows the concept)
#include <cstdint>
#include <iomanip>
#include <iostream>
using namespace std;
//6502 program counter
uint16_t pc;
// 6502 registers
uint16_t sp, a, x, y;
// 6502 flags
uint16_t n, v, b, d, i, z, c;
// 6502 RAM
uint16_t ram[65536];
//helper variables
uint32_t instructions = 0; //keep track of total instructions executed
uint32_t clockticks6502 = 0;
const uint16_t reset_vector = 0xFFFC;
// addressing modes for instructions
const uint16_t mode_imm = 1;
const uint16_t mode_zp = 2;
const uint16_t mode_abs = 3;
const uint16_t mode_adapt = 4;
const uint16_t mode_ind_idx = 5;
uint16_t read16(uint16_t addr) {
// read 2 bytes with wraparound
return ram[addr] + ram[(uint16_t) (addr + 1)] * 0x100;
}
uint32_t read32(uint16_t addr) {
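// read 4 bytes with wraparound
// (widen to uint32_t before scaling: 0xFF * 0x01000000 overflows a signed int)
return (uint32_t) ram[addr] + (uint32_t) ram[(uint16_t) (addr + 1)] * 0x100u + (uint32_t) ram[(uint16_t) (addr + 2)] * 0x10000u + (uint32_t) ram[(uint16_t) (addr + 3)] * 0x01000000u;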
}
void write16(uint16_t addr, uint16_t value) {
ram[addr] = value & 0xFF;
ram[(uint16_t) (addr + 1)] = value >> 8;
}
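// Register selectors for read_reg (values assumed; their definitions were not part of the post)
const uint16_t A_reg = 0, X_reg = 1, Y_reg = 2;
uint16_t read_reg(uint16_t reg) {
// return the value of register reg
if (reg == A_reg) return a;
if (reg == X_reg) return x;
if (reg == Y_reg) return y;
return 0; // unknown selector; avoids falling off the end of a non-void function
}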
void clear_ram() {
uint16_t i = 0;
do {
ram[i] = 0;
i++;
}
while(i != 0); // wrap around of uint16
}
void setflags(uint16_t value) {
if (value == 0) {
z = 1;
} else {
z = 0;
}
if (value >= 128) {
n = 1;
} else {
n = 0;
}
}
void setorg(uint16_t value) {
pc = value;
}
void print_status() {
cout << "NV-BDIZC" << endl;
cout << n << v << "-" << b << d << i << z << c << endl;
}
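void print_regs() {
// Print each register as zero-padded hex (setw/setfill come from <iomanip>)
cout << " A X Y SP PC" << endl;
cout << hex << setfill('0') << setw(2) << a << " " << setw(2) << x << " " << setw(2) << y << " " << setw(2) << sp << " " << setw(4) << pc << endl;
cout << dec << setfill(' '); // restore the stream defaults
}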
void adc(uint16_t mode, uint16_t value) {
if (mode == mode_imm) {
a += value + c;
clockticks6502 += 2;
pc += 2;
}
if (mode == mode_zp) {
a += ram[value] + c;
clockticks6502 += 3;
pc += 2;
}
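if (mode == mode_ind_idx) {
// cast so the effective address wraps at 64K instead of indexing past the end of ram
a += ram[(uint16_t) (read16(value) + y)] + c;
clockticks6502 += 5;
if (ram[value] + y > 255) clockticks6502++; // extra cycle on page crossing
pc += 2;
}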
c = a > 255;
a = a & 0xFF;
setflags(a);
instructions++;
}
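To show the instructions-as-functions idea in isolation, here is a minimal self-contained sketch. The `Cpu` struct and the names `lda_imm`, `clc`, `adc_imm`, `sta_zp` and `run_example` are made up for this example and are not from the emulator above; decimal mode and the V flag are left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical minimal machine state, just enough for the example.
struct Cpu {
    uint16_t a = 0;            // 16 bits wide so a carry out of bit 7 stays visible
    uint16_t c = 0, z = 0, n = 0;
    uint32_t cycles = 0;
    uint8_t ram[65536] = {};
};

inline void set_zn(Cpu& m) {
    m.z = (m.a == 0);
    m.n = (m.a & 0x80) != 0;
}

// Each "instruction" is a plain function that updates state and adds its cycle count.
inline void lda_imm(Cpu& m, uint8_t v) { m.a = v; set_zn(m); m.cycles += 2; }
inline void clc(Cpu& m) { m.c = 0; m.cycles += 2; }
inline void adc_imm(Cpu& m, uint8_t v) {
    m.a = static_cast<uint16_t>(m.a + v + m.c);
    m.c = m.a > 0xFF;          // carry out of bit 7
    m.a &= 0xFF;
    set_zn(m);
    m.cycles += 2;
}
inline void sta_zp(Cpu& m, uint8_t zp) {
    assert(m.a <= 0xFF);       // A is kept masked to 8 bits between instructions
    m.ram[zp] = static_cast<uint8_t>(m.a);
    m.cycles += 3;
}

// The "program" under test is ordinary C++ that calls those functions in order.
inline uint32_t run_example(Cpu& m) {
    lda_imm(m, 0xF0);
    clc(m);
    adc_imm(m, 0x20);
    sta_zp(m, 0x10);
    return m.cycles;
}
```

On a fresh `Cpu`, `run_example` leaves A = $10 with carry set, stores $10 at $10 and reports 9 cycles, with no assembler or opcode decoder involved.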
|
| | Bansai
Registered: Feb 2023 Posts: 49 |
If you want loads/stores that wrap back from FFFF to 0000, you probably could play PTE/TLB tricks using mmap() or a related function to get this for free by mirroring lower addresses starting at the 64k boundary. Misaligned loads/stores are fast on modern hardware, and excluding Power and Sun ISAs, I think all are little endian at this point. |
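A rough sketch of that mirroring trick, assuming a POSIX system and a little-endian host. The function names `map_mirrored_ram` and `load16` are invented here, and using `tmpfile()` as the shared backing store is just one convenient choice, not something the post prescribes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <sys/mman.h>
#include <unistd.h>

// Maps 64 KiB of RAM followed by one extra page that aliases the first page,
// so a multi-byte load starting at 0xFFFF picks up its next bytes from 0x0000.
// Returns nullptr on failure.
uint8_t* map_mirrored_ram() {
    const size_t ram_size = 0x10000;
    const size_t page = static_cast<size_t>(sysconf(_SC_PAGESIZE));
    if (page == 0 || ram_size % page != 0) return nullptr;

    FILE* f = tmpfile();                       // unnamed backing file
    if (!f) return nullptr;
    const int fd = fileno(f);
    if (ftruncate(fd, static_cast<off_t>(ram_size)) != 0) { fclose(f); return nullptr; }

    // Reserve one contiguous address range, then overlay both views onto it.
    void* base = mmap(nullptr, ram_size + page, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) { fclose(f); return nullptr; }
    uint8_t* p = static_cast<uint8_t*>(base);
    void* lo = mmap(p, ram_size, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_FIXED, fd, 0);
    void* hi = mmap(p + ram_size, page, PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_FIXED, fd, 0);
    fclose(f);                                 // the mappings keep the file alive
    if (lo == MAP_FAILED || hi == MAP_FAILED) {
        munmap(base, ram_size + page);
        return nullptr;
    }
    return p;
}

// Unaligned 16-bit load in host byte order (assumed little endian, like the 6502).
uint16_t load16(const uint8_t* ram, uint16_t addr) {
    assert(ram != nullptr);
    uint16_t v;
    memcpy(&v, ram + addr, sizeof v);
    return v;
}
```

With this layout, `load16(ram, 0xFFFF)` reads the byte at $FFFF and then the mirrored copy of $0000, so no per-access masking is needed. Stores through the mirror page land in the real first page as well, since both views share the same backing file.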
| | Repose
Registered: Oct 2010 Posts: 227 |
I solved the issue by casting the value+1 to a uint16_t, which should force it to wrap around. There are a lot of tricky rules in C, and I don't remember them off the top of my head unless I'm working in it a lot.
So basically it's computing 65536 in a larger precision and then converting that to a lower precision, which should just truncate it. There are other ways to express that. But I'm not clear why that's even necessary, as the value was already uint16_t, unless it gave me an overflow error.
So I think I'm using casting to get the bigger value and then coercion to reduce it again, all in one step. |
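For what it's worth, the rule behind this is integer promotion: in `addr + 1` the `uint16_t` operand is promoted to `int` before the addition, so the sum really is 65536 and the cast is what brings it back to 0. A small sketch, assuming a platform where `int` is wider than 16 bits; the function names are illustrative only:

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// uint16_t + int is computed as int, so the expression itself never wraps at 16 bits.
static_assert(std::is_same<decltype(uint16_t{0xFFFF} + 1), int>::value,
              "assumes int is wider than 16 bits");

// Without the cast the promoted result is kept: 0xFFFF + 1 == 0x10000.
inline int next_unwrapped(uint16_t addr) { return addr + 1; }

// Converting back to uint16_t reduces the value modulo 65536.
inline uint16_t next_wrapped(uint16_t addr) { return static_cast<uint16_t>(addr + 1); }

// 16-bit little-endian read whose second byte wraps from 0xFFFF to 0x0000.
inline uint16_t read16_wrapped(const uint8_t* ram, uint16_t addr) {
    assert(ram != nullptr);
    return static_cast<uint16_t>(ram[addr] | (ram[next_wrapped(addr)] << 8));
}
```

So the cast is needed when the sum is used directly as an array index. Assigning the sum to a `uint16_t` variable first would have the same effect, because the conversion then happens at the assignment.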
| | Sasq
Registered: Apr 2004 Posts: 157 |
My code is also separated into functions, but they are constexpr/templated so they can all be resolved and inlined at compile time
template <enum Reg REG, enum Mode MODE>
static constexpr void Load(Machine& m)
{
m.Reg<REG>() = m.LoadEA<MODE>();
m.set<SZ>(m.Reg<REG>());
}
{ "lda", {
{ 0xa9, 2, Mode::IMM, Load<Reg::A, Mode::IMM>},
{ 0xa5, 3, Mode::ZP, Load<Reg::A, Mode::ZP>},
{ 0xb5, 4, Mode::ZPX, Load<Reg::A, Mode::ZPX>},
{ 0xad, 4, Mode::ABS, Load<Reg::A, Mode::ABS>},
{ 0xbd, 4, Mode::ABSX, Load<Reg::A, Mode::ABSX>},
{ 0xb9, 4, Mode::ABSY, Load<Reg::A, Mode::ABSY>},
{ 0xa1, 6, Mode::INDX, Load<Reg::A, Mode::INDX>},
{ 0xb1, 5, Mode::INDY, Load<Reg::A, Mode::INDY>},
} },
etc... |
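A stripped-down, self-contained sketch of this pattern follows. It is not the real code from the talk: `Machine`, `reg`, `load_operand`, `load_ops` and `step` are simplified stand-ins, only two addressing modes are covered, and plain conditionals on the template parameters are used so that it compiles as C++11.

```cpp
#include <cassert>
#include <cstdint>

enum class Reg { A, X, Y };
enum class Mode { IMM, ZP };

// Hypothetical machine, reduced to what the load instructions need.
struct Machine {
    uint8_t a = 0, x = 0, y = 0;
    bool z = false, n = false;
    uint16_t pc = 0;
    uint8_t ram[65536] = {};

    // REG is a compile-time constant, so the selection folds away when inlined.
    template <Reg REG> uint8_t& reg() {
        return REG == Reg::A ? a : (REG == Reg::X ? x : y);
    }
    // Fetch the operand byte at PC, then resolve it according to MODE.
    template <Mode MODE> uint8_t load_operand() {
        const uint8_t operand = ram[pc++];
        return MODE == Mode::IMM ? operand : ram[operand];
    }
};

// One template covers LDA, LDX and LDY in every supported addressing mode.
template <Reg REG, Mode MODE>
void load(Machine& m) {
    const uint8_t v = m.load_operand<MODE>();
    m.reg<REG>() = v;
    m.z = (v == 0);
    m.n = (v & 0x80) != 0;
}

using OpFn = void (*)(Machine&);
struct Opcode { uint8_t code; int cycles; OpFn fn; };

const Opcode load_ops[] = {
    {0xa9, 2, load<Reg::A, Mode::IMM>},
    {0xa5, 3, load<Reg::A, Mode::ZP>},
    {0xa2, 2, load<Reg::X, Mode::IMM>},
    {0xa0, 2, load<Reg::Y, Mode::IMM>},
};

// Fetch one opcode, dispatch through the table and return its cycle count.
int step(Machine& m) {
    const uint8_t code = m.ram[m.pc++];
    for (const Opcode& op : load_ops)
        if (op.code == code) { op.fn(m); return op.cycles; }
    assert(false && "opcode not covered by this sketch");
    return 0;
}
```

Each table entry is a distinct instantiation of `load`, so the register and addressing mode are fixed at compile time and the compiler can specialise each one separately.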
| | Martin Piper
Registered: Nov 2007 Posts: 726 |
Any decent compiler can inline small functions without needing to use templates. |
| | Sasq
Registered: Apr 2004 Posts: 157 |
> Any decent compiler can inline small functions without needing to use templates.
To a degree. Not nearly enough to make sure all emulated opcodes are completely inlined.
For instance there is constant state in the emulator itself as well as arguments to functions.
Also, without using templates you would have to create separate functions for _all_ opcodes, and not (like I do) a single Load() function.
Templates are invaluable for writing fast code like this (unless you want to use a lot of macros). |
| | Krill
Registered: Apr 2002 Posts: 2997 |
Quoting Sasq:
> Templates are invaluable for writing fast code like this (unless you want to use a lot of macros).
Templates are macros done right. (TM) =) |
| | chatGPZ
Registered: Dec 2001 Posts: 11433 |
That said, it's not necessarily the optimal thing to just inline all and everything (caches etc).
And the CPU core is negligible on a modern box too - other things will eat a LOT more performance :) |
| | Bansai
Registered: Feb 2023 Posts: 49 |
Quoting chatGPZ:
> And the CPU core is negligible on a modern box too - other things will eat a LOT more performance :)
Completely agreed. Based on other posts, I'm guessing Repose might have been verifying his math algorithms deterministically by iterating through all 2**x inputs with an instruction set simulator/emulator. |
| | Repose
Registered: Oct 2010 Posts: 227 |
Yes, exactly! And my tests take up to 4 minutes on an ancient machine. For 2^24 tests and above, it starts to become a problem, but faster code isn't nearly enough to solve it. You have to start being smart and make use of symmetries in the tests to do less work. For example, if the order of the multiplier and multiplicand doesn't affect the timing of a particular algorithm, you can cut your work roughly in half. |
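The symmetry idea can be sketched as follows for an 8x8-bit multiply. Here `shift_add_mul` is only a stand-in for the emulated routine, and `Tally`, `check_all_pairs` and `check_unordered_pairs` are names invented for this example. The halved loop is only valid if the routine is already known to treat (a, b) and (b, a) identically, which has to be established separately.

```cpp
#include <cassert>
#include <cstdint>

struct Tally { uint32_t cases = 0; uint32_t failures = 0; };

// Stand-in for the routine under test: classic shift-and-add multiply.
uint16_t shift_add_mul(uint8_t a, uint8_t b) {
    uint16_t acc = 0, addend = a;
    for (int bit = 0; bit < 8; ++bit) {
        if (b & (1 << bit)) acc = static_cast<uint16_t>(acc + addend);
        addend = static_cast<uint16_t>(addend << 1);
    }
    return acc;
}

// Exhaustive check over all 256 * 256 ordered pairs.
template <typename F>
Tally check_all_pairs(F mul) {
    Tally t;
    for (unsigned a = 0; a < 256; ++a)
        for (unsigned b = 0; b < 256; ++b) {
            ++t.cases;
            if (static_cast<unsigned>(mul(uint8_t(a), uint8_t(b))) != a * b) ++t.failures;
        }
    assert(t.cases == 65536);   // sanity check on the loop bounds
    return t;
}

// Same check over unordered pairs only (b >= a), relying on operand symmetry.
template <typename F>
Tally check_unordered_pairs(F mul) {
    Tally t;
    for (unsigned a = 0; a < 256; ++a)
        for (unsigned b = a; b < 256; ++b) {
            ++t.cases;
            if (static_cast<unsigned>(mul(uint8_t(a), uint8_t(b))) != a * b) ++t.failures;
        }
    return t;
}
```

The unordered version runs 256 * 257 / 2 = 32896 cases instead of 65536, slightly more than half because the a == b diagonal cannot be skipped.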