[CSDb] - User Forums - Talk on emulating 6502 (and also targeting it using C++)

2023-12-09 17:58

Sasq

Registered: Apr 2004
Posts: 157

Talk on emulating 6502 (and also targeting it using C++)

Held this talk a while ago, in front of an audience that mostly hadn't even heard of the 6502 :)

https://www.youtube.com/watch?v=ZSwl4UEBFss

2023-12-11 21:30

Repose
Account closed

Registered: Oct 2010
Posts: 227

Interesting, I want to listen to that!

I wrote an unusual emulator. I didn't want to make an assembler and emulator just for microbenchmarking, so I cheated - all instructions are functions. (This is rather bad, but shows the concept)

#include <cstdint>
#include <iostream>
#include <string>

using namespace std;

//6502 program counter
uint16_t pc;
// 6502 registers
uint16_t sp, a, x, y;
// 6502 flags
uint16_t n, v, b, d, i, z, c;
// 6502 RAM
uint16_t ram[65536];
//helper variables
uint32_t instructions = 0; //keep track of total instructions executed
uint32_t clockticks6502 = 0;

const uint16_t reset_vector = 0xFFFC;

// addressing modes for instructions
const uint16_t mode_imm = 1;
const uint16_t mode_zp = 2;
const uint16_t mode_abs = 3;
const uint16_t mode_adapt = 4;
const uint16_t mode_ind_idx = 5;

uint16_t read16(uint16_t addr) {
    // read 2 bytes with wraparound
    return ram[addr] + ram[(uint16_t) (addr + 1)] * 0x100;
}

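uint32_t read32(uint16_t addr) {
    // read 4 bytes with wraparound; widen each byte to 32 bits before scaling
    // so the top byte cannot overflow a signed int
    return (uint32_t) ram[addr]
         + (uint32_t) ram[(uint16_t) (addr + 1)] * 0x100
         + (uint32_t) ram[(uint16_t) (addr + 2)] * 0x10000
         + (uint32_t) ram[(uint16_t) (addr + 3)] * 0x01000000;
}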

void write16(uint16_t addr, uint16_t value) {
    ram[addr] = value & 0xFF;
    ram[(uint16_t) (addr + 1)] = value >> 8;
}

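// register selectors for read_reg (not defined in the posted excerpt; values assumed)
const uint16_t A_reg = 0;
const uint16_t X_reg = 1;
const uint16_t Y_reg = 2;

uint16_t read_reg(uint16_t reg) {
    // return the value of register reg
    if (reg == A_reg) return a;
    if (reg == X_reg) return x;
    if (reg == Y_reg) return y;
    return 0; // unknown selector; avoids falling off the end without a return value
}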

void clear_ram() {
    uint16_t i = 0;
    do {
        ram[i] = 0;
        i++;
    }
    while(i != 0); // wrap around of uint16
}

void setflags(uint16_t value) {
    if (value == 0) {
        z = 1;
    } else {
        z = 0;
    }
    if (value >= 128) {
        n = 1;
    } else {
        n = 0;
    }
}

void setorg(uint16_t value) {
    pc = value;
}

void print_status() {
    cout << "NV-BDIZC" << endl;
    cout << n << v << "-" << b << d << i << z << c << endl;
}

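string hex_str(uint16_t value, int digits) {
    // format value as zero-padded uppercase hex
    // (helper assumed; the hex() used in the posted excerpt was not shown)
    const char* table = "0123456789ABCDEF";
    string s(digits, '0');
    for (int k = digits - 1; k >= 0; k--) {
        s[k] = table[value & 0xF];
        value >>= 4;
    }
    return s;
}

void print_regs() {
    cout << " A  X  Y SP   PC" << endl;
    cout << hex_str(a, 2) << " " << hex_str(x, 2) << " " << hex_str(y, 2) << " " << hex_str(sp, 2) << " " << hex_str(pc, 4) << endl;
}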

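void adc(uint16_t mode, uint16_t value) {
    // binary-mode add with carry; decimal mode (d flag) is not handled
    uint16_t operand = 0;
    if (mode == mode_imm) {
        operand = value;
        clockticks6502 += 2;
        pc += 2;
    }
    if (mode == mode_zp) {
        operand = ram[value];
        clockticks6502 += 3;
        pc += 2;
    }
    if (mode == mode_ind_idx) {
        // (zp),y: cast so the effective address wraps from $FFFF to $0000
        operand = ram[(uint16_t) (read16(value) + y)];
        clockticks6502 += 5;
        if (ram[value] + y > 255) clockticks6502++; // page-crossing penalty
        pc += 2;
    }
    uint16_t sum = a + operand + c;
    // signed overflow: both inputs share a sign and the result's sign differs
    v = ((a ^ sum) & (operand ^ sum) & 0x80) != 0;
    c = sum > 255;
    a = sum & 0xFF;
    setflags(a);
    instructions++;
}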

2023-12-11 21:54

Bansai

Registered: Feb 2023
Posts: 54

If you want loads/stores that wrap back from FFFF to 0000, you probably could play PTE/TLB tricks using mmap() or a related function to get this for free by mirroring lower addresses starting at the 64k boundary. Misaligned loads/stores are fast on modern hardware, and excluding Power and Sun ISAs, I think all are little endian at this point.
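The mmap() mirroring itself is platform-specific, so here is only a simplified stand-in for the same idea: keep a copy of address $0000 just past the 64k boundary, so a 16-bit load at $FFFF needs no wrap check. `MirroredRam`, `write8` and `read16` are made-up names, and the sketch assumes a little-endian host.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical RAM wrapper: 64 KiB plus one mirror byte past the end.
// A real mmap()-based version would map the same pages twice instead
// of copying on write.
struct MirroredRam {
    uint8_t bytes[65536 + 1] = {};

    void write8(uint16_t addr, uint8_t value) {
        bytes[addr] = value;
        if (addr == 0) bytes[65536] = value; // keep the mirror of $0000 in sync
    }

    uint16_t read16(uint16_t addr) const {
        uint16_t value;
        // Unaligned 2-byte load; at $FFFF the second byte comes from the mirror.
        std::memcpy(&value, bytes + addr, sizeof value);
        return value; // assumes a little-endian host, matching the 6502 byte order
    }
};
```

Writing $34 to $FFFF and $12 to $0000 and then calling `read16(0xFFFF)` should give $1234 on such a host, with no cast or branch in the read path.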

2023-12-12 06:23

Repose
Account closed

Registered: Oct 2010
Posts: 227

I solved the issue by casting the value+1 to a uint16, which should force it to wrap around. There are a lot of tricky rules in C and I don't remember them off the top of my head unless I'm working a lot in it.
So basically it's computing 65536 in a larger precision and then converting that to a lower precision, which should just truncate it. There are other ways to express that. But I'm not clear why that's even necessary, as the value was already uint16, unless it gave me an overflow error.
So, I think I'm using casting to get the bigger value then coercion to reduce it again, all in one step.
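A small sketch of why the cast matters, assuming a platform where int is wider than 16 bits (the case on mainstream desktop compilers): in `addr + 1` the uint16_t operand is promoted to int before the addition, so $FFFF + 1 evaluates to 65536 instead of wrapping, and the cast back to uint16_t truncates it to 0. The function names are made up for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: next address with 16-bit wraparound.
// addr is promoted to int, the sum is computed as an int,
// and the cast truncates the result back to 16 bits.
uint16_t next_addr_wrapped(uint16_t addr) {
    return (uint16_t) (addr + 1);
}

// Same arithmetic without the cast: the result keeps the promoted int value.
int next_addr_promoted(uint16_t addr) {
    return addr + 1;
}
```

With a 65536-entry array, indexing with the uncast `addr + 1` at address $FFFF would read element 65536, one past the end, which is why the cast is needed inside the array subscript.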

2023-12-12 16:15

Sasq

Registered: Apr 2004
Posts: 157

My code is also separated into functions, but they are constexpr/templated so they can all be resolved and inlined at compile time:

    template <enum Reg REG, enum Mode MODE>
    static constexpr void Load(Machine& m)
    {
        m.Reg<REG>() = m.LoadEA<MODE>();
        m.set<SZ>(m.Reg<REG>());
    }

{ "lda", {
    { 0xa9, 2, Mode::IMM, Load<Reg::A, Mode::IMM>},
    { 0xa5, 3, Mode::ZP, Load<Reg::A, Mode::ZP>},
    { 0xb5, 4, Mode::ZPX, Load<Reg::A, Mode::ZPX>},
    { 0xad, 4, Mode::ABS, Load<Reg::A, Mode::ABS>},
    { 0xbd, 4, Mode::ABSX, Load<Reg::A, Mode::ABSX>},
    { 0xb9, 4, Mode::ABSY, Load<Reg::A, Mode::ABSY>},
    { 0xa1, 6, Mode::INDX, Load<Reg::A, Mode::INDX>},
    { 0xb1, 5, Mode::INDY, Load<Reg::A, Mode::INDY>},
} },

etc...
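For readers who want something that compiles on its own, here is a cut-down sketch of the same pattern. It is not the actual emulator: `Machine`, `reg`, `load_ea`, the flag fields and `lda_table` are stand-ins invented for illustration, only two addressing modes are covered, and it needs C++17 for `if constexpr`.

```cpp
#include <array>
#include <cstdint>

enum class Reg { A, X, Y };
enum class Mode { IMM, ZP };

// Hypothetical minimal machine; the real Machine type is not shown in the thread.
struct Machine {
    std::array<uint8_t, 65536> ram{};
    uint16_t pc = 0;
    uint8_t a = 0, x = 0, y = 0;
    bool zero = false, negative = false;

    // Register selection is resolved at compile time.
    template <Reg REG>
    constexpr uint8_t& reg() {
        if constexpr (REG == Reg::A) return a;
        else if constexpr (REG == Reg::X) return x;
        else return y;
    }

    // Fetch the operand byte at pc and resolve it for the given mode.
    template <Mode MODE>
    constexpr uint8_t load_ea() {
        uint8_t operand = ram[pc++];
        if constexpr (MODE == Mode::IMM) return operand;
        else return ram[operand]; // zero page
    }
};

// One template covers every register/mode combination of a load.
template <Reg REG, Mode MODE>
constexpr void load(Machine& m) {
    m.reg<REG>() = m.load_ea<MODE>();
    m.zero = m.reg<REG>() == 0;
    m.negative = (m.reg<REG>() & 0x80) != 0;
}

using OpFunc = void (*)(Machine&);
struct Opcode { uint8_t code; int cycles; Mode mode; OpFunc op; };

// Each table entry points at its own fully specialised instantiation.
constexpr Opcode lda_table[] = {
    {0xa9, 2, Mode::IMM, load<Reg::A, Mode::IMM>},
    {0xa5, 3, Mode::ZP,  load<Reg::A, Mode::ZP>},
};
```

Calling `lda_table[0].op(m)` with pc pointing at the operand byte loads it into `m.a` and updates the two flags; LDX and LDY would reuse the same `load` template with a different `Reg` argument.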

2023-12-12 16:44

Martin Piper

Registered: Nov 2007
Posts: 739

Any decent compiler can inline small functions without needing to use templates.

2023-12-12 16:58

Sasq

Registered: Apr 2004
Posts: 157

> Any decent compiler can inline small functions without needing to use templates.

To a degree. Not nearly enough to make sure all emulated opcodes are completely inlined.

For instance there is constant state in the emulator itself as well as arguments to functions.

Also, without templates you would have to create separate functions for _all_ opcodes, and not (like I do) a single Load() function.

Templates are invaluable for writing fast code like this (unless you want to use a lot of macros).

2023-12-12 17:35

Krill

Registered: Apr 2002
Posts: 3098

Quoting Sasq

Templates are invaluable for writing fast code like this (unless you want to use a lot of macros).

Templates are macros done right. (TM) =)

2023-12-12 17:43

chatGPZ

Registered: Dec 2001
Posts: 11523

That said, it's not necessarily the optimal thing to just inline all and everything (caches etc).

And the CPU core is negligible on a modern box too - other things will eat a LOT more performance :)

2023-12-12 21:21

Bansai

Registered: Feb 2023
Posts: 54

Quoting chatGPZ

And the CPU core is negligible on a modern box too - other things will eat a LOT more performance :)

Completely agreed. Based on other posts, I'm guessing Repose might have been verifying his math algorithms deterministically by iterating through all 2**x inputs with an instruction set simulator/emulator.

2023-12-12 21:36

Repose
Account closed

Registered: Oct 2010
Posts: 227

Yes, exactly! And my tests take up to 4 minutes on an ancient machine. For 2^24 tests and above, it starts to become a problem, but faster code isn't nearly enough to solve it. You have to start being smart and make use of symmetries in the tests to do less work. For example, if the order of the multiplier and multiplicand doesn't affect the timing of a particular algorithm, you can cut your work roughly in half.
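A rough sketch of that symmetry trick, assuming the routine under test really does behave the same for (a, b) and (b, a): only visit pairs with a <= b. `for_each_unordered_pair` and the `check` callback are hypothetical names, not part of the emulator above.

```cpp
#include <cstdint>

// Calls check(a, b) once per unordered pair with 0 <= a <= b < limit
// and returns how many calls were made.
template <typename Check>
uint64_t for_each_unordered_pair(uint32_t limit, Check check) {
    uint64_t calls = 0;
    for (uint32_t a = 0; a < limit; a++) {
        for (uint32_t b = a; b < limit; b++) { // start at a, skipping mirrored pairs
            check(a, b);
            calls++;
        }
    }
    return calls;
}
```

For 8-bit operands (limit 256) this makes 32896 calls instead of 65536, so slightly more than half, because the diagonal a == b is still covered.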

All times are CET.
