[CSDb] - User Forums - 6502 VM running on a 6502

2021-08-03 12:47

Krill

Registered: Apr 2002
Posts: 2839

6502 VM running on a 6502

I wonder what the slowdown for a highly-optimised 6502 VM running on a 6502 (or 6510) would be.

Considering Ultimate64 with its 48 MHz turbo mode, might it be generally possible to execute one guest cycle in 48 host cycles or fewer? =)

Gut feeling says yes, but I haven't yet dabbled with any actual code (on paper or otherwise).

I'm not really considering I/O (chip access, including interrupts) yet; at this point I'm mostly thinking about the basic load/store/branch/arithmetic instructions.

Or maybe such a thing exists already, originally intended for SuperCPU or so? =)
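
As a rough sanity check on the numbers above, the per-instruction budget can be worked out as in the minimal sketch below. The 48 MHz figure is from the post; the guest clock is rounded to exactly 1 MHz as an assumption (a PAL C64 runs slightly slower), and `host_budget` is just an illustrative name.

```python
HOST_HZ = 48_000_000   # Ultimate64 turbo clock, as quoted in the post
GUEST_HZ = 1_000_000   # assumption: guest clock rounded to exactly 1 MHz


def host_budget(guest_cycles: int) -> int:
    """Host cycles available to run one guest instruction at native speed."""
    return guest_cycles * HOST_HZ // GUEST_HZ


# Standard 6502 cycle counts for three example instructions.
for name, cycles in [("LDA #imm", 2), ("STA abs", 4), ("INC abs,X", 7)]:
    print(f"{name:10s} {cycles} guest cycles -> {host_budget(cycles)} host cycles")
```

Under these assumptions the short two-cycle instructions are the tight case: fetch, check and dispatch together would have to fit in 96 host cycles.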

2021-08-03 14:09

tlr

Registered: Sep 2003
Posts: 1714

Just as an exercise or with a specific purpose?

2021-08-03 14:21

Frantic

Registered: Mar 2003
Posts: 1627

I created a 6510 VM running on a 6510 at one point. I was experimenting with Genetic Programming algorithms and wanted to be able to restrict memory access to certain areas and some other restrictions like this.

If I remember correctly I actually executed the real instructions, after checking for some constraints. So I didn't emulate cpu flags and stuff like that, but actually used the real CPU flags and PHP/PLP.

2021-08-03 15:19

Krill

Registered: Apr 2002
Posts: 2839

Quoting tlr

Just as an exercise or with a specific purpose?

Why, specific purpose, of course. =)

Without spilling my actual beans just yet: the supervising VM shall run C-64 code at native 1 MHz speed.
The idea is to avoid crashing the host machine, and to inject code into the guest, with the VM having ultimate control over video output, at least.

Quoting Frantic

So I didn't emulate cpu flags and stuff like that, but actually used the real CPU flags and PHP/PLP.

Yes, as the VM runs another instance of itself (same CPU), it's more about a lot of context switching rather than emulating a foreign instruction set architecture. =)

2021-08-03 15:59

tlr

Registered: Sep 2003
Posts: 1714

Quoting Krill

Quoting tlr
Just as an exercise or with a specific purpose?
Why, specific purpose, of course. =)

Idea is to avoid crashing the host machine and injecting code into the guest with the VM having ultimate control over video output, at least.

If you aren't bothered by I/O, single stepping using an NMI timer to break after a single instruction is a common way. Typically used for single stepping purposes in monitors.
If your machine runs a lot faster than the timers this may not be feasible, but then again if it's a non-standard architecture, perhaps it would be better to implement the hypervisor in the architecture itself?

2021-08-03 16:12

Krill

Registered: Apr 2002
Posts: 2839

Quoting tlr

If you aren't bothered by I/O, single stepping using an NMI timer to break after a single instruction is a common way. Typically used for single stepping purposes in monitors.

I was thinking more in the direction of a pedestrian fetch-sanitise-dispatch approach, as there must be some kind of sandboxing. VM core could restrict itself to not use X or Y registers, in order to minimise context-switch overhead.
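
To make the fetch-sanitise-dispatch idea concrete, here is a minimal Python sketch of the control flow only. It is not Krill's design: the `Guest` class, the `writable` range policy and the five-opcode subset are all assumptions for illustration, flags are left out, and a real 6502 implementation would look very different (for instance keeping the guest's X and Y in the real registers, as suggested above).

```python
class SandboxViolation(Exception):
    """Raised when the guest tries something the policy forbids."""


class Guest:
    def __init__(self, mem: bytearray, pc: int, writable: range):
        self.mem = mem            # full 64 KB guest address space
        self.pc = pc              # guest program counter
        self.a = 0                # guest accumulator (flags omitted in this sketch)
        self.writable = writable  # hypothetical policy: stores allowed only here
        # Dispatch table: opcode -> handler. Unlisted opcodes are rejected.
        self.handlers = {
            0xA9: self.lda_imm,
            0x8D: self.sta_abs,
            0x4C: self.jmp_abs,
            0xEA: self.nop,
        }

    def fetch(self) -> int:
        byte = self.mem[self.pc]
        self.pc = (self.pc + 1) & 0xFFFF
        return byte

    def fetch_word(self) -> int:
        lo = self.fetch()
        return lo | (self.fetch() << 8)

    def lda_imm(self) -> None:
        self.a = self.fetch()

    def sta_abs(self) -> None:
        addr = self.fetch_word()
        if addr not in self.writable:      # the "sanitise" step
            raise SandboxViolation(f"store to ${addr:04X} blocked")
        self.mem[addr] = self.a

    def jmp_abs(self) -> None:
        self.pc = self.fetch_word()

    def nop(self) -> None:
        pass

    def step(self) -> bool:
        """Run one guest instruction; return False when the guest hits BRK."""
        opcode = self.fetch()
        if opcode == 0x00:                 # BRK simply ends the run here
            return False
        handler = self.handlers.get(opcode)
        if handler is None:
            raise SandboxViolation(f"unsupported opcode ${opcode:02X}")
        handler()
        return True


# Guest program: LDA #$01 / STA $2000 / LDA #$02 / STA $D020 / BRK
mem = bytearray(0x10000)
mem[0x1000:0x100B] = bytes([0xA9, 0x01, 0x8D, 0x00, 0x20,
                            0xA9, 0x02, 0x8D, 0x20, 0xD0, 0x00])
guest = Guest(mem, pc=0x1000, writable=range(0x2000, 0x3000))
try:
    while guest.step():
        pass
except SandboxViolation as err:
    print("stopped:", err)
```

In this run the first store goes through and the second one, to $D020, is refused because it falls outside the permitted range.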

Quoting tlr

perhaps it would be better to implement the hypervisor in the architecture itself?

Indeed, that would be the clean and best option, if it were one. Would solve a lot of problems and actually provide proper sandboxing with separate register files, real memory protection and all, and the context switching done in hardware.
Not to speak of providing the original per-cycle behaviour on the 1 MHz grid minus DMA, such as double-writes with RMW instructions really being 8 pixels apart. :)

2021-08-03 17:36

tlr

Registered: Sep 2003
Posts: 1714

Quoting Krill

Quoting tlr
If you aren't bothered by I/O, single stepping using an NMI timer to break after a single instruction is a common way. Typically used for single stepping purposes in monitors.
I was thinking more in the direction of a pedestrian fetch-sanitise-dispatch approach, as there must be some kind of sandboxing. VM core could restrict itself to not use X or Y registers, in order to minimise context-switch overhead.

With the timer approach you could do that too but let opcodes that are safe just run, e.g. LDA #<imm> and so on. Just peek at what's about to run.

2021-08-03 17:47

Krill

Registered: Apr 2002
Posts: 2839

Quoting tlr

With the timer approach you could do that too but let opcodes that are safe just run, e.g. LDA #<imm> and so on. Just peek at what's about to run.

Sure, but if the opcode needs to be analysed anyways, the two unconditional interrupt context switches are unnecessary for most of the cases.
Besides, is the periphery including CIAs and their timers sped up in turbo mode as well? I'd guess not, and then having an interrupt for each 1 MHz cycle is out of the question anyways.

2021-08-03 18:07

tlr

Registered: Sep 2003
Posts: 1714

Quoting Krill

Quoting tlr
With the timer approach you could do that too but let opcodes that are safe just run, e.g. LDA #<imm> and so on. Just peek at what's about to run.
Sure, but if the opcode needs to be analysed anyways, the two unconditional interrupt context switches are unnecessary for most of the cases.

You could opt for scanning the code forward until an unsafe op is found. Then either use an interrupt to break there or place an RTS there. The latter might interfere with self-modifying stuff, though.
I haven't done any research on how long the safe op sequences typically are, but there should be some at least. The scanning should be reasonably fast.

Quoting Krill

Besides, is the periphery including CIAs and their timers sped up in turbo mode as well? I'd guess not, and then having an interrupt for each 1 MHz cycle is out of the question anyways.

Probably not, because that'd break things like running the screen editor.

Isn't the Ultimate64 open source like the 1541U2?
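
The forward scan suggested earlier in this post could look roughly like the sketch below. The `SAFE_LENGTHS` table is a deliberately tiny, hand-picked subset of register-only opcodes (an assumption for illustration); anything not listed, including all stores, branches and jumps, is treated as unsafe. `scan_safe_run` is a hypothetical name.

```python
# Opcode -> instruction length in bytes, for a few register-only instructions.
SAFE_LENGTHS = {
    0xA9: 2,  # LDA #imm
    0xA2: 2,  # LDX #imm
    0xA0: 2,  # LDY #imm
    0x69: 2,  # ADC #imm
    0xAA: 1,  # TAX
    0xE8: 1,  # INX
    0xC8: 1,  # INY
    0x18: 1,  # CLC
    0xEA: 1,  # NOP
}


def scan_safe_run(mem: bytes, start: int) -> int:
    """Return the address of the first opcode after `start` not known to be safe."""
    pc = start
    while pc < len(mem) and mem[pc] in SAFE_LENGTHS:
        pc += SAFE_LENGTHS[mem[pc]]
    return min(pc, len(mem))


# LDA #$00 / TAX / INX / STA $D020
code = bytes([0xA9, 0x00, 0xAA, 0xE8, 0x8D, 0x20, 0xD0])
stop = scan_safe_run(code, 0)
print(f"safe run ends at offset {stop}")
```

The returned address is where a break (interrupt or patched-in RTS) would go. As noted in the post, self-modifying code can invalidate the result of such a scan after the fact.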

2021-08-03 20:04

Hoogo

Registered: Jun 2002
Posts: 102

Did something like that in '91 to create memory maps of used or addressed locations, also just for simple emulation without caring for IRQs and other hardware stuff. For that purpose, speed was somewhere between 1/17 and 1/65 with all the bitmapping to store the found results.

I don't remember the details, I think it was a table of 256 bytes to handle the special cases, and the general cases to handle commands of 1-3 bytes, their addressing modes, and restoring all registers. I'm pretty sure that this can be done faster.
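
For readers wondering what "bitmapping" the touched locations might involve: one bit per address covers the whole 64 KB space in 8 KB. The sketch below is only an illustration of that idea, not a reconstruction of the 1991 code; `AccessMap` and its methods are made-up names.

```python
class AccessMap:
    """One bit per address of a 64 KB address space (8192 bytes in total)."""

    def __init__(self) -> None:
        self.bits = bytearray(0x10000 // 8)

    def mark(self, addr: int) -> None:
        """Record that `addr` was used or addressed."""
        self.bits[addr >> 3] |= 1 << (addr & 7)

    def seen(self, addr: int) -> bool:
        return bool(self.bits[addr >> 3] & (1 << (addr & 7)))

    def ranges(self):
        """Yield (first, last) pairs for each contiguous run of marked addresses."""
        start = None
        for addr in range(0x10000):
            if self.seen(addr):
                if start is None:
                    start = addr
            elif start is not None:
                yield (start, addr - 1)
                start = None
        if start is not None:
            yield (start, 0xFFFF)


amap = AccessMap()
for addr in (0xD020, 0xD021, 0x0400):
    amap.mark(addr)
for first, last in amap.ranges():
    print(f"${first:04X}-${last:04X}")
```

An emulation loop would call `mark` for every operand address it resolves, which is the kind of per-instruction bookkeeping that adds to the slowdown described above.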

2021-08-03 22:05

chatGPZ

Registered: Dec 2001
Posts: 11111

Quote: