| |
Krill
Registered: Apr 2002 Posts: 2825 |
6502 VM running on a 6502
I wonder what the slowdown for a highly-optimised 6502 VM running on a 6502 (or 6510) would be.
Considering Ultimate64 with its 48 MHz turbo mode, might it be generally possible to execute one guest cycle in 48 host cycles or fewer? =)
Gut feeling says yes, but i haven't yet dabbled with any actual code (on paper or otherwise).
I'm not really considering I/O (chip access, including interrupts) yet; at this point i'm mostly thinking about the basic load/store/branch/arith instructions.
Or maybe such a thing exists already, originally intended for SuperCPU or so? =) |
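For reference, the budget implied by the question can be worked out per instruction. This is only a rough sketch: it assumes the 48 MHz figure is a sustained host clock and a guest clock of exactly 1 MHz, uses the documented base cycle counts of a handful of 6502 instructions, and the names `HOST_HZ`, `GUEST_HZ` and `host_budget` are made up for illustration.

```python
HOST_HZ = 48_000_000   # Ultimate64 turbo clock, as quoted in the post (assumed sustained)
GUEST_HZ = 1_000_000   # nominal guest clock (a real C-64 runs slightly off 1 MHz)

# Documented base cycle counts of a few 6502 instructions (no page-cross penalties).
GUEST_CYCLES = {
    "LDA #imm": 2,
    "LDA abs": 4,
    "STA abs": 4,
    "JMP abs": 3,
    "INC abs": 6,
}

def host_budget(guest_cycles: int) -> int:
    """Host cycles available to emulate an instruction taking `guest_cycles`."""
    return guest_cycles * HOST_HZ // GUEST_HZ

for name, cycles in GUEST_CYCLES.items():
    print(f"{name:9s} {cycles} guest cycles -> {host_budget(cycles)} host cycles")
```

So even the shortest instructions (2 guest cycles) leave 96 host cycles for fetch, checks and execution, provided the budget is counted per instruction rather than per cycle.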
|
| |
tlr
Registered: Sep 2003 Posts: 1703 |
Just as an exercise or with a specific purpose? |
| |
Frantic
Registered: Mar 2003 Posts: 1627 |
I created a 6510 VM running on a 6510 at one point. I was experimenting with Genetic Programming algorithms and wanted to be able to restrict memory access to certain areas and some other restrictions like this.
If I remember correctly I actually executed the real instructions, after checking for some constraints. So I didn't emulate cpu flags and stuff like that, but actually used the real CPU flags and PHP/PLP. |
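The "check constraints, then run the real instruction" idea could look roughly like the following in a high-level model. This is not Frantic's code: the allowed ranges, `ALLOWED_WRITE`, `write_allowed` and `check_instruction` are all hypothetical, and only one opcode (STA absolute, `$8D`) is actually inspected.

```python
# Hypothetical sandbox policy: the guest may only write inside these inclusive ranges.
ALLOWED_WRITE = [(0x0400, 0x07FF), (0x2000, 0x3FFF)]

STA_ABS = 0x8D  # documented 6502 opcode for STA absolute

def write_allowed(addr: int) -> bool:
    """True if addr falls inside one of the permitted write ranges."""
    return any(lo <= addr <= hi for lo, hi in ALLOWED_WRITE)

def check_instruction(mem, pc: int) -> bool:
    """Return True if the instruction at pc may be run natively."""
    opcode = mem[pc]
    if opcode == STA_ABS:
        target = mem[pc + 1] | (mem[pc + 2] << 8)  # little-endian operand
        return write_allowed(target)
    return True  # sketch only: every other opcode is waved through unchecked
```

A real checker would need a case for every addressing mode that can write, including indexed and indirect ones whose target depends on register contents at run time.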
| |
Krill
Registered: Apr 2002 Posts: 2825 |
Quoting tlr: Just as an exercise or with a specific purpose?
Why, specific purpose, of course. =)
Without spilling my actual beans just yet: the supervising VM shall run C-64 code at native 1 MHz speed.
The idea is to avoid crashing the host machine and to allow injecting code into the guest, with the VM having ultimate control over video output, at least.
Quoting Frantic: So I didn't emulate cpu flags and stuff like that, but actually used the real CPU flags and PHP/PLP.
Yes, as the VM runs another instance of itself (same CPU), it's more about a lot of context switching rather than emulating a foreign instruction set architecture. =) |
| |
tlr
Registered: Sep 2003 Posts: 1703 |
Quoting Krill:
Quoting tlr: Just as an exercise or with a specific purpose?
Why, specific purpose, of course. =)
The idea is to avoid crashing the host machine and to allow injecting code into the guest, with the VM having ultimate control over video output, at least.
If you aren't bothered by I/O, single stepping using an NMI timer to break after a single instruction is a common way. Typically used for single stepping purposes in monitors.
If your machine runs a lot faster than the timers this may not be feasible, but then again if it's a non-standard architecture, perhaps it would be better to implement the hypervisor in the architecture itself? |
| |
Krill
Registered: Apr 2002 Posts: 2825 |
Quoting tlr: If you aren't bothered by I/O, single stepping using an NMI timer to break after a single instruction is a common way. Typically used for single stepping purposes in monitors.
I was thinking more in the direction of a pedestrian fetch-sanitise-dispatch approach, as there must be some kind of sandboxing. VM core could restrict itself to not use X or Y registers, in order to minimise context-switch overhead.
Quoting tlr: perhaps it would be better to implement the hypervisor in the architecture itself?
Indeed, that would be the clean and best option, if it were one. Would solve a lot of problems and actually provide proper sandboxing with separate register files, real memory protection and all, and the context switching done in hardware.
Not to speak of providing the original per-cycle behaviour on the 1 MHz grid minus DMA, such as double-writes with RMW instructions really being 8 pixels apart. :) |
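To make the fetch-sanitise-dispatch idea concrete, here is a toy model of such a loop. It is a sketch under heavy assumptions, not the VM discussed in the thread: only three opcodes are handled, CPU flags are not modelled at all, and `Guest`, `Trap`, `PROTECTED` and `run` are names invented for this example.

```python
class Trap(Exception):
    """Raised when the guest attempts something the sandbox forbids."""

class Guest:
    def __init__(self, mem: bytearray, pc: int):
        self.mem, self.pc = mem, pc
        self.a = 0        # accumulator (flags are deliberately not modelled)
        self.cycles = 0   # guest cycles consumed so far

PROTECTED = range(0xD000, 0xE000)  # assumption: guest writes to I/O space are trapped

def op_lda_imm(g):                 # $A9, 2 cycles
    g.a = g.mem[g.pc + 1]
    g.pc += 2
    g.cycles += 2

def op_sta_abs(g):                 # $8D, 4 cycles
    addr = g.mem[g.pc + 1] | (g.mem[g.pc + 2] << 8)
    if addr in PROTECTED:          # sanitise before touching memory
        raise Trap(f"write to ${addr:04X}")
    g.mem[addr] = g.a
    g.pc += 3
    g.cycles += 4

def op_jmp_abs(g):                 # $4C, 3 cycles
    g.pc = g.mem[g.pc + 1] | (g.mem[g.pc + 2] << 8)
    g.cycles += 3

DISPATCH = {0xA9: op_lda_imm, 0x8D: op_sta_abs, 0x4C: op_jmp_abs}

def run(g, max_instructions):
    for _ in range(max_instructions):
        handler = DISPATCH.get(g.mem[g.pc])   # fetch
        if handler is None:
            raise Trap(f"unhandled opcode ${g.mem[g.pc]:02X}")
        handler(g)                            # sanitise + execute
```

The `cycles` counter is where a real implementation would pad each handler out to its host-cycle budget so that the guest keeps its 1 MHz pace.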
| |
tlr
Registered: Sep 2003 Posts: 1703 |
Quoting Krill:
Quoting tlr: If you aren't bothered by I/O, single stepping using an NMI timer to break after a single instruction is a common way. Typically used for single stepping purposes in monitors.
I was thinking more in the direction of a pedestrian fetch-sanitise-dispatch approach, as there must be some kind of sandboxing. VM core could restrict itself to not use X or Y registers, in order to minimise context-switch overhead.
With the timer approach you could do that too, but let opcodes that are safe just run, e.g. LDA #<imm> and so on. Just peek at what's about to run. |
| |
Krill
Registered: Apr 2002 Posts: 2825 |
Quoting tlr: With the timer approach you could do that too but let opcodes that are safe just run, e.g LDA #<imm> and so on. Just peek at what's about to run.
Sure, but if the opcode needs to be analysed anyways, the two unconditional interrupt context switches are unnecessary for most of the cases.
Besides, is the periphery including CIAs and their timers sped up in turbo mode as well? I'd guess not, and then having an interrupt for each 1 MHz cycle is out of the question anyways. |
| |
tlr
Registered: Sep 2003 Posts: 1703 |
Quoting Krill:
Quoting tlr: With the timer approach you could do that too but let opcodes that are safe just run, e.g LDA #<imm> and so on. Just peek at what's about to run.
Sure, but if the opcode needs to be analysed anyways, the two unconditional interrupt context switches are unnecessary for most of the cases.
You could opt for scanning the code forward until an unsafe op is found. Then either use an interrupt to break there or place an RTS there. The latter might interfere with self-modifying stuff though.
Haven't done any research on how long the runs of safe ops tend to be, but there should be some at least. The scanning should be reasonably fast.
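The forward scan could be modelled like this. It is a sketch only: `SAFE_LENGTH` and `scan_safe_run` are hypothetical names, and the table lists just a few register-only instructions with their documented opcodes and lengths; anything not listed is treated as unsafe.

```python
# opcode -> instruction length in bytes, for a few instructions that touch
# only registers and flags (documented 6502 opcodes)
SAFE_LENGTH = {
    0xA9: 2,  # LDA #imm
    0xA2: 2,  # LDX #imm
    0xA0: 2,  # LDY #imm
    0x69: 2,  # ADC #imm
    0xAA: 1,  # TAX
    0xE8: 1,  # INX
    0xCA: 1,  # DEX
    0x18: 1,  # CLC
    0xEA: 1,  # NOP
}

def scan_safe_run(mem, pc: int) -> int:
    """Return the address of the first instruction that is not in SAFE_LENGTH."""
    while pc < len(mem) and mem[pc] in SAFE_LENGTH:
        pc += SAFE_LENGTH[mem[pc]]
    return pc
```

The returned address is where a breakpoint (interrupt or patched-in RTS) would go. Note that the scan only follows straight-line code and cannot see later self-modification of the bytes it has already classified.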
Quoting Krill: Besides, is the periphery including CIAs and their timers sped up in turbo mode as well? I'd guess not, and then having an interrupt for each 1 MHz cycle is out of the question anyways.
Probably not, because that'd break things like running the screen editor.
Isn't the Ultimate64 open source like the 1541U2? |
| |
Hoogo
Registered: Jun 2002 Posts: 102 |
Did something like that in '91 to create memory maps of used or addressed locations, also just for simple emulation without caring for IRQs and other hardware stuff. For that purpose, speed was somewhere between 1/17 and 1/65 with all the bitmapping to store the found results.
I don't remember the details, I think it was a table of 256 bytes to handle the special cases, and the general cases to handle commands of 1-3 bytes, their addressing modes, and restoring all registers. I'm pretty sure that this can be done faster. |
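A memory map of touched locations of the kind described here can be kept in one bit per address. The following is just an illustration of the bookkeeping, not a reconstruction of the 1991 tool; `AccessMap`, `mark` and `seen` are invented names.

```python
class AccessMap:
    """One bit per address of the 64 KiB space, i.e. 8 KiB of bitmap."""

    def __init__(self):
        self.bits = bytearray(65536 // 8)

    def mark(self, addr: int) -> None:
        # byte index is addr / 8, bit index is addr mod 8
        self.bits[addr >> 3] |= 1 << (addr & 7)

    def seen(self, addr: int) -> bool:
        return bool(self.bits[addr >> 3] & (1 << (addr & 7)))
```

On a 6502 the shift-and-mask per access is comparatively expensive, which fits the reported slowdown being dominated by the bitmapping.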
| |
chatGPZ
Registered: Dec 2001 Posts: 11100 |
Quote: Isn't the Ultimate64 open source
only the application software, not the core itself |
| |
Krill
Registered: Apr 2002 Posts: 2825 |
Quoting tlr: You could opt for scanning the code forward until an unsafe op is found.
Scanning doesn't work so well with the requirement to execute an instruction in exactly the same time it would take on the original 1 MHz system, though. And it's rather likely that there are runs of many unsafe instructions, so nothing gained. In this regard, it's closer to an emulator than some JIT/dynamic recompilation kind of VM.
Quoting tlr: Isn't the Ultimate64 open source like the 1541U2?
What Groepaz said, but what are you insinuating? :)
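One reason exact timing resists pre-scanning is that some cycle counts are only known at execution time. Conditional branches are the standard example: 2 cycles if not taken, 3 if taken, 4 if taken across a page boundary. A small sketch of that documented rule, with `branch_cycles` being a name made up here:

```python
def branch_cycles(pc: int, offset: int, taken: bool) -> int:
    """Cycle count of a 6502 conditional branch located at pc.

    offset is the raw operand byte (0..255), interpreted as signed.
    """
    if not taken:
        return 2
    next_pc = (pc + 2) & 0xFFFF                      # address after the branch
    rel = offset - 256 if offset >= 0x80 else offset  # sign-extend the operand
    target = (next_pc + rel) & 0xFFFF
    # one extra cycle if the target lies on a different page than next_pc
    return 4 if (target & 0xFF00) != (next_pc & 0xFF00) else 3
```

Whether the branch is taken depends on flags produced by earlier guest instructions, so the VM can only settle the time budget while dispatching, not ahead of time.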
Some answers to these questions could be found by testing, but i have no plans whatsoever to add/fix/change anything on the firmware. Just figured that U64 would be the best platform currently for something i'd like to see used. =) |
| |
tlr
Registered: Sep 2003 Posts: 1703 |
Quoting Krill:
Quoting tlr: Isn't the Ultimate64 open source like the 1541U2?
What Groepaz said, but what are you insinuating? :)
Some answers to these questions could be found by testing, but i have no plans whatsoever to add/fix/change anything on the firmware. Just figured that U64 would be the best platform currently for something i'd like to see used. =)
I'm suggesting implementing the hypervisor/sandboxing in the FPGA fabric instead of running it in 6502 assembly. This is probably more straightforward, with no tricks required to keep timing. But with no source, not really feasible.
Doing it in 6502 asm sounds like a fun project though. :) |
| |
Krill
Registered: Apr 2002 Posts: 2825 |
Quoting tlr: Doing it in 6502 asm sound like a fun project though. :)
Yeah, but the limitations are obvious. Being able to run any original C-64 software would be a side-goal. :)
And with virtualisation in hardware, one could also have neat things like shifting processing slices between hypervisor and guest at (the hypervisor's) will, or having an overlay menu/monitor/editor running in the hypervisor. |
| |
Oswald
Registered: Apr 2002 Posts: 5017 |
hmm, and no one is interested in why? I am :) |
| |
Krill
Registered: Apr 2002 Posts: 2825 |
Quoting Oswald: hmm and no one interested why ? I am :)
Ok, time to revive this thread. =D
What i was getting at was some kind of "Byte Battle"-like setup, which in turn is a... vintage computing... variant of "Shader Showdown", but with some Lua-based fantasy console thing.
I think that something like that running on a U64 would be close enough to the original machine to defy the "fantasy console" label while still allowing for producing code able to run on the stock machine, in a very time-constrained live-coding show, but with a neat editor/compiler thingy running the code as it's typed.
(Note that whatever language/compiler setup on top of the VM is an open question, could be some beefed-up BASIC, could be some macro-enhanced 6502 assembly, or Turbo Rascal, maybe even Lua itself, whatever.) |