Virtualizing uxn
Fri 11 Aug 2023 18:07 CEST
uxn is a delightful virtual-machine specification (with lots of implementations) supporting 16-bit operation.
At present, it is a “real mode” machine, and there is no support for running virtual uxn machines under an uxn hypervisor.
It’d be neat to have a virtualizable uxn system: you could write a full operating system, with memory protection and multi-tasking, entirely in uxntal, and existing ROMs wouldn’t even know the difference! Or you could write a debugger that made good use of the native uxn core where possible. You could serialize a program, suspending it to be resumed later, or transfer it across the network to a different host, all without having to modify the base uxn emulator.
So: Can we make uxn virtualizable, so that an uxn hypervisor can manage one or more child virtual machines, each running a program presented with a seamless illusion that it’s the only software on the machine? Let’s find out!
What is virtualizability?
In 1974, Gerald J. Popek and Robert P. Goldberg wrote a seminal paper laying out a simple and robust approach to determining whether a particular machine architecture is virtualizable. Quoting from the paper’s wikipage,
There are three properties of interest when analyzing the environment created by a VMM [a Virtual Machine Monitor, a.k.a. a Hypervisor]:
Equivalence / Fidelity. A program running under the VMM should exhibit a behavior essentially identical to that demonstrated when running on an equivalent machine directly.
Resource control / Safety. The VMM must be in complete control of the virtualized resources.
Efficiency / Performance. A statistically dominant fraction of machine instructions must be executed without VMM intervention.
Is real-mode uxn virtualizable?
Uxn as specified doesn’t have a supervisor-mode/user-mode split, so we will have to adapt Popek and Goldberg’s development.
Sometimes the uxn machine takes a trap, but its behavior is defined to transition to a metamachine, not to some supervisor mode. In case of errors, the metamachine will often reactivate the machine at some error handler vector, but again, it’s in normal operation mode, not some supervisor mode. This means that there are no privileged instructions in uxn.
There are no base-and-bounds registers, and the range of pointers exactly covers accessible RAM
(64k!), so there are no out-of-bounds RAM accesses possible. However, it is possible to
underflow or overflow the stack, which falls in the spirit of Popek and Goldberg’s
memorytrap
. When this happens, the metamachine takes over. BRK
and DIV
are the only
instructions which trap or fault otherwise than memorytrap
.
There is no way to change the amount of available memory, and no mode changes possible, so there are no control sensitive instructions.
While there is no memory virtualization in uxn, consideration of behavior sensitive instructions requires us to posit an address translation facility. But we can easily make sure this doesn’t fall foul of Popek and Goldberg’s first (relocation) criterion of behavior sensitivity; and as there are no modes, their second criterion fails also. So there are no behavior sensitive instructions.
Thus, because all of the (nonexistent) control- and/or behavior-sensitive instructions fall in the (empty) set of privileged instructions, we can rely on Popek and Goldberg’s theorem 1, and say that construction of a VMM for uxn must be feasible. In fact, it’s almost there as it stands.
Reaching out further, to recursive virtualizability, is in my opinion the simplest path to virtualization. Execution of uxn is deterministic, given the devices in play today! Thus, by Popek and Goldberg’s theorem 2, uxn must be able to be recursively virtualizable.
Virtualized uxn
Let’s explore one possible approach to actually virtualizing uxn.
At the foundation of a virtualization tree lies a real, physical machine, with real RAM and real I/O devices. That physical base machine has a physical RAM pointer. Every virtual machine in the tree implicitly accesses the physical RAM.
Each virtual machine also has private registers, and private memory. Recall we’re considering recursive virtualization, where the program a hypervisor supervises may in turn be a hypervisor.
- Registers
- 32-bit base, relative to its parent’s memory allocation
- 32-bit bound (length of accessible memory)
- 16-bit PC
- a trapCode register
- Memory, all of which MUST be disjoint from the allowed base-and-bound region of RAM
- 256 bytes of device memory
- two 255-byte stacks, plus two 1-byte stack pointers
- two 32-byte device mask bitsets and one 16-entry × 16-bits device version vector
- a 16-byte trap description area
The base and bound are new. Machines do not have access to them; they are only able to access
the registers of “child” VMs. An exception is made to allow a VM to read its own bound
register. It does this via the system device “expansion” port, using an operation getBound
,
described below.
The outermost VM has a base of 0 and a bound of however large the physical RAM is.
All memory-accessing operations (including loading an instruction) take the base and bound into account. Accesses falling outside physical RAM or outside the base-and-bound cause a memory fault.
The device mask bitsets still control device effects, but instead of calling a DEI
/DEO
handler routine as a subroutine of the VM’s evaluator, the evaluator traps, yielding control to
the parent VM, which performs the requested action.
Every time a VM stops execution, its trapCode and trap description area are updated to describe the cause of the stop.
System device expansion operations
Adding the following operations is sufficient to allow writing a hypervisor in uxntal:
name | operation | fields |
---|---|---|
getBound | #10 | &boundHi $2 &boundLo $2 |
vmExec | #11 | &vmAddr $2 |
getBound
The handler retrieves the 32-bit active bound on memory access, overwriting the four bytes following the operation code.
vmExec
The handler expects a VM Control Block (VMCB) to exist at vmAddr
:
field name | offset | size (bytes) | description |
---|---|---|---|
parentLink | 0 | 4 | scratch space for control transfer |
baseHi | 4 | 2 | high short of base register |
baseLo | 6 | 2 | low short of base register |
boundHi | 8 | 2 | high short of bound register |
boundLo | 10 | 2 | low short of bound register |
pc | 12 | 2 | program counter |
trapCode | 14 | 2 | trap code |
trapDescription | 16 | 16 | trap detail record |
deiMask | 32 | 32 | device input mask bitset |
deoMask | 64 | 32 | device output mask bitset |
devVers | 96 | 32 | device version vector |
(reserved) | 128 | 128 | – |
workStack | 256 | 255 | work stack base |
workStackPtr | 511 | 1 | work stack pointer |
returnStack | 512 | 255 | return stack base |
returnStackPtr | 767 | 1 | return stack pointer |
devMemory | 768 | 256 | device memory |
1024 |
The operation performs a context switch to the described VM. In the following, next
denotes
the VMCB pointed to by vmAddr
, and curr
the running VM’s VMCB.
The absolute base absBase
is computed from the running VM’s base register and the base
register in next
. If absBase
or absBase
plus the next
’s bound lies outside the curr
’s
base-and-bound, curr
takes a memory fault. If any part of the next
VMCB overlaps with the
region addressed by absBase
and next
’s bound, curr
takes a memory fault.1
If these checks all pass, curr
’s VMCB address is placed in parentLink
(it will not be
visible to any VM; see below) and execution resumes in next
.
Traps and faults
When a running VM takes a trap or a fault, it transfers control to its parent VM. The parent
VM’s address is retrieved from the parentLink
field of the running VMCB (curr
). If it is 0,
there is no parent, and this is the outermost VM; the pre-virtualization behavior applies.
Otherwise, curr
’s trapCode
and trapDescription
are set appropriately, its parentLink
field is cleared (ensuring that the VM cannot learn anything about its absolute location), and
the parent VM is resumed.
Modified instructions
The effect of BRK
is altered: it takes a trap with trapCode
of BRK
and no description.
The effects of DEI
and DEO
are also altered. Instead of invoking a dei
/deo
function
inside the evaluator, the machine takes a trap with trapCode
set to IO
. The suspended VM’s
PC is left pointing after the DEI
/DEO
instruction; the arguments to DEI
/DEO
are
included in the trapDescription
.
Memory and stack faults
When an out-of-bounds access occurs, the stack underflows, or the stack overflows, the machine
takes a fault with an appropriate trapCode
, and with trapDescription
including details of
the fault. The machine’s PC and stack are left alone so that if the machine is immediately
restarted, it will precisely repeat the fault. This could be used to e.g. provide a (small)
virtual memory, or an (unbounded) virtual stack.
Implementation
I have implemented emulator support for virtualization, as sketched above,
here. I’ve also written a simple hypervisor
called xun
(like Xen, see?). To see
the code, check out the virtuxn3
branch of the https://gitlab.com/tonyg/uxn
git repository.
The implementation is done in two stages:
-
the first stage (diffs from
virtuxn3-base
tovirtuxn3-novirt
) splits out “MMU” and IO “processors” into a “system-on-chip”,uxnsoc.[ch]
, updating the various core implementations to use the slightly-different interface to memory and device IO. -
the second stage (diffs from
virtuxn3-novirt
tovirtuxn3
) is simpler, providing an alternative compatible “system-on-chip” implementation supporting the new system extension commands and providing the necessary machinery to swap in and out of nested VMs as required.
The hypervisor, xun
, is a little
program that is currently hardcoded to load a particular ROM, bin/xuninner.rom
. It cannot yet
recursively run itself running another program, and lacks configurability. It also doesn’t do
anything about device multiplexing, like a real multi-tasking operating system would have to
do. But the basics are there, enough to show it working: it can, for example, run piano.rom
without problems.
Differences between direct and virtualized execution
Because the devices take virtual addresses, rather than physical addresses, a supervisor
has to copy down media such as sprites and audio samples before asking the hardware to use
them. This leads to noticeable differences in behavior: running piano.rom
, sounding a note,
and then scribbling on the waveform causes the playing note to change its timbre. When running
the same program under xun
, however, the note stays the same; the edit to the timbre doesn’t
take effect until the next note is played.
One change that could ameliorate this would be to make the various addresses used in device memory into 32-bit physical addresses (16 bit page / 16-bit offset). That way, translated addresses from nested VMs could be used without having to copy them. (A second benefit would be the ability for ROMs to use media outside the bottom 64k without having to copy them first.)
Next steps
It’d be nice to adapt the devices to support 32-bit addresses for sprite and audio sample data
etc. It can be done backward-compatibly; I need to write down the approach I have in mind.
There’s also a bit of implementation work remaining in xun
to allow full recursive
virtualization; it’s, in some sense, a Trivial Matter of Programming, but the details are
actually a little more fiddly than they have to be and perhaps some changes to the device
interfaces could make things easier for hypervisors without complicating life for simple
real-mode ROMs. There’s nascent support in baseline uxn
for self-describing devices: I’d like
to see this fleshed out a bit so that nested VMs could reliably discover the features made
available to them by their hypervisor. Ultimately, it’d be amazing to build something a little
System-7 like using virtualized uxn for running programs. The nested VM interface could even be
widened to include a “fuel counter” of instructions before a “fuel exhausted” trap, allowing
preemptive multitasking.
The end
All in all, uxn virtualizes very nicely. Writing a hypervisor entirely in uxntal works well with minimal extensions to the VM specification. I’m looking forward to seeing how things develop!
-
A corollary of this is that a nested VM’s addressable memory must always be smaller (at least by the size of one VMCB) than its parent’s addressable memory. ↩