Virtualizing uxn

uxn is a delightful virtual-machine specification (with lots of implementations) supporting 16-bit operation.

At present, it is a “real mode” machine, and there is no support for running virtual uxn machines under an uxn hypervisor.

It’d be neat to have a virtualizable uxn system: you could write a full operating system, with memory protection and multi-tasking, entirely in uxntal, and existing ROMs wouldn’t even know the difference! Or you could write a debugger that made good use of the native uxn core where possible. You could serialize a program, suspending it to be resumed later, or transfer it across the network to a different host, all without having to modify the base uxn emulator.

So: Can we make uxn virtualizable, so that an uxn hypervisor can manage one or more child virtual machines, each running a program presented with a seamless illusion that it’s the only software on the machine? Let’s find out!

What is virtualizability?

In 1974, Gerald J. Popek and Robert P. Goldberg wrote a seminal paper laying out a simple and robust approach to determining whether a particular machine architecture is virtualizable. Quoting from the paper’s Wikipedia page,

There are three properties of interest when analyzing the environment created by a VMM [a Virtual Machine Monitor, a.k.a. a Hypervisor]:

Equivalence / Fidelity. A program running under the VMM should exhibit a behavior essentially identical to that demonstrated when running on an equivalent machine directly.

Resource control / Safety. The VMM must be in complete control of the virtualized resources.

Efficiency / Performance. A statistically dominant fraction of machine instructions must be executed without VMM intervention.

Is real-mode uxn virtualizable?

Uxn as specified doesn’t have a supervisor-mode/user-mode split, so we will have to adapt Popek and Goldberg’s development.

Sometimes the uxn machine takes a trap, but its behavior is defined to transition to a metamachine, not to some supervisor mode. In case of errors, the metamachine will often reactivate the machine at some error handler vector, but again, it’s in normal operation mode, not some supervisor mode. This means that there are no privileged instructions in uxn.

There are no base-and-bounds registers, and the range of pointers exactly covers accessible RAM (64k!), so no out-of-bounds RAM accesses are possible. However, it is possible to underflow or overflow the stack, which falls in the spirit of Popek and Goldberg’s memorytrap. When this happens, the metamachine takes over. BRK and DIV are the only instructions which trap or fault by means other than memorytrap.

There is no way to change the amount of available memory, and no mode changes possible, so there are no control sensitive instructions.

While there is no memory virtualization in uxn, consideration of behavior sensitive instructions requires us to posit an address translation facility. But we can easily make sure this doesn’t fall foul of Popek and Goldberg’s first (relocation) criterion of behavior sensitivity; and as there are no modes, their second criterion fails also. So there are no behavior sensitive instructions.

Thus, because all of the (nonexistent) control- and/or behavior-sensitive instructions fall in the (empty) set of privileged instructions, we can rely on Popek and Goldberg’s theorem 1, and say that construction of a VMM for uxn must be feasible. In fact, it’s almost there as it stands.

Reaching out further, to recursive virtualizability, is in my opinion the simplest path to virtualization. Execution of uxn is deterministic, given the devices in play today, so a VMM for uxn need have no timing dependencies; thus, by Popek and Goldberg’s theorem 2, uxn must be recursively virtualizable.

Virtualized uxn

Let’s explore one possible approach to actually virtualizing uxn.

At the foundation of a virtualization tree lies a real, physical machine, with real RAM and real I/O devices. That physical base machine has a physical RAM pointer. Every virtual machine in the tree implicitly accesses the physical RAM.

Each virtual machine also has private registers, and private memory. Recall we’re considering recursive virtualization, where the program a hypervisor supervises may in turn be a hypervisor.

  • Registers
    • 32-bit base, relative to its parent’s memory allocation
    • 32-bit bound (length of accessible memory)
    • 16-bit PC
    • a trapCode register
  • Memory, all of which MUST be disjoint from the allowed base-and-bound region of RAM
    • 256 bytes of device memory
    • two 255-byte stacks, plus two 1-byte stack pointers
    • two 32-byte device mask bitsets and one 16-entry × 16-bit device version vector
    • a 16-byte trap description area

The base and bound are new. Machines do not have access to them; they are only able to access the registers of “child” VMs. An exception is made to allow a VM to read its own bound register. It does this via the system device “expansion” port, using an operation getBound, described below.

The outermost VM has a base of 0 and a bound of however large the physical RAM is.

All memory-accessing operations (including loading an instruction) take the base and bound into account. Accesses falling outside physical RAM or outside the base-and-bound cause a memory fault.
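
To make that concrete, here is a sketch of the translation step in Python (the emulator itself is C; the names here are hypothetical), where abs_base is the running VM’s already-resolved absolute base:

class MemoryFault(Exception):
    pass

def translate(abs_base, bound, physical_ram_size, vaddr, size=1):
    # Reject accesses outside the VM's base-and-bound window...
    if vaddr + size > bound:
        raise MemoryFault(vaddr)
    paddr = abs_base + vaddr
    # ...and accesses falling outside physical RAM.
    if paddr + size > physical_ram_size:
        raise MemoryFault(vaddr)
    return paddr

An instruction fetch is then a one-byte translate of the PC; a LDA2, say, is a two-byte translate of its operand address.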

The device mask bitsets still control device effects, but instead of calling a DEI/DEO handler routine as a subroutine of the VM’s evaluator, the evaluator traps, yielding control to the parent VM, which performs the requested action.

Every time a VM stops execution, its trapCode and trap description area are updated to describe the cause of the stop.

System device expansion operations

Adding the following operations is sufficient to allow writing a hypervisor in uxntal:

name      operation  fields
getBound  #10        &boundHi $2 &boundLo $2
vmExec    #11        &vmAddr $2

getBound

The handler retrieves the 32-bit active bound on memory access, overwriting the four bytes following the operation code.

vmExec

The handler expects a VM Control Block (VMCB) to exist at vmAddr:

field name       offset  size (bytes)  description
parentLink       0       4             scratch space for control transfer
baseHi           4       2             high short of base register
baseLo           6       2             low short of base register
boundHi          8       2             high short of bound register
boundLo          10      2             low short of bound register
pc               12      2             program counter
trapCode         14      2             trap code
trapDescription  16      16            trap detail record
deiMask          32      32            device input mask bitset
deoMask          64      32            device output mask bitset
devVers          96      32            device version vector
(reserved)       128     128
workStack        256     255           work stack base
workStackPtr     511     1             work stack pointer
returnStack      512     255           return stack base
returnStackPtr   767     1             return stack pointer
devMemory        768     256           device memory
(total size)             1024
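
As a concrete illustration of the layout (a Python sketch, not code from the implementation), the fixed-offset register fields can be pulled out of RAM like this; uxn stores shorts big-endian:

import struct

VMCB_SIZE = 1024

def read_vmcb_registers(ram, vmcb_addr):
    # Skip the 4-byte parentLink and read the six big-endian shorts
    # baseHi, baseLo, boundHi, boundLo, pc, trapCode at offsets 4..15.
    base_hi, base_lo, bound_hi, bound_lo, pc, trap_code = \
        struct.unpack_from(">6H", ram, vmcb_addr + 4)
    base = (base_hi << 16) | base_lo
    bound = (bound_hi << 16) | bound_lo
    return base, bound, pc, trap_code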

The operation performs a context switch to the described VM. In the following, next denotes the VMCB pointed to by vmAddr, and curr the running VM’s VMCB.

The absolute base absBase is computed from the running VM’s base register and the base register in next. If absBase, or absBase plus next’s bound, lies outside curr’s base-and-bound region, curr takes a memory fault. If any part of next’s VMCB overlaps the region addressed by absBase and next’s bound, curr takes a memory fault.1

If these checks all pass, curr’s VMCB address is placed in parentLink (it will not be visible to any VM; see below) and execution resumes in next.
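
Sketching those admission checks in Python (hypothetical names; curr_abs_base is curr’s already-resolved absolute base, and vm_addr is the VMCB address relative to curr):

VMCB_SIZE = 1024

def vm_exec_checks(curr_abs_base, curr_bound, vm_addr, next_base, next_bound):
    # The child's window [absBase, absBase + bound) must lie inside
    # curr's own base-and-bound region...
    if next_base + next_bound > curr_bound:
        return None                      # curr takes a memory fault
    abs_base = curr_abs_base + next_base
    # ...and must not overlap the child's 1024-byte VMCB.
    vmcb_abs = curr_abs_base + vm_addr
    if vmcb_abs < abs_base + next_bound and abs_base < vmcb_abs + VMCB_SIZE:
        return None                      # curr takes a memory fault
    return abs_base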

Traps and faults

When a running VM takes a trap or a fault, it transfers control to its parent VM. The parent VM’s address is retrieved from the parentLink field of the running VMCB (curr). If it is 0, there is no parent, and this is the outermost VM; the pre-virtualization behavior applies. Otherwise, curr’s trapCode and trapDescription are set appropriately, its parentLink field is cleared (ensuring that the VM cannot learn anything about its absolute location), and the parent VM is resumed.
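
A sketch of that hand-over in Python (illustrative only; resume_parent and resume_outermost stand in for the emulator’s real context-switch machinery):

def deliver_trap(curr, code, description, resume_parent, resume_outermost):
    # curr is a view onto the trapping VM's VMCB.
    if curr.parent_link == 0:
        # Outermost VM: ordinary, pre-virtualization behaviour applies.
        return resume_outermost(curr, code, description)
    parent_addr = curr.parent_link
    curr.trap_code = code
    curr.trap_description = description
    curr.parent_link = 0   # the child must not learn its absolute location
    return resume_parent(parent_addr)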

Modified instructions

The effect of BRK is altered: it takes a trap with trapCode of BRK and no description.

The effects of DEI and DEO are also altered. Instead of invoking a dei/deo function inside the evaluator, the machine takes a trap with trapCode set to IO. The suspended VM’s PC is left pointing after the DEI/DEO instruction; the arguments to DEI/DEO are included in the trapDescription.

Memory and stack faults

When an out-of-bounds access occurs, the stack underflows, or the stack overflows, the machine takes a fault with an appropriate trapCode, and with trapDescription including details of the fault. The machine’s PC and stack are left alone so that if the machine is immediately restarted, it will precisely repeat the fault. This could be used to e.g. provide a (small) virtual memory, or an (unbounded) virtual stack.

Implementation

I have implemented emulator support for virtualization, as sketched above. I’ve also written a simple hypervisor called xun (like Xen, see?). To see the code, check out the virtuxn3 branch of the https://gitlab.com/tonyg/uxn git repository.

The implementation is done in two stages:

  • the first stage (diffs from virtuxn3-base to virtuxn3-novirt) splits out “MMU” and IO “processors” into a “system-on-chip”, uxnsoc.[ch], updating the various core implementations to use the slightly-different interface to memory and device IO.

  • the second stage (diffs from virtuxn3-novirt to virtuxn3) is simpler, providing an alternative compatible “system-on-chip” implementation supporting the new system extension commands and providing the necessary machinery to swap in and out of nested VMs as required.

The hypervisor, xun, is a little program that is currently hardcoded to load a particular ROM, bin/xuninner.rom. It cannot yet recursively run itself running another program, and lacks configurability. It also doesn’t do anything about device multiplexing, like a real multi-tasking operating system would have to do. But the basics are there, enough to show it working: it can, for example, run piano.rom without problems.

Differences between direct and virtualized execution

Because the devices take virtual addresses, rather than physical addresses, a supervisor has to copy down media such as sprites and audio samples before asking the hardware to use them. This leads to noticeable differences in behavior: running piano.rom, sounding a note, and then scribbling on the waveform causes the playing note to change its timbre. When running the same program under xun, however, the note stays the same; the edit to the timbre doesn’t take effect until the next note is played.

One change that could ameliorate this would be to make the various addresses used in device memory into 32-bit physical addresses (16 bit page / 16-bit offset). That way, translated addresses from nested VMs could be used without having to copy them. (A second benefit would be the ability for ROMs to use media outside the bottom 64k without having to copy them first.)

Next steps

It’d be nice to adapt the devices to support 32-bit addresses for sprite and audio sample data etc. It can be done backward-compatibly; I need to write down the approach I have in mind. There’s also a bit of implementation work remaining in xun to allow full recursive virtualization; it’s, in some sense, a Trivial Matter of Programming, but the details are actually a little more fiddly than they have to be and perhaps some changes to the device interfaces could make things easier for hypervisors without complicating life for simple real-mode ROMs. There’s nascent support in baseline uxn for self-describing devices: I’d like to see this fleshed out a bit so that nested VMs could reliably discover the features made available to them by their hypervisor. Ultimately, it’d be amazing to build something a little System-7 like using virtualized uxn for running programs. The nested VM interface could even be widened to include a “fuel counter” of instructions before a “fuel exhausted” trap, allowing preemptive multitasking.

The end

All in all, uxn virtualizes very nicely. Writing a hypervisor entirely in uxntal works well with minimal extensions to the VM specification. I’m looking forward to seeing how things develop!

  1. A corollary of this is that a nested VM’s addressable memory must always be smaller (at least by the size of one VMCB) than its parent’s addressable memory. 

File distribution over DNS: (ab)using DNS as a CDN

This is the story of a one-afternoon hack that turned into a one-weekend hack.

I woke up on Saturday with a silly idea: what would it be like to use content-defined chunking (CDC) and serve the chunks over DNS? DNS caching could make scaleout and incremental updates quite efficient, at least in theory. Strong hashing gives robust download integrity. DNS replication gives you a kind of high availability, even!

After a coffee, I figured I may as well try it out.

TL;DR. It works, more or less, so long as your resolver properly upgrades to DNS-over-TCP when it gets a truncated UDP response. The immutable, strongly-named chunks are served in TXT records (!) and are cached by resolvers for a long time. This lowers load on the authoritative DNS server. Each stored “file” gets a user-friendly name for its root chunk in the form of a CNAME with a much shorter TTL.

You could, for a while, try it out with:

docker run -i --rm leastfixedpoint/nscdn \
    nscdn get SEKIENAKASHITA.DEMO.NSCDN.ORG. > SekienAkashita.jpg

which downloaded, when it was running, this image using nothing but DNS.

A demo server was running on the domain demo.nscdn.org, and the code is available at https://gitlab.com/tonyg/nscdn.

How it works

The Ronomon variant of FastCDC, due to Joran Greef, is a JavaScript-friendly, 32-bit content-defined chunking algorithm that splits big files into chunks with a distribution of sizes having an average, minimum and maximum length.1

For this project I chose an 8k minimum size, a 16k average, and a 48k upper limit, because of the limitations involved in serving large amounts of data via DNS.

The core idea is to use the CDC algorithm to slice up a data file, and then construct a broad, shallow Merkle tree from it, serving each leaf and inner node of the tree as a separate DNS TXT record associated with a domain label including the Base32 encoding of the BLAKE2b hash of the data.

The example file from above, SekienAkashita.jpg, is split up into five chunks and one inner node that lists the chunks:

SEKIENAKASHITA.DEMO.NSCDN.ORG (CNAME)
 └─⯈ 2-NPNQZ4OQGZCXTLGFT6SGXO5TFLVZWMCFOXAV6GEYNDPYL67QSMPQ.DEMO.NSCDN.ORG (TXT, 320 bytes)
      ├─⯈ 1-3NZRAZ4RTLWEFFSE27EBEVNKFOM6SFI4ISQEH7NGNPCG2CS4SIUA.··· (TXT, 22366 bytes)
      ├─⯈ 1-ZYNVPQYLBJYT7EOWRUDNY6XSURS3KWOMTQUODKUCCWXXQLKZFF3Q.··· (TXT, 8282 bytes)
      ├─⯈ 1-VPJYS4PUGX275YDFEKZF6BC3SXYZDMNTANV5LXMDWMB2PDSKWJPQ.··· (TXT, 16303 bytes)
      ├─⯈ 1-IBGDKGR2IXRIISKASJIP5CLPLHCUSGE5V6SVRWKHFHJSIAZVXOHQ.··· (TXT, 18696 bytes)
      └─⯈ 1-QT7EHDMMEVKJF77MOL4T4PXU3FGSCBXRVNFMYJK4NOQ4BJ6I7YCA.··· (TXT, 43819 bytes)

For a larger file—say, the Linux kernel—the index node itself would be longer than permitted for a chunk, so it would be split up itself and another level would be added to the tree to index the chunks of the lower-level node. This recursion can be repeated as required. I chose a 64-bit file size limit.

Storing a tree in DNS

A CNAME record from the human-usable name of each file (SEKIENAKASHITA.DEMO.NSCDN.ORG) points to the DNS record of the root node of the file’s Merkle tree.

Each tree node is stored as raw binary (!) in a DNS TXT record. Each record’s DNS label is formed from the type code of the node and the base32 encoding of the BLAKE2b hash of the binary content.

The content of leaf nodes (type 1) is just the binary data associated with the chunk. Each inner node (type 2) is a sequence of 64-byte “pointer” records containing a binary form of child nodes’ DNS labels, along with a length for each child. This allows random-access retrieval of subranges of each file.
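
To illustrate (a Python sketch, not the actual nscdn code; the 32-byte digest length is inferred from the 52-character labels in the example above), a node’s DNS label can be derived from its content along these lines:

import base64
import hashlib

def node_label(node_type, content, zone="DEMO.NSCDN.ORG"):
    # type 1 = leaf chunk, type 2 = inner node listing its children
    digest = hashlib.blake2b(content, digest_size=32).digest()
    b32 = base64.b32encode(digest).decode("ascii").rstrip("=")
    return f"{node_type}-{b32}.{zone}"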

Weirdly, this is all completely standards-compliant use of DNS. The only place I’m pushing the limits is the large-ish TXT records, effectively mandating use of DNS’s fallback TCP mode.

Serving the data: stick it in SQLite, tiny server, job done

Any ordinary DNS server can be used to serve the records from a domain under one’s control.

However, I decided to hack together my own little one that could serve records straight out of a SQLite database. I used it as an excuse to experiment with Golang again for the first time in more than a decade.2

I cobbled together a program called nscdnd which does exactly that: it answers DNS queries straight out of the SQLite store.

Similarly, a little tool called nscdn allows insertion of files into a database (nscdn add) and deletion of files from it (nscdn del), plus retrieval and reassembly of a file from a given domain name (nscdn get).
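
Because it’s all plain DNS, chunks can be fetched from anything with a DNS library. Here’s an illustrative Python sketch using dnspython (not part of nscdn), assuming a single TXT record per node name:

import dns.resolver  # pip install dnspython

def fetch_node(label_name):
    # TXT rdata is split into <=255-byte strings; re-join them to recover
    # the node's raw binary content. The stub resolver falls back to TCP
    # when a UDP response comes back truncated.
    for rdata in dns.resolver.resolve(label_name, "TXT"):
        return b"".join(rdata.strings)
    raise KeyError(label_name)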

The bulk of the interesting code is in the nscdn repository linked above.

Okay, but is this a good idea?

¯\_(ツ)_/¯ It works surprisingly well in the limited testing I’ve done. I doubt it’s the most efficient way to transfer files, but it’s not wildly unreasonable. The idea of getting the chunks cached by caching resolvers between clients and the authoritative server seems to work well: when I re-download something, it only hits the authoritative server for the short-lived CNAME and the root-node TXT record. The other nodes in the tree seem to be cached somewhat locally to my client.

Try it out yourself!

At the time I originally wrote this, it was possible to retrieve files from the demo server on demo.nscdn.org, as previously mentioned:

docker run -i --rm leastfixedpoint/nscdn \
    nscdn get SEKIENAKASHITA.DEMO.NSCDN.ORG. > SekienAkashita.jpg

Since then, I’ve had to take the demo server down, but you can run an nscdnd instance for a domain you control. All the following examples use docker, but you can just check out the repository and build it yourself too.

To run a server:

docker run -d -p 53:53 -p 53:53/udp \
    -v `pwd`/store.sqlite3:/data/store.sqlite3 \
    --env NSCDN_ROOT=your.domain.example.com \
    --name nscdnd \
    leastfixedpoint/nscdn

and add files to the store:

docker run -i --rm \
    -v `pwd`/store.sqlite3:/data/store.sqlite3 \
    leastfixedpoint/nscdn \
    nscdn add /data/store.sqlite3 SOMEFILENAME  < SomeFilename.bin

Then add an NS record pointing to it:

your.domain.example.com. 86400 IN NS your.nscdnd.server.example.com.

and retrieve your files:

docker run -i --rm leastfixedpoint/nscdn \
    nscdn get SOMEFILENAME.your.domain.example.com

You can also dig +short -t txt your.domain.example.com to get information about the running server.

For the demo.nscdn.org server, it looked like this:

$ dig +short -t txt demo.nscdn.org
"server=nscdnd" "version=v0.3.1" "SPDX-License-Identifier=AGPL-3.0-or-later" "source=https://gitlab.com/tonyg/nscdn"

Finally, you can use dig to retrieve CNAME and TXT records without using the nscdn tool:

$ dig +short -t any SEKIENAKASHITA.DEMO.NSCDN.ORG
2-NPNQZ4OQGZCXTLGFT6SGXO5TFLVZWMCFOXAV6GEYNDPYL67QSMPQ.DEMO.NSCDN.ORG.
"\000\000\000\000\000\000\000\001\000\000\000\000\000\000W^\000\000\000...

$ dig +short -t txt 1-ZYNVPQYLBJYT7EOWRUDNY6XSURS3KWOMTQUODKUCCWXXQLKZFF3Q.DEMO.NSCDN.ORG.
"\143q\254\239\254\247\253\211\131\199\152t\205b\238\207J\190\212\159INP...

(Dig presents the TXT records using the slightly peculiar decimal escaping syntax from RFC 1035.)

Future directions

Compression of individual chunks. At the moment, chunks are served uncompressed. It’d be a friendly thing to do to compress the contents of each TXT record.

Experimenting with chunk sizes. Is the distribution of chunk sizes with the current parameters reasonable? Are smaller chunks required, operationally, given we’re kind of pushing DNS to its limits here? Could larger chunk sizes work?

How does it perform? How could performance be improved?

Garbage-collection of chunks in a store. At present, running nscdn del just removes the CNAME link to a root chunk. It doesn’t traverse the graph to remove unreferenced chunks from the store.

Incremental downloads, partial downloads, recovery/repair of files.

Statistics on sharing of chunks in a store. Say you used a store to distribute multiple releases of a piece of software. It’d be interesting to know how much sharing of chunks exists among the different release files.

Download and assemble files in the browser? So DNS-over-HTTPS is a thing. What happens if we use, for example, the Cloudflare DoH API to retrieve our chunks?

JSON.stringify(await (await fetch(
    "https://cloudflare-dns.com/dns-query?name=demo.nscdn.org&type=TXT",
    { headers: { "accept": "application/dns-json" } })).json(), null, 2)
{
  "Status": 0,
  "TC": false, "RD": true, "RA": true, "AD": false, "CD": false,
  "Question": [{
      "name": "demo.nscdn.org",
      "type": 16
    }],
  "Answer": [{
      "name": "demo.nscdn.org",
      "type": 16,
      "TTL": 284,
      "data": "\"server=nscdnd\"\"version=v0.3.1\"\"SPDX-License-Identifier=AGPL-3.0-or-later\"\"source=https://gitlab.com/tonyg/nscdn\""
    }]
}

Hmm. Promising. Unfortunately, it doesn’t seem to like the TXT records having binary data in them (bug? certainly TXT records are allowed to hold binary data, see here and then here), so it might not Just Work. Perhaps a little redesign to use Base64 in the TXT record bodies is required; or perhaps the application/dns-message binary-format API would work.

Encryption of stored files. Perhaps something simple like age.

Insertion of data across the network. While Dynamic DNS is a thing, it’s not quite suitable for this purpose. Perhaps some way of inserting or deleting files other than the current ssh-to-the-server-and-run-a-command would be nice.

Content-defined chunking of index nodes. For this quick prototype, I split non-leaf nodes in a tree into 512-pointer chunks, but it’d be nicer to split them using some variation on the CDC algorithm already used to split the raw binary data in the file. That way, incremental updates would be more resilient to insertions and deletions.

Think more about Named Data Networking. Remember Named Data Networking (NDN)? It’s an alternative Internet architecture, initially kicked off by Van Jacobson and colleagues. (It used to be known as CCN, Content-Centric Networking.) This DNS hack is a cheap-and-nasty system that has quite a bit in common with the design ideas of NDN, though obviously it’s desperately unsophisticated by comparison.

The End

Anyway, I had a lot of fun hacking this together and relearning Go, though I was a little embarrassed when I found myself spending a lot of time at the beginning browsing for a domain to buy to show it off…

  1. The distribution of chunk sizes is pretty strange, though. I don’t know in what sense the “average” actually is an average; the minimum and maximum cut-offs are enforced by the algorithm. For more detail, see the FastCDC paper. 

  2. So I have, uh, criticisms of go. But it’s nothing that hasn’t been said before. Let’s just say that the experience reminded me of programming JavaScript in the bad old days before TC39 really got going. Overall the experience was “ok, I guess”. 

SirTunnel, a personal ngrok alternative

Happy New Year!

From time to time I need to expose a development web site or web service to the world. In the past, I’ve used ngrok for that, and of course long ago I built ReverseHTTP which is somewhere in the same ballpark, but I recently got fed up with the state of affairs and decided to see whether there was something simple I could run myself to do the job.

I found Anders Pitman’s SirTunnel:

Minimal, self-hosted, 0-config alternative to ngrok. Caddy+OpenSSH+50 lines of Python.

It really is desperately simple. A beautiful bit of engineering. At its heart, it scripts Caddy’s API to add and remove tunnels on the fly. When you SSH into your server, you invoke the script, and for the duration of the SSH connection, a subdomain of your server’s domain forwards traffic across the SSH link.

I’ve forked the code for myself. So far, I haven’t changed much: the script cleans up stale registrations at startup, as well as at exit, in case a previous connection was interrupted somehow; and I’ve added support for forwarding to local TLS services, with optional “insecure-mode” for avoiding certificate identity checks.

To get it running on a VM in the cloud, install Caddy (there’s a caddy package for Debian bookworm and sid), then disable the systemd caddy service and enable the caddy-api service:

apt install caddy
systemctl disable caddy
systemctl enable caddy-api
systemctl stop caddy
systemctl start caddy-api

Set up a wildcard DNS record for your server - something like *.demo.example.com. Each tunnel will be made available on a subdomain of demo.example.com.

Then use the API to upload a simple “global” config. Here’s mine:

{
  "apps": {
    "http": {
      "servers": {
        "default": {
          "logs": {},
          "listen": [":443"],
          "routes": []
        }
      }
    }
  }
}

Upload it by putting it in a file caddy_global.json and run

curl -L localhost:2019/load -H 'Content-Type: application/json' -d @caddy_global.json

Then, make sure SirTunnel’s sirtunnel.py script is available somewhere on the server to your SSH user account.

At that point, to expose a local development service running on port 8443 to the world:

ssh -t -R 8443:localhost:8443 YOURSERVER path/to/sirtunnel.py YOURAPP.demo.example.com 8443

I wrapped that up in a tiny script so that I didn’t have to remember the details of that incantation, but it’s simple enough that you could easily just type it in the terminal each time.
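
For what it’s worth, such a wrapper can be as small as this (a sketch, not my actual script; the server name and domain are placeholders):

#!/usr/bin/env python3
# Usage: tunnel APPNAME LOCALPORT
import os
import sys

app, port = sys.argv[1], sys.argv[2]
# Replace the current process with the ssh invocation from above.
os.execvp("ssh", [
    "ssh", "-t", "-R", f"{port}:localhost:{port}", "YOURSERVER",
    "path/to/sirtunnel.py", f"{app}.demo.example.com", port,
])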

Many thanks to Anders Pitman for a really nice piece of software!

Purely Functional Operating Systems

Apparently Peter Henderson’s 1982 paper “Purely Functional Operating Systems” is hard to find online. Some years ago, during my PhD research, I scanned the paper from a physical copy in the university library. Here’s the scan I made.

Henderson, Peter. “Purely Functional Operating Systems.” In Functional Programming and Its Applications, edited by J. Darlington, P. Henderson, and D. Turner, 177–92. Cambridge University Press, 1982.

~60x speed-up of Linux "perf"

I’ve recently been using cargo-flamegraph to profile syndicate-server.

The tool is great, but it uses perf to record and analyze profile data, and perf on Debian has a performance problem: when not linked against libbfd, it shells out to addr2line for every address it needs to look up. Thousands and thousands and thousands of incredibly short-lived processes.

Michał Sidor suggests building against libbfd, something that the Debian maintainers aren’t allowed to do.1 I tried the other approach, suggested by Steinar H. Gunderson, of patching perf to use a long-running addr2line process instead, sending queries to it over a pipe.
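
The shape of the fix, sketched here in Python rather than as the actual C patch to perf: start one addr2line per binary and stream addresses to it over a pipe.

import subprocess

class Addr2Line:
    """One long-lived addr2line per binary; addresses go in over a pipe,
    (function, file:line) pairs come back out."""
    def __init__(self, binary):
        self.proc = subprocess.Popen(
            ["addr2line", "-f", "-e", binary],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)

    def lookup(self, addr):
        # addr is an integer address; -f makes addr2line print the
        # function name followed by the file:line location.
        self.proc.stdin.write(f"{addr:#x}\n")
        self.proc.stdin.flush()
        function = self.proc.stdout.readline().strip()
        location = self.proc.stdout.readline().strip()
        return function, location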

This works very well. What used to take endless minutes now takes a few seconds. It makes working with cargo-flamegraph much more pleasant!

With the unpatched, debian-default perf_5.10, running on about 10 seconds of activity in syndicate-server:

$ time /usr/bin/perf_5.10 script -i perf.data >/dev/null

real    12m51.499s
user    11m57.455s
sys     0m53.821s

With my patch:

$ time perf script -i perf.data >/dev/null

real    0m11.335s
user    0m11.047s
sys     0m0.309s

That’s sixty-eight times faster.

You can download the patch here.

  1. The problem is an unfortunate incompatibility of licenses: perf is GPLv2, not GPLv2+, and libbfd is GPLv3+. 

New System-layer Syndicate project

Part of a series: #squeak-phone


I’m proud to announce a new project on using the Syndicated Actor model and dataspaces to implement a system layer.

https://syndicate-lang.org/projects/2021/system-layer/

The project will be investigating the question

Could dataspaces be a suitable system layer foundation, perhaps replacing software like systemd and D-Bus?

Concretely, it will produce a prototype system layer implementation building on my existing “squeak-on-a-cellphone” prototyping work.

I’m very grateful for project sponsorship from the NLnet Foundation, as part of the NGI Zero PET programme.

New Syndicate website

I’ve just published a completely revamped and massively expanded Syndicate project website:

https://syndicate-lang.org/

Hooray!

A big part of the change is to fix the confusing terminology used in the project. From now on, I’ll try to stick to the following:

  • Syndicated actors, and the Syndicated Actor model: Like regular actors, but with replicated state, i.e. assertions connecting peers rather than just messages.

  • The notion of a dataspace: A particular kind of behaviour of a syndicated actor: tracks assertions made to it, and reacts to assertions of interest by responding with previously-learned facts.

  • Structuring of a syndicated actor via conversational concurrency and facets: Programming actors, syndicated or not, gets to be a bit of a mess if you’re restricted to using a monolithic behaviour function. Facets let you split up an actor’s behaviour into reusable composable pieces. Conversational concurrency guides you in how to carve it up.

  • Language support for syndicated actors and conversational concurrency: the Syndicate DSL.

OnScreenKeyboardMorph: Smalltalk keyboard on a phone

Part of a series: #squeak-phone


Back in October 2020, I built an on-screen keyboard for Squeak Smalltalk. It ended up being about 230 lines of code (!) in total.

I’ve been using Smalltalk as the primary UI for my experimental cellphone software stack, and I needed a way to type input. Using VNC to develop on the phone works fine, but to work on the device itself - both for day-to-day phone tasks as well as developing the system - I need a proper on-device keyboard.

This video shows the keyboard in action. As you can see, it’s not perfect! At one point I tapped ctrl while typing, leading to an unexpected pop-up menu appearing.

But the basic idea is sound. (Aside: why aren’t there any Android keyboards that are laid out like a PC keyboard, with punctuation in the right place, etc.?)

The usual shortcuts like Alt-P for “print it”, Alt-D for “do it” and Alt-I for “inspect it” all work for evaluating snippets of Smalltalk interactively.

The keyboard integrates with my previously-written touchscreen support code, with the red/blue/yellow modifier buttons affecting touches, making them into simulated left-/middle-/right-clicks, and with keyboard focus changes auto-popping-up and -down the keyboard.

Simulating mouse clicks is a temporary state of affairs that lets me use the touchscreen reasonably naturally, making use of context menus and so on, without having to make invasive changes to the input pipeline of Squeak’s Morphic.

How it works

Class OnScreenKeyboardMorph synthesizes keyboard events and injects them into the Morphic world.

OnScreenKeyboardMorph >> emitKeystroke: aCharacter
    | evt |
    evt := KeyboardEvent new
        setType: #keystroke
        buttons: modifiers
        position: 0@0
        keyValue: aCharacter asciiValue
        hand: ActiveHand
        stamp: Time millisecondClockValue.
    self resetModifiers.
    ActiveHand handleEvent: evt.

You can have arbitrarily many keyboards instantiated, but there’s a global one living in a flap at the bottom of the screen. That’s the blue tab at the bottom left of the screen you can see in the video.

OnScreenKeyboardMorph class >> rebuildFlap
    | f k |
    self flap ifNotNil: [:old |
        Flaps removeFlapTab: old keepInList: false.
        ActiveWorld reformulateUpdatingMenus].

    k := self new.
    f := FlapTab new referent: k beSticky.
    f setName: self flapId edge: #bottom color: Color blue lighter.
    k beFlap: true.

    Flaps addGlobalFlap: f.
    ActiveWorld addGlobalFlaps.
    ActiveWorld reformulateUpdatingMenus.

I had already written code to read from /dev/input/eventN, tracking each multitouch contact separately. A subclass of HandMorph overrides newKeyboardFocus: to pop the keyboard up and down as the keyboard focus comes and goes:

LinuxInputHandMorph >> newKeyboardFocus: aMorphOrNil
    aMorphOrNil
        ifNil: [OnScreenKeyboardMorph hideFlap]
        ifNotNil: [(OnScreenKeyboardMorph future: 200) raiseFlap].
    ^ super newKeyboardFocus: aMorphOrNil.

Each key is represented by an OnScreenKeyMorph. It tracks keyboard shift state by registering as a dependent of the OnScreenKeyboardMorph:

OnScreenKeyMorph >> keyboard: anOnScreenKeyboardMorph
    keyboard := anOnScreenKeyboardMorph.
    keyboard addDependent: self.

OnScreenKeyMorph >> update: aParameter
    aParameter = #keyboardShiftState ifTrue: [^ self recomputeLabel].
    super update: aParameter.

The important part, of course, is the mouse event handler for OnScreenKeyMorph:

OnScreenKeyMorph >> mouseDown: evt
    selector
        ifNil: [keyboard emitKeystroke: self activeLabel first]
        ifNotNil: [keyboard perform: selector]

The only other interesting part is the keyboard initialization routine, which builds the layout:

OnScreenKeyboardMorph >> initialize
    super initialize.
    self borderWidth: 0.
    self extent: 180 points @ 120 points.
    self layoutPolicy: TableLayout new.
    self wantsHaloFromClick: false.

    font := (TextStyle named: #Roboto) fontOfPointSize: 18.
    modifiers := 0.
    stickyModifiers := 0.
    buttons := Dictionary new.

    {
        '`~ 1! 2@ 3# 4$ 5% 6^ 7& 8* 9( 0) -_ =+' substrings, {'<-' -> #backspace}.
        {'->' -> #tab}, 'qQ wW eE rR tT yY uU iI oO pP [{ ]}' substrings.
        {'esc' -> #escape}, 'aA sS dD fF gG hH jJ kK lL ;: ''" \|' substrings, {#cr}.
        {#shift}, 'zZ xX cC vV bB nN mM ,< .> /?' substrings.
        {#ctrl. #space. #alt}.
        {'|<-' -> #home. '<' -> #arrowLeft. 'v' -> #arrowDown. '^' -> #arrowUp. '>' -> #arrowRight. '->|' -> #end.}.
    } do: [:r | self addMorphBack: (self makeRow: (r collect: [:item | self makeButton: item]))].
    self addMorphBack:
        (self makeRow:
            {
                OnScreenMouseButtonMorph new bit: 4.
                OnScreenMouseButtonMorph new bit: 2.
                OnScreenMouseButtonMorph new bit: 1.
            }).

The rest is book-keeping and simple delegation methods, like this:

OnScreenKeyboardMorph >> shift
    self toggleButton: 8

OnScreenKeyboardMorph >> space
    self emitKeystroke: Character space

and so on.

I haven’t released the whole package yet, because it’s all very much in flux, but if you’re interested, you can take a look at the changeset for the Keyboard-related code here: LinuxIO-Morphic-KeyboardMorph.st

TypeScript: Messages from Interfaces and back

UPDATE: Full code available at this gist (also embedded below).


Say you have the following TypeScript interface I that you want to invoke remotely by passing messages of type M; or that you receive messages of type M and want to handle them using an object of type I:

interface I {
    m1(a: string, b: number): boolean;
    m2(): void;
    m3(n: number): void;
    m4(x: [string, string]): { k: string, j: string };
    m5(a: string, b: string[]): number;
}

type M = { selector: "m1", args: [string, number], callback: (result: boolean) => void }
       | { selector: "m2", args: [], callback: (result: void) => void }
       | { selector: "m3", args: [number], callback: (result: void) => void }
       | { selector: "m4", args: [[string, string]], callback: (result: { k: string; j: string }) => void }
       | { selector: "m5", args: [string, string[]], callback: (result: number) => void }

Keeping things type-safe looks really tedious! There’s obviously a connection between I and M. Can we avoid writing them by hand?

Can we derive M from I? Can we derive I from M?

The answer to all of these questions is yes!1

TL;DR TypeScript lets us define generic types Messages and Methods such that M = Messages<I> and I = Methods<M>. Read on for the details.

Interface ⟶ Messages

Let’s start with what, for me, has been the common case: given an interface type, automatically produce the type of messages that can be sent to implementors of the interface.

First, how do we want to represent messages?

type Message<Selector extends ValidSelector, Args extends any[], Result> =
    Args extends never[]
    ? { selector: Selector, args: [], callback: (result: Result) => void }
    : { selector: Selector, args: Args, callback: (result: Result) => void }

type ValidSelector = string | number | symbol

I’ve taken a leaf out of Smalltalk’s book, and made a message include a selector, the name of the method the message intends to invoke, and some args, the provided arguments to the method. The Args extends never[] check is to help type inference deduce the empty argument tuple: without it, the type system won’t complain about missing arguments.

I’ve also added a callback to Message. The technique I describe here can be further extended to “asynchronous” or callbackless settings with minor modifications.

The next definition, of type Messages<I>, is where the magic happens. It expands to a union of Messages representing the methods defined in I:

type Messages<I> = MessagesProduct<I>[keyof I]
type MessagesProduct<I> = {
    [K in keyof I]: (I[K] extends (...args: infer P) => infer Q ? Message<K, P, Q> : never);
}

And that’s it! Here’s how it works:

  • MessagesProduct is a mapped type that describes a modified interface, where all (and only) the method properties of interface I are rewritten to have a Message as their type, but keeping the same key;

  • then, the ...[keyof I] part in the definition of Messages uses index types to set up a union type built from all the value types (“indexed access operator”) associated with all the keys in I (“index type query operator”).

Messages ⟶ Interface

Going in the other direction is simpler:

type Methods<M extends { selector: ValidSelector }> = {
    [S in M['selector']]: (
        M extends Message<S, infer P, infer R> ? (...args: P) => R :
        never);
}

It’s a mapped type, again, that maps union members that have Message type to an appropriate function signature. It takes advantage of TypeScript’s automatic distributivity: a union of products gets rewritten to be a product of unions. Then, in the conditional type M extends Message<...> ? ..., it projects out just exactly the member of interest again.2

This time we use the mapped type as-is instead of re-projecting it into a union using indexed access like we did with MessagesProduct above.

Type-safe interpretation of messages

Now that we have types for our interfaces and types for the messages that match them, can we write a type-safe generic perform function? Yes, we can!

function perform<I extends Methods<M>,
                 S extends ValidSelector,
                 M extends Message<S, any, any>>(i: I, m: M): void
{
    m.callback(i[m.selector](... m.args));
}

An example

Given the above definition for I, actually using Messages<I> produces the following type definition3 (according to the IDE that I use):

type M = Message<"m1", [a: string, b: number], boolean>
       | Message<"m2", [], void>
       | Message<"m3", [n: number], void>
       | Message<"m4", [x: [string, string]], { k: string; j: string }>
       | Message<"m5", [a: string, b: string[]], number>

Conversely, given the M from the top of the file, we get the following for Methods<M>:

type I = {
    m1: (a: string, b: number) => boolean;
    m2: () => void;
    m3: (n: number) => void;
    m4: (x: [string, string]) => { k: string; j: string };
    m5: (a: string, b: string[]) => number;
}

Roundtripping works too: both Methods<Messages<I>> and Messages<Methods<M>> give what you expect.

TypeScript is really cool

It’s a fully-fledged, ergonomic realization of the research started by Sam Tobin-Hochstadt, who invented Occurrence Typing, the technology at the heart of TypeScript.

Then, building on the language itself, emacs with tide, flycheck, and company makes for a very pleasant IDE.4

Congratulations to Sam, whose ideas really have worked out amazingly well, and to the TypeScript team for producing such a polished and pleasant language.


Appendix: Full code implementing this idea

This module implements the idea described in this article, extended with the notion of EventMessages, which don’t have a callback.

// This Tuple type (and tuple() function) is a hack to induce
// TypeScript to infer tuple types rather than array types. (Source:
// https://github.com/microsoft/TypeScript/issues/27179#issuecomment-422606990)
//
// Without it, [123, 'hi', true] will often get the type (string |
// number | boolean)[] instead of [number, string, boolean].
//
export type Tuple = any[] | [any];
export const tuple = <A extends Tuple>(... args: A) => args;

// Type ValidSelector captures TypeScript's notion of a valid object
// property name.
//
export type ValidSelector = string | number | symbol;

export type EventMessage<Selector extends ValidSelector, Args extends any[]> =
    { selector: Selector, args: Args };

export type RequestMessage<Selector extends ValidSelector, Args extends any[], Result extends Exclude<any, void>> =
    { selector: Selector, args: Args, callback: (result: Result) => void };

export type Message<Selector extends ValidSelector, Args extends any[], Result> =
    void extends Result ? EventMessage<Selector, Args> : RequestMessage<Selector, Args, Result>;

// Function message() is needed for similar reasons to tuple() above:
// to help TypeScript infer the correct literal type for the selector
// (as well as the arguments).
//
export const message = <S extends ValidSelector, A extends Tuple, R>(m: Message<S, A, R>) => m;

type MessagesProduct<I, ContextArgs extends any[]> = {
    [K in keyof I]: (I[K] extends (...args: [...ContextArgs, ...infer P]) => infer Q
        ? Message<K, P, Q>
        : never);
};

export type Messages<I, ContextArgs extends any[] = []> = MessagesProduct<I, ContextArgs>[keyof I];

export type Methods<M extends { selector: ValidSelector }, ContextArgs extends any[] = []> = {
    [S in M['selector']]: (
        M extends RequestMessage<S, infer P, infer R>
            ? (void extends R ? never : (...args: [...ContextArgs, ...P]) => R)
            : (M extends EventMessage<S, infer P>
                ? (...args: [...ContextArgs, ...P]) => void
               : never));
};

export function perform<I extends Methods<M, ContextArgs>,
                        S extends ValidSelector,
                        M extends RequestMessage<S, Tuple, any>,
                        ContextArgs extends any[] = []>
    (i: I, m: M, ...ctxt: ContextArgs): (M extends RequestMessage<S, Tuple, infer R> ? R : never);
export function perform<I extends Methods<M, ContextArgs>,
                        S extends ValidSelector,
                        M extends EventMessage<S, Tuple>,
                        ContextArgs extends any[] = []>
    (i: I, m: M, ...ctxt: ContextArgs): void;
export function perform<I extends Methods<M, ContextArgs>,
                        S extends ValidSelector,
                        M extends RequestMessage<S, Tuple, any>,
                        ContextArgs extends any[] = []>
    (i: I, m: M, ...ctxt: ContextArgs): any
{
    const r = i[m.selector](...ctxt, ... m.args);
    m.callback?.(r);
    return r;
}

  1. Well, at least in TypeScript v4.x, anyway. I don’t know about earlier versions. 

  2. Actually I’ll admit to not being quite sure that this is what’s really going on here. TypeScript’s unions feel a bit murky: there’s been more than one occasion I’ve been surprised at what a union-of-products has been automatically “simplified” (?) into. 

  3. Hey, what’s going on with those named tuple slots? I would have expected a tuple type like [number, string] not to be able to have names attached to the slots, but it turns out I’m wrong and the compiler at least internally propagates names in some circumstances! It even re-uses them if you convert a Messages<I> back into an interface, Methods<Messages<I>>… 

  4. Here’s my .emacs TypeScript setup, based on the examples in the tide manual:

    (defun setup-tide-mode ()
      (interactive)
      (tide-setup)
      (flycheck-mode +1)
      (setq flycheck-check-syntax-automatically '(save mode-enabled))
      (eldoc-mode +1)
      (tide-hl-identifier-mode +1)
      (company-mode +1)
      (local-set-key (kbd "TAB") #'company-indent-or-complete-common)
      (local-set-key (kbd "C-<return>") #'tide-fix))
    (setq company-tooltip-align-annotations t)
    (add-hook 'typescript-mode-hook #'setup-tide-mode)