I’ve just finished reading “Xoc, an Extension-Oriented Compiler for
Systems Programming”, by Russ Cox, Tom Bergan, Austin
T. Clements, Frans Kaashoek, and Eddie Kohler. It’s a description of
an interesting approach to extensible compilation, for C in this case,
that refines a lot of previously explored ideas, bringing them
together in a more compositional style than earlier extensible
compilers have managed to achieve.
One of the key themes in the paper is how to control extension
visibility. They describe a couple of approaches they take: one, a
type system used to ensure that extensions to abstract syntax are only
visible to other extensions when those extensions explicitly declare
knowledge of them; and two, an approach from Warth et al.’s
“Statically scoped object adaptation with expanders” paper for
controlling visibility of extensions to core data structures. I wonder
if the latter is used to underpin the former at all.
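
Here’s a rough Haskell sketch of how I picture the type-system side
of that working; this is my own reconstruction for intuition, not
xoc’s actual mechanism, and all the names in it are invented:

```haskell
-- My reconstruction, not xoc's mechanism: an analysis can only see
-- extension nodes whose handling has been explicitly declared,
-- modelled here as a class instance per (extension, analysis) pair.

-- Core syntax, left open via the `ext` parameter.
data Expr ext
  = Var String
  | Add (Expr ext) (Expr ext)
  | Ext ext                        -- node introduced by an extension

-- An analysis over the core; extension nodes are opaque to it
-- unless an instance declares knowledge of them.
class FreeVars ext where
  extFreeVars :: ext -> [String]

freeVars :: FreeVars ext => Expr ext -> [String]
freeVars (Var x)   = [x]
freeVars (Add a b) = freeVars a ++ freeVars b
freeVars (Ext e)   = extFreeVars e

-- A hypothetical extension opting in to the free-variables analysis.
newtype Alias = Alias String
instance FreeVars Alias where
  extFreeVars (Alias x) = [x]
```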
Regarding the type system approach they take, it’s interesting how it
lets the extension writer provide analyses directly in terms of
extension abstract syntax, rather than forcing translation to a core
language before performing any analysis. We did the latter in
Highwire, and the error messages referred to core-language constructs
that didn’t map well onto source-language syntax: the user had to
mentally unexpand the macro to make sense of the compiler’s
output. The paper notes this problem in section 5.2, where
the authors write “Since macros cannot access compiler internals, they
typically do not even type check their arguments, producing cryptic
errors when invoked incorrectly.”
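
To sketch the contrast (with entirely invented names, not xoc code):
an extension that can talk to the type checker can validate its
arguments in its own terms before lowering, rather than leaving the
core checker to complain about generated code the user never wrote.

```haskell
-- Invented illustration: a hypothetical `foreach` extension checks
-- its argument in its own terms, so the error mentions the construct
-- the user wrote rather than its expansion.

data Ty = TInt | TList Ty deriving (Eq, Show)

data ForEach = ForEach
  { loopVar :: String          -- the iteration variable
  , loopSeq :: (String, Ty)    -- the sequence expression and its type
  }

checkForEach :: ForEach -> Either String ()
checkForEach (ForEach _ (seqName, seqTy)) =
  case seqTy of
    TList _ -> Right ()
    t       -> Left ("foreach: `" ++ seqName ++ "` has type "
                     ++ show t ++ ", but a list is required")
```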
Their compiler, xoc, is a source-to-source translator at the moment,
which means it isn’t integrated with code generation or backend
optimisation and rewriting. It’d be fascinating to build a complete
compiler in this style. It’d also be neat to relate this approach to
extensible semantics. At the moment the extensions look quite ad
hoc. How could this framework be used, for example, to extend Scheme
or ML, where there’s a real semantics underneath?
One nice feature of xoc is that it tries to preserve hygiene during
rewriting. I’m curious as to whether it preserves hygiene to the
degree that, say, syntax-case
does; their description of the feature
was a bit terse for me to be able to tell. In particular I’m
interested in how it treats keywords, and where “fresh names” come
into the picture; the paper speaks simply of the code emitter renaming
variables “when two distinct variables share a name”.
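
As a guess at the flavour of that renaming, here’s a Haskell sketch;
the traversal and naming scheme are mine, not the paper’s:

```haskell
import qualified Data.Map as M
import Control.Monad.State

-- Guess at the kind of renaming the paper hints at: freshen every
-- binder, so two distinct variables can never collide after rewriting.

data E = V String | Lam String E | App E E deriving Show

type Supply = State Int

fresh :: String -> Supply String
fresh x = do n <- get; put (n + 1); return (x ++ "_" ++ show n)

rename :: M.Map String String -> E -> Supply E
rename env (V x)     = return (V (M.findWithDefault x x env))
rename env (Lam x b) = do
  x' <- fresh x                       -- always freshen binders
  b' <- rename (M.insert x x' env) b
  return (Lam x' b')
rename env (App f a) = App <$> rename env f <*> rename env a

-- evalState (rename M.empty (Lam "x" (Lam "x" (V "x")))) 0
--   ==> Lam "x_0" (Lam "x_1" (V "x_1"))
```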
They use a GLR parser for converting concrete syntax to abstract
syntax. One of the benefits of this is its ability to deal with
ambiguity, which they exploit in a couple of ways, most ingeniously in
their approach to type inference (if you like) for syntax pattern
variables. I’d like to learn more about how xoc computes the type of a
syntax pattern variable; I wonder if a similar system could be ginned
up for use with OMeta?
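
Here’s a toy rendering of the idea as I understand it, with an
invented grammar fragment; xoc’s real inference is surely more
refined. The parser keeps every reading of an ambiguous pattern
alive, and a metavariable’s syntactic sort falls out of whichever
readings survive:

```haskell
import Data.List (nub)

data Sort = ExprSort | TypeSort | DeclaratorSort deriving (Eq, Show)

-- Candidate parses of a C-like pattern "\a * \b;" : a multiplication
-- statement, or a pointer declaration.
parses :: [[(String, Sort)]]
parses =
  [ [("a", ExprSort), ("b", ExprSort)]        -- a * b  (expression)
  , [("a", TypeSort), ("b", DeclaratorSort)]  -- a *b   (declaration)
  ]

-- A metavariable's inferred sorts are whatever the live parses say.
sortsOf :: String -> [[(String, Sort)]] -> [Sort]
sortsOf v ps = nub [s | p <- ps, (v', s) <- p, v' == v]

-- Anything that rules out a parse (say, `a` already bound to an
-- expression elsewhere) narrows the other variables' sorts too.
narrow :: String -> Sort -> [[(String, Sort)]] -> [[(String, Sort)]]
narrow v s = filter (elem (v, s))

-- sortsOf "b" (narrow "a" ExprSort parses)  ==> [ExprSort]
```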
The laziness of most of xoc’s attribute computations is rather
nifty. Code generated relatively late in a compilation, when earlier
passes are complete for the rest of the code, can have necessary
information filled in lazily on demand, running just those parts of
the previous analyses that are required.
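
Haskell makes the flavour of this easy to convey, since laziness
comes for free; the following is an analogy rather than anything
resembling xoc’s implementation:

```haskell
-- An analogy, not xoc's implementation: if each attribute is a
-- thunk, a later pass pays only for the attributes it actually
-- demands, and earlier analyses run just far enough to produce them.

data Node = Node
  { nodeName  :: String
  , nodeType  :: String   -- attribute: result of a type analysis
  , nodeDepth :: Int      -- attribute: nesting depth
  }

analyse :: Int -> String -> Node
analyse depth name = Node
  { nodeName  = name
  , nodeType  = expensiveTypeCheck name  -- runs only if demanded
  , nodeDepth = depth
  }
  where
    expensiveTypeCheck n = "int /* inferred for " ++ n ++ " */"

-- Forcing nodeDepth never runs the type analysis:
--   nodeDepth (analyse 3 "x")  ==> 3, expensiveTypeCheck untouched
```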
I’d like to see some of the pattern-matching and reconstruction
features of xoc employed in the context of general purpose
computing. It seems to me that using a GLR parser the way they do
could effectively provide concrete domain-specific syntax for
algebraic data types, not just abstract syntax trees. (The question
arises: when is an AST an ADT? Always?) Being able to cons up syntax
for new types as easily as Haskell and ML cons up the types themselves
seems like a win worth experimenting with. OMeta goes a step in
this direction; I hope to explore it myself sometime soon.
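
To pin down what I mean, here’s a throwaway sketch using plain parser
combinators (nothing like xoc’s grammar machinery): an ordinary ADT
paired with its own concrete notation.

```haskell
-- Invented example: declare an algebraic data type together with a
-- concrete syntax for it, so values can be written in their own
-- notation rather than as constructor applications.

import Text.ParserCombinators.ReadP

data Interval = Interval Int Int deriving Show   -- the ADT

-- Concrete syntax "[lo, hi]" for the same type.
interval :: ReadP Interval
interval = do
  _  <- char '['
  lo <- readS_to_P reads
  _  <- char ','; skipSpaces
  hi <- readS_to_P reads
  _  <- char ']'
  return (Interval lo hi)

-- readP_to_S interval "[1, 5]"  ==> [(Interval 1 5, "")]
```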