I’ve just finished reading “Xoc, an Extension-Oriented Compiler for Systems Programming”, by Russ Cox, Tom Bergan, Austin T. Clements, Frans Kaashoek, and Eddie Kohler. It’s a description of an interesting approach to extensible compilation, for C in this case, that refines many previously explored ideas, bringing them together in a more compositional style than earlier extensible compilers have managed to achieve.
One of the key themes in the paper is how to control extension visibility. They describe two approaches: first, a type system used to ensure that extensions to the abstract syntax are visible to other extensions only when those extensions explicitly declare knowledge of them; and second, an approach from Warth et al.’s “Statically scoped object adaptation with expanders” paper for controlling the visibility of extensions to core data structures. I wonder if the latter is used to underpin the former at all.
Regarding the type system approach they take, it’s interesting how it lets the extension writer provide analyses directly in terms of extension abstract syntax, rather than forcing translation to a core language before performing any analysis. We did the latter in Highwire, and the error messages referred to core-language constructs that didn’t map well onto the source-language syntax: users were forced to unexpand the macro in their heads to make sense of the compiler’s output. The paper refers to this problem in section 5.2, where the authors write “Since macros cannot access compiler internals, they typically do not even type check their arguments, producing cryptic errors when invoked incorrectly.”
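To make the problem concrete, here’s a toy sketch — entirely hypothetical, nothing to do with xoc’s or Highwire’s internals — of why checking after textual expansion yields confusing errors: the checker only ever sees the expanded core statements, so its diagnostics mention code the user never wrote.

```python
import itertools

_fresh = itertools.count(1)

def expand_swap(a, b):
    """Expand swap(a, b) into three core assignments via a fresh temporary."""
    tmp = f"__tmp{next(_fresh)}"
    return [f"{tmp} = {a}", f"{a} = {b}", f"{b} = {tmp}"]

def check(stmts, declared):
    """A crude checker over the *expanded* code: flag uses of unknown variables."""
    errors = []
    for s in stmts:
        lhs, rhs = [t.strip() for t in s.split("=")]
        if rhs not in declared:
            errors.append(f"error: undeclared variable '{rhs}' in '{s}'")
        declared.add(lhs)
    return errors

# The user wrote swap(x, z), where z is a typo for the declared y.
expanded = expand_swap("x", "z")
print(check(expanded, {"x", "y"}))
```

The complaint is about the statement `x = z`, which exists only post-expansion; the user has to mentally unexpand the macro to see that it refers to their `swap(x, z)` call.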
Their compiler, xoc, is a source-to-source translator at the moment, which means it’s not integrated with code generation and backend optimisations and rewrites. It’d be fascinating to build a complete compiler in this style. It’d also be neat to relate this approach to extensible semantics. At the moment the extensions look quite ad-hoc. How could this framework be used, for example, to extend Scheme or ML, where there’s a real semantics underneath it?
One nice feature of xoc is that it tries to preserve hygiene during rewriting. I’m curious as to whether it preserves hygiene to the degree that, say, syntax-case does; their description of the feature was a bit terse for me to be able to tell. In particular I’m interested in how it treats keywords, and where “fresh names” come into the picture; the paper speaks simply of the code emitter renaming variables “when two distinct variables share a name”.
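As a strawman for what that renaming rule might look like — just my guess at the mechanism, not xoc’s actual algorithm — here’s a sketch where variables carry identity separately from their printed name, and the emitter invents a fresh name only when two distinct variables would otherwise collide:

```python
import itertools

class Var:
    """A variable: a user-visible name plus a distinct object identity."""
    def __init__(self, name):
        self.name = name

def emit_names(variables):
    """Assign output names in order, renaming only when two distinct
    Var objects would otherwise share a name."""
    fresh = itertools.count(1)
    taken, out = set(), {}
    for v in variables:
        name = v.name
        while name in taken:
            name = f"{v.name}_{next(fresh)}"  # fresh name on collision
        taken.add(name)
        out[v] = name
    return out

# A user-written "tmp" and a macro-introduced "tmp" are distinct
# variables; only the later one is renamed.
user_tmp, macro_tmp = Var("tmp"), Var("tmp")
names = emit_names([user_tmp, macro_tmp])
```

What this sketch leaves open is exactly what I’m wondering about: how keywords are handled, and at what point in the pipeline identities are attached.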
They use a GLR parser for converting concrete syntax to abstract syntax. One of the benefits of this is its ability to deal with ambiguity, which they exploit in a couple of ways, most ingeniously in their approach to type inference (if you like) for syntax pattern variables. I’d like to learn more about how xoc computes the type of a syntax pattern variable; I wonder if a similar system could be ginned up for use with OMeta?
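My rough mental model of that inference — a toy reconstruction, nothing like xoc’s actual GLR machinery — is: try parsing the pattern once per candidate grammatical category for the variable, and keep the categories under which the whole pattern parses.

```python
# Toy grammar: stmt -> ["return"] expr ";"
#              expr -> atom ("+" atom)*
# where an atom is a NAME, or the pattern variable "\x" when it is
# assumed to stand for an expr.

def parses_as_stmt(tokens, var_type):
    """Recognize `tokens` as a stmt, treating "\\x" as var_type."""
    def atom(t):
        return t.isalnum() or (t == "\\x" and var_type == "expr")
    if tokens and tokens[0] == "return":
        tokens = tokens[1:]
    if not tokens or not atom(tokens[0]):
        return False
    rest = tokens[1:]
    while len(rest) >= 2 and rest[0] == "+" and atom(rest[1]):
        rest = rest[2:]
    return rest == [";"]

def infer_var_type(tokens, candidates=("expr", "stmt")):
    """The inferred type(s) of \\x: every candidate that lets the
    whole pattern parse as a stmt."""
    return [c for c in candidates if parses_as_stmt(tokens, c)]

# In the pattern `return \x + y ;` the variable can only be an expr.
inferred = infer_var_type(["return", "\\x", "+", "y", ";"])
```

A real GLR parser would carry all the ambiguous parses simultaneously rather than re-parsing per candidate, and a variable could legitimately come back with several possible types — which is presumably where the interesting design decisions live.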
The laziness of most of xoc’s attribute computations is rather nifty. Code generated relatively late in a compilation, when earlier passes are complete for the rest of the code, can have necessary information filled in lazily on demand, running just those parts of the previous analyses that are required.
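Here’s the shape I imagine that takes — a hypothetical sketch, not xoc’s actual attribute machinery: attributes are ordinary functions over the tree, computed on first demand and memoized, so asking for one attribute pulls in exactly the analyses it depends on.

```python
class Node:
    def __init__(self, kind, children=(), value=None):
        self.kind, self.children, self.value = kind, list(children), value
        self._attrs = {}  # memo table: attribute name -> computed value

    def attr(self, name):
        if name not in self._attrs:
            self._attrs[name] = ANALYSES[name](self)  # run only on demand
        return self._attrs[name]

# Each analysis is a plain function over the tree; it may demand other
# attributes, which are themselves computed lazily and cached.
ANALYSES = {
    "size": lambda n: 1 + sum(c.attr("size") for c in n.children),
    "type": lambda n: ("int" if n.kind == "num"
                       else n.children[0].attr("type")),  # toy rule
}

tree = Node("add", [Node("num", value=1), Node("num", value=2)])
```

Code generated late in the compilation can then ask `tree.attr("type")` and trigger only the slice of analysis it actually needs, with repeated demands hitting the memo table.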
I’d like to see some of the pattern-matching and reconstruction features of xoc employed in the context of general-purpose computing. It seems to me that using a GLR parser the way they do could effectively provide concrete domain-specific syntax for algebraic data types, not just abstract syntax trees. (The question arises: when is an AST an ADT? Always?) Being able to cons up syntax for new types as easily as Haskell and ML cons up the types themselves seems like a win worth experimenting with. OMeta goes a step in this direction; I hope to explore it myself sometime soon.