Fexprs remain inscrutable
In the last few weeks, a bit of discussion of John Shutt’s Fexprbased programming language, Kernel, has sprung up in various circles. I read Shutt’s PhD thesis, implemented a couple of toy Kernelinspired interpreters (in Javascript and Racket), and became simultaneously excited about the simplicity, hygiene and power of the approach and apprehensive about the strong possibility that there was something fundamental I was overlooking. After all, the theory of Fexprs is trivial.
Calling Kernel’s Fexprs “trivial” doesn’t seem quite right to me; perhaps calling them “inscrutable” would be a better fit.
Three serious problems with Kernel and $vau
After a discussion with a friend earlier this week, I think it’s
important to note three serious problems with Kernel and the
underlying $vau
calculi Shutt proposes:

It’s not possible even to compute the free variables of a term in
$vau
calculus in general.^{1} This makes compilation, automatic refactoring, and crossreferencing impossible in general. 
There are too many contexts that tell terms apart; contextual equivalence is too fine. Consider these two terms:
($lambda (f) (f (+ 3 4))) ($lambda (f) (f (+ 4 3)))
These two terms are not contextually equivalent, because if you pass an operative^{2} as the argument to the function, the operative can inspect the syntactic structure of the program.

This overfine contextual equivalence is a problem in practice, not just a theoretical concern. Consider this following definition of a
foldr
function:($define! foldr ($lambda (c n xs) ($if (null? xs) n (c (car xs) (foldr c n (cdr xs))))))
See how
c
has control over whether and when the recursive case is evaluated, because it is not required to evaluate its arguments! Compare this to the same definition where the last line is replaced with($let ((ys (foldr c n (cdr xs)))) (c (car xs) ys))
By introducing the intermediate
$let
, we take control over the recursion away fromc
, forcing evaluation of the recursive case.To show concretely why this is a problem, the call
(foldr $and #t somelist)
will either be a shortcircuiting “and” or not, depending on an implementation detail of thefoldr
function!
Taken together, we see that purely local reasoning about Kernel programs is impossible; or at least, getting useful results from such reasoning is impossible in general. Only wholeprogram analysis can give us enough stability to let us reach useful conclusions about our programs.
On the other hand, it’s possible to mitigate these problems somewhat.
First of all, it might be possible to use online partial evaluation to compute the various static properties of terms we’re used to from the lambdacalculus in many (but certainly not all) situations. Undecidable situations might be able to be warned about. This might make the language stable enough to work in practice.
Secondly, by shifting one’s perspective, the inequivalence of the two
variants of foldr
given above might be able to be seen as a
feature: applicatives, after all, are not operatives in
Kernel. Personally, I find it unsettling, but that could be my Scheme
heritage showing. Documentation for Kernel applicatives would need
to specify what possible computations any higherorder parameters
might receive as arguments. The foldr
given above can be
simultaneously CPS and nonCPS, when given operative and applicative
arguments, respectively.
Both these possible mitigations are speculative and weak.
Reasons to be cheerful
I think that the core idea of separating argument evaluation from argument transmission (or function call; I’m unsure what to call it) is a very powerful one. Consider a distributed system: evaluation of arguments has to happen on the sending side, and transmission of the results of evaluation to the receiving side (where instantiation of the function body happens) is a separate operation.
I also think the simplicity of implementation of a Kernelstyle Fexpr based system is worth paying attention to, especially given that it makes writing hygienic “macros” effortless, and in the process handily avoids the thorny problem of exactly what “hygiene” is in the first place. Achieving such easeofuse takes heavyduty macrology in languages like Scheme that lack firstclass macros or Fexprs.
Having our cake and eating it, while also enjoying improved security
Consider securing Kernel in the same way that a capabilitybased approach can secure Scheme. To secure Kernel as it stands, each applicative has to guard against possibly receiving some operative as an argument.
To make it easier to secure, we might make the system signal an error when an attempt is made to pass an operative as an argument to an applicative.
This makes Fexprs/operatives/”macros” secondclass, just as they are in Scheme. I conjecture that doing so has the following effects:

it makes contextual equivalence useful again, making the two variants of
foldr
given above indistinguishable^{3}, but 
it simultaneously makes the system exactly equivalent in power to Scheme with, say, a
syntaxcase
macro system.
So by doing this, we’ve given up most of Kernel’s approach, seemingly in exchange for, well, familiar old Scheme. We haven’t lost everything, though: we’ve retained the simplicity and power of Kernel’s approach to hygiene.
Conclusions
Without quite a bit of further research, I don’t think Kernelstyle languages can be used to build secure or efficiently compiled systems. Making changes necessary to build a secure variant of Kernel may destroy many of the properties that make the language interesting in the first place, but doing so might leave us in a familiar setting seen through new eyes: Scheme.

I thought it was weird, actually, that this wasn’t pointed out in Shutt’s thesis, which refers to a function
FV(x)
without defining it and without further discussion. ↩ 
Operatives in Kernel are roughly equivalent to Fexprs in older lisps. ↩

While it doesn’t erase the distinction between
($lambda (x) (f x))
andf
, it does make it amenable to local reasoning. This outcome is actually just the same as in Schemewithmacros, where that particular equivalence doesn’t hold either. ↩
Comments (closed)
To properly appreciate the outstanding challenges for fexprs, one has to first grok why they work in theory. And it does seem there's something fundamental missing from the above discussion. To unravel it, one might start with this: symbols are not variables. Symbols are source code, variables are not. Determining the scope of a symbol may be a Turingcomplete problem; determining whether a variable is free in a term is trivial, just as it would be in lambdacalculus. (Btw, it's *not* $vaucalculus, it's vaucalculus; $vau is an objectlanguage device, vau is a calculus device. That's not nitpicking, it's at the heart of the matter.)
The calculus has two separate parts: one part for source expressions, with no rewrite rules at all, and a second part for fexpr calls, whose central rewrite rule is with deadly accuracy called the betarule. And bridging the gap between those two parts is the machinery of evaluation, which looks up symbols, converts pairs into combinations, and unwraps applicatives while scheduling evaluation of their operands.
The sourceexpression part of the calculus has a trivial theory; source expressions are data, observables, not operationally equivalent to each other. They're irreducible in the full calculus.
The fexprcalling part of the calculus has a nontrivial theory that is, essentially, the theory of lambdacalculus. I admit, I was floored when I realized the theory of vaucalculus was not *only* nontrivial, but in fact had the whole theory of lambdacalculus inside it. It was more shocking because I'd been working with vaucalculus for several years by that time, without even noticing.
The evaluation bridge turns the whole calculus into a device for deducing the sort of partial evaluation you're talking about. A major punchline of Chapter 9 of the dissertation is that the pure calculus doesn't need variables to model pure Kernel, but does need them to do much partial evaluation. In later chapters, when I introduce other kinds of variables to manage impurities, the kind of variables bound by the vau operator are called "partialevaluation variables".
Hi John, glad to see you here! Thanks for your comments.
Re the "free variable" question: I'd like to be able to ask "Is b free in ($lambda () (a b))?"; do you think that is a sensible question? Can it be answered? What about "Is b free in ($lambda (a) (a b))?"
Re that lambdacalculus is contained within vaucalculus: I'd love to see a more full explanation of this, spelling it out in detail including the consequences for designs of full programming languages based on vaucalculus. I found myself missing a discussion of the relationship between the lambdacalculus, the vaucalculus, and the Kernel source language in chapter 15 of your thesis; in particular, it seems that most people will want to be using $lambda rather than $vau when programming in Kernel, and it's not clear to me how the connection extends up through the evaluation of source expressions. (Pardon me if I get the terminology wrong here.)
Concerning "Is b free in ($lambda () (a b))", and the like. Before this would become meaningful, you'd have to frame it in terms of a calculus, so it becomes clear precisely what you want to know. ($lambda () (a b)) is a data structure. What you presumably want to know is, what would happen if that data structure were evaluated. So you're really asking about the behavior of some vaucalculus term of the form [eval ($lambda () (a b)) e] where e is some environment. You want to know whether symbol b is sure to be evaluated in e rather than in some other environment. And once one frames the question that way, it's very clear that the answer cannot possibly be determined without knowing something about e. In fact, it's really a question about e. What bindings of $lambda and a are visible in e, and what are the implications of those bindings for the evaluation of symbol b in this particular expression? Perhaps we only know that e is is the state of a certain environment sometime during a program run, and we then hope to prove that that environment can only be mutated in certain ways. And so on.
Concerning lambdacalculus and vaucalculus. There are the *symbols* $lambda and $vau, which for a calculus are just constants. And then there are \lambda and \vau, which are operators in the calculi. (Since this isn't LaTeX, if I need to write a \lambda or \vauabstraction I'll delimit it with square brakets, so parentheses are only for pairs, i.e., cons cells.)
The relationship between \lambda and \vau is that they are, essentially, just two different ways of writing exactly the same thing. The basic difference between \lambdacalculus and \vaucalculus is that \lambdacalculus has no pairs, and no evaluation. So it isn't really meaningful to even ask about $lambda expressions or $vau expressions with reference to \lambdacalculus. In \vaucalculus, expressions of the form [eval <expr> <env>] get rewritten so that they start to have subterms of forms such as [\vau <param> <body>] and [combine <combiner> <operands> <env>]. And this is true regardless of how many or few $lambda's or $vau's the original <expr> had in it. So you'll understand, I hope, when I say that it doesn't really matter, for applicability of \vaucalculus, whether the programmer uses a lot of $lambda's and very few $vau's.
I too would be interested to find out just how many opportunities there are *in practice* to apply the sorts of "partial evaluation" simplifications afforded by \vaucalculus. The calculus is a tool for asking that question. The answer might well provide great insights into subtle (or gross) improvements of the language design.
In case you happen not to have seen one or both of these, they might interest you:
http://axisofeval.blogspot.com...
http://lambdatheultimate.org...
[I have since repaired the mistake that led to comments appearing at different URLs.  tonyg]
(I remember *planning* to comment on this, quite some time back; apparently I didn't, but I'm not sure why not.)
It's just as easy to determine the free variables of a term in vaucalculi as in lambdacalculi (nothing essential is changed by the additional binding constructs and classes of variables). You're presumably thinking of symbols, whose interpretation may be statically undecidable; but symbols are data, they aren't variables, and that data/computation distinction is what empowers a nontrivial theory of fexprs. A key issue in optimization is figuring out when you can guarantee things about the evaluation of data, and therefore eliminate it.
The following two expressions produce contextually equivalent objects (barring use of eq?), provided they're evaluated in a stable environment with standard bindings.
($lambda (f) (apply f (+ 3 4)))
($lambda (f) (apply f (+ 4 3)))
What best to make of that technique is an interesting question, but it does suggest that the power afforded by Kernel isn't inherently untameable.
[I have since repaired the mistake that led to comments appearing at different URLs.  tonyg]
Hi John; you *did* comment on this, but technical issues (!) mean that the earlier comments are only visible at this subtly different URL: http://www.eightytwenty.org//... :( I made a mess of the disqus configuration at one point, and it doesn't seem possible to fix older comments. :(
Regarding free variables, I'm thinking of source code. When I write ($lambda (a) (a b)), I'd like to know whether b is free in that expression. The problem is that I can't decide that until I have a complete program to look at, because I could always pass an operative in as `a` which doesn't examine its argument. This makes static analysis of source code tricky.
Similarly, I could pass in an operative for `f` to ($lambda (f) (f (+ 3 4))) that could tell the difference between being given (+ 3 4) and being given (+ 4 3). That's a very fine contextual equivalence indeed.
Until you have a complete program to look at, you don't even know what $lambda means, so it might not be meaningful to ask whether b is "free". It also seems the possibilities are more complicated than merely "free" or "bound". We did go over this territory in the before, I see.
My point about using apply was that *some* terms have larger operational equivalence classes than others. You can write the applicative so that passing in an operative breaks its encapsulation, but you can also write the applicative so that doesn't happen. Other techniques could be used, of course; apply is just one that's built in. For example,
($define! appv ($lambda (c) ($if (operative? c) (wrap c) c)))
($lambda (f) ((appv f) (+ 3 4)))
The dangerous thing (using f in operator position) is not difficult to do by accident, but these techniques are a starting point for considering how to make the accident less likely.
John wrote:
> Until you have a complete program to look at, you don't even know what $lambda means, so it might not be meaningful to ask whether b is "free".
That's a dealbreaker for any real system: I don't want to have to look at 10^6 lines of code in order to figure out what my 10line function does.
The whole point of "Fexprs are Trivial" was that fexprs make it impossible to analyze program FRAGMENTS, which is in practice something we need to do all the time.
In this comment, John seems to agree that this holds for vau, too.
(I'm going to address the issues in the above comment; I do recommend, though, that the commenter do a selfdiagnostic on the debate tactics they have employed.)
The above reasoning apparently elides the same distinction as the 1998 paper: it does not distinguish between source code and reasoning about source code. That elision leads to a problematic theoretical inference from the paper, and a problematic practical inference here.
The 1998 paper could be said, equivalently, either to assume all calculus terms are source code, or to assume all reasoning about source code must be sourcetosource. There would be no difficulty with that as an explicit premise (as opposed to an implicit assumption); one could call the paper something such as "The Theory of Fexpr SourcetoSource Equivalence is Trivial" or on the other side of the coin, "The Theory of Perfect Reflection is Trivial", which, like the actual title of the paper, is catchy and everyone would be alert to the specificity of the result derived by the paper. The unfortunate reality is that the assumption is not presented in such an explicit way, and what readers seem likely to take away from the paper is "fexprs cannot be reasoned about formally". The proposition that we can talk about something with precision, yet cannot reason about it formally, should set off alarm claxons in one's mind. Nor is it sufficient to qualify merely that one can only reason about whole programs. We can obviously make formally provable statements about program fragments  conditional statements about them. The usefulness of those conditional statements depends on the nature and likelihood of the conditions.
Now, let's look at the practical side of this. I've got a source expression, which is part of a larger program. Let's say the expression is ten lines, and the program is a million lines. When I make a conditional statement about the meaning of this expression, I'm not going to state the condition in terms of a set of possible sourcecode contexts in which the fragment might occur. I'm going to state the condition in terms of environments in which the fragment might be evaluated. And if the interpreter (I mean that term in its broad sense, including compilers) does static analysis, that too will be in terms of environments. Notice that environments are *not source code*  we are *not* reasoning only about source code. Our reasoning involves source code, but also involves something that cannot be read from a source file (although, since Kernel happens to have firstclass environments, it might result from *evaluating* source code).
Why does it sound so scary that the surrounding program is a million lines long? Because of an implicit assumption that the difficulty of deducing the environment is proportional to the length of the surrounding program. That assumption should not be implicit. My work makes a *big deal* out of the importance of designing any language with fexprs so as to maximize deductions about environments; you will find it discussed in Chapter 5 of my dissertation (but I don't mean to discourage reading that document from the beginning). The strengths and weaknesses of the Kernel design provisions for this, are an interesting topic; I've never claimed, nor supposed, Kernel was some panacea to cure all the ills of programming, never to be improved upon, and one can't expect to improve on it without studying its weaknesses; but studying the weaknesses of those provisions is entirely different from overlooking their existence. They do exist.
You are correct, of course, that one can state theorems about program fragments in the form of conditionals.
Can you state the conditions under which the two fragments in the original post($lambda (f) (f (+ 3 4)))
($lambda (f) (f (+ 4 3)))
are equivalent?
If I write a compiler, how can my compiler verify those conditions in order to transform one of these into the other, or into
($lambda (f) (f 7)) ?
oops, that got botched in formatting. The code fragments are, of course
($lambda (f) (f (+ 3 4)))
($lambda (f) (f (+ 4 3)))
($lambda (f) (f 7))
Hopefully this will come out with better formatting.
I like that question :). Okay, let's see. Two conditions would suffice  one about the static environment, and one about the way the results of evaluation will be used. (Necessaryandsufficient conditions would of course be undecidable.)
(1) The compiler can prove that both expressions will be evaluated in an environment where $lambda has its standard binding when they are evaluated, and where + will have its standard binding when the resulting applicatives are *called*. It doesn't matter what binding $lambda has when they are called, nor what binding + has when they are created.
This brings up another point. Source expressions are passive data, and optimizations pertain to active terms (with meanings like "evaluate this expression in that environment"). So the question probably shouldn't be when those first two terms could be transformed into the third, but rather when evaluations of all three could be transformed to the same optimized active term.
(2) The compiler can prove that the applicatives, which result from evaluating these two expressions, will (wait for it) not be called with an operative argument. Seriously. I could, and won't just now, write a whole little essay about issues surrounding the undesirability of putting a parameter in operator position like this. But there *are* circumstances, not altogether implausible, in which the compiler could make this deduction. The compiler might prove that the applicatives will never escape from a closed environment in which all calls to them can be identified by the compiler (standard features like $provide! and $letsafe may help with this). Or, even if the applicatives themselves might be called in uncontrolled situations where they might be given an operative, one might be able to prove that some *particular* calls to the applicatives will not be passed operatives, and optimize those calls even if one can't optimize the applicatives themselves in general.
In the interest of making it difficult to do dangerous things by accident, I would suggest you reverse the syntactic requirement: the path of least resistance (f (+ 3 4)) is equivalent to using apply, and more explicit code  e.g. ($ f (+ 3 4))  allows f to be an operative.
You don't locally know whether the ($lambda ...) term as a whole will be evaluated vs. inspected by another operative. A compiler or programmer couldn't decide even the first sentence of (1) without observing the whole program.
Given runtime representation of the environment, it can also be difficult to statically determine whether and how a term will be used at runtime.
Briefly (the severe indentation is likely trying to tell us something). There are various measures in the language to facilitate provably stable environments. Among these:
Environment mutation primitives only affect local bindings, not visible bindings in ancestors; so an ancestor can be provably stable even though visible in uncontrolled contexts. Among other things, this means a large module can protect its toplevel environment with its toplevel statements (things like $let, $letsafe, $provide!), so the stability of the toplevel environment doesn't depend on lowerlevel details.
The tentatively conjectured module system supposes that each individually compiled module has a toplevel environment that starts out standard, isolating imports from other modules so they can't accidentally corrupt the standard bindings.
I've considered this before. Not everything I'll say about it is negative.
First: It *should* be possible to build this arrangement on top of standard Kernel. Moreover, if it *isn't* possible to do so, there'd be important things to learn from *why* it isn't possible. It has seemed to me for many years that a whole class of such alterations possible in *theory* are far less so in *practice* because they would really require replacing the entire standard environment. Deep issues ensue; perhaps to tackle on my blog (where I have a vast backlog).
Second: The arrangement disrupts the uniformity of the language core, making the useful functioning of the evaluator algorithm dependent on a distinguished combiner (you call it $, a name I've considered for... something, but it'd have to be incredibly fundamental and abstractioninvariant to justify using up that name  the combiner here should be a pretty rare operation, so imo wouldn't warrant that name). That distinguished combiner is closer to being a "special form" than I'd have thought one could get without reserved keywords.
The nonuniformity is a big deal partly because it's a slippery slope. If we're tempted to do this, there will be lots of other things waiting in the wings to tempt us *just a little* further. Note my comment early in the Kernel Report about resonance from purity. An incredibly "smooth" language core means that whatever you build on top of it will effectively inherit flexibility from the existence of the underlying smoothness (also why the operative/wrapper is key even if some abstraction layer were to sequester it where the programmer can't directly access it). Which further favors a very pure language core which one then hopes to build "on top of".
Third: This does seem like it could have the desired effect, in a simple sort of way. Oddly enough, an alternative I've been musing on for some years is far less simple... at least in implementation. I'll likely blog about it eventually  "guarded environments", akin to Kernel's guarded continuations. (Guarded environments are mentioned in a footnote somewhere in my dissertation.)
I wasn't even considering mutation. It is the wide and dynamic potential for observation of a term that causes difficulty.
Hm? The other subthread, below, relates to visibility; this one is about binding stability.
Uh, no. This subthread is about how terms are observed or distinguished. Stability was never part of it.