Unrolling Recursion, the Golden Ratio, and Partial Evaluation

The little language shootout I ran a few weeks ago showed that Java had a significant edge over the other contestants. I speculated at the time that it might have been because Java was unrolling the fib loop once, inlining one of the recursive invocations, which would have removed one layer from the call tree - essentially halving the runtime. (More accurately, dividing the runtime by φ = 1.618...)

I just now ran the same program unrolled once by hand, and with the unrolled version the C code has almost identical (well within the margin of error!) runtime to the Java code. Here's the faster C code:

static unsigned long fib(int n) {
  if (n < 2) {
    return 1;
  } else {
    if (n < 3) {
      return 2;
    } else {
      return fib(n - 1) + fib(n - 2);

Essentially, it's inlining that recurses once, allowing a self-recursive procedure to inline one layer of self-invocation. It's a neat trick - it's discussed in section of Craig Chambers's thesis on the earlier optimizing Self compiler. Self only inlines once (i.e. nonrecursively) - clearly, Java inlines twice occasionally.

Further unrolling the recursion gives commensurate speedups, as you would expect: each unrolling divides the runtime by φ, roughly, so you can easily speed up the C code by about a factor of ten compared to the completely non-unrolled version by unrolling until you return 13 constantly after an explicit test that n < 7.

All this inlining/unrolling/constant-propagation seems really closely related to partial evaluation, too... Hidehiko Masuhara's PhD thesis (U. Tokyo, 1999) is an interesting read, applying partial evaluation to reflective interpreters to give vastly improved performance for behaviourally-reflective languages. Integrating a partial evaluation system with a Self-like dynamic optimising compiler seems like a worthwhile project - something that is likely to be necessary for the implementation of ThiNG, perhaps, given how reflective I imagine it will end up being.