Co-optional arguments?

Racket and many other languages have the concept of optional arguments at the receiver side:

(define (a-procedure mandatory-arg #:foo [foo 123])
  ;; Use mandatory-arg and foo as normal
  ...)

(a-procedure mandatory-value #:foo 234) ;; override foo
(a-procedure mandatory-value) ;; leave foo alone

Here, the procedure offers its caller the option of supplying a value for foo; if the caller does not, the procedure uses 123 as the value for foo. The caller is permitted not to care (or even to know!) about the #:foo option.

Almost no language that I know of has the symmetric case: an optional argument at the caller side (and because the feature doesn’t exist, I’m having to invent syntax to show what I mean):

(define (run-callback callback)
  (callback mandatory-value [#:foo 234]))

(run-callback (lambda (mandatory-arg) ...)) ;; ignores the extra value
(run-callback (lambda (mandatory-arg #:foo [foo 123]) ...)) ;; uses it

The intent here is that the procedure is permitted not to care (or even to know) about the additional value its caller supplies.

This would be useful for callbacks and other situations where the code invoking a procedure has potentially-useful information that the specific (and statically-unknown) procedure being called may or may not care about.
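
In the meantime, the idea can be approximated in languages with runtime introspection, at the cost of pushing the check into the caller. Here is a rough Python sketch of the intended semantics (the names are mine, purely for illustration):

import inspect

def run_callback(callback):
    # Pass foo=234 only if the callback declares a foo parameter
    # (or a **kwargs catch-all); otherwise omit it entirely.
    params = inspect.signature(callback).parameters
    accepts_foo = "foo" in params or any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())
    if accepts_foo:
        callback("mandatory-value", foo=234)
    else:
        callback("mandatory-value")

run_callback(lambda mandatory_arg: None)           # ignores the extra value
run_callback(lambda mandatory_arg, foo=123: None)  # uses it

The point of the feature proper would be to make this part of the calling convention, so that neither side has to do any explicit checking.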

I said above that almost no languages have this feature: it turns out that Squeak and Pharo have something similar in their BlockClosure>>#cull: family of methods. The following code works when given a closure accepting zero, one, two or three arguments:

someBlock cull: 1 cull: 2 cull: 3

(Update: @msimoni points out that Common Lisp’s allow-other-keys is similar, too. It still requires the callee to declare something extra, though.)

The Software Crisis, 2012 edition

This morning’s breakfast experiment involves rough estimation of the changes in cost and capacity of computing power over the last few years, and imagining where that will leave us in ten years.

In summary: hardware is already superpowered. The gap between hardware and software is already huge. It’s only going to get worse. What is to be done about the parlous state of software?

SD Cards

Rough googling suggests a 512MB card cost about $60 in 2005, which is around $68 in 2012 dollars. A commodity 32GB card costs about $25 today; that's roughly a 170× improvement in capacity per dollar over seven years, or about 2.1× per year. Assuming exponential decline in price, or equivalently exponential increase in capacity for constant cost, we can expect capacity per dollar to roughly double year on year.

CPU

More rough googling suggests CPU capacity (measured in GFLOPS) is increasing at roughly a factor of 1.6 per year. GPUs are improving more quickly, approximately doubling in speed each year.

DRAM

Wikipedia informs me that in 1971, DRAM cost 5c per bit, and in 1999 it cost 20µc per bit: a 250,000-fold improvement over 28 years. Again assuming exponential scaling, that gives a factor of approximately 1.56 increase in capacity per dollar year on year.

Total

Component    2012    2017     2022
Storage      1       ~32      ~1024
Compute      1       ~10.5    ~110
RAM          1       ~9.2     ~85
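
For anyone who wants to play with the assumptions, a few lines of Python reproduce the table from the annual growth factors estimated above (the factors are rough estimates, not measurements):

# Project 5- and 10-year capacity multiples from annual growth factors.
for name, factor in [("Storage", 2.0), ("Compute", 1.6), ("RAM", 1.56)]:
    print(f"{name:8} 1  ~{factor ** 5:.1f}  ~{factor ** 10:.0f}")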

In five years' time, expect to be working with machines that are ten times as fast, with ten times as much RAM and thirty-two times as much secondary storage as today's machines.

In ten years' time, expect machines one hundred times as fast, with one hundred times the RAM and one thousand times the storage.

I’m not even sure how to measure progress in software. But my impression is that it isn’t keeping up its end of the bargain. Perhaps we’re seeing linear improvement, at best.

I think a big part of the problem is that our ambition hasn't increased to match our capacity: we haven't kept our expectations of software in line with the ability of our new hardware.

RabbitHub Status, May 2012

RabbitHub, an implementation of PubSubHubBub I built as part of my work on RabbitMQ, hasn’t been maintained for a while now. It lags the current state of the art in two respects:

  1. it will need straightforward updates to work with the current versions of RabbitMQ Server, and
  2. it implements PSHB version 0.1 rather than the current 0.3.

Fixing the former would take about two days of expert work; I don't know how much work fixing the latter would be. Both are perfectly reasonable projects to consider, though.

RabbitHub is unique, as far as I know, in one respect: it uses PSHB as a generic webhook-based messaging protocol, rather than the narrower Atom distribution protocol that its designers have in mind. There are other PSHB implementations, notably the reference implementation and Superfeedr, but as far as I know they are all specialized to Atom transport. A list of hub implementations can be found at http://code.google.com/p/pubsubhubbub/wiki/Hubs.

Fame, of a sort

Per Chris Cunnington’s recent survey, it looks like two of my SqueakSource projects have more than 1000 downloads and are therefore in the top 500 projects on the site! ;-)

And I’ll somewhat cheekily claim a small portion of the credit for Max Leske’s GitFS, with 8339 downloads, since it was originally based on my Git for Squeak project at its core, though it has definitely moved far beyond the humble beginning I gave it.

Network server programming with OCaml

Some years ago, I spent a few hours figuring out how to write a multithreaded TCP/IP server in SML/NJ. Here, I’ve performed the same exercise with OCaml 3.12.1.

The code for this example is on github.

Downloading, building, and running

The example comprises a _tags file for controlling ocamlbuild and the .ml file itself; the complete sources are in the repository linked above.

Running the following command compiles the project:

ocamlbuild test.native

The ocamlbuild output is a native executable. The executable is placed in the _build directory, and a symbolic link to the executable is placed in the working directory. To run the program:

./test.native

The build control file

The _tags file contains

true: use_unix
true: thread

which instructs the build system to include the Unix POSIX module, which provides a BSD-sockets API, and to configure the OCaml runtime to support the Thread lightweight-threads module. For more information about ocamlbuild, see here.

The example source code

Turning to test.ml now, we first bring the contents of a few modules into scope:

open Unix
open Printf
open Thread

The Unix module, mentioned above, provides a POSIX BSD-sockets API; Printf is for formatted printing; and Thread is for multithreading. We’ll be using a single thread per connection. Other models are possible.

OCaml programs end up being written upside down, in a sense, because function definitions need to precede their use (unless mutually-recursive definitions are used). For this reason, the next chunk is conn_main, the function called in a new lightweight thread when an inbound TCP connection has been accepted. Here, it simply prints out a countdown from 10 over the course of the next five seconds or so, before closing the socket. Multiple connections end up running conn_main in independent threads of control, leading automatically to the natural and obvious interleaving of outputs on concurrent connections.

let conn_main s =
  let cout = out_channel_of_descr s in
  let rec count n =
    match n with
    | 0 ->
        fprintf cout "Bye!\r\n%!"
    | _ ->
        fprintf cout "Hello %d\r\n%!" n;
        Thread.delay 0.5;
        count (n - 1)
  in
  count 10;
  printf "Closing the connection.\n%!";
  close s

Note the mysterious %! format specifiers in the format strings: these translate into calls to flush, forcing buffered output out to the actual socket. The alternative would be to call flush cout directly ourselves.

The function that depends on conn_main is the accept loop, which repeatedly accepts a connection and spawns a connection thread for it.

let rec accept_loop sock =
  let (s, _) = accept sock in
  printf "Accepted a connection.\n%!";
  let _ = Thread.create conn_main s in
  accept_loop sock

Finally, here is the block of code that starts the whole service running. It catches and ignores SIGPIPE (to avoid crashing when a client departs unexpectedly), creates the TCP server socket with SO_REUSEADDR set, listens on port 8989 with a connection backlog of 5, and enters the accept loop.

let _ =
  Sys.set_signal Sys.sigpipe Sys.Signal_ignore;
  let sock = socket PF_INET SOCK_STREAM 0 in
  setsockopt sock SO_REUSEADDR true;
  bind sock (ADDR_INET (inet_addr_of_string "0.0.0.0", 8989));
  listen sock 5;
  printf "Entering accept loop...\n%!";
  accept_loop sock
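
Once the server is running, connecting to it (with telnet localhost 8989, for instance) should produce the countdown followed by a disconnect; opening several connections at once shows each one counting down independently.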

Tasks within Transactions using RabbitMQ

In a discussion of moving from Google AppEngine to EC2, a participant mentioned that the only “odd” GAE service he made use of was transactional task queues, and that there was no direct analogue of this service in other queueing systems.

Here’s one way of building a transactional task queue using RabbitMQ and your favourite SQL database.

  • a table in the database will hold pending tasks
  • messages going through RabbitMQ will inform workers of the arrival of new tasks
  • the available tasks will be load-balanced between n workers
  • a “cleaner” process makes sure all tasks are eventually run and that failed tasks are eventually detected

In this article, I’ll spell out the requirements for the system, and discuss the details of the database tables, RabbitMQ queues, and daemon processes required. I’ll then evaluate the solution presented in terms of the requirements, and conclude by discussing a few variations on the solution that might be worth exploring.

Requirements

This is what the GAE Task Queue API has to say about transactional task queueing:

You can enqueue a task as part of a datastore transaction, such that the task is only enqueued—and guaranteed to be enqueued—if the transaction is committed successfully. Tasks added within a transaction are considered to be a part of it and have the same level of isolation and consistency.

Our requirements, then, are:

  1. all successfully-enqueued tasks should eventually run
  2. successfully-enqueued tasks should be runnable shortly after transaction commit
  3. tasks where the transaction rolled back should never run
  4. the system should be robust against race conditions
  5. successfully-enqueued tasks shouldn’t be run more than once

where a “successfully-enqueued” task is one that was enqueued as part of a successfully-committed transaction.

Tables & Queues

Create a table—let’s call it tasks—in your SQL database (here imagined to be PostgreSQL):

CREATE TABLE tasks (
  id TEXT PRIMARY KEY,
  description TEXT UNIQUE,
  start_time TIMESTAMP WITH TIME ZONE
);

Each column is important:

  • the id column uniquely names the job. Use a UUID, or the MD5 or SHA-1 hash of the description, or some other suitably-unique name for the job (a snippet follows this list).

  • the description column describes the task to be performed. You could use JSON, a URL, a Java class or method name, or any other means of identifying what is to be done.

  • the start_time column is used to make sure only a single worker starts performing the task, and to detect failed tasks.
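
Generating such a name is a one-liner in Python, for example (a sketch; either option works):

import hashlib, uuid
description = "mytask"
task_id = hashlib.md5(description.encode()).hexdigest()  # content-derived name
# or an entirely random name:
# task_id = str(uuid.uuid4())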

Next, create a queue in your RabbitMQ instance; let’s call it task_queue.

Queue_Declare(queue = 'task_queue', durable = True)
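
(With the Python pika client, for instance, the equivalent call is channel.queue_declare(queue='task_queue', durable=True).)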

The tasks table will serve as the “master list” of work remaining to be done. The task_queue queue will only be used to rapidly notify workers that a new task is waiting for them. If messages are lost from that queue on occasion, no harm will be done: the “cleaner” process, described below, will pick up the pieces.

Enqueueing a Task

To enqueue a task as part of a database transaction, create a row for the task as part of the transaction, and commit normally:

BEGIN;
  -- ... the main part of your transaction goes here ...
  INSERT INTO tasks
    VALUES ('ad5b4ccf3902db006405074c721a990e', 'mytask', NULL);
COMMIT;

The new row must have its start_time column set to NULL.

At any time near the end of the transaction—either before or just after the commit—send a message to the task_queue containing the id of the new task:

Basic_Publish(exchange = '',
              routing_key = 'task_queue',
              body = 'ad5b4ccf3902db006405074c721a990e')
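
In application code, the INSERT and the publish sit side by side. Here is a sketch in Python, assuming the psycopg2 and pika client libraries (the connection parameters are placeholders):

import pika
import psycopg2

db = psycopg2.connect("dbname=mydb")  # placeholder connection parameters
mq = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = mq.channel()

task_id = "ad5b4ccf3902db006405074c721a990e"
with db, db.cursor() as cur:  # BEGIN ... COMMIT
    # ... the main part of your transaction goes here ...
    cur.execute("INSERT INTO tasks VALUES (%s, %s, NULL)", (task_id, "mytask"))
# The commit has happened; now nudge the workers.
channel.basic_publish(exchange="", routing_key="task_queue",
                      body=task_id.encode())
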
Worker Processes

Your workers should connect to the database and to RabbitMQ as usual. Each worker process should Basic_Consume from the task_queue. Whenever a message comes in containing a task id—for example, the ID of the task enqueued just above, ad5b4ccf3902db006405074c721a990e—the worker should (a Python sketch appears after the list):

  1. BEGIN a transaction.

  2. SELECT * FROM tasks WHERE id = 'ad5b4ccf3902db006405074c721a990e' FOR UPDATE

    The FOR UPDATE is important: it prevents a race condition if two workers accidentally start looking at the same job at the same time.

  3. If no row is returned, wait for a few seconds (the “retry time”) and retry the SELECT. The message travelling through RabbitMQ could have overtaken the database commit. If after a couple of retries (the “giveup time”) the row is still absent, it’s likely the sender’s database commit didn’t complete, so move on to waiting for the next message arriving from task_queue.

  4. Check the start_time of the returned row. If it’s anything but NULL, some other worker has already started this job, so move on to the next message.

  5. UPDATE tasks SET start_time = NOW() WHERE id = 'ad5b4ccf3902db006405074c721a990e'

  6. COMMIT the transaction. At this point, this worker instance is considered to have claimed the job.

  7. Perform the task described by the description column of the row.

  8. DELETE FROM tasks WHERE id = 'ad5b4ccf3902db006405074c721a990e'

At step 3, where a worker may give up on a given notification and move on to the next one, it’s possible that the worker might just not have waited for long enough for the record to appear in the database. Similarly, at step 4, the worker that had claimed the task may have crashed or been terminated before it was able to DELETE the task record from the database. In both cases, the “cleaner” process will make sure the job gets done eventually.
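
Here is a minimal sketch of that worker loop in Python, again assuming psycopg2 and pika; perform stands in for your task-specific code, and the connection parameters are placeholders:

import time
import pika
import psycopg2

RETRY_TIME = 2  # seconds between SELECT retries (the "retry time")
MAX_TRIES = 3   # after this, assume the sender's commit never happened

db = psycopg2.connect("dbname=mydb")  # placeholder connection parameters
mq = pika.BlockingConnection(pika.ConnectionParameters("localhost"))

def claim(task_id):
    """Try to claim the task; return its description, or None."""
    for _ in range(MAX_TRIES):
        with db, db.cursor() as cur:  # BEGIN ... COMMIT
            cur.execute("SELECT description, start_time FROM tasks "
                        "WHERE id = %s FOR UPDATE", (task_id,))
            row = cur.fetchone()
            if row is not None:
                description, start_time = row
                if start_time is not None:
                    return None  # another worker has already claimed it
                cur.execute("UPDATE tasks SET start_time = NOW() "
                            "WHERE id = %s", (task_id,))
                return description
        time.sleep(RETRY_TIME)  # maybe the message overtook the commit
    return None  # give up; the "cleaner" will resend if need be

def handle(channel, method, properties, body):
    task_id = body.decode()
    description = claim(task_id)
    if description is not None:
        perform(description)  # placeholder: your task-specific code
        with db, db.cursor() as cur:
            cur.execute("DELETE FROM tasks WHERE id = %s", (task_id,))
    channel.basic_ack(delivery_tag=method.delivery_tag)

channel = mq.channel()
channel.basic_consume(queue="task_queue", on_message_callback=handle)
channel.start_consuming()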

The “Cleaner” Process

The “cleaner” process should connect to the database and to RabbitMQ as usual. It should periodically—say, every minute (the “cleanup interval”)—wake up and perform the following routine (a Python sketch follows it):

  1. SELECT * FROM tasks WHERE start_time IS NULL

  2. Each row returned indicates a possibly-overlooked task, so a repeat message to task_queue should be sent, just like the ones described above. The worker pool will pick up the message as usual and will start work on the task. The use of FOR UPDATE in each worker ensures that the job won’t be performed twice.

  3. SELECT * FROM tasks WHERE start_time < (NOW() - INTERVAL '1 hour')

  4. Each row returned indicates a possible crashed or deadlocked task. Vary the interval as required! For some jobs, one hour is too long; for others, much too short. The “cleaner” should kick off task-specific recovery actions:

    • sometimes it’s appropriate to simply retry the job: UPDATE the start_time to NULL, and then post a message to task_queue.

    • sometimes a compensating job should be enqueued, either as a task (in which case use the normal task enqueueing process) or as a simple function call within the “cleaner”.

    • sometimes the job should be abandoned, and a “dead job” notification sent to a system operator: perhaps send an email, or send a message to an appropriate exchange using RabbitMQ.
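
A matching sketch of the “cleaner” in Python, with recover standing in for whichever of the recovery actions above applies:

import time
import pika
import psycopg2

CLEANUP_INTERVAL = 60  # seconds: the "cleanup interval"
STALE = "1 hour"       # crashed-task-detection interval; tune per system

db = psycopg2.connect("dbname=mydb")  # placeholder connection parameters
mq = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = mq.channel()

while True:
    with db, db.cursor() as cur:
        # Possibly-overlooked tasks: renotify the worker pool.
        cur.execute("SELECT id FROM tasks WHERE start_time IS NULL")
        for (task_id,) in cur.fetchall():
            channel.basic_publish(exchange="", routing_key="task_queue",
                                  body=task_id.encode())
        # Possibly crashed or deadlocked tasks: task-specific recovery.
        cur.execute("SELECT id FROM tasks "
                    "WHERE start_time < (NOW() - %s::interval)", (STALE,))
        for (task_id,) in cur.fetchall():
            recover(task_id)  # placeholder: retry, compensate, or alert
    time.sleep(CLEANUP_INTERVAL)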

Evaluation

How did we do in satisfying our requirements?

Requirement 1 is satisfied by the action of the “cleaner”: any time a row is successfully inserted into the tasks table, the combined action of the “cleaner” and the worker processes makes sure the job will be run.

Requirement 2 is satisfied by the use of the task_queue: the message sent through RabbitMQ serves to alert a worker that an item is headed their way in the database.

Requirement 3 is satisfied by the use of transactional inserts into the tasks table: if the commit is rolled back, the row will never appear in the database, and even if a message was sent on task_queue, the worker will eventually stop waiting for the database row to arrive, and the “cleaner” will never hear of the rolled-back job to begin with.

Requirement 4 is satisfied by the use of the retry timers, which will catch most races between the database and RabbitMQ. In cases where severe delays mean that the workers give up on a job before its row appears, the “cleaner” process will notice, and will resend a job notification message to task_queue.

Requirement 5 is satisfied by the use of the start_time field in each row, in conjunction with SELECT ... FOR UPDATE to ensure that only one worker will be able to claim a job for processing.

Variations

There are plenty of variations on this basic pattern.

  1. You could make the “cleaner” process only resubmit jobs that have been waiting for more than two “cleanup intervals”, to lower the (already small) probability of (harmless) double notifications. You’d only need to do this if your database was bad at performing SELECT ... FOR UPDATE queries.

  2. You could automatically scale your worker pool: instead of posting notifications directly to the task_queue, you’d send them to a task_exchange. They’d then be distributed not only to workers but to a special “scaler” process. Each worker would send a notification to the “scaler” when each task was completed, and the “scaler” would keep statistics on the arrival rate and the completion rate of tasks. New worker process instances would be started by the “scaler” in situations where the arrival rate was greatly exceeding the completion rate, and old instances would be torn down in the opposite situation.

  3. The (highly experimental) presence exchange RabbitMQ plugin could be used to detect crashed jobs more quickly.

  4. Instead of DELETEing tasks, an additional timestamp column finish_time could be added to the table, and the queries and updates could be adjusted to use it to simulate job deletion. That way, a permanent record is kept of all the tasks the system has performed.

And, of course, the “retry time”, “giveup time”, “cleanup interval” and crashed-task-detection-interval can all be adjusted to the needs of a given system.

Finally, in cases where you don’t need transactional task queueing, systems like Celery provide an amazing wealth of general task-execution functionality, and will almost certainly serve you better than rolling your own.

Equality, Hashing and Canonicalization

I’ve just been reading “Type-Safe Modular Hash-Consing” (Filliâtre & Conchon, 2006, http://www.lri.fr/~filliatr/ftp/publis/hash-consing2.pdf) and it got me thinking about the relationship between finding a canonical element of a set via some equivalence and simply deciding the equivalence itself.

In order to hashcons, according to the paper, you need an equivalence predicate T × T → Bool and a hash function T → Int. They have to line up. This lining up seems interesting. The hash function is naming an equivalence class of Ts - or at least it would be if it weren’t permitted to spuriously identify things sometimes! So it’s a rough equivalence class.

But my main idea is this: say we were hashconsing append-lists, defined by

T ::= Nil | T ++ T | [T]

We’d want our equivalence predicate and our hash function to honour t ++ Nil ≡ t. But while it’s doing so, there’s also the opportunity for it to pick a canonical element: in this case it should choose just t rather than t ++ Nil, since it is structurally smaller.

So perhaps hashconsing should be expressed using a canonicalization function T → T and a hash function T → Int, rather than referring to an equivalence predicate at all?

In some sense a hashcons operator (per the paper, taking T → Hashconsed T) is a kind of canonicalizing function, of course, but slightly weird in that it has links to the resource management facility of the underlying VM rather than simply expressing an object-level equivalence.

I wonder what would be a good way of writing a canonicalization function. It feels a lot like a standard reduction, with the notion of reduction being rules like t ++ Nil → t, Nil ++ t → t, etc.
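
To illustrate, here is a toy Python canonicalizer for the append-list grammar above, treating those rules as reductions; the tuple encoding ("nil", ("leaf", x), ("++", t1, t2)) is mine, purely for illustration:

def canon(t):
    # Canonicalize children first, then apply
    # t ++ Nil -> t and Nil ++ t -> t at this node.
    if isinstance(t, tuple) and t[0] == "++":
        a, b = canon(t[1]), canon(t[2])
        if a == "nil":
            return b
        if b == "nil":
            return a
        return ("++", a, b)
    return t

assert canon(("++", ("leaf", 1), "nil")) == ("leaf", 1)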

So a hashconsing system is a kind of VM-supported normal-form-finding interpreter. For data, rather than programs. Huh.

Why Object-Oriented Languages Need Tail Calls

This blog post is a mirror of a post on the Project Fortress Community Blog by Guy Steele. The original posting is no longer available. I retrieved this copy of the text from the Wayback Machine.

Because the snapshot that the Internet Archive took of the original post was made on December 6, 2009, the only comments that appear are those that were made leading up to that date.

November 30, 2009: Why Object-Oriented Languages Need Tail Calls

[Note: The original blog post used the term “tail-recursion”, but in response to feedback on December 1, I have replaced this term with “tail calls” to emphasize that the issue has to do with multiple objects calling each others’ methods, not just the limited case of an object invoking its own methods or the even more limited case of a single method invoking itself (whether on the same object or another object). Also, some minor errors in the code examples have been corrected as of December 3.]

We recommend an interesting essay from the recent ACM OOPSLA Conference:

William R. Cook. On understanding data abstraction, revisited. Onward! Essays, 2009 ACM OOPSLA. SIGPLAN Notices 44, 10 (Oct. 2009), 557-572. DOI=http://doi.acm.org/10.1145/1639949.1640133 (Or here is the draft version.)

Cook makes an important distinction between abstract data types and object-oriented programming, and argues that they have related but complementary characteristics.

In this blog post we extend one of his examples in order to make a completely different point: object-oriented programming languages need tail calls correctly implemented, not just as a “trivial and optional” optimization of the use of stack space that could be achieved by using iteration statements, but in order to preserve object-oriented abstractions.

Following Cook, we begin by considering an abstraction of “sets of integer values”. At a minimum, we would like to be able to (a) obtain an empty set, (b) adjoin an integer to a set to create a new set, (c) construct a set that is the union of two given sets, (d) inquire whether a set is empty, and (e) inquire whether a set contains a specified integer. We can describe this abstraction as a Fortress API:

api IntSetAPI

trait IntSet
  isEmpty: Boolean
  adjoin(x: ZZ): IntSet
  union(other: IntSet): IntSet
  contains(x: ZZ): Boolean
end

(*) Example: `Empty.adjoin(1).adjoin(2).union(Empty.adjoin(3)).contains(2)` should be `true`.

Empty: IntSet

end IntSetAPI

(To facilitate comparison to the examples in Cook’s paper and also to the way one might code these examples in the Java programming language, we use dotted methods throughout these examples. A more Fortress-like style would be to define operators for the “union” and “contains” operations:

trait IntSet   (*) Alternate interface definition #1
  isEmpty: Boolean
  adjoin(x: ZZ): IntSet
  opr UNION(self, other: IntSet): IntSet
  opr IN(x: ZZ, self): Boolean
end

(*) Example: `2 IN (Empty.adjoin(1).adjoin(2) UNION Empty.adjoin(3))` should be `true`.

Indeed, an even more Fortress-like style would be to eliminate the adjoin operation from the interface and replace it with a constructor of singleton sets:

trait IntSet   (*) Alternate interface definition #2
  isEmpty: Boolean
  opr UNION(self, other: IntSet): IntSet
  opr IN(x: ZZ, self): Boolean
end

opr {x: ZZ}: IntSet

(*) Example: `2 IN ({1} UNION {2} UNION {3})` should be `true`.

But we digress.)

One possible implementation for the original API is as follows:

component IntSetOO
export IntSetAPI

trait IntSet
  getter isEmpty(): Boolean = false
  adjoin(x: ZZ): IntSet = AdjoinObject(self, x)
  union(other: IntSet): IntSet = UnionObject(self, other)
  contains(y: ZZ): Boolean
end

object AdjoinObject(s: IntSet, x: ZZ) extends IntSet
  contains(y: ZZ) = (y = x) OR: s.contains(y)
end

object UnionObject(s1: IntSet, s2: IntSet) extends IntSet
  isEmpty: Boolean = s1.isEmpty AND s2.isEmpty
  contains(y: ZZ) = s1.contains(y) OR: s2.contains(y)
end

object EmptyObject extends IntSet
  isEmpty: Boolean = true
  contains(_: ZZ) = false
end

Empty = EmptyObject

end IntSetOO

We use several different sorts of object to represent sets. Each implements the same abstract interface. One of Cook’s points is that, in a truly object-oriented style of programming, the implementation of each object uses only the abstract interface to communicate with other objects, allowing the implementations to be quite independent of each other, and allowing new implementations to be added easily at any time. For example, Cook shows that infinite sets can easily be represented using the same abstract interface, and we can readily generalize his example and translate it into Fortress code as follows:

object IntegersMod(n: ZZ) extends IntSet
  contains(y: ZZ) = (y MOD n = 0)
end

Now we can write an expression such as

IntegersMod(2).union(IntegersMod(3)).adjoin(7)

and expect it to behave as if it contained all integers that are a multiple of either 2 or 3 (or both), as well as the integer 7.

Now, let’s suppose that we were to construct a set of the first million prime numbers using sets implemented in this way:

do
  s: IntSet := Empty
  k: ZZ := 2
  n: ZZ := 0
  while (n < 1'000'000) do
    if isPrime(k) then
      s := s.adjoin(k)
      n += 1
    end
    k += 1
  end
  s.contains(13)
end

Yes, we know lots of ways this code can be optimized, but let us please focus on the behavior of the constructed set, which is a linear chain of one million objects of type AdjoinObject. When we ask whether the set s contains 13, the first AdjoinObject will check its x value, see that it is not 13, and then pass the buck to the next AdjoinObject. Observe that this “passing of the buck” is a tail call; to see this, note that the short-circuit Boolean OR operator is equivalent to the use of an if statement:

object AdjoinObject(s: IntSet, x: ZZ) extends IntSet
  contains(y: ZZ) = if (y = x) then true else s.contains(y) end
end

If the underlying implementation fails to implement tail calls without unnecessary growth of stack, and is also unable to support nearly a million nested stack frames, then execution of this query will fail.

Many programmers, including those familiar with the style of the Java programming language, will object that the original design of the implementation was wrong-headed: if we want iterative behavior, we should use an iterative statement such as while rather than method calls. This approach is easily captured in Fortress for these AdjoinObject searches:

object AdjoinObject(s: IntSet, x: ZZ) extends IntSet
  contains(y: ZZ) = do
    u: IntSet := self
    (*) The next statement has an interesting idea but is erroneous.
    while (NOT u.isEmpty) AND: (y =/= u.x) do u := u.s end
    NOT u.isEmpty
  end
end

But there is a problem here; our implementation uses not only AdjoinObject instances but also instances of UnionObject and IntegersMod. In order to maintain the iterative structure of this implementation of the contains method, we will need to make it handle these other kinds of objects as well. This requires the implementation of AdjoinObject to be aware of the implementation details of these other kinds of objects, violating our object-oriented abstractions. Every time we want to invent a new implementation of IntSet, we will also need to add a case to this while loop in AdjoinObject. (In point of fact, the while statement shown above contains static type errors. These could be remedied by using typecase. Trying to satisfy the typecase without using an else clause would then point up the need to handle all the other possible implementations of IntSet.)

We could try to abstract the iterative process: If only our sets supported Java-style iterators, we could just enumerate all the elements of the set—oops, can’t do that, because some of our represented sets are infinite. And Java-style iterators have problems of their own, not least that they rely upon side effects and therefore introduce synchronization issues in parallel code.

Even without the other kinds of objects, we can see that this latest implementation of contains for AdjoinObject violates object-oriented abstraction: the implementation inspects the field x not only for the instance on which the contains method was invoked, but the field x of other instances as well. Thus instances of AdjoinObject are abstract with respect to “the outside world”, but not with respect to each other. As Cook points out, this is typical of an abstract data type: there is conceptually a single implementation of the data type, and the code for the abstract data type is privy to these details for purposes of accessing all instances of the data type. This has its advantages; in particular, it caters to certain kinds of optimizations.

In contrast, the object-oriented style makes every object fully abstract, even to other instances of the same “kind” of object. This also has its advantages; in particular, it easily supports multiple implementations and code refactoring. If we want this latter sort of abstraction and want to code iterations over chained objects in a fully abstract and modular manner, then proper support of tail calls is indispensable. This is even more important for the implementation of transitions in an object-oriented finite-state machine of indefinitely long lifetime.

For another example of the importance of tail calls, see Automata via Macros by Shriram Krishnamurthi.

The paper A Tail-Recursive Machine with Stack Inspection by John Clements and Matthias Felleisen demonstrates that correct implementation of tail calls is not incompatible with security mechanisms that rely on stack inspection.

Posted: 2009-11-30 14:56 (Updated: 2009-12-03 22:28)
Author: gls
Categories: Implementation TailCalls TailRecursion

Comments
  1. Faré – 2009-12-01 13:24

    In other words, Proper Tail Calls make control modular, whereas iteration requires omniscience.

    I tried to reply as much to Guido van Rossum when he used absurd arguments against providing PTC http://fare.livejournal.com/142410.html

  2. matthias – 2009-12-01 17:00

    Thank you, Guy, for emphasizing the importance of proper tail-call implementation (let’s call it PITCH here) for software design and refactoring. When I delivered my ECOOP keynote in 2004, I made the PITCH point slightly differently using the Design Patterns book. I developed some code using the patterns from the book in Java. I requested and received confirmation that the code was truly OO oriented and truly according to Their Book. Then I tested it and it worked fine. Then I stress-tested it (on a not so large input), and it failed with stack overflow.

    The next step was to demo the same code in the PLT Scheme class system, which looked very similar to the Java code. And it ran great on those stress tests and even larger ones. So I asked the 300-400 people in the audience what the problem was and why this slow, interpreted language Scheme did it right and Java, C#, Eiffel, Python, etc got it wrong.

    Not one person had any insight so after a minute of playing with them, I told them about the story of “Smalltalk to Actors to Scheme and how PITCH had ‘come along for the ride’” (you may recall our email exchange in the spring of 2004 to confirm the story of ‘apply’). And I did reinforce how much knowledge the OO research community has forgotten over the years.

    Sadly, the reaction was surprisingly hostile from some corners in the room. One person was especially outspoken in arguing that we all know that we just need three or four special constructs – and they may have to be map or for-each or such – and we’d be in business. Given that it was a public exchange, it was difficult to point out to him how messed up his argument was.

    While I have received many positive emails about the talk (my slides are on-line and people find them time and again), this message about PITCH deserves to be repeated and repeated and repeated for OO audiences. Let’s keep going at it.

  3. gls – 2009-12-03 10:44

    Yes, Faré, the third paragraph of your post is quite pertinent. I especially liked this remark:

    [I]n programs for which proper tail calls matters, the choice is conspicuously not between proper tail calls and improper tail calls, it is a choice between proper tail calls and explicit central stateful loops. And the absence of debugging information is constant when you transform tail calls into stateful loops.

  4. timbray – 2009-12-03 11:03

    I’m wondering if Clojure’s “recur” and “trampoline” forms - see http://clojure.org/special_forms#toc10 and http://clojure.org/api#toc569 - constitute another plausible path to the same goal. Clojure has a commendable focus on immutability.

  5. Faré – 2009-12-03 12:37

    Tim, recur is only for self-recursion, which is nice but doesn’t allow for modular design.

    And trampoline provide you full PTC at the cost of a (local) encoding that must be followed by every single bit of software that you want to use PTC with. Admittedly, though, you can probably statically determine which pieces of software may be involved in higher-order or otherwise modular tail recursion, and systematically encode those parts of the software with trampolines. Kindof sucks, but works. Now why should the user have to be doing manually what the computer could automatically do a better job at?

    In any case, if you want to use third-party libraries and they failed to enforce that convention, you lose. Whereas if the use of trampolines is to become universal – you may as well put PTC in the language implementation and save users the hassle of dealing with trampolines.

  6. jyasskin – 2009-12-03 13:50

    It’s really not convincing to show that a set implementation nobody would ever use works better with TCO. Further, your “even more Fortress-like style” with a union operator replacing adjoin isn’t helped by TCO: build the set in the wrong order, and you still blow out the stack.

    The objection to TCO that you’re trying to answer is that it’s rarely helpful in the real world. So you have to find real-world examples to refute that. An impractically-slow set isn’t a real-world example.

  7. gls – 2009-12-03 14:09

    jyasskin, your point is well-taken, and I will even take it a step further: it is even more in the spirit of Fortress to use data structures such as relatively balanced trees that are amenable to parallel traversal than to use long linear-chain structures that require sequential traversal. To that end, a “real” implementation of UNION would automatically rebalance the UnionObject tree. (See Jan Maessen’s earlier posts in this blog about implementing treaps.)

    On the other hand, I believe that the overheads of parallel processing are inherently large enough that there will always be a place for sequential processing, if only near the low-level leaves of the trees, and I believe that there will almost certainly be a sequential aspect to the highest level of any real-world program, if only “Wanna do it again?” I believe that tail calls are important for sequential programs that are organized according to certain principles of modular structure.

    I wish I could fit a real-world example onto a blog page. The best I can do is to exhibit a “toy” example that illustrates the basic idea.

  8. ccshan – 2009-12-03 16:56

    jyasskin– Most operating systems make tail calls between user code and kernel code. The context switch is usually implemented with a bit of assembly code, precisely because C doesn’t have tail calls (but machine code does). All these operating systems are real-world examples of the necessity of tail calls.

  9. rsc – 2009-12-03 22:07

    ccshan– It’s certainly a thought-provoking analogy, but I don’t think it withstands scrutiny.

    The user → kernel transition is almost never a tail call: there are results coming back from the kernel. The exception is the exit system call, the final switch a process makes.

    The kernel → user transition is a little like tail call, but more like a return from a function call (not a tail return, just a return). The exception is process creation, the first switch the kernel makes.

    Regardless of how well the analogy works, OS context switches would still be implemented in assembly even if C had tail calls, because C doesn’t give you access to the instructions necessary to change the processor privilege level and virtual memory state (but machine code does). I wouldn’t use that as an example of the necessity of exposing those very low-level hardware details, so I don’t think it makes sense to use it as an example of the necessity of tail calls either.

  10. metageek – 2009-12-04 07:07

    gls wrote:

    I wish I could fit a real-world example onto a blog page.

    I bet you can find something in the graphics world to do it. For example, given a tree of non-overlapping bounding volumes, find the smallest volume that matches a given point. That involves walking down the tree, asking each child, “does your bounding volume contain this point?”; when one replies yes, you recurse into it. Since no more than one child can say yes, that’s a tail call.

    I think this is a plausible scenario from video games, used to find the game-world object at a given point. However, I’m not sure how often they actually do that, as opposed to finding the game-world object that overlaps with a given volume (i.e., collision detection).

  11. ggchappell – 2009-12-04 09:04

    Guy, this is a thought-provoking article that makes an excellent point.

    On the other hand, I think that jyasskin’s point about UNION is also a good one, and your response is not really adequate. Doesn’t the proposed tree-rebalancing implementation of UNION have exactly the same problem as the iterative implementation of ADJOIN? I.e., it requires knowledge of the internal structure of other objects. So if we added a new union-like object to the mix, then the new & old union objects could not rebalance each other.

    Perhaps we can summarize all this by saying that TCO permits the use of black-box abstractions when a property of an object depends on a property of another object; however, TCO is not sufficient to handle the case when a property of an object depends on properties of two other objects.

    Thus, your example neatly illustrates not only why TCO is a Good Thing, but also its limitations.

  12. Derek Elkins – 2009-12-04 09:30

    I have two other examples here. The first example is perfectly reasonable code, though not common (mainstream) object-oriented style, though arguably it should be. The second is just a sketch but is a very pragmatic example. Hopefully, this (Guy’s) blog post provides the explanation that my post left implicit.

  13. dasht – 2009-12-04 18:13

    Just for kicks… and while I like the article I’m replying to… I have an argument for the conclusion that gls is mistaken here - his argument doesn’t (quite) hold up. It’s over on LtU:

    http://lambda-the-ultimate.org/node/3702#comment-52777

    “Steele’s argument bugs me. I don’t think it is quite right: We can construct a Cooke style OO language that does not have TCO yet which can solve the integer set problem Steele poses. [….]”

  14. ccshan – 2009-12-04 20:02

    rsc– I was talking about calls made to a continuation. Think about how fork() and exit() work. It is because exit() is a tail call that all my finished processes aren’t sitting around waiting for their calls to exit() to return.

Fexprs remain inscrutable

In the last few weeks, a bit of discussion of John Shutt’s Fexpr-based programming language, Kernel, has sprung up in various circles. I read Shutt’s PhD thesis, implemented a couple of toy Kernel-inspired interpreters (in Javascript and Racket), and became simultaneously excited about the simplicity, hygiene and power of the approach and apprehensive about the strong possibility that there was something fundamental I was overlooking. After all, the theory of Fexprs is trivial.

Calling Kernel’s Fexprs “trivial” doesn’t seem quite right to me; perhaps calling them “inscrutable” would be a better fit.

Three serious problems with Kernel and $vau

After a discussion with a friend earlier this week, I think it’s important to note three serious problems with Kernel and the underlying $vau calculi Shutt proposes:

  1. It’s not possible even to compute the free variables of a term in $vau-calculus in general.1 This makes compilation, automatic refactoring, and cross-referencing impossible in general.

  2. There are too many contexts that tell terms apart; contextual equivalence is too fine. Consider these two terms:

     ($lambda (f) (f (+ 3 4)))
     ($lambda (f) (f (+ 4 3)))
    

    These two terms are not contextually equivalent, because if you pass an operative2 as the argument to the function, the operative can inspect the syntactic structure of the program.

  3. This over-fine contextual equivalence is a problem in practice, not just a theoretical concern. Consider the following definition of a foldr function:

     ($define! foldr
       ($lambda (c n xs)
         ($if (null? xs)
              n
              (c (car xs) (foldr c n (cdr xs))))))
    

    See how c has control over whether and when the recursive case is evaluated, because it is not required to evaluate its arguments! Compare this to the same definition where the last line is replaced with

     ($let ((ys (foldr c n (cdr xs)))) (c (car xs) ys))
    

    By introducing the intermediate $let, we take control over the recursion away from c, forcing evaluation of the recursive case.

    To show concretely why this is a problem, the call (foldr $and #t some-list) will either be a short-circuiting “and” or not, depending on an implementation detail of the foldr function!

Taken together, we see that purely local reasoning about Kernel programs is impossible; or at least, getting useful results from such reasoning is impossible in general. Only whole-program analysis can give us enough stability to let us reach useful conclusions about our programs.

On the other hand, it’s possible to mitigate these problems somewhat.

First of all, it might be possible to use online partial evaluation to compute, in many (but certainly not all) situations, the various static properties of terms we’re used to from the lambda-calculus. Undecidable situations could at least be flagged with warnings. This might make the language stable enough to work with in practice.

Secondly, by shifting one’s perspective, the inequivalence of the two variants of foldr given above might be seen as a feature: applicatives, after all, are not operatives in Kernel. Personally, I find it unsettling, but that could be my Scheme heritage showing. Documentation for Kernel applicatives would need to specify what possible computations any higher-order parameters might receive as arguments. The foldr given above can be simultaneously CPS and non-CPS, when given operative and applicative arguments, respectively.

Both these possible mitigations are speculative and weak.

Reasons to be cheerful

I think that the core idea of separating argument evaluation from argument transmission (or function call; I’m unsure what to call it) is a very powerful one. Consider a distributed system: evaluation of arguments has to happen on the sending side, and transmission of the results of evaluation to the receiving side (where instantiation of the function body happens) is a separate operation.

I also think the simplicity of implementation of a Kernel-style Fexpr based system is worth paying attention to, especially given that it makes writing hygienic “macros” effortless, and in the process handily avoids the thorny problem of exactly what “hygiene” is in the first place. Achieving such ease-of-use takes heavy-duty macrology in languages like Scheme that lack first-class macros or Fexprs.

Having our cake and eating it, while also enjoying improved security

Consider securing Kernel in the same way that a capability-based approach can secure Scheme. To secure Kernel as it stands, each applicative has to guard against possibly receiving some operative as an argument.

To make it easier to secure, we might make the system signal an error when an attempt is made to pass an operative as an argument to an applicative.

This makes Fexprs/operatives/”macros” second-class, just as they are in Scheme. I conjecture that doing so has the following effects:

  1. it makes contextual equivalence useful again, making the two variants of foldr given above indistinguishable3, but

  2. it simultaneously makes the system exactly equivalent in power to Scheme with, say, a syntax-case macro system.

So by doing this, we’ve given up most of Kernel’s approach, seemingly in exchange for, well, familiar old Scheme. We haven’t lost everything, though: we’ve retained the simplicity and power of Kernel’s approach to hygiene.

Conclusions

Without quite a bit of further research, I don’t think Kernel-style languages can be used to build secure or efficiently compiled systems. Making changes necessary to build a secure variant of Kernel may destroy many of the properties that make the language interesting in the first place, but doing so might leave us in a familiar setting seen through new eyes: Scheme.

  1. I thought it was weird, actually, that this wasn’t pointed out in Shutt’s thesis, which refers to a function FV(x) without defining it and without further discussion. 

  2. Operatives in Kernel are roughly equivalent to Fexprs in older lisps. 

  3. While it doesn’t erase the distinction between ($lambda (x) (f x)) and f, it does make it amenable to local reasoning. This outcome is actually just the same as in Scheme-with-macros, where that particular equivalence doesn’t hold either. 

Building GNU Smalltalk on Mac OS X Snow Leopard

Recently I wanted to dust off some old code I’d written for GNU Smalltalk, and found that Homebrew’s formula for it didn’t build cleanly in 64-bit mode, and wouldn’t include LibSDL or Cairo support in 32-bit mode. So I rolled up my sleeves and checked out the git repository. UPDATE: I’ve updated the Homebrew formula in my fork of Homebrew; see below.

It turned out to be straightforward, after I made a few false starts. Here’s how I got it built and working, including SDL and Cairo support.

Prerequisites

GNU Smalltalk, when built from Git, depends on libffi and libsigsegv. Fortunately, Homebrew’s formulas for these work well:

$ brew install libffi
$ brew install libsigsegv

GNU Smalltalk itself

Check the code out from Git:

$ git clone https://github.com/bonzini/smalltalk.git smalltalk
$ cd smalltalk

If you’re on Snow Leopard, you’ll have autoconf version 2.61 and automake version 1.10. The GNU Smalltalk source code requests newer versions of these tools, but will build just fine with the versions shipped with Snow Leopard’s Xcode. Edit configure.ac so that it has the line AC_PREREQ(2.61) instead of AC_PREREQ(2.63), and edit Makefile.am so that it has the line AUTOMAKE_OPTIONS = gnu 1.10.

(These changes are summarised in this patch.)

Once configure.ac and Makefile.am have been edited, carry on as you usually would for an autotools project:

$ autoreconf -fvi
$ ./configure --prefix=/opt/gnu-smalltalk PKG_CONFIG_PATH=/usr/local/Cellar/libffi/3.0.9/lib/pkgconfig
$ make
$ make install

The PKG_CONFIG_PATH variable definition on the ./configure line is necessary to let GNU Smalltalk’s configuration script find libffi, which is a keg-only Homebrew formula for Snow Leopard.

Updated Homebrew

2 Oct: I’ve just updated the Homebrew gnu-smalltalk formula to follow the steps above. It now builds from git HEAD rather than from a numbered release. My changes haven’t been accepted into the main branch of Homebrew yet, but for now you can see my formula here.