Main Index (eighty-twenty news)

Tasks within Transactions using RabbitMQ

Sun 18 Dec 2011 13:55 EST

Tony Garnock-Jones

In a discussion of moving from Google AppEngine to EC2, a participant mentioned that the only “odd” GAE service he made use of was transactional task queues, and that there was no direct analogue of this service in other queueing systems.

Here’s one way of building a transactional task queue using RabbitMQ and your favourite SQL database.

a table in the database will hold pending tasks
messages going through RabbitMQ will inform workers of the arrival of new tasks
the available tasks will be load-balanced between n workers
a “cleaner” process makes sure all tasks are eventually run and that failed tasks are eventually detected

In this article, I’ll spell out the requirements for the system, and discuss the details of the database tables, RabbitMQ queues, and daemon processes required. I’ll then evaluate the solution presented in terms of the requirements, and conclude by discussing a few variations on the solution that might be worth exploring.

Requirements

This is what the GAE Task Queue API has to say about transactional task queueing:

You can enqueue a task as part of a datastore transaction, such that the task is only enqueued—and guaranteed to be enqueued—if the transaction is committed successfully. Tasks added within a transaction are considered to be a part of it and have the same level of isolation and consistency.

Our requirements, then, are:

all successfully-enqueued tasks should eventually run
successfully-enqueued tasks should be runnable shortly after transaction commit
tasks where the transaction rolled back should never run
the system should be robust against race conditions
successfully-enqueued tasks shouldn’t be run more than once

where a “successfully-enqueued” task is one that was enqueued as part of a successfully-committed transaction.

Tables & Queues

Create a table—let’s call it tasks —in your SQL database (here imagined to be PostgreSQL):

CREATE TABLE tasks (
  id TEXT PRIMARY KEY,
  description TEXT UNIQUE,
  start_time TIMESTAMP WITH TIME ZONE
);

Each column is important:

the id column uniquely names the job. Use a UUID, or the MD5 or SHA-1 hash of the description, or some other suitably-unique name for the job.
the description column describes the task to be performed. You could use JSON, a URL, a Java class or method name, or any other means of identifying what is to be done.
the start_time column is used to make sure only a single worker starts performing the task, and to detect failed tasks.

Next, create a queue in your RabbitMQ instance; let’s call it task_queue.

Queue_Declare(queue = 'task_queue', durable = True)

The tasks table will serve as the “master list” of work remaining to be done. The task_queue queue will only be used to rapidly notify workers that a new task is waiting for them. If messages are lost from that queue on occasion, no harm will be done: the “cleaner” process, described below, will pick up the pieces.

Enqueueing a Task

To enqueue a task as part of a database transaction, create a row for the task as part of the transaction, and commit normally:

BEGIN;
  -- ... the main part of your transaction goes here ...
  INSERT INTO tasks
    VALUES ('ad5b4ccf3902db006405074c721a990e', 'mytask', NULL);
COMMIT;

The new row must have its start_time column set to NULL.

At any time near the end of the transaction—either before or just after the commit—send a message to the task_queue containing the id of the new task:

Basic_Publish(exchange = '',
              routing_key = 'task_queue',
              body = 'ad5b4ccf3902db006405074c721a990e')

Worker Processes

Your workers should connect to the database and to RabbitMQ as usual. Each worker process should Basic_Consume from the task_queue. Whenever a message comes in containing a task id —for example, the ID of the task enqueued just above, ad5b4ccf3902db006405074c721a990e —the worker should:

BEGIN a transaction.
SELECT * FROM tasks WHERE id = 'ad5b4ccf3902db006405074c721a990e' FOR UPDATE

The FOR UPDATE is important: it prevents a race condition if two workers accidentally start looking at the same job at the same time.
If no row is returned, wait for a few seconds (the “retry time”) and retry the SELECT. The message travelling through RabbitMQ could have overtaken the database commit. If after a couple of retries (the “giveup time”) the row is still absent, it’s likely the sender’s database commit didn’t complete, so move on to waiting for the next message arriving from task_queue.
Check the start_time of the returned row. If it’s anything but NULL, some other worker has already started this job, so move on to the next message.
UPDATE tasks SET start_time = NOW() WHERE id = 'ad5b4ccf3902db006405074c721a990e'
COMMIT the transaction. At this point, this worker instance is considered to have claimed the job.
Perform the task described by the description column of the row.
DELETE FROM tasks WHERE id = 'ad5b4ccf3902db006405074c721a990e'

At step 3, where a worker may give up on a given notification and move on to the next one, it’s possible that the worker might just not have waited for long enough for the record to appear in the database. Similarly, at step 4, the worker that had claimed the task may have crashed or been terminated before it was able to DELETE the task record from the database. In both cases, the “cleaner” process will make sure the job gets done eventually.

The “Cleaner” Process

The “cleaner” process should connect to the database and to RabbitMQ as usual. It should periodically—say, every minute (the “cleanup interval”) —wake up and perform the following routine:

SELECT * FROM tasks WHERE start_time IS NULL
Each row returned indicates a possibly-overlooked task, so a repeat message to task_queue should be sent, just like the ones described above. The worker pool will pick up the message as usual and will start work on the task. The use of FOR UPDATE in each worker ensures that the job won’t be performed twice.
SELECT * FROM tasks WHERE start_time < (NOW() - INTERVAL '1 hour')
Each row returned indicates a possible crashed or deadlocked task. Vary the interval as required! For some jobs, one hour is too long; for others, much too short. The “cleaner” should kick off task-specific recovery actions:
- sometimes it’s appropriate to simply retry the job: UPDATE the start_time to NULL, and then post a message to task_queue.
- sometimes a compensating job should be enqueued, either as a task (in which case use the normal task enqueueing process) or as a simple function call within the “cleaner”.
- sometimes the job should be abandoned, and a “dead job” notification sent to a system operator: perhaps send an email, or send a message to an appropriate exchange using RabbitMQ.

Evaluation

How did we do in satisfying our requirements?

Requirement 1 is satisfied by the action of the “cleaner”: any time a row is successfully inserted into the tasks table, the combined action of the “cleaner” and the worker processes make sure the job will be run.

Requirement 2 is satisfied by the use of the task_queue: the message sent through RabbitMQ serves to alert a worker that an item is headed their way in the database.

Requirement 3 is satisfied by the use of transactional inserts into the tasks table: if the commit is rolled back, the row will never appear in the database, and even if a message was sent on task_queue, the worker will eventually stop waiting for the database row to arrive, and the “cleaner” will never hear of the rolled-back job to begin with.

Requirement 4 is satisfied by the use of the retry timers, which will catch most races between the database and RabbitMQ. In cases where severe delays mean that the workers give up on a job before its row appears, the “cleaner” process will notice, and will resend a job notification message to task_queue.

Requirement 5 is satisfied by the use of the start_time field in each row, in conjunction with SELECT ... FOR UPDATE to ensure that only one worker will be able to claim a job for processing.

Variations

There are plenty of variations on this basic pattern.

You could make the “cleaner” process only resubmit jobs that have been waiting for more than two “cleanup intervals”, to lower the (already small) probability of (harmless) double notifications. You’d only need to do this if your database was bad at performing SELECT ... FOR UPDATE queries.
You could automatically scale your worker pool: instead of posting notifications directly to the task_queue, you’d send them to a task_exchange. They’d then be distributed not only to workers but to a special “scaler” process. Each worker would send a notification to the “scaler” when each task was completed, and the “scaler” would keep statistics on the arrival rate and the completion rate of tasks. New worker process instances would be started by the “scaler” in situations where the arrival rate was greatly exceeding the completion rate, and old instances would be torn down in the opposite situation.
The (highly experimental) presence exchange RabbitMQ plugin could be used to detect crashed jobs more quickly.
Instead of DELETEing tasks, an additional timestamp column finish_time could be added to the table, and the queries and updates could be adjusted to use it to simulate job deletion. That way, a permanent record is kept of all the tasks the system has performed.

And, of course, the “retry time”, “giveup time”, “cleanup interval” and crashed-task-detection-interval can all be adjusted to the needs of a given system.

Finally, in cases where you don’t need transactional task queueing, systems like Celery provide an amazing wealth of general task-execution functionality, and will almost certainly serve you better than rolling your own.

Equality, Hashing and Canonicalization

Tue 22 Nov 2011 13:15 EST

Tony Garnock-Jones

#tech #scheme

permalink

I’ve just been reading “Type-Safe Modular Hash-Consing” (Filliâtre & Conchon, 2006, http://www.lri.fr/~filliatr/ftp/publis/hash-consing2.pdf) and it got me thinking about the relationship between finding a canonical element of a set via some equivalence and simply deciding the equivalence itself.

In order to hashcons, according to the paper, you need an equivalence predicate T × T → Bool and a hash function T → Int. They have to line up. This lining up seems interesting. The hash function is naming an equivalence class of Ts - or at least it would be if it weren’t permitted to spuriously identify things sometimes! So it’s a rough equivalence class.

But my main idea is that say we were hashconsing append-lists, defined by

T ::= Nil

T ++ T

[T]

We’d want our equivalence predicate and our hash function to honour t ++ Nil ≡ t. But while it’s doing so, there’s also the opportunity for it to pick a canonical element: in this case it should choose just t rather than t ++ Nil, since it is structurally smaller.

So perhaps hashconsing should be expressed using a canonicalization function T → T and a hash function T → Int, rather than referring to an equivalence predicate at all?

In some sense a hashcons operator (per the paper taking T → Hashconsed T) is a kind of canonicalizing function, of course, but slightly weird in that it has links to the resource management facility of the underlying VM rather than simply expressing an object-level equivalence.

I wonder what would be a good way of writing a canonicalization function. It feels a lot like a standard reduction, with the notion of reduction being rules like t ++ Nil → t, Nil ++ t → t etc.

So a hashconsing system is a kind of VM-supported normal-form-finding interpreter. For data, rather than programs. Huh.

Why Object-Oriented Languages Need Tail Calls

Sat 1 Oct 2011 15:29 EDT

tonyg (reposting Guy Steele's original article)

#tech

permalink

(2 comments)

This blog post is a mirror of a post on the Project Fortress Community Blog by Guy Steele. The original posting is no longer available. I retrieved this copy of the text from the Wayback Machine.

Because the snapshot that the Internet Archive took of the original post was made on December 6, 2009, the only comments that appear are those that were made leading up to that date.

November 30, 2009: Why Object-Oriented Languages Need Tail Calls

[Note: The original blog post used the term “tail-recursion”, but in response to feedback on December 1, I have replaced this term with “tail calls” to emphasize that the issue has to do with multiple objects calling each others’ methods, not just the limited case of an object invoking its own methods or the even more limited case of a single method invoking itself (whether on the same object or another object). Also, some minor errors in the code examples have been corrected as of December 3.]

We recommend an interesting essay from the recent ACM OOPSLA Conference:

William R. Cook. On understanding data abstraction, revisited. Onward! Essays, 2009 ACM OOPSLA. SIGPLAN Notices 44, 10 (Oct. 2009), 557-572. DOI=http://doi.acm.org/10.1145/1639949.1640133 (Or here is the draft version.)

Cook makes an important distinction between abstract data types and object-oriented programming, and argues that they have related but complementary characteristics.

In this blog post we extend one of his examples in order to make a completely different point: object-oriented programming languages need tail calls correctly implemented, not just as a “trivial and optional” optimization of the use of stack space that could be achieved by using iteration statements, but in order to preserve object-oriented abstractions.

Following Cook, we begin by considering an abstraction of “sets of integer values”. At a minimum, we would like to be able to (a) obtain an empty set, (b) adjoin an integer to a set to create a new set, (c) construct a set that is the union of two given sets, (d) inquire whether a set is empty, and (e) inquire whether a set contains a specified integer. We can describe this abstraction as a Fortress API:

api IntSetAPI

trait IntSet
  isEmpty: Boolean
  adjoin(x: ZZ): IntSet
  union(other: IntSet): IntSet
  contains(x: ZZ): Boolean
end

(*) Example: `Empty.adjoin(1).adjoin(2).union(Empty.adjoin(3)).contains(2)` should be `true`.

Empty: IntSet

end IntSetAPI

(To facilitate comparison to the examples in Cook’s paper and also to the way one might code these examples in the Java programming language, we use dotted methods throughout these examples. A more Fortress-like style would be to define operators for the “union” and “contains” operations:

trait IntSet   (*) Alternate interface definition #1
  isEmpty: Boolean
  adjoin(x: ZZ): IntSet
  opr UNION(self, other: IntSet): IntSet
  opr IN(x: ZZ, self): Boolean
end

(*) Example: `2 IN (Empty.adjoin(1).adjoin(2) UNION Empty.adjoin(3))` should be `true`.

Indeed, an even more Fortress-like style would be to eliminate the adjoin operation from the interface and replace it with a constructor of singleton sets:

trait IntSet   (*) Alternate interface definition #2
  isEmpty: Boolean
  opr UNION(self, other: IntSet): IntSet
  opr IN(x: ZZ, self): Boolean
end

opr {x: ZZ}: IntSet

(*) Example: `2 IN ({1} UNION {2} UNION {3})` should be `true`.

But we digress.)

One possible implementation for the original API is as follows:

component IntSetOO
export IntSetAPI

trait IntSet
  getter isEmpty(): Boolean = false
  adjoin(x: ZZ): IntSet = AdjoinObject(self, x)
  union(other: IntSet): IntSet = UnionObject(self, other)
  contains(y: ZZ): Boolean
end

object AdjoinObject(s: IntSet, x: ZZ) extends IntSet
  contains(y: ZZ) = (y = x) OR: s.contains(x)
end

object UnionObject(s1: IntSet, s2: IntSet) extends IntSet
  isEmpty: Boolean = s1.isEmpty AND s2.isEmpty
  contains(y: ZZ) = s1.contains(x) OR: s2.contains(x)
end

object EmptyObject extends IntSet
  isEmpty: Boolean = true
  contains(_: ZZ) = false
end

Empty = EmptyObject

end IntSetOO

We use several different sorts of object to represent sets. Each implements the same abstract interface. One of Cook’s points is that, in a truly object-oriented style of programming, the implementation of each object uses only the abstract interface to communicate with other objects, allowing the implementations to be quite independent of each other, and allowing new implementations to be added easily at any time. For example, Cook shows that infinite sets can easily be represented using the same abstract interface, and we can readily generalize his example and translate it into Fortress code as follows:

object IntegersMod(n: ZZ) extends IntSet
  contains(y: ZZ) = (y MOD n = 0)
end

Now we can write an expression such as

IntegersMod(2).union(IntegersMod(3)).adjoin(7)

and expect it to behave as if it contained all integers that are a multiple of either 2 or 3 (or both), as well as the integer 7.

Now, let’s suppose that we were to construct a set of the first million prime numbers using sets implemented in this way:

do
  s: IntSet := Empty
  k: ZZ := 2
  n: ZZ := 0
  while (n < 1'000'000) do
    if isPrime(k) then
      s := s.adjoin(k)
      n += 1
    end
    k += 1
  end
  s.contains(13)
end

Yes, we know lots of ways this code can be optimized, but let us please focus on the behavior of the constructed set, which is a linear chain of one million objects of type AdjoinObject. When we ask whether the set s contains 13, the first AdjoinObject will check its x value, see that is it not 13, and then pass the buck to the next AdjoinObject. Observe that this “passing of the buck” is a tail call; to see this, note that the short-circuit Boolean OR operator is equivalent to the use of an if statement:

object AdjoinObject(s: Intset, x: ZZ) extends IntSet
  contains(y: ZZ) = if (y = x) then true else s.contains(x) end
end

if the underlying implementation fails to implement tail calls without unnecessary growth of stack, and also is unable to support nearly a million nested stack frames, then execution of this query will fail.

Many programmers, including those familiar with the style of the Java programming language, will object that the original design of the implementation was wrong-headed: if we want iterative behavior, we should use an iterative statement such as while rather than method calls. This approach is easily captured in Fortress for these AdjoinObject searches:

object AdjoinObject(s: Intset, x: ZZ) extends IntSet
  contains(y: ZZ) = do
    u: IntSet := self
    (*) The next statement has an interesting idea but is erroneous.
    while (NOT u.isEmpty) AND: (y =/= u.x) do u := u.s end
    NOT u.isEmpty
  end
end

But there is a problem here; our implementation uses not only AdjoinObject instances but also instances of UnionObject and IntegerMod. In order to maintain the iterative structure of this implementation of the contains method, we will need to make it handle these other kinds of objects as well. This requires the implementation of AdjoinObject to be aware of the implementation details of these other kinds of objects, violating our object-oriented abstractions. Every time we want to invent a new implementation of IntSet, we will also need to add a case to this while loop in AdjoinObject. (In point of fact, the while statement shown above contains static type errors. These could be remedied by using typecase. Trying to satisfy the typecase without using an else clause would then point up the need to handle all the other possible implementations of IntSet.)

We could try to abstract the iterative process: If only our sets supported Java-style iterators, we could just enumerate all the elements of the set—oops, can’t do that, because some of our represented sets are infinite. And Java-style iterators have problems of their own, not least that they rely upon side effects and therefore introduce synchronization issues in parallel code.

Even without the other kinds of objects, we can see that this latest implementation of contains for AdjoinObject violates object-oriented abstraction: the implementation inspects the field x not only for the instance on which the contains method was invoked, but the field x of other instances as well. Thus instances of AdjoinObject are abstract with respect to “the outside world”, but not with respect to each other. As Cook points out, this is typical of an abstract data type: there is conceptually a single implementation of the data type, and the code for the abstract data type is privy to these details for purposes of accessing all instances of the data type. This has its advantages; in particular, it caters to certain kinds of optimizations.

In contrast, the object-oriented style makes every object fully abstract, even to other instances of the same “kind” of object. This also has its advantages; in particular, it easily supports multiple implementations and code refactoring. If we want this latter sort of abstraction and want to code iterations over chained objects in a fully abstract and modular manner, then proper support of tail calls is indispensable. This is even more important for the implementation of transitions in an object-oriented finite-state machine of indefinitely long lifetime.

For another example of the importance of tail calls, see Automata via Macros by Shriram Krishnamurthi.

The paper A Tail-Recursive Machine with Stack Inspection by John Clements and Matthias Felleisen demonstrates that correct implementation of tail calls is not incompatible with security mechanisms that rely on stack inspection.

Posted: 2009-11-30 14:56 (Updated: 2009-12-03 22:28)
Author: gls
Categories: Implementation TailCalls TailRecursion

Comments

Faré – 2009-12-01 13:24

In other words, Proper Tail Calls make control modular, whereas iteration requires omniscience.

I tried to reply as much to Guido van Rossum when he used absurd arguments against providing PTC http://fare.livejournal.com/142410.html
matthias – 2009-12-01 17:00

Thank you, Guy, for emphasizing the importance of proper tail-call implementation (let’s call it PITCH here) for software design and refactoring. When I delivered my ECOOP keynote in 2004, I made the PITCH point slightly differently using the Design Patterns book. I developed some code using the patterns from the book in Java. I requested and received confirmation that the code was truly OO oriented and truly according to Their Book. Then I tested it and it worked fine. Then I stress-tested it (on a not so large input), and it failed with stack overflow.

The next step was to demo the same code in the PLT Scheme class system, which looked very similar to the Java code. And it ran great on those stress tests and even larger ones. So I asked the 300-400 people in the audience what the problem was and why this slow, interpreted language Scheme did it right and Java, C#, Eiffel, Python, etc got it wrong.

Not one person had any insight so after a minute of playing with them, I told them about the story of “Smalltalk to Actors to Scheme and how PITCH had ‘come along for the ride’” (you may recall our email exchange in the spring of 2004 to confirm the story of ‘apply’). And I did reinforce how much knowledge the OO research community has forgotten over the years.

Sadly, the reaction was surprisingly hostile from some corners in the room. One person was especially outspoken in arguing that we all know that we just need three or four special constructs – and they may have to be map or for-each or such – and we’d be in business. Given that it was a public exchange, it was difficult to point out to him how messed up his argument was.

While I have received many positive emails about the talk (my slides are on-line and people find them time and again), this message about PITCH deserves to be repeated and repeated and repeated for OO audiences. Let’s keep going at it.
gls – 2009-12-03 10:44

Yes, Faré, the third paragraph of your post is quite pertinent. I especially liked this remark:

[I]n programs for which proper tail calls matters, the choice is conspicuously not between proper tail calls and improper tail calls, it is a choice between proper tail calls and explicit central stateful loops. And the absence of debugging information is constant when you transform tail calls into stateful loops.
timbray – 2009-12-03 11:03

I’m wondering if Clojure’s “recur” and “trampoline” forms - see http://clojure.org/special_forms#toc10 and http://clojure.org/api#toc569 - constitute another plausible path to the same goal. Clojure has a commendable focus on immutability.
Faré – 2009-12-03 12:37

Tim, recur is only for self-recursion, which is nice but doesn’t allow for modular design.

And trampoline provide you full PTC at the cost of a (local) encoding that must be followed by every single bit of software that you want to use PTC with. Admittedly, though, you can probably statically determine which pieces of software may be involved in higher-order or otherwise modular tail recursion, and systematically encode those parts of the software with trampolines. Kindof sucks, but works. Now why should the user have to be doing manually what the computer could automatically do a better job at?

In any case, if you want to use third-party libraries and they failed to enforce that convention, you lose. Whereas if the use of trampolines is to become universal – you may as well put PTC in the language implementation and save users the hassle of dealing with trampolines.
jyasskin – 2009-12-03 13:50

It’s really not convincing to show that a set implementation nobody would ever use works better with TCO. Further, your “even more Fortress-like style” with a union operator replacing adjoin isn’t helped by TCO: build the set in the wrong order, and you still blow out the stack.

The objection to TCO that you’re trying to answer is that it’s rarely helpful in the real world. So you have to find real-world examples to refute that. An impractically-slow set isn’t a real-world example.
gls – 2009-12-03 14:09

jyasskin, your point is well-taken, and I will even take it a step further: it is even more in the spirit of Fortress to use data structures such as relatively balanced trees that are amenable to parallel traversal than to use long linear-chain structures that require sequential traversal. To that end, a “real” implementation of UNION would automatically rebalance the UnionObject tree. (See Jan Maessen’s earlier posts in this blog about implementing treaps.)

On the other hand, I believe that the overheads of parallel processing are inherently large enough that there will always be a place for sequential processing, if only near the low-level leaves of the trees, and I believe that there will almost certainly be a sequential aspect to the highest level of any real-world program, if only “Wanna do it again?” I believe that tail calls are important for sequential programs that are organized according to certain principles of modular structure.

I wish I could fit a real-world example onto a blog page. The best I can do is to exhibit a “toy” example that illustrates the basic idea.
ccshan – 2009-12-03 16:56

jyasskin– Most operating systems make tail calls between user code and kernel code. The context switch is usually implemented with a bit of assembly code, precisely because C doesn’t have tail calls (but machine code does). All these operating systems are real-world examples of the necessity of tail calls.
rsc – 2009-12-03 22:07

ccshan– It’s certainly a thought-provoking analogy, but I don’t think it withstands scrutiny.

The user → kernel transition is almost never a tail call: there are results coming back from the kernel. The exception is the exit system call, the final switch a process makes.

The kernel → user transition is a little like tail call, but more like a return from a function call (not a tail return, just a return). The exception is process creation, the first switch the kernel makes.

Regardless of how well the analogy works, OS context switches would still be implemented in assembly even if C had tail calls, because C doesn’t give you access to the instructions necessary to change the processor privilege level and virtual memory state (but machine code does). I wouldn’t use that as an example of the necessity of exposing those very low-level hardware details, so I don’t think it makes sense to use it as an example of the necessity of tail calls either.
metageek – 2009-12-04 07:07

gls wrote:

I wish I could fit a real-world example onto a blog page.

I bet you can find something in the graphics world to do it. For example, given a tree of non-overlapping bounding volumes, find the smallest volume that matches a given point. That involves walking down the tree, asking each child, “does your bounding volume contain this point?”; when one replies yes, you recurse into it. Since no more than one child can say yes, that’s a tail call.

I think this is a plausible scenario from video games, used to find the game-world object at a given point. However, I’m not sure how often they actually do that, as opposed to finding the game-world object that overlaps with a given volume (i.e., collision detection).
ggchappell – 2009-12-04 09:04

Guy, this is a thought-provoking article that makes an excellent point.

On the other hand, I think that jyasskin’s point about UNION is also a good one, and your response is not really adequate. Doesn’t the proposed tree-rebalancing implementation of UNION have exactly the same problem as the iterative implementation of ADJOIN? I.e., it requires knowledge of the internal structure of other objects. So if we added a new union-like object to the mix, then the new & old union objects could not rebalance each other.

Perhaps we can summmarize all this by saying that TCO permits the use of black-box abstractions when a property of an object depends on a property of another object; however, TCO is not sufficient to handle the case when a property of an object depends on properties of two other objects.

Thus, your example neatly illustrates not only why TCO is a Good Thing, but also its limitations.
Derek Elkins – 2009-12-04 09:30

I have two other examples here. The first example is perfectly reasonable code, though not common (mainstream) object-oriented style, though arguably it should be. The second is just a sketch but is a very pragmatic example. Hopefully, this (Guy’s) blog post provides the explanation that my post left implicit.
dasht – 2009-12-04 18:13

Just for kicks… and while I like the article I’m replying to… I have an argument for the conclusion that gls is mistaken here - his argument doesn’t (quite) hold up. It’s over on LtU:

http://lambda-the-ultimate.org/node/3702#comment-52777

“Steele’s argument bugs me. I don’t think it is quite right: We can construct a Cooke style OO language that does not have TCO yet which can solve the integer set problem Steele poses. [….]”
ccshan – 2009-12-04 20:02

rsc– I was talking about calls made to a continuation. Think about how fork() and exit() work. It is because exit() is a tail call that all my finished processes aren’t sitting around waiting for their calls to exit() to return.

Fexprs remain inscrutable

Thu 29 Sep 2011 19:49 EDT

Tony Garnock-Jones

#tech #scheme

permalink

(18 comments)

In the last few weeks, a bit of discussion of John Shutt’s Fexpr-based programming language, Kernel, has sprung up in various circles. I read Shutt’s PhD thesis, implemented a couple of toy Kernel-inspired interpreters (in Javascript and Racket), and became simultaneously excited about the simplicity, hygiene and power of the approach and apprehensive about the strong possibility that there was something fundamental I was overlooking. After all, the theory of Fexprs is trivial.

Calling Kernel’s Fexprs “trivial” doesn’t seem quite right to me; perhaps calling them “inscrutable” would be a better fit.

Three serious problems with Kernel and `$vau`

After a discussion with a friend earlier this week, I think it’s important to note three serious problems with Kernel and the underlying $vau calculi Shutt proposes:

It’s not possible even to compute the free variables of a term in $vau-calculus in general.¹ This makes compilation, automatic refactoring, and cross-referencing impossible in general.
There are too many contexts that tell terms apart; contextual equivalence is too fine. Consider these two terms:
```
 ($lambda (f) (f (+ 3 4)))
 ($lambda (f) (f (+ 4 3)))
```
These two terms are not contextually equivalent, because if you pass an operative² as the argument to the function, the operative can inspect the syntactic structure of the program.
This over-fine contextual equivalence is a problem in practice, not just a theoretical concern. Consider this following definition of a foldr function:
```
 ($define! foldr
   ($lambda (c n xs)
     ($if (null? xs)
          n
          (c (car xs) (foldr c n (cdr xs))))))
```
See how c has control over whether and when the recursive case is evaluated, because it is not required to evaluate its arguments! Compare this to the same definition where the last line is replaced with
```
 ($let ((ys (foldr c n (cdr xs)))) (c (car xs) ys))
```
By introducing the intermediate $let, we take control over the recursion away from c, forcing evaluation of the recursive case.

To show concretely why this is a problem, the call (foldr $and #t some-list) will either be a short-circuiting “and” or not, depending on an implementation detail of the foldr function!

Taken together, we see that purely local reasoning about Kernel programs is impossible; or at least, getting useful results from such reasoning is impossible in general. Only whole-program analysis can give us enough stability to let us reach useful conclusions about our programs.

On the other hand, it’s possible to mitigate these problems somewhat.

First of all, it might be possible to use online partial evaluation to compute the various static properties of terms we’re used to from the lambda-calculus in many (but certainly not all) situations. Undecidable situations might be able to be warned about. This might make the language stable enough to work in practice.

Secondly, by shifting one’s perspective, the inequivalence of the two variants of foldr given above might be able to be seen as a feature: applicatives, after all, are not operatives in Kernel. Personally, I find it unsettling, but that could be my Scheme heritage showing. Documentation for Kernel applicatives would need to specify what possible computations any higher-order parameters might receive as arguments. The foldr given above can be simultaneously CPS and non-CPS, when given operative and applicative arguments, respectively.

Both these possible mitigations are speculative and weak.

Reasons to be cheerful

I think that the core idea of separating argument evaluation from argument transmission (or function call; I’m unsure what to call it) is a very powerful one. Consider a distributed system: evaluation of arguments has to happen on the sending side, and transmission of the results of evaluation to the receiving side (where instantiation of the function body happens) is a separate operation.

I also think the simplicity of implementation of a Kernel-style Fexpr based system is worth paying attention to, especially given that it makes writing hygienic “macros” effortless, and in the process handily avoids the thorny problem of exactly what “hygiene” is in the first place. Achieving such ease-of-use takes heavy-duty macrology in languages like Scheme that lack first-class macros or Fexprs.

Having our cake and eating it, while also enjoying improved security

Consider securing Kernel in the same way that a capability-based approach can secure Scheme. To secure Kernel as it stands, each applicative has to guard against possibly receiving some operative as an argument.

To make it easier to secure, we might make the system signal an error when an attempt is made to pass an operative as an argument to an applicative.

This makes Fexprs/operatives/”macros” second-class, just as they are in Scheme. I conjecture that doing so has the following effects:

it makes contextual equivalence useful again, making the two variants of foldr given above indistinguishable³, but
it simultaneously makes the system exactly equivalent in power to Scheme with, say, a syntax-case macro system.

So by doing this, we’ve given up most of Kernel’s approach, seemingly in exchange for, well, familiar old Scheme. We haven’t lost everything, though: we’ve retained the simplicity and power of Kernel’s approach to hygiene.

Conclusions

Without quite a bit of further research, I don’t think Kernel-style languages can be used to build secure or efficiently compiled systems. Making changes necessary to build a secure variant of Kernel may destroy many of the properties that make the language interesting in the first place, but doing so might leave us in a familiar setting seen through new eyes: Scheme.

I thought it was weird, actually, that this wasn’t pointed out in Shutt’s thesis, which refers to a function FV(x) without defining it and without further discussion. ↩
Operatives in Kernel are roughly equivalent to Fexprs in older lisps. ↩
While it doesn’t erase the distinction between ($lambda (x) (f x)) and f, it does make it amenable to local reasoning. This outcome is actually just the same as in Scheme-with-macros, where that particular equivalence doesn’t hold either. ↩

Building GNU Smalltalk on Mac OS X Snow Leopard

Mon 26 Sep 2011 14:52 EDT

Tony Garnock-Jones

#tech #smalltalk

permalink

(5 comments)

Recently I wanted to dust off some old code I’d written for GNU Smalltalk, and found that Homebrew’s formula for it didn’t build cleanly in 64-bit mode, and wouldn’t include LibSDL or Cairo support in 32-bit mode. So I rolled up my sleeves and checked out the git repository. UPDATE: I’ve updated the Homebrew formula in my fork of Homebrew; see below.

It turned out to be straightforward, after I made a few false starts. Here’s how I got it built and working, including SDL and Cairo support.

Prerequisites

GNU Smalltalk, when built from Git, depends on libffi and libsigsegv. Fortunately, Homebrew’s formulas for these work well:

$ brew install libffi
$ brew install libsigsegv

GNU Smalltalk itself

Check the code out from Git:

$ git clone https://github.com/bonzini/smalltalk.git smalltalk
$ cd smalltalk

If you’re on Snow Leopard, you’ll have autoconf version 2.61 and automake version 1.10. The GNU Smalltalk source code requests newer versions of these tools, but will build just fine with the versions shipped with Snow Leopard’s XCode. Edit configure.ac so that it has the line AC_PREREQ(2.61) instead of AC_PREREQ(2.63), and edit Makefile.am so that it has the line AUTOMAKE_OPTIONS = gnu 1.10.

(These changes are summarised in this patch.)

Once configure.ac and Makefile.am have been edited, carry on as you usually would for an autotools project:

$ autoreconf -fvi
$ ./configure --prefix=/opt/gnu-smalltalk PKG_CONFIG_PATH=/usr/local/Cellar/libffi/3.0.9/lib/pkgconfig
$ make
$ make install

The PKG_CONFIG_PATH variable definition on the ./configure line is necessary to let GNU Smalltalk’s configuration script find libffi, which is a keg-only Homebrew formula for Snow Leopard.

Updated Homebrew

2 Oct: I’ve just updated the Homebrew gnu-smalltalk formula to follow the steps above. It now builds from git HEAD rather than from a numbered release. My changes haven’t been accepted into the main branch of Homebrew yet, but for now you can see my formula here.

The Code Pane is a command line, not a text editor

Sun 11 Sep 2011 10:37 EDT

Tony Garnock-Jones

#tech #smalltalk

permalink

(3 comments)

The system browser in Cuis, a dialect of Squeak Smalltalk, looks like this:

The Cuis 3.3 System Browser

Notice that there is no UI for creating new methods!

There’s a strange and subtle shift in UI design thinking in effect here. Instead of the purpose of the code pane being to “edit some little snippet of code already associated with the current class”, thereby necessitating UI for creating new methods, its purpose is to “interpret any submitted piece of code in the context of the current class”, thereby permitting both definition of new methods and updates to existing methods.

Whenever you type new text into the code pane and accept the changes (by pressing M-s), it compiles whatever text is present as if it were a method definition in the selected class. If that happens to be a new variation on saveTo:, so be it; if it’s an entirely new method, no problem, a new method is compiled and inserted into the class’s method dictionary.

The code pane is really a procedure, accepting arbitrary method source code and having the side effect of compiling that code in the context of the currently-selected class.

The system browser has been this way in Smalltalk since 1980. It’s eye-opening to contrast it with the rigidly file-oriented editors present in IDEs from other systems, from Emacs and Vi up through IntelliJ IDEA and Eclipse.

Constructing Kernel's $if

Tue 16 Aug 2011 20:17 EDT

Tony Garnock-Jones

#tech #scheme

permalink

John Shutt’s PhD thesis introduces the Kernel programming language. Kernel includes an $if, which is treated as a built-in primitive.

It doesn’t have to be primitive, though: in Kernel a construct that does the same thing as the primitive $if operation (in well-typed programs) can be constructed.

In Scheme, one needs to make use of the macro facility to construct an if if there’s not one built-in:

(define true (lambda (first second) first))
(define false (lambda (first second) second))
(define-syntax if
  (syntax-rules ()
    ((_ test if-true if-false)
     ((test (lambda () if-true) (lambda () if-false))))))

In Kernel, we use fexprs instead:

($define! true ($lambda (first second) first))
($define! false ($lambda (first second) second))
($define! $if
  ($vau (test if-true if-false) env
    (eval ((eval test env) if-true if-false) env)))

Here we can see the central opposition between Scheme and Kernel: Scheme’s evaluation of forms is implicit, and we use lambda to postpone evaluation; Kernel’s evaluation of forms is explicit, and we use eval to cause evaluation.

Expressions used as tests absolutely must evaluate to true or false, and nothing else, for these definitions to work. One of the benefits of treating $if as primitive is that one can provide for a separation between Booleans and the rest of the universe that’s not possible otherwise. It’s nice to know, though, that $if can be dispensed with if necessary.

Oakley Groups 2 and 14 with OpenSSL

Tue 16 Aug 2011 14:19 EDT

Tony Garnock-Jones

#tech

permalink

Yesterday, while experimenting with Diffie-Hellman key exchange as done by the SSH protocol, I discovered (after quite a bit of confusion) that the 1024-bit and 2048-bit parameter groups included with OpenSSL’s Diffie-Hellman implementation are not the same as the 1024-bit “group 2” from RFC 2409 and the 2048-bit “group 14” from RFC 3526! They’re completely unrelated.

Once I discovered this, I spent some time figuring out how to construct PEM files containing the right values formatted appropriately for use with OpenSSL.

If you need to use the “Second Oakley Group” (1024-bit MODP group) from RFC 2409 with OpenSSL, you should load the following parameter file, oakley-group-2.pem, using OpenSSL’s PEM_read_DHparams function:

-----BEGIN DH PARAMETERS-----
MIGHAoGBAP//////////yQ/aoiFowjTExmKLgNwc0SkCTgiKZ8x0Agu+pjsTmyJRSgh5jjQE
3e+VGbPNOkMbMCsKbfJfFDdP4TVtbVHCReSFtXZiXn7G9ExC6aY37WsL/1y29Aa37e44a/ta
iZ+lrp8kEXxLH+ZJKGZR7OZTgf//////////AgEC
-----END DH PARAMETERS-----

If you need to use the 2048-bit MODP group from RFC 3526 with OpenSSL, load the following parameter file, oakley-group-14.pem:

-----BEGIN DH PARAMETERS-----
MIIBCAKCAQEA///////////JD9qiIWjCNMTGYouA3BzRKQJOCIpnzHQCC76mOxObIlFKCHmO
NATd75UZs806QxswKwpt8l8UN0/hNW1tUcJF5IW1dmJefsb0TELppjftawv/XLb0Brft7jhr
+1qJn6WunyQRfEsf5kkoZlHs5Fs9wgB8uKFjvwWY2kg2HFXTmmkWP6j9JM9fg2VdI9yjrZYc
YvNWIIVSu57VKQdwlpZtZww1Tkq8mATxdGwIyhghfDKQXkYuNs474553LBgOhgObJ4Oi7Aei
j7XFXfBvTFLJ3ivL9pVYFxg5lUl86pVq5RXSJhiY+gUQFXKOWoqsqmj//////////wIBAg==
-----END DH PARAMETERS-----

Running the files through openssl asn1parse shows that the contents are the same as the numbers given in the RFCs:

$ openssl asn1parse < oakley-group-2.pem 
0:d=0  hl=3 l= 135 cons: SEQUENCE          
3:d=1  hl=3 l= 129 prim: INTEGER           :FFFFFFFFFFFFFFFFC90FDAA22168C234
C4C6628B80DC1CD129024E088A67CC74020BBEA63B139B22514A08798E3404DDEF9519B3CD3A431B
302B0A6DF25F14374FE1356D6D51C245E485B576625E7EC6F44C42E9A637ED6B0BFF5CB6F406B7ED
EE386BFB5A899FA5AE9F24117C4B1FE649286651ECE65381FFFFFFFFFFFFFFFF
  135:d=1  hl=2 l=   1 prim: INTEGER           :02

$ openssl asn1parse < oakley-group-14.pem 
0:d=0  hl=4 l= 264 cons: SEQUENCE          
--------------------------------------------------------------------------------
4:d=1  hl=4 l= 257 prim: INTEGER           :FFFFFFFFFFFFFFFFC90FDAA22168C234
C4C6628B80DC1CD129024E088A67CC74020BBEA63B139B22514A08798E3404DDEF9519B3CD3A431B
302B0A6DF25F14374FE1356D6D51C245E485B576625E7EC6F44C42E9A637ED6B0BFF5CB6F406B7ED
EE386BFB5A899FA5AE9F24117C4B1FE649286651ECE45B3DC2007CB8A163BF0598DA48361C55D39A
69163FA8FD24CF5F83655D23DCA3AD961C62F356208552BB9ED529077096966D670C354E4ABC9804
F1746C08CA18217C32905E462E36CE3BE39E772C180E86039B2783A2EC07A28FB5C55DF06F4C52C9
DE2BCBF6955817183995497CEA956AE515D2261898FA051015728E5A8AACAA68FFFFFFFFFFFFFFFF
  265:d=1  hl=2 l=   1 prim: INTEGER           :02

AMQP 1.0's "Timestamp" definition is bogus

Fri 22 Jul 2011 13:38 EDT

Tony Garnock-Jones

#tech #rabbitmq

permalink

The AMQP 1.0.0 draft of 2011-07-04 contains the following language defining the meaning of its 64-bit “timestamp” data type:

Encodes a point in time using a 64 bit signed integer representing milliseconds since Midnight Jan 1, 1970 UTC. For the purpose of this representation, milliseconds are taken to be (1/(24*60*60*1000))th of a day.

This language is broken.¹

The first sentence there makes it a fixed offset from TAI, which is good. The second sentence there is totally bogus and can have no purpose beyond confusing the crap out of anyone trying to work from this definition. It’d be massively improved by just removing the second sentence:

Encodes a point in time using a 64 bit signed integer representing milliseconds since Midnight Jan 1, 1970 UTC.

For clarity, you might want to add the word “elapsed” and also reconfirm that it’s TAI being discussed here:

Encodes a point in time using a 64 bit signed integer representing elapsed milliseconds since Midnight Jan 1, 1970 UTC. Note that time stamps change in lockstep with TAI, not UTC nor Unix time.

Using a TAI-based metric instead of Unix time (time_t) is good because the Unix time_t definition is a bad design: see this page. Furthermore, UTC, which time_t is an encoding of, cannot be used to talk precisely about moments beyond 6 months in the future, because of the leap-second table that it needs.

TAI-based metrics are almost perfect except that they’re damned awkward to use right on most computer systems out there, because most systems use Unix time. Worse, using TAI but offset so that second zero is the time_t epoch gives a number currently within 30s or so of time_t, which could lead to a lot of confusion.

Choosing a representation of an instant for use in a new protocol is hard. You get a tradeoff between:

time_t: ambiguous representation and lack of accuracy when referring to moments more than 6mo in the future; on the plus side, wide support and low potential for making mistakes (other than those caused by the inherent flaws in the definition).
UTC, but not encoded as a count of seconds since some epoch: unambiguous representation, but lack of accuracy with future dates still and also poor library support plus the hassle of defining some new representation.
TAI offset to be based at the time_t epoch: unambiguous representation, fully accurate, but looks so similar to Unix time that people will probably make mistakes. (Will the mistakes be harmful though? How harmful is being off by 30s? You might ask the Mars Climate Orbiter.)
Seconds since 2000-01-01T00:00:00Z, in lockstep with TAI: unambiguous representation, fully accurate, looks dissimilar enough that it won’t be mistaken for Unix time, but jolly weird and still needs a leap-second table for display purposes, like any other TAI-based metric.
Chronological Julian Days in GMT: unambiguous representation, fully accurate, but difficult to convert back and forth to Unix time without a full calendar package.

Additionally, if you ever want to simply subtract one timestamp from another to get some idea of the interval between them, Julian Days and Unix time don’t work; TAI-based metrics are the only ones that give accurate results for this use.

Ideally we want these timestamps to be able to be accurate up to and beyond 50 years in the future or so. If it were me, I’d use TAI either with its own built-in epoch, 1958-01-01T00:00:00Z, or with a suitable easy-to-remember epoch such as 2000-01-01T00:00:00Z.

(Thanks to @ciphergoth, @simonmacm, @squaremobius and @michaeljforster for an informative discussion of this issue.)

There’s an open bug report in the AMQP protocol JIRA system for this issue: ISSUE-167 ↩

Signs of an ill-factored system

Tue 19 Jul 2011 10:48 EDT

Tony Garnock-Jones

#tech

permalink

Dmitry Chestnykh has collected implementations of the echo command from UNIX V5, OpenBSD, Plan 9, FreeBSD, and GNU coreutils.¹

Implementation	Lines of code
UNIX V5	10
OpenBSD	28
Plan 9	40
FreeBSD	108
GNU coreutils	257

From this you might conclude that the GNU implementation has excessive bloat. But what’s really going on is (a) feature creep and (b) the inclusion of documentation within the source code of the program itself. UNIX has evolved away from having a command (with no built-in documentation) with a separate manpage toward having not just a detailed separate manpage but also having quick-reference documentation included in the command itself.

The feature creep leads to a perceived need for the quick-reference documentation. The source to the UNIX V5 implementation is its own reference card! There’d be nothing to write in a usage message.

Both the feature-creep and the awkwardly-placed documentation are signs of an ill-factored system. A better factoring would keep commands single-purpose and as simple as possible, with documentation managed consistently across the whole system.

Attribute	UNIX V5	Modern Unix	Smalltalk
Command (method) source code	Short, single-purpose	Long, multi-purpose	Short, single-purpose
Command documentation	Manpage only	Manpage, info file, built-in to program, web site	Method comment only
How are new variants on commands added?	Shell scripts	Add the feature to the existing command	Add new methods and refactor common implementation

I think UNIX V5 and Smalltalk are much better-factored than modern Unixes. Factoring out common functionality and assembling the pieces using scripting is the way things are done in both cases. Smalltalk has an advantage in that the scripting language is the systems programming language, but the core philosophies of the two systems have a lot in common.

Removing the inline quick-reference documentation from the GNU program lops 60 lines off the total. The remainder of the growth can be attributed to the baked-in extra features (GNU supports multiple command-line syntaxes, where UNIX V5 supports… nothing) and to premature² optimisation (Plan 9 and FreeBSD avoid multiple writes, where UNIX V5 goes for TSTTCPW).

To my mind, either of the UNIX V5 or OpenBSD implementations are perfectly acceptable. The remainder are signs of a sick ecosystem: there’s nothing wrong with them intrinsically, but when they’re seen as part of a larger whole things start to look unhealthy.

HT @silentbicycle ↩
That’s a little unfair. At the time, such optimisations were crucial to get a running system. Things are very different these days: JIT compilation, flexible kernel/userland boundaries, and quite simply the incredible raw speed of modern machines combine to make the difference between many printfs and a single writev irrelevant. ↩

Requirements

Tables & Queues

Enqueueing a Task

Worker Processes

The “Cleaner” Process

Evaluation

Variations

November 30, 2009: Why Object-Oriented Languages Need Tail Calls

Comments

Three serious problems with Kernel and $vau

Reasons to be cheerful

Having our cake and eating it, while also enjoying improved security

Conclusions

Prerequisites

GNU Smalltalk itself

Updated Homebrew

Three serious problems with Kernel and `$vau`