Reflection and Message-Oriented Programming

My thinking about a next-generation Smalltalk-like system and language has been shifting a bit over recent weeks.

To start with, I decided that the objects in the language would be immutable: in order to replace a field value, an entirely new object would be constructed, just like records in ML. Objects would no longer have any identity, and object equivalence would be decided by recursive comparison (a la Henry Baker’s egal predicate [Baker93]). This immutability would extend even to the object’s map (similar to its class) - adding a method to an object would result in a fresh object, with a fresh map.
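
For concreteness, here is a minimal OCaml sketch of that flavour, using ordinary records as a stand-in for the planned objects: “updating” a field builds a fresh value, and equivalence is structural rather than based on identity.

```ocaml
(* Sketch only: OCaml records standing in for the language's immutable objects. *)
type point = { x : int; y : int }

let p1 = { x = 1; y = 2 }
let p2 = { p1 with y = 3 }          (* functional update: a fresh record; p1 is untouched *)

let same_structure = (p1 = { x = 1; y = 2 })   (* true: recursive structural comparison *)
let same_identity  = (p1 == { x = 1; y = 2 })  (* false: physical identity is not meaningful here *)
```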

Mutable state isn’t entirely absent, though - it’s just kept strictly separate from other parts of the system, just as it is in ML. It would be possible to construct mutable reference cells. Synchronisation, communication and state access would be merged: when you retrieve a value from a cell, the value is removed from the cell and handed to the retrieving process. If no value is present, the receiving process blocks until one is sent by another process. If a sending process tries to place a value in a cell already occupied by a value, the sending process dies with an exception. Cells are the locus of object identity in the system, again as per [Baker93].
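
A rough sketch of those cell semantics, written against OCaml’s threads library (Mutex/Condition) purely for illustration - the names `make`, `put` and `take` are mine, not part of any existing design:

```ocaml
(* Sketch only: take removes the value, blocking while the cell is empty;
   put into an occupied cell kills the sender with an exception. *)
type 'a cell = {
  mutable contents : 'a option;
  lock : Mutex.t;
  nonempty : Condition.t;
}

exception Cell_occupied

let make () =
  { contents = None; lock = Mutex.create (); nonempty = Condition.create () }

let put cell v =
  Mutex.lock cell.lock;
  (match cell.contents with
   | Some _ -> Mutex.unlock cell.lock; raise Cell_occupied
   | None ->
       cell.contents <- Some v;
       Condition.signal cell.nonempty;
       Mutex.unlock cell.lock)

let take cell =
  Mutex.lock cell.lock;
  let rec wait () =
    match cell.contents with
    | Some v -> cell.contents <- None; Mutex.unlock cell.lock; v
    | None -> Condition.wait cell.nonempty cell.lock; wait ()
  in
  wait ()
```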

Metaprogramming and reflection would be enabled via locations, which group together a set of related processes into a location tree. Locations are responsible for the semantics of message dispatch and exception handling. They’re the basic unit of reflection, too - a location can be reified, which pauses and reifies all contained processes and sublocations. A reified location can be used for debugging, for mobile code, for become:-like operations, and many other things. A user can install user code at a new sublocation, which allows refinement or replacement of the default message dispatch behaviour in the style of [Malenfant92].

Code itself in the system is a distinct entity - the instruction stream contained in a method is a different kind of thing from all of the categories discussed so far. It’s the role of the location to interpret the code stream.

The dynamic state of a computation is held at the metalevel as a process. Processes correspond to the state of a particular interpreter: the registers, the stack, the current continuation etc. They’re only accessible by reflecting on a location.
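
To make the division of labour between code, processes and locations a little more concrete, here is a purely illustrative set of OCaml type declarations for what reifying a location might hand back; none of these names comes from a real implementation, and the fields are only meant to suggest the shape of the data.

```ocaml
(* Illustrative only: a guess at the shape of reified metalevel data. *)
type instruction = Push of int | Send of string * int   (* code is a plain data stream *)

type process = {
  stack : int list;            (* dynamic state of one interpreter *)
  pc : int;                    (* where it has got to in its code *)
  code : instruction array;    (* the instruction stream it is running *)
}

type location = {
  processes : process list;          (* paused, reified processes *)
  sublocations : location list;      (* the location tree *)
  dispatch : string -> int list -> instruction array;
  (* the replaceable message-dispatch behaviour: selector and arguments in,
     code to run out *)
}
```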

To sum up, then:

  • objects are immutable, and have no identity;
  • cells are mutable, have identity, and are the means of communication and synchronisation in the system;
  • locations are metalevel constructs that serve as interpreters for code, that specify message dispatch and exception-handling algorithms, and that are the loci of reflection in the system;
  • code is a stream of instructions intended for interpretation by locations;
  • processes are computations in the system running some code at a location, manipulating cells and constructing and transmitting objects.

Shifting from Object-Oriented to Message-Oriented

The way locations and code are laid out strongly suggests the infinite tower of reflective interpreters discussed in [Jefferson92]. This tower of interpretation, taken together with the immutability of objects and the similarity of cells to π-calculus ports, starts to make the system look more like a message-oriented system than an object-oriented one.

Object-orientation is still present, since code is still late-bound to message sends, but the emphasis has changed: not only is there no longer any behaviour necessarily attached to the objects - all the behaviour is external, in the code resolved by the message-dispatch algorithm in use - but there is no longer necessarily any state associated with the objects either!

Objects in the system start to look more like messages than objects. A collection of messages is bundled up with a selector and sent to the metaobject for message dispatch, and a piece of code specialised for handling that combination of arguments is selected and invoked.
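
The following OCaml sketch shows the general idea of that dispatch step, with a metaobject modelled as a table keyed by the selector together with the tags of all the arguments; every type and name here is invented for the example.

```ocaml
(* Sketch only: dispatch on the whole argument combination (multiple dispatch). *)
type value = Int of int | Str of string

let tag = function Int _ -> "Int" | Str _ -> "Str"

module Key = struct
  type t = string * string list        (* selector paired with argument tags *)
  let compare = compare
end

module Table = Map.Make (Key)

type metaobject = (value list -> value) Table.t

(* Pick the piece of code specialised for this selector/argument combination. *)
let dispatch (mo : metaobject) selector args =
  match Table.find_opt (selector, List.map tag args) mo with
  | Some code -> code args
  | None -> failwith ("doesNotUnderstand: " ^ selector)

(* Two methods share the selector "+" but are specialised on different arguments. *)
let mo : metaobject =
  Table.empty
  |> Table.add ("+", [ "Int"; "Int" ])
       (function [ Int a; Int b ] -> Int (a + b) | _ -> assert false)
  |> Table.add ("+", [ "Str"; "Str" ])
       (function [ Str a; Str b ] -> Str (a ^ b) | _ -> assert false)

let r1 = dispatch mo "+" [ Int 1; Int 2 ]          (* Int 3 *)
let r2 = dispatch mo "+" [ Str "foo"; Str "bar" ]  (* Str "foobar" *)
```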

|  | Language entities | Transfer of control | Kind of lookup | Reflective ability |
|---|---|---|---|---|
| Smalltalk | Objects, Classes, Code, Method Contexts, Block Contexts | lookup/apply | single dispatch | full structural reflection; partial behavioural reflection |
| Self | Objects, Code, Method Contexts, Block Contexts | lookup/apply | single dispatch | full structural reflection; partial behavioural reflection (?) |
| Slate | Objects, Code, Method Contexts, Block Contexts | lookup/apply | multiple dispatch | full structural reflection; partial behavioural reflection |
| ML (SML, OCaml) | Tuples, Reference Cells, Functions, Evaluation Contexts | apply | n/a | no reflection |
| π-calculus | Messages, Channels, Processes | message-send | n/a | no reflection |
| ThiNG | Messages, Channels, Processes/Code, Locations | lookup/message-send | multiple dispatch | full structural, behavioural and lexical reflection |

(Aside: I find it interesting that a lot of OO thinking seems to implicitly assume that in an OO system everything is an object when that’s clearly not the case. Not only are there other entities in the language - code, method contexts, and block contexts, for instance - but they are metalevel entities, and tend not to be first-class citizens. Their reified representations may be objects, but the entities themselves are not. If you write down expressions in an object calculus, you end up with things in the expressions that aren’t objects.)

I can’t decide whether to stick with the evaluate-every-argument-in-parallel model or not; there seem to be three obvious options (a sketch of the fully parallel approach appears after this list):

  1. Evaluate every argument in parallel (just like the current prototype). This is very inefficient on current CPUs. It means that the system automatically exposes a lot of fine-grained concurrency, though, which is nice.

  2. Evaluate only annotated arguments in parallel (just like Slate). This gives the programmer control over how much concurrency they want in their program. Not so much fine-grained concurrency is exposed, but on the other hand the code generated could be quite efficient.

  3. Tell the programmer to expect that every argument will be evaluated in parallel, but secretly evaluate most of them serially. Some finessing will be required to avoid deadlocks caused by overeager serialisation of intercommunicating branches; one rough guideline could be to serialise provably noncommunicating parallel branches only up to the end of inlining. Once a call proceeds with a real out-of-line call frame, act as for the every-argument-in-parallel case. (I don’t yet know how to prove noncommunication; I haven’t really thought about it.)
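
As a point of reference for the fully parallel option, here is what evaluating every argument in parallel might look like, sketched with OCaml 5 domains; `send_parallel` and its `dispatch` parameter are invented for the example, and a real implementation would want far lighter-weight processes than domains.

```ocaml
(* Option 1, sketched: each argument expression runs in its own domain and the
   send waits for all of them before dispatching.  Requires OCaml 5. *)
let send_parallel dispatch selector (arg_thunks : (unit -> 'a) list) =
  let running = List.map Domain.spawn arg_thunks in   (* fork one evaluator per argument *)
  let args = List.map Domain.join running in          (* collect results, preserving order *)
  dispatch selector args

(* Example use: both argument expressions are evaluated concurrently. *)
let _ =
  send_parallel
    (fun sel args -> Printf.printf "%s %d\n" sel (List.fold_left ( + ) 0 args))
    "+"
    [ (fun () -> 1 + 1); (fun () -> 2 + 2) ]
```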

This concurrency business is looking more and more like the Next Big Thing: here’s an interesting article spelling out the trends and the coming end of the clock-speed-increase “free lunch”. (That link via LTU).

Message-Oriented Programming

It turns out that the term “Message-Oriented Programming” isn’t new - there’s an existing body of work using the term, sometimes in the way I’d like to use it, sometimes in a related but different sense:

References