Smalltalk programs strictly alternate between data and behaviour. Messages (the only kind of data in Smalltalk) are implicit, constructed fresh at each point in the program where method-call syntax is used, and are shallow, meaning that the slots in each message object always contain references to objects, never other messages.1
This is in contrast to most other languages with true data, where (for example) lists may contain other data, and are not constrained to containing only object references / function values.
So, what would a Smalltalk be like without this strict alternation, where messages could appear as values? Suddenly the universe of Smalltalk values grows larger: previously, everything was an object, but now some things are data!
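A rough TypeScript-flavoured sketch of the distinction (the types here are invented purely for illustration):

type ObjectRef = { className: string /* instance variables, etc. */ };

// Ordinary Smalltalk: a message is implicit and shallow; its slots only
// ever hold references to objects.
type ShallowMessage = { selector: string; slots: ObjectRef[] };

// The relaxed universe: messages are themselves values, so a slot may
// hold either an object reference or another message.
type Value = ObjectRef | DeepMessage;
type DeepMessage = { selector: string; slots: Value[] };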
Of course, objects acting as reified messages may appear! ↩
Usually, method lookup results in a single body to execute. This is analogous to the way pattern matching usually tries patterns in order, selecting just one continuation to execute.
What if, instead, we allowed all matching patterns from a bunch of alternatives to execute? (Or, in object-oriented terms: executed all methods potentially matching a given method call.)
Pattern alternatives do not become a kind of superposition, because there’s no notion of mutual exclusion; instead, they become a way of creating multiple concurrent branches of execution, somehow. (Not to say there’s any particular kind of interleaving or parallel execution of these branches that makes any sense! One could well limit consideration to sequential execution of each matching method, to start with.)
Can we recover true alternatives from this kind of every-match construct (it needs a name!)? One approach is to follow the idea from Alex’s, Mahdi’s and my PEG paper,1 treating alternation (/) as a kind of parallel match construct and using negative lookahead to cause a later branch to fail if some earlier branch succeeds.
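Here’s a hypothetical TypeScript sketch of the two dispatch disciplines, and of recovering ordered alternation from every-match (all the names are invented):

type Clause<A> = {
  matches: (msg: A) => boolean;  // the pattern
  run: (msg: A) => void;         // the method body / continuation
};

// Conventional dispatch: try alternatives in order; run only the first match.
function firstMatch<A>(clauses: Clause<A>[], msg: A): void {
  for (const c of clauses) {
    if (c.matches(msg)) { c.run(msg); return; }
  }
}

// The every-match construct: run all matching alternatives (here, sequentially).
function everyMatch<A>(clauses: Clause<A>[], msg: A): void {
  for (const c of clauses) {
    if (c.matches(msg)) c.run(msg);
  }
}

// Recovering ordered alternation: guard each clause with a
// negative-lookahead-style condition, "no earlier alternative matches".
function guarded<A>(clauses: Clause<A>[]): Clause<A>[] {
  return clauses.map((c, i) => ({
    matches: (msg: A) =>
      c.matches(msg) && !clauses.slice(0, i).some(e => e.matches(msg)),
    run: c.run,
  }));
}
// everyMatch(guarded(cs), msg) now behaves exactly like firstMatch(cs, msg).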
T. Garnock-Jones, M. Eslamimehr, and A. Warth, “Recognising and Generating Terms using Derivatives of Parsing Expression Grammars,” Jan. 2018. https://arxiv.org/abs/1801.10490↩
A fantasy abstract is a short piece of writing in the style of the abstract of an academic
paper presenting the outline of a piece of research that has not (yet) been done. It acts as
inspiration, as a means of communicating an idea, and as a place to nucleate further thinking
on the topic, perhaps eventually kicking off the desired research.
It’s something I started doing during my PhD studies to help record and structure the
relationships among ideas. I record a little bit of metadata about each abstract: not much more
than the names of one or more broad research “themes” or threads that the idea might fit into.
For example, here’s one from January 2011, shortly after I started experimenting with the form.
Fourteen years later it remains fantastic and unexplored (by me at least!):
Contracts for Protocols
Created: 2011-01-10
Thread: Network Languages
Abstract:
Existing messaging middleware systems provide very low-level facilities to application developers, ranging from simple point-to-point datagram transfer up through simple stereotypical interaction patterns such as (optionally transacted) request-reply or publish-subscribe. These low-level facilities are then composed by the application developer into higher-level interactions, but without the benefit of any formal way of describing the higher-level interactions. This paper introduces contracts for messaging protocols implemented using messaging middleware, describes a prototype implementation, and discusses lessons learned.
Some things to note:
Citations are useful if you have them, but the main point is to capture the idea, not do
an exhaustive background literature survey. In the example above there’s the ludicrous
omission of any mention of session types, for example; what I had in mind was something akin
to what, these days, are called “dynamic monitors” in the session types literature. Perhaps
if I’d expanded this abstract into an actual paper at the time it’d have been a timely
contribution :-) It’s a bit stale now…
It can be an absolute fantasy. Feel free to refer to nonexistent (but plausible?) research
results. If you ever pick up the idea or gift it to someone else, it’ll be made rigorous and
realistic then. Use the fantasy abstract to get the feeling of your idea.
Small experiments in the use of libliftoff to try out the modern Linux graphics stack drove
home quite how slow DRM “dumb buffers” can be, but also that it’s reading that’s slow, not
writing.
Reading from a “dumb buffer” on my AMD GPU is roughly a thousand times slower than
reading from RAM: it can take seconds to read out a full 4k frame.1,2
Writing, by contrast, is quick.
While it is folklore that “dumb buffers are slow”, I found it challenging to find any
authoritative source on the matter. However, I did find something. In
/usr/include/drm/drm.h, we see the following comment, which sort of hints at the wider
situation:
/**
* DRM_CAP_DUMB_PREFER_SHADOW
*
* If set to 1, the driver prefers userspace to render to a shadow buffer
* instead of directly rendering to a dumb buffer. For best speed, userspace
* should do streaming ordered memory copies into the dumb buffer and never
* read from it.
*
* Note that this preference only applies to dumb buffers, it's irrelevant for
* other types of buffers.
 */
#define DRM_CAP_DUMB_PREFER_SHADOW 0x4
Indeed, “for best speed […] never read from it.”
Update: Subsequent experimentation using gbm to allocate buffer objects shows that it
doesn’t help if you need to read or write pixel data to them (as opposed to, presumably,
using the GPU to render into them). Setting the GBM_BO_USE_WRITE flag when allocating a
buffer object, to allow subsequent writing of pixel data, causes the dri backend of gbm
to simply allocate a “dumb buffer”!
Quick-and-dirty C experimentation shows speeds of ~2ms to read a full
3840×2160×32-bit frame (roughly 33MB) out of normal RAM. That’s about 16GB/s. Eyeballing
the slow “dumb buffer” read times suggests perhaps 16MB/s for those! ↩
As a corollary to this realisation, I learned that
attempting to use surfaces backed solely by “dumb buffers” to do fallback software
composition is a losing proposition. Hence the whole idea of “shadow” buffers,
presumably! ↩
Erlang famously supports hot code reloading, but the details are a bit fiddly. Here’s a
cheat-sheet I used recently for a simple TCP service written in Erlang.
My program was a single module, running outside of any OTP application context. The
instructions here need minor emendation to either explicitly list modules to purge and reload
or to discover all modules within a single application; see the places in server-reload
below mentioning the atom my_server.
I did not use the -on_load() directive, because I wanted to be able to use multiple nodes
rather than controlling reloads from a single node’s shell repl, and I couldn’t figure out how
to make the two play nicely together.
The Erlang
I exported a code_change/0 from my module, to be called after loading a new version of the
module into a node. It sends a message code_change to each “global” actor in my program (in
this case, there was only one).
-export([code_change/0]).

code_change() ->
    io:format("+ code_change~n"),
    %% name registered previously with `global:register_name/2`:
    global:send(name_of_my_global_actor, code_change),
    ok.
That actor passes the notification on to any inferior actors it is managing, and then
makes a fully-qualified “MFA” (Module:Function(Args)) self-call, so that its main loop
continues in the newly loaded version of the module.
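The receiving loop was shaped something like this (a sketch: notify_inferiors and handle stand in for the real handlers):

loop(State) ->
    receive
        code_change ->
            notify_inferiors(State, code_change),
            %% The fully-qualified self-call switches execution to the
            %% newly loaded version of this module.
            ?MODULE:loop(State);
        OtherMessage ->
            loop(handle(OtherMessage, State))
    end.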
That’s all. The end result worked well: I used it to run a hotfix to my TCP service with many
tens of live, active connections, and not one of them noticed a thing.
I had a small insight yesterday while building a component for a small web
app: the user interface for editing an incomplete value of sum type
A+B needs to remember a product of user input, 2×A×B:
A + B ⟿ 2 × A × B
This allows the user to ergonomically change their mind about whether
they’re building an A or a B without losing partially constructed
values.
More precisely, the UI for a value of type A+B needs in general to be
able to remember and manipulate 2×(A+1)×(B+1):
A + B ⟿ 2 × (A+1) × (B+1)
The extra 1s allow for nulls, for temporarily missing but required values.
You could similarly generalise to allow for temporarily invalid or
unparseable values.
Example
Consider the UI for creating a new project in an IDE, with two available
options: create a new local project, by simply creating a new directory;
or clone an existing git repository.
The user interface for this will look something like a dialog offering a choice
between the two options, with text fields for the required inputs.
While a value of type NewProject is being built, we need
to remember four strings (abstractly, Str×Str×Str×Str),
plus a boolean indicating whether we ultimately want a “local” or “clone”
project type (abstractly, 2).
All told, that’s
Str + Str×Str×Str ⟿ 2 × Str×Str×Str×Str
which exactly fits the pattern of
A + B ⟿ 2 × A × B
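Here’s the idea rendered in TypeScript (the type and field names are invented for illustration, not from any particular IDE):

// The finished value: the sum Str + Str×Str×Str.
type NewProject =
  | { kind: "local"; directory: string }
  | { kind: "clone"; url: string; ref: string; directory: string };

// The transient UI state: the product 2 × Str × (Str×Str×Str). Both
// alternatives’ fields are kept, so switching between “local” and
// “clone” loses nothing.
interface NewProjectDraft {
  kind: "local" | "clone";   // the 2: the currently-selected alternative
  localDirectory: string;    // the A (one string)
  cloneUrl: string;          // the B (three strings, with the two below)
  cloneRef: string;
  cloneDirectory: string;
}

// Committing the draft projects the product back down onto the sum.
function commit(d: NewProjectDraft): NewProject {
  return d.kind === "local"
    ? { kind: "local", directory: d.localDirectory }
    : { kind: "clone", url: d.cloneUrl, ref: d.cloneRef, directory: d.cloneDirectory };
}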
Generalisation to bigger sums
The translation can be applied recursively, but it (harmlessly) remembers
slightly too much transient UI state,
A+(B+C) ⟿ 2 × A × (2 × B × C)
so perhaps it’s better to think about it applying directly to n-ary sums:
A1 + A2 + ⋯ + An ⟿ n × A1 × A2 × ⋯ × An
I noticed a bug in Guile
3.0.9’s aarch64 atomics
handling, and found a couple of apparent solutions (1, 2), but one of them
is weird enough for me to write this post.
(ETA: Nonstory. The problem was that the mov instruction isn’t idempotent! Hat tip to
Andy Wingo for figuring out what the issue was. I’ve updated the rest of the article, and I’ll
leave it here for posterity.)
Long story short, the problem was with the equivalent of C’s
atomic_exchange. Here’s the code
that Guile’s JIT was generating:
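(In outline; x16 and w17 stand in for whatever scratch registers the JIT actually picked. Note where the loop re-enters.)

retry:
        mov     x16, x0          // save the value to be stored (x0 on entry)
        ldaxr   x0, [x1]         // load-acquire the old value into x0
        stlxr   w17, x16, [x1]   // try to store-release; w17 = 0 on success
        cbnz    w17, retry       // on contention, go around again, via the mov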
This code appears to occasionally lose writes (!). ETA: This code definitely loses
writes when interference means it has to go around the loop.
The first patch I wrote boringly replaced the lot with a single
swpal x0, x0, [x1]
which is fine, if you have an ARM v8.1 device to hand, but not if you don’t have a machine with
Large System
Extensions. So I
tried, on a hunch, the second patch, which just changed the target of the cbnz,
producing code like this:
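(Again in outline, with the same illustrative scratch registers; the loop now re-enters at the ldaxr, past the mov.)

        mov     x16, x0          // save the value to be stored, once
retry:
        ldaxr   x0, [x1]         // load-acquire the old value into x0
        stlxr   w17, x16, [x1]   // try to store-release; w17 = 0 on success
        cbnz    w17, retry       // on contention, retry only the load/store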
… and the issue disappeared! What! This shouldn’t have made a difference! Should it?
ETA: And fair enough, too! If the branch targets the mov instruction, the value of x0
that ldaxr set is used, meaning that the whole operation simply becomes a no-op assignment.
Are aarch64 atomics really this sensitive? Is there only One True Instruction Sequence that
should be used to implement atomic_exchange? Why does making this seemingly-insignificant
change produce such a noticeable effect? ETA: Nothing to see here :-)
JavaScript’s automatic collapsing of promises-of-promises can bite you in unexpected ways.
Imagine you have a CSP-like Channel<T> class for sending Ts back and forth. Channel<T>
might have a method like this:
async pop(): Promise<T | undefined> { ... }
There’s an obvious problem here: what if undefined ∈ T? So you make sure to note, in the
comment attached to Channel<T>, that T is not allowed to include undefined.
But the less obvious problem is that T is not allowed to contain Promise<undefined> either,
even though in other contexts a promise of undefined cannot be confused with undefined.
To see why this is a problem, instantiate T with Promise<undefined>, and look at the type
of pop():
Promise<Promise<undefined>|undefined>
Because JavaScript collapses promises-of-promises to just promises, this is equivalent to just
Promise<undefined>
and you’ve lost the ability to tell whether pop() yielded a T or an undefined.
TypeScript does not warn you about this, incidentally. (Ask me how I know.)
Workaround
Instead of accepting this loss of structure and adding another caveat to Channel<T> to work
around JavaScript’s broken design—“T must not include either undefined or
Promise<undefined> or Promise<Promise<undefined>> etc.”—I decided to change the signature
of pop():
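async pop(): Promise<{ item: T } | undefined> { ... }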
Now both Channel<undefined> and Channel<Promise<undefined>> are sensible and work as
expected. No more exceptions regarding what Ts a Channel may carry.
When T is Promise<undefined>, in particular, we see that the type of pop() is
Promise<{ item: Promise<undefined> } | undefined>
Because the Promises aren’t immediately nested, JavaScript won’t erase our structure.
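For concreteness, here’s a minimal sketch of the revised class, simplified to a non-blocking queue (the real Channel presumably has blocking and buffering machinery):

class Channel<T> {
  private items: T[] = [];

  push(item: T): void {
    this.items.push(item);
  }

  // Checking emptiness before shifting keeps "no item" (undefined)
  // distinct from every possible T, including Promise<undefined>.
  async pop(): Promise<{ item: T } | undefined> {
    if (this.items.length === 0) return undefined;
    return { item: this.items.shift()! };
  }
}

// The wrapper must be unpacked deliberately, so no structure is lost:
const ch = new Channel<Promise<undefined>>();
ch.push(Promise.resolve(undefined));
ch.pop().then(r => {
  if (r !== undefined) {
    // r.item is a genuine Promise<undefined>, not a missing value.
  }
});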
(Ironically, we’ve introduced a monad (Maybe<T>) to fix the bad behaviour of something that
should have been a monad…)