Part of a series: #squeak-phone
Hand-written binary parsing/unparsing sucks
As I’ve been working on a mobile Smalltalk system, I’ve found myself needing to decode and encode a number of complex telephony packet formats [1] such as the following, an incoming SMS delivery message carrying an SMS-DELIVER TPDU in GSM 03.40 format, which in turn contains seven-bit (!) GSM 03.38-encoded text:
02 01 ffff 01 28 07911356131313f3 04 0b911316325476f8 000002909021044480 0ec67219644e83cc6f90b9de0e01
It turns out that getting a working cellphone requires a plethora of such binary formats.
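To give a flavor of what hand-rolling involves, here is a minimal Python sketch that decodes the two phone-number fields in the example message. It assumes the usual layout in which a service-centre address precedes the TPDU, and it assumes decimal digits only. The function names are made up for illustration and have nothing to do with BitSyntax itself. Note that the two length bytes use different units: the first counts octets, the second counts digits (half-octets).

```python
def decode_semi_octets(data: bytes) -> str:
    """Decode swapped-nibble BCD digits; a 0xF nibble is padding."""
    digits = []
    for byte in data:
        # The low nibble holds the earlier digit, the high nibble the later one.
        for nibble in (byte & 0x0F, byte >> 4):
            if nibble == 0x0F:
                continue  # filler used when the digit count is odd
            if nibble > 9:
                raise ValueError("non-decimal digit; not handled in this sketch")
            digits.append(str(nibble))
    return "".join(digits)


def format_number(type_octet: int, digits: str) -> str:
    # Bits 6..4 of the type octet are the type-of-number; 1 means international.
    prefix = "+" if (type_octet >> 4) & 0x07 == 1 else ""
    return prefix + digits


def decode_service_centre(field: bytes) -> str:
    """Length byte counts the OCTETS that follow (type octet included)."""
    length_in_octets = field[0]
    type_octet = field[1]
    return format_number(type_octet,
                         decode_semi_octets(field[2:1 + length_in_octets]))


def decode_originator(field: bytes) -> str:
    """Length byte counts DIGITS (half-octets), excluding the type octet."""
    length_in_digits = field[0]
    type_octet = field[1]
    n_octets = (length_in_digits + 1) // 2  # round up for an odd digit count
    return format_number(type_octet,
                         decode_semi_octets(field[2:2 + n_octets]))


service_centre = decode_service_centre(bytes.fromhex("07911356131313f3"))
originator = decode_originator(bytes.fromhex("0b911316325476f8"))
print(service_centre)  # +31653131313
print(originator)      # +31612345678
```

Even this small fragment needs two slightly different length conventions and a padding rule, and it only covers decoding.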
I started off hand-rolling them, but it quickly became too much, so I borrowed liberally (read: stole) from Erlang, and implemented BitSyntax for Smalltalk. (After all, I am already using Erlang-influenced actors for the Smalltalk system daemons!)
Every language needs a BitSyntax, it seems!
What does BitSyntax do?
The BitSyntax package includes a BitSyntaxCompiler class which takes BitSyntaxSpecification objects and produces reasonably efficient Smalltalk for decoding and encoding binary structures, mapping from bytes to instance variables and back again.
The interface to the compiled code is simple. After compiling a BitSyntaxSpecification for the data format above, we can analyze the example message straightforwardly and, if we wish, serialize it again.
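As a language-neutral illustration of that parse-then-serialize round trip, the following Python sketch handles just the user-data field at the end of the example message. This is not BitSyntax's generated code or its API: the function names are hypothetical. Mapping septets straight to characters with `chr` is a simplifying assumption that only holds where the GSM 03.38 default alphabet coincides with ASCII (letters, digits, space and some punctuation).

```python
def unpack_septets(data: bytes, count: int) -> list:
    """Read `count` 7-bit values from a little-endian packed bit stream."""
    septets = []
    acc = 0      # bit accumulator; the oldest bits sit at the low end
    nbits = 0    # number of valid bits currently held in acc
    for byte in data:
        acc |= byte << nbits
        nbits += 8
        while nbits >= 7 and len(septets) < count:
            septets.append(acc & 0x7F)
            acc >>= 7
            nbits -= 7
    return septets


def pack_septets(septets) -> bytes:
    """Inverse of unpack_septets; the final partial octet is zero-padded."""
    out = bytearray()
    acc = 0
    nbits = 0
    for septet in septets:
        acc |= (septet & 0x7F) << nbits
        nbits += 7
        while nbits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        out.append(acc & 0xFF)
    return bytes(out)


user_data = bytes.fromhex("0ec67219644e83cc6f90b9de0e01")
length_in_septets = user_data[0]   # 0x0e: fourteen septets, not fourteen octets
packed = user_data[1:]

septets = unpack_septets(packed, length_in_septets)
text = "".join(chr(s) for s in septets)   # ASCII-overlap assumption, see above
print(text)  # Fee fi fo fum!

# Serializing the decoded septets reproduces the original octets.
repacked = bytes([len(septets)]) + pack_septets(septets)
```

The length byte here counts septets, so fourteen characters occupy only thirteen octets on the wire.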
How does it work?
Syntax specifications are built using an embedded domain-specific language (EDSL).
For example, for the above data format, we would supply one spec for the class representing the message as a whole, along with specs for the classes representing its nested parts (some omitted here for space reasons).
These are non-trivial examples; the simple cases are simple, and the complex cases are usually possible to express without having to write code by hand. The EDSL is extensible, so more combinators and parser types can be easily added as the need arises.
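The general idea of describing a format once and getting both directions from it can be sketched in a few lines of Python. Everything here is hypothetical: the `U8`, `Raw` and `Seq` classes and the field names are labels invented for this illustration, not BitSyntax's EDSL. The bytes come from the middle of the example message and are read as protocol identifier, data coding scheme, a seven-octet timestamp and the user-data length.

```python
class U8:
    """A single unsigned octet."""
    def parse(self, data, pos):
        return data[pos], pos + 1

    def unparse(self, value):
        return bytes([value])


class Raw:
    """A fixed-size run of octets, kept undecoded."""
    def __init__(self, size):
        self.size = size

    def parse(self, data, pos):
        chunk = data[pos:pos + self.size]
        if len(chunk) != self.size:
            raise ValueError("truncated input")
        return chunk, pos + self.size

    def unparse(self, value):
        if len(value) != self.size:
            raise ValueError("wrong field size")
        return bytes(value)


class Seq:
    """Named fields in order; one description drives both directions."""
    def __init__(self, *fields):
        self.fields = fields  # (name, codec) pairs

    def parse(self, data, pos=0):
        record = {}
        for name, codec in self.fields:
            record[name], pos = codec.parse(data, pos)
        return record, pos

    def unparse(self, record):
        return b"".join(codec.unparse(record[name])
                        for name, codec in self.fields)


# Field names are illustrative labels, not identifiers from BitSyntax.
TPDU_TAIL = Seq(
    ("protocol_id", U8()),
    ("coding_scheme", U8()),
    ("timestamp", Raw(7)),
    ("user_data_length", U8()),
)

raw = bytes.fromhex("000002909021044480" "0e")
record, end = TPDU_TAIL.parse(raw)
print(record["user_data_length"])  # 14
roundtrip = TPDU_TAIL.unparse(record)
```

Because `Seq` exposes the same `parse`/`unparse` pair as the leaf codecs, specs nest, and adding a new field type means adding one more small class.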
How do I get it?
Load it into an up-to-date trunk Squeak image:
You can also visit the project page directly.
BitSyntax-Help contains an extensive manual written for
Squeak’s built-in documentation system.
[1] Telephony packet formats are particularly squirrelly in places. Seven-bit text encoding? Really? Multiple ways to encode phone numbers. Lengths sometimes in octets, sometimes in half-octets, sometimes in septets (!) with padding implicit. Occasional eight-bit data shoehorned into a septet-based section of a message. Bit fields everywhere. Everything is an acronym, cross-referenced to yet another document. Looking at the 3GPP and GSM specs gave me flashbacks to the last time I worked in telephony, nearly 20 years ago…