Equality, Hashing and Canonicalization
Tue 22 Nov 2011 13:15 EST
I’ve just been reading “Type-Safe Modular Hash-Consing” (Filliâtre & Conchon, 2006, http://www.lri.fr/~filliatr/ftp/publis/hash-consing2.pdf) and it got me thinking about the relationship between finding a canonical element of a set via some equivalence and simply deciding the equivalence itself.
In order to hashcons,
according to the paper, you need an equivalence predicate T
×
T
→ Bool
and a hash function T
→ Int
. They have to
line up. This lining up seems interesting. The hash function is naming
an equivalence class of T
s - or at least it would be if it weren’t
permitted to spuriously identify things sometimes! So it’s a rough
equivalence class.
But my main idea is that say we were hashconsing append-lists, defined by
T ::= Nil | T ++ T | [T] |
We’d want our equivalence predicate and our hash function to honour t
++ Nil
≡ t
. But while it’s doing so, there’s also the
opportunity for it to pick a canonical element: in this case it should
choose just t
rather than t ++ Nil
, since it is structurally
smaller.
So perhaps hashconsing should be expressed using a canonicalization
function T
→ T
and a hash function T
→ Int
, rather than
referring to an equivalence predicate at all?
In some sense a hashcons operator (per the paper taking T
→
Hashconsed T
) is a kind of canonicalizing function, of course, but
slightly weird in that it has links to the resource management
facility of the underlying VM rather than simply expressing an
object-level equivalence.
I wonder what would be a good way of writing a canonicalization
function. It feels a lot like a standard reduction, with the notion of
reduction being rules like t ++ Nil
→ t
, Nil ++ t
→
t
etc.
So a hashconsing system is a kind of VM-supported normal-form-finding interpreter. For data, rather than programs. Huh.