Equality, Hashing and Canonicalization

I’ve just been reading “Type-Safe Modular Hash-Consing” (Filliâtre & Conchon, 2006, http://www.lri.fr/~filliatr/ftp/publis/hash-consing2.pdf) and it got me thinking about the relationship between finding a canonical element of a set via some equivalence and simply deciding the equivalence itself.

In order to hashcons, according to the paper, you need an equivalence predicate T × TBool and a hash function TInt. They have to line up. This lining up seems interesting. The hash function is naming an equivalence class of Ts - or at least it would be if it weren’t permitted to spuriously identify things sometimes! So it’s a rough equivalence class.

But my main idea is that say we were hashconsing append-lists, defined by

T ::= Nil T ++ T [T]

We’d want our equivalence predicate and our hash function to honour t ++ Nilt. But while it’s doing so, there’s also the opportunity for it to pick a canonical element: in this case it should choose just t rather than t ++ Nil, since it is structurally smaller.

So perhaps hashconsing should be expressed using a canonicalization function TT and a hash function TInt, rather than referring to an equivalence predicate at all?

In some sense a hashcons operator (per the paper taking THashconsed T) is a kind of canonicalizing function, of course, but slightly weird in that it has links to the resource management facility of the underlying VM rather than simply expressing an object-level equivalence.

I wonder what would be a good way of writing a canonicalization function. It feels a lot like a standard reduction, with the notion of reduction being rules like t ++ Nilt, Nil ++ tt etc.

So a hashconsing system is a kind of VM-supported normal-form-finding interpreter. For data, rather than programs. Huh.