## Equality, Hashing and Canonicalization

#### by Tony Garnock-Jones / @leastfixedpoint

I’ve just been reading “Type-Safe Modular Hash-Consing” (Filliâtre & Conchon, 2006, http://www.lri.fr/~filliatr/ftp/publis/hash-consing2.pdf) and it got me thinking about the relationship between finding a canonical element of a set via some equivalence and simply deciding the equivalence itself.

In order to hashcons,
according to the paper, you need an equivalence predicate `T`

×
`T`

→ `Bool`

and a hash function `T`

→ `Int`

. They have to
line up. This lining up seems interesting. The hash function is naming
an equivalence class of `T`

s - or at least it would be if it weren’t
permitted to spuriously identify things sometimes! So it’s a rough
equivalence class.

But my main idea is that say we were hashconsing append-lists, defined by

T ::= Nil | T ++ T | [T] |

We’d want our equivalence predicate and our hash function to honour ```
t
++ Nil
```

≡ `t`

. But while it’s doing so, there’s also the
opportunity for it to pick a canonical element: in this case it should
choose just `t`

rather than `t ++ Nil`

, since it is structurally
smaller.

So perhaps hashconsing should be expressed using a *canonicalization*
function `T`

→ `T`

and a hash function `T`

→ `Int`

, rather than
referring to an equivalence predicate at all?

In some sense a hashcons operator (per the paper taking `T`

→
`Hashconsed T`

) *is* a kind of canonicalizing function, of course, but
slightly weird in that it has links to the resource management
facility of the underlying VM rather than simply expressing an
object-level equivalence.

I wonder what would be a good way of writing a canonicalization
function. It feels a lot like a standard reduction, with the notion of
reduction being rules like `t ++ Nil`

→ `t`

, `Nil ++ t`

→
`t`

etc.

So a hashconsing system is a kind of VM-supported normal-form-finding interpreter. For data, rather than programs. Huh.