Linkage is about joining two databases together using common keys or
identifiers. When linking de-identified clinical records, it is often
desirable to link without exposing any identifying information.
One way to do so is to “pseudonymise” both databases, creating a research ID
(pseudonym, tag) from an identifier (such as an NHS number in the UK). A common
operation is for institution A to say to institution B: “please send me
de-identified data for the following people”. If the two institutions share a
common passphrase (secret key), they can both “hash” their identifiers in the
same way, and then check for matches using the resulting pseudonyms. This could
work as follows:
1. Institutions A and B agree a secret passphrase.
2. Institution A hashes the identifiers of relevant people, for whom it would
   like de-identified data from institution B.
3. Institution A sends the resulting pseudonyms to institution B.
4. Institution B hashes all its identifiers with the same passphrase.
5. Institution B looks for pseudonyms that match those requested by A.
6. Institution B sends de-identified data for those people (only) back to A.
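
The steps above can be sketched in Python. This is a minimal illustration
rather than any particular system's implementation: it assumes HMAC-MD5 from
the standard library hmac module as the keyed hash, and the function names,
identifiers and record contents are all invented for the example.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, passphrase: str) -> str:
    """Hash an identifier with the shared passphrase (HMAC-MD5, as hex)."""
    return hmac.new(
        passphrase.encode("utf-8"),
        identifier.encode("utf-8"),
        hashlib.md5,
    ).hexdigest()


def build_request(identifiers, passphrase):
    """Institution A: pseudonyms for the people it wants data about."""
    return {pseudonymise(i, passphrase) for i in identifiers}


def answer_request(requested, records, passphrase):
    """Institution B: hash every identifier, return data for matches only."""
    by_pseudonym = {
        pseudonymise(i, passphrase): data for i, data in records.items()
    }
    return {p: by_pseudonym[p] for p in requested if p in by_pseudonym}


# Made-up identifiers and records, for illustration only.
PASSPHRASE = "tiger"
a_wants = ["1000000001", "1000000003"]
b_records = {
    "1000000001": {"diagnosis": "asthma"},
    "1000000002": {"diagnosis": "eczema"},
}

request = build_request(a_wants, PASSPHRASE)          # sent from A to B
reply = answer_request(request, b_records, PASSPHRASE)  # sent from B to A
```

In this sketch only pseudonyms and de-identified data cross between the two
institutions. Because A computed the pseudonyms itself, it can match each
entry in the reply back to its own records.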
For example, using the passphrase “tiger” and the HMAC-MD5 algorithm, the
following hashes (expressed as hexadecimal) can be generated consistently: