6.1. Overview of anonymisation

You set things up as follows.

  • You start with one or more source database(s) and blank destination database(s), plus blank secret database(s) that CRATE uses for temporary storage and for storing PID-to-RID (patient identifier to research identifier) lookup information.

    • You may need to preprocess your source database a little, if it has an odd or inconvenient format.

  • You create an anonymiser config file that points to your databases and governs high-level parameters relating to the anonymisation process.

  • You create a data dictionary that describes what to do with each column of the source database(s). For example, some columns may be allowed through unchanged; some may be skipped; some may contain patient identifiers; some may contain free text that needs to have identifiers “scrubbed” out. You tell your config file about your data dictionary.

    • CRATE can draft a data dictionary for you, but you will need to check it manually.

  • You run the anonymiser, pointing it at your config file.

    • There are some additional options here (for example, to restrict processing to specific patients or to eliminate all free-text fields). These let you use a standard config file and data dictionary but produce variant versions of your database without too much effort.
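
To make the ideas above concrete, here is a minimal Python sketch of what a per-column data dictionary, a PID-to-RID mapping, and free-text scrubbing might look like conceptually. It is an illustration only, not CRATE's implementation or file format: the names DATA_DICTIONARY, SCRUB_SOURCE_COLUMNS, SECRET_KEY, pid_to_rid, scrub and anonymise_row are all hypothetical, and using a keyed hash (HMAC-SHA256) for the RID is an assumption made for the example.

```python
import hashlib
import hmac
import re
from typing import Dict, List

# Hypothetical secret; a real key would be kept private, never in source code.
SECRET_KEY = b"example-key-for-illustration-only"

# Hypothetical per-column decisions, loosely mirroring a data dictionary.
DATA_DICTIONARY: Dict[str, str] = {
    "patient_id": "pid",          # patient identifier: replaced by a RID
    "forename": "omit",           # skipped entirely
    "diagnosis_code": "include",  # allowed through unchanged
    "clinical_note": "scrub",     # free text: identifiers are scrubbed out
}

# Hypothetical list of columns whose values must be removed from free text.
SCRUB_SOURCE_COLUMNS: List[str] = ["patient_id", "forename"]


def pid_to_rid(pid: str) -> str:
    """Derive a research ID from a patient ID using a keyed hash."""
    return hmac.new(SECRET_KEY, pid.encode("utf-8"), hashlib.sha256).hexdigest()


def scrub(text: str, identifiers: List[str]) -> str:
    """Replace each identifier (whole word, case-insensitive) with a placeholder."""
    for ident in identifiers:
        pattern = r"\b" + re.escape(ident) + r"\b"
        text = re.sub(pattern, "[XXX]", text, flags=re.IGNORECASE)
    return text


def anonymise_row(row: Dict[str, str]) -> Dict[str, str]:
    """Apply the data dictionary to one source row and return the output row."""
    identifiers = [str(row[c]) for c in SCRUB_SOURCE_COLUMNS if row.get(c)]
    out: Dict[str, str] = {}
    for column, value in row.items():
        action = DATA_DICTIONARY.get(column, "omit")  # unknown columns are dropped
        if action == "include":
            out[column] = value
        elif action == "pid":
            out["rid"] = pid_to_rid(str(value))
        elif action == "scrub":
            out[column] = scrub(str(value), identifiers)
        # "omit": the column is not copied to the destination
    return out


if __name__ == "__main__":
    source_row = {
        "patient_id": "12345",
        "forename": "Alice",
        "diagnosis_code": "F32",
        "clinical_note": "Alice (ID 12345) attended clinic.",
    }
    print(anonymise_row(source_row)["clinical_note"])
    # prints: [XXX] (ID [XXX]) attended clinic.
```

In this sketch the output row keeps diagnosis_code unchanged, drops forename, replaces patient_id with a rid value, and scrubs the note. A real anonymiser also has to cope with name variants, dates, misspellings and similar complications, which this example deliberately ignores.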