4.1. Overview of anonymisation

In outline, the process is as follows.

You start with one or more source databases and blank destination databases, plus blank secret databases that CRATE uses for temporary storage and for storing the lookup information that maps each patient identifier (PID) to a research identifier (RID).
- You may need to preprocess your source database a little, if it has an odd or inconvenient format.
You create an anonymiser config file that points to your databases and governs high-level parameters relating to the anonymisation process.
You create a data dictionary that describes what to do with each column of the source database(s). For example, some columns may be allowed through unchanged; some may be skipped; some may contain patient identifiers; some may contain free text that needs to have identifiers “scrubbed” out. You then point your config file at the data dictionary.
- CRATE can draft a data dictionary for you, but you will need to check it manually.
You run the anonymiser, pointing it at your config file.
- There are some additional options here (for example, to restrict to specific patients or eliminate all free text fields), which allow you to use a standard config file and data dictionary but produce variant versions of your database without too much effort.
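
To make the ideas above concrete, here is a toy sketch in Python of how per-column decisions, a PID-to-RID lookup, and free-text scrubbing fit together. It is not CRATE's code, config format, or data dictionary format: the action names ("pass", "skip", "pid", "identifier", "scrub"), the column names, the key, and all function names are hypothetical, and the keyed hash is just one plausible way to derive a stable research ID. Real scrubbing is also considerably more thorough than a whole-word replacement.

```python
import hashlib
import hmac
import re

# Hypothetical per-column decisions, standing in for a data dictionary.
# CRATE's real data dictionary has its own format and vocabulary.
DATA_DICTIONARY = {
    "patient_id": "pid",          # patient identifier: replaced by a research ID
    "forename": "identifier",     # omitted, but its value is scrubbed from free text
    "surname": "identifier",
    "diagnosis_code": "pass",     # allowed through unchanged
    "internal_flag": "skip",      # omitted entirely
    "note": "scrub",              # free text that needs identifiers removed
}

SECRET_KEY = b"example-only-key"  # assumption: in practice this must be kept secret


def pid_to_rid(pid, lookup):
    """Return a stable research ID for a PID, recording it in the secret lookup."""
    if pid not in lookup:
        digest = hmac.new(SECRET_KEY, str(pid).encode("utf-8"), hashlib.sha256)
        lookup[pid] = digest.hexdigest()[:16]
    return lookup[pid]


def scrub(text, identifiers):
    """Replace whole-word, case-insensitive matches of each identifier."""
    for ident in identifiers:
        if ident:
            pattern = r"\b" + re.escape(str(ident)) + r"\b"
            text = re.sub(pattern, "[XXX]", text, flags=re.IGNORECASE)
    return text


def anonymise_row(row, lookup):
    """Apply the per-column decisions to one source row."""
    identifiers = [
        row[col] for col, action in DATA_DICTIONARY.items() if action == "identifier"
    ]
    out = {}
    for column, action in DATA_DICTIONARY.items():
        value = row[column]
        if action == "pass":
            out[column] = value
        elif action == "pid":
            out["rid"] = pid_to_rid(value, lookup)
        elif action == "scrub":
            out[column] = scrub(value, identifiers)
        # "skip" and "identifier" columns are not copied to the destination
    return out


source_row = {
    "patient_id": 123,
    "forename": "Alice",
    "surname": "Smith",
    "diagnosis_code": "F32",
    "internal_flag": "x",
    "note": "Alice Smith attended clinic; Mrs smith reports low mood.",
}
secret_lookup = {}  # stands in for the secret database
destination_row = anonymise_row(source_row, secret_lookup)
print(destination_row)
```

In this sketch the `secret_lookup` dictionary plays the role of the secret database: it is the only place where a PID and its RID appear together, so it never goes into the destination row.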