6.1. Overview of anonymisation
You set things up as follows.
- You start with one or more source database(s) and blank destination
database(s), plus blank secret database(s) that CRATE uses for
temporary storage and for storing the lookup information that maps patient identifiers (PIDs) to research identifiers (RIDs).
- You may need to preprocess your source database a little, if it has an odd or inconvenient format.
- You create an anonymiser config file that points to your databases and governs high-level parameters relating to the anonymisation process.
- You create a data dictionary that describes what to do with each column
of the source database(s). For example, some columns may be allowed through
unchanged; some may be skipped; some may contain patient identifiers; some
may contain free text that needs to have identifiers “scrubbed” out. You
tell your config file about your data dictionary.
- CRATE can draft a data dictionary for you, but you will need to check the draft manually.
- You run the anonymiser, pointing it at your config file.
- The anonymiser has some additional options (for example, to restrict processing to specific patients or to eliminate all free-text fields). These let you keep a standard config file and data dictionary but produce variant versions of your database without much effort.
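
To make the role of the data dictionary concrete, here is a minimal Python sketch of the idea. It is not CRATE's code or API. The column names, the action labels ("pass", "skip", "pid", "scrub"), the secret key, the use of HMAC-SHA256 and the "[XXX]" replacement text are all assumptions chosen for illustration. It shows one row being processed according to a per-column decision: some columns pass through, some are dropped, the patient identifier is replaced by a research identifier, and free text has known identifiers scrubbed out.

```python
import hashlib
import hmac
import re

# Hypothetical data dictionary: one action per source column.
# These labels are illustrative and are not CRATE's own syntax.
DATA_DICTIONARY = {
    "patient_id": "pid",       # patient identifier: replaced by a research ID
    "forename": "skip",        # dropped from the destination
    "surname": "skip",         # dropped from the destination
    "admission_year": "pass",  # allowed through unchanged
    "notes": "scrub",          # free text: identifiers are scrubbed out
}

# Hypothetical list of columns whose values must be removed from free text.
IDENTIFIER_COLUMNS = ("patient_id", "forename", "surname")

# Illustrative key only; a real key would be kept secret.
SECRET_KEY = b"example-secret-key"


def pid_to_rid(pid: str) -> str:
    """Derive a research ID from a patient ID with a keyed hash (an assumption)."""
    return hmac.new(SECRET_KEY, pid.encode("utf-8"), hashlib.sha256).hexdigest()


def scrub(text: str, identifiers: list) -> str:
    """Replace each known identifier, matched as a whole word, with a marker."""
    for ident in identifiers:
        if ident:
            pattern = r"\b" + re.escape(ident) + r"\b"
            text = re.sub(pattern, "[XXX]", text, flags=re.IGNORECASE)
    return text


def anonymise_row(row: dict) -> dict:
    """Apply the data dictionary to one source row and return the output row."""
    identifiers = [str(row[c]) for c in IDENTIFIER_COLUMNS if c in row]
    out = {}
    for column, value in row.items():
        # Columns missing from the dictionary are skipped, to be safe.
        action = DATA_DICTIONARY.get(column, "skip")
        if action == "pass":
            out[column] = value
        elif action == "pid":
            out["rid"] = pid_to_rid(str(value))
        elif action == "scrub":
            out[column] = scrub(str(value), identifiers)
        # "skip": the column is simply not copied
    return out


if __name__ == "__main__":
    source_row = {
        "patient_id": "12345",
        "forename": "Alice",
        "surname": "Smith",
        "admission_year": 2020,
        "notes": "Alice Smith (12345) attended clinic.",
    }
    print(anonymise_row(source_row))
```

In this sketch, unknown columns default to being skipped, so that a column missing from the dictionary is never copied by accident. That is a design choice of the example, not a statement about CRATE's behaviour.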