6.1. Overview of anonymisation

You set things up as follows.

  • You start with your source database(s) and blank destination database(s), plus blank secret database(s) that CRATE uses for temporary storage and for storing the PID-to-RID (patient identifier to research identifier) lookup information.
    • You may need to preprocess your source database a little, if it has an odd or inconvenient format.
  • You create an anonymiser config file that points to your databases and governs high-level parameters relating to the anonymisation process.
  • You create a data dictionary that describes what to do with each column of the source database(s). For example, some columns may be allowed through unchanged; some may be skipped; some may contain patient identifiers; some may contain free text that needs to have identifiers “scrubbed” out. You then point your config file at your data dictionary.
    • CRATE can draft one for you, but you will need to check it manually.
  • You run the anonymiser, pointing it at your config file.
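
To make the role of the data dictionary more concrete, here is a minimal conceptual sketch in Python of the per-column decisions described above. It is not CRATE's implementation or API: the action names ("include", "omit", "pid", "scrub"), the functions pid_to_rid, scrub and anonymise_row, the keyed-hash approach to deriving a RID, and the "[XXX]" replacement text are all assumptions chosen for illustration.

```python
import hashlib
import hmac
import re

# Hypothetical data dictionary: one action per source column.
# These action names are illustrative only, not CRATE's own vocabulary.
DATA_DICTIONARY = {
    "patient_id": "pid",      # patient identifier: replace with a research ID
    "surname": "omit",        # skip this column entirely
    "diagnosis": "include",   # allow through unchanged
    "notes": "scrub",         # free text: scrub identifiers out
}

# Assumption: a secret key that never reaches the destination database.
SECRET_KEY = b"example-secret-key"


def pid_to_rid(pid: str) -> str:
    """Derive a research ID (RID) from a patient ID (PID) using a keyed hash."""
    return hmac.new(SECRET_KEY, pid.encode("utf-8"), hashlib.sha256).hexdigest()


def scrub(text: str, identifiers: list[str], replacement: str = "[XXX]") -> str:
    """Replace each known identifier in free text, ignoring case."""
    for ident in identifiers:
        text = re.sub(re.escape(ident), replacement, text, flags=re.IGNORECASE)
    return text


def anonymise_row(row: dict, identifiers: list[str]) -> dict:
    """Apply the data dictionary's action to each column of one source row."""
    out = {}
    for column, value in row.items():
        # Columns missing from the dictionary are skipped, to fail safe.
        action = DATA_DICTIONARY.get(column, "omit")
        if action == "include":
            out[column] = value
        elif action == "pid":
            out["rid"] = pid_to_rid(str(value))
        elif action == "scrub":
            out[column] = scrub(str(value), identifiers)
        # "omit": write nothing to the destination row
    return out


source_row = {
    "patient_id": 12345,
    "surname": "Smith",
    "diagnosis": "F32",
    "notes": "Mr Smith (ID 12345) attended.",
}
print(anonymise_row(source_row, identifiers=["Smith", "12345"]))
```

In this sketch the destination row keeps the diagnosis, drops the surname, carries a RID in place of the PID, and has the known identifiers removed from the free-text notes. A real anonymiser has far more to handle (dates, partial matches, misspellings, and so on), which is one reason a drafted data dictionary needs manual checking.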