6.2. Data and database prerequisites

6.2.1. Source database(s)

  • There should be a database-wide integer patient ID field, present in every table (or view, if you need to add it) containing patient-identifiable data. Tables should have an index on this field, for speed.

  • For non-patient tables, it is usually faster to have an integer primary key (PK) (particularly in a multiprocessing environment, where CRATE divides up the work in part by PK). However, this is not obligatory.

Summary: all tables should be indexed on an integer PK. All patient tables should also be indexed on an integer patient number.

If you are working with a RiO database, the preprocessor will do this for you. See below.

6.2.2. Destination database(s)

You are likely to want one destination database for every set of source databases that share the same PID. So, for example (EMR = electronic medical record):

Source database



Destination database

Brand X EMR

Brand X number

NHS number

Destination database A

Legacy hospital system 1

Trust ‘M’ number

NHS number

Destination database B

Legacy hospital system 2

Trust ‘M’ number

NHS number

Destination database B


IAPT reference number

NHS number

Destination database C

CRATE will create the contents for you; you just need to create the database, and tell CRATE about it via an SQLAlchemy URL.

You will be able to link records later from databases A–C in this example using the MRID (= hashed NHS number in this example).

6.2.3. Secret administrative database(s)

You will need one secret administrative database for every destination database. This will store information like the PID-to-RID mapping, the MPID-to-MRID mapping, and state information to make incremental updates faster.

6.2.4. Web site administrative database

You’ll need a database (and it’s probably easiest to have it separate) to store secret administrative information for the CRATE web front end.