17. Abbreviations

Abbreviation  Meaning
C4C           consent for contact
CPFT          Cambridgeshire & Peterborough NHS Foundation Trust
CRATE         Clinical Records Anonymisation and Text Extraction (software) [4]
CRIS          Clinical Records Interactive Search [2, 3]
CSV           comma-separated value (file)
DD            data dictionary
EMR           electronic medical record (system)
GATE          General Architecture for Text Engineering (software)
IAPT          UK Improving Access to Psychological Therapies service
ID            identifier
KCL           King’s College London
MPID          master patient identifier
MRID          master research identifier
NHS           UK National Health Service
NLP           natural language processing
PID           patient identifier
RCEP          RiO CRIS Extraction Program (by Servelec)
RDBM          Research database manager
RID           research identifier
RiO           An EMR product from Servelec
SLAM          South London & Maudsley NHS Foundation Trust
SQL           Structured Query Language [1]
TRID          transient research identifier
TSV           tab-separated value (file)
UK            United Kingdom

18. Glossary

  • Master patient ID (MPID). A number that uniquely identifies a patient across many databases. In the UK, the NHS number is the usual MPID.

  • Master research ID (MRID). A research identifier that is unique to a de-identified patient’s record across many linked research databases. A securely hashed version of the MPID.

  • Patient ID (PID). A number that uniquely identifies a patient within a given database. For example, in a Servelec RiO database, the RiO number is the PID.

  • Research database manager (RDBM). A person authorized to run a research database. They may also function as a member of the clinical administrative team, to whom clinicians may delegate work.

  • Research ID (RID). A research identifier that is unique to a de-identified patient’s record in a research database. A securely hashed version of the PID.

  • Transient research ID (TRID). An integer that is unique to a de-identified patient within a given database, but which may be destroyed and replaced by a different number if the database is de-identified again. Because it is an integer, it is faster to use than the RID, and it can be used reliably to link tables within a query. Unlike the RID or MRID, however, it cannot be stored and relied on again later.
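
The relationship between these identifiers can be illustrated with a minimal Python sketch. It is not taken from the software described here: the choice of HMAC-SHA256 as the "secure hash", the secret keys, the example identifier values, and the names hash_identifier and TridAllocator are all assumptions made for illustration only.

```python
import hashlib
import hmac
from itertools import count


def hash_identifier(identifier: str, secret_key: bytes) -> str:
    """Return a keyed hash (HMAC-SHA256, an assumed choice) as a hex string."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


class TridAllocator:
    """Hypothetical allocator of integer TRIDs.

    The PID-to-TRID mapping lives only in this object, so creating a new
    allocator (standing in for a fresh de-identification run) can give the
    same patient a different TRID.
    """

    def __init__(self) -> None:
        self._next = count(1)
        self._by_pid = {}

    def trid_for(self, pid: str) -> int:
        # Reuse the existing TRID within this run; otherwise allocate the next integer.
        if pid not in self._by_pid:
            self._by_pid[pid] = next(self._next)
        return self._by_pid[pid]


# Made-up secrets and identifiers, for illustration only.
PID_KEY = b"example-pid-secret"
MPID_KEY = b"example-mpid-secret"

pid = "123456"       # identifier within one source database
mpid = "9999999999"  # identifier shared across databases

rid = hash_identifier(pid, PID_KEY)     # stable for a given key: can be stored
mrid = hash_identifier(mpid, MPID_KEY)  # stable across linked research databases
trid = TridAllocator().trid_for(pid)    # integer, valid only for this run
```

In this sketch the RID and MRID are reproducible as long as the keys are unchanged, which is why they can be stored and reused, whereas the TRID depends on the order in which patients are encountered during a particular run.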


Footnotes

[1] Codd EF (1970). “A Relational Model of Data for Large Shared Data Banks.” Commun. ACM 13: 377–387. https://doi.org/10.1145/362384.362685.

[2] Stewart R et al. (2009). “The South London and Maudsley NHS Foundation Trust Biomedical Research Centre (SLAM BRC) case register: development and descriptive data.” BMC Psychiatry 9: 51. https://www.ncbi.nlm.nih.gov/pubmed/19674459; https://doi.org/10.1186/1471-244X-9-51.

[3] Fernandes A et al. (2013). “Development and evaluation of a de-identification procedure for a case register sourced from mental health electronic records.” BMC Medical Informatics and Decision Making 13: 71. https://www.ncbi.nlm.nih.gov/pubmed/23842533; https://doi.org/10.1186/1472-6947-13-71.

[4] Cardinal RN (2017). “Clinical records anonymisation and text extraction (CRATE): an open-source software system.” BMC Medical Informatics and Decision Making 17: 50. https://www.ncbi.nlm.nih.gov/pubmed/28441940; https://doi.org/10.1186/s12911-017-0437-1.