14.7.6. crate_anon.preprocess.preprocess_rio
crate_anon/preprocess/preprocess_rio.py
Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).
This file is part of CRATE.
CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with CRATE. If not, see <https://www.gnu.org/licenses/>.
Preprocess RiO tables for CRATE.
RiO is a mental health EMR system from Servelec.
Things to do
Todo
preprocess_rio: specific supposed PK failing (non-unique) on incremental
Todo
preprocess_rio: Imperfectly tested: Audit_Created_Date, Audit_Updated_Date … some data for Audit_Created_Date, but incomplete audit table
Todo
preprocess_rio: Similarly, all cross-checks to RCEP output (currently limited by data availability)
Primary keys
In RCEP, Document_ID is VARCHAR(MAX), and is often:
'global_table_id_9_or_10_digits' + '_' + 'pk_int_as_string'
HOWEVER, the last part is not always unique; e.g. Care_Plan_Interventions.
Care_Plan_Interventions has massive tranches of ENTIRELY identical rows, including a column called, ironically, “Unique_Key”.
Therefore, we could either ditch the key entirely, or just use a non-UNIQUE index (and call it “key” not “pk”).
AND THEN… In Client_Family, we have Document_ID values like
773577794_1000000_1000001 ^^^^^^^^^ ^^^^^^^ ^^^^^^^ table ID RiO# Family member's RiO#
… there is no unique ID. And we don’t need the middle part as we already have Client_ID. So this is not very useful. We could mangle out the second and subsequent ‘_’ characters to give a unique number here, which would meaning having PK as BIGINT not INTEGER.
SQL Server’s
ROW_NUMBER()
relates to result sets.However,
ADD pkname INT IDENTITY(1, 1)
works beautifully and autopopulates existing tables.CHUCKED this way of back-mangling DocumentID, since it doesn’t work well:
ensure_columns_present(... RCEP_COL_MANGLED_KEY...) {pk} = CAST( SUBSTRING( {rcep_mangled_pk}, CHARINDEX('_', {rcep_mangled_pk}) + 1, LEN({rcep_mangled_pk}) - CHARINDEX('_', {rcep_mangled_pk}) ) AS INTEGER ), # pk=CRATE_COL_PK, # rcep_mangled_pk=RCEP_COL_MANGLED_KEY,
How is RiO non-core structured?
Index tables
AssessmentDates associates AssessmentID and ClientID with dates AssessmentFormGroupsIndex, e.g.: Name Description Version Deleted CoreAssess Core Assessment 16 0 CoreAssess Core Assessment 17 0 CoreAssessNewV1 Core Assessment v1 0 0 CoreAssessNewV1 Core Assessment v1 1 0 CoreAssessNewV2 Core Assessment v2 0 0 CoreAssessNewV2 Core Assessment v2 1 0 CoreAssessNewV2 Core Assessment v2 2 0 ^^^ ^^^ RiO form groups Nice names AssessmentFormGroupsStructure, e.g.: name FormName AddedDate FormgroupVersion FormOrder CoreAssessNewV2 coreasspresprob 2013-10-30 15:46:00.000 0 0 CoreAssessNewV2 coreassesspastpsy 2013-10-30 15:46:00.000 0 1 CoreAssessNewV2 coreassessbackhist 2013-10-30 15:46:00.000 0 2 CoreAssessNewV2 coreassesmentstate 2013-10-30 15:46:00.000 0 3 CoreAssessNewV2 coreassescapsafrisk 2013-10-30 15:46:00.000 0 4 CoreAssessNewV2 coreasssumminitplan 2013-10-30 15:46:00.000 0 5 CoreAssessNewV2 coreasspresprob 2014-12-14 19:19:06.410 1 0 CoreAssessNewV2 coreassesspastpsy 2014-12-14 19:19:06.410 1 1 CoreAssessNewV2 coreassessbackhist 2014-12-14 19:19:06.413 1 2 CoreAssessNewV2 coreassesmentstate 2014-12-14 19:19:06.413 1 3 CoreAssessNewV2 coreassescapsafrisk 2014-12-14 19:19:06.417 1 4 CoreAssessNewV2 coreasssumminitplan 2014-12-14 19:19:06.417 1 5 CoreAssessNewV2 coresocial1 2014-12-14 19:19:06.420 1 6 CoreAssessNewV2 coreasspresprob 2014-12-14 19:31:25.377 2 0 } NB CoreAssessNewV2 coreassesspastpsy 2014-12-14 19:31:25.377 2 1 } CoreAssessNewV2 coreassessbackhist 2014-12-14 19:31:25.380 2 2 } CoreAssessNewV2 coreassesmentstate 2014-12-14 19:31:25.380 2 3 } CoreAssessNewV2 coreassescapsafrisk 2014-12-14 19:31:25.380 2 4 } CoreAssessNewV2 coreasssumminitplan 2014-12-14 19:31:25.383 2 5 } CoreAssessNewV2 coresocial1 2014-12-14 19:31:25.383 2 6 } CoreAssessNewV2 kcsahyper 2014-12-14 19:31:25.387 2 7 } ^^^ ^^^ Form groups RiO forms; these correspond to UserAssess___ tables. AssessmentFormsIndex, e.g. Name InUse Style Deleted Description ... core_10 1 6 0 Clinical Outcomes in Routine Evaluation Screening Measure-10 (core-10) corealcsub 1 6 0 Alcohol and Substance Misuse coreassescapsafrisk 1 6 0 Capacity, Safeguarding and Risk coreassesmentstate 1 6 0 Mental State coreassessbackhist 1 6 0 Background and History coreassesspastpsy 1 6 0 Past Psychiatric History and Physical Health coreasspresprob 1 6 0 Presenting Problem coreasssumminitplan 1 6 0 Summary and Initial Plan corecarer 1 6 0 Carers and Cared For corediversity 1 6 0 Diversity Needs coremedsum 1 6 0 Medication, Allergies and Adverse Reactions coremenhis 1 6 0 Mental Health / Psychiatric History coremenstate 1 6 0 Mental State and Formulation coreperdev 1 6 0 Personal History and Developmental History ^^^ ^^^ ||| Nice names. ||| RiO forms; these correspond to UserAssess___ tables, e.g. UserAssesscoreassesmentstate AssessmentFormsLocks system only; not relevant AssessmentFormsTimeout system only; not relevant AssessmentImageForms SequenceID, FormName, ClientID, AssessmentDate, UserID, ImagePath ? no data AssessmentIndex, e.g. Name InUse Version DateBound RequiresClientID Deleted Description ... ConsentShare 1 3 1 0 1 Consent to Share Information CoreAssess 1 1 0 1 0 Core Assessment CoreAssess 1 2 0 1 0 Core Assessment CoreAssess 1 3 0 1 0 Core Assessment CoreAssess 1 4 0 1 0 Core Assessment CoreAssess 1 5 0 1 0 Core Assessment CoreAssess 1 6 0 1 0 Core Assessment CoreAssess 1 7 0 1 0 Core Assessment crhtaaucp 1 1 0 0 0 CRHTT / AAU Care Plan ^^^ These correspond to AssessmentStructure.Assessment AssessmentMasterTableIndex, e.g. TableName TableDescription core10 core10 Corealc1 TAUDIT - Q1 Corealc2 TAUDIT Q2 Corealc3 TAUDIT - Q3,4,5,6,7,8 Corealc4 TAUDIT - Q9,10 Corealc5 Dependence Corealc6 Cocaine Use CoreOtherAssess Other Assessments crhttcpstat CRHTT Care Plan Status ^^^ These correspond to UserMaster___ tables. ... Find with: SELECT * FROM rio_data_raw.information_schema.columns WHERE table_name LIKE '%core10%'; AssessmentPseudoForms, e.g. (all rows): Name Link CaseNoteBar ../Letters/LetterEditableMain.aspx?ClientID CaseNoteoview ../Reports/RioReports.asp?ReportID=15587&ClientID kcsahyper tfkcsa physv1hypa physassess16a&readonlymode=1 physv1hypb1 physasses16b1&readonlymode=1 physv1hypb2 physasses16b22&readonlymode=1 physv1hypbody testbmap&readonlymode=1 physv1hypvte vte&readonlymode=1 AssessmentReadOnlyFields, e.g. Code CodeDescription SQLStatementLookup SQLStatementSearch ADCAT Adminstrative Cat... SELECT TOP 1 u.Cod... ... ADD Client Address SELECT '$LookupVal... ... AdmCons Consultant SELECT '$LookupVal... ... AdmglStat Status at Admission SELECT '$LookupVal... ... AdmitDate Admission Date SELECT '$LookupVal... ... AEDEXLI AED Exceptions... SELECT TOP 1 ISNUL... ... Age Client Age SELECT '$LookupVal... ... Allergies Client Allergies SELECT dbo.LocalCo... ... bg Background (PSOC323) SELECT TOP 1 ISNUL... ... That Allergies one in full: - SQLStatementLookup SELECT dbo.LocalConfig_GetClientAllergies('$key$') AS Allergies - SQLStatementSearch = SQLStatementLookup And the bg/Background... one: - SQLStatementLookup SELECT TOP 1 ISNULL(Men03,'History of Mental Health Problems / Psychiatric History section of core assessment not filled'), ISNULL(Men03,'History of Mental Health Problems / Psychiatric History section of core assessment not filled') FROM dbo.view_userassesscoremenhis -- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -- view in which data column names renamed 'Men01', 'Men02'... WHERE ClientID = '$ClientID$' AND dbo.udf_Config_SystemValidationStatus(system_validationData,'Men03','v') = 1 ORDER BY AssessmentDate DESC, type12_UpdatedDate DESC - SQLStatementSearch = SQLStatementLookup - EXEC sp_helptext 'production.rio62camlive.dbo.udf_Config_SystemValidationStatus'; ... can't view this at present (am on the wrong machine?). AssessmentStructure, e.g.: FormGroup Assessment AssessmentVersion FormGroupVersion FormGroupOrder CoreAssessNewV1 CoreAssess 7 1 1 CoreAssessNewV2 CoreAssess 7 2 0 CoreAssessNewV2 CoreAssess 6 1 0 CoreAssessNewV2 CoreAssess 5 0 0 CoreAssessNewV2 CoreAssess 2 0 1 CoreAssessNewV2 CoreAssess 3 0 0 ... FORM GROUP to ASSESSMENT mapping
Main data tables
e.g.: UserAssesscoreassesmentstate ClientID system_ValidationData -- e.g. (with newlines added): '<v n="3"> <MentState s="v" a="<userID>" v="" d="" e="10/11/2013 13:23" o="1" n="3" b="" c=""> </MentState> </v>' ... where <userID> was a specific user ID NHSNum -- as VARCHAR AssessmentDate ServRef MentState -- this contains the text type12_NoteID -- PK type12_OriginalNoteID -- can be NULL type12_DeletedDate -- can be NULL type12_UpdatedBy type12_UpdatedDate formref UserAssesscoreassesspastpsy ClientID system_ValidationData NHSNum AssessmentDate ServRef PastPsyHist -- contains text PhyHealth -- contains text Allergies -- contains text type12_NoteID type12_OriginalNoteID type12_DeletedDate type12_UpdatedBy type12_UpdatedDate formref frailty -- numeric; in passing, here's the Rockwood frailty score
Lookup tables
UserMasterfrailty, in full: Code CodeDescription Deleted 1 1 - Very Fit 0 2 2 - Well 0 3 3 - Managing Well 0 4 4 - Vulnerable 0 5 5 - Mildly Frail 0 7 7 - Severely Frail 0 6 6 - Moderately Frail 0 9 9 - Terminally Ill 0 8 8 - Very Serverely Frail 0
So, overall structure, approximately:
RiO front-end example: Assessments [on menu] -> Core Assessment [menu dropdown] -> Core Assessment v2 [LHS, expands to...] -> Presenting Problem [LHS] Past Psychiatric History and Physical Health -> Service/Team Past Psychiatric History Physical Health / Medical History Allergies Frailty Score Background and History Mental State Capacity, Safeguarding and Risk Summary and Initial Plan Social Circumstances and Employment Keeping Children Safe Assessment So, hierarchy at the backend (> forward, < backward keys): AssessmentIndex.Name(>) / .Description ('Core Assessment') AssessmentStructure.Assessment(<) / .FormGroup(>) AssessmentFormGroupsIndex.Name(<) / .Description ('Core Assessment v2') AssessmentFormGroupsStructure.name(<) / .FormName(>) ('coreassesspastpsy') AssessmentFormsIndex.Name(<) / .Description ('Past Psychiatric History and Physical Health') UserAssesscoreassesspastpsy = data _________________(<) UserAssesscoreassesspastpsy.frailty(>) [lookup] UserMasterfrailty.Code(<) / .CodeDescription
Simplifying views (for core and non-core RiO) could be implemented in the preprocessor, or after anonymisation.
Better to do it in the preprocessor, because this knows about RiO. The two points of “RiO knowledge” should be:
the preprocessor;
PK, RiO number as integer, views
the ddgen_* information in the anonymiser config file.
tables to omit
fields to omit
default actions on fields
e.g. exclude if type12_DeletedDate is None
however, we could also do that more efficiently as a view, and that suits all use cases so far.
Scrubbing references to other patients
There are two ways to do this, in principle.
The first is to reshape the data so that data from “referred-to” patients appear in fields that can be marked as “third-party”. The difficulty is that the mapping is not 1:1 with any database row. For example, if row A has fields “MainCarer” and “OtherCarer” that can refer to other patients, then if the “OtherCarer” field changes, the number of rows to be examined changes. This prohibits using a real-world PK. (A view that joined according to these fields would not have an immutable pseudo-PK either.) And that causes difficulties for a change-detection system. One would have to mark such a view as something not otherwise read/copied by the anonymiser.
The other method, which is more powerful, is to do this work in the anonymiser itself, by defining fields that are marked as “third_party_xref_pid”, and building the scrubber recursively with “depth” and “max_depth” parameters; if depth > 0, the information is taken as third-party.
Well, that sounds achievable.
Done.
RiO audit trail and change history
- AuditTrail
SequenceID -- PK for AuditTrail
UserNumber -- FK to GenUser.UserNumber
ActionDateTime
AuditAction -- 2 = insert, 3 = update
RowID -- row number -- how does that work?
... cheerfully, SQL Server doesn't have an automatic row ID;
https://stackoverflow.com/questions/909155/equivalent-of-oracles-rowid-in-sql-server # noqa
... so is it the PK we've already identified and called crate_pk?
TableNumber -- FK to GenTable.Code
ClientID -- FK to ClientIndex.ClientID
...
- crate_anon.preprocess.preprocess_rio.add_postcode_geography_view(engine: sqlalchemy.engine.base.Engine, configoptions: crate_anon.preprocess.rio_view_func.RioViewConfigOptions, ddhint: crate_anon.preprocess.ddhint.DDHint) None [source]
Creates a RiO source view to add geography columns to an address table including postcodes, linking in e.g. LSOA/IMD information from an ONS postcode table (e.g. imported by CRATE; see postcodes.py).
- Parameters
engine – an SQLAlchemy Engine
configoptions – an instance of
RioViewConfigOptions
ddhint – a
crate_anon/preprocess/ddhint.DDHint
, which will be modified
- crate_anon.preprocess.preprocess_rio.create_rio_views(engine: sqlalchemy.engine.base.Engine, metadata: sqlalchemy.sql.schema.MetaData, configoptions: crate_anon.preprocess.rio_view_func.RioViewConfigOptions, ddhint: crate_anon.preprocess.ddhint.DDHint) None [source]
Creates all views on a RiO/RCEP database.
- Parameters
engine – an SQLAlchemy Engine
metadata – SQLAlchemy MetaData containing reflected details of database
configoptions – an instance of
RioViewConfigOptions
ddhint – a
crate_anon/preprocess/ddhint.DDHint
, which will be modified
- crate_anon.preprocess.preprocess_rio.drop_for_clindocs_table(table: sqlalchemy.sql.schema.Table, engine: sqlalchemy.engine.base.Engine) None [source]
Reverses the changes made by
process_clindocs_table()
to the RiO Clinical Documents table.- Parameters
table – SQLAlchemy Table to process
engine – an SQLAlchemy database Engine
- crate_anon.preprocess.preprocess_rio.drop_for_master_patient_table(table: sqlalchemy.sql.schema.Table, engine: sqlalchemy.engine.base.Engine) None [source]
Drop CRATE columns for the RiO master patient table.
- Parameters
table – SQLAlchemy Table to process
engine – an SQLAlchemy database Engine
- crate_anon.preprocess.preprocess_rio.drop_for_nonpatient_table(table: sqlalchemy.sql.schema.Table, engine: sqlalchemy.engine.base.Engine) None [source]
Drop CRATE indexes and CRATE columns for a non-patient table.
- Parameters
table – SQLAlchemy Table to process
engine – an SQLAlchemy database Engine
- crate_anon.preprocess.preprocess_rio.drop_for_patient_table(table: sqlalchemy.sql.schema.Table, engine: sqlalchemy.engine.base.Engine) None [source]
Drop CRATE indexes and CRATE columns for a patient table.
- Parameters
table – SQLAlchemy Table to process
engine – an SQLAlchemy database Engine
- crate_anon.preprocess.preprocess_rio.drop_for_progress_notes(table: sqlalchemy.sql.schema.Table, engine: sqlalchemy.engine.base.Engine) None [source]
Reverses the changes made by
process_progress_notes()
to the RiO Progress Note table.- Parameters
table – SQLAlchemy Table to process
engine – an SQLAlchemy database Engine
- crate_anon.preprocess.preprocess_rio.drop_rio_views(engine: sqlalchemy.engine.base.Engine, metadata: sqlalchemy.sql.schema.MetaData, configoptions: crate_anon.preprocess.rio_view_func.RioViewConfigOptions, ddhint: crate_anon.preprocess.ddhint.DDHint) None [source]
Drops all views on a RiO/RCEP database.
- Parameters
engine – an SQLAlchemy Engine
metadata – SQLAlchemy MetaData containing reflected details of database
configoptions – an instance of
RioViewConfigOptions
ddhint – a
crate_anon/preprocess/ddhint.DDHint
, which will be modified
- crate_anon.preprocess.preprocess_rio.get_rio_patient_id_col(table: sqlalchemy.sql.schema.Table) str [source]
Returns the RiO patient ID column for a table.
- Parameters
table – SQLAlchemy Table
- Returns
the column name for patient ID
- crate_anon.preprocess.preprocess_rio.get_rio_views(engine: sqlalchemy.engine.base.Engine, configoptions: crate_anon.preprocess.rio_view_func.RioViewConfigOptions, ddhint: crate_anon.preprocess.ddhint.DDHint, suppress_basetables: bool = True, suppress_lookup: bool = True) List[crate_anon.common.sql.ViewMaker] [source]
Gets all view definitions for RiO.
- Parameters
engine – an SQLAlchemy database Engine
configoptions – instance of
RioViewConfigOptions
ddhint – a
crate_anon/preprocess/ddhint.DDHint
, which will be modifiedsuppress_basetables – suppress (for data-dictionary generating hints) the RiO base tables for views that are made on them?
suppress_lookup – suppress (for data-dictionary generating hints) lookup tables whose information has been linked into views?
- Returns
a list of
crate_anon.common.sql.ViewMaker
objects
- crate_anon.preprocess.preprocess_rio.main() None [source]
Command-line parser. See command-line help.
- crate_anon.preprocess.preprocess_rio.process_all_tables(engine: sqlalchemy.engine.base.Engine, metadata: sqlalchemy.sql.schema.MetaData, configoptions: crate_anon.preprocess.rio_view_func.RioViewConfigOptions) None [source]
Process all RiO-like tables via
process_table()
.- Parameters
metadata – SQLAlchemy MetaData containing reflected details of database
engine – an SQLAlchemy database Engine
configoptions – instance of
RioViewConfigOptions
- crate_anon.preprocess.preprocess_rio.process_clindocs_table(table: sqlalchemy.sql.schema.Table, engine: sqlalchemy.engine.base.Engine, configoptions: crate_anon.preprocess.rio_view_func.RioViewConfigOptions) None [source]
Process the RiO (not RCEP) Clinical Documents Table.
For RiO only, not RCEP.
Index on document serial number.
Add/calculate/index the
crate_max_docver_for_doc
column.Add/calculate/index the
crate_last_doc_in_chain
column.
- Parameters
table – SQLAlchemy Table to process
engine – an SQLAlchemy database Engine
configoptions – instance of
RioViewConfigOptions
- crate_anon.preprocess.preprocess_rio.process_master_patient_table(table: sqlalchemy.sql.schema.Table, engine: sqlalchemy.engine.base.Engine, configoptions: crate_anon.preprocess.rio_view_func.RioViewConfigOptions) None [source]
Process a RiO master patient table:
Add an integer version of the NHS number.
- Parameters
table – SQLAlchemy Table to process
engine – an SQLAlchemy database Engine
configoptions – instance of
RioViewConfigOptions
- crate_anon.preprocess.preprocess_rio.process_nonpatient_table(table: sqlalchemy.sql.schema.Table, engine: sqlalchemy.engine.base.Engine, configoptions: crate_anon.preprocess.rio_view_func.RioViewConfigOptions) None [source]
Process a RiO or RiO-like non-patient table: - ensure it has an integer PK - add indexes
- Parameters
table – SQLAlchemy Table to process
engine – an SQLAlchemy database Engine
configoptions – instance of
RioViewConfigOptions
- crate_anon.preprocess.preprocess_rio.process_patient_table(table: sqlalchemy.sql.schema.Table, engine: sqlalchemy.engine.base.Engine, configoptions: crate_anon.preprocess.rio_view_func.RioViewConfigOptions) None [source]
Processes a RiO or RiO-like table:
Add
pk
andrio_number
columns, if not presentUpdate
pk
andrio_number
values, if not NULLAdd indexes, if absent
- Parameters
table – SQLAlchemy Table to process
engine – an SQLAlchemy database Engine
configoptions – instance of
RioViewConfigOptions
- crate_anon.preprocess.preprocess_rio.process_progress_notes(table: sqlalchemy.sql.schema.Table, engine: sqlalchemy.engine.base.Engine, configoptions: crate_anon.preprocess.rio_view_func.RioViewConfigOptions) None [source]
Process the RiO Progress Notes table.
Index by patient ID/note number.
Add/calculate/index the
crate_max_subnum_for_notenum
column.Add/calculate/index the
crate_last_note_in_edit_chain
column.If on an RCEP database, create a view.
- Parameters
table – SQLAlchemy Table to process
engine – an SQLAlchemy database Engine
configoptions – instance of
RioViewConfigOptions
- crate_anon.preprocess.preprocess_rio.process_table(table: sqlalchemy.sql.schema.Table, engine: sqlalchemy.engine.base.Engine, configoptions: crate_anon.preprocess.rio_view_func.RioViewConfigOptions) None [source]
Process a RiO-like table; the specific action is selected by the database type and there are custom processors for some tables (e.g. master patient table, progress notes, clinical documents).
- Parameters
table – SQLAlchemy Table to process
engine – an SQLAlchemy database Engine
configoptions – instance of
RioViewConfigOptions
- crate_anon.preprocess.preprocess_rio.table_is_rio_type(tablename: str, configoptions: crate_anon.preprocess.rio_view_func.RioViewConfigOptions) bool [source]
Is the named table one that uses the original RiO format?
- Parameters
tablename – name of the table
configoptions – instance of
RioViewConfigOptions