14.7.7. crate_anon.preprocess.preprocess_systmone
crate_anon/preprocess/preprocess_systmone.py
Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).
This file is part of CRATE.
CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with CRATE. If not, see <https://www.gnu.org/licenses/>.
Preprocess a copy of a SystmOne database – primarily to index it.
- crate_anon.preprocess.preprocess_systmone.add_postcode_geography_view(engine: sqlalchemy.engine.base.Engine, address_table: str, postcode_db: str, geog_cols: List[str], view_name: str) None [source]
Creates a source view to add geography columns to an address table including postcodes, linking in e.g. LSOA/IMD information from an ONS postcode table (e.g. imported by CRATE; see postcodes.py).
- Parameters
engine – An SQLAlchemy Engine.
address_table – The name of the address table in the source SystmOne database.
postcode_db – The name of the database (and, for SQL Server, the schema) in which ONS postcode information is stored.
geog_cols – Columns to merge in from the postcode database.
view_name – The name of the view to create in the source SystmOne database.
Re SystmOne postcode encoding: - CPFT creates PostCode_NoSpaces. - However
SELECT COUNT(*) FROM S1_PatientAddress WHERE CHARINDEX(' ', FullPostcode) > 0; -- ... gives lots of hits (e.g. 593k); so they mostly have spaces in. SELECT COUNT(*) FROM S1_PatientAddress WHERE CHARINDEX(' ', FullPostcode) = 0; -- ... a few (e.g. 20); all are just the first halves, and none are a -- full postcode with space missing. SELECT DISTINCT LEN(FullPostcode) FROM S1_PatientAddress ORDER BY LEN(FullPostcode); -- NULL, 4, 5, 6, 7, 8
So, with a bit more testing, we conclude that SystmOne uses STANDARD VARIABLE-LENGTH FORMAT WITH SPACES.
- crate_anon.preprocess.preprocess_systmone.add_testpatient_view(engine: sqlalchemy.engine.base.Engine, patient_table: str, view_name: str) None [source]
Creates a source view to find extra test patients, where they have not been correctly identified by the “official” method or caught by additional local filters.
- Parameters
engine – An SQLAlchemy Engine.
patient_table – The name of the patient table in the source SystmOne database.
view_name – The name of the view to create in the source SystmOne database.
Test with:
SELECT t.IDPatient, p.FirstName, p.Surname FROM vw_crate_FindExtraTestPatients t INNER JOIN S1_Patient p ON p.RowIdentifier = t.RowIdentifier
- crate_anon.preprocess.preprocess_systmone.main() None [source]
Command-line parser. See command-line help.
- crate_anon.preprocess.preprocess_systmone.preprocess_systmone(engine: sqlalchemy.engine.base.Engine, context: crate_anon.preprocess.systmone_ddgen.SystmOneContext, allow_unprefixed_tables: bool = False, drop_danger_drop: bool = False, postcode_db_name: Optional[str] = None, geog_cols: Optional[List[str]] = None) None [source]
Add indexes to a SystmOne source database. Without this, anonymisation is very slow. Also adds pseudo-PK columns to selected tables.