14.7.7. crate_anon.preprocess.preprocess_systmone

crate_anon/preprocess/preprocess_systmone.py


Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).

This file is part of CRATE.

CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with CRATE. If not, see <https://www.gnu.org/licenses/>.


Preprocess a copy of a SystmOne database – primarily to index it.

crate_anon.preprocess.preprocess_systmone.add_postcode_geography_view(engine: sqlalchemy.engine.base.Engine, address_table: str, postcode_db: str, geog_cols: List[str], view_name: str) None[source]

Creates a source view to add geography columns to an address table including postcodes, linking in e.g. LSOA/IMD information from an ONS postcode table (e.g. imported by CRATE; see postcodes.py).

Parameters
  • engine – An SQLAlchemy Engine.

  • address_table – The name of the address table in the source SystmOne database.

  • postcode_db – The name of the database (and, for SQL Server, the schema) in which ONS postcode information is stored.

  • geog_cols – Columns to merge in from the postcode database.

  • view_name – The name of the view to create in the source SystmOne database.

Re SystmOne postcode encoding: - CPFT creates PostCode_NoSpaces. - However

SELECT COUNT(*) FROM S1_PatientAddress
WHERE CHARINDEX(' ', FullPostcode) > 0;
-- ... gives lots of hits (e.g. 593k); so they mostly have spaces in.

SELECT COUNT(*) FROM S1_PatientAddress
WHERE CHARINDEX(' ', FullPostcode) = 0;
-- ... a few (e.g. 20); all are just the first halves, and none are a
-- full postcode with space missing.

SELECT DISTINCT LEN(FullPostcode) FROM S1_PatientAddress
ORDER BY LEN(FullPostcode);
-- NULL, 4, 5, 6, 7, 8

So, with a bit more testing, we conclude that SystmOne uses STANDARD VARIABLE-LENGTH FORMAT WITH SPACES.

crate_anon.preprocess.preprocess_systmone.add_testpatient_view(engine: sqlalchemy.engine.base.Engine, patient_table: str, view_name: str) None[source]

Creates a source view to find extra test patients, where they have not been correctly identified by the “official” method or caught by additional local filters.

Parameters
  • engine – An SQLAlchemy Engine.

  • patient_table – The name of the patient table in the source SystmOne database.

  • view_name – The name of the view to create in the source SystmOne database.

Test with:

SELECT t.IDPatient, p.FirstName, p.Surname
FROM vw_crate_FindExtraTestPatients t
INNER JOIN S1_Patient p
    ON p.RowIdentifier = t.RowIdentifier
crate_anon.preprocess.preprocess_systmone.main() None[source]

Command-line parser. See command-line help.

crate_anon.preprocess.preprocess_systmone.preprocess_systmone(engine: sqlalchemy.engine.base.Engine, context: crate_anon.preprocess.systmone_ddgen.SystmOneContext, allow_unprefixed_tables: bool = False, drop_danger_drop: bool = False, postcode_db_name: Optional[str] = None, geog_cols: Optional[List[str]] = None) None[source]

Add indexes to a SystmOne source database. Without this, anonymisation is very slow. Also adds pseudo-PK columns to selected tables.