Source code for crate_anon.preprocess.preprocess_rio

#!/usr/bin/env python

"""
crate_anon/preprocess/preprocess_rio.py

===============================================================================

    Copyright (C) 2015, University of Cambridge, Department of Psychiatry.
    Created by Rudolf Cardinal (rnc1001@cam.ac.uk).

    This file is part of CRATE.

    CRATE is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    CRATE is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with CRATE. If not, see <https://www.gnu.org/licenses/>.

===============================================================================

**Preprocess RiO tables for CRATE.**

RiO is a mental health EMR system from Servelec.


**Things to do**

.. todo:: preprocess_rio:
    specific supposed PK failing (non-unique) on incremental

.. todo:: preprocess_rio:
    Imperfectly tested: Audit_Created_Date, Audit_Updated_Date
    ... some data for Audit_Created_Date, but incomplete audit table

.. todo:: preprocess_rio:
    Similarly, all cross-checks to RCEP output (currently
    limited by data availability)


**Primary keys**

In RCEP, Document_ID is VARCHAR(MAX), and is often:

.. code-block:: none

    'global_table_id_9_or_10_digits' + '_' + 'pk_int_as_string'

HOWEVER, the last part is not always unique; e.g. Care_Plan_Interventions.

-   Care_Plan_Interventions has massive tranches of ENTIRELY identical rows,
    including a column called, ironically, "Unique_Key".

-   Therefore, we could either ditch the key entirely, or just use a non-UNIQUE
    index (and call it "key" not "pk").

-   AND THEN... In Client_Family, we have Document_ID values like

    .. code-block:: none

        773577794_1000000_1000001
        ^^^^^^^^^ ^^^^^^^ ^^^^^^^
        table ID  RiO#    Family member's RiO#

    ... there is no unique ID. And we don't need the middle part as we already
    have Client_ID. So this is not very useful. We could mangle out the second
    and subsequent '_' characters to give a unique number here, which would
    meaning having PK as BIGINT not INTEGER.

-   SQL Server's ``ROW_NUMBER()`` relates to result sets.

-   However, ``ADD pkname INT IDENTITY(1, 1)`` works beautifully and
    autopopulates existing tables.

-   CHUCKED this way of back-mangling DocumentID, since it doesn't work well:

    .. code-block:: none

        ensure_columns_present(... RCEP_COL_MANGLED_KEY...)

        {pk} = CAST(
            SUBSTRING(
                {rcep_mangled_pk},
                CHARINDEX('_', {rcep_mangled_pk}) + 1,
                LEN({rcep_mangled_pk}) - CHARINDEX('_', {rcep_mangled_pk})
            ) AS INTEGER
        ),

        # pk=CRATE_COL_PK,
        # rcep_mangled_pk=RCEP_COL_MANGLED_KEY,


**How is RiO non-core structured?**

- Index tables

  .. code-block:: none

    AssessmentDates
        associates AssessmentID and ClientID with dates

    AssessmentFormGroupsIndex, e.g.:
        Name               Description          Version    Deleted
        CoreAssess         Core Assessment      16          0
        CoreAssess         Core Assessment      17          0
        CoreAssessNewV1    Core Assessment v1   0           0
        CoreAssessNewV1    Core Assessment v1   1           0
        CoreAssessNewV2    Core Assessment v2   0           0
        CoreAssessNewV2    Core Assessment v2   1           0
        CoreAssessNewV2    Core Assessment v2   2           0
        ^^^                ^^^
        RiO form groups    Nice names

    AssessmentFormGroupsStructure, e.g.:
        name            FormName            AddedDate               FormgroupVersion FormOrder
        CoreAssessNewV2 coreasspresprob     2013-10-30 15:46:00.000 0                0
        CoreAssessNewV2 coreassesspastpsy   2013-10-30 15:46:00.000 0                1
        CoreAssessNewV2 coreassessbackhist  2013-10-30 15:46:00.000 0                2
        CoreAssessNewV2 coreassesmentstate  2013-10-30 15:46:00.000 0                3
        CoreAssessNewV2 coreassescapsafrisk 2013-10-30 15:46:00.000 0                4
        CoreAssessNewV2 coreasssumminitplan 2013-10-30 15:46:00.000 0                5
        CoreAssessNewV2 coreasspresprob     2014-12-14 19:19:06.410 1                0
        CoreAssessNewV2 coreassesspastpsy   2014-12-14 19:19:06.410 1                1
        CoreAssessNewV2 coreassessbackhist  2014-12-14 19:19:06.413 1                2
        CoreAssessNewV2 coreassesmentstate  2014-12-14 19:19:06.413 1                3
        CoreAssessNewV2 coreassescapsafrisk 2014-12-14 19:19:06.417 1                4
        CoreAssessNewV2 coreasssumminitplan 2014-12-14 19:19:06.417 1                5
        CoreAssessNewV2 coresocial1         2014-12-14 19:19:06.420 1                6
        CoreAssessNewV2 coreasspresprob     2014-12-14 19:31:25.377 2                0 } NB
        CoreAssessNewV2 coreassesspastpsy   2014-12-14 19:31:25.377 2                1 }
        CoreAssessNewV2 coreassessbackhist  2014-12-14 19:31:25.380 2                2 }
        CoreAssessNewV2 coreassesmentstate  2014-12-14 19:31:25.380 2                3 }
        CoreAssessNewV2 coreassescapsafrisk 2014-12-14 19:31:25.380 2                4 }
        CoreAssessNewV2 coreasssumminitplan 2014-12-14 19:31:25.383 2                5 }
        CoreAssessNewV2 coresocial1         2014-12-14 19:31:25.383 2                6 }
        CoreAssessNewV2 kcsahyper           2014-12-14 19:31:25.387 2                7 }
        ^^^             ^^^
        Form groups     RiO forms; these correspond to UserAssess___ tables.

    AssessmentFormsIndex, e.g.
        Name                InUse Style Deleted    Description ...
        core_10             1     6     0          Clinical Outcomes in Routine Evaluation Screening Measure-10 (core-10)
        corealcsub          1     6     0          Alcohol and Substance Misuse
        coreassescapsafrisk 1     6     0          Capacity, Safeguarding and Risk
        coreassesmentstate  1     6     0          Mental State
        coreassessbackhist  1     6     0          Background and History
        coreassesspastpsy   1     6     0          Past Psychiatric History and Physical Health
        coreasspresprob     1     6     0          Presenting Problem
        coreasssumminitplan 1     6     0          Summary and Initial Plan
        corecarer           1     6     0          Carers and Cared For
        corediversity       1     6     0          Diversity Needs
        coremedsum          1     6     0          Medication, Allergies and Adverse Reactions
        coremenhis          1     6     0          Mental Health / Psychiatric History
        coremenstate        1     6     0          Mental State and Formulation
        coreperdev          1     6     0          Personal History and Developmental History
        ^^^                                        ^^^
        |||                                        Nice names.
        |||
        RiO forms; these correspond to UserAssess___ tables,
        e.g. UserAssesscoreassesmentstate

    AssessmentFormsLocks
        system only; not relevant

    AssessmentFormsTimeout
        system only; not relevant

    AssessmentImageForms
        SequenceID, FormName, ClientID, AssessmentDate, UserID, ImagePath
        ?
        no data

    AssessmentIndex, e.g.
        Name          InUse Version DateBound RequiresClientID  Deleted Description ...
        ConsentShare  1     3       1         0                 1       Consent to Share Information
        CoreAssess    1     1       0         1                 0       Core Assessment
        CoreAssess    1     2       0         1                 0       Core Assessment
        CoreAssess    1     3       0         1                 0       Core Assessment
        CoreAssess    1     4       0         1                 0       Core Assessment
        CoreAssess    1     5       0         1                 0       Core Assessment
        CoreAssess    1     6       0         1                 0       Core Assessment
        CoreAssess    1     7       0         1                 0       Core Assessment
        crhtaaucp     1     1       0         0                 0       CRHTT / AAU Care Plan
        ^^^
        These correspond to AssessmentStructure.Assessment

    AssessmentMasterTableIndex, e.g.
        TableName       TableDescription
        core10          core10
        Corealc1        TAUDIT - Q1
        Corealc2        TAUDIT Q2
        Corealc3        TAUDIT - Q3,4,5,6,7,8
        Corealc4        TAUDIT - Q9,10
        Corealc5        Dependence
        Corealc6        Cocaine Use
        CoreOtherAssess Other Assessments
        crhttcpstat     CRHTT Care Plan Status
        ^^^
        These correspond to UserMaster___ tables.
        ... Find with:
            SELECT * FROM rio_data_raw.information_schema.columns
            WHERE table_name LIKE '%core10%';

    AssessmentPseudoForms, e.g. (all rows):
        Name            Link
        CaseNoteBar     ../Letters/LetterEditableMain.aspx?ClientID
        CaseNoteoview   ../Reports/RioReports.asp?ReportID=15587&ClientID
        kcsahyper       tfkcsa
        physv1hypa      physassess16a&readonlymode=1
        physv1hypb1     physasses16b1&readonlymode=1
        physv1hypb2     physasses16b22&readonlymode=1
        physv1hypbody   testbmap&readonlymode=1
        physv1hypvte    vte&readonlymode=1

    AssessmentReadOnlyFields, e.g.
        Code        CodeDescription       SQLStatementLookup    SQLStatementSearch
        ADCAT       Adminstrative Cat...  SELECT TOP 1 u.Cod... ...
        ADD         Client  Address       SELECT '$LookupVal... ...
        AdmCons     Consultant            SELECT '$LookupVal... ...
        AdmglStat   Status at Admission   SELECT '$LookupVal... ...
        AdmitDate   Admission Date        SELECT '$LookupVal... ...
        AEDEXLI     AED Exceptions...     SELECT TOP 1 ISNUL... ...
        Age         Client Age            SELECT '$LookupVal... ...
        Allergies   Client Allergies      SELECT dbo.LocalCo... ...
        bg          Background (PSOC323)  SELECT TOP 1 ISNUL... ...

        That Allergies one in full:
        - SQLStatementLookup
            SELECT dbo.LocalConfig_GetClientAllergies('$key$') AS Allergies
        - SQLStatementSearch = SQLStatementLookup

        And the bg/Background... one:
        - SQLStatementLookup
            SELECT TOP 1
                ISNULL(Men03,'History of Mental Health Problems / Psychiatric History section of core assessment not filled'),
                ISNULL(Men03,'History of Mental Health Problems / Psychiatric History section of core assessment not filled')
            FROM dbo.view_userassesscoremenhis
              -- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
              -- view in which data column names renamed 'Men01', 'Men02'...
            WHERE ClientID = '$ClientID$'
            AND dbo.udf_Config_SystemValidationStatus(system_validationData,'Men03','v') = 1
            ORDER BY
                AssessmentDate DESC,
                type12_UpdatedDate DESC
        - SQLStatementSearch = SQLStatementLookup

        - EXEC sp_helptext 'production.rio62camlive.dbo.udf_Config_SystemValidationStatus';
          ... can't view this at present (am on the wrong machine?).

    AssessmentStructure, e.g.:
        FormGroup       Assessment  AssessmentVersion FormGroupVersion FormGroupOrder
        CoreAssessNewV1 CoreAssess    7    1    1
        CoreAssessNewV2 CoreAssess    7    2    0
        CoreAssessNewV2 CoreAssess    6    1    0
        CoreAssessNewV2 CoreAssess    5    0    0
        CoreAssessNewV2 CoreAssess    2    0    1
        CoreAssessNewV2 CoreAssess    3    0    0

        ... FORM GROUP to ASSESSMENT mapping

- Main data tables

  .. code-block:: none

    e.g.:
    UserAssesscoreassesmentstate
        ClientID
        system_ValidationData  -- e.g. (with newlines added):
            '<v n="3">
                <MentState s="v" a="<userID>" v="" d="" e="10/11/2013 13:23" o="1" n="3" b="" c="">
                </MentState>
            </v>'
            ... where <userID> was a specific user ID
        NHSNum  -- as VARCHAR
        AssessmentDate
        ServRef
        MentState   -- this contains the text
        type12_NoteID -- PK
        type12_OriginalNoteID  -- can be NULL
        type12_DeletedDate  -- can be NULL
        type12_UpdatedBy
        type12_UpdatedDate
        formref

    UserAssesscoreassesspastpsy
        ClientID
        system_ValidationData
        NHSNum
        AssessmentDate
        ServRef
        PastPsyHist  -- contains text
        PhyHealth    -- contains text
        Allergies    -- contains text
        type12_NoteID
        type12_OriginalNoteID
        type12_DeletedDate
        type12_UpdatedBy
        type12_UpdatedDate
        formref
        frailty  -- numeric; in passing, here's the Rockwood frailty score

- Lookup tables

  .. code-block:: none

    UserMasterfrailty, in full:
        Code CodeDescription            Deleted
        1    1 - Very Fit               0
        2    2 - Well                   0
        3    3 - Managing Well          0
        4    4 - Vulnerable             0
        5    5 - Mildly Frail           0
        7    7 - Severely Frail         0
        6    6 - Moderately Frail       0
        9    9 - Terminally Ill         0
        8    8 - Very Serverely Frail   0

- So, overall structure, approximately:

  .. code-block:: none

    RiO front-end example:
        Assessments [on menu]
            -> Core Assessment [menu dropdown]
            -> Core Assessment v2 [LHS, expands to...]
                ->  Presenting Problem [LHS]
                    Past Psychiatric History and Physical Health
                        ->  Service/Team
                            Past Psychiatric History
                            Physical Health / Medical History
                            Allergies
                            Frailty Score
                    Background and History
                    Mental State
                    Capacity, Safeguarding and Risk
                    Summary and Initial Plan
                    Social Circumstances and Employment
                    Keeping Children Safe Assessment

    So, hierarchy at the backend (> forward, < backward keys):

        AssessmentIndex.Name(>) / .Description ('Core Assessment')
            AssessmentStructure.Assessment(<) / .FormGroup(>)
                AssessmentFormGroupsIndex.Name(<) / .Description ('Core Assessment v2')
                AssessmentFormGroupsStructure.name(<) / .FormName(>) ('coreassesspastpsy')
                    AssessmentFormsIndex.Name(<) / .Description ('Past Psychiatric History and Physical Health')
                    UserAssesscoreassesspastpsy = data
                              _________________(<)
                        UserAssesscoreassesspastpsy.frailty(>) [lookup]
                            UserMasterfrailty.Code(<) / .CodeDescription

- Simplifying views (for core and non-core RiO) could be implemented in the
  preprocessor, or after anonymisation.

  Better to do it in the preprocessor, because this knows about RiO.
  The two points of "RiO knowledge" should be:

  - the preprocessor;

    - PK, RiO number as integer, views

  - the ddgen_* information in the anonymiser config file.

    - tables to omit
    - fields to omit
    - default actions on fields

      - e.g. exclude if type12_DeletedDate is None
      - however, we could also do that more efficiently as a view, and that
        suits all use cases so far.


**Scrubbing references to other patients**

There are two ways to do this, in principle.

The first is to reshape the data so that data from "referred-to" patients
appear in fields that can be marked as "third-party". The difficulty is that
the mapping is not 1:1 with any database row. For example, if row A has
fields "MainCarer" and "OtherCarer" that can refer to other patients, then
if the "OtherCarer" field changes, the number of rows to be examined changes.
This prohibits using a real-world PK. (A view that joined according to these
fields would not have an immutable pseudo-PK either.) And that causes
difficulties for a change-detection system. One would have to mark such a view
as something not otherwise read/copied by the anonymiser.

The other method, which is more powerful, is to do this work in the anonymiser
itself, by defining fields that are marked as "third_party_xref_pid", and
building the scrubber recursively with "depth" and "max_depth" parameters;
if depth > 0, the information is taken as third-party.

Well, that sounds achievable.

Done.

**RiO audit trail and change history**

.. code-block:: none

    - AuditTrail
        SequenceID -- PK for AuditTrail
        UserNumber -- FK to GenUser.UserNumber
        ActionDateTime
        AuditAction -- 2 = insert, 3 = update
        RowID -- row number -- how does that work?
            ... cheerfully, SQL Server doesn't have an automatic row ID;
            https://stackoverflow.com/questions/909155/equivalent-of-oracles-rowid-in-sql-server  # noqa
            ... so is it the PK we've already identified and called crate_pk?
        TableNumber -- FK to GenTable.Code
        ClientID -- FK to ClientIndex.ClientID
        ...


"""

import argparse
import logging
from typing import List

# from cardinal_pythonlib.debugging import pdb_run
from cardinal_pythonlib.logs import configure_logger_for_colour
from cardinal_pythonlib.sqlalchemy.schema import (
    get_effective_int_pk_col,
    hack_in_mssql_xml_type,
    make_bigint_autoincrement_column,
)
from rich_argparse import RawDescriptionRichHelpFormatter
from sqlalchemy import create_engine, MetaData
from sqlalchemy.engine.base import Engine
from sqlalchemy.schema import Column, Table
from sqlalchemy.sql.sqltypes import BigInteger, Integer

from crate_anon.anonymise.constants import CHARSET
from crate_anon.common.sql import (
    add_columns,
    add_indexes,
    assert_view_has_same_num_rows,
    create_view,
    drop_columns,
    drop_indexes,
    drop_view,
    ensure_columns_present,
    execute,
    get_column_names,
    get_table_names,
    get_view_names,
    IndexCreationInfo,
    set_print_not_execute,
    sql_fragment_cast_to_int,
    ViewMaker,
)
from crate_anon.preprocess.constants import (
    CRATE_COL_PK,
    DEFAULT_GEOG_COLS,
    ONSPD_TABLE_POSTCODE,
)
from crate_anon.preprocess.rio_constants import (
    CPFT_RCEP_TABLE_FULL_PROGRESS_NOTES,
    CRATE_COL_LAST_DOC,
    CRATE_COL_LAST_NOTE,
    CRATE_COL_MAX_DOCVER,
    CRATE_COL_MAX_SUBNUM,
    CRATE_COL_NHS_NUMBER,
    CRATE_COL_RIO_NUMBER,
    CRATE_IDX_LAST_DOC,
    CRATE_IDX_LAST_NOTE,
    CRATE_IDX_MAX_DOCVER,
    CRATE_IDX_MAX_SUBNUM,
    CRATE_IDX_PK,
    CRATE_IDX_RIONUM,
    CRATE_IDX_RIONUM_NOTENUM,
    CRATE_IDX_RIONUM_SERIALNUM,
    RCEP_COL_NHS_NUMBER,
    RCEP_COL_PATIENT_ID,
    RCEP_COL_POSTCODE,
    RCEP_TABLE_ADDRESS,
    RCEP_TABLE_MASTER_PATIENT,
    RIO_COL_NHS_NUMBER,
    RIO_COL_PATIENT_ID,
    RIO_COL_POSTCODE,
    RIO_TABLE_ADDRESS,
    RIO_TABLE_CLINICAL_DOCUMENTS,
    RIO_TABLE_MASTER_PATIENT,
    RIO_TABLE_PROGRESS_NOTES,
    VIEW_ADDRESS_WITH_GEOGRAPHY,
    VIEW_RCEP_CPFT_PROGRESS_NOTES_CURRENT,
)
from crate_anon.preprocess.rio_ddgen import (
    DDHint,
    get_rio_dd_settings,
)
from crate_anon.preprocess.rio_pk import (
    RIO_6_2_ATYPICAL_PATIENT_ID_COLS,
)
from crate_anon.preprocess.rio_view_func import (
    rio_add_audit_info,
    RioViewConfigOptions,
)
from crate_anon.preprocess.rio_views import RIO_VIEWS

log = logging.getLogger(__name__)


# =============================================================================
# Generic table processors
# =============================================================================


[docs]def table_is_rio_type( tablename: str, configoptions: RioViewConfigOptions ) -> bool: """ Is the named table one that uses the original RiO format? Args: tablename: name of the table configoptions: instance of :class:`RioViewConfigOptions` """ if configoptions.rio: return True if not configoptions.cpft: return False # RCEP + CPFT modifications: there's one RiO table in the mix return tablename == configoptions.full_prognotes_table
[docs]def get_rio_patient_id_col(table: Table) -> str: """ Returns the RiO patient ID column for a table. Args: table: SQLAlchemy Table Returns: the column name for patient ID """ patient_id_col = RIO_6_2_ATYPICAL_PATIENT_ID_COLS.get( table.name, RIO_COL_PATIENT_ID ) # log.debug(f"get_rio_patient_id_col: {table.name} -> {patient_id_col}") return patient_id_col
[docs]def process_patient_table( table: Table, engine: Engine, configoptions: RioViewConfigOptions ) -> None: """ Processes a RiO or RiO-like table: - Add ``pk`` and ``rio_number`` columns, if not present - Update ``pk`` and ``rio_number`` values, if not NULL - Add indexes, if absent Args: table: SQLAlchemy Table to process engine: an SQLAlchemy database Engine configoptions: instance of :class:`RioViewConfigOptions` """ log.info(f"Preprocessing patient table: {table.name!r}") rio_type = table_is_rio_type(table.name, configoptions) if rio_type: pk_col = get_effective_int_pk_col(table) rio_pk = pk_col if pk_col != CRATE_COL_PK else None string_pt_id = get_rio_patient_id_col(table) required_cols = [string_pt_id] else: # RCEP type rio_pk = None required_cols = [RCEP_COL_PATIENT_ID] string_pt_id = RCEP_COL_PATIENT_ID if not configoptions.print_sql_only: required_cols.extend([CRATE_COL_PK, CRATE_COL_RIO_NUMBER]) # ------------------------------------------------------------------------- # Add pk and rio_number columns, if not present # ------------------------------------------------------------------------- if rio_type and rio_pk is not None: crate_pk_col = Column(CRATE_COL_PK, BigInteger, nullable=True) # ... can't do NOT NULL; need to populate it required_cols.append(rio_pk) else: # RCEP type, or no PK in RiO crate_pk_col = make_bigint_autoincrement_column( CRATE_COL_PK, engine.dialect ) # ... autopopulates crate_rio_number_col = Column( CRATE_COL_RIO_NUMBER, BigInteger, nullable=True ) # ... even if RiO numbers are INT, they come from VARCHAR(15) here, and # that can (aod does) look numeric and overflow an INT. # SQL Server requires Table-bound columns in order to generate DDL: table.append_column(crate_pk_col) table.append_column(crate_rio_number_col) add_columns(engine, table, [crate_pk_col, crate_rio_number_col]) # ------------------------------------------------------------------------- # Update pk and rio_number values, if not NULL # ------------------------------------------------------------------------- ensure_columns_present( engine, tablename=table.name, column_names=required_cols ) cast_id_to_int = sql_fragment_cast_to_int( string_pt_id, dialect=engine.dialect ) if rio_type and rio_pk: log.info( f"Table {table.name!r}: updating columns {CRATE_COL_PK!r} " f"and {CRATE_COL_RIO_NUMBER!r}" ) execute( engine, f""" UPDATE {table.name} SET {CRATE_COL_PK} = {rio_pk}, {CRATE_COL_RIO_NUMBER} = {cast_id_to_int} WHERE {CRATE_COL_PK} IS NULL OR {CRATE_COL_RIO_NUMBER} IS NULL """, ) else: # RCEP format, or RiO with no PK # crate_pk is autogenerated as an INT IDENTITY field log.info( f"Table {table.name!r}: " f"updating column {CRATE_COL_RIO_NUMBER!r}" ) execute( engine, f""" UPDATE {table.name} SET {CRATE_COL_RIO_NUMBER} = {cast_id_to_int} WHERE {CRATE_COL_RIO_NUMBER} IS NULL """, ) # ------------------------------------------------------------------------- # Add indexes, if absent # ------------------------------------------------------------------------- # Note that the indexes are unlikely to speed up the WHERE NOT NULL search # above, so it doesn't matter that we add these last. Their use is for # the subsequent CRATE anonymisation table scans. add_indexes( engine, table, [ IndexCreationInfo( index_name=CRATE_IDX_PK, column=CRATE_COL_PK, unique=True ), IndexCreationInfo( index_name=CRATE_IDX_RIONUM, column=CRATE_COL_RIO_NUMBER, ), ], )
[docs]def drop_for_patient_table(table: Table, engine: Engine) -> None: """ Drop CRATE indexes and CRATE columns for a patient table. Args: table: SQLAlchemy Table to process engine: an SQLAlchemy database Engine """ drop_indexes(engine, table, [CRATE_IDX_PK, CRATE_IDX_RIONUM]) drop_columns(engine, table, [CRATE_COL_PK, CRATE_COL_RIO_NUMBER])
[docs]def process_nonpatient_table( table: Table, engine: Engine, configoptions: RioViewConfigOptions ) -> None: """ Process a RiO or RiO-like non-patient table: - ensure it has an integer PK - add indexes Args: table: SQLAlchemy Table to process engine: an SQLAlchemy database Engine configoptions: instance of :class:`RioViewConfigOptions` """ if configoptions.rcep: return log.info(f"Preprocessing non-patient table {table.name!r}") pk_col = get_effective_int_pk_col(table) other_pk_col = pk_col if pk_col != CRATE_COL_PK else None if other_pk_col: # table has a primary key already crate_pk_col = Column(CRATE_COL_PK, BigInteger, nullable=True) else: crate_pk_col = make_bigint_autoincrement_column( CRATE_COL_PK, engine.dialect ) table.append_column(crate_pk_col) # must be Table-bound, as above add_columns(engine, table, [crate_pk_col]) if not configoptions.print_sql_only: ensure_columns_present( engine, tablename=table.name, column_names=[CRATE_COL_PK] ) if other_pk_col: execute( engine, f""" UPDATE {table.name} SET {CRATE_COL_PK} = {other_pk_col} WHERE {CRATE_COL_PK} IS NULL """, ) add_indexes( engine, table, [ IndexCreationInfo( index_name=CRATE_IDX_PK, column=CRATE_COL_PK, unique=True ) ], )
[docs]def drop_for_nonpatient_table(table: Table, engine: Engine) -> None: """ Drop CRATE indexes and CRATE columns for a non-patient table. Args: table: SQLAlchemy Table to process engine: an SQLAlchemy database Engine """ drop_indexes(engine, table, [CRATE_IDX_PK]) drop_columns(engine, table, [CRATE_COL_PK])
# ============================================================================= # Specific table processors # =============================================================================
[docs]def process_master_patient_table( table: Table, engine: Engine, configoptions: RioViewConfigOptions ) -> None: """ Process a RiO master patient table: - Add an integer version of the NHS number. Args: table: SQLAlchemy Table to process engine: an SQLAlchemy database Engine configoptions: instance of :class:`RioViewConfigOptions` """ crate_col_nhs_number = Column( CRATE_COL_NHS_NUMBER, BigInteger, nullable=True ) table.append_column(crate_col_nhs_number) add_columns(engine, table, [crate_col_nhs_number]) if configoptions.rcep: nhscol = RCEP_COL_NHS_NUMBER else: nhscol = RIO_COL_NHS_NUMBER log.info(f"Table {table.name!r}: updating column {nhscol!r}") ensure_columns_present(engine, tablename=table.name, column_names=[nhscol]) if not configoptions.print_sql_only: ensure_columns_present( engine, tablename=table.name, column_names=[CRATE_COL_NHS_NUMBER] ) execute( engine, f""" UPDATE {table.name} SET {CRATE_COL_NHS_NUMBER} = CAST({nhscol} AS BIGINT) WHERE {CRATE_COL_NHS_NUMBER} IS NULL """, )
[docs]def drop_for_master_patient_table(table: Table, engine: Engine) -> None: """ Drop CRATE columns for the RiO master patient table. Args: table: SQLAlchemy Table to process engine: an SQLAlchemy database Engine """ drop_columns(engine, table, [CRATE_COL_NHS_NUMBER])
[docs]def process_progress_notes( table: Table, engine: Engine, configoptions: RioViewConfigOptions ) -> None: """ Process the RiO Progress Notes table. - Index by patient ID/note number. - Add/calculate/index the ``crate_max_subnum_for_notenum`` column. - Add/calculate/index the ``crate_last_note_in_edit_chain`` column. - If on an RCEP database, create a view. Args: table: SQLAlchemy Table to process engine: an SQLAlchemy database Engine configoptions: instance of :class:`RioViewConfigOptions` """ crate_col_max_subnum = Column(CRATE_COL_MAX_SUBNUM, Integer, nullable=True) crate_col_last_note = Column(CRATE_COL_LAST_NOTE, Integer, nullable=True) table.append_column(crate_col_max_subnum) table.append_column(crate_col_last_note) add_columns(engine, table, [crate_col_max_subnum, crate_col_last_note]) # We're always in "RiO land", not "RCEP land", for this one. add_indexes( engine, table, [ IndexCreationInfo( # Joint index, for JOIN in UPDATE statement below index_name=CRATE_IDX_RIONUM_NOTENUM, column=f"{CRATE_COL_RIO_NUMBER}, NoteNum", ), IndexCreationInfo( # Speeds up WHERE below. (Much, much faster for second run.) index_name=CRATE_IDX_MAX_SUBNUM, column=CRATE_COL_MAX_SUBNUM, ), IndexCreationInfo( # Speeds up WHERE below. (Much, much faster for second run.) index_name=CRATE_IDX_LAST_NOTE, column=CRATE_COL_LAST_NOTE, ), ], ) ensure_columns_present( engine, tablename=table.name, column_names=["NoteNum", "SubNum", "EnteredInError", "EnteredInError"], ) if not configoptions.print_sql_only: ensure_columns_present( engine, tablename=table.name, column_names=[ CRATE_COL_MAX_SUBNUM, CRATE_COL_LAST_NOTE, CRATE_COL_RIO_NUMBER, ], ) # Find the maximum SubNum for each note, and store it. # Slow query, even with index. log.info( f"Progress notes table {table.name!r}: " f"updating {CRATE_COL_MAX_SUBNUM!r}" ) execute( engine, f""" UPDATE p1 SET p1.{CRATE_COL_MAX_SUBNUM} = subq.max_subnum FROM {table.name} p1 JOIN ( SELECT {CRATE_COL_RIO_NUMBER}, NoteNum, MAX(SubNum) AS max_subnum FROM {table.name} p2 GROUP BY {CRATE_COL_RIO_NUMBER}, NoteNum ) subq ON subq.{CRATE_COL_RIO_NUMBER} = p1.{CRATE_COL_RIO_NUMBER} AND subq.NoteNum = p1.NoteNum WHERE p1.{CRATE_COL_MAX_SUBNUM} IS NULL """, ) # Set a single column accordingly log.info( f"Progress notes table {table.name!r}: " f"updating {CRATE_COL_LAST_NOTE!r}" ) execute( engine, f""" UPDATE {table.name} SET {CRATE_COL_LAST_NOTE} = CASE WHEN SubNum = {CRATE_COL_MAX_SUBNUM} THEN 1 ELSE 0 END WHERE {CRATE_COL_LAST_NOTE} IS NULL """, ) # Create a view, if we're on an RCEP database if configoptions.rcep and configoptions.cpft: select_sql = f""" SELECT * FROM {table.name} WHERE (EnteredInError <> 1 OR EnteredInError IS NULL) AND {CRATE_COL_LAST_NOTE} = 1 """ create_view(engine, VIEW_RCEP_CPFT_PROGRESS_NOTES_CURRENT, select_sql)
[docs]def drop_for_progress_notes(table: Table, engine: Engine) -> None: """ Reverses the changes made by :func:`process_progress_notes` to the RiO Progress Note table. Args: table: SQLAlchemy Table to process engine: an SQLAlchemy database Engine """ drop_view(engine, VIEW_RCEP_CPFT_PROGRESS_NOTES_CURRENT) drop_indexes( engine, table, [CRATE_IDX_RIONUM_NOTENUM, CRATE_IDX_MAX_SUBNUM, CRATE_IDX_LAST_NOTE], ) drop_columns(engine, table, [CRATE_COL_MAX_SUBNUM, CRATE_COL_LAST_NOTE])
[docs]def process_clindocs_table( table: Table, engine: Engine, configoptions: RioViewConfigOptions ) -> None: """ Process the RiO (not RCEP) Clinical Documents Table. - For RiO only, not RCEP. - Index on document serial number. - Add/calculate/index the ``crate_max_docver_for_doc`` column. - Add/calculate/index the ``crate_last_doc_in_chain`` column. Args: table: SQLAlchemy Table to process engine: an SQLAlchemy database Engine configoptions: instance of :class:`RioViewConfigOptions` """ crate_col_max_docver = Column(CRATE_COL_MAX_DOCVER, Integer, nullable=True) crate_col_last_doc = Column(CRATE_COL_LAST_DOC, Integer, nullable=True) table.append_column(crate_col_max_docver) table.append_column(crate_col_last_doc) add_columns(engine, table, [crate_col_max_docver, crate_col_last_doc]) add_indexes( engine, table, [ IndexCreationInfo( index_name=CRATE_IDX_RIONUM_SERIALNUM, column=[CRATE_COL_RIO_NUMBER, "SerialNumber"], ), IndexCreationInfo( index_name=CRATE_IDX_MAX_DOCVER, column=CRATE_COL_MAX_DOCVER, ), IndexCreationInfo( index_name=CRATE_IDX_LAST_DOC, column=CRATE_COL_LAST_DOC, ), ], ) required_cols = ["SerialNumber", "RevisionID"] if not configoptions.print_sql_only: required_cols.extend( [CRATE_COL_MAX_DOCVER, CRATE_COL_LAST_DOC, CRATE_COL_RIO_NUMBER] ) ensure_columns_present( engine, tablename=table.name, column_names=required_cols ) # Find the maximum SerialNumber for each note, and store it. # Slow query, even with index. log.info( f"Clinical documents table {table.name!r}: " f"updating {CRATE_COL_MAX_DOCVER!r}" ) execute( engine, f""" UPDATE p1 SET p1.{CRATE_COL_MAX_DOCVER} = subq.max_docver FROM {table.name} p1 JOIN ( SELECT {CRATE_COL_RIO_NUMBER}, SerialNumber, MAX(RevisionID) AS max_docver FROM {table.name} p2 GROUP BY {CRATE_COL_RIO_NUMBER}, SerialNumber ) subq ON subq.{CRATE_COL_RIO_NUMBER} = p1.{CRATE_COL_RIO_NUMBER} AND subq.SerialNumber = p1.SerialNumber WHERE p1.{CRATE_COL_MAX_DOCVER} IS NULL """, ) # Set a single column accordingly log.info( f"Clinical documents table {table.name!r}: " f"updating {CRATE_COL_LAST_DOC!r}" ) execute( engine, f""" UPDATE {table.name} SET {CRATE_COL_LAST_DOC} = CASE WHEN RevisionID = {CRATE_COL_MAX_DOCVER} THEN 1 ELSE 0 END WHERE {CRATE_COL_LAST_DOC} IS NULL """, )
[docs]def drop_for_clindocs_table(table: Table, engine: Engine) -> None: """ Reverses the changes made by :func:`process_clindocs_table` to the RiO Clinical Documents table. Args: table: SQLAlchemy Table to process engine: an SQLAlchemy database Engine """ drop_indexes( engine, table, [CRATE_IDX_RIONUM_SERIALNUM, CRATE_IDX_MAX_DOCVER, CRATE_IDX_LAST_DOC], ) drop_columns(engine, table, [CRATE_COL_MAX_DOCVER, CRATE_COL_LAST_DOC])
# ============================================================================= # RiO views # =============================================================================
[docs]def get_rio_views( engine: Engine, configoptions: RioViewConfigOptions, ddhint: DDHint, suppress_basetables: bool = True, suppress_lookup: bool = True, ) -> List[ViewMaker]: """ Gets all view definitions for RiO. Args: engine: an SQLAlchemy database Engine configoptions: instance of :class:`RioViewConfigOptions` ddhint: a :class:`crate_anon/preprocess/ddhint.DDHint`, which will be modified suppress_basetables: suppress (for data-dictionary generating hints) the RiO base tables for views that are made on them? suppress_lookup: suppress (for data-dictionary generating hints) lookup tables whose information has been linked into views? Returns: a list of :class:`crate_anon.common.sql.ViewMaker` objects """ views = [] # type: List[ViewMaker] all_tables_lower = get_table_names(engine, to_lower=True) all_views_lower = get_view_names(engine, to_lower=True) all_selectables_lower = list(set(all_tables_lower + all_views_lower)) for viewname, viewdetails in RIO_VIEWS.items(): basetable = viewdetails["basetable"] if basetable.lower() not in all_selectables_lower: log.warning( f"Skipping view {viewname} as base table/view " f"{basetable} not present" ) continue suppress_basetable = viewdetails.get( "suppress_basetable", suppress_basetables ) suppress_other_tables = viewdetails.get("suppress_other_tables", []) if suppress_basetable: ddhint.suppress_table(basetable) ddhint.suppress_tables(suppress_other_tables) rename = viewdetails.get("rename", None) enforce_same_n_rows_as_base = viewdetails.get( "enforce_same_n_rows_as_base", True ) # noinspection PyTypeChecker viewmaker = ViewMaker( viewname=viewname, engine=engine, basetable=basetable, rename=rename, userobj=configoptions, enforce_same_n_rows_as_base=enforce_same_n_rows_as_base, ) if "add" in viewdetails: for addition in viewdetails["add"]: func = addition["function"] kwargs = addition.get("kwargs", {}) kwargs["viewmaker"] = viewmaker func(**kwargs) # will alter viewmaker if configoptions.audit_info: rio_add_audit_info(viewmaker) # will alter viewmaker if suppress_lookup: ddhint.suppress_tables(viewmaker.get_lookup_tables()) ddhint.add_bulk_source_index_request( viewmaker.get_index_request_dict() ) views.append(viewmaker) return views
[docs]def create_rio_views( engine: Engine, metadata: MetaData, configoptions: RioViewConfigOptions, ddhint: DDHint, ) -> None: """ Creates all views on a RiO/RCEP database. Args: engine: an SQLAlchemy Engine metadata: SQLAlchemy MetaData containing reflected details of database configoptions: an instance of :class:`RioViewConfigOptions` ddhint: a :class:`crate_anon/preprocess/ddhint.DDHint`, which will be modified """ rio_views = get_rio_views(engine, configoptions, ddhint) for viewmaker in rio_views: viewmaker.create_view(engine) ddhint.create_indexes(engine, metadata)
[docs]def drop_rio_views( engine: Engine, metadata: MetaData, configoptions: RioViewConfigOptions, ddhint: DDHint, ) -> None: # ddhint modified """ Drops all views on a RiO/RCEP database. Args: engine: an SQLAlchemy Engine metadata: SQLAlchemy MetaData containing reflected details of database configoptions: an instance of :class:`RioViewConfigOptions` ddhint: a :class:`crate_anon/preprocess/ddhint.DDHint`, which will be modified """ rio_views = get_rio_views(engine, configoptions, ddhint) ddhint.drop_indexes(engine, metadata) for viewmaker in rio_views: viewmaker.drop_view(engine)
# ============================================================================= # Geography views # =============================================================================
[docs]def add_postcode_geography_view( engine: Engine, configoptions: RioViewConfigOptions, ddhint: DDHint ) -> None: """ Creates a RiO source view to add geography columns to an address table including postcodes, linking in e.g. LSOA/IMD information from an ONS postcode table (e.g. imported by CRATE; see postcodes.py). Args: engine: an SQLAlchemy Engine configoptions: an instance of :class:`RioViewConfigOptions` ddhint: a :class:`crate_anon/preprocess/ddhint.DDHint`, which will be modified """ # Re-read column names, as we may have inserted some recently by hand that # may not be in the initial metadata. if configoptions.rio: addresstable = RIO_TABLE_ADDRESS rio_postcodecol = RIO_COL_POSTCODE else: addresstable = RCEP_TABLE_ADDRESS rio_postcodecol = RCEP_COL_POSTCODE orig_column_names = get_column_names( engine, tablename=addresstable, sort=True ) # Remove any original column names being overridden by new ones. # (Could also do this the other way around!) geogcols_lowercase = [x.lower() for x in configoptions.geogcols] orig_column_names = [ x for x in orig_column_names if x.lower() not in geogcols_lowercase ] orig_column_specs = [f"{addresstable}.{col}" for col in orig_column_names] geog_col_specs = [ f"{configoptions.postcodedb}.{ONSPD_TABLE_POSTCODE}.{col}" for col in sorted(configoptions.geogcols, key=lambda x: x.lower()) ] overlap = set(orig_column_names) & set(configoptions.geogcols) if overlap: raise ValueError( f"Columns overlap: address table contains columns " f"{orig_column_names}; geogcols = {configoptions.geogcols}; " f"overlap = {overlap}" ) ensure_columns_present( engine, tablename=addresstable, column_names=[rio_postcodecol] ) colsep = ",\n " select_sql = """ SELECT {origcols}, {geogcols} FROM {addresstable} LEFT JOIN {pdb}.{pcdtab} ON {addresstable}.{rio_postcodecol} = {pdb}.{pcdtab}.pcds -- RCEP, and presumably RiO, appear to use the ONS pcds format, of -- 2-4 char outward code; space; 3-char inward code. -- If this fails, use this slower version: -- ON REPLACE({addresstable}.{rio_postcodecol}, -- ' ', -- '') = {pdb}.{pcdtab}.pcd_nospace """.format( addresstable=addresstable, origcols=colsep.join(orig_column_specs), geogcols=colsep.join(geog_col_specs), pdb=configoptions.postcodedb, pcdtab=ONSPD_TABLE_POSTCODE, rio_postcodecol=rio_postcodecol, ) create_view(engine, VIEW_ADDRESS_WITH_GEOGRAPHY, select_sql) assert_view_has_same_num_rows( engine, addresstable, VIEW_ADDRESS_WITH_GEOGRAPHY ) ddhint.suppress_table(addresstable)
# ============================================================================= # Table action selector # =============================================================================
[docs]def process_table( table: Table, engine: Engine, configoptions: RioViewConfigOptions ) -> None: """ Process a RiO-like table; the specific action is selected by the database type and there are custom processors for some tables (e.g. master patient table, progress notes, clinical documents). Args: table: SQLAlchemy Table to process engine: an SQLAlchemy database Engine configoptions: instance of :class:`RioViewConfigOptions` """ tablename = table.name column_names = table.columns.keys() log.debug(f"TABLE: {tablename}; COLUMNS: {column_names}") if configoptions.rio: patient_table_indicator_column = get_rio_patient_id_col(table) else: # RCEP: patient_table_indicator_column = RCEP_COL_PATIENT_ID is_patient_table = ( patient_table_indicator_column in column_names or tablename == configoptions.full_prognotes_table ) # ... special for RCEP/CPFT, where a RiO table (with different patient ID # column) lives within an RCEP database. if configoptions.drop_not_create: # --------------------------------------------------------------------- # DROP STUFF! Opposite order to creation (below) # --------------------------------------------------------------------- # Specific if tablename == configoptions.master_patient_table: drop_for_master_patient_table(table, engine) elif tablename == configoptions.full_prognotes_table: drop_for_progress_notes(table, engine) elif configoptions.rio and tablename == RIO_TABLE_CLINICAL_DOCUMENTS: drop_for_clindocs_table(table, engine) # Generic if is_patient_table: drop_for_patient_table(table, engine) else: drop_for_nonpatient_table(table, engine) else: # --------------------------------------------------------------------- # CREATE STUFF! # --------------------------------------------------------------------- # Generic if is_patient_table: process_patient_table(table, engine, configoptions) else: process_nonpatient_table(table, engine, configoptions) # Specific if tablename == configoptions.master_patient_table: process_master_patient_table(table, engine, configoptions) elif configoptions.rio and tablename == RIO_TABLE_CLINICAL_DOCUMENTS: process_clindocs_table(table, engine, configoptions) elif tablename == configoptions.full_prognotes_table: process_progress_notes(table, engine, configoptions)
[docs]def process_all_tables( engine: Engine, metadata: MetaData, configoptions: RioViewConfigOptions ) -> None: """ Process all RiO-like tables via :func:`process_table`. Args: metadata: SQLAlchemy MetaData containing reflected details of database engine: an SQLAlchemy database Engine configoptions: instance of :class:`RioViewConfigOptions` """ for table in sorted( metadata.tables.values(), key=lambda t: t.name.lower() ): process_table(table, engine, configoptions)
# ============================================================================= # Main # =============================================================================
[docs]def main() -> None: """ Command-line parser. See command-line help. """ # noinspection PyTypeChecker parser = argparse.ArgumentParser( formatter_class=RawDescriptionRichHelpFormatter, description=r""" * Alters a RiO database to be suitable for CRATE. * By default, this treats the source database as being a copy of a RiO database (slightly later than version 6.2; exact version unclear). Use the "--rcep" (+/- "--cpft") switch(es) to treat it as a Servelec RiO CRIS Extract Program (RCEP) v2 output database. """, ) parser.add_argument("--url", required=True, help="SQLAlchemy database URL") parser.add_argument("-v", "--verbose", action="store_true", help="Verbose") parser.add_argument( "--print", action="store_true", help="Print SQL but do not execute it. (You can redirect the printed " "output to create an SQL script.)", ) parser.add_argument("--echo", action="store_true", help="Echo SQL") parser.add_argument( "--rcep", action="store_true", help="Treat the source database as the product of Servelec's RiO CRIS " "Extract Program v2 (instead of raw RiO)", ) parser.add_argument( "--drop_danger_drop", action="store_true", help="REMOVES new columns and indexes, rather than creating them. " "(There's not very much danger; no real information is lost, but " "it might take a while to recalculate it.)", ) parser.add_argument( "--cpft", action="store_true", help="Apply hacks for Cambridgeshire & Peterborough NHS Foundation " "Trust (CPFT) RCEP database. Only applicable with --rcep", ) parser.add_argument( "--debug_skiptables", action="store_true", help="DEBUG-ONLY OPTION. Skip tables (view creation only)", ) prog_curr_group = parser.add_mutually_exclusive_group() prog_curr_group.add_argument( "--prognotes_current_only", dest="prognotes_current_only", action="store_true", help="Progress_Notes view restricted to current versions only " "(* default)", ) prog_curr_group.add_argument( "--prognotes_all", dest="prognotes_current_only", action="store_false", help="Progress_Notes view shows old versions too", ) parser.set_defaults(prognotes_current_only=True) clindocs_curr_group = parser.add_mutually_exclusive_group() clindocs_curr_group.add_argument( "--clindocs_current_only", dest="clindocs_current_only", action="store_true", help="Clinical_Documents view restricted to current versions only (*)", ) clindocs_curr_group.add_argument( "--clindocs_all", dest="clindocs_current_only", action="store_false", help="Clinical_Documents view shows old versions too", ) parser.set_defaults(clindocs_current_only=True) allerg_curr_group = parser.add_mutually_exclusive_group() allerg_curr_group.add_argument( "--allergies_current_only", dest="allergies_current_only", action="store_true", help="Client_Allergies view restricted to current info only", ) allerg_curr_group.add_argument( "--allergies_all", dest="allergies_current_only", action="store_false", help="Client_Allergies view shows deleted allergies too (*)", ) parser.set_defaults(allergies_current_only=False) audit_group = parser.add_mutually_exclusive_group() audit_group.add_argument( "--audit_info", dest="audit_info", action="store_true", help="Audit information (creation/update times) added to views", ) audit_group.add_argument( "--no_audit_info", dest="audit_info", action="store_false", help="No audit information added (*)", ) parser.set_defaults(audit_info=False) parser.add_argument( "--postcodedb", help="Specify database (schema) name for ONS Postcode Database (as " "imported by CRATE) to link to addresses as a view. With SQL " "Server, you will have to specify the schema as well as the " 'database; e.g. "--postcodedb ONS_PD.dbo"', ) parser.add_argument( "--geogcols", nargs="*", default=DEFAULT_GEOG_COLS, help=f"List of geographical information columns to link in from ONS " f"Postcode Database. BEWARE that you do not specify anything too " f"identifying. Default: {' '.join(DEFAULT_GEOG_COLS)}", ) parser.add_argument( "--settings_filename", help="Specify filename to write draft ddgen_* settings to, for use in " "a CRATE anonymiser configuration file.", ) progargs = parser.parse_args() rootlogger = logging.getLogger() configure_logger_for_colour( rootlogger, level=logging.DEBUG if progargs.verbose else logging.INFO ) rio = not progargs.rcep if progargs.rcep: # RCEP master_patient_table = RCEP_TABLE_MASTER_PATIENT if progargs.cpft: full_prognotes_table = CPFT_RCEP_TABLE_FULL_PROGRESS_NOTES # We (CPFT) may have a hacked-in copy of the RiO main progress # notes table added to the RCEP output database. else: full_prognotes_table = None # The RCEP does not export sufficient information to distinguish # current and non-current versions of progress notes. else: # RiO master_patient_table = RIO_TABLE_MASTER_PATIENT full_prognotes_table = RIO_TABLE_PROGRESS_NOTES log.info( "CRATE in-place preprocessor for RiO or RiO CRIS Extract Program " "(RCEP) databases" ) safeargs = {k: v for k, v in vars(progargs).items() if k != "url"} log.debug(f"args (except url): {safeargs!r}") log.info("RiO mode" if rio else "RCEP mode") if progargs.postcodedb and not progargs.geogcols: raise ValueError( "If you specify postcodedb, you must specify some geogcols" ) set_print_not_execute(progargs.print) hack_in_mssql_xml_type() engine = create_engine(progargs.url, echo=progargs.echo, encoding=CHARSET) metadata = MetaData() metadata.bind = engine log.info(f"Database: {engine.url!r}") # ... repr (!r) hides p/w log.debug(f"Dialect: {engine.dialect.name}") log.info("Reflecting (inspecting) database...") metadata.reflect(engine) log.info("... inspection complete") configoptions = RioViewConfigOptions( rio=rio, rcep=progargs.rcep, cpft=progargs.cpft, print_sql_only=progargs.print, drop_not_create=progargs.drop_danger_drop, master_patient_table=master_patient_table, full_prognotes_table=full_prognotes_table, prognotes_current_only=progargs.prognotes_current_only, clindocs_current_only=progargs.clindocs_current_only, allergies_current_only=progargs.allergies_current_only, audit_info=progargs.audit_info, postcodedb=progargs.postcodedb, geogcols=progargs.geogcols, ) ddhint = DDHint() if progargs.drop_danger_drop: # Drop views (and view-induced table indexes) first if rio: drop_rio_views(engine, metadata, configoptions, ddhint) drop_view(engine, VIEW_ADDRESS_WITH_GEOGRAPHY) if not progargs.debug_skiptables: process_all_tables(engine, metadata, configoptions) else: # Tables first, then views if not progargs.debug_skiptables: process_all_tables(engine, metadata, configoptions) if progargs.postcodedb: add_postcode_geography_view(engine, configoptions, ddhint) if rio: create_rio_views(engine, metadata, configoptions, ddhint) if progargs.settings_filename: with open(progargs.settings_filename, "w") as f: print(get_rio_dd_settings(ddhint), file=f)
if __name__ == "__main__": # pdb_run(main) main()