12.7.14. crate_anon.preprocess.systmone_ddgen
crate_anon/preprocess/systmone_ddgen.py
Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).
This file is part of CRATE.
CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with CRATE. If not, see <https://www.gnu.org/licenses/>.
Generate a CRATE data dictionary for SystmOne data.
Notes
SystmOne is a general-purpose electronic health record (EHR) system from TPP (The Phoenix Partnership): https://tpp-uk.com/products/.
It’s widely used in general practice (GP), and in Cambridgeshire/Peterborough, ~80% of GP surgeries use it (2018 data, https://pubmed.ncbi.nlm.nih.gov/29490968/, Figure 2).
Cambridgeshire & Peterborough NHS Foundation Trust (CPFT) used to use SystmOne for community services, and then moved nearly all the rest of its services to SystmOne (from RiO, in the case of mental health services): Children’s Directorate (12 Oct 2020), Community Hospital wards (30 Nov 2020), the rest of the Older People, Adults, and Community Directorate (7 Dec 2020), and finally the Adult and Specialist Directorate (14 Jun 2021).
SystmOne is centrally hosted by TPP.
TPP provide a nightly “Strategic Reporting extract” (SRE) of SystmOne data.
Its primary coding mechanisms are (1) CTV3 (Read) codes, and (2) SNOMED codes (see e.g. https://termbrowser.nhs.uk/) – the latter are gradually taking over (as of 2021). Coded values can be numeric. For example, one entry might include:
SNOMED code 718087004
SNOMED text “QRISK2 cardiovascular disease 10 year risk score”
CTV3 code “XaQVY”
CTV3 text “QRISK2 cardiovascular disease 10 year risk score”
Numeric unit “%”
Numeric value 10.4
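To make the shape of such an entry concrete, here is a minimal sketch of it as a Python record. The class name `CodedEntry`, its field names, and the `display` helper are hypothetical illustrations; they are not SRE column names or part of CRATE.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CodedEntry:
    """Hypothetical container for one coded SystmOne value."""

    snomed_code: int
    snomed_text: str
    ctv3_code: str
    ctv3_text: str
    numeric_unit: Optional[str] = None  # absent for non-numeric codes
    numeric_value: Optional[float] = None

    def display(self) -> str:
        """Text of the code, plus the value and unit if numeric."""
        if self.numeric_value is None:
            return self.snomed_text
        return f"{self.snomed_text}: {self.numeric_value}{self.numeric_unit or ''}"


entry = CodedEntry(
    snomed_code=718087004,
    snomed_text="QRISK2 cardiovascular disease 10 year risk score",
    ctv3_code="XaQVY",
    ctv3_text="QRISK2 cardiovascular disease 10 year risk score",
    numeric_unit="%",
    numeric_value=10.4,
)
print(entry.display())
```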
SystmOne collects data mostly via “templates” and “questionnaires”. Templates are perhaps closer to the heart of SystmOne (e.g. better presented in the long-form journal view) and values entered into templates are (always?) coded. Questionnaires are more free-form. Both can have free text attached to coded values.
12.7.14.1. Strategic Reporting extract
SpecificationDirectory.zip (e.g. 2021-02-18) contains e.g. Specification v123.csv, which is a full description of the SRE. Principles:
All these tables start SR, e.g. SR18WeekWait, SRAAndEAttendance.
Columns in that spreadsheet are:
    TableName
    TableDescription
    ColumnName
    ColumnDescription
    ColumnDataType -- possible values include:
        Boolean
        Date
        Date and Time
        Numeric - Integer
        Numeric - Real
        Text - Fixed
        Text - Variable
    ColumnLength -- possible values include:
        empty (e.g. boolean, date, date/time)
        8 for integer
        4 for real
        the VARCHAR length -- for both "variable" and "fixed" text types
    DateDefining
    ColumnOrdinal -- sequence number of column within table
    LinkedTable   }
    LinkedColumn1 }-+
    LinkedColumn2 } |
                    +-- e.g. SROrganisation, ID
                             SRStaffMember, RowIdentifier
                             SRPatient, RowIdentifier, IDOrganisationVisibleTo
To get a table list:
    # Poor for CSVs with newlines within their strings:
    tail -n+2 "Specification v123.csv" | cut -d, -f1 | sort | uniq
    # Much better:
    python3 -c 'import csv; print("\n".join(row[0] for i, row in enumerate(csv.reader(open("Specification v123.csv"))) if i > 0))' | sort | uniq
Tables and their descriptions:
    import csv
    s = set()
    for i, row in enumerate(csv.reader(open("Specification v123.csv"))):
        if i > 0:
            s.add(f"{row[0]} - {row[1]}")
    print("\n".join((x for x in sorted(s))))
Translating that to a single line: https://www.python.org/dev/peps/pep-0289/ … meh, hard.
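One way round this is a set comprehension rather than a generator expression: it de-duplicates and can be sorted in a single expression. A minimal sketch, assuming (as in the column list above) that TableName and TableDescription are the first two columns and that there is one header row. The function name `table_descriptions` and the sample data are made up for illustration.

```python
import csv
import io
from typing import Iterable, List


def table_descriptions(lines: Iterable[str]) -> List[str]:
    """
    Return sorted, de-duplicated "TableName - TableDescription" strings.
    Assumes a header row, then TableName and TableDescription as the
    first two columns.
    """
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    return sorted({f"{row[0]} - {row[1]}" for row in reader if len(row) >= 2})


# Made-up sample standing in for "Specification v123.csv":
sample = io.StringIO(
    "TableName,TableDescription,ColumnName\n"
    'SRPatient,"Patient details, one row each",IDPatient\n'
    'SRPatient,"Patient details, one row each",NHSNumber\n'
    "SRMedia,Media metadata,RowIdentifier\n"
)
print("\n".join(table_descriptions(sample)))
```

With a real file, pass `open("Specification v123.csv", newline="")`; the `csv` module documentation recommends `newline=""` so that newlines embedded in quoted fields are handled correctly.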
SRPatient looks to be the master patient table here – including names, dates of birth/death, NHS number.
Tpp Strategic Reporting Table Specification v123.rtf contains a nicer version of (exactly?) the same information.
Strategic Reporting downloads can be configured. Options include:
Whether to include the shared record. (I’m not sure if that means a national thing or data from SystmOne that each patient may have consented to sharing “‘out’ from another organization, then ‘in’ to mine”.)
When a download is set up, the recipient gets one CSV file per table selected, such as SRPatient.csv for the SRPatient table, plus some ever-present system tables: SRManifest.csv, describing what you’ve received; SRMapping.csv and SRMappingGroup.csv, providing text for built-in lists.
The date format is e.g. “29 Sep 2011 14:53:28”. Unknown times are marked as “00:00:00”. Unknown dates give an empty string. Boolean values are TRUE or FALSE.
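These conventions translate fairly directly into parsing code. The following is a sketch only: the function names are hypothetical, and treating anything other than TRUE/FALSE as "missing" is my assumption rather than something the SRE documentation states.

```python
from datetime import datetime
from typing import Optional

SRE_DATETIME_FORMAT = "%d %b %Y %H:%M:%S"  # e.g. "29 Sep 2011 14:53:28"


def parse_sre_datetime(value: str) -> Optional[datetime]:
    """Parse an SRE date/time; an empty string (unknown date) gives None."""
    value = value.strip()
    if not value:
        return None
    return datetime.strptime(value, SRE_DATETIME_FORMAT)


def parse_sre_boolean(value: str) -> Optional[bool]:
    """Parse "TRUE"/"FALSE"; anything else (assumed missing) gives None."""
    return {"TRUE": True, "FALSE": False}.get(value.strip())


print(parse_sre_datetime("29 Sep 2011 14:53:28"))
print(parse_sre_datetime(""))
print(parse_sre_boolean("TRUE"))
```

Two caveats. An unknown time is indistinguishable from a genuine midnight, since both arrive as “00:00:00”. Also, `%b` matches month abbreviations in the current locale, so this relies on an English (or default "C") locale.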
12.7.14.2. Free-text data
The SRE does not contain free-text data or binary documents by default. For some Trusts, an augmented SRE that includes this information is also provided.
From FreeText Model.xlsx, 2021-04-15, some of this data comes in the following format:
    Field Name                  Type          Description
    RowIdentifier               bigint        The unique identifier of the record
    IDPatient                   bigint        Links to patient ID in demographics
    IDReferralIn                bigint        ID of referral
    IDEvent                     bigint        Links to activity event ID
    Question                    varchar(MAX)  The questionnaire question
    [FreeText]                  varchar(MAX)  The answer given to the above question
    EventDate                   datetime      The date/time of the questionnaire
    SRTable                     varchar(100)  Which SR table the record relates to
    IDSRTable                   bigint        The ID of the above table
    QuestionnaireName           varchar(255)  The name of the questionnaire
    IDAnsweredQuestionnaire     bigint        The ID of the above questionnaire
    QuestionnaireVersionNumber  int           The version number of the above questionnaire
    IDOrganisation              bigint        Organisation ID of the questionnaire record
    CPFTGroup                   int           Group (directorate)
    Directorate                 varchar(50)   Directorate name
    TeamName                    varchar(100)  Name of team linked to the referral
    IsMentalHealth              int           Mental or physical health
    Imported                    date          Date imported to the database
(SR = Strategic Reporting.)
Specimen values:
SRTable: ‘SRAnsweredQuestionnaire’
IDSRTable: this varies for rows with SRTable = ‘SRAnsweredQuestionnaire’, so I think it’s the PK within the table indicated by SRTable.
QuestionnaireName = ‘CPFT Risk Assessment’
IDAnsweredQuestionnaire = this is unique for rows with QuestionnaireName = ‘CPFT Risk Assessment’, so I think it’s the ID of the Questionnaire, and is probably a typo.
(This ends up (in our environment) in the S1_FreeText table, as below, so it likely arrives as SRFreeText.)
However, note that RowIdentifier is not unique in this table. Whatever
they mean by “record”, it isn’t that. For example, there are 7 rows with one
common value of RowIdentifier that are clearly the 7 questions (in
Question) and textually coded answers (in FreeText) to a SWEMWBS
questionnaire. That means that to apply a FULLTEXT index, which requires an
indexed unique value, we have to add one.
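To illustrate the problem and the fix in miniature, here is a sketch that detects repeated RowIdentifier values and adds a surrogate unique key. The column name `surrogate_pk` and both function names are hypothetical; in a real database the equivalent would be an autonumbered integer column, and this is not a description of how CRATE itself does it.

```python
from collections import Counter
from typing import Any, Dict, List

Row = Dict[str, Any]


def duplicated_row_identifiers(rows: List[Row]) -> List[int]:
    """Return the RowIdentifier values that occur more than once."""
    counts = Counter(row["RowIdentifier"] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def with_surrogate_pk(rows: List[Row], pk_name: str = "surrogate_pk") -> List[Row]:
    """Return copies of the rows with a new 1-based unique integer key."""
    return [{pk_name: i, **row} for i, row in enumerate(rows, start=1)]


# Made-up rows: two questions from one questionnaire share a RowIdentifier.
rows = [
    {"RowIdentifier": 101, "Question": "Q1", "FreeText": "Often"},
    {"RowIdentifier": 101, "Question": "Q2", "FreeText": "Rarely"},
    {"RowIdentifier": 102, "Question": "Q1", "FreeText": "Some of the time"},
]
print(duplicated_row_identifiers(rows))
print([r["surrogate_pk"] for r in with_surrogate_pk(rows)])
```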
12.7.14.3. Key fields
IDPatient – the SystmOne patient number, in all patient tables (PID, in CRATE terms).
SRPatient.NHSNumber – the NHS number (MPID, in CRATE terms).
12.7.14.4. Notable tables in the SRE
[SR]Patient, as above
Patient identifiers and relationship/third-party details:
[SR]PatientAddressHistory
[SR]PatientContactDetails
[SR]HospitalAAndENumber
Relationship/third-party details:
[SR]PatientRelationship
some of the safeguarding tables
[SR]NDOptOutPreference, re NHS national data opt out (for NHS Act s251 use)
This has an IDPatient column; presumably presence indicates an active opt-out.
Full text and binary:
[SR]Media – contains filenames and some metadata
[SR]FreeText – if supplied
12.7.14.5. Notable additional tables/columns in the CPFT environment
S1_FreeText – this includes all answers to Questionnaires (linked via IDAnsweredQuestionnaire etc.). Comes from the “upgraded” SRE.
Several tables have identifiers linked in. For example, try:
SELECT * FROM information_schema.columns WHERE column_name = 'FirstName'
12.7.14.6. Notable tables omitted from the CPFT environment
Questionnaire – data is linked in to AnsweredQuestionnaire (which still contains the column IDQuestionnaire).
12.7.14.7. CPFT copy
This broadly follows the SRE, but is expanded. Some notable differences:
Tables named SR* in the SRE are named S1_* in the CPFT version (e.g. SRPatient becomes S1_Patient).
There is a S1_Patient.NationalDataOptOut column (0 or 1).
The local opt-out information appears in S1_ClinicalOutcome_ConsentResearch (as the OptOut field, a text field) but is clearer in S1_ClinicalOutcome_ConsentResearch_OptOutCheck, which only contains patients opting out and has:
    IDPatient = <ID_of_patient_opting_out>
    SNOMEDCode = 1091881000000109
    CTV3Code = 'XaaDb'
    CTV3Text = 'Declined invitation to participate in research study'
So for CPFT, we will autodetect this table/column (S1_ClinicalOutcome_ConsentResearch_OptOutCheck.SNOMEDCode) and the config file should contain:
optout_col_values = [1091881000000109]
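The principle is that any patient with a row whose opt-out column holds one of the listed values is treated as opted out. A toy illustration of that matching follows; the function name and sample rows are hypothetical, and this is not CRATE's actual implementation.

```python
from typing import Any, Dict, Iterable, Set

OPTOUT_COL_VALUES = [1091881000000109]  # as in the config setting above


def opted_out_patients(
    rows: Iterable[Dict[str, Any]],
    optout_values: Iterable[int] = OPTOUT_COL_VALUES,
) -> Set[int]:
    """Return IDPatient values whose SNOMEDCode is an opt-out value."""
    wanted = set(optout_values)
    return {row["IDPatient"] for row in rows if row["SNOMEDCode"] in wanted}


# Made-up rows; the second SNOMED code is a placeholder, not a real code.
rows = [
    {"IDPatient": 1, "SNOMEDCode": 1091881000000109},
    {"IDPatient": 2, "SNOMEDCode": 12345},
]
print(opted_out_patients(rows))
```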
There seem to be quite a few extra tables, such as:
    S1_ClinicalMeasure_QRisk
    S1_ClinicalMeasure_SWEMWBS
    S1_ClinicalMeasure_Section58
These look like CPFT-created tables pulling data from questionnaires or similar.
There is S1_FreeText, where someone (NP!) has helpfully imported that additional data.
There is S1_ClinicalOutcome_ConsentResearch, which is the traffic-light system for the CPFT Research Database.
In more detail:
All data is loaded via stored procedures, available via Microsoft SQL Server Management Studio in . Right-click any and choose “Modify” to view the source. For example, the stored procedure named dbo.load_S1_Patient creates the S1_Patient table.
RwNo or RwNo_Patient is frequently used, typically via:
    SELECT
        -- stuff,
        ROW_NUMBER() OVER (
            PARTITION BY IDPatient
            ORDER BY DateEventRecorded DESC
        ) AS RwNo
    FROM
        -- somewhere
    WHERE
        RwNo = 1
    ;

    SELECT
        -- stuff,
        ROW_NUMBER() OVER (
            PARTITION BY IDPatient
            ORDER BY DateEvent DESC
        ) AS RwNo_Patient
    FROM
        -- somewhere
    ;
… in other words, picking the most recent for each patient (or, without the WHERE clause, showing its sequencing within each patient).
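For readers less familiar with window functions, the same two steps (number rows within each patient, most recent first; then keep row 1) can be sketched in Python. The function names are hypothetical and the dates are made-up ISO strings, which sort correctly as text. Note that in T-SQL a window-function alias cannot be referenced in the WHERE clause of the same SELECT, so the RwNo = 1 filter is normally applied in an enclosing query or CTE; the sketch mirrors that separation.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Dict, List

Row = Dict[str, Any]


def add_row_numbers(
    rows: List[Row],
    partition_key: str = "IDPatient",
    order_key: str = "DateEventRecorded",
    rowno_name: str = "RwNo",
) -> List[Row]:
    """Emulate ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ... DESC)."""
    # Most recent first, then a stable sort by patient keeps that order.
    ordered = sorted(rows, key=itemgetter(order_key), reverse=True)
    ordered.sort(key=itemgetter(partition_key))
    numbered = []
    for _, group in groupby(ordered, key=itemgetter(partition_key)):
        for n, row in enumerate(group, start=1):
            numbered.append({**row, rowno_name: n})
    return numbered


def most_recent_per_patient(rows: List[Row]) -> List[Row]:
    """Keep only the most recent row for each patient (RwNo = 1)."""
    return [r for r in add_row_numbers(rows) if r["RwNo"] == 1]


rows = [
    {"IDPatient": 1, "DateEventRecorded": "2021-01-01"},
    {"IDPatient": 1, "DateEventRecorded": "2021-06-01"},
    {"IDPatient": 2, "DateEventRecorded": "2020-03-01"},
]
for r in most_recent_per_patient(rows):
    print(r)
```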
12.7.14.8. Test patients in the live system?
There are some test patients in our live system.
SELECT COUNT(*) -- or DISTINCT firstname, surname
FROM S1_Patient
WHERE firstname LIKE '%test%' AND surname LIKE '%test%';
-- Several present. However, in the CPFT copy, column "TestPatient" from
-- this table (BOOLEAN in SRE docs) is missing. How to distinguish?
There are several present. They should be distinguished by the TestPatient
column (BOOLEAN, as per the SRE docs). Our code looks for the “TestPatient”
column and marks it as an opt-out flag.
Todo
TestPatient column missing in CPFT copy. [A/w NP 2022-03-21.]
12.7.14.9. Manual review after first draft
Reviewing CPFT de-identified output for patient-related content only (not staff-related), per local ethics approvals.
-- Tables in the de-identified database:
SELECT table_name FROM information_schema.tables WHERE table_catalog = 'S1' ORDER BY table_name;
All reviewed and this code tweaked accordingly.
- class crate_anon.preprocess.systmone_ddgen.CPFTAddressCol[source]
CPFT variants for the address table.
- class crate_anon.preprocess.systmone_ddgen.CPFTGenericCol[source]
CPFT variants for generic column names.
- class crate_anon.preprocess.systmone_ddgen.CPFTPatientCol[source]
CPFT variants for the patient table.
- class crate_anon.preprocess.systmone_ddgen.CPFTTable[source]
Selected tables that CPFT have renamed or created.
- class crate_anon.preprocess.systmone_ddgen.CrateS1ViewCol[source]
Additional columns added by CRATE’s preprocessor.
- class crate_anon.preprocess.systmone_ddgen.CrateView[source]
Views created by CRATE, which do not have contextual prefixes.
- class crate_anon.preprocess.systmone_ddgen.S1AddressCol[source]
Columns in the PatientAddressHistory table.
- class crate_anon.preprocess.systmone_ddgen.S1ContactCol[source]
Columns in the PatientContactDetails table.
- class crate_anon.preprocess.systmone_ddgen.S1GenericCol[source]
Columns used in many SystmOne tables.
- class crate_anon.preprocess.systmone_ddgen.S1HospNumCol[source]
Columns in the HospitalAAndENumber table.
- class crate_anon.preprocess.systmone_ddgen.S1RelCol[source]
Columns in the PatientRelationship table. (This is also one for which we specify everything in detail, since CPFT add in extra identifiers.)
- class crate_anon.preprocess.systmone_ddgen.S1Table[source]
SystmOne “core” table names, with no prefix.
- class crate_anon.preprocess.systmone_ddgen.ScrubSrcAlterMethodInfo(change_comment_and_indexing_only: bool = False, src_flags: str = '', scrub_src: crate_anon.anonymise.constants.ScrubSrc | None = None, scrub_method: crate_anon.anonymise.constants.ScrubMethod | None = None, decision: crate_anon.anonymise.constants.Decision = Decision.OMIT, alter_methods: typing.List[crate_anon.anonymise.altermethod.AlterMethod] = <factory>, dest_datatype: str | None = None, dest_field: str | None = None)[source]
For describing scrub-source and alter-method information.
- __init__(change_comment_and_indexing_only: bool = False, src_flags: str = '', scrub_src: crate_anon.anonymise.constants.ScrubSrc | None = None, scrub_method: crate_anon.anonymise.constants.ScrubMethod | None = None, decision: crate_anon.anonymise.constants.Decision = Decision.OMIT, alter_methods: typing.List[crate_anon.anonymise.altermethod.AlterMethod] = <factory>, dest_datatype: str | None = None, dest_field: str | None = None) None
- add_alter_method(alter_method: AlterMethod) None[source]
Adds an alteration method.
- class crate_anon.preprocess.systmone_ddgen.SystmOneContext(value)[source]
Environments in which we might have SystmOne data.
- class crate_anon.preprocess.systmone_ddgen.SystmOneSRESpecRow(d: Dict[str, Any])[source]
Represents a row in the SystmOne SRE specification CSV file.
- comment(context: SystmOneContext, with_table: bool = True) str[source]
Used to generate a comment for the CRATE data dictionary.
- Parameters:
context – The SystmOneContext in which data is being processed.
with_table – Include information about the table.
- description(context: SystmOneContext, with_table: bool = True) str[source]
Full description line.
- Parameters:
context – The SystmOneContext in which data is being processed.
with_table – Include information about the table.
- property linked_table_core: str
Core part of the linked table name.
- property tablename_core: str
Core part of the tablename.
- class crate_anon.preprocess.systmone_ddgen.SystmOneSRESpecs(context: SystmOneContext, filename: str)[source]
Loads and represents the SystmOne SRE specifications.
- __init__(context: SystmOneContext, filename: str) None[source]
Initialize by reading a SystmOne SRE specification CSV file.
- context:
The context from which SystmOne data is being extracted (e.g. the raw TPP Strategic Reporting Extract (SRE), or a local version processed into CPFT’s Data Warehouse).
- filename:
Optional filename for the TPP SRE specification file, in comma-separated value (CSV) format.
- get_spec_row(tablename_core: str, columnname: str) SystmOneSRESpecRow[source]
Look up a row specification.
- class crate_anon.preprocess.systmone_ddgen.TableCommentWorking(dd: DataDictionary, specifications: SystmOneSRESpecs, append_comments: bool = False, allow_unprefixed_tables: bool = False)[source]
Class used to store data temporarily about table comments, during SystmOne data dictionary annotation. Slightly complex because
- __init__(dd: DataDictionary, specifications: SystmOneSRESpecs, append_comments: bool = False, allow_unprefixed_tables: bool = False) None[source]
- Parameters:
dd – The data dictionary.
specifications – Details of the TPP SRE specifications.
append_comments – Append comments to any that were autogenerated, rather than replacing them. (If you use the SRE specifications, you may as well set this to False as the SRE specification comments are much better.)
allow_unprefixed_tables – Permit tables that don’t start with the expected contextual prefix? Discouraged; you may get odd tables and views.
- maybe_add_table_comment(ddr: DataDictionaryRow)[source]
We scan each data dictionary row via this function.
If we already have seen a comment for this table in the data dictionary, we don’t do anything, UNLESS this row is itself that comment, and then if
Otherwise, we add the SystmOne comment, if found, as an extra DDR, storing it in our “extra_table_comment_rows” list.
- crate_anon.preprocess.systmone_ddgen.annotate_systmone_dd_row(ddr: DataDictionaryRow, context: SystmOneContext, specifications: SystmOneSRESpecs, append_comments: bool = False, include_generic: bool = False, allow_unprefixed_tables: bool = False, table_info_in_comments: bool = True) None[source]
Modifies (in place) a data dictionary row for SystmOne.
- Parameters:
ddr – The data dictionary row to amend.
context – The context from which SystmOne data is being extracted (e.g. the raw TPP Strategic Reporting Extract (SRE), or a local version processed into CPFT’s Data Warehouse).
specifications – Details of the TPP SRE specifications.
append_comments – Append comments to any that were autogenerated, rather than replacing them. (If you use the SRE specifications, you may as well set this to False as the SRE specification comments are much better.)
include_generic – Include all fields that are not known about by this code and treated specially? If False, the config file settings are used (which may omit or include). If True, all such fields are included.
allow_unprefixed_tables – Permit tables that don’t start with the expected contextual prefix? Discouraged; you may get odd tables and views. A few (see INCLUDE_TABLES_REGEX) are explicitly included anyway.
table_info_in_comments – Include table descriptions in column comments?
- crate_anon.preprocess.systmone_ddgen.contextual_columnname(tablename_core: str, columnname_core: str, to_context: SystmOneContext) str[source]
Translates a “core” column name to its contextual variant, if applicable.
- crate_anon.preprocess.systmone_ddgen.contextual_tablename(tablename_core: str, to_context: SystmOneContext) str[source]
Prefixes the “core” table name for a given context, and sometimes translates it too.
- crate_anon.preprocess.systmone_ddgen.core_columnname(tablename_core: str, columnname_context: str, from_context: SystmOneContext) str[source]
Some contexts rename their column names. This function puts them back into the “core” (TPP SRE) name space.
- crate_anon.preprocess.systmone_ddgen.core_tablename(tablename: str, from_context: SystmOneContext, allow_unprefixed: bool = False) str[source]
Is this a table of an expected format that we will consider?
    - If so, returns the “core” part of the tablename, in the given context.
    - Otherwise, if allow_unprefixed, return the input.
    - Otherwise, return an empty string.
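The three rules above, together with the SR/S1_ prefixes described in the notes, can be sketched as follows. This is an illustration of the documented behaviour only, not the module's code: the `Context` enum and its member names are hypothetical stand-ins for SystmOneContext, the functions are prefixed `sketch_` to make that clear, and the real functions may also rename some tables and may differ in case handling.

```python
from enum import Enum, auto


class Context(Enum):
    """Hypothetical stand-in for SystmOneContext."""

    TPP_SRE = auto()
    CPFT_DW = auto()


# Prefixes as described in the notes: SR* in the SRE, S1_* at CPFT.
PREFIXES = {Context.TPP_SRE: "SR", Context.CPFT_DW: "S1_"}


def sketch_core_tablename(
    tablename: str, from_context: Context, allow_unprefixed: bool = False
) -> str:
    """Strip the contextual prefix, following the three rules above."""
    prefix = PREFIXES[from_context]
    if tablename.startswith(prefix) and len(tablename) > len(prefix):
        return tablename[len(prefix):]
    if allow_unprefixed:
        return tablename
    return ""


def sketch_contextual_tablename(tablename_core: str, to_context: Context) -> str:
    """Add the prefix for the destination context (no renaming here)."""
    return PREFIXES[to_context] + tablename_core


core = sketch_core_tablename("SRPatient", Context.TPP_SRE)
print(core)
print(sketch_contextual_tablename(core, Context.CPFT_DW))
```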
- crate_anon.preprocess.systmone_ddgen.cpft_s1_tablename(core_tablename: str) str[source]
Helper function for the consent-for-contact system, but conceptually it sits reasonably well here.
- Parameters:
core_tablename – Table name in S1 “core” format (devoid of any prefix).
- Returns:
Returns the local CPFT table name.
- crate_anon.preprocess.systmone_ddgen.eq(x: str, y: str) bool[source]
Case-insensitive string comparison.
- crate_anon.preprocess.systmone_ddgen.eq_re(x: str, y_regex: str) bool[source]
Returns True if the regex matches at the start of the string.
- crate_anon.preprocess.systmone_ddgen.get_index_flag(tablename: str, colname: str, ddr: DataDictionaryRow, context: SystmOneContext) IndexType | None[source]
Should this be indexed? Returns an indexing flag, or None if it should not be indexed.
- crate_anon.preprocess.systmone_ddgen.get_scrub_alter_details(tablename: str, colname: str, ddr: DataDictionaryRow, context: SystmOneContext, include_generic: bool = False) ScrubSrcAlterMethodInfo[source]
The main “thinking” function.
Is this a sensitive field that should be used for scrubbing? Should it be modified in transit?
- Parameters:
tablename – The “core” tablename being considered, without any prefix (e.g. “Patient”, not “SRPatient” or “S1_Patient”).
colname – The database column name.
ddr – Data dictionary row.
context – The context from which SystmOne data is being extracted (e.g. the raw TPP Strategic Reporting Extract (SRE), or a local version processed into CPFT’s Data Warehouse).
include_generic – Include all fields that are not known about by this code and treated specially? If False, the config file settings are used (which may omit or include). If True, all such fields are included.
- crate_anon.preprocess.systmone_ddgen.is_free_text(tablename: str, colname: str, context: SystmOneContext, ddr: DataDictionaryRow | None = None) bool[source]
Is this a free-text field requiring scrubbing?
Unusually, there is not very much free text, and it is mostly collated. (We haven’t added binary support yet. Do we have the binary documents?)
- crate_anon.preprocess.systmone_ddgen.is_in(x: str, y: Iterable[str]) bool[source]
Case-insensitive version of “in”, to replace “if x in y”.
- crate_anon.preprocess.systmone_ddgen.is_in_re(x: str, y_regexes: Iterable[str]) bool[source]
Case-insensitive regex-based version of “in”, to replace “if x in y”.
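The four comparison helpers (eq, eq_re, is_in, is_in_re) are described only in a sentence each, so here is a sketch of one plausible reading. The `sketch_` names mark these as illustrations, not the module's code; in particular, treating eq_re as case-insensitive is an assumption carried over from the description of is_in_re.

```python
import re
from typing import Iterable


def sketch_eq(x: str, y: str) -> bool:
    """Case-insensitive string equality."""
    return x.lower() == y.lower()


def sketch_eq_re(x: str, y_regex: str) -> bool:
    """Does the regex match at the start of x? (re.match anchors at the start.)"""
    return re.match(y_regex, x, flags=re.IGNORECASE) is not None


def sketch_is_in(x: str, y: Iterable[str]) -> bool:
    """Case-insensitive replacement for "x in y"."""
    return any(sketch_eq(x, item) for item in y)


def sketch_is_in_re(x: str, y_regexes: Iterable[str]) -> bool:
    """True if any of the regexes matches at the start of x."""
    return any(sketch_eq_re(x, regex) for regex in y_regexes)


print(sketch_eq("IDPatient", "idpatient"))
print(sketch_eq_re("PatientID", "id"))
print(sketch_is_in_re("DateBirth", ["dob", "date.*"]))
```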
- crate_anon.preprocess.systmone_ddgen.is_master_patient_table(tablename: str) bool[source]
Is this the master patient table?
- crate_anon.preprocess.systmone_ddgen.is_mpid(colname: str, context: SystmOneContext) bool[source]
Is this column the master patient identifier (MPID), i.e. the NHS number?
- crate_anon.preprocess.systmone_ddgen.is_other_system_id(colname: str, context: SystmOneContext) bool[source]
Is this column an ID from another system (e.g. RiO, PCMIS)?
- crate_anon.preprocess.systmone_ddgen.is_pair_in(a: str, b: str, y: Iterable[Tuple[str, str]]) bool[source]
Case-insensitive version of “in”, to replace “if a, b in y”.
- crate_anon.preprocess.systmone_ddgen.is_pair_in_re(a: str, b: str, y_regexes: Iterable[Tuple[str, str]]) bool[source]
Case-insensitive regex-based version of “in”, to replace “if a, b in y”.
- crate_anon.preprocess.systmone_ddgen.is_pid(colname: str, context: SystmOneContext) bool[source]
Is this column the SystmOne primary patient identifier (PID)?
It’s nearly always S1GenericCol.PID. But occasionally something else (e.g. in CPFT-created tables).
This works for all tables EXCEPT the main “Patient” table, where the PK takes its place.
Occasionally, CPFT tables blend SystmOne patients with other patients using IDs from other EHR systems. However, those patients won’t be in our master patient index, so their data won’t be brought through.
- crate_anon.preprocess.systmone_ddgen.is_pk(tablename: str, colname: str, context: SystmOneContext, ddr: DataDictionaryRow | None = None) bool[source]
Is this a primary key (PK) column within its table?
- crate_anon.preprocess.systmone_ddgen.join_comments(comments: List[str]) str[source]
Joins comment elements, skipping any blanks.
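A sketch of what "joining, skipping blanks" might look like. The `sketch_` name marks it as an illustration; the "; " separator and the treatment of whitespace-only strings as blank are assumptions, since neither is stated here.

```python
from typing import List


def sketch_join_comments(comments: List[str], sep: str = "; ") -> str:
    """Join non-blank comment elements with an (assumed) separator."""
    return sep.join(c.strip() for c in comments if c and c.strip())


print(sketch_join_comments(["Patient table", "", "  ", "NHS number"]))
```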
- crate_anon.preprocess.systmone_ddgen.modify_dd_for_systmone(dd: DataDictionary, context: SystmOneContext, sre_spec_csv_filename: str = '', debug_specs: bool = False, append_comments: bool = False, include_generic: bool = False, allow_unprefixed_tables: bool = False, alter_loaded_rows: bool = False, table_info_in_comments: bool = True) None[source]
Modifies a data dictionary in place.
- Parameters:
dd – The data dictionary to amend.
context – The context from which SystmOne data is being extracted (e.g. the raw TPP Strategic Reporting Extract (SRE), or a local version processed into CPFT’s Data Warehouse).
sre_spec_csv_filename – Optional filename for the TPP SRE specification file, in comma-separated value (CSV) format. If present, this will be used to add proper descriptive comments to all known fields. Highly recommended.
debug_specs – Report the SRE specifications to the log.
append_comments – Append comments to any that were autogenerated, rather than replacing them. (If you use the SRE specifications, you may as well set this to False as the SRE specification comments are much better.)
include_generic – Include all fields that are not known about by this code and treated specially? If False, the config file settings are used (which may omit or include). If True, all such fields are included.
allow_unprefixed_tables – Permit tables that don’t start with the expected contextual prefix? Discouraged; you may get odd tables and views.
alter_loaded_rows – Alter rows that were loaded from disk (not read from a database)? The default is to leave such rows untouched.
table_info_in_comments – Include table descriptions in column comments?
- crate_anon.preprocess.systmone_ddgen.not_just_at_start(x: str) str[source]
Apply a prefix so that a regex string doesn’t just work at the start of a string.
- crate_anon.preprocess.systmone_ddgen.process_generic_table_column(tablename: str, colname: str, ddr: DataDictionaryRow, ssi: ScrubSrcAlterMethodInfo, context: SystmOneContext) bool[source]
Performs operations applicable to columns in any SystmOne table, except a few very special ones like Patient. Modifies ssi in place.
Returns: recognized and dealt with?
- crate_anon.preprocess.systmone_ddgen.should_be_fulltext_indexed(tablename: str, colname: str) bool[source]
Is this a field that should get a FULLTEXT index? That’s not just “a column that contains free text and should be scrubbed”, that is “a column with a lot of interesting free text that should get a special index”.
- crate_anon.preprocess.systmone_ddgen.tablename_prefix(context: SystmOneContext) str[source]
The tablename prefix in the given context.
- crate_anon.preprocess.systmone_ddgen.tcmatch(table1: str, column1: str, table2: str, column2: str) bool[source]
Equal (in case-insensitive fashion) for table and column?
- crate_anon.preprocess.systmone_ddgen.terminate(x: str) str[source]
Apply an end-of-string terminator to a regex string.
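Both terminate and not_just_at_start make sense in the context of regexes that are matched at the start of a string (as eq_re does). A sketch of the idea follows, with `sketch_` names marking it as illustrative; the specific choices of "$" as terminator and ".*" as prefix are my assumptions about how such helpers would typically work.

```python
import re


def sketch_terminate(x: str) -> str:
    """Append an end-of-string anchor, so the regex must match to the end."""
    return x + "$"


def sketch_not_just_at_start(x: str) -> str:
    """Prefix with ".*", so a start-anchored match can begin anywhere."""
    return ".*" + x


# re.match() only matches at the start of the string:
print(bool(re.match("Name", "NameOfCarer")))
print(bool(re.match(sketch_terminate("Name"), "NameOfCarer")))
print(bool(re.match("Name", "FirstName")))
print(bool(re.match(sketch_not_just_at_start("Name"), "FirstName")))
```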
- crate_anon.preprocess.systmone_ddgen.translate_tablename(from_tablename: str, from_context: SystmOneContext, to_context: SystmOneContext)[source]
Translates a table name from one S1 context to another.