14.1.2. crate_anon.anonymise.altermethod
crate_anon/anonymise/altermethod.py
Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).
This file is part of CRATE.
CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with CRATE. If not, see <https://www.gnu.org/licenses/>.
The AlterMethod class.
- class crate_anon.anonymise.altermethod.AlterMethod(config: Config, text_value: str = None, scrub: bool = False, truncate_date: bool = False, extract_from_filename: bool = False, extract_from_file_format: bool = False, file_format_str: str = '', extract_from_blob: bool = False, skip_if_text_extract_fails: bool = False, extract_ext_field: str = '', hash_: bool = False, hash_config_section: str = '', html_unescape: bool = False, html_untag: bool = False)
Implements a SINGLE transformation of source data on its way to the destination database.
Knows how to represent itself as a text element in the relevant column of a data dictionary row, and how to create itself from one of those text elements.
A crate_anon.anonymise.ddr.DataDictionaryRow may include multiple instances of crate_anon.anonymise.altermethod.AlterMethod in a sequence.
- __init__(config: Config, text_value: str = None, scrub: bool = False, truncate_date: bool = False, extract_from_filename: bool = False, extract_from_file_format: bool = False, file_format_str: str = '', extract_from_blob: bool = False, skip_if_text_extract_fails: bool = False, extract_ext_field: str = '', hash_: bool = False, hash_config_section: str = '', html_unescape: bool = False, html_untag: bool = False) -> None
- Parameters:
config – a crate_anon.anonymise.config.Config
text_value – string (from the data dictionary) to parse via set_from_text(); may set many of the other attributes
scrub – Boolean; “the source field contains sensitive text; scrub it”
truncate_date – Boolean; “the source is a date; truncate it to the first of the month”
extract_from_filename – Boolean; “the source is a filename; extract the text from it”
extract_from_file_format – Boolean; “the source is a partial filename; combine it with file_format_str to calculate the full filename, then extract the text from it”
file_format_str – format string for use with extract_from_file_format
extract_from_blob – Boolean; “the source is binary (the database contains a BLOB); extract text from it”. See also extract_ext_field.
skip_if_text_extract_fails – Boolean; “if text extraction fails, skip the record entirely”
extract_ext_field – for when the database contains a BLOB: the name of a database column (field), in the same row, that contains the file’s extension, to help identify the BLOB
hash_ – Boolean; “transform the source by hashing it”
hash_config_section – if hash_ is true, this specifies the config section in which the hash is defined
html_unescape – Boolean; “transform the source by HTML-unescaping it”. For example, this would convert &lt; to <.
html_untag – Boolean; “transform the source by removing HTML tags”. For example, this would convert hello <b>bold</b> world to hello bold world.
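Three of the transformations described above can be illustrated with the Python standard library alone. The following is a minimal sketch, not CRATE's implementation: the function names are hypothetical, and the regular expression used to strip tags is a deliberately naive assumption.

```python
import html
import re
from datetime import date

# Illustrative stand-ins only; these are NOT CRATE's own functions.


def truncate_date_to_month(d: date) -> date:
    """Truncate a date to the first day of its month."""
    return d.replace(day=1)


def html_unescape(text: str) -> str:
    """Convert HTML entities such as &lt; back into characters."""
    return html.unescape(text)


# Naive assumption: anything between "<" and ">" is a tag.
_TAG_RE = re.compile(r"<[^>]+>")


def html_untag(text: str) -> str:
    """Remove anything that looks like an HTML tag."""
    return _TAG_RE.sub("", text)


print(truncate_date_to_month(date(1975, 6, 23)))  # 1975-06-01
print(html_unescape("a &lt; b"))  # a < b
print(html_untag("hello <b>bold</b> world"))  # hello bold world
```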
- alter(value: Any, ddr: DataDictionaryRow, row: List[Any], ddrows: List[DataDictionaryRow], patient: patient.Patient = None) -> Tuple[Any, bool]
Performs the alteration.
- Parameters:
value – source value of interest
ddr – corresponding crate_anon.anonymise.ddr.DataDictionaryRow
row – all values in the same source row
ddrows – all data dictionary rows
patient – crate_anon.anonymise.patient.Patient object
- Returns:
newvalue, skiprow: the altered value, and whether the row should be skipped
- Return type:
tuple
If multiple transformations are specified within one AlterMethod, only one is performed: the first that applies, taken in the following order:
scrub
truncate_date
extract_text
hash
html_unescape
html_untag
skip_if_text_extract_fails
However, multiple alteration methods can be specified for one field. See crate_anon.anonymise.anonymise.process_table() and crate_anon.anonymise.ddr.DataDictionaryRow.
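The difference between setting several flags on one method and specifying several methods for one field can be sketched as follows. This is a toy model under stated assumptions: ToyAlterMethod and apply_in_sequence are hypothetical names, only three of the flags are modelled, and alter() here takes only the value rather than the full argument list shown above.

```python
import html
import re
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class ToyAlterMethod:
    """Hypothetical stand-in for AlterMethod with three of its flags."""

    truncate_date: bool = False
    html_unescape: bool = False
    html_untag: bool = False

    def alter(self, value: Any) -> Tuple[Any, bool]:
        """Apply at most one transformation; return (newvalue, skiprow)."""
        if self.truncate_date:
            return value.replace(day=1), False
        if self.html_unescape:
            return html.unescape(value), False
        if self.html_untag:
            return re.sub(r"<[^>]+>", "", value), False
        return value, False


def apply_in_sequence(
    methods: List[ToyAlterMethod], value: Any
) -> Tuple[Any, bool]:
    """Pass the value through each method in turn; stop on a skip."""
    for method in methods:
        value, skiprow = method.alter(value)
        if skiprow:
            return None, True
    return value, False


source = "&lt;b&gt;hi&lt;/b&gt;"

# Both flags on ONE method: only the earlier transformation runs.
both = ToyAlterMethod(html_unescape=True, html_untag=True)
print(both.alter(source))  # ('<b>hi</b>', False)

# TWO methods in sequence: unescape first, then remove tags.
chain = [ToyAlterMethod(html_unescape=True), ToyAlterMethod(html_untag=True)]
print(apply_in_sequence(chain, source))  # ('hi', False)
```

In this sketch, a field that needs both unescaping and tag removal gets two methods in sequence, since a single method stops after its first applicable transformation.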
- property as_text: str
Return the alter_method fragment from the working fields; effectively the reverse of set_from_text().
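The round-trip relationship between set_from_text() and as_text can be sketched with a toy class. Everything here is an assumption for illustration: ToyMethod is hypothetical, it models only two flags, and the text fragments "scrub" and "hash=..." are invented spellings rather than a statement of the format CRATE uses in its data dictionary.

```python
from dataclasses import dataclass


@dataclass
class ToyMethod:
    """Hypothetical two-flag method; the fragment spellings are invented."""

    scrub: bool = False
    hash_: bool = False
    hash_config_section: str = ""

    def set_from_text(self, text: str) -> None:
        """Reset the flags, then set them from a text fragment."""
        self.scrub = False
        self.hash_ = False
        self.hash_config_section = ""
        if text == "scrub":
            self.scrub = True
        elif text.startswith("hash="):
            self.hash_ = True
            self.hash_config_section = text[len("hash="):]
        elif text:
            raise ValueError(f"Unknown fragment: {text!r}")

    @property
    def as_text(self) -> str:
        """Render the flags back into a text fragment."""
        if self.scrub:
            return "scrub"
        if self.hash_:
            return f"hash={self.hash_config_section}"
        return ""


method = ToyMethod()
method.set_from_text("hash=my_section")  # "my_section" is a made-up name
print(method.as_text)  # hash=my_section
```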