14.1.2. crate_anon.anonymise.altermethod

crate_anon/anonymise/altermethod.py


Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).

This file is part of CRATE.

CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with CRATE. If not, see <https://www.gnu.org/licenses/>.


The AlterMethod class.

class crate_anon.anonymise.altermethod.AlterMethod(config: Config, text_value: str = None, scrub: bool = False, truncate_date: bool = False, extract_from_filename: bool = False, extract_from_file_format: bool = False, file_format_str: str = '', extract_from_blob: bool = False, skip_if_text_extract_fails: bool = False, extract_ext_field: str = '', hash_: bool = False, hash_config_section: str = '', html_unescape: bool = False, html_untag: bool = False)

Implements a SINGLE transformation of source data on its way to the destination database.

Knows how to represent itself as a text element in the relevant column of a data dictionary row, and how to create itself from one of those text elements.

A crate_anon.anonymise.ddr.DataDictionaryRow may include multiple instances of crate_anon.anonymise.altermethod.AlterMethod in a sequence.

__init__(config: Config, text_value: str = None, scrub: bool = False, truncate_date: bool = False, extract_from_filename: bool = False, extract_from_file_format: bool = False, file_format_str: str = '', extract_from_blob: bool = False, skip_if_text_extract_fails: bool = False, extract_ext_field: str = '', hash_: bool = False, hash_config_section: str = '', html_unescape: bool = False, html_untag: bool = False) → None
Parameters:
  • config – a crate_anon.anonymise.config.Config

  • text_value – string (from the data dictionary) to parse via set_from_text(); may set many of the other attributes

  • scrub – Boolean; “the source field contains sensitive text; scrub it”

  • truncate_date – Boolean; “the source is a date; truncate it to the first of the month”

  • extract_from_filename – Boolean; “the source is a filename; extract the text from it”

  • extract_from_file_format – Boolean; “the source is a partial filename; combine it with file_format_str to calculate the full filename, then extract the text from it”

  • file_format_str – format string for use with extract_from_file_format

  • extract_from_blob – Boolean; “the source is binary (the database contains a BLOB); extract text from it”. See also extract_ext_field.

  • skip_if_text_extract_fails – Boolean: “if text extraction fails, skip the record entirely”

  • extract_ext_field – For when the database contains a BLOB: this parameter indicates a database column (field) name, in the same row, that contains the file’s extension, to help identify the BLOB.

  • hash_ – Boolean. If true, transform the source by hashing it.

  • hash_config_section – If hash_ is true, this specifies the config section in which the hash is defined.

  • html_unescape – Boolean: “transform the source by HTML-unescaping it”. For example, this would convert &le; to ≤.

  • html_untag – Boolean: “transform the source by removing HTML tags”. For example, this would convert hello <b>bold</b> world to hello bold world.
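Three of the transformations described above (truncate_date, html_unescape and html_untag) can be illustrated with the Python standard library. This is a minimal sketch of the behaviour the parameter descriptions state, not CRATE's own implementation. The function names are hypothetical, and the tag-stripping regular expression is a deliberately naive assumption.

```python
import datetime
import html
import re


def truncate_date_to_month(d: datetime.date) -> datetime.date:
    """Return the first day of the month containing d."""
    return d.replace(day=1)


def html_unescape_text(text: str) -> str:
    """Convert HTML character references to the characters they denote."""
    return html.unescape(text)


# Naive assumption: a tag is "<", then anything up to the next ">".
# Real-world HTML (comments, scripts, ">" inside attributes) needs more care.
_TAG_RE = re.compile(r"<[^>]*>")


def html_untag_text(text: str) -> str:
    """Remove anything that looks like an HTML tag, keeping the text between."""
    return _TAG_RE.sub("", text)
```

For instance, truncate_date_to_month(datetime.date(2015, 7, 23)) gives datetime.date(2015, 7, 1), and html_untag_text applied to the example above gives hello bold world.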

alter(value: Any, ddr: DataDictionaryRow, row: List[Any], ddrows: List[DataDictionaryRow], patient: patient.Patient = None) → Tuple[Any, bool]

Performs the alteration.

Returns:
  a tuple (newvalue, skiprow), matching the Tuple[Any, bool] return annotation

If multiple transformations are specified within one AlterMethod, only one is performed: the first that applies, checked in the following order:

  1. scrub

  2. truncate_date

  3. extract_text

  4. hash

  5. html_unescape

  6. html_untag

  7. skip_if_text_extract_fails

However, multiple alteration methods can be specified for one field. See crate_anon.anonymise.anonymise.process_table() and crate_anon.anonymise.ddr.DataDictionaryRow.
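The two ideas above (one transformation per AlterMethod, chosen by precedence, and several alteration methods chained for one field) might be sketched as follows. ToyAlterMethod, chosen_transformation() and apply_in_sequence() are hypothetical names invented for this illustration, and only four of the flags are modelled. Stopping the chain when a step reports skiprow is likewise an assumption, not a statement about process_table().

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class ToyAlterMethod:
    """Hypothetical stand-in for illustration; not the CRATE class."""
    scrub: bool = False
    truncate_date: bool = False
    html_unescape: bool = False
    html_untag: bool = False

    def chosen_transformation(self) -> Optional[str]:
        """Name the single transformation that would run, by precedence."""
        for name in ("scrub", "truncate_date", "html_unescape", "html_untag"):
            if getattr(self, name):
                return name
        return None


# Each step takes a value and returns (newvalue, skiprow).
AlterFn = Callable[[Any], Tuple[Any, bool]]


def apply_in_sequence(value: Any, methods: List[AlterFn]) -> Tuple[Any, bool]:
    """Apply each step in turn; stop early if one asks to skip the row."""
    for method in methods:
        value, skiprow = method(value)
        if skiprow:
            return None, True  # assumption: a skipped row yields no value
    return value, False
```

With this sketch, ToyAlterMethod(truncate_date=True, html_untag=True).chosen_transformation() names truncate_date, because it comes earlier in the order. Applying both transformations to one field would need two separate methods in the sequence.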

property as_text: str

Return the alter_method fragment from the working fields; effectively the reverse of set_from_text().

set_from_text(value: str) → None

Take the string from the alter_method field of the data dictionary, and use it to set the corresponding internal attributes.

To get the configuration string back, see as_text.
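The round trip between set_from_text() and as_text can be pictured with a toy class. This is a sketch only: ToyAlterText is hypothetical, it models just two of the attributes, and the token syntax it parses ("scrub", "hash=<section>") is an assumption made for illustration, not the documented data dictionary format.

```python
from typing import Optional


class ToyAlterText:
    """Hypothetical illustration of a text round trip; not the CRATE class."""

    def __init__(self, text_value: Optional[str] = None) -> None:
        self.scrub = False
        self.hash_ = False
        self.hash_config_section = ""
        if text_value is not None:
            self.set_from_text(text_value)

    def set_from_text(self, value: str) -> None:
        """Parse one token (assumed syntax) and set the matching attributes."""
        # Reset first, so the object reflects only the new text.
        self.scrub = False
        self.hash_ = False
        self.hash_config_section = ""
        name, _, param = value.strip().partition("=")
        if name == "scrub":
            self.scrub = True
        elif name == "hash":
            self.hash_ = True
            self.hash_config_section = param
        else:
            raise ValueError(f"Unrecognised token: {value!r}")

    @property
    def as_text(self) -> str:
        """Rebuild the token from the attributes (reverse of set_from_text)."""
        if self.scrub:
            return "scrub"
        if self.hash_:
            return f"hash={self.hash_config_section}"
        return ""
```

Under these assumptions, ToyAlterText("hash=my_section").as_text reproduces the string it was built from, which is the property that lets a data dictionary be read and written back unchanged.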