14.1.24. crate_anon.anonymise.subset_db

crate_anon/anonymise/subset_db.py


Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).

This file is part of CRATE.

CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with CRATE. If not, see <https://www.gnu.org/licenses/>.


Create a simple subset of a database.

class crate_anon.anonymise.subset_db.DatabaseFilterSource(name: str, url: str, table: str, column: str, echo: bool = False)
__init__(name: str, url: str, table: str, column: str, echo: bool = False) → None
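This class has no docstring. Judging only by its parameters, it appears to describe a source of filter values: the column `column` of table `table` in the database at `url`, labelled `name`. The sketch below shows how such values might be read, using the standard-library `sqlite3` module in place of SQLAlchemy. The function `read_filter_values` is hypothetical and is not part of CRATE.

```python
import sqlite3
from typing import Set


def read_filter_values(conn: sqlite3.Connection, table: str, column: str) -> Set[str]:
    """
    Hypothetical helper: return the distinct non-NULL values of table.column,
    converted to strings (this module treats filter values as strings).
    Table and column names are interpolated into the SQL, so they must be trusted.
    """
    sql = f'SELECT DISTINCT "{column}" FROM "{table}" WHERE "{column}" IS NOT NULL'
    return {str(row[0]) for row in conn.execute(sql)}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cohort (patient_id INTEGER)")
conn.executemany("INSERT INTO cohort VALUES (?)", [(1,), (2,), (2,), (None,)])
print(sorted(read_filter_values(conn, "cohort", "patient_id")))  # ['1', '2']
```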
class crate_anon.anonymise.subset_db.SubsetConfig(src_db_url: str, dst_db_url: str, filter_column: str | None = None, filter_values: List[str] | None = None, filter_value_filenames: List[str] | None = None, filter_value_db_urls: List[str] | None = None, filter_value_tablecols: List[str] | None = None, include_rows_filtercol_null: bool = False, include_tables_without_filtercol: bool = True, include_tables: List[str] | None = None, include_table_filenames: List[str] | None = None, exclude_tables: List[str] | None = None, exclude_table_filenames: List[str] | None = None, echo: bool = False)

Simple configuration class for subsetting databases.

__init__(src_db_url: str, dst_db_url: str, filter_column: str | None = None, filter_values: List[str] | None = None, filter_value_filenames: List[str] | None = None, filter_value_db_urls: List[str] | None = None, filter_value_tablecols: List[str] | None = None, include_rows_filtercol_null: bool = False, include_tables_without_filtercol: bool = True, include_tables: List[str] | None = None, include_table_filenames: List[str] | None = None, exclude_tables: List[str] | None = None, exclude_table_filenames: List[str] | None = None, echo: bool = False) → None
Parameters:
  • src_db_url – SQLAlchemy URL for the source database.

  • dst_db_url – SQLAlchemy URL for the destination database.

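  • filter_column – Name of the column to filter on (e.g. “patient_id”). If blank, rows are not filtered, so everything (subject to the table inclusion and exclusion options) may be copied.

  • filter_values – Values of the filter column to accept; values are treated as strings.

  • filter_value_filenames – Filename(s) of files containing values to accept; values are treated as strings.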

  • include_rows_filtercol_null – Allow the filter column to be NULL as well?

  • include_tables_without_filtercol – Include tables that don’t possess the filter column (e.g. system/lookup tables)?

  • include_tables – Specific named tables to include.

  • include_table_filenames – Filename(s) of files containing specific named tables to include.

  • exclude_tables – Specific named tables to exclude.

  • exclude_table_filenames – Filename(s) of files containing specific named tables to exclude.

  • echo – Echo SQL (debugging only)?
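For a table that has the filter column, the value-filtering parameters plausibly combine as follows: a row is kept if its filter-column value, converted to a string, is among the accepted values, or if that value is NULL and `include_rows_filtercol_null` is set. The sketch below illustrates that reading; it is not CRATE's code. The functions `gather_filter_values` and `row_is_wanted` are hypothetical, and the file format (one value per line, blank lines ignored) is an assumption.

```python
from typing import Iterable, Set


def gather_filter_values(values: Iterable[str], filenames: Iterable[str]) -> Set[str]:
    """
    Hypothetical: merge explicit values with values read from files.
    ASSUMPTION: each file holds one value per line; blank lines are skipped.
    """
    accepted = {str(v) for v in values}
    for filename in filenames:
        with open(filename) as f:
            accepted.update(line.strip() for line in f if line.strip())
    return accepted


def row_is_wanted(value: object, accepted: Set[str], include_null: bool) -> bool:
    """
    Hypothetical: decide whether a row with this filter-column value is kept.
    Values are compared as strings, matching "treated as strings" above.
    """
    if value is None:
        return include_null
    return str(value) in accepted


accepted = gather_filter_values(["1", "2"], [])
print(row_is_wanted(2, accepted, include_null=False))     # True
print(row_is_wanted(None, accepted, include_null=False))  # False
print(row_is_wanted(None, accepted, include_null=True))   # True
```

Comparing as strings means an integer 2 matches the string "2", but a float 2.0 would become "2.0" and would not match.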

permit_table_name(table_name: str) → bool

Should this table be permitted (judging only by its name)?
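One plausible reading of "judging only by its name", offered as an assumption rather than CRATE's actual rule: a table is rejected if it appears in the exclusion list; otherwise, if an inclusion list was given, the table must appear there. The hypothetical function `permit_name` below encodes that precedence; case-sensitive matching is also an assumption.

```python
from typing import Collection


def permit_name(table_name: str,
                include_tables: Collection[str],
                exclude_tables: Collection[str]) -> bool:
    """
    Hypothetical name-based permission check.
    ASSUMPTIONS: exclusion wins over inclusion; an empty inclusion list
    means "include everything not excluded"; matching is case-sensitive.
    """
    if table_name in exclude_tables:
        return False
    if include_tables:
        return table_name in include_tables
    return True


print(permit_name("patients", [], ["audit_log"]))   # True
print(permit_name("audit_log", [], ["audit_log"]))  # False
print(permit_name("notes", ["patients"], []))       # False
```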

property safe_dst_db_url: str

Password-obscured version of the destination database URL.

property safe_src_db_url: str

Password-obscured version of the source database URL.
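The idea behind these password-obscured URLs can be illustrated with the standard library alone. The hypothetical function `obscure_password` below masks the password in a URL such as `mysql+pymysql://user:secret@host/db`, so that the URL can be logged safely. It is a sketch, not CRATE's implementation; SQLAlchemy's `URL.render_as_string(hide_password=True)` provides similar masking for its own URL objects.

```python
from urllib.parse import urlsplit, urlunsplit


def obscure_password(url: str, mask: str = "***") -> str:
    """
    Hypothetical: replace the password in a URL's network location with a
    mask. URLs without a password are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.password is None:
        return url
    # netloc is "user:password@host:port"; split off the host part first.
    userinfo, _, hostinfo = parts.netloc.rpartition("@")
    username = userinfo.split(":", 1)[0]
    return urlunsplit(parts._replace(netloc=f"{username}:{mask}@{hostinfo}"))


print(obscure_password("mysql+pymysql://user:secret@localhost:3306/mydb"))
# mysql+pymysql://user:***@localhost:3306/mydb
```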

class crate_anon.anonymise.subset_db.Subsetter(cfg: SubsetConfig)

Class to take a subset of data from one database to another.

__init__(cfg: SubsetConfig) → None
column_names(table_name: str) → List[str]

Returns the column names of a (source) table.

commit() → None

Commit changes to the destination database.

contains_filter_col(table_name: str) → bool

Does this table contain our target filter column?
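For illustration only, the column look-ups above can be sketched against SQLite with the standard library, reading column names from `PRAGMA table_info`. CRATE's own methods work with SQLAlchemy `Table` objects (see `src_sqla_table`); the functions here are hypothetical stand-ins, and exact (case-sensitive) name matching is an assumption.

```python
import sqlite3
from typing import List


def column_names(conn: sqlite3.Connection, table_name: str) -> List[str]:
    """Hypothetical: column names of a SQLite table, in declaration order."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) rows.
    # The table name cannot be a bound parameter here, so it must be trusted.
    rows = conn.execute(f'PRAGMA table_info("{table_name}")').fetchall()
    return [row[1] for row in rows]


def contains_filter_col(conn: sqlite3.Connection, table_name: str,
                        filter_column: str) -> bool:
    """Hypothetical: does the table have the filter column?"""
    return filter_column in column_names(conn, table_name)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (patient_id INTEGER, note TEXT)")
conn.execute("CREATE TABLE lookup (code TEXT, meaning TEXT)")
print(column_names(conn, "notes"))                        # ['patient_id', 'note']
print(contains_filter_col(conn, "lookup", "patient_id"))  # False
```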

create_dst_table(table_name: str) → None

Create a table on the destination side.

drop_dst_table_if_exists(table_name: str) → None

Drop a table on the destination side. Also remove it from the destination metadata, so we can recreate it (if necessary) without complaint.

dst_sqla_table(table_name: str) → Table

Returns the SQLAlchemy Table from the destination database.

gen_filtered_rows(table_name: str) → Generator[Row, None, None]

Generate filtered source rows from the database.

gen_src_rows(table_name: str) → Generator[Row, None, None]

Generate unfiltered source rows from the database.
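The difference between the two generators can be sketched in one hypothetical function, `gen_rows`, using `sqlite3` rather than SQLAlchemy. With no filter column it yields every row (cf. `gen_src_rows`); otherwise it yields only rows whose filter value, compared as a string, is accepted, plus NULL rows if requested (cf. `gen_filtered_rows`). Filtering in Python after reading each row is a design choice made for this sketch: it keeps the string comparison independent of the column's SQL type, at the cost of reading every row, whereas a SQL `WHERE ... IN (...)` clause would push the work to the database.

```python
import sqlite3
from typing import Generator, Iterable, Optional, Tuple


def gen_rows(conn: sqlite3.Connection, table_name: str,
             filter_column: Optional[str] = None,
             accepted: Iterable[str] = (),
             include_null: bool = False) -> Generator[Tuple, None, None]:
    """
    Hypothetical generator over a source table. Without filter_column,
    every row is yielded; otherwise only wanted rows are yielded.
    """
    wanted = {str(v) for v in accepted}
    cursor = conn.execute(f'SELECT * FROM "{table_name}"')
    if filter_column is None:
        yield from cursor
        return
    idx = [d[0] for d in cursor.description].index(filter_column)
    for row in cursor:
        value = row[idx]
        if value is None:
            if include_null:
                yield row
        elif str(value) in wanted:
            yield row


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (patient_id INTEGER, note TEXT)")
conn.executemany("INSERT INTO notes VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (None, "c")])
print(list(gen_rows(conn, "notes", "patient_id", ["2"], include_null=True)))
```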

permit_table(table_name: str) → bool

Is this table name permitted to go through to the destination?

src_sqla_table(table_name: str) → Table

Returns the SQLAlchemy Table from the source database.

subset_db() → None

Main function – create a subset of the source database.

subset_table(table_name: str) → None

Read rows from the source table; filter them as required; store them in the destination table.
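The per-table step (read, filter, store) can be sketched with two SQLite connections. This sketch assumes that the destination table already exists with the same columns, which the real class presumably arranges via `drop_dst_table_if_exists` and `create_dst_table`. The function below shares the method's name but is a hypothetical standalone stand-in, not CRATE's implementation; it also holds the kept rows in memory for simplicity.

```python
import sqlite3
from typing import Set


def subset_table(src: sqlite3.Connection, dst: sqlite3.Connection,
                 table_name: str, filter_column: str,
                 accepted: Set[str], include_null: bool = False) -> int:
    """
    Hypothetical per-table step: read source rows, keep the wanted ones,
    and insert them into an already-created destination table.
    Returns the number of rows written; the caller is expected to commit.
    """
    cursor = src.execute(f'SELECT * FROM "{table_name}"')
    columns = [d[0] for d in cursor.description]
    idx = columns.index(filter_column)
    # Filter values are compared as strings; NULLs pass only if requested.
    rows = [
        row for row in cursor
        if (row[idx] is None and include_null)
        or (row[idx] is not None and str(row[idx]) in accepted)
    ]
    col_sql = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    dst.executemany(
        f'INSERT INTO "{table_name}" ({col_sql}) VALUES ({placeholders})',
        rows,
    )
    return len(rows)


src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
for conn in (src, dst):
    conn.execute("CREATE TABLE notes (patient_id INTEGER, note TEXT)")
src.executemany("INSERT INTO notes VALUES (?, ?)",
                [(1, "keep"), (2, "drop"), (None, "orphan")])
n = subset_table(src, dst, "notes", "patient_id", {"1"})
dst.commit()
print(n, dst.execute("SELECT * FROM notes").fetchall())  # 1 [(1, 'keep')]
```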

crate_anon.anonymise.subset_db.main() → None

Command-line entry point.

crate_anon.anonymise.subset_db.to_str(x: Any) → str | None

Convert to a string, or None.
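A minimal sketch of the conversion described, assuming that `None` passes through unchanged and everything else goes through `str()`. This is not CRATE's code; the real function may treat other inputs (for example, empty strings) differently.

```python
from typing import Any, Optional


def to_str(x: Any) -> Optional[str]:
    """Sketch: None stays None; anything else is converted with str()."""
    return None if x is None else str(x)


print(to_str(42))    # 42
print(to_str(None))  # None
```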