14.5.15. crate_anon.nlp_manager.input_field_config

crate_anon/nlp_manager/input_field_config.py


Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).

This file is part of CRATE.

CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with CRATE. If not, see <https://www.gnu.org/licenses/>.


Class to define input fields for NLP.

class crate_anon.nlp_manager.input_field_config.InputFieldConfig(nlpdef: NlpDefinition, cfg_input_name: str)

Class defining an input field for NLP (containing text).

See the documentation for the NLP config file.

__init__(nlpdef: NlpDefinition, cfg_input_name: str) → None

Read config from a configparser section, and also associate with a specific NLP definition.

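Parameters:

  • nlpdef (NlpDefinition): the NLP definition with which this input field is associated
  • cfg_input_name (str): name of the config file section that defines this input field
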
delete_all_progress_records() → None

Deletes all records from the progress database for this NLP definition (across all source tables/columns).

delete_progress_records_where_srcpk_not(temptable: Table | None) → None

If temptable is None, deletes all progress records for this input field/NLP definition.

If temptable is a table, deletes records from the progress database (from this input field/NLP definition) whose source PK is not in the temporary table. (Used for deleting NLP records when the source has subsequently been deleted.)
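The two branches described above can be illustrated with a minimal sketch. It uses the standard-library sqlite3 module rather than SQLAlchemy, takes a table name instead of a Table object, and the names progress, srcpkval and live_pks are hypothetical; it is not CRATE's implementation.

```python
import sqlite3
from typing import List, Optional


def delete_progress_where_srcpk_not(
    conn: sqlite3.Connection, temptable: Optional[str]
) -> None:
    """Delete progress rows (hypothetical table/column names)."""
    if temptable is None:
        # No temporary table supplied: remove every progress record.
        conn.execute("DELETE FROM progress")
    else:
        # Remove progress records whose source PK is absent from the
        # temporary table. The table name is interpolated into the SQL,
        # so it must come from trusted code, never from user input.
        conn.execute(
            f"DELETE FROM progress WHERE srcpkval NOT IN "
            f"(SELECT srcpkval FROM {temptable})"
        )
    conn.commit()


def demo_delete() -> List[int]:
    """Build a toy database, delete orphaned rows, return surviving PKs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE progress (srcpkval INTEGER PRIMARY KEY)")
    conn.execute("CREATE TEMP TABLE live_pks (srcpkval INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO progress VALUES (?)", [(1,), (2,), (3,)])
    # Source record 2 has been deleted, so only 1 and 3 are still live.
    conn.executemany("INSERT INTO live_pks VALUES (?)", [(1,), (3,)])
    delete_progress_where_srcpk_not(conn, "live_pks")
    rows = conn.execute("SELECT srcpkval FROM progress ORDER BY srcpkval")
    return [row[0] for row in rows]
```

In this toy run, the progress record for source PK 2 is removed because PK 2 no longer appears in the temporary table.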

gen_src_pks() → Generator[Tuple[int, str | None], None, None]

Generate integer PKs from the source table.

For tables with an integer PK, yields tuples: pk_value, None.

For tables with a string PK, yields tuples: pk_hash, pk_value.

  • Timing is subsumed under the timer named TIMING_DELETE_WHERE_NO_SOURCE.
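The shape of the yielded tuples can be sketched as follows. The hash function here (a truncated SHA-256 digest) and the names hash_pk_to_int and gen_pk_tuples are assumptions for illustration only; this section does not say which hash CRATE uses.

```python
import hashlib
from typing import Generator, Iterable, Optional, Tuple, Union


def hash_pk_to_int(pk: str) -> int:
    """Hypothetical stand-in: map a string PK to a signed 64-bit integer."""
    digest = hashlib.sha256(pk.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


def gen_pk_tuples(
    pks: Iterable[Union[int, str]]
) -> Generator[Tuple[int, Optional[str]], None, None]:
    """Yield (integer, optional string) pairs in the documented shape."""
    for pk in pks:
        if isinstance(pk, int):
            # Integer PK: the value itself, with no string component.
            yield pk, None
        else:
            # String PK: an integer hash plus the original string.
            yield hash_pk_to_int(pk), pk
```

Carrying the original string alongside its hash lets a caller tell apart two different string PKs that happen to hash to the same integer.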

gen_text(tasknum: int = 0, ntasks: int = 1) → Generator[Tuple[str, Dict[str, Any]], None, None]

Generate text strings from the source database, for NLP. Text values that are NULL, empty, or whitespace-only are skipped.

Yields:

tuple: text, dict, where text is the source text and dict is a column-to-value mapping for all other fields (source reference fields, copy fields).
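A minimal sketch of this behaviour over in-memory rows is given below. The function name gen_text_sketch and its rows, textfield and pkfield parameters are hypothetical, and dividing work between tasks by PK modulo ntasks is an assumption: this section states only that tasknum and ntasks exist, not how rows are assigned.

```python
from typing import Any, Dict, Generator, Iterable, Tuple


def gen_text_sketch(
    rows: Iterable[Dict[str, Any]],
    textfield: str,
    pkfield: str,
    tasknum: int = 0,
    ntasks: int = 1,
) -> Generator[Tuple[str, Dict[str, Any]], None, None]:
    """Yield (text, other_fields) for rows assigned to this task."""
    for row in rows:
        # ASSUMPTION: with several tasks, a row belongs to the task whose
        # number equals its integer PK modulo the task count.
        if ntasks > 1 and row[pkfield] % ntasks != tasknum:
            continue
        text = row[textfield]
        # Skip NULL, empty, and whitespace-only text.
        if text is None or not text.strip():
            continue
        # Everything except the text itself goes into the mapping.
        other = {k: v for k, v in row.items() if k != textfield}
        yield text, other
```

With the defaults (tasknum=0, ntasks=1) every row is considered, which corresponds to single-process operation.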

get_copy_columns() → List[Column]

Returns the columns that the user has requested to be copied from the source table to the NLP destination table.

Returns:

a list of SQLAlchemy Column objects

get_copy_indexes() → List[Index]

Returns indexes that should be made in the destination table for columns that the user has requested to be copied from the source.

Returns:

a list of SQLAlchemy Index objects

static get_core_columns_for_dest() → List[Column]

Returns the columns used in NLP destination tables, primarily describing the source. See Standard NLP output columns.

Returns:

a list of SQLAlchemy Column objects

static get_core_indexes_for_dest() → List[Index]

Returns the core indexes to be applied to the destination tables. Primarily, these are for columns that refer to the source.

Returns:

a list of SQLAlchemy Index objects

See

get_count() → int

Counts records in the source table.

Used for progress monitoring.

get_progress_record(srcpkval: int, srcpkstr: str | None = None) → NlpRecord | None

Fetch a progress record for the given source record, if one exists.

Returns:

crate_anon.nlp_manager.models.NlpRecord, or None
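The "record or None" lookup can be sketched with an in-memory store keyed on the integer PK value and the optional string PK. The class ProgressRecordSketch, the function get_progress_record_sketch and the store layout are hypothetical stand-ins, not the NlpRecord model or the real database query.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Key: (integer PK value, optional string PK).
ProgressKey = Tuple[int, Optional[str]]


@dataclass
class ProgressRecordSketch:
    """Hypothetical stand-in for a stored progress record."""
    srcpkval: int
    srcpkstr: Optional[str] = None


def get_progress_record_sketch(
    store: Dict[ProgressKey, ProgressRecordSketch],
    srcpkval: int,
    srcpkstr: Optional[str] = None,
) -> Optional[ProgressRecordSketch]:
    """Return the matching record, or None if there is none."""
    return store.get((srcpkval, srcpkstr))
```

A caller would typically treat a None result as "this source record has not been processed yet".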

is_pk_integer() → bool

Is the primary key (PK) of the source table an integer?

property source_session: Session

Returns the SQLAlchemy ORM Session for the source database.

property srcdatetimefield: str

Returns the name of the field (column) in the source table that defines the date/time of the source text.

property srcdb: str

Returns the name of the source database.

property srcfield: str

Returns the name of the text field (column) in the source table.

property srcpkfield: str

Returns the name of the primary key (PK) field (column) in the source table.

property srctable: str

Returns the name of the source table.