14.5.15. crate_anon.nlp_manager.input_field_config

crate_anon/nlp_manager/input_field_config.py


Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).

This file is part of CRATE.

CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with CRATE. If not, see <https://www.gnu.org/licenses/>.


Class to define input fields for NLP.

class crate_anon.nlp_manager.input_field_config.InputFieldConfig(nlpdef: crate_anon.nlp_manager.nlp_definition.NlpDefinition, cfg_input_name: str)

Class defining an input field for NLP (containing text).

See the documentation for the NLP config file.

__init__(nlpdef: crate_anon.nlp_manager.nlp_definition.NlpDefinition, cfg_input_name: str) -> None

Read config from a configparser section, and also associate with a specific NLP definition.

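Parameters
  • nlpdef – the NlpDefinition to associate this input field with
  • cfg_input_name – name of the configparser section to read

delete_all_progress_records() -> None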

Deletes all records from the progress database for this NLP definition (across all source tables/columns).

delete_progress_records_where_srcpk_not(temptable: Optional[sqlalchemy.sql.schema.Table]) -> None

If temptable is None, deletes all progress records for this input field/NLP definition.

If temptable is a table, deletes records from the progress database (from this input field/NLP definition) whose source PK is not in the temporary table. (Used for deleting NLP records when the source has subsequently been deleted.)
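The two cases above can be sketched with the standard-library sqlite3 module. This is an illustration only, not CRATE's implementation (which works through SQLAlchemy): the table name progress, the column name srcpkval, and the helper names are all hypothetical.

```python
import sqlite3
from typing import List, Optional


def delete_progress_where_srcpk_not(
    conn: sqlite3.Connection, temptable: Optional[str]
) -> None:
    """Delete progress rows whose source PK is absent from temptable.

    Hypothetical schema: progress(srcpkval) plus a temporary table with a
    single srcpkval column listing the PKs still present in the source.
    """
    if temptable is None:
        # No temporary table: remove every progress record.
        conn.execute("DELETE FROM progress")
    else:
        # Table names cannot be bound as parameters; temptable is assumed
        # to be a trusted, internally generated name.
        conn.execute(
            "DELETE FROM progress WHERE srcpkval NOT IN "
            f"(SELECT srcpkval FROM {temptable})"
        )
    conn.commit()


def demo(temptable: Optional[str]) -> List[int]:
    """Build a tiny in-memory example and return the surviving PKs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE progress (srcpkval INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO progress VALUES (?)", [(1,), (2,), (3,)])
    # Source record 2 has been deleted, so only 1 and 3 remain "live".
    conn.execute("CREATE TEMP TABLE live_pks (srcpkval INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO live_pks VALUES (?)", [(1,), (3,)])
    delete_progress_where_srcpk_not(conn, temptable)
    rows = conn.execute("SELECT srcpkval FROM progress ORDER BY srcpkval")
    return [r[0] for r in rows]
```

Calling demo("live_pks") leaves only the progress records whose source rows still exist, while demo(None) clears the table.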

gen_src_pks() -> Generator[Tuple[int, Optional[str]], None, None]

Generate integer PKs from the source table.

For tables with an integer PK, yields tuples: pk_value, None.

For tables with a string PK, yields tuples: pk_hash, pk_value.

Timing is subsumed under the timer named TIMING_DELETE_WHERE_NO_SOURCE.
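The shape of the yielded tuples can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the PKs come from a plain iterable rather than a database query, and hash_to_int64 is a stand-in built on hashlib, since this page does not say which hash function CRATE uses.

```python
import hashlib
from typing import Generator, Iterable, Optional, Tuple, Union


def hash_to_int64(value: str) -> int:
    """Hypothetical stand-in: map a string to a signed 64-bit integer."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


def gen_src_pks_sketch(
    pks: Iterable[Union[int, str]], pk_is_integer: bool
) -> Generator[Tuple[int, Optional[str]], None, None]:
    """Yield (int, None) for integer PKs, or (hash, original) for string PKs."""
    for pk in pks:
        if pk_is_integer:
            yield int(pk), None
        else:
            pk_str = str(pk)
            yield hash_to_int64(pk_str), pk_str
```

Either way, the first element of each tuple is an integer, so downstream code can store and compare it uniformly; the second element preserves the original string PK when there is one.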

gen_text(tasknum: int = 0, ntasks: int = 1) -> Generator[Tuple[str, Dict[str, Any]], None, None]

Generate text strings from the source database, for NLP. Text fields that are NULL, empty, or whitespace-only are skipped.

Yields

tuple: text, dict, where text is the source text and dict is a column-to-value mapping for all other fields (source reference fields, copy fields).
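A minimal sketch of this behaviour, operating on in-memory rows instead of a database. The function and parameter names are hypothetical, and the way work is divided between tasks (integer PK modulo ntasks) is an assumption made for illustration; this page does not state how tasknum and ntasks are applied.

```python
from typing import Any, Dict, Generator, Iterable, Tuple


def gen_text_sketch(
    rows: Iterable[Dict[str, Any]],
    textfield: str,
    pkfield: str,
    tasknum: int = 0,
    ntasks: int = 1,
) -> Generator[Tuple[str, Dict[str, Any]], None, None]:
    """Yield (text, other_fields) for rows with non-blank text."""
    if ntasks < 1 or not 0 <= tasknum < ntasks:
        raise ValueError("need ntasks >= 1 and 0 <= tasknum < ntasks")
    for row in rows:
        # Assumed work split: each task takes the rows whose integer PK
        # falls into its own residue class modulo ntasks.
        if ntasks > 1 and row[pkfield] % ntasks != tasknum:
            continue
        text = row[textfield]
        # Skip NULL, empty, and whitespace-only text.
        if text is None or not text.strip():
            continue
        other = {k: v for k, v in row.items() if k != textfield}
        yield text, other
```

With ntasks=1 every row is considered; with several tasks, each task sees a disjoint subset of rows, so the tasks can run in parallel without processing the same record twice.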

get_copy_columns() -> List[sqlalchemy.sql.schema.Column]

Returns the columns that the user has requested to be copied from the source table to the NLP destination table.

Returns

a list of SQLAlchemy Column objects

get_copy_indexes() -> List[sqlalchemy.sql.schema.Index]

Returns indexes that should be made in the destination table for columns that the user has requested to be copied from the source.

Returns

a list of SQLAlchemy Index objects

static get_core_columns_for_dest() -> List[sqlalchemy.sql.schema.Column]

Returns the columns used in NLP destination tables, primarily describing the source. See Standard NLP output columns.

Returns

a list of SQLAlchemy Column objects

static get_core_indexes_for_dest() -> List[sqlalchemy.sql.schema.Index]

Returns the core indexes to be applied to the destination tables. Primarily, these are for columns that refer to the source.

Returns

a list of SQLAlchemy Index objects

See

get_count() -> int

Counts records in the source table.

Used for progress monitoring.

get_progress_record(srcpkval: int, srcpkstr: Optional[str] = None) -> Optional[crate_anon.nlp_manager.models.NlpRecord]

Fetch a progress record for the given source record, if one exists.

Returns

crate_anon.nlp_manager.models.NlpRecord, or None

is_pk_integer() -> bool

Is the primary key (PK) of the source table an integer?

property source_session: sqlalchemy.orm.session.Session

Returns the SQLAlchemy ORM Session for the source database.

property srcdatetimefield: str

Returns the name of the field (column) in the source table that defines the date/time of the source text.

property srcdb: str

Returns the name of the source database.

property srcfield: str

Returns the name of the text field (column) in the source table.

property srcpkfield: str

Returns the name of the primary key (PK) field (column) in the source table.

property srctable: str

Returns the name of the source table.