14.5.15. crate_anon.nlp_manager.input_field_config
crate_anon/nlp_manager/input_field_config.py
Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).
This file is part of CRATE.
CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with CRATE. If not, see <https://www.gnu.org/licenses/>.
Class to define input fields for NLP.
- class crate_anon.nlp_manager.input_field_config.InputFieldConfig(nlpdef: NlpDefinition, cfg_input_name: str)[source]
Class defining an input field for NLP (containing text).
See the documentation for the NLP config file.
- __init__(nlpdef: NlpDefinition, cfg_input_name: str) None [source]
Read config from a configparser section, and also associate with a specific NLP definition.
- Parameters:
nlpdef – crate_anon.nlp_manager.nlp_definition.NlpDefinition, the master NLP definition, referring to the master config file etc.
cfg_input_name – config section name for the input field definition
- delete_all_progress_records() None [source]
Deletes all records from the progress database for this NLP definition (across all source tables/columns).
- delete_progress_records_where_srcpk_not(temptable: Table | None) None [source]
If temptable is None, deletes all progress records for this input field/NLP definition.
If temptable is a table, deletes records from the progress database (from this input field/NLP definition) whose source PK is not in the temporary table. (Used for deleting NLP records when the source has subsequently been deleted.)
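The two branches described above can be illustrated with a minimal sketch using only the standard-library sqlite3 module. This is not CRATE's implementation (which works through SQLAlchemy): the table names progress and current_pks, the column name srcpkval, and all function names here are hypothetical and chosen only for illustration.

```python
import sqlite3
from typing import List, Optional


def delete_progress_where_srcpk_not(
    conn: sqlite3.Connection, temptable: Optional[str]
) -> None:
    """Hypothetical stand-in for the method described above."""
    if temptable is None:
        # No temporary table supplied: delete every progress record.
        conn.execute("DELETE FROM progress")
    else:
        # Delete progress rows whose source PK is absent from the temporary
        # table. The table name is interpolated into the SQL, so it must
        # come from trusted code, never from user input.
        conn.execute(
            "DELETE FROM progress WHERE srcpkval NOT IN "
            f"(SELECT srcpkval FROM {temptable})"
        )
    conn.commit()


def make_progress_db() -> sqlite3.Connection:
    """Build an in-memory demo database (all names are assumptions)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE progress (srcpkval INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE current_pks (srcpkval INTEGER PRIMARY KEY)")
    # Progress exists for source PKs 1, 2, 3 ...
    conn.executemany("INSERT INTO progress VALUES (?)", [(1,), (2,), (3,)])
    # ... but the source now contains only PKs 1 and 3.
    conn.executemany("INSERT INTO current_pks VALUES (?)", [(1,), (3,)])
    conn.commit()
    return conn


def remaining_pks(conn: sqlite3.Connection) -> List[int]:
    """Return the source PKs still present in the progress table."""
    rows = conn.execute("SELECT srcpkval FROM progress ORDER BY srcpkval")
    return [row[0] for row in rows]
```

In this sketch, calling delete_progress_where_srcpk_not(conn, "current_pks") on the demo database removes only the record for source PK 2, whose source row no longer exists; passing None instead clears the progress table entirely.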
- gen_src_pks() Generator[Tuple[int, str | None], None, None] [source]
Generate integer PKs from the source table.
For tables with an integer PK, yields tuples: pk_value, None.
For tables with a string PK, yields tuples: pk_hash, pk_value.
Timing is subsumed under the timer named TIMING_DELETE_WHERE_NO_SOURCE.
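The shape of the yielded tuples can be sketched as follows. This is an illustration only: the function names are hypothetical, the PKs come from a plain iterable rather than a database query, and the hash shown (the first 8 bytes of SHA-256, read as a signed 64-bit integer) is an assumed stand-in, not necessarily the hash that CRATE uses.

```python
import hashlib
from typing import Generator, Iterable, Optional, Tuple, Union


def hash_to_int64(value: str) -> int:
    """Assumed stand-in hash: map a string to a signed 64-bit integer."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


def gen_src_pks_sketch(
    pks: Iterable[Union[int, str]]
) -> Generator[Tuple[int, Optional[str]], None, None]:
    """Yield (integer, optional string) pairs in the documented shape."""
    for pk in pks:
        if isinstance(pk, int):
            # Integer PK: the value itself, with no string component.
            yield pk, None
        else:
            # String PK: an integer hash of the string, plus the string.
            yield hash_to_int64(pk), pk
```

Either way, the first element of each tuple is an integer, so downstream code can store and index it uniformly; the second element lets the original string PK be recovered where there is one.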
- gen_text(tasknum: int = 0, ntasks: int = 1) Generator[Tuple[str, Dict[str, Any]], None, None] [source]
Generate text strings from the source database, for NLP. Text fields that are NULL, empty, or whitespace-only are skipped.
- Yields:
tuple – text, dict, where text is the source text and dict is a column-to-value mapping for all other fields (source reference fields, copy fields).
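A minimal sketch of a generator with this behaviour, using the standard-library sqlite3 module, is shown below. The table and column names (notes, pk, note, patient_id) and the function names are hypothetical. Dividing work between tasks by taking the integer PK modulo ntasks is an assumption made for this sketch; the description above does not say how tasknum and ntasks are applied.

```python
import sqlite3
from typing import Any, Dict, Generator, Tuple


def gen_text_sketch(
    conn: sqlite3.Connection, tasknum: int = 0, ntasks: int = 1
) -> Generator[Tuple[str, Dict[str, Any]], None, None]:
    """Yield (text, other_fields) for non-blank text belonging to this task."""
    if ntasks < 1 or not 0 <= tasknum < ntasks:
        raise ValueError("need ntasks >= 1 and 0 <= tasknum < ntasks")
    cursor = conn.execute("SELECT pk, note, patient_id FROM notes ORDER BY pk")
    for pk, text, patient_id in cursor:
        # Assumed work-splitting rule: each task takes PKs with one remainder.
        if pk % ntasks != tasknum:
            continue
        # Skip NULL, empty, and whitespace-only text.
        if text is None or not text.strip():
            continue
        # The dict carries the other fields alongside the text.
        yield text, {"pk": pk, "patient_id": patient_id}


def make_notes_db() -> sqlite3.Connection:
    """Build an in-memory demo source table (all names are assumptions)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE notes (pk INTEGER PRIMARY KEY, note TEXT, patient_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO notes VALUES (?, ?, ?)",
        [(1, "hello", 10), (2, None, 10), (3, "   ", 11), (4, "world", 11), (5, "", 12)],
    )
    conn.commit()
    return conn
```

With the demo data, a single task yields only the rows containing "hello" and "world"; with ntasks=2, task 0 yields "world" (PK 4) and task 1 yields "hello" (PK 1).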
- get_copy_columns() List[Column] [source]
Returns the columns that the user has requested to be copied from the source table to the NLP destination table.
- Returns:
a list of SQLAlchemy Column objects
- get_copy_indexes() List[Index] [source]
Returns indexes that should be made in the destination table for columns that the user has requested to be copied from the source.
- Returns:
a list of SQLAlchemy Index objects
- static get_core_columns_for_dest() List[Column] [source]
Returns the columns used in NLP destination tables, primarily describing the source. See Standard NLP output columns.
- Returns:
a list of SQLAlchemy Column objects
- static get_core_indexes_for_dest() List[Index] [source]
Returns the core indexes to be applied to the destination tables. Primarily, these are for columns that refer to the source.
- Returns:
a list of SQLAlchemy Index objects
See
- get_progress_record(srcpkval: int, srcpkstr: str | None = None) NlpRecord | None [source]
Fetch a progress record for the given source record, if one exists.
- Returns:
the NlpRecord for the given source record, or None if no progress record exists
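The lookup-or-None behaviour can be sketched with the standard-library sqlite3 module. This is an illustration under stated assumptions, not CRATE's ORM query: the progress table, its columns (srcpkval, srcpkstr, status), and the function names are hypothetical. Matching on the string PK as well as the integer value when srcpkstr is given is also an assumption, made here because two different strings could in principle share a hash.

```python
import sqlite3
from typing import Optional, Tuple


def get_progress_record_sketch(
    conn: sqlite3.Connection, srcpkval: int, srcpkstr: Optional[str] = None
) -> Optional[Tuple]:
    """Return the matching progress row, or None if there is none."""
    if srcpkstr is None:
        # Integer PK: the integer value alone identifies the source record.
        cur = conn.execute(
            "SELECT srcpkval, srcpkstr, status FROM progress "
            "WHERE srcpkval = ?",
            (srcpkval,),
        )
    else:
        # String PK: match the integer hash and the original string.
        cur = conn.execute(
            "SELECT srcpkval, srcpkstr, status FROM progress "
            "WHERE srcpkval = ? AND srcpkstr = ?",
            (srcpkval, srcpkstr),
        )
    # fetchone() returns None when the query produced no rows.
    return cur.fetchone()


def make_lookup_db() -> sqlite3.Connection:
    """Build an in-memory demo progress table (all names are assumptions)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE progress (srcpkval INTEGER, srcpkstr TEXT, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO progress VALUES (?, ?, ?)",
        [(1, None, "done"), (99, "abc", "done")],
    )
    conn.commit()
    return conn
```

A caller would typically treat a None result as "this source record has not been processed yet" and anything else as a record to compare against the current source.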
- property source_session: Session
Returns the SQLAlchemy ORM
Session
for the source database.
- property srcdatetimefield: str
Returns the name of the field (column) in the source table that defines the date/time of the source text.
- property srcdb: str
Returns the name of the source database.
- property srcfield: str
Returns the name of the text field (column) in the source table.
- property srcpkfield: str
Returns the name of the primary key (PK) field (column) in the source table.
- property srctable: str
Returns the name of the source table.