14.5.18. crate_anon.nlp_manager.nlp_definition
crate_anon/nlp_manager/nlp_definition.py
Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).
This file is part of CRATE.
CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with CRATE. If not, see <https://www.gnu.org/licenses/>.
NLP definition class.
- class crate_anon.nlp_manager.nlp_definition.NlpDefinition(nlpname: str, logtag: str = '')
Class representing an NLP master definition as read from config file.
An NLP definition represents the combination of (a) one or more NLP processors (e.g. “CRATE’s C-reactive protein finder”) and (b) one or more input fields in the source database.
The NLP definition can therefore be used to say “run this set of NLP processors over this set of textual fields in my database”.
See the documentation for the NLP config file.
- __init__(nlpname: str, logtag: str = '') None
Read config from file.
- Parameters:
nlpname – config section name for this NLP definition
logtag – text that may be passed to child processes to identify the NLP definition in their log output
- commit(session: Session) None
Executes a COMMIT on a specific session.
- Parameters:
session – SQLAlchemy ORM Session
- commit_all() None
Executes a COMMIT on all databases (all destination databases and the progress database).
- get_cloud_config() CloudConfig | None
Returns the crate_anon.nlp_manager.cloud_config.CloudConfig object associated with this NLP definition, or None if there isn’t one.
- get_cloud_config_or_raise() CloudConfig
Returns the crate_anon.nlp_manager.cloud_config.CloudConfig object associated with this NLP definition, or raises ValueError if there isn’t one.
- get_config_section(section: str) ConfigSection
Returns a crate_anon.common.extendedconfigparser.ConfigSection referring to a (potentially different) section.
- Parameters:
section – new section name
- get_database(name_and_cfg_section: str, with_session: bool = True, with_conn: bool = False, reflect: bool = False) DatabaseHolder
Returns a crate_anon.anonymise.dbholder.DatabaseHolder from the config file, containing information about a database.
- Parameters:
name_and_cfg_section – string that is the name of the database, and also the config file section name describing the database
with_session – create an SQLAlchemy Session?
with_conn – create an SQLAlchemy connection (via an Engine)?
reflect – read the database structure (when required)?
- get_env_dict(env_section_name: str, parent_env: Dict[str, str] | None = None) Dict[str, str]
Gets an operating system environment variable dictionary (variable: value mapping) from the config file.
- Parameters:
env_section_name – config section name, without its “env:” prefix
parent_env – optional starting point (e.g. parent OS environment)
- Returns:
a dictionary suitable for use as an OS environment
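The behaviour described for get_env_dict() can be illustrated with a minimal sketch using only the standard library. It assumes that the parent environment is copied and then overlaid with the variables from the "env:"-prefixed config section; the function name env_dict_from_config, the section name MY_ENV and the variable GATE_HOME are hypothetical, and the sketch is not CRATE's actual implementation.

```python
import configparser
from typing import Dict, Optional

ENV_PREFIX = "env:"  # prefix mentioned in the parameter description above


def env_dict_from_config(
    parser: configparser.ConfigParser,
    env_section_name: str,
    parent_env: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """
    Hypothetical stand-in for get_env_dict(): start from a copy of the
    parent environment (if any) and overlay the config section's values.
    """
    env = dict(parent_env) if parent_env else {}
    section = ENV_PREFIX + env_section_name
    for name, value in parser.items(section):
        env[name] = value
    return env


# Build a tiny in-memory config (assumed layout, for illustration only).
parser = configparser.ConfigParser(interpolation=None)
parser.optionxform = str  # keep variable names case-sensitive
parser.read_string("""
[env:MY_ENV]
GATE_HOME = /opt/gate
""")

env = env_dict_from_config(parser, "MY_ENV", parent_env={"PATH": "/usr/bin"})
```

A dictionary built this way could be passed as the env argument of subprocess.Popen when launching a child process.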
- get_transation_limiter(session: Session) TransactionSizeLimiter
Returns (or creates and returns) a transaction limiter for a given SQLAlchemy session.
- Parameters:
session – SQLAlchemy ORM Session
- Returns:
a TransactionSizeLimiter for the given session
- hash(text: str) str
Hash text via this NLP definition’s hasher. The hash is stored in a secret progress database and used to detect later changes in the source records.
- Parameters:
text – text (typically from the source database) to be hashed
- Returns:
the hashed value
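The idea of detecting changes by comparing hashes can be sketched as follows. This is an illustration only: it assumes a keyed HMAC-SHA256 hasher, whereas the hasher that CRATE actually uses is determined by its configuration. The function name hash_text, the key and the sample texts are all hypothetical.

```python
import hashlib
import hmac


def hash_text(text: str, key: str) -> str:
    """Hypothetical keyed hasher: returns a hex digest of the text."""
    return hmac.new(
        key.encode("utf-8"), text.encode("utf-8"), hashlib.sha256
    ).hexdigest()


SECRET_KEY = "not-a-real-key"  # assumption: a secret known only to the NLP run

# First run: hash the source text and store the hash (not the text itself).
stored_hash = hash_text("CRP 10 mg/L", SECRET_KEY)

# Later run: re-hash the current source text and compare with the stored hash.
unchanged = hash_text("CRP 10 mg/L", SECRET_KEY) == stored_hash
changed = hash_text("CRP 12 mg/L", SECRET_KEY) != stored_hash
```

Storing only the hash means that the progress database can record which source records have been processed without holding the source text itself.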
- property inputfieldconfigs: Iterable[InputFieldConfig]
Returns all input field configurations used by this NLP definition.
- Returns:
list of crate_anon.nlp_manager.input_field_config.InputFieldConfig objects
- property logtag: str
Returns the log tag of the NLP definition (may be used by child processes to provide more information for logs).
- property name: str
Returns the name of the NLP definition.
- nlprp_local_processors(sql_dialect: str | None = None) Dict[str, Any]
Returns a draft list of processors as per the NLPRP list_processors command.
- nlprp_local_processors_json(indent: int = 4, sort_keys: bool = True, sql_dialect: str | None = None) str
Returns a formatted JSON string from nlprp_local_processors(). This is primarily for debugging.
- Parameters:
indent – number of spaces for indentation
sort_keys – sort keys?
sql_dialect – preferred SQL dialect for tabular_schema, or None for default
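The effect of the indent and sort_keys parameters can be seen in a short sketch based on the standard json module. It assumes that the formatting step is equivalent to json.dumps(); the function processors_json and the contents of the draft dictionary are hypothetical and do not reflect the real NLPRP response structure.

```python
import json
from typing import Any, Dict


def processors_json(
    processors: Dict[str, Any], indent: int = 4, sort_keys: bool = True
) -> str:
    """Hypothetical formatter: serialise a processor description to JSON."""
    return json.dumps(processors, indent=indent, sort_keys=sort_keys)


# Made-up draft content, for illustration only.
draft = {
    "version": "0.1",
    "name": "example_processor",
}

text = processors_json(draft)
```

With sort_keys enabled, "name" is written before "version" regardless of insertion order, which makes successive debugging dumps easier to compare.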
- property noncloud_processors: List[BaseNlpParser]
Returns all local (non-cloud) NLP processors used by this NLP definition.
- Returns:
list of objects derived from crate_anon.nlp_manager.base_nlp_parser.BaseNlpParser
- notify_transaction(session: Session, n_rows: int, n_bytes: int, force_commit: bool = False) None
Tell our transaction limiter about a transaction that’s occurred on one of our databases. This may trigger a COMMIT.
- Parameters:
session – SQLAlchemy ORM Session that was used
n_rows – number of rows inserted
n_bytes – number of bytes inserted
force_commit – force a COMMIT?
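The interaction between notify_transaction() and a transaction limiter can be sketched with a small stand-alone class. This is a simplified model under stated assumptions: the class SimpleTransactionLimiter, its max_rows and max_bytes thresholds, and the commit callback are hypothetical and are not the TransactionSizeLimiter API.

```python
from typing import Callable, Optional


class SimpleTransactionLimiter:
    """
    Hypothetical limiter: counts rows and bytes written since the last
    COMMIT and calls a commit function when a threshold is reached.
    """

    def __init__(
        self,
        commit: Callable[[], None],
        max_rows: Optional[int] = None,
        max_bytes: Optional[int] = None,
    ) -> None:
        self._commit = commit
        self._max_rows = max_rows
        self._max_bytes = max_bytes
        self._rows = 0
        self._bytes = 0

    def notify(self, n_rows: int, n_bytes: int, force_commit: bool = False) -> None:
        """Record some inserted data; commit if forced or over a limit."""
        self._rows += n_rows
        self._bytes += n_bytes
        over_rows = self._max_rows is not None and self._rows >= self._max_rows
        over_bytes = self._max_bytes is not None and self._bytes >= self._max_bytes
        if force_commit or over_rows or over_bytes:
            self._commit()
            self._rows = 0
            self._bytes = 0


commits = []  # stands in for a real database session's COMMIT
limiter = SimpleTransactionLimiter(lambda: commits.append("COMMIT"), max_rows=3)
limiter.notify(n_rows=2, n_bytes=100)  # below the limit: no commit
limiter.notify(n_rows=1, n_bytes=50)   # reaches 3 rows: commit
limiter.notify(n_rows=1, n_bytes=10, force_commit=True)  # forced commit
```

Batching commits in this way bounds the size of any single transaction while avoiding a COMMIT after every row.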
- property now: datetime
Returns the time this NLP definition was created (in UTC). Used to time-stamp NLP runs.
- property parser: ExtendedConfigParser
Returns the crate_anon.common.extendedconfigparser.ExtendedConfigParser in use.
- property processors: List[TableMaker]
Returns all NLP processors used by this NLP definition.
- Returns:
list of objects derived from crate_anon.nlp_manager.base_nlp_parser.BaseNlpParser
- property progdb: DatabaseHolder
Returns the progress database.
- property progressdb_engine: Engine
Returns an SQLAlchemy Core Engine for the progress database.
- property progressdb_metadata: MetaData
Returns the SQLAlchemy MetaData for the progress database.
- property progressdb_session: Session
Returns an SQLAlchemy ORM Session for the progress database.
- set_echo(echo: bool) None
Set the SQLAlchemy echo parameter (to echo SQL) for all our source databases.
- property temporary_tablename: str
Temporary tablename to use.
See the documentation for the NLP config file.
- property uses_cloud_processors: bool
Are any of our processors cloud-based?
- crate_anon.nlp_manager.nlp_definition.demo_nlp_config() str
Returns a demo NLP config file for CRATE.
- crate_anon.nlp_manager.nlp_definition.get_nlp_config_filename_or_exit() str
Returns the config filename, from our environment variable. If we can’t retrieve it, perform a hard exit.
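The "read from the environment or exit" pattern can be sketched as below. The environment variable name EXAMPLE_NLP_CONFIG, the function name config_filename_or_exit and the exit code are hypothetical; the variable that CRATE actually reads is not stated here.

```python
import os
import sys

EXAMPLE_ENV_VAR = "EXAMPLE_NLP_CONFIG"  # hypothetical variable name


def config_filename_or_exit(env_var: str = EXAMPLE_ENV_VAR) -> str:
    """
    Hypothetical stand-in: return the filename held in an environment
    variable, or terminate the program if it is missing or empty.
    """
    filename = os.environ.get(env_var)
    if not filename:
        print(f"Environment variable {env_var} is not set", file=sys.stderr)
        sys.exit(1)  # raises SystemExit, ending the program
    return filename


os.environ[EXAMPLE_ENV_VAR] = "/path/to/example_nlp_config.ini"
filename = config_filename_or_exit()
```

Exiting early keeps the failure close to its cause, so that a missing configuration is reported before any database work begins.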