14.5.18. crate_anon.nlp_manager.nlp_definition
crate_anon/nlp_manager/nlp_definition.py
Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).
This file is part of CRATE.
CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with CRATE. If not, see <https://www.gnu.org/licenses/>.
NLP definition class.
- class crate_anon.nlp_manager.nlp_definition.NlpDefinition(nlpname: str, logtag: str = '')[source]
Class representing an NLP master definition as read from config file.
An NLP definition represents the combination of:
one or more NLP processors (e.g. “CRATE’s C-reactive protein finder”);
one or more input fields in the source database.
The NLP definition can therefore be used to say “run this set of NLP processors over this set of textual fields in my database”.
See the documentation for the NLP config file.
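As a rough illustration of the idea only (this is not CRATE's implementation), the pairing of processors and input fields can be pictured as a nested loop. Every name below (find_crp, run_definition, the field names) is a hypothetical stand-in, and the "processor" is a toy regular expression.

```python
import re
from typing import Callable, Dict, List, Tuple


def find_crp(text: str) -> List[str]:
    """Toy stand-in for an NLP processor: numbers that follow 'CRP'."""
    return re.findall(r"\bCRP\s+(\d+(?:\.\d+)?)", text)


def run_definition(
    processors: Dict[str, Callable[[str], List[str]]],
    input_fields: Dict[str, List[str]],
) -> List[Tuple[str, str, str]]:
    """Run every processor over every text value of every input field."""
    results = []
    for field_name, texts in input_fields.items():
        for text in texts:
            for proc_name, proc in processors.items():
                for value in proc(text):
                    results.append((field_name, proc_name, value))
    return results


# Hypothetical data: two text fields, one processor.
results = run_definition(
    processors={"crp": find_crp},
    input_fields={
        "notes.text": ["CRP 12.5 today", "no bloods taken"],
        "letters.body": ["Her CRP 7 was noted"],
    },
)
```

Here results holds one (field, processor, value) tuple per match, which is the shape of "this set of processors over this set of fields" in miniature.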
- __init__(nlpname: str, logtag: str = '') None [source]
Read config from file.
- Parameters
nlpname – config section name for this NLP definition
logtag – text that may be passed to child processes to identify the NLP definition in their log output
- commit(session: sqlalchemy.orm.session.Session) None [source]
Executes a COMMIT on a specific session.
- Parameters
session – SQLAlchemy ORM Session
- commit_all() None [source]
Executes a COMMIT on all databases (all destination databases and the progress database).
- get_cloud_config() Optional[crate_anon.nlp_manager.cloud_config.CloudConfig] [source]
Returns the crate_anon.nlp_manager.cloud_config.CloudConfig object associated with this NLP definition, or None if there isn’t one.
- get_cloud_config_or_raise() crate_anon.nlp_manager.cloud_config.CloudConfig [source]
Returns the crate_anon.nlp_manager.cloud_config.CloudConfig object associated with this NLP definition, or raises ValueError if there isn’t one.
- get_config_section(section: str) crate_anon.common.extendedconfigparser.ConfigSection [source]
Returns a crate_anon.common.extendedconfigparser.ConfigSection referring to a (potentially different) section.
- Parameters
section – New section name.
- get_database(name_and_cfg_section: str, with_session: bool = True, with_conn: bool = False, reflect: bool = False) crate_anon.anonymise.dbholder.DatabaseHolder [source]
Returns a crate_anon.anonymise.dbholder.DatabaseHolder from the config file, containing information about a database.
- Parameters
name_and_cfg_section – string that is the name of the database, and also the config file section name describing the database
with_session – create an SQLAlchemy Session?
with_conn – create an SQLAlchemy connection (via an Engine)?
reflect – read the database structure (when required)?
- get_env_dict(env_section_name: str, parent_env: Optional[Dict[str, str]] = None) Dict[str, str] [source]
Gets an operating system environment variable dictionary (variable: value mapping) from the config file.
- Parameters
env_section_name – config section name, without its “env:” prefix
parent_env – optional starting point (e.g. parent OS environment)
- Returns
a dictionary suitable for use as an OS environment
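A minimal sketch of the behaviour described above, using only the standard library's configparser rather than CRATE's own parser. The function name env_dict_from_config, the section contents, and the paths are assumptions made for illustration; the real method may differ in detail.

```python
import configparser
from typing import Dict, Optional


def env_dict_from_config(
    parser: configparser.ConfigParser,
    env_section_name: str,
    parent_env: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Copy parent_env (if given), then overlay variables from [env:<name>]."""
    env = dict(parent_env) if parent_env else {}
    section = f"env:{env_section_name}"  # name is passed without its prefix
    if parser.has_section(section):
        for variable, value in parser.items(section):
            env[variable] = value
    return env


parser = configparser.ConfigParser(interpolation=None)
parser.optionxform = str  # keep variable names case-sensitive
parser.read_string(
    """
[env:myenv]
GATE_HOME = /opt/gate
PATH = /opt/gate/bin
"""
)

env = env_dict_from_config(
    parser,
    "myenv",
    parent_env={"HOME": "/home/me", "PATH": "/usr/bin"},
)
```

In this sketch, values from the config section override those inherited from parent_env, so PATH takes the configured value while HOME is carried over unchanged.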
- get_transation_limiter(session: sqlalchemy.orm.session.Session) crate_anon.common.sql.TransactionSizeLimiter [source]
Returns (or creates and returns) a transaction limiter for a given SQLAlchemy session.
- Parameters
session – SQLAlchemy ORM Session
- Returns
a crate_anon.common.sql.TransactionSizeLimiter
- hash(text: str) str [source]
Hashes text via this NLP definition’s hasher. The hash is stored in a secret progress database and used to detect later changes in the source records.
- Parameters
text – text (typically from the source database) to be hashed
- Returns
the hashed value
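The change-detection idea can be sketched as follows. The choice of HMAC-SHA256, the key, and the helper name make_hasher are all assumptions for illustration; the hasher that CRATE actually uses is determined by its own configuration.

```python
import hashlib
import hmac
from typing import Callable


def make_hasher(secret_key: str) -> Callable[[str], str]:
    """Return a keyed hash function (HMAC-SHA256 here, as an assumption)."""
    key = secret_key.encode("utf-8")

    def hash_text(text: str) -> str:
        return hmac.new(key, text.encode("utf-8"), hashlib.sha256).hexdigest()

    return hash_text


hash_text = make_hasher("hypothetical-secret")

# First run: store the hash of the source text alongside the progress record.
stored = hash_text("CRP 12 today")

# Later run: re-hash the source text and compare with the stored value.
unchanged = hash_text("CRP 12 today") == stored
changed = hash_text("CRP 21 today") != stored
```

Storing only the keyed hash lets a later run tell whether a source record has changed without keeping a copy of the text itself.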
- property inputfieldconfigs: Iterable[InputFieldConfig]
Returns all input field configurations used by this NLP definition.
- Returns
list of crate_anon.nlp_manager.input_field_config.InputFieldConfig objects
- property logtag: str
Returns the log tag of the NLP definition (may be used by child processes to provide more information for logs).
- property name: str
Returns the name of the NLP definition.
- nlprp_local_processors(sql_dialect: Optional[str] = None) Dict[str, Any] [source]
Returns a draft list of processors as per the NLPRP list_processors command.
- nlprp_local_processors_json(indent: int = 4, sort_keys: bool = True, sql_dialect: Optional[str] = None) str [source]
Returns a formatted JSON string from nlprp_local_processors(). This is primarily for debugging.
- Parameters
indent – number of spaces for indentation
sort_keys – sort keys?
sql_dialect – preferred SQL dialect for tabular_schema, or None for default
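The effect of the indent and sort_keys parameters can be seen with the standard library's json module. The dictionary below is a made-up placeholder, not the NLPRP schema, and to_debug_json is a hypothetical helper.

```python
import json
from typing import Any, Dict


def to_debug_json(obj: Dict[str, Any], indent: int = 4,
                  sort_keys: bool = True) -> str:
    """Format a dictionary as human-readable JSON."""
    return json.dumps(obj, indent=indent, sort_keys=sort_keys)


# Placeholder content only; the real structure comes from the NLPRP.
processors = {"processors": [{"version": "0.1", "name": "crp"}]}
text = to_debug_json(processors)
```

With sort_keys=True the keys within each object are emitted alphabetically, which keeps the output stable between runs and easier to compare by eye.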
- property noncloud_processors: List[BaseNlpParser]
Returns all local (non-cloud) NLP processors used by this NLP definition.
- Returns
list of objects derived from
crate_anon.nlp_manager.base_nlp_parser.BaseNlpParser
- notify_transaction(session: sqlalchemy.orm.session.Session, n_rows: int, n_bytes: int, force_commit: bool = False) None [source]
Tell our transaction limiter about a transaction that’s occurred on one of our databases. This may trigger a COMMIT.
- Parameters
session – SQLAlchemy ORM Session that was used
n_rows – number of rows inserted
n_bytes – number of bytes inserted
force_commit – force a COMMIT?
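A sketch of how a transaction limiter of this kind might behave, under the assumption that it counts rows and bytes since the last COMMIT and commits once either limit is reached. The classes FakeSession and SizeLimiter and the limit values are hypothetical; they are not crate_anon.common.sql.TransactionSizeLimiter.

```python
class FakeSession:
    """Stand-in for an SQLAlchemy session; it only counts commits."""

    def __init__(self) -> None:
        self.commits = 0

    def commit(self) -> None:
        self.commits += 1


class SizeLimiter:
    """Commit once enough rows or bytes have accumulated (assumed logic)."""

    def __init__(self, session: FakeSession, max_rows: int,
                 max_bytes: int) -> None:
        self._session = session
        self._max_rows = max_rows
        self._max_bytes = max_bytes
        self._rows = 0
        self._bytes = 0

    def notify(self, n_rows: int, n_bytes: int,
               force_commit: bool = False) -> None:
        self._rows += n_rows
        self._bytes += n_bytes
        if (force_commit
                or self._rows >= self._max_rows
                or self._bytes >= self._max_bytes):
            self._session.commit()
            self._rows = 0  # start counting afresh after each COMMIT
            self._bytes = 0


session = FakeSession()
limiter = SizeLimiter(session, max_rows=3, max_bytes=1000)

limiter.notify(n_rows=1, n_bytes=100)  # below both limits
limiter.notify(n_rows=1, n_bytes=100)  # still below both limits
commits_before = session.commits
limiter.notify(n_rows=1, n_bytes=100)  # row limit reached
commits_after_rows = session.commits
limiter.notify(n_rows=1, n_bytes=10, force_commit=True)
commits_after_force = session.commits
```

Batching commits in this way bounds the size of any one transaction while avoiding the cost of a COMMIT after every inserted row.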
- property now: datetime.datetime
Returns the time this NLP definition was created (in UTC). Used to time-stamp NLP runs.
- property parser: crate_anon.common.extendedconfigparser.ExtendedConfigParser
Returns the crate_anon.common.extendedconfigparser.ExtendedConfigParser in use.
- property processors: List[TableMaker]
Returns all NLP processors used by this NLP definition.
- Returns
list of objects derived from
crate_anon.nlp_manager.base_nlp_parser.BaseNlpParser
- property progdb: crate_anon.anonymise.dbholder.DatabaseHolder
Returns the progress database.
- property progressdb_engine: sqlalchemy.engine.base.Engine
Returns an SQLAlchemy Core Engine for the progress database.
- property progressdb_metadata: sqlalchemy.sql.schema.MetaData
Returns the SQLAlchemy MetaData for the progress database.
- property progressdb_session: sqlalchemy.orm.session.Session
Returns an SQLAlchemy ORM Session for the progress database.
- set_echo(echo: bool) None [source]
Sets the SQLAlchemy echo parameter (to echo SQL) for all our source databases.
- property temporary_tablename: str
Temporary tablename to use.
See the documentation for the NLP config file.
- property uses_cloud_processors: bool
Are any of our processors cloud-based?