14.5.18. crate_anon.nlp_manager.nlp_definition

crate_anon/nlp_manager/nlp_definition.py


Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).

This file is part of CRATE.

CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with CRATE. If not, see <https://www.gnu.org/licenses/>.


NLP definition class.

class crate_anon.nlp_manager.nlp_definition.NlpDefinition(nlpname: str, logtag: str = '')

Class representing an NLP master definition as read from config file.

An NLP definition represents the combination of

  • one or more NLP processors (e.g. “CRATE’s C-reactive protein finder”)

  • one or more input fields in the source database

The NLP definition can therefore be used to say “run this set of NLP processors over this set of textual fields in my database”.
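The idea of pairing a set of processors with a set of input fields can be sketched as follows. This is a conceptual illustration only: DefinitionSketch, find_crp, and the field name "notes.note" are hypothetical and are not part of CRATE, and the real class reads its processors and fields from the config file instead.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Tuple


def find_crp(text: str) -> List[str]:
    """Toy processor (hypothetical): return numbers that follow 'CRP'."""
    return re.findall(r"CRP\s+(\d+)", text)


@dataclass
class DefinitionSketch:
    """Hypothetical stand-in: processors plus input fields."""

    processors: Dict[str, Callable[[str], List[str]]]  # name -> processor
    input_fields: Dict[str, List[str]]  # field name -> texts in that field

    def run(self) -> Iterator[Tuple[str, str, str]]:
        # Run every processor over every text in every input field.
        for field_name, texts in self.input_fields.items():
            for text in texts:
                for proc_name, proc in self.processors.items():
                    for hit in proc(text):
                        yield field_name, proc_name, hit


definition = DefinitionSketch(
    processors={"crp": find_crp},
    input_fields={"notes.note": ["CRP 10 today", "No bloods taken"]},
)
results = list(definition.run())
```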

See the documentation for the NLP config file.

__init__(nlpname: str, logtag: str = '') → None

Read config from file.

Parameters
  • nlpname – config section name for this NLP definition

  • logtag – text that may be passed to child processes to identify the NLP definition in their log output

commit(session: sqlalchemy.orm.session.Session) → None

Executes a COMMIT on a specific session.

Parameters

session – SQLAlchemy ORM Session

commit_all() → None

Executes a COMMIT on all databases (all destination databases and the progress database).

get_cloud_config() → Optional[crate_anon.nlp_manager.cloud_config.CloudConfig]

Returns the crate_anon.nlp_manager.cloud_config.CloudConfig object associated with this NLP definition, or None if there isn’t one.

get_cloud_config_or_raise() → crate_anon.nlp_manager.cloud_config.CloudConfig

Returns the crate_anon.nlp_manager.cloud_config.CloudConfig object associated with this NLP definition, or raises ValueError if there isn’t one.
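The two accessors differ only in how they report a missing cloud configuration. A minimal sketch of that pattern, using hypothetical stand-in classes (CloudConfigStub and DefinitionStub are not the real CRATE classes, and the error message is invented):

```python
from typing import Optional


class CloudConfigStub:
    """Hypothetical stand-in for CloudConfig."""

    def __init__(self, url: str) -> None:
        self.url = url


class DefinitionStub:
    """Hypothetical stand-in showing the two accessor styles."""

    def __init__(
        self, name: str, cloud_config: Optional[CloudConfigStub] = None
    ) -> None:
        self._name = name
        self._cloud_config = cloud_config

    def get_cloud_config(self) -> Optional[CloudConfigStub]:
        # The caller is responsible for checking for None.
        return self._cloud_config

    def get_cloud_config_or_raise(self) -> CloudConfigStub:
        # The caller gets a usable object or an exception, never None.
        if self._cloud_config is None:
            raise ValueError(
                f"No cloud config for NLP definition {self._name!r}"
            )
        return self._cloud_config
```

The raising variant suits code paths that cannot proceed without cloud settings; the Optional variant suits code that merely needs to check whether any exist.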

get_config_section(section: str) → crate_anon.common.extendedconfigparser.ConfigSection

Returns a crate_anon.common.extendedconfigparser.ConfigSection referring to a (potentially different) section.

Parameters

section – New section name.

get_database(name_and_cfg_section: str, with_session: bool = True, with_conn: bool = False, reflect: bool = False) → crate_anon.anonymise.dbholder.DatabaseHolder

Returns a crate_anon.anonymise.dbholder.DatabaseHolder from the config file, containing information about a database.

Parameters
  • name_and_cfg_section – string that is the name of the database, and also the config file section name describing the database

  • with_session – create an SQLAlchemy Session?

  • with_conn – create an SQLAlchemy connection (via an Engine)?

  • reflect – read the database structure (when required)?

get_env_dict(env_section_name: str, parent_env: Optional[Dict[str, str]] = None) → Dict[str, str]

Gets an operating system environment variable dictionary (variable: value mapping) from the config file.

Parameters
  • env_section_name – config section name, without its “env:” prefix

  • parent_env – optional starting point (e.g. parent OS environment)

Returns

a dictionary suitable for use as an OS environment
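One way to picture this behaviour is to overlay the variables from an “env:” config section onto a copy of the parent environment. The sketch below uses the standard-library configparser rather than CRATE’s own parser; the function name env_dict_from_config, the section name MY_ENV, and the paths are all hypothetical, and whether config values override parent values in CRATE is an assumption here.

```python
import configparser
from typing import Dict, Optional

ENV_PREFIX = "env:"  # section prefix described above


def env_dict_from_config(
    parser: configparser.ConfigParser,
    env_section_name: str,
    parent_env: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Copy parent_env, then overlay variables from [env:<name>]."""
    env = dict(parent_env) if parent_env else {}
    for name, value in parser.items(ENV_PREFIX + env_section_name):
        env[name] = value  # assumption: config values take precedence
    return env


DEMO_CONFIG = """
[env:MY_ENV]
GATE_HOME = /opt/gate
"""

parser = configparser.ConfigParser(interpolation=None)
parser.optionxform = str  # keep variable names case-sensitive
parser.read_string(DEMO_CONFIG)
env = env_dict_from_config(parser, "MY_ENV", parent_env={"PATH": "/usr/bin"})
```

A dictionary built this way could be passed as the env argument of subprocess.Popen when launching a child process.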

get_transation_limiter(session: sqlalchemy.orm.session.Session) → crate_anon.common.sql.TransactionSizeLimiter

Returns (or creates and returns) a transaction limiter for a given SQLAlchemy session.

Parameters

session – SQLAlchemy ORM Session

Returns

a crate_anon.common.sql.TransactionSizeLimiter

hash(text: str) → str

Hash text via this NLP definition’s hasher. The hash will be stored in a secret progress database and used to detect later changes in the source records.

Parameters

text – text (typically from the source database) to be hashed

Returns

the hashed value
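Change detection by hash can be sketched as follows. This is an illustration under assumptions: it uses HMAC-SHA256 with a made-up key, whereas the actual hasher and key are determined by the NLP config file, and hash_text is a hypothetical helper rather than a CRATE function.

```python
import hashlib
import hmac


def hash_text(text: str, secret_key: bytes) -> str:
    """Keyed hash of the text (HMAC-SHA256 is an assumption here)."""
    return hmac.new(
        secret_key, text.encode("utf-8"), hashlib.sha256
    ).hexdigest()


key = b"not-a-real-key"  # hypothetical; keep the real key secret

# First run: store the hash of the source text alongside progress info.
stored = hash_text("CRP 10 mg/L", key)

# Later run: the record needs reprocessing only if its hash differs.
unchanged = hash_text("CRP 10 mg/L", key) == stored
changed = hash_text("CRP 12 mg/L", key) != stored
```

Storing a keyed hash rather than the text itself means the progress database can flag modified records without holding a copy of the source text.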

property inputfieldconfigs: Iterable[InputFieldConfig]

Returns all input field configurations used by this NLP definition.

Returns

list of crate_anon.nlp_manager.input_field_config.InputFieldConfig objects

property logtag: str

Returns the log tag of the NLP definition (may be used by child processes to provide more information for logs).

property name: str

Returns the name of the NLP definition.

nlprp_local_processors(sql_dialect: Optional[str] = None) → Dict[str, Any]

Returns a draft list of processors as per the NLPRP list_processors command.

nlprp_local_processors_json(indent: int = 4, sort_keys: bool = True, sql_dialect: Optional[str] = None) → str

Returns a formatted JSON string from nlprp_local_processors(). This is primarily for debugging.

Parameters
  • indent – number of spaces for indentation

  • sort_keys – sort keys?

  • sql_dialect – preferred SQL dialect for tabular_schema, or None for default
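The indent and sort_keys parameters behave like the arguments of the same names to the standard-library json.dumps. A minimal sketch with a hypothetical, heavily abbreviated processor dictionary (the keys and values shown are illustrative, not real output, and the sql_dialect handling is omitted):

```python
import json
from typing import Any, Dict

# Hypothetical, abbreviated stand-in for a draft processor list.
draft: Dict[str, Any] = {
    "processors": [
        {"name": "Crp", "version": "0.0.0", "is_default_version": True},
    ]
}


def to_json(obj: Dict[str, Any], indent: int = 4, sort_keys: bool = True) -> str:
    """Format a dictionary as human-readable JSON."""
    return json.dumps(obj, indent=indent, sort_keys=sort_keys)


text = to_json(draft)
```

Sorting keys gives a stable layout, which makes successive debugging dumps easy to compare with a diff tool.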

property noncloud_processors: List[BaseNlpParser]

Returns all local (non-cloud) NLP processors used by this NLP definition.

Returns

list of objects derived from crate_anon.nlp_manager.base_nlp_parser.BaseNlpParser

notify_transaction(session: sqlalchemy.orm.session.Session, n_rows: int, n_bytes: int, force_commit: bool = False) → None

Tell our transaction limiter about a transaction that’s occurred on one of our databases. This may trigger a COMMIT.

Parameters
  • session – SQLAlchemy ORM Session that was used

  • n_rows – number of rows inserted

  • n_bytes – number of bytes inserted

  • force_commit – force a COMMIT?
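The “may trigger a COMMIT” behaviour can be pictured as a counter that commits once a row or byte threshold is reached. The sketch below is not the real TransactionSizeLimiter: SizeLimiterSketch, FakeSession, and the threshold parameter names are hypothetical, and the exact thresholds and comparison rules in CRATE may differ.

```python
from typing import Optional


class FakeSession:
    """Hypothetical stand-in for an SQLAlchemy Session; counts commits."""

    def __init__(self) -> None:
        self.n_commits = 0

    def commit(self) -> None:
        self.n_commits += 1


class SizeLimiterSketch:
    """Commits when pending rows or bytes reach a limit (illustrative)."""

    def __init__(
        self,
        session: FakeSession,
        max_rows_before_commit: Optional[int] = None,
        max_bytes_before_commit: Optional[int] = None,
    ) -> None:
        self._session = session
        self._max_rows = max_rows_before_commit
        self._max_bytes = max_bytes_before_commit
        self._rows = 0
        self._bytes = 0

    def notify(self, n_rows: int, n_bytes: int, force_commit: bool = False) -> None:
        # Accumulate what has been written since the last COMMIT.
        self._rows += n_rows
        self._bytes += n_bytes
        over_rows = self._max_rows is not None and self._rows >= self._max_rows
        over_bytes = self._max_bytes is not None and self._bytes >= self._max_bytes
        if force_commit or over_rows or over_bytes:
            self._session.commit()
            self._rows = 0
            self._bytes = 0


session = FakeSession()
limiter = SizeLimiterSketch(
    session, max_rows_before_commit=3, max_bytes_before_commit=1000
)
for _ in range(3):
    limiter.notify(n_rows=1, n_bytes=100)  # third call reaches the row limit
limiter.notify(n_rows=1, n_bytes=2000)  # exceeds the byte limit
limiter.notify(n_rows=1, n_bytes=1, force_commit=True)  # forced
```

Batching writes like this keeps individual transactions bounded in size while avoiding the cost of a COMMIT after every row.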

property now: datetime.datetime

Returns the time this NLP definition was created (in UTC). Used to time-stamp NLP runs.

property parser: crate_anon.common.extendedconfigparser.ExtendedConfigParser

Returns the crate_anon.common.extendedconfigparser.ExtendedConfigParser in use.

property processors: List[TableMaker]

Returns all NLP processors used by this NLP definition.

Returns

list of objects derived from crate_anon.nlp_manager.base_nlp_parser.BaseNlpParser

property progdb: crate_anon.anonymise.dbholder.DatabaseHolder

Returns the progress database.

property progressdb_engine: sqlalchemy.engine.base.Engine

Returns an SQLAlchemy Core Engine for the progress database.

property progressdb_metadata: sqlalchemy.sql.schema.MetaData

Returns the SQLAlchemy MetaData for the progress database.

property progressdb_session: sqlalchemy.orm.session.Session

Returns an SQLAlchemy ORM Session for the progress database.

set_echo(echo: bool) → None

Set the SQLAlchemy echo parameter (to echo SQL) for all our source databases.

property temporary_tablename: str

Temporary tablename to use.

See the documentation for the NLP config file.

property uses_cloud_processors: bool

Are any of our processors cloud-based?

crate_anon.nlp_manager.nlp_definition.demo_nlp_config() → str

Returns a demo NLP config file for CRATE.

crate_anon.nlp_manager.nlp_definition.get_nlp_config_filename_or_exit() → str

Returns the config filename, from our environment variable. If we can’t retrieve it, perform a hard exit.
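The “environment variable or hard exit” pattern can be sketched as below. The variable name CRATE_NLP_CONFIG is an assumption made for illustration (check your installation’s documentation for the actual name), and config_filename_or_exit, its message, and its exit code are hypothetical rather than CRATE’s own.

```python
import os
import sys

NLP_CONFIG_ENV_VAR = "CRATE_NLP_CONFIG"  # assumed name, for illustration


def config_filename_or_exit(env_var: str = NLP_CONFIG_ENV_VAR) -> str:
    """Return the filename held in env_var, or exit the program."""
    filename = os.environ.get(env_var)
    if not filename:
        print(f"Environment variable {env_var} is not set", file=sys.stderr)
        sys.exit(1)  # raises SystemExit; nothing below runs
    return filename
```

Exiting early with a clear message tends to be friendlier for command-line tools than failing later with a traceback from a missing file.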