12.5.10. crate_anon.nlp_manager.cloud_parser
crate_anon/nlp_manager/cloud_parser.py
Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).
This file is part of CRATE.
CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with CRATE. If not, see <https://www.gnu.org/licenses/>.
Send text to a cloud-based NLPRP server for processing.
- class crate_anon.nlp_manager.cloud_parser.Cloud(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)
EXTERNAL.
Abstract NLP processor that passes information to a remote (cloud-based) NLP system via the NLPRP protocol. The processor at the other end might be of any kind.
- __init__(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False) → None
- Parameters:
nlpdef – crate_anon.nlp_manager.nlp_definition.NlpDefinition
cfg_processor_name – the config section for the processor
commit – force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.
- static data_type_str_to_coltype(data_type_str: str) → Type[TypeEngine]
Get the SQLAlchemy column type class which fits with the data type specified. Currently we IGNORE self.sql_dialect.
- dest_tables_columns() → Dict[str, List[Column]]
Describes the destination table(s) that this NLP processor wants to write to.
- Returns:
a dictionary of {tablename: destination_columns}, where destination_columns is a list of SQLAlchemy Column objects.
- Return type:
dict
If there is an NLPRP remote table specification (tabular_schema method), we start with that.
Then we add any user-defined tables. If there is both a remote definition and a local definition, the local definition overrides the remote definition. If the destination table info has no columns, however, it is not used for table creation.
There may in principle be other tables too in the local config that are absent in the remote info (unusual!).
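The merge order described above can be sketched in plain Python. This is an illustrative sketch only, not CRATE's implementation: merge_table_definitions is a hypothetical helper, column lists are plain strings rather than SQLAlchemy Column objects, and we assume that "not used" means a local definition with no columns is simply skipped (so it neither creates a table nor overrides a remote one).

```python
from typing import Dict, List


def merge_table_definitions(
    remote: Dict[str, List[str]],
    local: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Hypothetical helper: combine remote and local table definitions."""
    # 1. Start with whatever the remote tabular schema provides.
    merged = dict(remote)
    # 2. Add local definitions; a local definition replaces a remote one.
    for tablename, columns in local.items():
        if not columns:
            # Assumption: a definition with no columns is skipped entirely.
            continue
        merged[tablename] = columns
    return merged


remote_tables = {"crp": ["value", "units"], "esr": ["value"]}
local_tables = {"crp": ["value", "units", "tense"], "notes": []}
result = merge_table_definitions(remote_tables, local_tables)
```

Here the local definition of "crp" replaces the remote one, "esr" comes from the remote schema alone, and "notes" is dropped because it has no columns.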
- dest_tables_indexes() → Dict[str, List[Index]]
Describes indexes that this NLP processor suggests for its destination table(s).
- Returns:
a dictionary of {tablename: indexes}, where indexes is a list of SQLAlchemy Index objects.
- Return type:
dict
The NLPRP remote table specification doesn’t include indexing. So all indexing information is from our config file, whether for GATE or cloud processors.
- static get_coltype_parts(coltype_str: str) → Tuple[str, str | int]
Get the root column type and its parameter; e.g. for VARCHAR(50), the root column type is VARCHAR and the parameter is 50.
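A minimal sketch of this kind of split, using only the standard library. It is not the CRATE implementation: split_coltype is a hypothetical stand-in, and the choices to upper-case the root type and to return an empty string when no parameter is present are assumptions made for illustration.

```python
import re
from typing import Tuple, Union

# Root type name, optionally followed by one parenthesised parameter.
_COLTYPE_RE = re.compile(r"^\s*(\w+)\s*(?:\(\s*(\w+)\s*\))?\s*$", re.ASCII)


def split_coltype(coltype_str: str) -> Tuple[str, Union[str, int]]:
    """Hypothetical helper: split 'VARCHAR(50)' into ('VARCHAR', 50)."""
    match = _COLTYPE_RE.match(coltype_str)
    if match is None:
        raise ValueError(f"Unrecognised column type: {coltype_str!r}")
    root, param = match.group(1), match.group(2)
    if param is None:
        # Assumption: no parameter is reported as an empty string.
        return root.upper(), ""
    # Numeric parameters become int; anything else stays a string.
    return root.upper(), int(param) if param.isdigit() else param
```

Under these assumptions, split_coltype("VARCHAR(50)") gives ("VARCHAR", 50), and a non-numeric parameter such as VARCHAR(MAX) is kept as the string "MAX", which is consistent with the str | int parameter type in the signature.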
- get_first_local_tablename() → str
Used in some circumstances when the remote processor doesn’t specify a table.
- get_local_from_remote_tablename(remote_tablename: str) → str
When the remote server specifies a table name, we need to map it to a local database table name.
Raises KeyError on failure.
- get_otconf_from_type(output_type: str) → OutputUserConfig
For a GATE annotation type, or cloud remote table name, return the corresponding OutputUserConfig.
Enforces lower-case lookup.
Will raise KeyError if this fails.
- get_tablename_from_type(output_type: str) → str
For simple remote GATE processors, or cloud processors: for a given annotation type (GATE) or remote table name (cloud), return the destination table name.
Enforces lower-case lookup.
Will raise KeyError if this fails.
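The lower-case lookup behaviour described for these methods can be illustrated with a small stand-alone sketch. TableNameLookup and its method are hypothetical names, not part of CRATE; the sketch only shows the general pattern of normalising keys once and letting a missing key surface as KeyError.

```python
from typing import Dict


class TableNameLookup:
    """Hypothetical illustration of a case-insensitive type-to-table map."""

    def __init__(self, type_to_tablename: Dict[str, str]) -> None:
        # Normalise keys to lower case once, at construction time.
        self._map = {k.lower(): v for k, v in type_to_tablename.items()}

    def tablename_for(self, output_type: str) -> str:
        # A missing key propagates as KeyError, as the documentation describes.
        return self._map[output_type.lower()]


lookup = TableNameLookup({"Person": "person_annotations"})
found = lookup.tablename_for("PERSON")
```

Callers that cannot guarantee the type is configured would wrap the call in try/except KeyError rather than testing for membership first.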
- get_tabular_schema_tablenames() → List[str]
Returns the names of the tables in the tabular schema (or an empty list if we do not have a tabular schema).
- is_tabular() → bool
Is the format of the schema information given by the remote processor tabular?
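As a rough sketch of how the two schema queries relate, consider processor information held as a plain dictionary. The key names "schema_type" and "tabular_schema" are assumptions about the shape of the remote processor's description, and both functions are hypothetical stand-ins rather than the methods documented here.

```python
from typing import Any, Dict, List


def is_tabular(procinfo: Dict[str, Any]) -> bool:
    """Hypothetical check: does the processor describe a tabular schema?"""
    return procinfo.get("schema_type") == "tabular"


def tabular_schema_tablenames(procinfo: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: table names, or [] without a tabular schema."""
    if not is_tabular(procinfo):
        return []
    return list(procinfo.get("tabular_schema", {}).keys())


tabular_info = {
    "schema_type": "tabular",
    "tabular_schema": {"crp": [], "esr": []},
}
unknown_info = {"schema_type": "unknown"}
```

With these inputs, the first dictionary yields the table names "crp" and "esr", and the second yields an empty list.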
- set_procinfo_if_correct(remote_processor: ServerProcessor) → None
Checks whether a processor dictionary, containing all the NLPRP-specified information that a processor should have, belongs to this processor. If it does, we add the information from the processor dictionary.