14.5.11. crate_anon.nlp_manager.cloud_request
crate_anon/nlp_manager/cloud_request.py
Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).
This file is part of CRATE.
CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with CRATE. If not, see <https://www.gnu.org/licenses/>.
This module is for sending JSON requests to the NLP Cloud server and receiving responses.
- class crate_anon.nlp_manager.cloud_request.CloudRequest(nlpdef: NlpDefinition, debug_post_request: bool = False, debug_post_response: bool = False)
Class to send requests to the cloud processors and process the results.
- __init__(nlpdef: NlpDefinition, debug_post_request: bool = False, debug_post_response: bool = False) None
- Parameters:
nlpdef – a crate_anon.nlp_manager.nlp_definition.NlpDefinition
- classmethod set_rate_limit(rate_limit_hz: int) None
Creates new methods which are rate limited. Only use this once per run.
Note that this is a classmethod and must be so; if it were instance-based, you could create multiple requests and each would individually be rate-limited, but not collectively.
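The point about class-level state can be illustrated with a minimal sketch. It is not CRATE's implementation: RateLimitedSender, its send() method, and the injectable clock and sleep arguments are hypothetical names, chosen only to show why the limit has to live on the class rather than on each instance.

```python
import time
from typing import Callable, Optional


class RateLimitedSender:
    """Hypothetical stand-in for a request class (not CRATE's code)."""

    # Class-level state: shared by every instance.
    _min_interval_s: float = 0.0
    _last_send: Optional[float] = None

    @classmethod
    def set_rate_limit(cls, rate_limit_hz: int) -> None:
        # Store the minimum gap between sends on the class itself.
        RateLimitedSender._min_interval_s = (
            1.0 / rate_limit_hz if rate_limit_hz > 0 else 0.0
        )

    def send(
        self,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> float:
        """Wait if necessary, then record and return the send time."""
        shared = RateLimitedSender
        now = clock()
        if shared._last_send is not None:
            wait = shared._min_interval_s - (now - shared._last_send)
            if wait > 0:
                sleep(wait)
                now = clock()
        shared._last_send = now
        return now
```

With this arrangement, two different request objects that send back to back still respect one shared interval. If the timestamp were stored on each instance, each object would only throttle itself.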
- class crate_anon.nlp_manager.cloud_request.CloudRequestListProcessors(nlpdef: NlpDefinition, **kwargs)
Request to get processors from the remote.
- __init__(nlpdef: NlpDefinition, **kwargs) None
- Parameters:
nlpdef – a crate_anon.nlp_manager.nlp_definition.NlpDefinition
- get_remote_processors() List[ServerProcessor]
Returns the list of available processors from the remote. If the list was neither pre-specified upon construction nor already fetched, fetches it from the server.
- class crate_anon.nlp_manager.cloud_request.CloudRequestProcess(crinfo: CloudRunInfo = None, nlpdef: NlpDefinition = None, commit: bool = False, client_job_id: str = None, **kwargs)
Request to process text.
- __init__(crinfo: CloudRunInfo = None, nlpdef: NlpDefinition = None, commit: bool = False, client_job_id: str = None, **kwargs) None
- Parameters:
crinfo – a crate_anon.nlp_manager.cloud_run_info.CloudRunInfo
nlpdef – a crate_anon.nlp_manager.nlp_definition.NlpDefinition
commit – force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.
client_job_id – optional string used to group together results into one job.
- add_text(text: str, metadata: Dict[str, Any]) None
Adds text for analysis to the NLP request, with associated metadata.
Tests what the size of the request would be if the text and metadata were added, then adds them if the request would not exceed the size limit and the text contains word characters. Also checks whether we’ve reached the maximum number of records per request.
- Parameters:
text – the text
metadata – the metadata (which we expect to get back later)
- Raises:
RecordNotPrintable – if the record contains no printable characters
RecordsPerRequestExceeded – if the request has exceeded the maximum number of records per request
RequestTooLong – if the request has exceeded the maximum length
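The checks described for add_text() can be sketched as follows. This is an illustration under stated assumptions, not the CRATE source: TinyRequest, max_records, max_bytes, and the layout of each stored item are hypothetical, and the three exception classes are local stand-ins that share names with the module's exceptions. The order of the checks is also an assumption.

```python
import json
import re
from typing import Any, Dict, List


class RecordNotPrintable(Exception):
    """Local stand-in: the text has no usable characters."""


class RecordsPerRequestExceeded(Exception):
    """Local stand-in: the request already holds the maximum records."""


class RequestTooLong(Exception):
    """Local stand-in: adding the record would exceed the size limit."""


class TinyRequest:
    """Hypothetical request that accumulates records up to fixed limits."""

    def __init__(self, max_records: int, max_bytes: int) -> None:
        self.max_records = max_records
        self.max_bytes = max_bytes
        self.content: List[Dict[str, Any]] = []

    def _size_with(self, item: Dict[str, Any]) -> int:
        # Size in UTF-8 bytes of the request body if `item` were added.
        body = json.dumps(self.content + [item])
        return len(body.encode("utf-8"))

    def add_text(self, text: str, metadata: Dict[str, Any]) -> None:
        if not re.search(r"\w", text):
            raise RecordNotPrintable("no word characters in text")
        if len(self.content) >= self.max_records:
            raise RecordsPerRequestExceeded("record limit reached")
        item = {"text": text, "metadata": metadata}
        if self._size_with(item) > self.max_bytes:
            raise RequestTooLong("request would exceed size limit")
        self.content.append(item)
```

A caller would typically catch the two limit exceptions, send the request accumulated so far, and start a new one with the rejected record, while skipping records that raise RecordNotPrintable.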
- check_if_ready(cookies: CookieJar | None = None) bool
Checks if the data is ready yet. Assumes queued mode (so set_queue_id() should have been called first). If the data is ready, collects it and returns True; otherwise returns False.
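Because check_if_ready() returns a plain bool, a caller in queued mode will usually poll it. The helper below is a generic sketch of such a loop, not part of CRATE: poll_until_ready and its parameters are hypothetical, and the check argument stands for any zero-argument callable, such as a bound check_if_ready method.

```python
import time
from typing import Callable


def poll_until_ready(
    check: Callable[[], bool],
    max_attempts: int = 5,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check() up to max_attempts times, pausing between attempts.

    Returns True as soon as check() does, or False if it never does.
    """
    for attempt in range(max_attempts):
        if check():
            return True
        # Do not sleep after the final unsuccessful attempt.
        if attempt < max_attempts - 1:
            sleep(delay_s)
    return False
```

The sleep function is injectable so that the loop can be exercised without real delays.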
- gen_nlp_values() Generator[Tuple[str, Dict[str, Any], Cloud], None, None]
Process response data that we have already obtained from the server, generating individual NLP results.
- Yields:
(tablename, result, processor) for each result. The tablename value is the actual destination database table.
- Raises:
KeyError –
- static gen_nlp_values_gate(processor: Cloud, processor_results: List[Dict[str, Any]], metadata: Dict[str, Any], text: str = '') Generator[Tuple[str, Dict[str, Any], Cloud], None, None]
Generates row values from processed GATE data.
Success should have been pre-verified.
- Parameters:
processor – The processor object.
processor_results – A list of dictionaries (originally from JSON), each representing a row in a table, and each expected to have this format:
    {
        'set': set the results belong to (e.g. 'Medication'),
        'type': annotation type,
        'start': start index,
        'end': end index,
        'features': {
            a dictionary of features, e.g. having keys 'drug',
            'frequency', etc., with corresponding values
        }
    }
metadata – The metadata for a particular document - it would have been sent with the document and the server would have sent it back.
text – The source text itself (optional).
- Yields:
tuples (output_tablename, formatted_result, processor). Each instance of formatted_result has this format:
    {
        GateFieldNames.TYPE: annotation type,
        GateFieldNames.SET: set,
        GateFieldNames.STARTPOS: start index,
        GateFieldNames.ENDPOS: end index,
        GateFieldNames.CONTENT: text fragment,
        FEATURE1: VALUE1,
        FEATURE2: VALUE2,
        ...
    }
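The mapping from an input dictionary to a formatted_result can be sketched roughly as below. This is an illustration only: the F_* constants are hypothetical placeholders for the GateFieldNames values (whose actual strings are not given here), format_gate_rows is an invented name, and deriving the content by slicing the source text with the start and end indices is an assumption. The choice of output table name is omitted because it depends on the processor.

```python
from typing import Any, Dict, Iterator, List

# Hypothetical column names standing in for GateFieldNames.* values.
F_TYPE = "type_col"
F_SET = "set_col"
F_STARTPOS = "start_col"
F_ENDPOS = "end_col"
F_CONTENT = "content_col"


def format_gate_rows(
    processor_results: List[Dict[str, Any]], text: str = ""
) -> Iterator[Dict[str, Any]]:
    """Flatten each GATE-style annotation into a single row dictionary."""
    for result in processor_results:
        start = result["start"]
        end = result["end"]
        row: Dict[str, Any] = {
            F_TYPE: result["type"],
            F_SET: result["set"],
            F_STARTPOS: start,
            F_ENDPOS: end,
            # Assumption: the content is the annotated slice of the text.
            F_CONTENT: text[start:end] if text else "",
        }
        # Features become additional top-level columns.
        row.update(result["features"])
        yield row
```

For example, an annotation with 'start': 0, 'end': 7 and features {'drug': 'aspirin'} over the text "aspirin 75mg" would produce a row whose content column is "aspirin" and which has an extra 'drug' column.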
- static gen_nlp_values_generic_single_table(processor: Cloud, tablename: str, rows: List[Dict[str, Any]], metadata: Dict[str, Any], column_renames: Dict[str, str] | None = None) Generator[Tuple[str, Dict[str, Any], Cloud], None, None]
Get result values from processed data, where the results object is a list of rows (each row in dictionary format), all for a single table, such as from a remote CRATE server.
Success should have been pre-verified.
- Parameters:
processor – The processor object.
tablename – The table name to use.
rows – List of NLPRP results for one processor. Each result represents a row of a table and is in dictionary format.
metadata – The metadata for a particular document - it would have been sent with the document and the server would have sent it back.
column_renames – Column renames to apply.
- Yields:
(output_tablename, formatted_result, processor)
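The effect of the column_renames argument can be pictured with a small sketch. The helper name rename_columns is hypothetical, and this shows only the renaming step under the assumption that it maps old column names to new ones; how the real method combines rows with metadata is not shown here.

```python
from typing import Any, Dict, Optional


def rename_columns(
    row: Dict[str, Any], column_renames: Optional[Dict[str, str]] = None
) -> Dict[str, Any]:
    """Return a copy of row with keys renamed as specified.

    Keys absent from column_renames are kept unchanged.
    """
    renames = column_renames or {}
    return {renames.get(key, key): value for key, value in row.items()}
```

Passing None (the default) leaves every column name as the server sent it.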
- process_all() None
Puts the NLP data into the database. Very similar to crate_anon.nlp_manager.base_nlp_parser.BaseNlpParser.process(), but deals with all relevant processors at once.
- send_process_request(queue: bool, cookies: CookieJar | None = None, include_text_in_reply: bool = True) None
Sends a request to the server to process the text we have stored.
- Parameters:
queue – queue the request for back-end processing (rather than waiting for an immediate reply)?
cookies – optional http.cookiejar.CookieJar
include_text_in_reply – should the server include the source text in the reply?
- set_queue_id(queue_id: str) None
Sets the queue_id. To be used when you’re not actually sending a request this time.
- class crate_anon.nlp_manager.cloud_request.CloudRequestQueueManagement(nlpdef: NlpDefinition, debug_post_request: bool = False, debug_post_response: bool = False)
Request to manage the queue in some way.
- delete_all_from_queue() None
Delete ALL pending requests from the server’s queue. Use with caution.
- delete_from_queue(queue_ids: List[str]) None
Delete the pending requests with the specified queue_ids from the server’s queue.
- exception crate_anon.nlp_manager.cloud_request.RecordNotPrintable
- exception crate_anon.nlp_manager.cloud_request.RecordsPerRequestExceeded
- exception crate_anon.nlp_manager.cloud_request.RequestTooLong
- crate_anon.nlp_manager.cloud_request.extract_nlprp_top_level_results(nlp_data: Dict[str, str | int | float | bool | None | Dict | List]) List
Checks that the top-level NLP response contains an appropriate “results” object, or raises KeyError or ValueError.
Returns the list result, which is a list of results per document.
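A validation of this kind can be sketched as below. The function name extract_results is hypothetical, and the choice of which failure raises KeyError and which raises ValueError is an assumption; the description above only states that one of the two is raised.

```python
from typing import Any, Dict, List


def extract_results(nlp_data: Dict[str, Any]) -> List[Any]:
    """Return the top-level "results" list, validating its presence and type."""
    if "results" not in nlp_data:
        # Assumption: a missing key is reported as KeyError.
        raise KeyError("No 'results' key in server response")
    results = nlp_data["results"]
    if not isinstance(results, list):
        # Assumption: a wrongly typed value is reported as ValueError.
        raise ValueError("'results' is not a list")
    return results
```

Checking the shape once at the top level means later per-document code can iterate over the list without repeating the same guards.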
- crate_anon.nlp_manager.cloud_request.extract_processor_data_list(docresult: Dict[str, str | int | float | bool | None | Dict | List]) List[Dict[str, str | int | float | bool | None | Dict | List]]
Check and extract a list of per-processor results from a single-document NLPRP result.
- crate_anon.nlp_manager.cloud_request.parse_nlprp_docresult_metadata(docresult: Dict[str, str | int | float | bool | None | Dict | List]) Tuple[Dict[str, Any], int | None, str | None, str]
Check that this NLPRP document result validly contains metadata, and that the metadata contains the things we always send. Extract key components. Provide a helpful error message on failure.
- Returns:
tuple (metadata, pkval, pkstr, srchhash)
- crate_anon.nlp_manager.cloud_request.parse_per_processor_data(processor_data: Dict[str, Any]) Tuple
Return a tuple of mandatory results from NLPRP per-processor data, or raise KeyError.
- crate_anon.nlp_manager.cloud_request.report_processor_errors(processor_data: Dict[str, Any]) None
Should only be called if there has been an error. Reports the error(s) to the log.
- crate_anon.nlp_manager.cloud_request.to_json_str(json_structure: str | int | float | bool | None | Dict | List) str
Converts a Python object to a JSON string.
- crate_anon.nlp_manager.cloud_request.utf8len(text: str) int
Returns the length of text once encoded in UTF-8.
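The distinction between character count and encoded length matters when a size limit is expressed in bytes. The sketch below uses a hypothetical helper, utf8_byte_length, to show the idea; it is not copied from the module.

```python
import json


def utf8_byte_length(text: str) -> int:
    """Number of bytes that text occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))


# ASCII characters take one byte each; others take more.
ascii_len = utf8_byte_length("abc")    # 3 characters, 3 bytes
accent_len = utf8_byte_length("é")     # 1 character, 2 bytes
euro_len = utf8_byte_length("€")       # 1 character, 3 bytes

# The same applies to a serialised JSON body. ensure_ascii=False keeps
# non-ASCII characters literal, so byte length and character count differ.
body = json.dumps({"text": "café"}, ensure_ascii=False)
body_chars = len(body)
body_bytes = utf8_byte_length(body)
```

Here body_bytes is one larger than body_chars, because "é" is a single character that encodes to two bytes.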