14.5.28. crate_anon.nlp_manager.parse_substance_misuse
crate_anon/nlp_manager/parse_substance_misuse.py
Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).
This file is part of CRATE.
CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with CRATE. If not, see <https://www.gnu.org/licenses/>.
Python regex-based NLP processors for substance misuse.
- class crate_anon.nlp_manager.parse_substance_misuse.AlcoholUnits(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)
SUBSTANCE MISUSE.
Alcohol consumption, specified explicitly as (UK) units per day or per week, or via non-numeric references to not drinking any.
Output is in UK units per week. A UK unit is 10 ml of ethanol [1] [2]. UK NHS guidelines used to be “per week” and remain broadly week-based [1].
It does not attempt to interpret other alcohol descriptions (e.g. “pints of beer”, “glasses of wine”, “bottles of vodka”), so it is expected to apply only where a clinician has already converted a (potentially mixed) alcohol description into a units-per-week figure.
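The unit arithmetic described above can be sketched as follows. This is a minimal illustration, not CRATE's code: the function names and the `per` argument are hypothetical, and only the 10 ml definition and the per-week output convention come from the text.

```python
ML_ETHANOL_PER_UK_UNIT = 10  # from the text above: one UK unit is 10 ml of ethanol
DAYS_PER_WEEK = 7


def units_per_week(value: float, per: str) -> float:
    """Normalise a stated units figure to UK units per week.

    `per` is a hypothetical label for the stated period: "day" or "week".
    """
    if per == "week":
        return value
    if per == "day":
        return value * DAYS_PER_WEEK
    raise ValueError(f"Unsupported period: {per!r}")


def ml_ethanol_per_week(units_weekly: float) -> float:
    """Convert UK units per week to millilitres of ethanol per week."""
    return units_weekly * ML_ETHANOL_PER_UK_UNIT
```

For example, a note recording 3 units per day would be normalised to 21 units per week, which corresponds to 210 ml of ethanol per week.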
- __init__(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False) -> None
Init function for NumericalResultParser.
- Parameters:
nlpdef – A crate_anon.nlp_manager.nlp_definition.NlpDefinition.
cfg_processor_name – Config section name in the NLP config file.
variable – Used by subclasses as the record value for variable_name.
target_unit – Fieldname used for the primary output quantity.
regex_str_for_debugging – String form of regex, for debugging.
commit – Force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.
Subclasses will extend this method.
- parse(text: str, debug: bool = False) -> Generator[Tuple[str, Dict[str, Any]], None, None]
Parse for two regexes which operate slightly differently.
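One way to picture a parser that runs two regexes in turn is a generator that delegates to two sub-generators. The sketch below is an assumption-laden illustration: the table name, the result fields, the toy patterns and the order of the two calls are all hypothetical and are not taken from CRATE.

```python
import re
from typing import Any, Dict, Generator, Tuple

ParseResult = Generator[Tuple[str, Dict[str, Any]], None, None]

TABLE = "alcohol_units"  # hypothetical destination table name


def parse_alcohol_units(text: str) -> ParseResult:
    # Stand-in for the numeric regex: "<number> units per week" only.
    for m in re.finditer(r"(\d+(?:\.\d+)?) units per week", text):
        yield TABLE, {"value": float(m.group(1)), "source": "units"}


def parse_alcohol_none(text: str) -> ParseResult:
    # Stand-in for the non-numeric regex: the single word "teetotal".
    for _ in re.finditer(r"\bteetotal\b", text, re.IGNORECASE):
        yield TABLE, {"value": 0.0, "source": "none"}


def parse(text: str) -> ParseResult:
    # Delegate to each sub-parser in turn; the order here is an assumption.
    yield from parse_alcohol_units(text)
    yield from parse_alcohol_none(text)
```

Each yielded item follows the `(table name, field dictionary)` shape implied by the return annotation, so a caller can consume results from both regexes through one iterator.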
- parse_alcohol_none(text: str, debug: bool = False) -> Generator[Tuple[str, Dict[str, Any]], None, None]
Deal with references to not drinking any alcohol (except those referred to as e.g. “0 units per week”, which will be picked up by the units-per-week function – that will be rare!).
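A non-numeric "drinks no alcohol" matcher might look roughly like the sketch below. The phrase list, the function name and the output field names are hypothetical and chosen only for illustration; CRATE's actual regex is not reproduced here and is likely to be more thorough.

```python
import re
from typing import Dict, Iterator

# Hypothetical phrases for illustration only; not CRATE's actual regex.
ALCOHOL_NONE_RE = re.compile(
    r"""
    \b(?:
        teetotal(?:ler)?
      | (?:does\s+not|doesn't|never)\s+drinks?(?:\s+any)?\s+alcohol
      | no\s+alcohol
    )\b
    """,
    re.IGNORECASE | re.VERBOSE,
)


def find_alcohol_none(text: str) -> Iterator[Dict[str, object]]:
    """Yield one record per phrase indicating that no alcohol is drunk."""
    for m in ALCOHOL_NONE_RE.finditer(text):
        yield {
            "matching_text": m.group(0),
            "start": m.start(),
            "end": m.end(),
            "units_per_week": 0.0,  # "none" is recorded as zero units per week
        }
```

Note that a numeric phrase such as "0 units per week" is deliberately not matched here, mirroring the division of labour described above.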
- parse_alcohol_units(text: str, debug: bool = False) -> Generator[Tuple[str, Dict[str, Any]], None, None]
This amends SimpleNumericalResultParser.parse() to deal with tense a bit better (e.g. “used to drink”). Comments from that version are not repeated here. The code is also a little shorter than that version, since some aspects of the flags are guaranteed.
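As a rough illustration of tense-aware matching, a regex can carry an optional past-tense cue in front of the numeric part. Everything in this sketch is an assumption: the cue phrases, the group names and the `tense` and `units_per_week` field names are hypothetical, and the pattern only recognises a cue that immediately precedes the number.

```python
import re
from typing import Dict, Iterator

# Hypothetical pattern: optional past-tense cue, a number, "units", a period.
UNITS_RE = re.compile(
    r"""
    (?P<past>\b(?:used\s+to\s+drink|drank|was\s+drinking)\b\s+)?  # optional cue
    (?P<value>\d+(?:\.\d+)?)\s*
    units?\s*(?:per|a|each|/)\s*
    (?P<period>day|week)\b
    """,
    re.IGNORECASE | re.VERBOSE,
)


def find_alcohol_units(text: str) -> Iterator[Dict[str, object]]:
    """Yield units-per-week values with a crude past/present tense label."""
    for m in UNITS_RE.finditer(text):
        value = float(m.group("value"))
        if m.group("period").lower() == "day":
            value *= 7  # normalise to units per week
        yield {
            "units_per_week": value,
            "tense": "past" if m.group("past") else "present",
        }
```

A phrase with intervening words (for instance "used to drink about 30 units a week") would be labelled as present tense by this sketch, which is one reason a production regex needs more care.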
- class crate_anon.nlp_manager.parse_substance_misuse.AlcoholUnitsValidator(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)
Validator for AlcoholUnits (see help for explanation).
- classmethod get_variablename_regexstrlist() -> Tuple[str, List[str]]
To be overridden.
- Returns:
(validated_variable_name, regex_str_list), where:
- regex_str_list:
List of regular expressions, each in string format.
This class operates with compiled regexes having this group format (capture groups in this sequence): variable
- validated_variable:
used to set our variable attribute and thus the value of the field variable_name in the NLP output; for example, if validated_variable == 'crp', then the variable_name field will be set to crp_validator.
- Return type:
tuple
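The return contract described above can be sketched with a standalone class. This is a hypothetical stand-in rather than a real CRATE subclass: `ExampleValidator`, the variable name `alcohol_units`, the sample regex and both helper functions are invented for illustration. Only the `(name, regex list)` tuple shape, the single `variable` capture group and the `_validator` suffix come from the text.

```python
import re
from typing import List, Tuple


class ExampleValidator:
    """Hypothetical stand-in; a real validator would subclass a CRATE base class."""

    @classmethod
    def get_variablename_regexstrlist(cls) -> Tuple[str, List[str]]:
        # Each regex has exactly one capture group: the variable itself.
        return "alcohol_units", [r"\b(units?)\b"]


def validator_field_value(validated_variable: str) -> str:
    # Mirrors the naming rule described above: 'crp' -> 'crp_validator'.
    return f"{validated_variable}_validator"


def has_single_capture_group(regex_str: str) -> bool:
    # Checks the "one group: variable" format expected of each regex string.
    return re.compile(regex_str).groups == 1
```

With these definitions, `validator_field_value("crp")` gives `"crp_validator"`, matching the example in the text.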