14.5.28. crate_anon.nlp_manager.parse_substance_misuse

crate_anon/nlp_manager/parse_substance_misuse.py


Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).

This file is part of CRATE.

CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with CRATE. If not, see <https://www.gnu.org/licenses/>.


Python regex-based NLP processors for substance misuse.

class crate_anon.nlp_manager.parse_substance_misuse.AlcoholUnits(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)[source]

SUBSTANCE MISUSE.

Alcohol consumption, specified explicitly as (UK) units per day or per week, or via non-numeric references to not drinking any.

  • Output is in UK units per week. A UK unit is 10 ml of ethanol [1] [2]. UK NHS guidelines used to be “per week” and remain broadly week-based [1].

  • The processor does not attempt to interpret other alcohol descriptions (e.g. “pints of beer”, “glasses of wine”, “bottles of vodka”), so it is expected to apply where a clinician has already converted a (potentially mixed) alcohol description into a units-per-week figure.
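The unit arithmetic described above can be sketched as follows. This is a minimal illustration, not code from CRATE: the function names and constants are hypothetical, and the only facts taken from the text are that a UK unit is 10 ml of ethanol and that output is expressed per week.

```python
# Illustrative only: these helpers are hypothetical and not part of CRATE.

ML_ETHANOL_PER_UK_UNIT = 10.0  # a UK unit is 10 ml of pure ethanol
DAYS_PER_WEEK = 7


def units_in_drink(volume_ml: float, abv_percent: float) -> float:
    """UK units in one drink: ml of pure ethanol divided by 10."""
    ethanol_ml = volume_ml * abv_percent / 100.0
    return ethanol_ml / ML_ETHANOL_PER_UK_UNIT


def units_per_week(units: float, per_day: bool) -> float:
    """Normalize a units-per-day or units-per-week figure to units per week."""
    return units * DAYS_PER_WEEK if per_day else units


if __name__ == "__main__":
    # A 1-litre bottle at 40% ABV contains 400 ml of ethanol.
    print(units_in_drink(1000, 40))
    # "3 units per day" expressed per week.
    print(units_per_week(3, per_day=True))
```

The first helper shows the kind of conversion a clinician would already have done by hand; the processor itself only needs the second step, normalizing a stated figure to a weekly one.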

__init__(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False) None[source]

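Init function for NumericalResultParser. The parameter list below is that of NumericalResultParser; of these, the AlcoholUnits signature above accepts only nlpdef, cfg_processor_name and commit.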

Parameters:
  • nlpdef – A crate_anon.nlp_manager.nlp_definition.NlpDefinition.

  • cfg_processor_name – Config section name in the NLP config file.

  • variable – Used by subclasses as the record value for variable_name.

  • target_unit – Fieldname used for the primary output quantity.

  • regex_str_for_debugging – String form of regex, for debugging.

  • commit – Force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

Subclasses will extend this method.

parse(text: str, debug: bool = False) Generator[Tuple[str, Dict[str, Any]], None, None][source]

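Parses the text using two regexes that operate slightly differently: one for explicit units (see parse_alcohol_units) and one for non-numeric references to not drinking (see parse_alcohol_none).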

parse_alcohol_none(text: str, debug: bool = False) Generator[Tuple[str, Dict[str, Any]], None, None][source]

Deals with references to not drinking any alcohol. References phrased numerically (e.g. “0 units per week”), which should be rare, are picked up by the units-per-week function instead.
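As a rough sketch of what detecting a non-numeric "no alcohol" reference might look like, the following uses a deliberately small, hypothetical pattern. The phrases it recognizes are assumptions chosen for illustration; the regex actually used by AlcoholUnits is not reproduced on this page.

```python
import re

# Hypothetical, simplified pattern for illustration; not the regex used by CRATE.
NO_ALCOHOL_RE = re.compile(
    r"\b(?:teetotal(?:ler)?|does\s+not\s+drink\s+alcohol|no\s+alcohol)\b",
    re.IGNORECASE,
)


def mentions_no_alcohol(text: str) -> bool:
    """Return True if the text contains one of the illustrative phrases."""
    return NO_ALCOHOL_RE.search(text) is not None


if __name__ == "__main__":
    print(mentions_no_alcohol("He is teetotal."))
    print(mentions_no_alcohol("Drinks 20 units per week."))
```

A numeric statement such as "0 units per week" is intentionally not covered here, matching the division of labour described above.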

parse_alcohol_units(text: str, debug: bool = False) Generator[Tuple[str, Dict[str, Any]], None, None][source]

This amends SimpleNumericalResultParser.parse() to deal with tense a bit better (e.g. “used to drink”). Comments from that version are not repeated here, and the code is also shortened a bit because some aspects of the flags are guaranteed.
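The idea of capturing tense alongside a units figure can be sketched with a toy generator that has the same return shape as the methods above (a table name plus a dictionary of values). Everything specific here is an assumption made for illustration: the regex, the table name "alcohol_units", and the dictionary keys are hypothetical and do not reflect CRATE's actual output fields.

```python
import re
from typing import Any, Dict, Generator, Tuple

# Hypothetical pattern: an optional past-tense lead-in, a number, "unit(s)",
# and a per-day or per-week period. Not the regex used by CRATE.
UNITS_RE = re.compile(
    r"(?P<past>\bused\s+to\s+drink\s+|\bdrank\s+)?"
    r"(?P<value>\d+(?:\.\d+)?)\s*units?\s*"
    r"(?:per|a|each|/)\s*(?P<period>day|week)\b",
    re.IGNORECASE,
)


def parse_units(text: str) -> Generator[Tuple[str, Dict[str, Any]], None, None]:
    """Yield (tablename, values) for each units-per-day/week mention."""
    for m in UNITS_RE.finditer(text):
        value = float(m.group("value"))
        per_day = m.group("period").lower() == "day"
        yield "alcohol_units", {  # hypothetical table name and keys
            "matched_text": m.group(0),
            "units_per_week": value * 7 if per_day else value,
            "tense": "past" if m.group("past") else "present",
        }


if __name__ == "__main__":
    sample = "Used to drink 40 units per week; now drinks 3 units a day."
    for tablename, values in parse_units(sample):
        print(tablename, values)
```

In this sketch the past-tense marker must immediately precede the number, so phrasing such as "used to drink about 40 units" would be reported as present tense; a production regex would need to be considerably more tolerant.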

test(verbose: bool = False) None[source]

Performs a self-test on the NLP processor.

Parameters:

verbose – Be verbose?

This is an abstract method that is subclassed.

class crate_anon.nlp_manager.parse_substance_misuse.AlcoholUnitsValidator(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)[source]

Validator for AlcoholUnits (see help for explanation).

classmethod get_variablename_regexstrlist() Tuple[str, List[str]][source]

To be overridden.

Returns:

(validated_variable_name, regex_str_list), where:

regex_str_list:

List of regular expressions, each in string format.

This class operates with compiled regexes having this group format (capture groups in this sequence):

  • variable

validated_variable_name:

Used to set our variable attribute and thus the value of the field variable_name in the NLP output; for example, if validated_variable_name == 'crp', then the variable_name field will be set to crp_validator.

Return type:

tuple
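The shape of this return value, and the naming rule in the example above ('crp' becoming crp_validator), can be sketched as follows. The variable name "alcohol_units" and the regex string are hypothetical placeholders, and the use of the first capture group for the variable is an assumption based on the group format listed above; the real values returned by AlcoholUnitsValidator are not shown on this page.

```python
import re
from typing import List, Tuple


def get_variablename_regexstrlist() -> Tuple[str, List[str]]:
    """Hypothetical stand-in: returns (validated_variable_name, regex_str_list)."""
    # Each regex has a single capture group: the variable mention itself.
    return "alcohol_units", [r"\b(units?)\b"]


def validator_variable_name(validated_variable_name: str) -> str:
    """Naming rule from the text: 'crp' becomes 'crp_validator'."""
    return f"{validated_variable_name}_validator"


def find_variable_mentions(text: str) -> List[str]:
    """Collect every mention of the variable, whether or not a value follows."""
    _, regex_strs = get_variablename_regexstrlist()
    hits: List[str] = []
    for regex_str in regex_strs:
        for m in re.compile(regex_str, re.IGNORECASE).finditer(text):
            hits.append(m.group(1))
    return hits


if __name__ == "__main__":
    name, _ = get_variablename_regexstrlist()
    print(validator_variable_name(name))
    print(find_variable_mentions("20 units per week; 1 unit today"))
```

Matching the variable name alone, without requiring a value, is what distinguishes this sketch from the value-extracting parsers earlier on the page.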