12.5.23. crate_anon.nlp_manager.parse_clinical

crate_anon/nlp_manager/parse_clinical.py

Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).

This file is part of CRATE.

CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with CRATE. If not, see <https://www.gnu.org/licenses/>.

Python regex-based NLP processors for clinical assessment data.

Most inherit from crate_anon.nlp_manager.regex_parser.SimpleNumericalResultParser and are constructed with these arguments:

nlpdef – a crate_anon.nlp_manager.nlp_definition.NlpDefinition
cfg_processor_name – the name of a CRATE NLP config file section (from which we may choose to get extra config information)
commit – force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

Optionally, also:

debug – show debugging information

class crate_anon.nlp_manager.parse_clinical.Bmi(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)

CLINICAL EXAMINATION.

Body mass index (BMI), in kg / m^2.

__init__(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False) → None

Parameters:

nlpdef – crate_anon.nlp_manager.nlp_definition.NlpDefinition
cfg_processor_name – config section suffix in the NLP config file
regex_str –
Regular expression, in string format.

This class operates with compiled regexes having this group format (capture groups in this sequence):
- variable
- tense_indicator
- relation
- value
- units
variable – used as the record value for variable_name
target_unit – fieldname used for the primary output quantity
units_to_factor –
dictionary, mapping
- FROM (compiled regex for units)
- TO EITHER a float (multiple) to multiply those units by, to get the preferred unit
- OR a function taking a text parameter and returning a float value in preferred unit
Any units present in the regex but absent from units_to_factor will cause the result to be ignored. For example, this allows you to ignore a relative neutrophil count (“neutrophils 2.2%”) while detecting absolute neutrophil counts (“neutrophils 2.2”), or to ignore “docusate sodium 100mg” while detecting “sodium 140 mM”.
take_absolute –
Convert negative values to positive ones? Typical text requiring this option might look like:
```
CRP-4
CRP-106
CRP -97
Blood results for today as follows: Na- 142, K-4.1, ...
```
Such notation occurred in 23 out of 8054 hits for CRP in one test set from our data.

For many quantities, we know that they cannot be negative, so the hyphen is just notation rather than a minus sign. We have to account for it, or it will distort our values. It is preferable to account for it here rather than later; see the manual.
commit – force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.
debug – print the regex?
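The parameters above can be illustrated with a small, self-contained sketch. It is not CRATE's implementation: BMI_REGEX, UNITS_TO_FACTOR and parse_bmi() are hypothetical names chosen for illustration. It shows the documented capture-group order (variable, tense_indicator, relation, value, units), a units_to_factor-style mapping in which units matched by the regex but missing from the mapping cause the hit to be ignored, and the effect of take_absolute on notation such as “BMI-30”. It also assumes, for this sketch only, that a value written without units is already in the target unit.

```python
import re
from typing import Callable, Dict, Iterator, Union

# Hypothetical sketch, not CRATE's implementation. Capture groups follow the
# documented order: variable, tense_indicator, relation, value, units.
BMI_REGEX = re.compile(
    r"\b(BMI|body\s+mass\s+index)"        # 1: variable
    r"\s*(was|is)?"                        # 2: tense_indicator (optional)
    r"\s*(<=|>=|<|>|=|:)?"                 # 3: relation (optional)
    r"\s*(-?\d+(?:\.\d+)?)"                # 4: value; "-" may be mere notation
    r"\s*(kg\s*/\s*m\s*(?:\^?2|²)|%)?",    # 5: units (optional)
    re.IGNORECASE,
)

# Either a multiple, or a function taking text and returning the value in
# the preferred unit (kg/m^2 here).
Converter = Union[float, Callable[[str], float]]

# "%" is matched by BMI_REGEX but deliberately absent here, so a hit such
# as "BMI 25%" is ignored.
UNITS_TO_FACTOR: Dict[re.Pattern, Converter] = {
    re.compile(r"kg\s*/\s*m\s*(?:\^?2|²)", re.IGNORECASE): 1.0,
}


def parse_bmi(text: str, take_absolute: bool = True) -> Iterator[float]:
    """Yield BMI values (kg/m^2) found in text."""
    for m in BMI_REGEX.finditer(text):
        value = float(m.group(4))
        if take_absolute:
            value = abs(value)  # "BMI-30" is notation, not a negative BMI
        units = m.group(5)
        if units is None:
            # Assumption for this sketch: no units means the target unit.
            yield value
            continue
        for units_regex, conv in UNITS_TO_FACTOR.items():
            if units_regex.fullmatch(units):
                # Callables receive the matched text, as described above.
                yield conv(m.group(0)) if callable(conv) else value * conv
                break
        # If no units_regex matched, the hit is silently dropped.
```

For example, list(parse_bmi("BMI 25%")) is empty because “%” is matched by the regex but has no entry in the mapping, while list(parse_bmi("BMI-30")) gives [30.0] with take_absolute left on.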

test(verbose: bool = False) → None

Performs a self-test on the NLP processor.

Parameters: verbose – Be verbose?

This is an abstract method that is overridden by subclasses.

class crate_anon.nlp_manager.parse_clinical.BmiValidator(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)

Validator for Bmi (see help for explanation).

classmethod get_variablename_regexstrlist() → Tuple[str, List[str]]

To be overridden.

Returns:

(validated_variable_name, regex_str_list), where:

regex_str_list – List of regular expressions, each in string format. This class operates with compiled regexes having this group format (capture groups in this sequence):
- variable

validated_variable_name – Used to set our variable attribute and thus the value of the field variable_name in the NLP output; for example, if validated_variable_name == 'crp', then the variable_name field will be set to crp_validator.

Return type:

tuple
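A minimal sketch of the return value described above, assuming a hypothetical BMI validator: each regex has a single capture group (the variable itself, with no value or units), and the validated variable name "bmi" becomes the variable_name value "bmi_validator", following the documented 'crp' → crp_validator pattern. The validator_hits() helper is hypothetical and not part of CRATE.

```python
import re
from typing import List, Tuple


def get_variablename_regexstrlist() -> Tuple[str, List[str]]:
    # Hypothetical validator for BMI: each regex has exactly one capture
    # group, the variable itself (no value, units, etc.).
    return "bmi", [r"\b(BMI|body\s+mass\s+index)\b"]


def validator_hits(text: str) -> List[Tuple[str, str]]:
    """Return (variable_name, matched_text) for each mention of the variable."""
    validated_variable_name, regex_str_list = get_variablename_regexstrlist()
    # Following the documented pattern: "crp" -> "crp_validator".
    variable_name = f"{validated_variable_name}_validator"
    hits = []
    for regex_str in regex_str_list:
        for m in re.finditer(regex_str, text, re.IGNORECASE):
            hits.append((variable_name, m.group(1)))
    return hits
```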

class crate_anon.nlp_manager.parse_clinical.Bp(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)

CLINICAL EXAMINATION.

Blood pressure, in mmHg. (Systolic and diastolic.)

__init__(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False) → None

__init__ function for TableMaker.

Parameters:

nlpdef – An instance of crate_anon.nlp_manager.nlp_definition.NlpDefinition.
cfg_processor_name – The suffix of a CRATE NLP config file section name; we add a processor: prefix to it to obtain the section (from which we may choose to get extra config information).
commit – Force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.
friendly_name – Friendly name for the parser.

dest_tables_columns() → Dict[str, List[Column]]

Describes the destination table(s) that this NLP processor wants to write to.

Returns: a dictionary of {tablename: destination_columns}, where destination_columns is a list of SQLAlchemy Column objects.
Return type: dict

parse(text: str, debug: bool = False) → Generator[Tuple[str, Dict[str, Any]], None, None]

Parser for BP. Specialized because we’re fetching two numbers (systolic and diastolic).
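The sketch below illustrates why BP needs its own parser: a single match has to yield two numbers. It is a hypothetical illustration, not CRATE's regex. The real parse() is documented as yielding (tablename, dict) tuples, whereas the hypothetical parse_bp() here yields bare (systolic, diastolic) pairs for brevity.

```python
import re
from typing import Iterator, Tuple

# Hypothetical regex: one match captures both systolic and diastolic values,
# e.g. "BP 120/80", "blood pressure was 135 / 85 mmHg".
BP_REGEX = re.compile(
    r"\b(?:BP|blood\s+pressure)\s*(?:was|is)?\s*:?\s*"
    r"(\d{2,3})\s*/\s*(\d{2,3})"        # systolic / diastolic
    r"(?:\s*mm\s*Hg)?",                 # optional units
    re.IGNORECASE,
)


def parse_bp(text: str) -> Iterator[Tuple[float, float]]:
    """Yield (systolic, diastolic) pairs in mmHg."""
    for m in BP_REGEX.finditer(text):
        yield float(m.group(1)), float(m.group(2))
```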

test(verbose: bool = False) → None

Performs a self-test on the NLP processor.

Parameters: verbose – Be verbose?

This is an abstract method that is overridden by subclasses.

test_bp_parser(test_expected_list: List[Tuple[str, List[Tuple[float, float]]]], verbose: bool = False) → None

Called by test().

Parameters:

test_expected_list – list of (source_text, expected_values) tuples, where expected_values is a list of (sbp, dbp) tuples.
verbose – be verbose?
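Assuming the test_expected_list format described above, a test harness in the same spirit might look like this. check_bp_parser() and the stand-in toy_parse() are hypothetical; toy_parse() exists only so that the example runs without CRATE.

```python
import re
from typing import Callable, List, Tuple

BpPair = Tuple[float, float]  # (sbp, dbp)


def check_bp_parser(
    parse_fn: Callable[[str], List[BpPair]],
    test_expected_list: List[Tuple[str, List[BpPair]]],
    verbose: bool = False,
) -> None:
    """Raise AssertionError if parse_fn disagrees with any expected result."""
    for source_text, expected_values in test_expected_list:
        actual = parse_fn(source_text)
        if verbose:
            print(f"{source_text!r} -> {actual}")
        assert actual == expected_values, (
            f"{source_text!r}: expected {expected_values}, got {actual}"
        )


def toy_parse(text: str) -> List[BpPair]:
    """Stand-in parser so the example runs: "BP 120/80" -> [(120.0, 80.0)]."""
    return [
        (float(sbp), float(dbp))
        for sbp, dbp in re.findall(r"\bBP\s*:?\s*(\d+)\s*/\s*(\d+)", text)
    ]


check_bp_parser(toy_parse, [
    ("BP 120/80", [(120.0, 80.0)]),
    ("BP: 110/70, later BP 125/85", [(110.0, 70.0), (125.0, 85.0)]),
    ("no measurement today", []),
])
```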

class crate_anon.nlp_manager.parse_clinical.BpValidator(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)

Validator for Bp (see help for explanation).

classmethod get_variablename_regexstrlist() → Tuple[str, List[str]]

To be overridden.

Returns:

(validated_variable_name, regex_str_list), where:

regex_str_list – List of regular expressions, each in string format. This class operates with compiled regexes having this group format (capture groups in this sequence):
- variable

validated_variable_name – Used to set our variable attribute and thus the value of the field variable_name in the NLP output; for example, if validated_variable_name == 'crp', then the variable_name field will be set to crp_validator.

Return type:

tuple

class crate_anon.nlp_manager.parse_clinical.Height(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False, debug: bool = False)

CLINICAL EXAMINATION.

Height. Handles metric (e.g. “1.8m”) and imperial (e.g. “5 ft 2 in”).

__init__(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False, debug: bool = False) → None

Init function for NumericalResultParser.

Parameters:

nlpdef – A crate_anon.nlp_manager.nlp_definition.NlpDefinition.
cfg_processor_name – Config section name in the NLP config file.
variable – Used by subclasses as the record value for variable_name.
target_unit – Fieldname used for the primary output quantity.
regex_str_for_debugging – String form of regex, for debugging.
commit – Force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

Subclasses will extend this method.

parse(text: str, debug: bool = False) → Generator[Tuple[str, Dict[str, Any]], None, None]

Parser for Height. Specialized for complex unit conversion.
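As an illustration of the metric and imperial handling described for Height, the following sketch converts a single height expression to metres, using the exact definitions 1 ft = 0.3048 m and 1 in = 0.0254 m. The IMPERIAL and METRIC patterns and height_to_metres() are hypothetical and much simpler than a production parser.

```python
import re
from typing import Optional

FEET_TO_M = 0.3048    # exact, by definition of the international foot
INCHES_TO_M = 0.0254  # exact, by definition of the international inch

# Hypothetical patterns, far simpler than a production parser.
IMPERIAL = re.compile(
    r"(\d+)\s*(?:ft|feet|foot|')"                    # feet
    r'\s*(?:(\d+(?:\.\d+)?)\s*(?:in|inches|")?)?',   # optional inches
    re.IGNORECASE,
)
METRIC = re.compile(r"(\d+(?:\.\d+)?)\s*(cm|m)\b", re.IGNORECASE)


def height_to_metres(text: str) -> Optional[float]:
    """Convert the first height expression in text to metres, else None."""
    m = IMPERIAL.search(text)
    if m:
        inches = float(m.group(2)) if m.group(2) else 0.0
        return float(m.group(1)) * FEET_TO_M + inches * INCHES_TO_M
    m = METRIC.search(text)
    if m:
        value = float(m.group(1))
        return value / 100 if m.group(2).lower() == "cm" else value
    return None
```

For example, “5 ft 2 in” is 62 inches, so height_to_metres("5 ft 2 in") returns approximately 1.5748.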

test(verbose: bool = False) → None

Performs a self-test on the NLP processor.

Parameters: verbose – Be verbose?

This is an abstract method that is overridden by subclasses.

class crate_anon.nlp_manager.parse_clinical.HeightValidator(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)

Validator for Height (see help for explanation).

classmethod get_variablename_regexstrlist() → Tuple[str, List[str]]

To be overridden.

Returns:

(validated_variable_name, regex_str_list), where:

regex_str_list – List of regular expressions, each in string format. This class operates with compiled regexes having this group format (capture groups in this sequence):
- variable

validated_variable_name – Used to set our variable attribute and thus the value of the field variable_name in the NLP output; for example, if validated_variable_name == 'crp', then the variable_name field will be set to crp_validator.

Return type:

tuple

class crate_anon.nlp_manager.parse_clinical.Weight(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False, debug: bool = False)

CLINICAL EXAMINATION.

Weight. Handles metric (e.g. “57kg”) and imperial (e.g. “10 st 2 lb”). Requires units to be specified.

__init__(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False, debug: bool = False) → None

Init function for NumericalResultParser.

Parameters:

nlpdef – A crate_anon.nlp_manager.nlp_definition.NlpDefinition.
cfg_processor_name – Config section name in the NLP config file.
variable – Used by subclasses as the record value for variable_name.
target_unit – Fieldname used for the primary output quantity.
regex_str_for_debugging – String form of regex, for debugging.
commit – Force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

Subclasses will extend this method.

parse(text: str, debug: bool = False) → Generator[Tuple[str, Dict[str, Any]], None, None]

Parser for Weight. Specialized for complex unit conversion.
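Similarly for Weight, here is a sketch that converts metric and imperial expressions to kilograms, using 1 lb = 0.45359237 kg (exact) and 1 st = 14 lb. In line with the documented requirement that units be specified, a bare number returns None. The WEIGHT pattern and weight_to_kg() are hypothetical.

```python
import re
from typing import Optional

POUND_TO_KG = 0.45359237  # exact, by definition of the international pound
POUNDS_PER_STONE = 14

# Hypothetical pattern: a number is only accepted when units follow it.
WEIGHT = re.compile(
    r"(?P<kg>\d+(?:\.\d+)?)\s*kg\b"
    r"|(?P<st>\d+)\s*st(?:one)?s?\b(?:\s*(?P<st_lb>\d+(?:\.\d+)?)\s*lbs?\b)?"
    r"|(?P<lb>\d+(?:\.\d+)?)\s*lbs?\b",
    re.IGNORECASE,
)


def weight_to_kg(text: str) -> Optional[float]:
    """Convert the first weight expression in text to kg, else None."""
    m = WEIGHT.search(text)
    if m is None:
        return None  # e.g. a bare "57": units are required
    if m.group("kg") is not None:
        return float(m.group("kg"))
    if m.group("st") is not None:
        pounds = (int(m.group("st")) * POUNDS_PER_STONE
                  + float(m.group("st_lb") or 0))
        return pounds * POUND_TO_KG
    return float(m.group("lb")) * POUND_TO_KG
```

For example, “10 st 2 lb” is 142 lb, which this sketch converts to about 64.41 kg.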

test(verbose: bool = False) → None

Performs a self-test on the NLP processor.

Parameters: verbose – Be verbose?

This is an abstract method that is overridden by subclasses.

class crate_anon.nlp_manager.parse_clinical.WeightValidator(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)

Validator for Weight (see help for explanation).

classmethod get_variablename_regexstrlist() → Tuple[str, List[str]]

To be overridden.

Returns:

(validated_variable_name, regex_str_list), where:

regex_str_list – List of regular expressions, each in string format. This class operates with compiled regexes having this group format (capture groups in this sequence):
- variable

validated_variable_name – Used to set our variable attribute and thus the value of the field variable_name in the NLP output; for example, if validated_variable_name == 'crp', then the variable_name field will be set to crp_validator.

Return type:

tuple