14.5.23. crate_anon.nlp_manager.parse_clinical

crate_anon/nlp_manager/parse_clinical.py


Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).

This file is part of CRATE.

CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with CRATE. If not, see <https://www.gnu.org/licenses/>.


Python regex-based NLP processors for clinical assessment data.

Most inherit from crate_anon.nlp_manager.regex_parser.SimpleNumericalResultParser and are constructed with these arguments:

nlpdef:

a crate_anon.nlp_manager.nlp_definition.NlpDefinition

cfgsection:

the name of a CRATE NLP config file section (from which we may choose to get extra config information)

commit:

force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

and optionally these:

debug:

show debugging information

class crate_anon.nlp_manager.parse_clinical.Bmi(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)

CLINICAL EXAMINATION.

Body mass index (BMI), in kg / m^2.

__init__(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False) → None
Parameters:
  • nlpdef – a crate_anon.nlp_manager.nlp_definition.NlpDefinition

  • cfg_processor_name – config section suffix in the NLP config file

  • regex_str

    Regular expression, in string format.

    This class operates with compiled regexes having this group format (capture groups in this sequence):

    • variable

    • tense_indicator

    • relation

    • value

    • units

  • variable – used as the record value for variable_name

  • target_unit – fieldname used for the primary output quantity

  • units_to_factor

    dictionary, mapping

    • FROM (compiled regex for units)

    • TO EITHER a float (multiple) to multiply those units by, to get the preferred unit

    • OR a function taking a text parameter and returning a float value in preferred unit

    Any units present in the regex but absent from units_to_factor will lead the result to be ignored. For example, this allows you to ignore a relative neutrophil count (“neutrophils 2.2%”) while detecting absolute neutrophil counts (“neutrophils 2.2”), or ignoring “docusate sodium 100mg” but detecting “sodium 140 mM”.

  • take_absolute

    Convert negative values to positive ones? Typical text requiring this option might look like:

    CRP-4
    CRP-106
    CRP -97
    Blood results for today as follows: Na- 142, K-4.1, ...
    

    … occurring in 23 out of 8054 hits for CRP of one test set in our data.

    For many quantities, we know that they cannot be negative, so this is just a notation rather than a minus sign. We have to account for it, or it’ll distort our values. Preferable to account for it here rather than later; see manual.

  • commit – force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

  • debug – print the regex?
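The parameters above can be illustrated with a minimal standalone sketch. It does not use CRATE's own classes: the sodium regex, the unit list, and the names SODIUM_REGEX, UNITS_TO_FACTOR and parse_sodium are all hypothetical, and treating a value without units as already being in the preferred unit is an assumption made for this example only.

```python
import re
from typing import Callable, Dict, Optional, Union

Converter = Union[float, Callable[[str], float]]

# Hypothetical regex with the five capture groups in the documented order.
SODIUM_REGEX = re.compile(
    r"""
    (?P<variable> \b (?: Na | sodium ) \b )        # 1: variable
    \s* (?P<tense_indicator> is | was )?            # 2: tense indicator
    \s* (?P<relation> <= | >= | < | > | = )?        # 3: relation
    \s* (?P<value> -? \d+ (?: \. \d+ )? )           # 4: value
    \s* (?P<units> mM | mmol/L | mEq/L | mg )?      # 5: units
    """,
    re.IGNORECASE | re.VERBOSE,
)


def meq_per_litre_to_mm(text: str) -> float:
    """Callable converter: sodium is monovalent, so 1 mEq/L equals 1 mM."""
    return float(text)


# Maps compiled unit regexes to a multiplier or to a conversion function.
# "mg" is matched by the main regex but is deliberately absent here, so
# "docusate sodium 100mg" is ignored.
UNITS_TO_FACTOR: Dict[re.Pattern, Converter] = {
    re.compile(r"mM", re.IGNORECASE): 1.0,
    re.compile(r"mmol/L", re.IGNORECASE): 1.0,
    re.compile(r"mEq/L", re.IGNORECASE): meq_per_litre_to_mm,
}


def parse_sodium(text: str, take_absolute: bool = True) -> Optional[float]:
    """Return the first sodium value in mM, or None if nothing usable."""
    m = SODIUM_REGEX.search(text)
    if m is None:
        return None
    value_text = m.group("value")
    units_text = m.group("units")
    if units_text is None:
        # Assumption for this sketch: no units means the preferred unit.
        value = float(value_text)
    else:
        for unit_regex, converter in UNITS_TO_FACTOR.items():
            if unit_regex.fullmatch(units_text):
                if callable(converter):
                    value = converter(value_text)
                else:
                    value = float(value_text) * converter
                break
        else:
            return None  # units recognised by the regex but not convertible
    # "Na-142" uses the hyphen as notation, not as a minus sign.
    return abs(value) if take_absolute else value
```

With these definitions, parse_sodium("sodium 140 mM") gives 140.0, parse_sodium("docusate sodium 100mg") gives None, and parse_sodium("Na-142") gives 142.0 rather than -142.0.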

test(verbose: bool = False) → None

Performs a self-test on the NLP processor.

Parameters:

verbose – Be verbose?

This is an abstract method that is subclassed.

class crate_anon.nlp_manager.parse_clinical.BmiValidator(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)

Validator for Bmi (see help for explanation).

classmethod get_variablename_regexstrlist() → Tuple[str, List[str]]

To be overridden.

Returns:

(validated_variable_name, regex_str_list), where:

regex_str_list:

List of regular expressions, each in string format.

This class operates with compiled regexes having this group format (capture groups in this sequence):

  • variable

validated_variable:

used to set our variable attribute and thus the value of the field variable_name in the NLP output; for example, if validated_variable == 'crp', then the variable_name field will be set to crp_validator.

Return type:

tuple
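As a rough illustration of the return value described above, the following standalone sketch shows a plausible tuple for a BMI validator and the naming rule for variable_name. The regex strings and the helper validator_variable_name are hypothetical, and the sketch assumes (as the single capture group suggests) that a validator regex matches mentions of the variable name alone, with no value required.

```python
import re
from typing import List, Tuple


def get_variablename_regexstrlist() -> Tuple[str, List[str]]:
    """Hypothetical override: (validated_variable_name, regex_str_list)."""
    return "bmi", [r"\b(BMI)\b", r"\b(body\s+mass\s+index)\b"]


def validator_variable_name(validated_variable: str) -> str:
    """Naming rule from the text: 'crp' becomes 'crp_validator'."""
    return f"{validated_variable}_validator"


def find_mentions(text: str) -> List[str]:
    """Return every mention of the variable, whether or not a value follows."""
    _, regex_str_list = get_variablename_regexstrlist()
    mentions = []  # type: List[str]
    for regex_str in regex_str_list:
        compiled = re.compile(regex_str, re.IGNORECASE)
        mentions.extend(m.group(1) for m in compiled.finditer(text))
    return mentions
```

Here find_mentions("BMI not recorded; body mass index to be checked") returns both mentions even though neither carries a number.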

class crate_anon.nlp_manager.parse_clinical.Bp(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)

CLINICAL EXAMINATION.

Blood pressure, in mmHg. (Systolic and diastolic.)

__init__(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False) → None

__init__ function for TableMaker.

Parameters:
  • nlpdef – An instance of crate_anon.nlp_manager.nlp_definition.NlpDefinition.

  • cfg_processor_name – The name of a CRATE NLP config file section, TO WHICH we will add a processor: prefix (from which section we may choose to get extra config information).

  • commit – Force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

  • friendly_name – Friendly name for the parser.

dest_tables_columns() → Dict[str, List[Column]]

Describes the destination table(s) that this NLP processor wants to write to.

Returns:

a dictionary of {tablename: destination_columns}, where destination_columns is a list of SQLAlchemy Column objects.

Return type:

dict

parse(text: str, debug: bool = False) → Generator[Tuple[str, Dict[str, Any]], None, None]

Parser for BP. Specialized because we’re fetching two numbers.
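To show what "fetching two numbers" can look like, here is a minimal standalone sketch of a generator with the same return shape as parse(), yielding (tablename, value dictionary) pairs. It is not CRATE's implementation: the regex, the default table name, and the dictionary keys are hypothetical.

```python
import re
from typing import Any, Dict, Generator, Tuple

# Hypothetical regex: a BP label followed by "systolic/diastolic".
BP_REGEX = re.compile(
    r"\b(?:BP|blood\s+pressure)\b\s*[:=]?\s*"
    r"(?P<sbp>\d{2,3})\s*/\s*(?P<dbp>\d{2,3})\b",
    re.IGNORECASE,
)


def parse_bp(
    text: str, tablename: str = "bp"
) -> Generator[Tuple[str, Dict[str, Any]], None, None]:
    """Yield one (tablename, values) pair per blood pressure reading."""
    for m in BP_REGEX.finditer(text):
        yield tablename, {
            # Both numbers come from a single match, which is why a
            # single-value numerical parser is not enough here.
            "systolic_bp_mmhg": float(m.group("sbp")),
            "diastolic_bp_mmhg": float(m.group("dbp")),
        }
```

For the text "BP 120/80 today, later blood pressure: 135 / 85", the generator yields two rows, one per reading.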

test(verbose: bool = False) → None

Performs a self-test on the NLP processor.

Parameters:

verbose – Be verbose?

This is an abstract method that is subclassed.

test_bp_parser(test_expected_list: List[Tuple[str, List[Tuple[float, float]]]], verbose: bool = False) → None

Called by test().

Parameters:
  • test_expected_list – list of tuples (source_text, expected_values), where expected_values is a list of tuples like (sbp, dbp).

  • verbose – be verbose?
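The shape of test_expected_list can be sketched as follows. This is a standalone illustration rather than CRATE's test code: extract_bp_pairs is a hypothetical stand-in for the parser, and check_bp_parser is a hypothetical checker that compares its output with the expected (sbp, dbp) pairs.

```python
import re
from typing import Callable, List, Tuple

BpPair = Tuple[float, float]
TestCase = Tuple[str, List[BpPair]]


def extract_bp_pairs(text: str) -> List[BpPair]:
    """Hypothetical stand-in parser: find every 'number/number' pair."""
    return [
        (float(sbp), float(dbp))
        for sbp, dbp in re.findall(r"(\d{2,3})\s*/\s*(\d{2,3})", text)
    ]


def check_bp_parser(
    extract: Callable[[str], List[BpPair]],
    test_expected_list: List[TestCase],
    verbose: bool = False,
) -> None:
    """Raise AssertionError if any text yields unexpected values."""
    for source_text, expected_values in test_expected_list:
        actual_values = extract(source_text)
        if verbose:
            print(f"{source_text!r} -> {actual_values!r}")
        if actual_values != expected_values:
            raise AssertionError(
                f"{source_text!r}: expected {expected_values!r}, "
                f"got {actual_values!r}"
            )


# Each entry: (source_text, [(sbp, dbp), ...]); an empty list means no hit.
TEST_EXPECTED_LIST: List[TestCase] = [
    ("BP 120/80", [(120.0, 80.0)]),
    ("no reading taken", []),
    ("120/80 then 135 / 85", [(120.0, 80.0), (135.0, 85.0)]),
]

check_bp_parser(extract_bp_pairs, TEST_EXPECTED_LIST)
```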

class crate_anon.nlp_manager.parse_clinical.BpValidator(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)

Validator for Bp (see help for explanation).

classmethod get_variablename_regexstrlist() → Tuple[str, List[str]]

To be overridden.

Returns:

(validated_variable_name, regex_str_list), where:

regex_str_list:

List of regular expressions, each in string format.

This class operates with compiled regexes having this group format (capture groups in this sequence):

  • variable

validated_variable:

used to set our variable attribute and thus the value of the field variable_name in the NLP output; for example, if validated_variable == 'crp', then the variable_name field will be set to crp_validator.

Return type:

tuple

class crate_anon.nlp_manager.parse_clinical.Height(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False, debug: bool = False)

CLINICAL EXAMINATION.

Height. Handles metric (e.g. “1.8m”) and imperial (e.g. “5 ft 2 in”).

__init__(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False, debug: bool = False) → None

Init function for NumericalResultParser.

Parameters:
  • nlpdef – A crate_anon.nlp_manager.nlp_definition.NlpDefinition.

  • cfg_processor_name – Config section name in the NLP config file.

  • variable – Used by subclasses as the record value for variable_name.

  • target_unit – Fieldname used for the primary output quantity.

  • regex_str_for_debugging – String form of regex, for debugging.

  • commit – Force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

Subclasses will extend this method.

parse(text: str, debug: bool = False) → Generator[Tuple[str, Dict[str, Any]], None, None]

Parser for Height. Specialized for complex unit conversion.
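The unit conversion involved can be sketched in a few lines. This is a simplified standalone illustration, not CRATE's parser: the regexes and the function height_in_metres are hypothetical, and metres are assumed as the target unit. The conversion factors (0.3048 m per foot, 0.0254 m per inch) are exact by definition.

```python
import re
from typing import Optional

M_PER_FOOT = 0.3048
M_PER_INCH = 0.0254

# Hypothetical, deliberately simple patterns.
IMPERIAL_HEIGHT = re.compile(
    r"(?P<feet>\d+)\s*ft\b(?:\s*(?P<inches>\d+(?:\.\d+)?)\s*in\b)?",
    re.IGNORECASE,
)
METRIC_HEIGHT = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<units>cm|m)\b",
    re.IGNORECASE,
)


def height_in_metres(text: str) -> Optional[float]:
    """Return the first height found in the text, converted to metres."""
    m = IMPERIAL_HEIGHT.search(text)
    if m:
        # Two numbers (feet and optional inches) combine into one value,
        # which a single multiplication factor cannot express.
        inches = float(m.group("inches") or 0)
        return float(m.group("feet")) * M_PER_FOOT + inches * M_PER_INCH
    m = METRIC_HEIGHT.search(text)
    if m:
        value = float(m.group("value"))
        return value / 100 if m.group("units").lower() == "cm" else value
    return None
```

Under these assumptions, "1.8m" and "180 cm" both give 1.8, and "5 ft 2 in" gives approximately 1.5748 (5 × 0.3048 + 2 × 0.0254).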

test(verbose: bool = False) → None

Performs a self-test on the NLP processor.

Parameters:

verbose – Be verbose?

This is an abstract method that is subclassed.

class crate_anon.nlp_manager.parse_clinical.HeightValidator(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)

Validator for Height (see help for explanation).

classmethod get_variablename_regexstrlist() → Tuple[str, List[str]]

To be overridden.

Returns:

(validated_variable_name, regex_str_list), where:

regex_str_list:

List of regular expressions, each in string format.

This class operates with compiled regexes having this group format (capture groups in this sequence):

  • variable

validated_variable:

used to set our variable attribute and thus the value of the field variable_name in the NLP output; for example, if validated_variable == 'crp', then the variable_name field will be set to crp_validator.

Return type:

tuple

class crate_anon.nlp_manager.parse_clinical.Weight(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False, debug: bool = False)

CLINICAL EXAMINATION.

Weight. Handles metric (e.g. “57kg”) and imperial (e.g. “10 st 2 lb”). Requires units to be specified.

__init__(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False, debug: bool = False) → None

Init function for NumericalResultParser.

Parameters:
  • nlpdef – A crate_anon.nlp_manager.nlp_definition.NlpDefinition.

  • cfg_processor_name – Config section name in the NLP config file.

  • variable – Used by subclasses as the record value for variable_name.

  • target_unit – Fieldname used for the primary output quantity.

  • regex_str_for_debugging – String form of regex, for debugging.

  • commit – Force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

Subclasses will extend this method.

parse(text: str, debug: bool = False) → Generator[Tuple[str, Dict[str, Any]], None, None]

Parser for Weight. Specialized for complex unit conversion.
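A minimal standalone sketch of the conversion follows. It is not CRATE's parser: the regexes and the function weight_in_kg are hypothetical, and kilograms are assumed as the target unit. One pound is exactly 0.45359237 kg and one stone is 14 lb. In keeping with the class description, a bare number without units is not accepted.

```python
import re
from typing import Optional

KG_PER_LB = 0.45359237
LB_PER_STONE = 14

# Hypothetical, deliberately simple patterns; each one requires units.
KG_WEIGHT = re.compile(r"(?P<kg>\d+(?:\.\d+)?)\s*kg\b", re.IGNORECASE)
STONES_LB_WEIGHT = re.compile(
    r"(?P<stones>\d+)\s*st(?:ones?)?\b"
    r"(?:\s*(?P<pounds>\d+(?:\.\d+)?)\s*lbs?\b)?",
    re.IGNORECASE,
)
LB_WEIGHT = re.compile(r"(?P<pounds>\d+(?:\.\d+)?)\s*lbs?\b", re.IGNORECASE)


def weight_in_kg(text: str) -> Optional[float]:
    """Return the first weight found in the text, converted to kilograms."""
    m = KG_WEIGHT.search(text)
    if m:
        return float(m.group("kg"))
    m = STONES_LB_WEIGHT.search(text)
    if m:
        # Combine stones and optional pounds into pounds, then convert once.
        pounds = float(m.group("pounds") or 0)
        total_lb = float(m.group("stones")) * LB_PER_STONE + pounds
        return total_lb * KG_PER_LB
    m = LB_WEIGHT.search(text)
    if m:
        return float(m.group("pounds")) * KG_PER_LB
    return None  # no recognised units, so no result
```

Under these assumptions, "57kg" gives 57.0, "10 st 2 lb" gives approximately 64.41 kg (142 lb × 0.45359237), and "57" alone gives None.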

test(verbose: bool = False) → None

Performs a self-test on the NLP processor.

Parameters:

verbose – Be verbose?

This is an abstract method that is subclassed.

class crate_anon.nlp_manager.parse_clinical.WeightValidator(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)

Validator for Weight (see help for explanation).

classmethod get_variablename_regexstrlist() → Tuple[str, List[str]]

To be overridden.

Returns:

(validated_variable_name, regex_str_list), where:

regex_str_list:

List of regular expressions, each in string format.

This class operates with compiled regexes having this group format (capture groups in this sequence):

  • variable

validated_variable:

used to set our variable attribute and thus the value of the field variable_name in the NLP output; for example, if validated_variable == 'crp', then the variable_name field will be set to crp_validator.

Return type:

tuple