14.5.23. crate_anon.nlp_manager.parse_clinical
crate_anon/nlp_manager/parse_clinical.py
Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).
This file is part of CRATE.
CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with CRATE. If not, see <https://www.gnu.org/licenses/>.
Python regex-based NLP processors for clinical assessment data.
Most inherit from crate_anon.nlp_manager.regex_parser.SimpleNumericalResultParser and are constructed with these arguments:
- nlpdef:
a crate_anon.nlp_manager.nlp_definition.NlpDefinition
- cfgsection:
the name of a CRATE NLP config file section (from which we may choose to get extra config information)
- commit:
force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.
and, in some cases, these:
- debug:
show debugging information
- class crate_anon.nlp_manager.parse_clinical.Bmi(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)[source]
CLINICAL EXAMINATION.
Body mass index (BMI), in kg / m^2.
- __init__(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False) None [source]
- Parameters:
nlpdef – a crate_anon.nlp_manager.nlp_definition.NlpDefinition
cfg_processor_name – config section suffix in the NLP config file
regex_str –
Regular expression, in string format.
This class operates with compiled regexes having this group format (capture groups in this sequence):
variable
tense_indicator
relation
value
units
variable – used as the record value for variable_name
target_unit – fieldname used for the primary output quantity
units_to_factor –
dictionary, mapping
FROM (compiled regex for units)
TO EITHER a float (multiple) to multiply those units by, to get the preferred unit
OR a function taking a text parameter and returning a float value in preferred unit
Any units present in the regex but absent from units_to_factor will lead the result to be ignored. For example, this allows you to ignore a relative neutrophil count (“neutrophils 2.2%”) while detecting absolute neutrophil counts (“neutrophils 2.2”), or to ignore “docusate sodium 100mg” while detecting “sodium 140 mM”.
take_absolute –
Convert negative values to positive ones? Typical text requiring this option might look like:
CRP-4 CRP-106 CRP -97 Blood results for today as follows: Na- 142, K-4.1, ...
… occurring in 23 out of 8054 hits for CRP of one test set in our data.
For many quantities, we know that they cannot be negative, so this is just a notation rather than a minus sign. We have to account for it, or it’ll distort our values. Preferable to account for it here rather than later; see manual.
commit – force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.
debug – print the regex?
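The units_to_factor and take_absolute behaviour described above can be illustrated with a minimal, standalone sketch. It does not use CRATE's own code: the table UNITS_TO_FACTOR, the helper to_preferred_unit, and the choice of mg/L as the preferred unit are all hypothetical, and the sketch assumes that a function-valued entry receives the text of the value.

```python
import re
from typing import Callable, Dict, Optional, Pattern, Union

# A factor is either a plain multiplier or a function of the value text.
Factor = Union[float, Callable[[str], float]]

# Hypothetical table with mg/L as the preferred unit (not CRATE's own table).
UNITS_TO_FACTOR: Dict[Pattern[str], Factor] = {
    re.compile(r"mg\s*/\s*L", re.IGNORECASE): 1.0,
    re.compile(r"mg\s*/\s*dL", re.IGNORECASE): 10.0,
    # Assumption: a function entry is given the value text.
    re.compile(r"g\s*/\s*L", re.IGNORECASE): lambda text: float(text) * 1000.0,
}


def to_preferred_unit(
    value_text: str,
    units_text: str,
    units_to_factor: Dict[Pattern[str], Factor],
    take_absolute: bool = False,
) -> Optional[float]:
    """Convert a value to the preferred unit, or return None to ignore it."""
    for unit_regex, factor in units_to_factor.items():
        if unit_regex.fullmatch(units_text):
            if callable(factor):
                value = factor(value_text)
            else:
                value = float(value_text) * factor
            # "CRP -97" uses the hyphen as notation, not as a minus sign.
            return abs(value) if take_absolute else value
    # Units absent from the table: the result is ignored.
    return None
```

With this table, to_preferred_unit("2.2", "%", UNITS_TO_FACTOR) returns None, which mirrors the way a relative count is skipped while an absolute one is kept.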
- class crate_anon.nlp_manager.parse_clinical.BmiValidator(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)[source]
Validator for Bmi (see help for explanation).
- classmethod get_variablename_regexstrlist() Tuple[str, List[str]] [source]
To be overridden.
- Returns:
(validated_variable_name, regex_str_list), where:
- regex_str_list:
List of regular expressions, each in string format.
This class operates with compiled regexes having this group format (capture groups in this sequence):
variable
- validated_variable:
used to set our variable attribute and thus the value of the field variable_name in the NLP output; for example, if validated_variable == 'crp', then the variable_name field will be set to crp_validator.
- Return type:
tuple
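As a rough illustration of the return value described above, the sketch below uses a plain function in place of an overriding classmethod and shows how the _validator suffix might end up in variable_name. The regex, the validator_hits helper and the matched_text key are hypothetical and are not taken from CRATE.

```python
import re
from typing import Dict, Iterator, List, Tuple


def get_variablename_regexstrlist() -> Tuple[str, List[str]]:
    # Hypothetical override: one capture group, the variable itself.
    return "bmi", [r"\b(BMI|body\s+mass\s+index)\b"]


def validator_hits(text: str) -> Iterator[Dict[str, str]]:
    """Yield one record per mention of the variable, ignoring any value."""
    validated_variable, regex_str_list = get_variablename_regexstrlist()
    variable_name = f"{validated_variable}_validator"
    for regex_str in regex_str_list:
        for match in re.finditer(regex_str, text, re.IGNORECASE):
            yield {
                "variable_name": variable_name,
                "matched_text": match.group(1),
            }
```

Because the regex captures only the variable, every mention produces a record, whether or not a value follows it.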
- class crate_anon.nlp_manager.parse_clinical.Bp(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)[source]
CLINICAL EXAMINATION.
Blood pressure, in mmHg. (Systolic and diastolic.)
- __init__(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False) None [source]
__init__ function for TableMaker.
- Parameters:
nlpdef – An instance of crate_anon.nlp_manager.nlp_definition.NlpDefinition.
cfg_processor_name – The name of a CRATE NLP config file section, TO WHICH we will add a processor: prefix (from which section we may choose to get extra config information).
commit – Force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.
friendly_name – Friendly name for the parser.
- dest_tables_columns() Dict[str, List[Column]] [source]
Describes the destination table(s) that this NLP processor wants to write to.
- Returns:
a dictionary of {tablename: destination_columns}, where destination_columns is a list of SQLAlchemy Column objects.
- Return type:
dict
- parse(text: str, debug: bool = False) Generator[Tuple[str, Dict[str, Any]], None, None] [source]
Parser for BP. Specialized because we’re fetching two numbers.
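To show why a two-number result needs its own parser, here is a minimal standalone sketch that yields (tablename, values) pairs in the shape of the signature above. The regex, the table name "bp" and the field names are assumptions made for illustration; CRATE's actual regex and output columns may differ.

```python
import re
from typing import Any, Dict, Generator, Tuple

# Hypothetical regex: the variable, optional filler, then "systolic/diastolic".
BP_REGEX = re.compile(
    r"\b(?:BP|blood\s+pressure)\b"   # variable
    r"\s*(?:is|was|of|[:=])?\s*"     # optional filler word or symbol
    r"(\d{2,3})\s*/\s*(\d{2,3})",    # systolic / diastolic
    re.IGNORECASE,
)


def parse_bp(text: str) -> Generator[Tuple[str, Dict[str, Any]], None, None]:
    """Yield one (tablename, values) pair per blood pressure reading found."""
    for match in BP_REGEX.finditer(text):
        yield "bp", {
            "systolic_bp_mmhg": int(match.group(1)),
            "diastolic_bp_mmhg": int(match.group(2)),
        }
```

A single-value parser would have nowhere to put the second number, so each match here fills two fields of one record.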
- class crate_anon.nlp_manager.parse_clinical.BpValidator(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)[source]
Validator for Bp (see help for explanation).
- classmethod get_variablename_regexstrlist() Tuple[str, List[str]] [source]
To be overridden.
- Returns:
(validated_variable_name, regex_str_list), where:
- regex_str_list:
List of regular expressions, each in string format.
This class operates with compiled regexes having this group format (capture groups in this sequence):
variable
- validated_variable:
used to set our variable attribute and thus the value of the field variable_name in the NLP output; for example, if validated_variable == 'crp', then the variable_name field will be set to crp_validator.
- Return type:
tuple
- class crate_anon.nlp_manager.parse_clinical.Height(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False, debug: bool = False)[source]
CLINICAL EXAMINATION.
Height. Handles metric (e.g. “1.8m”) and imperial (e.g. “5 ft 2 in”).
- __init__(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False, debug: bool = False) None [source]
Init function for NumericalResultParser.
- Parameters:
nlpdef – A crate_anon.nlp_manager.nlp_definition.NlpDefinition.
cfg_processor_name – Config section name in the NLP config file.
variable – Used by subclasses as the record value for variable_name.
target_unit – Fieldname used for the primary output quantity.
regex_str_for_debugging – String form of regex, for debugging.
commit – Force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.
Subclasses will extend this method.
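The metric and imperial handling mentioned for Height can be sketched as a conversion to metres. This is an illustrative standalone example with hypothetical regexes and a hypothetical height_m helper, not the patterns CRATE uses; it assumes metres as the target unit.

```python
import re
from typing import Optional

M_PER_FOOT = 0.3048  # exact, by definition
M_PER_INCH = 0.0254  # exact, by definition

# Hypothetical patterns, much simpler than a production parser would need.
IMPERIAL = re.compile(
    r"(\d+)\s*ft\b(?:\s*(\d+(?:\.\d+)?)\s*in\b)?", re.IGNORECASE
)
METRIC = re.compile(r"(\d+(?:\.\d+)?)\s*(m|cm)\b", re.IGNORECASE)


def height_m(text: str) -> Optional[float]:
    """Return the first height found in the text, in metres, or None."""
    match = IMPERIAL.search(text)
    if match:
        feet = float(match.group(1))
        inches = float(match.group(2)) if match.group(2) else 0.0
        return feet * M_PER_FOOT + inches * M_PER_INCH
    match = METRIC.search(text)
    if match:
        value = float(match.group(1))
        return value / 100 if match.group(2).lower() == "cm" else value
    return None
```

For “5 ft 2 in” this gives 5 × 0.3048 + 2 × 0.0254 = 1.5748 m, up to floating-point rounding.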
- class crate_anon.nlp_manager.parse_clinical.HeightValidator(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)[source]
Validator for Height (see help for explanation).
- classmethod get_variablename_regexstrlist() Tuple[str, List[str]] [source]
To be overridden.
- Returns:
(validated_variable_name, regex_str_list), where:
- regex_str_list:
List of regular expressions, each in string format.
This class operates with compiled regexes having this group format (capture groups in this sequence):
variable
- validated_variable:
used to set our variable attribute and thus the value of the field variable_name in the NLP output; for example, if validated_variable == 'crp', then the variable_name field will be set to crp_validator.
- Return type:
tuple
- class crate_anon.nlp_manager.parse_clinical.Weight(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False, debug: bool = False)[source]
CLINICAL EXAMINATION.
Weight. Handles metric (e.g. “57kg”) and imperial (e.g. “10 st 2 lb”). Requires units to be specified.
- __init__(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False, debug: bool = False) None [source]
Init function for NumericalResultParser.
- Parameters:
nlpdef – A crate_anon.nlp_manager.nlp_definition.NlpDefinition.
cfg_processor_name – Config section name in the NLP config file.
variable – Used by subclasses as the record value for variable_name.
target_unit – Fieldname used for the primary output quantity.
regex_str_for_debugging – String form of regex, for debugging.
commit – Force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.
Subclasses will extend this method.
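The Weight description (metric or imperial, units required) can be illustrated in the same way. The regexes and the weight_kg helper below are hypothetical and assume kilograms as the target unit; a bare number with no units deliberately yields no result.

```python
import re
from typing import Optional

KG_PER_LB = 0.45359237  # exact, by definition
LB_PER_STONE = 14

# Hypothetical patterns for illustration only.
KG = re.compile(r"(\d+(?:\.\d+)?)\s*kg\b", re.IGNORECASE)
STONES = re.compile(
    r"(\d+)\s*st\b(?:\s*(\d+(?:\.\d+)?)\s*lbs?\b)?", re.IGNORECASE
)
LB = re.compile(r"(\d+(?:\.\d+)?)\s*lbs?\b", re.IGNORECASE)


def weight_kg(text: str) -> Optional[float]:
    """Return the first weight found in the text, in kg, or None."""
    match = KG.search(text)
    if match:
        return float(match.group(1))
    match = STONES.search(text)
    if match:
        pounds = float(match.group(1)) * LB_PER_STONE
        if match.group(2):
            pounds += float(match.group(2))
        return pounds * KG_PER_LB
    match = LB.search(text)
    if match:
        return float(match.group(1)) * KG_PER_LB
    # No recognised units: ignore the number rather than guess.
    return None
```

For “10 st 2 lb” this gives (10 × 14 + 2) × 0.45359237 = 64.41011654 kg, up to floating-point rounding.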
- class crate_anon.nlp_manager.parse_clinical.WeightValidator(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)[source]
Validator for Weight (see help for explanation).
- classmethod get_variablename_regexstrlist() Tuple[str, List[str]] [source]
To be overridden.
- Returns:
(validated_variable_name, regex_str_list), where:
- regex_str_list:
List of regular expressions, each in string format.
This class operates with compiled regexes having this group format (capture groups in this sequence):
variable
- validated_variable:
used to set our variable attribute and thus the value of the field variable_name in the NLP output; for example, if validated_variable == 'crp', then the variable_name field will be set to crp_validator.
- Return type:
tuple