14.5.22. crate_anon.nlp_manager.parse_biochemistry

crate_anon/nlp_manager/parse_biochemistry.py


Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).

This file is part of CRATE.

CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with CRATE. If not, see <https://www.gnu.org/licenses/>.


Python regex-based NLP processors for biochemistry data.

All inherit from crate_anon.nlp_manager.regex_parser.SimpleNumericalResultParser and are constructed with these arguments:

nlpdef:

a crate_anon.nlp_manager.nlp_definition.NlpDefinition

cfg_processor_name:

the name of a CRATE NLP config file section (from which we may choose to get extra config information)

commit:

force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

class crate_anon.nlp_manager.parse_biochemistry.ALT(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)

BIOCHEMISTRY (LFTs).

Alanine aminotransferase (ALT), a.k.a. alanine transaminase (ALT). Units are U/L.

Formerly known as serum glutamate-pyruvate transaminase or serum glutamate-pyruvic transaminase (both SGPT), though those names have not been in common use in recent memory.

__init__(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False) → None
Parameters:
  • nlpdef – crate_anon.nlp_manager.nlp_definition.NlpDefinition

  • cfg_processor_name – config section suffix in the NLP config file

  • regex_str

    Regular expression, in string format.

    This class operates with compiled regexes having this group format (capture groups in this sequence):

    • variable

    • tense_indicator

    • relation

    • value

    • units

  • variable – used as the record value for variable_name

  • target_unit – fieldname used for the primary output quantity

  • units_to_factor

    dictionary, mapping

    • FROM (compiled regex for units)

    • TO EITHER a float (multiple) to multiply those units by, to get the preferred unit

    • OR a function taking a text parameter and returning a float value in preferred unit

    Any units present in the regex but absent from units_to_factor will lead the result to be ignored. For example, this allows you to ignore a relative neutrophil count (“neutrophils 2.2%”) while detecting absolute neutrophil counts (“neutrophils 2.2”), or ignoring “docusate sodium 100mg” but detecting “sodium 140 mM”.

  • take_absolute

    Convert negative values to positive ones? Typical text requiring this option might look like:

    CRP-4
    CRP-106
    CRP -97
    Blood results for today as follows: Na- 142, K-4.1, ...
    

    … occurring in 23 out of 8054 hits for CRP of one test set in our data.

    For many quantities, we know that they cannot be negative, so this is just a notation rather than a minus sign. We have to account for it, or it’ll distort our values. Preferable to account for it here rather than later; see manual.

  • commit – force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

  • debug – print the regex?
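The five-group regex format described above can be illustrated with a minimal sketch. The pattern below is hypothetical and much simpler than anything CRATE builds internally; it uses only the standard re module and exists purely to show the order of the capture groups (variable, tense_indicator, relation, value, units).

```python
import re

# Hypothetical, simplified pattern (NOT the regex CRATE actually uses).
# Capture groups appear in the documented order.
SKETCH_REGEX = re.compile(
    r"""
    (?P<variable> \b ALT \b )                 # group 1: variable
    \s* (?P<tense_indicator> is | was )?      # group 2: optional tense indicator
    \s* (?P<relation> <= | >= | < | > | = )?  # group 3: optional relation
    \s* (?P<value> \d+ (?: \. \d+ )? )        # group 4: numerical value
    \s* (?P<units> U/L )?                     # group 5: optional units
    """,
    re.IGNORECASE | re.VERBOSE,
)


def parse_first(text):
    """Return the named groups of the first match as a dict, or None."""
    match = SKETCH_REGEX.search(text)
    if match is None:
        return None
    return match.groupdict()


print(parse_first("ALT was < 45 U/L"))
```

Optional groups that do not match come back as None, so a caller can distinguish "ALT 30" (no units given) from "ALT 30 U/L".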

test(verbose: bool = False) → None

Performs a self-test on the NLP processor.

Parameters:

verbose – Be verbose?

This is an abstract method that is subclassed.

class crate_anon.nlp_manager.parse_biochemistry.ALTValidator(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)

Validator for ALT (see help for explanation).

classmethod get_variablename_regexstrlist() → Tuple[str, List[str]]

To be overridden.

Returns:

(validated_variable_name, regex_str_list), where:

regex_str_list:

List of regular expressions, each in string format.

This class operates with compiled regexes having this group format (capture groups in this sequence):

  • variable

validated_variable:

used to set our variable attribute and thus the value of the field variable_name in the NLP output; for example, if validated_variable == 'crp', then the variable_name field will be set to crp_validator.

Return type:

tuple
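As a rough illustration of the return value described above, the following sketch shows a stand-alone class whose get_variablename_regexstrlist() returns a (name, regex list) tuple, plus a hypothetical helper, find_mentions(), that applies the "_validator" suffix. The class name, the helper, and the regex strings are all assumptions made for this example; they are not taken from the CRATE source.

```python
import re
from typing import List, Tuple


class SketchValidator:
    """Hypothetical stand-in; it does not inherit from any CRATE class."""

    @classmethod
    def get_variablename_regexstrlist(cls) -> Tuple[str, List[str]]:
        # (validated variable name, list of regex strings for the variable)
        return "alt", [
            r"\bALT\b",
            r"\balanine\s+(?:aminotransferase|transaminase)\b",
            r"\bSGPT\b",
        ]

    @classmethod
    def find_mentions(cls, text: str) -> List[Tuple[str, str]]:
        """Return (variable_name, matched_text) for every mention found."""
        name, regex_strs = cls.get_variablename_regexstrlist()
        variable_name = f"{name}_validator"  # suffix as described in the docs
        hits = []
        for regex_str in regex_strs:
            for match in re.finditer(regex_str, text, re.IGNORECASE):
                hits.append((variable_name, match.group(0)))
        return hits


print(SketchValidator.find_mentions("ALT 30; alanine transaminase normal"))
```

Note that only the variable itself is matched; no value or units are required.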

class crate_anon.nlp_manager.parse_biochemistry.Albumin(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)

BIOCHEMISTRY (LFTs).

Albumin (Alb). Units are g/L.

__init__(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False) → None
Parameters:
  • nlpdef – crate_anon.nlp_manager.nlp_definition.NlpDefinition

  • cfg_processor_name – config section suffix in the NLP config file

  • regex_str

    Regular expression, in string format.

    This class operates with compiled regexes having this group format (capture groups in this sequence):

    • variable

    • tense_indicator

    • relation

    • value

    • units

  • variable – used as the record value for variable_name

  • target_unit – fieldname used for the primary output quantity

  • units_to_factor

    dictionary, mapping

    • FROM (compiled regex for units)

    • TO EITHER a float (multiple) to multiply those units by, to get the preferred unit

    • OR a function taking a text parameter and returning a float value in preferred unit

    Any units present in the regex but absent from units_to_factor will lead the result to be ignored. For example, this allows you to ignore a relative neutrophil count (“neutrophils 2.2%”) while detecting absolute neutrophil counts (“neutrophils 2.2”), or ignoring “docusate sodium 100mg” but detecting “sodium 140 mM”.

  • take_absolute

    Convert negative values to positive ones? Typical text requiring this option might look like:

    CRP-4
    CRP-106
    CRP -97
    Blood results for today as follows: Na- 142, K-4.1, ...
    

    … occurring in 23 out of 8054 hits for CRP of one test set in our data.

    For many quantities, we know that they cannot be negative, so this is just a notation rather than a minus sign. We have to account for it, or it’ll distort our values. Preferable to account for it here rather than later; see manual.

  • commit – force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

  • debug – print the regex?
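The units_to_factor behaviour described above (a float multiplier or a conversion function per recognised unit, with unrecognised units causing the result to be ignored) might be sketched as follows. The mapping, the convert() helper, and the treatment of missing units are assumptions for illustration only, not CRATE's implementation.

```python
import re

# Hypothetical mapping for a quantity whose preferred unit is mg/L.
UNITS_TO_FACTOR = {
    re.compile(r"mg/L", re.IGNORECASE): 1.0,    # already the preferred unit
    re.compile(r"mg/dL", re.IGNORECASE): 10.0,  # 1 mg/dL = 10 mg/L
    # A function taking the value text and returning a float in mg/L:
    re.compile(r"g/L", re.IGNORECASE): lambda text: float(text) * 1000.0,
}


def convert(value_text, units_text, units_to_factor):
    """Return the value in the preferred unit, or None to ignore the hit."""
    if not units_text:
        # Assumption: a value without units is taken to be in the preferred unit.
        return float(value_text)
    for units_regex, factor_or_func in units_to_factor.items():
        if units_regex.fullmatch(units_text):
            if callable(factor_or_func):
                return factor_or_func(value_text)
            return float(value_text) * factor_or_func
    return None  # units seen in the text but absent from the mapping


print(convert("5", "mg/dL", UNITS_TO_FACTOR))
print(convert("2.2", "%", UNITS_TO_FACTOR))
```

Returning None for "%" mirrors the documented example of ignoring a relative neutrophil count while still accepting an absolute one.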

test(verbose: bool = False) → None

Performs a self-test on the NLP processor.

Parameters:

verbose – Be verbose?

This is an abstract method that is subclassed.

class crate_anon.nlp_manager.parse_biochemistry.AlbuminValidator(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)

Validator for Albumin (see help for explanation).

classmethod get_variablename_regexstrlist() → Tuple[str, List[str]]

To be overridden.

Returns:

(validated_variable_name, regex_str_list), where:

regex_str_list:

List of regular expressions, each in string format.

This class operates with compiled regexes having this group format (capture groups in this sequence):

  • variable

validated_variable:

used to set our variable attribute and thus the value of the field variable_name in the NLP output; for example, if validated_variable == 'crp', then the variable_name field will be set to crp_validator.

Return type:

tuple

class crate_anon.nlp_manager.parse_biochemistry.AlkPhos(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)

BIOCHEMISTRY (LFTs/BFTs).

Alkaline phosphatase (ALP, AlkP, AlkPhos). Units are U/L.

__init__(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False) → None
Parameters:
  • nlpdef – crate_anon.nlp_manager.nlp_definition.NlpDefinition

  • cfg_processor_name – config section suffix in the NLP config file

  • regex_str

    Regular expression, in string format.

    This class operates with compiled regexes having this group format (capture groups in this sequence):

    • variable

    • tense_indicator

    • relation

    • value

    • units

  • variable – used as the record value for variable_name

  • target_unit – fieldname used for the primary output quantity

  • units_to_factor

    dictionary, mapping

    • FROM (compiled regex for units)

    • TO EITHER a float (multiple) to multiply those units by, to get the preferred unit

    • OR a function taking a text parameter and returning a float value in preferred unit

    Any units present in the regex but absent from units_to_factor will lead the result to be ignored. For example, this allows you to ignore a relative neutrophil count (“neutrophils 2.2%”) while detecting absolute neutrophil counts (“neutrophils 2.2”), or ignoring “docusate sodium 100mg” but detecting “sodium 140 mM”.

  • take_absolute

    Convert negative values to positive ones? Typical text requiring this option might look like:

    CRP-4
    CRP-106
    CRP -97
    Blood results for today as follows: Na- 142, K-4.1, ...
    

    … occurring in 23 out of 8054 hits for CRP of one test set in our data.

    For many quantities, we know that they cannot be negative, so this is just a notation rather than a minus sign. We have to account for it, or it’ll distort our values. Preferable to account for it here rather than later; see manual.

  • commit – force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

  • debug – print the regex?
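The effect of take_absolute on the example text above can be shown with a small sketch. The regex and the extract() function are hypothetical simplifications written for this illustration; they are not CRATE's own parsing code.

```python
import re

# Hypothetical pattern: a variable name, an optional hyphen/minus, then a number.
SKETCH_REGEX = re.compile(
    r"\b(?P<variable>CRP|Na|K)\s*(?P<sign>-)?\s*(?P<number>\d+(?:\.\d+)?)"
)


def extract(text, take_absolute):
    """Return (variable, value) pairs found in the text."""
    results = []
    for match in SKETCH_REGEX.finditer(text):
        value = float(match.group("number"))
        if match.group("sign") and not take_absolute:
            # Without take_absolute, the hyphen is read as a minus sign.
            value = -value
        results.append((match.group("variable"), value))
    return results


text = "Blood results for today as follows: Na- 142, K-4.1, ..."
print(extract(text, take_absolute=True))
print(extract("CRP -97", take_absolute=False))
```

With take_absolute=False, "CRP -97" yields a value of -97, which is the distortion the option is meant to prevent for quantities that cannot be negative.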

test(verbose: bool = False) → None

Performs a self-test on the NLP processor.

Parameters:

verbose – Be verbose?

This is an abstract method that is subclassed.

class crate_anon.nlp_manager.parse_biochemistry.AlkPhosValidator(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)

Validator for AlkPhos (see help for explanation).

classmethod get_variablename_regexstrlist() → Tuple[str, List[str]]

To be overridden.

Returns:

(validated_variable_name, regex_str_list), where:

regex_str_list:

List of regular expressions, each in string format.

This class operates with compiled regexes having this group format (capture groups in this sequence):

  • variable

validated_variable:

used to set our variable attribute and thus the value of the field variable_name in the NLP output; for example, if validated_variable == 'crp', then the variable_name field will be set to crp_validator.

Return type:

tuple

class crate_anon.nlp_manager.parse_biochemistry.Bilirubin(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)

BIOCHEMISTRY (LFTs).

Total bilirubin. Units are μM.

__init__(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False) → None
Parameters:
  • nlpdef – crate_anon.nlp_manager.nlp_definition.NlpDefinition

  • cfg_processor_name – config section suffix in the NLP config file

  • regex_str

    Regular expression, in string format.

    This class operates with compiled regexes having this group format (capture groups in this sequence):

    • variable

    • tense_indicator

    • relation

    • value

    • units

  • variable – used as the record value for variable_name

  • target_unit – fieldname used for the primary output quantity

  • units_to_factor

    dictionary, mapping

    • FROM (compiled regex for units)

    • TO EITHER a float (multiple) to multiply those units by, to get the preferred unit

    • OR a function taking a text parameter and returning a float value in preferred unit

    Any units present in the regex but absent from units_to_factor will lead the result to be ignored. For example, this allows you to ignore a relative neutrophil count (“neutrophils 2.2%”) while detecting absolute neutrophil counts (“neutrophils 2.2”), or ignoring “docusate sodium 100mg” but detecting “sodium 140 mM”.

  • take_absolute

    Convert negative values to positive ones? Typical text requiring this option might look like:

    CRP-4
    CRP-106
    CRP -97
    Blood results for today as follows: Na- 142, K-4.1, ...
    

    … occurring in 23 out of 8054 hits for CRP of one test set in our data.

    For many quantities, we know that they cannot be negative, so this is just a notation rather than a minus sign. We have to account for it, or it’ll distort our values. Preferable to account for it here rather than later; see manual.

  • commit – force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

  • debug – print the regex?

test(verbose: bool = False) → None

Performs a self-test on the NLP processor.

Parameters:

verbose – Be verbose?

This is an abstract method that is subclassed.

class crate_anon.nlp_manager.parse_biochemistry.BilirubinValidator(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)

Validator for Bilirubin (see help for explanation).

classmethod get_variablename_regexstrlist() → Tuple[str, List[str]]

To be overridden.

Returns:

(validated_variable_name, regex_str_list), where:

regex_str_list:

List of regular expressions, each in string format.

This class operates with compiled regexes having this group format (capture groups in this sequence):

  • variable

validated_variable:

used to set our variable attribute and thus the value of the field variable_name in the NLP output; for example, if validated_variable == 'crp', then the variable_name field will be set to crp_validator.

Return type:

tuple

class crate_anon.nlp_manager.parse_biochemistry.Creatinine(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)

BIOCHEMISTRY (U&E).

Creatinine. Default units are micromolar (SI); also supports mg/dL.
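For orientation, the mg/dL to micromolar conversion follows from the molar mass of creatinine (about 113.12 g/mol), giving roughly 88.4 μM per mg/dL. The function below is a stand-alone sketch of that arithmetic; its name is hypothetical and it is not the conversion code CRATE uses.

```python
CREATININE_MOLAR_MASS_G_PER_MOL = 113.12  # approximate molar mass of creatinine


def creatinine_mg_dl_to_micromolar(mg_dl):
    """Convert a creatinine concentration from mg/dL to micromol/L."""
    mg_per_l = mg_dl * 10.0                                  # 1 dL = 0.1 L
    mmol_per_l = mg_per_l / CREATININE_MOLAR_MASS_G_PER_MOL  # mg/L over g/mol
    return mmol_per_l * 1000.0                               # mmol/L to micromol/L


print(round(creatinine_mg_dl_to_micromolar(1.0), 1))
```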

__init__(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False) → None
Parameters:
  • nlpdef – crate_anon.nlp_manager.nlp_definition.NlpDefinition

  • cfg_processor_name – config section suffix in the NLP config file

  • regex_str

    Regular expression, in string format.

    This class operates with compiled regexes having this group format (capture groups in this sequence):

    • variable

    • tense_indicator

    • relation

    • value

    • units

  • variable – used as the record value for variable_name

  • target_unit – fieldname used for the primary output quantity

  • units_to_factor

    dictionary, mapping

    • FROM (compiled regex for units)

    • TO EITHER a float (multiple) to multiply those units by, to get the preferred unit

    • OR a function taking a text parameter and returning a float value in preferred unit

    Any units present in the regex but absent from units_to_factor will lead the result to be ignored. For example, this allows you to ignore a relative neutrophil count (“neutrophils 2.2%”) while detecting absolute neutrophil counts (“neutrophils 2.2”), or ignoring “docusate sodium 100mg” but detecting “sodium 140 mM”.

  • take_absolute

    Convert negative values to positive ones? Typical text requiring this option might look like:

    CRP-4
    CRP-106
    CRP -97
    Blood results for today as follows: Na- 142, K-4.1, ...
    

    … occurring in 23 out of 8054 hits for CRP of one test set in our data.

    For many quantities, we know that they cannot be negative, so this is just a notation rather than a minus sign. We have to account for it, or it’ll distort our values. Preferable to account for it here rather than later; see manual.

  • commit – force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

  • debug – print the regex?

test(verbose: bool = False) → None

Performs a self-test on the NLP processor.

Parameters:

verbose – Be verbose?

This is an abstract method that is subclassed.

class crate_anon.nlp_manager.parse_biochemistry.CreatinineValidator(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)

Validator for Creatinine (see help for explanation).

classmethod get_variablename_regexstrlist() → Tuple[str, List[str]]

To be overridden.

Returns:

(validated_variable_name, regex_str_list), where:

regex_str_list:

List of regular expressions, each in string format.

This class operates with compiled regexes having this group format (capture groups in this sequence):

  • variable

validated_variable:

used to set our variable attribute and thus the value of the field variable_name in the NLP output; for example, if validated_variable == 'crp', then the variable_name field will be set to crp_validator.

Return type:

tuple

class crate_anon.nlp_manager.parse_biochemistry.Crp(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)

BIOCHEMISTRY.

C-reactive protein (CRP). Default units are mg/L; also supports mg/dL.

CRP units:

__init__(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False) → None
Parameters:
  • nlpdef – crate_anon.nlp_manager.nlp_definition.NlpDefinition

  • cfg_processor_name – config section suffix in the NLP config file

  • regex_str

    Regular expression, in string format.

    This class operates with compiled regexes having this group format (capture groups in this sequence):

    • variable

    • tense_indicator

    • relation

    • value

    • units

  • variable – used as the record value for variable_name

  • target_unit – fieldname used for the primary output quantity

  • units_to_factor

    dictionary, mapping

    • FROM (compiled regex for units)

    • TO EITHER a float (multiple) to multiply those units by, to get the preferred unit

    • OR a function taking a text parameter and returning a float value in preferred unit

    Any units present in the regex but absent from units_to_factor will lead the result to be ignored. For example, this allows you to ignore a relative neutrophil count (“neutrophils 2.2%”) while detecting absolute neutrophil counts (“neutrophils 2.2”), or ignoring “docusate sodium 100mg” but detecting “sodium 140 mM”.

  • take_absolute

    Convert negative values to positive ones? Typical text requiring this option might look like:

    CRP-4
    CRP-106
    CRP -97
    Blood results for today as follows: Na- 142, K-4.1, ...
    

    … occurring in 23 out of 8054 hits for CRP of one test set in our data.

    For many quantities, we know that they cannot be negative, so this is just a notation rather than a minus sign. We have to account for it, or it’ll distort our values. Preferable to account for it here rather than later; see manual.

  • commit – force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

  • debug – print the regex?

test(verbose: bool = False) → None

Performs a self-test on the NLP processor.

Parameters:

verbose – Be verbose?

This is an abstract method that is subclassed.

class crate_anon.nlp_manager.parse_biochemistry.CrpValidator(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)

Validator for Crp (see help for explanation).

classmethod get_variablename_regexstrlist() → Tuple[str, List[str]]

To be overridden.

Returns:

(validated_variable_name, regex_str_list), where:

regex_str_list:

List of regular expressions, each in string format.

This class operates with compiled regexes having this group format (capture groups in this sequence):

  • variable

validated_variable:

used to set our variable attribute and thus the value of the field variable_name in the NLP output; for example, if validated_variable == 'crp', then the variable_name field will be set to crp_validator.

Return type:

tuple

class crate_anon.nlp_manager.parse_biochemistry.GammaGT(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)

BIOCHEMISTRY (LFTs).

Gamma-glutamyl transferase (gGT), in U/L.

__init__(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False) → None
Parameters:
  • nlpdef – crate_anon.nlp_manager.nlp_definition.NlpDefinition

  • cfg_processor_name – config section suffix in the NLP config file

  • regex_str

    Regular expression, in string format.

    This class operates with compiled regexes having this group format (capture groups in this sequence):

    • variable

    • tense_indicator

    • relation

    • value

    • units

  • variable – used as the record value for variable_name

  • target_unit – fieldname used for the primary output quantity

  • units_to_factor

    dictionary, mapping

    • FROM (compiled regex for units)

    • TO EITHER a float (multiple) to multiply those units by, to get the preferred unit

    • OR a function taking a text parameter and returning a float value in preferred unit

    Any units present in the regex but absent from units_to_factor will lead the result to be ignored. For example, this allows you to ignore a relative neutrophil count (“neutrophils 2.2%”) while detecting absolute neutrophil counts (“neutrophils 2.2”), or ignoring “docusate sodium 100mg” but detecting “sodium 140 mM”.

  • take_absolute

    Convert negative values to positive ones? Typical text requiring this option might look like:

    CRP-4
    CRP-106
    CRP -97
    Blood results for today as follows: Na- 142, K-4.1, ...
    

    … occurring in 23 out of 8054 hits for CRP of one test set in our data.

    For many quantities, we know that they cannot be negative, so this is just a notation rather than a minus sign. We have to account for it, or it’ll distort our values. Preferable to account for it here rather than later; see manual.

  • commit – force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

  • debug – print the regex?

test(verbose: bool = False) → None

Performs a self-test on the NLP processor.

Parameters:

verbose – Be verbose?

This is an abstract method that is subclassed.

class crate_anon.nlp_manager.parse_biochemistry.GammaGTValidator(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)

Validator for GammaGT (see help for explanation).

classmethod get_variablename_regexstrlist() → Tuple[str, List[str]]

To be overridden.

Returns:

(validated_variable_name, regex_str_list), where:

regex_str_list:

List of regular expressions, each in string format.

This class operates with compiled regexes having this group format (capture groups in this sequence):

  • variable

validated_variable:

used to set our variable attribute and thus the value of the field variable_name in the NLP output; for example, if validated_variable == 'crp', then the variable_name field will be set to crp_validator.

Return type:

tuple

class crate_anon.nlp_manager.parse_biochemistry.Glucose(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)

BIOCHEMISTRY.

Glucose. Default units are mM; also supports mg/dL.

__init__(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False) → None
Parameters:
  • nlpdef – crate_anon.nlp_manager.nlp_definition.NlpDefinition

  • cfg_processor_name – config section suffix in the NLP config file

  • regex_str

    Regular expression, in string format.

    This class operates with compiled regexes having this group format (capture groups in this sequence):

    • variable

    • tense_indicator

    • relation

    • value

    • units

  • variable – used as the record value for variable_name

  • target_unit – fieldname used for the primary output quantity

  • units_to_factor

    dictionary, mapping

    • FROM (compiled regex for units)

    • TO EITHER a float (multiple) to multiply those units by, to get the preferred unit

    • OR a function taking a text parameter and returning a float value in preferred unit

    Any units present in the regex but absent from units_to_factor will lead the result to be ignored. For example, this allows you to ignore a relative neutrophil count (“neutrophils 2.2%”) while detecting absolute neutrophil counts (“neutrophils 2.2”), or ignoring “docusate sodium 100mg” but detecting “sodium 140 mM”.

  • take_absolute

    Convert negative values to positive ones? Typical text requiring this option might look like:

    CRP-4
    CRP-106
    CRP -97
    Blood results for today as follows: Na- 142, K-4.1, ...
    

    … occurring in 23 out of 8054 hits for CRP of one test set in our data.

    For many quantities, we know that they cannot be negative, so this is just a notation rather than a minus sign. We have to account for it, or it’ll distort our values. Preferable to account for it here rather than later; see manual.

  • commit – force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

  • debug – print the regex?

test(verbose: bool = False) → None

Performs a self-test on the NLP processor.

Parameters:

verbose – Be verbose?

This is an abstract method that is subclassed.

class crate_anon.nlp_manager.parse_biochemistry.GlucoseValidator(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)

Validator for Glucose (see help for explanation).

classmethod get_variablename_regexstrlist() → Tuple[str, List[str]]

To be overridden.

Returns:

(validated_variable_name, regex_str_list), where:

regex_str_list:

List of regular expressions, each in string format.

This class operates with compiled regexes having this group format (capture groups in this sequence):

  • variable

validated_variable:

used to set our variable attribute and thus the value of the field variable_name in the NLP output; for example, if validated_variable == 'crp', then the variable_name field will be set to crp_validator.

Return type:

tuple

class crate_anon.nlp_manager.parse_biochemistry.HDLCholesterol(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)

BIOCHEMISTRY (LIPID PROFILE).

High-density lipoprotein (HDL) cholesterol. Default units are mM; also supports mg/dL.

__init__(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False) → None
Parameters:
  • nlpdef – crate_anon.nlp_manager.nlp_definition.NlpDefinition

  • cfg_processor_name – config section suffix in the NLP config file

  • regex_str

    Regular expression, in string format.

    This class operates with compiled regexes having this group format (capture groups in this sequence):

    • variable

    • tense_indicator

    • relation

    • value

    • units

  • variable – used as the record value for variable_name

  • target_unit – fieldname used for the primary output quantity

  • units_to_factor

    dictionary, mapping

    • FROM (compiled regex for units)

    • TO EITHER a float (multiple) to multiply those units by, to get the preferred unit

    • OR a function taking a text parameter and returning a float value in preferred unit

    Any units present in the regex but absent from units_to_factor will lead the result to be ignored. For example, this allows you to ignore a relative neutrophil count (“neutrophils 2.2%”) while detecting absolute neutrophil counts (“neutrophils 2.2”), or ignoring “docusate sodium 100mg” but detecting “sodium 140 mM”.

  • take_absolute

    Convert negative values to positive ones? Typical text requiring this option might look like:

    CRP-4
    CRP-106
    CRP -97
    Blood results for today as follows: Na- 142, K-4.1, ...
    

    … occurring in 23 out of 8054 hits for CRP of one test set in our data.

    For many quantities, we know that they cannot be negative, so this is just a notation rather than a minus sign. We have to account for it, or it’ll distort our values. Preferable to account for it here rather than later; see manual.

  • commit – force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

  • debug – print the regex?

test(verbose: bool = False) → None

Performs a self-test on the NLP processor.

Parameters:

verbose – Be verbose?

This is an abstract method that is subclassed.

class crate_anon.nlp_manager.parse_biochemistry.HDLCholesterolValidator(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)

Validator for HDLCholesterol (see help for explanation).

classmethod get_variablename_regexstrlist() → Tuple[str, List[str]]

To be overridden.

Returns:

(validated_variable_name, regex_str_list), where:

regex_str_list:

List of regular expressions, each in string format.

This class operates with compiled regexes having this group format (capture groups in this sequence):

  • variable

validated_variable:

used to set our variable attribute and thus the value of the field variable_name in the NLP output; for example, if validated_variable == 'crp', then the variable_name field will be set to crp_validator.

Return type:

tuple

class crate_anon.nlp_manager.parse_biochemistry.HbA1c(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)

BIOCHEMISTRY.

Glycosylated (glycated) haemoglobin (HbA1c). Default units are mmol/mol; also supports %.

Note: HbA1 is different (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2541274).
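Converting HbA1c from % to mmol/mol is a linear transformation rather than a simple multiplication. The sketch below uses the widely quoted IFCC/NGSP master equation, mmol/mol = 10.929 × (% − 2.15); whether CRATE applies exactly these coefficients is not stated here, and the function name is hypothetical.

```python
def hba1c_percent_to_mmol_per_mol(percent):
    """Convert HbA1c from % (DCCT/NGSP) to mmol/mol (IFCC)."""
    # Master equation coefficients; assumed here, not taken from CRATE.
    return 10.929 * (percent - 2.15)


print(round(hba1c_percent_to_mmol_per_mol(6.5), 1))
```

Because the relationship has an offset, it illustrates why units_to_factor accepts a function as well as a plain multiplier.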

__init__(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False) → None
Parameters:
  • nlpdef – crate_anon.nlp_manager.nlp_definition.NlpDefinition

  • cfg_processor_name – config section suffix in the NLP config file

  • regex_str

    Regular expression, in string format.

    This class operates with compiled regexes having this group format (capture groups in this sequence):

    • variable

    • tense_indicator

    • relation

    • value

    • units

  • variable – used as the record value for variable_name

  • target_unit – fieldname used for the primary output quantity

  • units_to_factor

    dictionary, mapping

    • FROM (compiled regex for units)

    • TO EITHER a float (multiple) to multiply those units by, to get the preferred unit

    • OR a function taking a text parameter and returning a float value in preferred unit

    Any units present in the regex but absent from units_to_factor will lead the result to be ignored. For example, this allows you to ignore a relative neutrophil count (“neutrophils 2.2%”) while detecting absolute neutrophil counts (“neutrophils 2.2”), or ignoring “docusate sodium 100mg” but detecting “sodium 140 mM”.

  • take_absolute

    Convert negative values to positive ones? Typical text requiring this option might look like:

    CRP-4
    CRP-106
    CRP -97
    Blood results for today as follows: Na- 142, K-4.1, ...
    

    … occurring in 23 out of 8054 hits for CRP of one test set in our data.

    For many quantities, we know that they cannot be negative, so this is just a notation rather than a minus sign. We have to account for it, or it’ll distort our values. Preferable to account for it here rather than later; see manual.

  • commit – force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

  • debug – print the regex?

test(verbose: bool = False) → None

Performs a self-test on the NLP processor.

Parameters:

verbose – Be verbose?

This is an abstract method that is subclassed.

class crate_anon.nlp_manager.parse_biochemistry.HbA1cValidator(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)

Validator for HbA1c (see help for explanation).

classmethod get_variablename_regexstrlist() Tuple[str, List[str]][source]

To be overridden.

Returns:

(validated_variable_name, regex_str_list), where:

regex_str_list:

List of regular expressions, each in string format.

This class operates with compiled regexes having this group format (capture groups in this sequence):

  • variable

validated_variable:

used to set our variable attribute and thus the value of the field variable_name in the NLP output; for example, if validated_variable == 'crp', then the variable_name field will be set to crp_validator.

Return type:

tuple
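The return value described above can be pictured with a small sketch. The function below is a hypothetical stand-in, not one of CRATE's validators: the regex strings are invented for illustration, and validator_hits is an assumed helper that only shows how the single "variable" capture group and the "_validator" suffix fit together.

```python
import re
from typing import List, Tuple


def get_variablename_regexstrlist() -> Tuple[str, List[str]]:
    """Hypothetical example: a validated variable name plus its regex strings."""
    return "crp", [r"\b(CRP)\b", r"\b(C-reactive protein)\b"]


def validator_hits(text: str) -> List[Tuple[str, str]]:
    """Return (variable_name, matched text) for every mention of the variable."""
    validated_variable, regex_str_list = get_variablename_regexstrlist()
    variable_name = f"{validated_variable}_validator"  # e.g. crp_validator
    hits = []
    for regex_str in regex_str_list:
        for match in re.finditer(regex_str, text, re.IGNORECASE):
            # Group 1 is the only capture group: the variable itself.
            hits.append((variable_name, match.group(1)))
    return hits
```

Each regex captures only the variable, with no value or units, so a hit records that the quantity was mentioned at all.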

class crate_anon.nlp_manager.parse_biochemistry.LDLCholesterol(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)[source]

BIOCHEMISTRY (LIPID PROFILE).

Low density lipoprotein (LDL) cholesterol. Default units are mM; also supports mg/dL.
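For orientation, a minimal sketch of the mg/dL to mM conversion for cholesterol follows. It assumes the commonly quoted approximation of 38.67 mg/dL per mmol/L (derived from the molar mass of cholesterol); the constant and function names are hypothetical, and the exact factor CRATE uses is not stated here.

```python
# Assumption: about 38.67 mg/dL of cholesterol corresponds to 1 mmol/L.
CHOLESTEROL_MG_DL_PER_MM = 38.67


def cholesterol_mm_from_mg_dl(mg_dl: float) -> float:
    """Convert a cholesterol concentration from mg/dL to mM (mmol/L)."""
    return mg_dl / CHOLESTEROL_MG_DL_PER_MM


print(round(cholesterol_mm_from_mg_dl(100), 3))  # 2.586
```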

__init__(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False) None[source]
Parameters:
  • nlpdef – crate_anon.nlp_manager.nlp_definition.NlpDefinition

  • cfg_processor_name – config section suffix in the NLP config file

  • regex_str

    Regular expression, in string format.

    This class operates with compiled regexes having this group format (capture groups in this sequence):

    • variable

    • tense_indicator

    • relation

    • value

    • units

  • variable – used as the record value for variable_name

  • target_unit – fieldname used for the primary output quantity

  • units_to_factor

    dictionary, mapping

    • FROM (compiled regex for units)

    • TO EITHER a float (multiple) to multiply those units by, to get the preferred unit

    • OR a function taking a text parameter and returning a float value in preferred unit

    Any units present in the regex but absent from units_to_factor will lead the result to be ignored. For example, this allows you to ignore a relative neutrophil count (“neutrophils 2.2%”) while detecting absolute neutrophil counts (“neutrophils 2.2”), or ignoring “docusate sodium 100mg” but detecting “sodium 140 mM”.

  • take_absolute

    Convert negative values to positive ones? Typical text requiring this option might look like:

    CRP-4
    CRP-106
    CRP -97
    Blood results for today as follows: Na- 142, K-4.1, ...
    

    … occurring in 23 out of 8054 hits for CRP of one test set in our data.

    For many quantities, we know that they cannot be negative, so this is just a notation rather than a minus sign. We have to account for it, or it’ll distort our values. Preferable to account for it here rather than later; see manual.

  • commit – force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

  • debug – print the regex?
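The effect of take_absolute, as described above, can be sketched with a deliberately simplified pattern. This is an illustration only: the regex and the extract function are assumptions for this example and are far less thorough than the processors' real regexes.

```python
import re
from typing import List, Tuple

# Hypothetical, simplified pattern: a variable name, then an optionally
# "negative" number in which the hyphen may really be a separator.
PATTERN = re.compile(r"\b(CRP|Na|K)\s*(-?\s*\d+(?:\.\d+)?)")


def extract(text: str, take_absolute: bool) -> List[Tuple[str, float]]:
    """Return (variable, value) pairs, optionally discarding the sign."""
    results = []
    for match in PATTERN.finditer(text):
        # Remove internal whitespace so that "- 142" parses as -142.
        value = float(re.sub(r"\s+", "", match.group(2)))
        if take_absolute:
            value = abs(value)
        results.append((match.group(1), value))
    return results
```

For "CRP -97", extract returns -97.0 without take_absolute and 97.0 with it; the latter is the sensible reading for a quantity that cannot be negative.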

test(verbose: bool = False) None[source]

Performs a self-test on the NLP processor.

Parameters:

verbose – Be verbose?

This is an abstract method that is subclassed.

class crate_anon.nlp_manager.parse_biochemistry.LDLCholesterolValidator(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)[source]

Validator for LDLCholesterol (see help for explanation).

classmethod get_variablename_regexstrlist() Tuple[str, List[str]][source]

To be overridden.

Returns:

(validated_variable_name, regex_str_list), where:

regex_str_list:

List of regular expressions, each in string format.

This class operates with compiled regexes having this group format (capture groups in this sequence):

  • variable

validated_variable:

used to set our variable attribute and thus the value of the field variable_name in the NLP output; for example, if validated_variable == 'crp', then the variable_name field will be set to crp_validator.

Return type:

tuple

class crate_anon.nlp_manager.parse_biochemistry.Lithium(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)[source]

BIOCHEMISTRY (THERAPEUTIC DRUG MONITORING).

Lithium (Li) levels (for blood tests, not doses), in mM.

__init__(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False) None[source]
Parameters:
  • nlpdef – crate_anon.nlp_manager.nlp_definition.NlpDefinition

  • cfg_processor_name – config section suffix in the NLP config file

  • regex_str

    Regular expression, in string format.

    This class operates with compiled regexes having this group format (capture groups in this sequence):

    • variable

    • tense_indicator

    • relation

    • value

    • units

  • variable – used as the record value for variable_name

  • target_unit – fieldname used for the primary output quantity

  • units_to_factor

    dictionary, mapping

    • FROM (compiled regex for units)

    • TO EITHER a float (multiple) to multiply those units by, to get the preferred unit

    • OR a function taking a text parameter and returning a float value in preferred unit

    Any units present in the regex but absent from units_to_factor will lead the result to be ignored. For example, this allows you to ignore a relative neutrophil count (“neutrophils 2.2%”) while detecting absolute neutrophil counts (“neutrophils 2.2”), or ignoring “docusate sodium 100mg” but detecting “sodium 140 mM”.

  • take_absolute

    Convert negative values to positive ones? Typical text requiring this option might look like:

    CRP-4
    CRP-106
    CRP -97
    Blood results for today as follows: Na- 142, K-4.1, ...
    

    … occurring in 23 out of 8054 hits for CRP of one test set in our data.

    For many quantities, we know that they cannot be negative, so this is just a notation rather than a minus sign. We have to account for it, or it’ll distort our values. Preferable to account for it here rather than later; see manual.

  • commit – force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

  • debug – print the regex?

test(verbose: bool = False) None[source]

Performs a self-test on the NLP processor.

Parameters:

verbose – Be verbose?

This is an abstract method that is subclassed.

class crate_anon.nlp_manager.parse_biochemistry.LithiumValidator(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)[source]

Validator for Lithium (see help for explanation).

classmethod get_variablename_regexstrlist() Tuple[str, List[str]][source]

To be overridden.

Returns:

(validated_variable_name, regex_str_list), where:

regex_str_list:

List of regular expressions, each in string format.

This class operates with compiled regexes having this group format (capture groups in this sequence):

  • variable

validated_variable:

used to set our variable attribute and thus the value of the field variable_name in the NLP output; for example, if validated_variable == 'crp', then the variable_name field will be set to crp_validator.

Return type:

tuple

class crate_anon.nlp_manager.parse_biochemistry.Potassium(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)[source]

BIOCHEMISTRY (U&E).

Potassium (K), in mM.

__init__(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False) None[source]
Parameters:
  • nlpdef – crate_anon.nlp_manager.nlp_definition.NlpDefinition

  • cfg_processor_name – config section suffix in the NLP config file

  • regex_str

    Regular expression, in string format.

    This class operates with compiled regexes having this group format (capture groups in this sequence):

    • variable

    • tense_indicator

    • relation

    • value

    • units

  • variable – used as the record value for variable_name

  • target_unit – fieldname used for the primary output quantity

  • units_to_factor

    dictionary, mapping

    • FROM (compiled regex for units)

    • TO EITHER a float (multiple) to multiply those units by, to get the preferred unit

    • OR a function taking a text parameter and returning a float value in preferred unit

    Any units present in the regex but absent from units_to_factor will lead the result to be ignored. For example, this allows you to ignore a relative neutrophil count (“neutrophils 2.2%”) while detecting absolute neutrophil counts (“neutrophils 2.2”), or ignoring “docusate sodium 100mg” but detecting “sodium 140 mM”.

  • take_absolute

    Convert negative values to positive ones? Typical text requiring this option might look like:

    CRP-4
    CRP-106
    CRP -97
    Blood results for today as follows: Na- 142, K-4.1, ...
    

    … occurring in 23 out of 8054 hits for CRP of one test set in our data.

    For many quantities, we know that they cannot be negative, so this is just a notation rather than a minus sign. We have to account for it, or it’ll distort our values. Preferable to account for it here rather than later; see manual.

  • commit – force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

  • debug – print the regex?

test(verbose: bool = False) None[source]

Performs a self-test on the NLP processor.

Parameters:

verbose – Be verbose?

This is an abstract method that is subclassed.

class crate_anon.nlp_manager.parse_biochemistry.PotassiumValidator(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)[source]

Validator for Potassium (see help for explanation).

classmethod get_variablename_regexstrlist() Tuple[str, List[str]][source]

To be overridden.

Returns:

(validated_variable_name, regex_str_list), where:

regex_str_list:

List of regular expressions, each in string format.

This class operates with compiled regexes having this group format (capture groups in this sequence):

  • variable

validated_variable:

used to set our variable attribute and thus the value of the field variable_name in the NLP output; for example, if validated_variable == 'crp', then the variable_name field will be set to crp_validator.

Return type:

tuple

class crate_anon.nlp_manager.parse_biochemistry.Sodium(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)[source]

BIOCHEMISTRY (U&E).

Sodium (Na), in mM.

__init__(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False) None[source]
Parameters:
  • nlpdef – crate_anon.nlp_manager.nlp_definition.NlpDefinition

  • cfg_processor_name – config section suffix in the NLP config file

  • regex_str

    Regular expression, in string format.

    This class operates with compiled regexes having this group format (capture groups in this sequence):

    • variable

    • tense_indicator

    • relation

    • value

    • units

  • variable – used as the record value for variable_name

  • target_unit – fieldname used for the primary output quantity

  • units_to_factor

    dictionary, mapping

    • FROM (compiled regex for units)

    • TO EITHER a float (multiple) to multiply those units by, to get the preferred unit

    • OR a function taking a text parameter and returning a float value in preferred unit

    Any units present in the regex but absent from units_to_factor will lead the result to be ignored. For example, this allows you to ignore a relative neutrophil count (“neutrophils 2.2%”) while detecting absolute neutrophil counts (“neutrophils 2.2”), or ignoring “docusate sodium 100mg” but detecting “sodium 140 mM”.

  • take_absolute

    Convert negative values to positive ones? Typical text requiring this option might look like:

    CRP-4
    CRP-106
    CRP -97
    Blood results for today as follows: Na- 142, K-4.1, ...
    

    … occurring in 23 out of 8054 hits for CRP of one test set in our data.

    For many quantities, we know that they cannot be negative, so this is just a notation rather than a minus sign. We have to account for it, or it’ll distort our values. Preferable to account for it here rather than later; see manual.

  • commit – force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

  • debug – print the regex?

test(verbose: bool = False) None[source]

Performs a self-test on the NLP processor.

Parameters:

verbose – Be verbose?

This is an abstract method that is subclassed.

class crate_anon.nlp_manager.parse_biochemistry.SodiumValidator(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)[source]

Validator for Sodium (see help for explanation).

classmethod get_variablename_regexstrlist() Tuple[str, List[str]][source]

To be overridden.

Returns:

(validated_variable_name, regex_str_list), where:

regex_str_list:

List of regular expressions, each in string format.

This class operates with compiled regexes having this group format (capture groups in this sequence):

  • variable

validated_variable:

used to set our variable attribute and thus the value of the field variable_name in the NLP output; for example, if validated_variable == 'crp', then the variable_name field will be set to crp_validator.

Return type:

tuple

class crate_anon.nlp_manager.parse_biochemistry.TotalCholesterol(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)[source]

BIOCHEMISTRY (LIPID PROFILE).

Total or undifferentiated cholesterol. Default units are mM; also supports mg/dL.

__init__(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False) None[source]
Parameters:
  • nlpdef – crate_anon.nlp_manager.nlp_definition.NlpDefinition

  • cfg_processor_name – config section suffix in the NLP config file

  • regex_str

    Regular expression, in string format.

    This class operates with compiled regexes having this group format (capture groups in this sequence):

    • variable

    • tense_indicator

    • relation

    • value

    • units

  • variable – used as the record value for variable_name

  • target_unit – fieldname used for the primary output quantity

  • units_to_factor

    dictionary, mapping

    • FROM (compiled regex for units)

    • TO EITHER a float (multiple) to multiply those units by, to get the preferred unit

    • OR a function taking a text parameter and returning a float value in preferred unit

    Any units present in the regex but absent from units_to_factor will lead the result to be ignored. For example, this allows you to ignore a relative neutrophil count (“neutrophils 2.2%”) while detecting absolute neutrophil counts (“neutrophils 2.2”), or ignoring “docusate sodium 100mg” but detecting “sodium 140 mM”.

  • take_absolute

    Convert negative values to positive ones? Typical text requiring this option might look like:

    CRP-4
    CRP-106
    CRP -97
    Blood results for today as follows: Na- 142, K-4.1, ...
    

    … occurring in 23 out of 8054 hits for CRP of one test set in our data.

    For many quantities, we know that they cannot be negative, so this is just a notation rather than a minus sign. We have to account for it, or it’ll distort our values. Preferable to account for it here rather than later; see manual.

  • commit – force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

  • debug – print the regex?

test(verbose: bool = False) None[source]

Performs a self-test on the NLP processor.

Parameters:

verbose – Be verbose?

This is an abstract method that is subclassed.

class crate_anon.nlp_manager.parse_biochemistry.TotalCholesterolValidator(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)[source]

Validator for TotalCholesterol (see help for explanation).

classmethod get_variablename_regexstrlist() Tuple[str, List[str]][source]

To be overridden.

Returns:

(validated_variable_name, regex_str_list), where:

regex_str_list:

List of regular expressions, each in string format.

This class operates with compiled regexes having this group format (capture groups in this sequence):

  • variable

validated_variable:

used to set our variable attribute and thus the value of the field variable_name in the NLP output; for example, if validated_variable == 'crp', then the variable_name field will be set to crp_validator.

Return type:

tuple

class crate_anon.nlp_manager.parse_biochemistry.Triglycerides(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)[source]

BIOCHEMISTRY (LIPID PROFILE).

Triglycerides. Default units are mM; also supports mg/dL.
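Triglycerides need a different mg/dL conversion factor from cholesterol because the molar mass differs. The sketch below assumes the commonly quoted approximation of 88.57 mg/dL per mmol/L; the names are hypothetical and the exact factor CRATE uses is not stated here.

```python
# Assumption: about 88.57 mg/dL of triglycerides corresponds to 1 mmol/L.
TRIGLYCERIDES_MG_DL_PER_MM = 88.57


def triglycerides_mm_from_mg_dl(mg_dl: float) -> float:
    """Convert a triglyceride concentration from mg/dL to mM (mmol/L)."""
    return mg_dl / TRIGLYCERIDES_MG_DL_PER_MM


print(round(triglycerides_mm_from_mg_dl(150), 2))  # 1.69
```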

__init__(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False) None[source]
Parameters:
  • nlpdef – crate_anon.nlp_manager.nlp_definition.NlpDefinition

  • cfg_processor_name – config section suffix in the NLP config file

  • regex_str

    Regular expression, in string format.

    This class operates with compiled regexes having this group format (capture groups in this sequence):

    • variable

    • tense_indicator

    • relation

    • value

    • units

  • variable – used as the record value for variable_name

  • target_unit – fieldname used for the primary output quantity

  • units_to_factor

    dictionary, mapping

    • FROM (compiled regex for units)

    • TO EITHER a float (multiple) to multiply those units by, to get the preferred unit

    • OR a function taking a text parameter and returning a float value in preferred unit

    Any units present in the regex but absent from units_to_factor will lead the result to be ignored. For example, this allows you to ignore a relative neutrophil count (“neutrophils 2.2%”) while detecting absolute neutrophil counts (“neutrophils 2.2”), or ignoring “docusate sodium 100mg” but detecting “sodium 140 mM”.

  • take_absolute

    Convert negative values to positive ones? Typical text requiring this option might look like:

    CRP-4
    CRP-106
    CRP -97
    Blood results for today as follows: Na- 142, K-4.1, ...
    

    … occurring in 23 out of 8054 hits for CRP of one test set in our data.

    For many quantities, we know that they cannot be negative, so this is just a notation rather than a minus sign. We have to account for it, or it’ll distort our values. Preferable to account for it here rather than later; see manual.

  • commit – force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

  • debug – print the regex?

test(verbose: bool = False) None[source]

Performs a self-test on the NLP processor.

Parameters:

verbose – Be verbose?

This is an abstract method that is subclassed.

class crate_anon.nlp_manager.parse_biochemistry.TriglyceridesValidator(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)[source]

Validator for Triglycerides (see help for explanation).

classmethod get_variablename_regexstrlist() Tuple[str, List[str]][source]

To be overridden.

Returns:

(validated_variable_name, regex_str_list), where:

regex_str_list:

List of regular expressions, each in string format.

This class operates with compiled regexes having this group format (capture groups in this sequence):

  • variable

validated_variable:

used to set our variable attribute and thus the value of the field variable_name in the NLP output; for example, if validated_variable == 'crp', then the variable_name field will be set to crp_validator.

Return type:

tuple

class crate_anon.nlp_manager.parse_biochemistry.Tsh(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)[source]

BIOCHEMISTRY (ENDOCRINOLOGY).

Thyroid-stimulating hormone (TSH), in mIU/L (or μIU/mL).

__init__(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False) None[source]
Parameters:
  • nlpdef – crate_anon.nlp_manager.nlp_definition.NlpDefinition

  • cfg_processor_name – config section suffix in the NLP config file

  • regex_str

    Regular expression, in string format.

    This class operates with compiled regexes having this group format (capture groups in this sequence):

    • variable

    • tense_indicator

    • relation

    • value

    • units

  • variable – used as the record value for variable_name

  • target_unit – fieldname used for the primary output quantity

  • units_to_factor

    dictionary, mapping

    • FROM (compiled regex for units)

    • TO EITHER a float (multiple) to multiply those units by, to get the preferred unit

    • OR a function taking a text parameter and returning a float value in preferred unit

    Any units present in the regex but absent from units_to_factor will lead the result to be ignored. For example, this allows you to ignore a relative neutrophil count (“neutrophils 2.2%”) while detecting absolute neutrophil counts (“neutrophils 2.2”), or ignoring “docusate sodium 100mg” but detecting “sodium 140 mM”.

  • take_absolute

    Convert negative values to positive ones? Typical text requiring this option might look like:

    CRP-4
    CRP-106
    CRP -97
    Blood results for today as follows: Na- 142, K-4.1, ...
    

    … occurring in 23 out of 8054 hits for CRP of one test set in our data.

    For many quantities, we know that they cannot be negative, so this is just a notation rather than a minus sign. We have to account for it, or it’ll distort our values. Preferable to account for it here rather than later; see manual.

  • commit – force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

  • debug – print the regex?

test(verbose: bool = False) None[source]

Performs a self-test on the NLP processor.

Parameters:

verbose – Be verbose?

This is an abstract method that is subclassed.

class crate_anon.nlp_manager.parse_biochemistry.TshValidator(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)[source]

Validator for TSH (see help for explanation).

classmethod get_variablename_regexstrlist() Tuple[str, List[str]][source]

To be overridden.

Returns:

(validated_variable_name, regex_str_list), where:

regex_str_list:

List of regular expressions, each in string format.

This class operates with compiled regexes having this group format (capture groups in this sequence):

  • variable

validated_variable:

used to set our variable attribute and thus the value of the field variable_name in the NLP output; for example, if validated_variable == 'crp', then the variable_name field will be set to crp_validator.

Return type:

tuple

class crate_anon.nlp_manager.parse_biochemistry.Urea(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)[source]

BIOCHEMISTRY (U&E).

Urea, in mM.

__init__(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False) None[source]
Parameters:
  • nlpdef – crate_anon.nlp_manager.nlp_definition.NlpDefinition

  • cfg_processor_name – config section suffix in the NLP config file

  • regex_str

    Regular expression, in string format.

    This class operates with compiled regexes having this group format (capture groups in this sequence):

    • variable

    • tense_indicator

    • relation

    • value

    • units

  • variable – used as the record value for variable_name

  • target_unit – fieldname used for the primary output quantity

  • units_to_factor

    dictionary, mapping

    • FROM (compiled regex for units)

    • TO EITHER a float (multiple) to multiply those units by, to get the preferred unit

    • OR a function taking a text parameter and returning a float value in preferred unit

    Any units present in the regex but absent from units_to_factor will lead the result to be ignored. For example, this allows you to ignore a relative neutrophil count (“neutrophils 2.2%”) while detecting absolute neutrophil counts (“neutrophils 2.2”), or ignoring “docusate sodium 100mg” but detecting “sodium 140 mM”.

  • take_absolute

    Convert negative values to positive ones? Typical text requiring this option might look like:

    CRP-4
    CRP-106
    CRP -97
    Blood results for today as follows: Na- 142, K-4.1, ...
    

    … occurring in 23 out of 8054 hits for CRP of one test set in our data.

    For many quantities, we know that they cannot be negative, so this is just a notation rather than a minus sign. We have to account for it, or it’ll distort our values. Preferable to account for it here rather than later; see manual.

  • commit – force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

  • debug – print the regex?

test(verbose: bool = False) None[source]

Performs a self-test on the NLP processor.

Parameters:

verbose – Be verbose?

This is an abstract method that is subclassed.

class crate_anon.nlp_manager.parse_biochemistry.UreaValidator(nlpdef: NlpDefinition | None, cfg_processor_name: str | None, commit: bool = False)[source]

Validator for Urea (see help for explanation).

classmethod get_variablename_regexstrlist() Tuple[str, List[str]][source]

To be overridden.

Returns:

(validated_variable_name, regex_str_list), where:

regex_str_list:

List of regular expressions, each in string format.

This class operates with compiled regexes having this group format (capture groups in this sequence):

  • variable

validated_variable:

used to set our variable attribute and thus the value of the field variable_name in the NLP output; for example, if validated_variable == 'crp', then the variable_name field will be set to crp_validator.

Return type:

tuple

crate_anon.nlp_manager.parse_biochemistry.hba1c_mmol_per_mol_from_percent(percent: float | str) float | None[source]

Convert an HbA1c value from old percentage units – DCCT (Diabetes Control and Complications Trial), UKPDS (United Kingdom Prospective Diabetes Study) or NGSP (National Glycohemoglobin Standardization Program) – to newer IFCC (International Federation of Clinical Chemistry) mmol/mol units (mmol HbA1c / mol Hb).

Parameters:

percent – DCCT value as a percentage

Returns:

IFCC value in mmol/mol

Example: 5% becomes 31.1 mmol/mol.
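The arithmetic behind this example can be sketched from the NGSP/IFCC master equation, NGSP (%) = 0.09148 × IFCC (mmol/mol) + 2.152, rearranged to solve for the IFCC value. The function below is an independent illustration, not necessarily how CRATE implements the conversion; its name and its handling of unparseable input (returning None) are assumptions.

```python
from typing import Optional, Union


def ifcc_from_ngsp_percent(percent: Union[float, str]) -> Optional[float]:
    """Convert an HbA1c percentage (NGSP/DCCT) to IFCC mmol/mol."""
    try:
        value = float(percent)
    except (TypeError, ValueError):
        return None  # assumption: unparseable input gives no result
    # Invert NGSP = 0.09148 * IFCC + 2.152.
    return (value - 2.152) / 0.09148


print(round(ifcc_from_ngsp_percent(5), 1))  # 31.1
```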

By Emanuele Osimo, Feb 2019. Some modifications by Rudolf Cardinal, Feb 2019.

References:

Note also that you may see eAG values (estimated average glucose), in mmol/L or mg/dl; see http://www.ngsp.org/A1ceAG.asp; these are not direct measurements of HbA1c.