14.5.26. crate_anon.nlp_manager.parse_haematology

crate_anon/nlp_manager/parse_haematology.py


Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).

This file is part of CRATE.

CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with CRATE. If not, see <https://www.gnu.org/licenses/>.


Python regex-based NLP processors for haematology tests.

All inherit from crate_anon.nlp_manager.regex_parser.SimpleNumericalResultParser and are constructed with these arguments:

nlpdef:

a crate_anon.nlp_manager.nlp_definition.NlpDefinition

cfg_processor_name:

the name of a CRATE NLP config file section (from which we may choose to get extra config information)

commit:

force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

class crate_anon.nlp_manager.parse_haematology.Basophils(nlpdef: Optional[crate_anon.nlp_manager.nlp_definition.NlpDefinition], cfg_processor_name: Optional[str], commit: bool = False)[source]

HAEMATOLOGY (FBC).

Basophil count (absolute). Default units are 10^9 / L; also supports cells/mm^3 = cells/μL.

__init__(nlpdef: Optional[crate_anon.nlp_manager.nlp_definition.NlpDefinition], cfg_processor_name: Optional[str], commit: bool = False) None[source]

__init__ function for WbcBase.

Parameters
  • nlpdef – a crate_anon.nlp_manager.nlp_definition.NlpDefinition

  • cfg_processor_name – the name of a CRATE NLP config file section (from which we may choose to get extra config information)

  • cell_type_regex_text – text for regex for the cell type, representing e.g. “monocytes” or “basophils”

  • variable – used as the record value for variable_name

  • commit – force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

test(verbose: bool = False) None[source]

Performs a self-test on the NLP processor.

Parameters

verbose – Be verbose?

This is an abstract method that is subclassed.

class crate_anon.nlp_manager.parse_haematology.BasophilsValidator(nlpdef: Optional[crate_anon.nlp_manager.nlp_definition.NlpDefinition], cfg_processor_name: Optional[str], commit: bool = False)[source]

Validator for Basophils (see help for explanation).

class crate_anon.nlp_manager.parse_haematology.Eosinophils(nlpdef: Optional[crate_anon.nlp_manager.nlp_definition.NlpDefinition], cfg_processor_name: Optional[str], commit: bool = False)[source]

HAEMATOLOGY (FBC).

Eosinophil count (absolute). Default units are 10^9 / L; also supports cells/mm^3 = cells/μL.

__init__(nlpdef: Optional[crate_anon.nlp_manager.nlp_definition.NlpDefinition], cfg_processor_name: Optional[str], commit: bool = False) None[source]

__init__ function for WbcBase.

Parameters
  • nlpdef – a crate_anon.nlp_manager.nlp_definition.NlpDefinition

  • cfg_processor_name – the name of a CRATE NLP config file section (from which we may choose to get extra config information)

  • cell_type_regex_text – text for regex for the cell type, representing e.g. “monocytes” or “basophils”

  • variable – used as the record value for variable_name

  • commit – force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

test(verbose: bool = False) None[source]

Performs a self-test on the NLP processor.

Parameters

verbose – Be verbose?

This is an abstract method that is subclassed.

class crate_anon.nlp_manager.parse_haematology.EosinophilsValidator(nlpdef: Optional[crate_anon.nlp_manager.nlp_definition.NlpDefinition], cfg_processor_name: Optional[str], commit: bool = False)[source]

Validator for Eosinophils (see help for explanation).

class crate_anon.nlp_manager.parse_haematology.Esr(nlpdef: Optional[crate_anon.nlp_manager.nlp_definition.NlpDefinition], cfg_processor_name: Optional[str], commit: bool = False)[source]

HAEMATOLOGY (ESR).

Erythrocyte sedimentation rate (ESR), in mm/h.

__init__(nlpdef: Optional[crate_anon.nlp_manager.nlp_definition.NlpDefinition], cfg_processor_name: Optional[str], commit: bool = False) None[source]
Parameters
  • nlpdef – a crate_anon.nlp_manager.nlp_definition.NlpDefinition

  • cfg_processor_name – config section suffix in the NLP config file

  • regex_str

    Regular expression, in string format.

    This class operates with compiled regexes having this group format (capture groups in this sequence):

    • variable

    • tense_indicator

    • relation

    • value

    • units

  • variable – used as the record value for variable_name

  • target_unit – fieldname used for the primary output quantity

  • units_to_factor

    dictionary, mapping

    • FROM (compiled regex for units)

    • TO EITHER a float (multiple) to multiply those units by, to get the preferred unit

    • OR a function taking a text parameter and returning a float value in preferred unit

    Any units present in the regex but absent from units_to_factor will lead the result to be ignored. For example, this allows you to ignore a relative neutrophil count (“neutrophils 2.2%”) while detecting absolute neutrophil counts (“neutrophils 2.2”), or ignoring “docusate sodium 100mg” but detecting “sodium 140 mM”.

  • take_absolute

    Convert negative values to positive ones? Typical text requiring this option might look like:

    CRP-4
    CRP-106
    CRP -97
    Blood results for today as follows: Na- 142, K-4.1, ...
    

    … occurring in 23 out of 8054 hits for CRP of one test set in our data.

    For many quantities, we know that they cannot be negative, so this is just a notation rather than a minus sign. We have to account for it, or it’ll distort our values. Preferable to account for it here rather than later; see manual.

  • commit – force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

  • debug – print the regex?
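
As an illustration of the capture-group order and the units_to_factor mapping described above, here is a minimal sketch. It does not use CRATE itself: the regex, the names NEUTROPHIL_REGEX, UNITS_TO_FACTOR and to_target_units, and the rule that a value without units is taken to be in the target unit are all assumptions made for this example.

```python
import re
from typing import Callable, Dict, Optional, Union

# Hypothetical, simplified regex (not CRATE's own). Capture groups follow the
# documented order: variable, tense_indicator, relation, value, units.
NEUTROPHIL_REGEX = re.compile(
    r"\b(neutrophils?)"                    # 1: variable
    r"\s*(is|was|were)?"                   # 2: tense_indicator
    r"\s*(<=|>=|<|>|=)?"                   # 3: relation
    r"\s*(-?\d+(?:\.\d+)?)"                # 4: value
    r"\s*(x\s*10\^9\s*/\s*L|/\s*uL|%)?",   # 5: units
    re.IGNORECASE,
)

UnitConverter = Union[float, Callable[[str], float]]

# Keys are compiled regexes for units; values are either a multiplier or a
# function from the value text to a float in the target unit (10^9/L here).
# "%" is deliberately absent, so relative counts are ignored.
UNITS_TO_FACTOR: Dict[re.Pattern, UnitConverter] = {
    re.compile(r"x\s*10\^9\s*/\s*L", re.IGNORECASE): 1.0,
    re.compile(r"/\s*uL", re.IGNORECASE): lambda text: float(text) / 1000,
}


def to_target_units(text: str) -> Optional[float]:
    """Return the first neutrophil count in 10^9/L, or None if ignored."""
    m = NEUTROPHIL_REGEX.search(text)
    if m is None:
        return None
    value_text, units = m.group(4), m.group(5)
    if not units:
        # Assumption for this sketch: no units means the target unit.
        return float(value_text)
    for unit_regex, factor in UNITS_TO_FACTOR.items():
        if unit_regex.fullmatch(units):
            if callable(factor):
                return factor(value_text)
            return float(value_text) * factor
    # Units matched by the main regex but not mapped: ignore the result.
    return None


print(to_target_units("neutrophils 2.2"))            # absolute count, kept
print(to_target_units("Neutrophils were 2200 /uL"))  # converted to 10^9/L
print(to_target_units("neutrophils 2.2%"))           # relative count, ignored
```

Leaving "%" in the main regex but out of the mapping is what lets the sketch recognise a relative count and then discard it, rather than misreading "2.2%" as an absolute value of 2.2.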

test(verbose: bool = False) None[source]

Performs a self-test on the NLP processor.

Parameters

verbose – Be verbose?

This is an abstract method that is subclassed.

class crate_anon.nlp_manager.parse_haematology.EsrValidator(nlpdef: Optional[crate_anon.nlp_manager.nlp_definition.NlpDefinition], cfg_processor_name: Optional[str], commit: bool = False)[source]

Validator for Esr (see help for explanation).

class crate_anon.nlp_manager.parse_haematology.Haematocrit(nlpdef: Optional[crate_anon.nlp_manager.nlp_definition.NlpDefinition], cfg_processor_name: Optional[str], commit: bool = False)[source]

HAEMATOLOGY (FBC).

Haematocrit (Hct). A dimensionless quantity (but supports L/L notation).

__init__(nlpdef: Optional[crate_anon.nlp_manager.nlp_definition.NlpDefinition], cfg_processor_name: Optional[str], commit: bool = False) None[source]
Parameters
  • nlpdef – a crate_anon.nlp_manager.nlp_definition.NlpDefinition

  • cfg_processor_name – config section suffix in the NLP config file

  • regex_str

    Regular expression, in string format.

    This class operates with compiled regexes having this group format (capture groups in this sequence):

    • variable

    • tense_indicator

    • relation

    • value

    • units

  • variable – used as the record value for variable_name

  • target_unit – fieldname used for the primary output quantity

  • units_to_factor

    dictionary, mapping

    • FROM (compiled regex for units)

    • TO EITHER a float (multiple) to multiply those units by, to get the preferred unit

    • OR a function taking a text parameter and returning a float value in preferred unit

    Any units present in the regex but absent from units_to_factor will lead the result to be ignored. For example, this allows you to ignore a relative neutrophil count (“neutrophils 2.2%”) while detecting absolute neutrophil counts (“neutrophils 2.2”), or ignoring “docusate sodium 100mg” but detecting “sodium 140 mM”.

  • take_absolute

    Convert negative values to positive ones? Typical text requiring this option might look like:

    CRP-4
    CRP-106
    CRP -97
    Blood results for today as follows: Na- 142, K-4.1, ...
    

    … occurring in 23 out of 8054 hits for CRP of one test set in our data.

    For many quantities, we know that they cannot be negative, so this is just a notation rather than a minus sign. We have to account for it, or it’ll distort our values. Preferable to account for it here rather than later; see manual.

  • commit – force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

  • debug – print the regex?
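
The effect of take_absolute can be sketched on the CRP snippets quoted above. This is only an illustration under stated assumptions: CRP_REGEX and crp_values are hypothetical names, and the pattern is far simpler than anything CRATE uses.

```python
import re
from typing import List

# Hypothetical, simplified pattern (not CRATE's own regex): "CRP", optional
# whitespace, then a number that may carry a leading hyphen/minus.
CRP_REGEX = re.compile(r"\bCRP\s*(-?\d+(?:\.\d+)?)", re.IGNORECASE)


def crp_values(text: str, take_absolute: bool) -> List[float]:
    """Extract CRP values, optionally treating a leading '-' as notation."""
    values = []
    for m in CRP_REGEX.finditer(text):
        value = float(m.group(1))
        values.append(abs(value) if take_absolute else value)
    return values


notes = "CRP-4\nCRP-106\nCRP -97"
print(crp_values(notes, take_absolute=False))  # hyphen read as a minus sign
print(crp_values(notes, take_absolute=True))   # hyphen treated as notation
```

Without the option, every value in these notes comes out negative, which is the distortion the parameter description warns about.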

test(verbose: bool = False) None[source]

Performs a self-test on the NLP processor.

Parameters

verbose – Be verbose?

This is an abstract method that is subclassed.

class crate_anon.nlp_manager.parse_haematology.HaematocritValidator(nlpdef: Optional[crate_anon.nlp_manager.nlp_definition.NlpDefinition], cfg_processor_name: Optional[str], commit: bool = False)[source]

Validator for Haematocrit (see help for explanation).

class crate_anon.nlp_manager.parse_haematology.Haemoglobin(nlpdef: Optional[crate_anon.nlp_manager.nlp_definition.NlpDefinition], cfg_processor_name: Optional[str], commit: bool = False)[source]

HAEMATOLOGY (FBC).

Haemoglobin (Hb). Default units are g/L; also supports g/dL.

UK reporting for haemoglobin switched in 2013 from g/dL to g/L; see e.g.

The DANGER remains that “Hb 9” may have been from someone assuming old-style units, 9 g/dL = 90 g/L, but this will be interpreted as 9 g/L. This problem is hard to avoid.
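
The ambiguity can be made concrete with a minimal sketch. HB_REGEX and hb_in_g_per_l are hypothetical names for this example, not part of CRATE, and the pattern is deliberately simplified.

```python
import re
from typing import Optional

# Hypothetical, simplified pattern (not CRATE's own regex).
HB_REGEX = re.compile(
    r"\b(?:Hb|haemoglobin)\s*(\d+(?:\.\d+)?)\s*(g/dL|g/L)?",
    re.IGNORECASE,
)


def hb_in_g_per_l(text: str) -> Optional[float]:
    """Return haemoglobin in g/L; a value without units is taken as g/L."""
    m = HB_REGEX.search(text)
    if m is None:
        return None
    value = float(m.group(1))
    units = (m.group(2) or "g/L").lower()
    # 1 dL = 0.1 L, so g/dL is multiplied by 10 to give g/L.
    return value * 10 if units == "g/dl" else value


print(hb_in_g_per_l("Hb 9 g/dL"))  # explicit old-style units: 90 g/L
print(hb_in_g_per_l("Hb 9"))       # no units: read as 9 g/L
```

The second call shows the problem: if the writer meant 9 g/dL, the result is wrong by a factor of ten, and nothing in the text itself allows the parser to tell.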

__init__(nlpdef: Optional[crate_anon.nlp_manager.nlp_definition.NlpDefinition], cfg_processor_name: Optional[str], commit: bool = False) None[source]
Parameters
  • nlpdef – a crate_anon.nlp_manager.nlp_definition.NlpDefinition

  • cfg_processor_name – config section suffix in the NLP config file

  • regex_str

    Regular expression, in string format.

    This class operates with compiled regexes having this group format (capture groups in this sequence):

    • variable

    • tense_indicator

    • relation

    • value

    • units

  • variable – used as the record value for variable_name

  • target_unit – fieldname used for the primary output quantity

  • units_to_factor

    dictionary, mapping

    • FROM (compiled regex for units)

    • TO EITHER a float (multiple) to multiply those units by, to get the preferred unit

    • OR a function taking a text parameter and returning a float value in preferred unit

    Any units present in the regex but absent from units_to_factor will lead the result to be ignored. For example, this allows you to ignore a relative neutrophil count (“neutrophils 2.2%”) while detecting absolute neutrophil counts (“neutrophils 2.2”), or ignoring “docusate sodium 100mg” but detecting “sodium 140 mM”.

  • take_absolute

    Convert negative values to positive ones? Typical text requiring this option might look like:

    CRP-4
    CRP-106
    CRP -97
    Blood results for today as follows: Na- 142, K-4.1, ...
    

    … occurring in 23 out of 8054 hits for CRP of one test set in our data.

    For many quantities, we know that they cannot be negative, so this is just a notation rather than a minus sign. We have to account for it, or it’ll distort our values. Preferable to account for it here rather than later; see manual.

  • commit – force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

  • debug – print the regex?

test(verbose: bool = False) None[source]

Performs a self-test on the NLP processor.

Parameters

verbose – Be verbose?

This is an abstract method that is subclassed.

class crate_anon.nlp_manager.parse_haematology.HaemoglobinValidator(nlpdef: Optional[crate_anon.nlp_manager.nlp_definition.NlpDefinition], cfg_processor_name: Optional[str], commit: bool = False)[source]

Validator for Haemoglobin (see help for explanation).

class crate_anon.nlp_manager.parse_haematology.Lymphocytes(nlpdef: Optional[crate_anon.nlp_manager.nlp_definition.NlpDefinition], cfg_processor_name: Optional[str], commit: bool = False)[source]

HAEMATOLOGY (FBC).

Lymphocyte count (absolute). Default units are 10^9 / L; also supports cells/mm^3 = cells/μL.

__init__(nlpdef: Optional[crate_anon.nlp_manager.nlp_definition.NlpDefinition], cfg_processor_name: Optional[str], commit: bool = False) None[source]

__init__ function for WbcBase.

Parameters
  • nlpdef – a crate_anon.nlp_manager.nlp_definition.NlpDefinition

  • cfg_processor_name – the name of a CRATE NLP config file section (from which we may choose to get extra config information)

  • cell_type_regex_text – text for regex for the cell type, representing e.g. “monocytes” or “basophils”

  • variable – used as the record value for variable_name

  • commit – force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

test(verbose: bool = False) None[source]

Performs a self-test on the NLP processor.

Parameters

verbose – Be verbose?

This is an abstract method that is subclassed.

class crate_anon.nlp_manager.parse_haematology.LymphocytesValidator(nlpdef: Optional[crate_anon.nlp_manager.nlp_definition.NlpDefinition], cfg_processor_name: Optional[str], commit: bool = False)[source]

Validator for Lymphocytes (see help for explanation).

class crate_anon.nlp_manager.parse_haematology.Monocytes(nlpdef: Optional[crate_anon.nlp_manager.nlp_definition.NlpDefinition], cfg_processor_name: Optional[str], commit: bool = False)[source]

HAEMATOLOGY (FBC).

Monocyte count (absolute). Default units are 10^9 / L; also supports cells/mm^3 = cells/μL.

__init__(nlpdef: Optional[crate_anon.nlp_manager.nlp_definition.NlpDefinition], cfg_processor_name: Optional[str], commit: bool = False) None[source]

__init__ function for WbcBase.

Parameters
  • nlpdef – a crate_anon.nlp_manager.nlp_definition.NlpDefinition

  • cfg_processor_name – the name of a CRATE NLP config file section (from which we may choose to get extra config information)

  • cell_type_regex_text – text for regex for the cell type, representing e.g. “monocytes” or “basophils”

  • variable – used as the record value for variable_name

  • commit – force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

test(verbose: bool = False) None[source]

Performs a self-test on the NLP processor.

Parameters

verbose – Be verbose?

This is an abstract method that is subclassed.

class crate_anon.nlp_manager.parse_haematology.MonocytesValidator(nlpdef: Optional[crate_anon.nlp_manager.nlp_definition.NlpDefinition], cfg_processor_name: Optional[str], commit: bool = False)[source]

Validator for Monocytes (see help for explanation).

class crate_anon.nlp_manager.parse_haematology.Neutrophils(nlpdef: Optional[crate_anon.nlp_manager.nlp_definition.NlpDefinition], cfg_processor_name: Optional[str], commit: bool = False)[source]

HAEMATOLOGY (FBC).

Neutrophil (polymorphonuclear leukocyte) count (absolute). Default units are 10^9 / L; also supports cells/mm^3 = cells/μL.

__init__(nlpdef: Optional[crate_anon.nlp_manager.nlp_definition.NlpDefinition], cfg_processor_name: Optional[str], commit: bool = False) None[source]

__init__ function for WbcBase.

Parameters
  • nlpdef – a crate_anon.nlp_manager.nlp_definition.NlpDefinition

  • cfg_processor_name – the name of a CRATE NLP config file section (from which we may choose to get extra config information)

  • cell_type_regex_text – text for regex for the cell type, representing e.g. “monocytes” or “basophils”

  • variable – used as the record value for variable_name

  • commit – force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

test(verbose: bool = False) None[source]

Performs a self-test on the NLP processor.

Parameters

verbose – Be verbose?

This is an abstract method that is subclassed.

class crate_anon.nlp_manager.parse_haematology.NeutrophilsValidator(nlpdef: Optional[crate_anon.nlp_manager.nlp_definition.NlpDefinition], cfg_processor_name: Optional[str], commit: bool = False)[source]

Validator for Neutrophils (see help for explanation).

class crate_anon.nlp_manager.parse_haematology.Platelets(nlpdef: Optional[crate_anon.nlp_manager.nlp_definition.NlpDefinition], cfg_processor_name: Optional[str], commit: bool = False)[source]

HAEMATOLOGY (FBC).

Platelet count. Default units are 10^9 / L; also supports cells/mm^3 = cells/μL.

Not actually a white blood cell, of course, but can share the same base class; platelets are expressed in the same units, of 10^9 / L. Typical values 150–450 ×10^9 / L (or 150,000–450,000 per μL).
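
The equivalence between the two sets of typical values follows from 1 μL = 10^-6 L. A small worked sketch (the function name is an assumption for this example, not a CRATE function):

```python
def per_microlitre_to_1e9_per_litre(count_per_ul: float) -> float:
    """Convert cells/μL (= cells/mm^3) to units of 10^9 cells/L."""
    # cells/L = cells/μL × 10^6; expressing that in units of 10^9/L divides
    # by 10^9, so the net effect is division by 1000.
    return count_per_ul / 1000


print(per_microlitre_to_1e9_per_litre(150_000))  # lower typical value
print(per_microlitre_to_1e9_per_litre(450_000))  # upper typical value
```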

__init__(nlpdef: Optional[crate_anon.nlp_manager.nlp_definition.NlpDefinition], cfg_processor_name: Optional[str], commit: bool = False) None[source]

__init__ function for WbcBase.

Parameters
  • nlpdef – a crate_anon.nlp_manager.nlp_definition.NlpDefinition

  • cfg_processor_name – the name of a CRATE NLP config file section (from which we may choose to get extra config information)

  • cell_type_regex_text – text for regex for the cell type, representing e.g. “monocytes” or “basophils”

  • variable – used as the record value for variable_name

  • commit – force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

test(verbose: bool = False) None[source]

Performs a self-test on the NLP processor.

Parameters

verbose – Be verbose?

This is an abstract method that is subclassed.

class crate_anon.nlp_manager.parse_haematology.PlateletsValidator(nlpdef: Optional[crate_anon.nlp_manager.nlp_definition.NlpDefinition], cfg_processor_name: Optional[str], commit: bool = False)[source]

Validator for Platelets (see help for explanation).

class crate_anon.nlp_manager.parse_haematology.RBC(nlpdef: Optional[crate_anon.nlp_manager.nlp_definition.NlpDefinition], cfg_processor_name: Optional[str], commit: bool = False)[source]

HAEMATOLOGY (FBC).

Red blood cell count. Default units are 10^12/L; also supports cells/mm^3 = cells/μL.

A typical excerpt from a FBC report:

RBC, POC    4.84            10*12/L
RBC, POC    9.99    (H)     10*12/L
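
To show how report lines of this shape might be picked apart, here is a minimal sketch. RBC_LINE and parse_rbc_lines are hypothetical names, the pattern is written only for the two lines above, and it is not the regex CRATE uses.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical pattern for lines like the excerpt above: a label, a value,
# an optional (H)/(L) flag, and units written as 10*12/L.
RBC_LINE = re.compile(
    r"^RBC(?:,\s*POC)?\s+(\d+(?:\.\d+)?)\s+(?:\((H|L)\)\s+)?10\*12/L\s*$",
    re.MULTILINE,
)


def parse_rbc_lines(report: str) -> List[Tuple[float, Optional[str]]]:
    """Return (value in 10^12/L, flag or None) for each matching line."""
    return [
        (float(m.group(1)), m.group(2))
        for m in RBC_LINE.finditer(report)
    ]


report = (
    "RBC, POC    4.84            10*12/L\n"
    "RBC, POC    9.99    (H)     10*12/L"
)
print(parse_rbc_lines(report))
```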

__init__(nlpdef: Optional[crate_anon.nlp_manager.nlp_definition.NlpDefinition], cfg_processor_name: Optional[str], commit: bool = False) None[source]
Parameters
  • nlpdef – a crate_anon.nlp_manager.nlp_definition.NlpDefinition

  • cfg_processor_name – config section suffix in the NLP config file

  • regex_str

    Regular expression, in string format.

    This class operates with compiled regexes having this group format (capture groups in this sequence):

    • variable

    • tense_indicator

    • relation

    • value

    • units

  • variable – used as the record value for variable_name

  • target_unit – fieldname used for the primary output quantity

  • units_to_factor

    dictionary, mapping

    • FROM (compiled regex for units)

    • TO EITHER a float (multiple) to multiply those units by, to get the preferred unit

    • OR a function taking a text parameter and returning a float value in preferred unit

    Any units present in the regex but absent from units_to_factor will lead the result to be ignored. For example, this allows you to ignore a relative neutrophil count (“neutrophils 2.2%”) while detecting absolute neutrophil counts (“neutrophils 2.2”), or ignoring “docusate sodium 100mg” but detecting “sodium 140 mM”.

  • take_absolute

    Convert negative values to positive ones? Typical text requiring this option might look like:

    CRP-4
    CRP-106
    CRP -97
    Blood results for today as follows: Na- 142, K-4.1, ...
    

    … occurring in 23 out of 8054 hits for CRP of one test set in our data.

    For many quantities, we know that they cannot be negative, so this is just a notation rather than a minus sign. We have to account for it, or it’ll distort our values. Preferable to account for it here rather than later; see manual.

  • commit – force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

  • debug – print the regex?

test(verbose: bool = False) None[source]

Performs a self-test on the NLP processor.

Parameters

verbose – Be verbose?

This is an abstract method that is subclassed.

class crate_anon.nlp_manager.parse_haematology.RBCValidator(nlpdef: Optional[crate_anon.nlp_manager.nlp_definition.NlpDefinition], cfg_processor_name: Optional[str], commit: bool = False)[source]

Validator for RBC (see help for explanation).

class crate_anon.nlp_manager.parse_haematology.Wbc(nlpdef: Optional[crate_anon.nlp_manager.nlp_definition.NlpDefinition], cfg_processor_name: Optional[str], commit: bool = False)[source]

HAEMATOLOGY (FBC).

White cell count (WBC, WCC). Default units are 10^9 / L; also supports cells/mm^3 = cells/μL.

__init__(nlpdef: Optional[crate_anon.nlp_manager.nlp_definition.NlpDefinition], cfg_processor_name: Optional[str], commit: bool = False) None[source]

__init__ function for WbcBase.

Parameters
  • nlpdef – a crate_anon.nlp_manager.nlp_definition.NlpDefinition

  • cfg_processor_name – the name of a CRATE NLP config file section (from which we may choose to get extra config information)

  • cell_type_regex_text – text for regex for the cell type, representing e.g. “monocytes” or “basophils”

  • variable – used as the record value for variable_name

  • commit – force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

test(verbose: bool = False) None[source]

Performs a self-test on the NLP processor.

Parameters

verbose – Be verbose?

This is an abstract method that is subclassed.

class crate_anon.nlp_manager.parse_haematology.WbcBase(nlpdef: Optional[crate_anon.nlp_manager.nlp_definition.NlpDefinition], cfg_processor_name: Optional[str], cell_type_regex_text: str, variable: str, commit: bool = False)[source]

DO NOT USE DIRECTLY. White cell count base class. Default units are 10^9 / L; also supports cells/mm^3 = cells/μL.

__init__(nlpdef: Optional[crate_anon.nlp_manager.nlp_definition.NlpDefinition], cfg_processor_name: Optional[str], cell_type_regex_text: str, variable: str, commit: bool = False) None[source]

__init__ function for WbcBase.

Parameters
  • nlpdef – a crate_anon.nlp_manager.nlp_definition.NlpDefinition

  • cfg_processor_name – the name of a CRATE NLP config file section (from which we may choose to get extra config information)

  • cell_type_regex_text – text for regex for the cell type, representing e.g. “monocytes” or “basophils”

  • variable – used as the record value for variable_name

  • commit – force a COMMIT whenever we insert data? You should specify this in multiprocess mode, or you may get database deadlocks.

static make_wbc_regex(cell_type_regex_text: str) str[source]

Makes a regular expression (as text) from text representing a cell type.
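
The general idea of building regex text from a cell-type fragment can be sketched as follows. This is not the body of make_wbc_regex: make_cell_count_regex is a hypothetical stand-in, and the pattern it returns is an assumption chosen to keep the example short.

```python
import re


def make_cell_count_regex(cell_type_regex_text: str) -> str:
    """Hypothetical stand-in: build regex text for '<cell type> <value>'."""
    return (
        rf"\b(?:{cell_type_regex_text})"  # cell type, e.g. monocytes
        r"\s*(?:count)?"                  # optional word "count"
        r"\s*[:=]?"                       # optional separator
        r"\s*(\d+(?:\.\d+)?)"             # group 1: numeric value
    )


# The caller supplies only the cell-type fragment and compiles the result.
monocyte_regex = re.compile(
    make_cell_count_regex(r"monocytes?|monos?"), re.IGNORECASE
)

for text in ("Monocytes: 0.6", "mono count 0.5", "basophils 0.1"):
    m = monocyte_regex.search(text)
    print(text, "->", m.group(1) if m else None)
```

Returning text rather than a compiled pattern lets one template serve every cell type, with each subclass passing a different fragment.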

class crate_anon.nlp_manager.parse_haematology.WbcValidator(nlpdef: Optional[crate_anon.nlp_manager.nlp_definition.NlpDefinition], cfg_processor_name: Optional[str], commit: bool = False)[source]

Validator for Wbc (see help for explanation).