7.3. Run the NLP

Now that you have created and edited your config file, you can run the NLP process in one of the following ways:

crate_nlp --nlpdef NLP_NAME --incremental
crate_nlp --nlpdef NLP_NAME --full
crate_nlp_multiprocess --nlpdef NLP_NAME --incremental
crate_nlp_multiprocess --nlpdef NLP_NAME --full

where NLP_NAME is something you’ve configured in the NLP config file (e.g. a drug-parsing NLP program or the GATE demonstration name/location NLP app).

The ‘multiprocess’ versions are faster if you have a multi-core or multi-CPU computer. The ‘full’ option destroys the destination database and starts again. The ‘incremental’ option brings the destination database up to date (creating it if necessary). The default is ‘incremental’, for safety reasons.
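
The four invocations above differ only in the program name and the mode flag. As a minimal sketch of composing them from a script, the helper below builds the argument list. The function name `build_nlp_command` and the NLP definition name `my_nlpdef` are illustrative assumptions, not part of CRATE; only the program names and the `--nlpdef`, `--incremental` and `--full` flags come from the text above.

```python
import shlex
from typing import List


def build_nlp_command(nlpdef: str, full: bool = False,
                      multiprocess: bool = False) -> List[str]:
    """Compose one of the four command lines listed above.

    Hypothetical helper (not part of CRATE). It defaults to the
    incremental, single-process form, mirroring the documented default.
    """
    program = "crate_nlp_multiprocess" if multiprocess else "crate_nlp"
    mode = "--full" if full else "--incremental"
    return [program, "--nlpdef", nlpdef, mode]


if __name__ == "__main__":
    # "my_nlpdef" is a placeholder; use a name from your own NLP config file.
    cmd = build_nlp_command("my_nlpdef", multiprocess=True)
    print(shlex.join(cmd))
    # prints: crate_nlp_multiprocess --nlpdef my_nlpdef --incremental
    # To actually run it (requires CRATE to be installed):
    #   import subprocess; subprocess.run(cmd, check=True)
```

Making the destructive ‘full’ mode an explicit opt-in keeps the script's behaviour in line with the tool's own default.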

Get more help with

crate_nlp --help

7.3.1. crate_nlp

This runs a single-process NLP controller.

Options:

usage: crate_nlp [-h] [--version] [--config CONFIG] [--verbose]
                 [--nlpdef [NLPDEF]] [--report_every_fast [REPORT_EVERY_FAST]]
                 [--report_every_nlp [REPORT_EVERY_NLP]]
                 [--chunksize [CHUNKSIZE]] [--process [PROCESS]]
                 [--nprocesses [NPROCESSES]] [--processcluster PROCESSCLUSTER]
                 [--democonfig] [--listprocessors] [--describeprocessors]
                 [--print_local_processors] [--print_cloud_processors]
                 [--showinfo [NLP_CLASS_NAME]] [--count] [-i | -f]
                 [--dropremake] [--skipdelete] [--nlp] [--echo] [--timing]
                 [--cloud] [--immediate] [--retrieve] [--cancelrequest]
                 [--cancelall] [--showqueue]

NLP manager. Version 0.18.92 (2019-10-10). By Rudolf Cardinal.

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --config CONFIG       Config file (overriding environment variable
                        CRATE_NLP_CONFIG)
  --verbose, -v         Be verbose (use twice for extra verbosity)
  --nlpdef [NLPDEF]     NLP definition name (from config file)
  --report_every_fast [REPORT_EVERY_FAST]
                        Report insert progress (for fast operations) every n
                        rows in verbose mode (default 100000)
  --report_every_nlp [REPORT_EVERY_NLP]
                        Report progress for NLP every n rows in verbose mode
                        (default 500)
  --chunksize [CHUNKSIZE]
                        Number of records copied in a chunk when copying PKs
                        from one database to another (default 100000)
  --process [PROCESS]   For multiprocess mode: specify process number
  --nprocesses [NPROCESSES]
                        For multiprocess mode: specify total number of
                        processes (launched somehow, of which this is to be
                        one)
  --processcluster PROCESSCLUSTER
                        Process cluster name
  --democonfig          Print a demo config file
  --listprocessors      Show possible built-in NLP processor names
  --describeprocessors  Show details of built-in NLP processors
  --print_local_processors
                        Show NLPRP JSON for local processors that are part of
                        the chosen NLP definition, then stop
  --print_cloud_processors
                        Show NLPRP JSON for cloud (remote) processors that are
                        part of the chosen NLP definition, then stop
  --showinfo [NLP_CLASS_NAME]
                        Show detailed information for a parser
  --count               Count records in source/destination databases, then
                        stop
  -i, --incremental     Process only new/changed information, where possible
                        (* default)
  -f, --full            Drop and remake everything
  --dropremake          Drop/remake destination tables only
  --skipdelete          For incremental updates, skip deletion of rows present
                        in the destination but not the source
  --nlp                 Perform NLP processing only
  --echo                Echo SQL
  --timing              Show detailed timing breakdown
  --cloud               Use cloud-based NLP processing tools. Queued mode by
                        default.
  --immediate           To be used with 'cloud'. Process immediately.
  --retrieve            Retrieve NLP data from cloud
  --cancelrequest       Cancel pending requests for the nlpdef specified
  --cancelall           Cancel all pending cloud requests. WARNING: this
                        option cancels all pending requests - not just those
                        for the nlp definition specified
  --showqueue           Shows all pending cloud requests.

# Generated at 2019-10-10 10:23:30
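
The `--process` and `--nprocesses` options let several single-process controllers share one job, each told which process it is and how many there are in total. The help text does not say how records are divided between processes, so the following is only a sketch under stated assumptions: process numbers are taken to be zero-based, and work is split by taking an integer primary key modulo the process count. The names `worker_commands` and `is_mine` are hypothetical, as is `my_nlpdef`.

```python
from typing import List


def worker_commands(nlpdef: str, nprocesses: int) -> List[List[str]]:
    """One crate_nlp command line per worker.

    Assumption: workers are numbered 0 .. nprocesses - 1.
    """
    return [
        ["crate_nlp", "--nlpdef", nlpdef,
         "--process", str(i), "--nprocesses", str(nprocesses)]
        for i in range(nprocesses)
    ]


def is_mine(pk: int, process: int, nprocesses: int) -> bool:
    """Hypothetical splitting rule: integer PK modulo the process count.

    This illustrates one way to give every record to exactly one worker;
    it is not taken from the CRATE source code.
    """
    return pk % nprocesses == process


if __name__ == "__main__":
    for cmd in worker_commands("my_nlpdef", 3):
        print(" ".join(cmd))
    # Show which of ten example PKs each of three workers would handle.
    for p in range(3):
        print(p, [pk for pk in range(10) if is_mine(pk, p, 3)])
    # prints:
    # 0 [0, 3, 6, 9]
    # 1 [1, 4, 7]
    # 2 [2, 5, 8]
```

A rule of this kind needs no coordination between workers: each can decide locally whether a record belongs to it, and no record is processed twice or skipped.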

7.3.2. Current NLP processors

NLP processors (from crate_nlp --describeprocessors):

+---------------------------+----------------------------------------------------------------------------------+
| NLP name                  | Description                                                                      |
+---------------------------+----------------------------------------------------------------------------------+
| Ace                       |                                                                                  |
|                           |     Addenbrooke's Cognitive Examination (ACE, ACE-R, ACE-III) total score.       |
|                           |                                                                                  |
| AceValidator              |                                                                                  |
|                           |     Validator for Ace                                                            |
|                           |     (see :class:`crate_anon.nlp_manager.regex_parser.ValidatorBase` for          |
|                           |     explanation).                                                                |
|                           |                                                                                  |
| Basophils                 |                                                                                  |
|                           |     Basophil count (absolute).                                                   |
|                           |                                                                                  |
| BasophilsValidator        |                                                                                  |
|                           |     Validator for Basophils                                                      |
|                           |     (see :class:`crate_anon.nlp_manager.regex_parser.ValidatorBase` for          |
|                           |     explanation).                                                                |
|                           |                                                                                  |
| Bmi                       |                                                                                  |
|                           |     Body mass index (BMI) (in kg / m^2).                                         |
|                           |                                                                                  |
| BmiValidator              |                                                                                  |
|                           |     Validator for Bmi                                                            |
|                           |     (see :class:`crate_anon.nlp_manager.regex_parser.ValidatorBase` for          |
|                           |     explanation).                                                                |
|                           |                                                                                  |
| Bp                        |                                                                                  |
|                           |     Blood pressure, in mmHg. (Systolic and diastolic.)                           |
|                           |                                                                                  |
|                           |     (Since we produce two variables, SBP and DBP, and we use something a little  |
|                           |     more complex than                                                            |
|                           | :class:`crate_anon.nlp_manager.regex_parser.NumeratorOutOfDenominatorParser`,    |
|                           |     we subclass :class:`crate_anon.nlp_manager.base_nlp_parser.BaseNlpParser`    |
|                           |     directly.)                                                                   |
|                           |                                                                                  |
| BpValidator               |                                                                                  |
|                           |     Validator for Bp                                                             |
|                           |     (see :class:`crate_anon.nlp_manager.regex_parser.ValidatorBase` for          |
|                           |     explanation).                                                                |
|                           |                                                                                  |
| Creatinine                |                                                                                  |
|                           |     Creatinine. Default units are micromolar (SI).                               |
|                           |                                                                                  |
| CreatinineValidator       |                                                                                  |
|                           |     Validator for Creatinine                                                     |
|                           |     (see :class:`crate_anon.nlp_manager.regex_parser.ValidatorBase` for          |
|                           |     explanation).                                                                |
|                           |                                                                                  |
| Crp                       |                                                                                  |
|                           |     C-reactive protein (CRP).                                                    |
|                           |                                                                                  |
|                           |     CRP units:                                                                   |
|                           |                                                                                  |
|                           |     - mg/L is commonest in the UK (or at least standard at Addenbrooke's,        |
|                           |       Hinchingbrooke, and Dundee);                                               |
|                           |                                                                                  |
|                           |     - values of <=6 mg/L or <10 mg/L are normal, and e.g. 70-250 mg/L in         |
|                           |       pneumonia.                                                                 |
|                           |                                                                                  |
|                           |     - Refs include:                                                              |
|                           |                                                                                  |
|                           |       - http://www.ncbi.nlm.nih.gov/pubmed/7705110                               |
|                           |       - http://emedicine.medscape.com/article/2086909-overview                   |
|                           |                                                                                  |
|                           |     - 1 mg/dL = 10 mg/L, so normal in mg/dL is <=1 roughly.                      |
|                           |                                                                                  |
|                           |                                                                                  |
| CrpValidator              |                                                                                  |
|                           |     Validator for CRP                                                            |
|                           |     (see :class:`crate_anon.nlp_manager.regex_parser.ValidatorBase` for          |
|                           |     explanation).                                                                |
|                           |                                                                                  |
| Eosinophils               |                                                                                  |
|                           |     Eosinophil count (absolute).                                                 |
|                           |                                                                                  |
| EosinophilsValidator      |                                                                                  |
|                           |     Validator for Eosinophils                                                    |
|                           |     (see :class:`crate_anon.nlp_manager.regex_parser.ValidatorBase` for          |
|                           |     explanation).                                                                |
|                           |                                                                                  |
| Esr                       |                                                                                  |
|                           |     Erythrocyte sedimentation rate (ESR).                                        |
|                           |                                                                                  |
| EsrValidator              |                                                                                  |
|                           |     Validator for Esr                                                            |
|                           |     (see :class:`crate_anon.nlp_manager.regex_parser.ValidatorBase` for          |
|                           |     explanation).                                                                |
|                           |                                                                                  |
| Gate                      |                                                                                  |
|                           |     Class controlling an external process, typically our Java interface to       |
|                           |     GATE programs, ``CrateGatePipeline.java`` (but it could be any external      |
|                           |     program).                                                                    |
|                           |                                                                                  |
|                           |     We send text to it, it parses the text, and it sends us back results, which  |
|                           |     we return as dictionaries. The specific text sought depends on the           |
|                           |     configuration file and the specific GATE program used.                       |
|                           |                                                                                  |
|                           |     Notes:                                                                       |
|                           |                                                                                  |
|                           |     - PROBLEM when attempting to use KConnect (Bio-YODIE): its source code is    |
|                           |       full of direct calls to ``System.out.println()``.                          |
|                           |                                                                                  |
|                           |       POTENTIAL SOLUTIONS:                                                       |
|                           |                                                                                  |
|                           |       - named pipes:                                                             |
|                           |                                                                                  |
|                           |         - ``os.mkfifo()`` - Unix only.                                           |
|                           |         - ``win32pipe`` - http://stackoverflow.com/questions/286614              |
|                           |                                                                                  |
|                           |       - ZeroMQ with some sort of security                                        |
|                           |                                                                                  |
|                           |         - ``pip install zmq``                                                    |
|                           |         - some sort of Java binding (``jzmq``, ``jeromq``...)                    |
|                           |                                                                                  |
|                           |       - redirect ``stdout`` in our Java handler                                  |
|                           |                                                                                  |
|                           |         - ``System.setOut()``... yes, that works.                                |
|                           |         - Implemented and exposed as ``--suppress_gate_stdout``.                 |
|                           |                                                                                  |
|                           |                                                                                  |
| Glucose                   |                                                                                  |
|                           |     Glucose.                                                                     |
|                           |                                                                                  |
|                           |     - By Emanuele Osimo, Feb 2019.                                               |
|                           |     - Some modifications by Rudolf Cardinal, Feb 2019.                           |
|                           |                                                                                  |
| GlucoseValidator          |                                                                                  |
|                           |     Validator for glucose                                                        |
|                           |     (see :class:`crate_anon.nlp_manager.regex_parser.ValidatorBase` for          |
|                           |     explanation).                                                                |
|                           |                                                                                  |
| Haematocrit               |                                                                                  |
|                           |     Haematocrit (Hct).                                                           |
|                           |                                                                                  |
| HaematocritValidator      |                                                                                  |
|                           |     Validator for Haematocrit                                                    |
|                           |     (see :class:`crate_anon.nlp_manager.regex_parser.ValidatorBase` for          |
|                           |     explanation).                                                                |
|                           |                                                                                  |
| Haemoglobin               |                                                                                  |
|                           |     Haemoglobin (Hb).                                                            |
|                           |                                                                                  |
|                           |     UK reporting for haemoglobin switched in 2013 from g/dL to g/L; see          |
|                           |     e.g.                                                                         |
|                           |                                                                                  |
|                           |     - http://www.pathology.leedsth.nhs.uk/pathology/Portals/0/PDFs/BP-2013-02%20 |
|                           | Hb%20units.pdf                                                                   |
|                           |     - http://www.acb.org.uk/docs/default-                                        |
|                           | source/committees/scientific/guidelines/acb/pathology-harmony-haematology.pdf    |
|                           |                                                                                  |
|                           |     The *DANGER* remains that "Hb 9" may have been from someone assuming         |
|                           |     old-style units, 9 g/dL = 90 g/L, but this will be interpreted as 9 g/L.     |
|                           |     This problem is hard to avoid.                                               |
|                           |                                                                                  |
|                           |                                                                                  |
| HaemoglobinValidator      |                                                                                  |
|                           |     Validator for Haemoglobin                                                    |
|                           |     (see :class:`crate_anon.nlp_manager.regex_parser.ValidatorBase` for          |
|                           |     explanation).                                                                |
|                           |                                                                                  |
| HbA1c                     |                                                                                  |
|                           |     Glycosylated (glycated) haemoglobin (HbA1c).                                 |
|                           |                                                                                  |
|                           |     - By Emanuele Osimo, Feb 2019.                                               |
|                           |     - Some modifications by Rudolf Cardinal, Feb 2019.                           |
|                           |                                                                                  |
| HbA1cValidator            |                                                                                  |
|                           |     Validator for HbA1c                                                          |
|                           |     (see :class:`crate_anon.nlp_manager.regex_parser.ValidatorBase` for          |
|                           |     explanation).                                                                |
|                           |                                                                                  |
| HDLCholesterol            |                                                                                  |
|                           |     High-density lipoprotein (HDL) cholesterol.                                  |
|                           |                                                                                  |
|                           |     - By Emanuele Osimo, Feb 2019.                                               |
|                           |     - Some modifications by Rudolf Cardinal, Feb 2019.                           |
|                           |                                                                                  |
| HDLCholesterolValidator   |                                                                                  |
|                           |     Validator for HDL cholesterol                                                |
|                           |     (see :class:`crate_anon.nlp_manager.regex_parser.ValidatorBase` for          |
|                           |     explanation).                                                                |
|                           |                                                                                  |
| Height                    |                                                                                  |
|                           |     Height. Handles metric (e.g. "1.8m") and imperial (e.g. "5 ft 2 in").        |
|                           |                                                                                  |
| HeightValidator           |                                                                                  |
|                           |     Validator for Height                                                         |
|                           |     (see :class:`crate_anon.nlp_manager.regex_parser.ValidatorBase` for          |
|                           |     explanation).                                                                |
|                           |                                                                                  |
| LDLCholesterol            |                                                                                  |
|                           |     Low density lipoprotein (LDL) cholesterol.                                   |
|                           |                                                                                  |
|                           |     - By Emanuele Osimo, Feb 2019.                                               |
|                           |     - Some modifications by Rudolf Cardinal, Feb 2019.                           |
|                           |                                                                                  |
| LDLCholesterolValidator   |                                                                                  |
|                           |     Validator for LDL cholesterol                                                |
|                           |     (see :class:`crate_anon.nlp_manager.regex_parser.ValidatorBase` for          |
|                           |     explanation).                                                                |
|                           |                                                                                  |
| Lithium                   |                                                                                  |
|                           |     Lithium (Li) levels (for blood tests, not doses).                            |
|                           |                                                                                  |
| LithiumValidator          |                                                                                  |
|                           |     Validator for Lithium                                                        |
|                           |     (see :class:`crate_anon.nlp_manager.regex_parser.ValidatorBase` for          |
|                           |     explanation).                                                                |
|                           |                                                                                  |
| Lymphocytes               |                                                                                  |
|                           |     Lymphocyte count (absolute).                                                 |
|                           |                                                                                  |
| LymphocytesValidator      |                                                                                  |
|                           |     Validator for Lymphocytes                                                    |
|                           |     (see :class:`crate_anon.nlp_manager.regex_parser.ValidatorBase` for          |
|                           |     explanation).                                                                |
|                           |                                                                                  |
| Medex                     |                                                                                  |
|                           |     Class controlling a Medex-UIMA external process, via our custom Java         |
|                           |     interface, ``CrateMedexPipeline.java``.                                      |
|                           |                                                                                  |
| MiniAce                   |                                                                                  |
|                           |     Mini-Addenbrooke's Cognitive Examination (M-ACE).                            |
|                           |                                                                                  |
| MiniAceValidator          |                                                                                  |
|                           |     Validator for MiniAce                                                        |
|                           |     (see :class:`crate_anon.nlp_manager.regex_parser.ValidatorBase` for          |
|                           |     explanation).                                                                |
|                           |                                                                                  |
| Mmse                      |                                                                                  |
|                           |     Mini-mental state examination (MMSE).                                        |
|                           |                                                                                  |
| MmseValidator             |                                                                                  |
|                           |     Validator for Mmse                                                           |
|                           |     (see :class:`crate_anon.nlp_manager.regex_parser.ValidatorBase` for          |
|                           |     explanation).                                                                |
|                           |                                                                                  |
| Moca                      |                                                                                  |
|                           |     Montreal Cognitive Assessment (MOCA).                                        |
|                           |                                                                                  |
| MocaValidator             |                                                                                  |
|                           |     Validator for Moca                                                           |
|                           |     (see :class:`crate_anon.nlp_manager.regex_parser.ValidatorBase` for          |
|                           |     explanation).                                                                |
|                           |                                                                                  |
| Monocytes                 |                                                                                  |
|                           |     Monocyte count (absolute).                                                   |
|                           |                                                                                  |
| MonocytesValidator        |                                                                                  |
|                           |     Validator for Monocytes                                                      |
|                           |     (see :class:`crate_anon.nlp_manager.regex_parser.ValidatorBase` for          |
|                           |     explanation).                                                                |
|                           |                                                                                  |
| Neutrophils               |                                                                                  |
|                           |     Neutrophil count (absolute).                                                 |
|                           |                                                                                  |
| NeutrophilsValidator      |                                                                                  |
|                           |     Validator for Neutrophils                                                    |
|                           |     (see :class:`crate_anon.nlp_manager.regex_parser.ValidatorBase` for          |
|                           |     explanation).                                                                |
|                           |                                                                                  |
| Platelets                 |                                                                                  |
|                           |     Platelet count.                                                              |
|                           |                                                                                  |
|                           |     Not actually a white blood cell, of course, but can share the same base      |
|                           |     class; platelets are expressed in the same units, of 10^9 / L.               |
|                           |     Typical values 150–450 ×10^9 / L (or 150,000–450,000 per μL).                |
|                           |                                                                                  |
| PlateletsValidator        |                                                                                  |
|                           |     Validator for Platelets                                                      |
|                           |     (see :class:`crate_anon.nlp_manager.regex_parser.ValidatorBase` for          |
|                           |     explanation).                                                                |
|                           |                                                                                  |
| Potassium                 |                                                                                  |
|                           |     Potassium (K).                                                               |
|                           |                                                                                  |
| PotassiumValidator        |                                                                                  |
|                           |     Validator for Potassium                                                      |
|                           |     (see :class:`crate_anon.nlp_manager.regex_parser.ValidatorBase` for          |
|                           |     explanation).                                                                |
|                           |                                                                                  |
| RBC                       |                                                                                  |
|                           |     Red blood cell count.                                                        |
|                           |                                                                                  |
|                           |     Typical:                                                                     |
|                           |                                                                                  |
|                           |     .. code-block:: none                                                         |
|                           |                                                                                  |
|                           |         RBC, POC    4.84            10*12/L                                      |
|                           |         RBC, POC    9.99    (H)     10*12/L                                      |
|                           |                                                                                  |
| RBCValidator              |                                                                                  |
|                           |     Validator for RBC                                                            |
|                           |     (see :class:`crate_anon.nlp_manager.regex_parser.ValidatorBase` for          |
|                           |     explanation).                                                                |
|                           |                                                                                  |
| Sodium                    |                                                                                  |
|                           |     Sodium (Na).                                                                 |
|                           |                                                                                  |
| SodiumValidator           |                                                                                  |
|                           |     Validator for Sodium                                                         |
|                           |     (see :class:`crate_anon.nlp_manager.regex_parser.ValidatorBase` for          |
|                           |     explanation).                                                                |
|                           |                                                                                  |
| TotalCholesterol          |                                                                                  |
|                           |     Total cholesterol.                                                           |
|                           |                                                                                  |
| TotalCholesterolValidator |                                                                                  |
|                           |     Validator for total cholesterol                                              |
|                           |     (see :class:`crate_anon.nlp_manager.regex_parser.ValidatorBase` for          |
|                           |     explanation).                                                                |
|                           |                                                                                  |
| Triglycerides             |                                                                                  |
|                           |     Triglycerides.                                                               |
|                           |                                                                                  |
|                           |     - By Emanuele Osimo, Feb 2019.                                               |
|                           |     - Some modifications by Rudolf Cardinal, Feb 2019.                           |
|                           |                                                                                  |
| TriglyceridesValidator    |                                                                                  |
|                           |     Validator for triglycerides                                                  |
|                           |     (see :class:`crate_anon.nlp_manager.regex_parser.ValidatorBase` for          |
|                           |     explanation).                                                                |
|                           |                                                                                  |
| Tsh                       |                                                                                  |
|                           |     Thyroid-stimulating hormone (TSH).                                           |
|                           |                                                                                  |
| TshValidator              |                                                                                  |
|                           |     Validator for TSH                                                            |
|                           |     (see :class:`crate_anon.nlp_manager.regex_parser.ValidatorBase` for          |
|                           |     explanation).                                                                |
|                           |                                                                                  |
| Urea                      |                                                                                  |
|                           |     Urea.                                                                        |
|                           |                                                                                  |
| UreaValidator             |                                                                                  |
|                           |     Validator for Urea                                                           |
|                           |     (see :class:`crate_anon.nlp_manager.regex_parser.ValidatorBase` for          |
|                           |     explanation).                                                                |
|                           |                                                                                  |
| Wbc                       |                                                                                  |
|                           |     White cell count (WBC, WCC).                                                 |
|                           |                                                                                  |
| WbcValidator              |                                                                                  |
|                           |     Validator for Wbc                                                            |
|                           |     (see :class:`crate_anon.nlp_manager.regex_parser.ValidatorBase` for          |
|                           |     explanation).                                                                |
|                           |                                                                                  |
| Weight                    |                                                                                  |
|                           |     Weight. Handles metric (e.g. "57kg") and imperial (e.g. "10 st 2 lb").       |
|                           |                                                                                  |
| WeightValidator           |                                                                                  |
|                           |     Validator for Weight                                                         |
|                           |     (see :class:`crate_anon.nlp_manager.regex_parser.ValidatorBase` for          |
|                           |     explanation).                                                                |
|                           |                                                                                  |
+---------------------------+----------------------------------------------------------------------------------+

# Generated at 2019-10-10 10:23:29

7.3.3. crate_nlp_multiprocess

This program runs multiple copies of crate_nlp in parallel.

Options:

usage: crate_nlp_multiprocess [-h] --nlpdef NLPDEF [--nproc [NPROC]]
                              [--verbose]

Runs the CRATE NLP manager in parallel. Version 0.18.92 (2019-10-10). Note
that all arguments not specified here are passed to the underlying script (see
crate_nlp --help).

optional arguments:
  -h, --help            show this help message and exit
  --nlpdef NLPDEF       NLP processing name, from the config file
  --nproc [NPROC], -n [NPROC]
                        Number of processes (default on this machine: 8)
  --verbose, -v         Be verbose

# Generated at 2019-10-10 10:23:31
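
As a minimal sketch of how the options above combine, the following helper prints a crate_nlp_multiprocess command line with an explicit process count. The function name nlp_multiprocess_cmd and the definition name my_nlp_definition are hypothetical and not part of CRATE; the flags come from the help text above and from crate_nlp --help.

```shell
#!/usr/bin/env bash
# Sketch only. "my_nlp_definition" is a hypothetical NLP definition name;
# substitute one from your own NLP config file.

# Print the command line for a given NLP definition, process count and mode.
# The mode ("incremental" or "full") is passed through to crate_nlp, because
# crate_nlp_multiprocess forwards arguments it does not recognize itself.
nlp_multiprocess_cmd() {
    local nlpdef="$1"
    local nproc="$2"
    local mode="${3:-incremental}"   # default to the safer incremental mode
    echo "crate_nlp_multiprocess --nlpdef ${nlpdef} --nproc ${nproc} --${mode}"
}

# Show the command for four processes in incremental mode.
nlp_multiprocess_cmd my_nlp_definition 4
```

Check the printed line, then run it directly. Choosing a process count below the machine default leaves cores free for the database server if it runs on the same host.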

7.3.4. Limiting the network bandwidth used by cloud NLP

Cloud-based NLP may involve sending large quantities of text (de-identified and encrypted en route) to a distant server. If you have limited network bandwidth, you may want to cap the bandwidth used by CRATE (at the price of speed).

Under Linux, use trickle. Here’s how:

# Install with e.g. "sudo apt install trickle", then see "man trickle".
# Source code is at https://github.com/mariusae/trickle.
# Example with limits of 500 KB/s download, 200 KB/s upload:
trickle -s -d 500 -u 200 crate_nlp <OPTIONS>
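
The -s flag above runs trickle in standalone mode, where each invocation applies its own limits. If several NLP processes should share a single cap, the trickle package also provides a daemon, trickled, which is intended to manage the aggregate bandwidth of the trickle instances that connect to it. The sketch below prints the two commands involved. It is an assumption, not something the CRATE documentation describes, that this suits crate_nlp_multiprocess; the helper name shared_limit_cmds and the definition name my_nlp_definition are hypothetical. Check "man trickled" before relying on it.

```shell
#!/usr/bin/env bash
# Sketch only; assumes trickle and trickled are installed and on the PATH.

# Print the two commands for a shared bandwidth cap:
#   1. start the trickled daemon with the aggregate limits (in KB/s);
#   2. run the target command under trickle WITHOUT -s, so that it
#      connects to the daemon instead of limiting itself independently.
shared_limit_cmds() {
    local down_kbs="$1"
    local up_kbs="$2"
    shift 2
    echo "trickled -d ${down_kbs} -u ${up_kbs}"
    echo "trickle $*"
}

# Example: 500 KB/s download and 200 KB/s upload, shared by all processes.
shared_limit_cmds 500 200 crate_nlp_multiprocess --nlpdef my_nlp_definition --incremental
```

Run the first printed command once to start the daemon, then the second to launch the NLP job.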

Under Windows, use NetLimiter. The rationale is as follows.

Under Windows, the choice is less obvious. A commercial option is NetLimiter, but there is no direct equivalent of trickle. Python options require quite a bit of network code redesign; e.g.

but with the exception of rewriting network code to use Twisted rather than requests, none of these open-source methods address the general-purpose bandwidth limitation challenge addressed by trickle. The best option might be txrequests or treq plus bandwidth limitation via Twisted through its ThrottlingFactory, but this doesn’t look entirely simple (see links above). Even with that, it’d be hard to coordinate bandwidth limits across multiple processes.

Therefore, in favour of NetLimiter:

  • it’s cheap (~$30/licence in 2019);
  • it provides a per-host unlimited-duration licence;
  • if you’re using Windows you’re already in the domain of commercial software;
  • the cloud NLP facility of CRATE is the sort of thing you’re likely to run on one big computer rather than lots of computers (so one licence should suffice);
  • its filters are very flexible (including time-of-day restrictions and the ability to group applications);
  • the alternatives would involve substantial development effort for lesser benefit;

… so NetLimiter seems like the most cost-effective option.