7.4. Run the NLP

Now that you’ve created and edited your config file, you can run the NLP process in one of the following ways:

crate_nlp [--config CONFIG] --nlpdef NLP_NAME --incremental
crate_nlp [--config CONFIG] --nlpdef NLP_NAME --full
crate_nlp_multiprocess [--config CONFIG] --nlpdef NLP_NAME --incremental
crate_nlp_multiprocess [--config CONFIG] --nlpdef NLP_NAME --full

where NLP_NAME is something you’ve configured in the NLP config file (e.g. a drug-parsing NLP program or the GATE demonstration name/location NLP app). You can specify the config file explicitly with --config, or let it default to the file named by the CRATE_NLP_CONFIG environment variable (see below).
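The config-file precedence described above (an explicit --config overrides the CRATE_NLP_CONFIG environment variable, as the help text below also states) can be sketched in Python. The helper resolve_nlp_config is hypothetical and is not part of CRATE; it only mirrors the documented rule.

```python
import os
from typing import Optional

# Environment variable named in the crate_nlp help text.
CONFIG_ENV_VAR = "CRATE_NLP_CONFIG"


def resolve_nlp_config(cli_config: Optional[str] = None) -> str:
    """Hypothetical helper: pick the config file crate_nlp should use.

    An explicit --config value wins; otherwise fall back to the
    CRATE_NLP_CONFIG environment variable; otherwise fail loudly.
    """
    if cli_config:
        return cli_config
    from_env = os.environ.get(CONFIG_ENV_VAR)
    if from_env:
        return from_env
    raise RuntimeError(f"No config file: use --config or set {CONFIG_ENV_VAR}")
```

In a shell, the equivalent is to export CRATE_NLP_CONFIG once per session so that later crate_nlp commands can omit --config.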

The ‘multiprocess’ versions are faster if you have a multi-core or multi-CPU computer. The ‘full’ option destroys the destination database and starts again; the ‘incremental’ option brings the destination database up to date (creating it if necessary). For safety, ‘incremental’ is the default.
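As a sketch of how the four invocations above differ, the hypothetical Python helper below assembles the argument list for each. It uses only the programs and flags shown in this section; the NLP definition name "drugs" in the tests and usage note is a placeholder for one from your own config file.

```python
import subprocess
from typing import List, Optional


def build_crate_nlp_command(
    nlpdef: str,
    full: bool = False,
    config: Optional[str] = None,
    multiprocess: bool = False,
) -> List[str]:
    """Hypothetical helper: argv for crate_nlp or crate_nlp_multiprocess."""
    program = "crate_nlp_multiprocess" if multiprocess else "crate_nlp"
    cmd = [program]
    if config is not None:
        # Omit --config to fall back to the CRATE_NLP_CONFIG variable.
        cmd += ["--config", config]
    cmd += ["--nlpdef", nlpdef]
    # --full drops and rebuilds the destination; --incremental (the
    # default) only brings it up to date, so it is the safer choice.
    cmd.append("--full" if full else "--incremental")
    return cmd


def run_crate_nlp(nlpdef: str, **kwargs) -> None:
    """Run the command, raising CalledProcessError on a non-zero exit."""
    subprocess.run(build_crate_nlp_command(nlpdef, **kwargs), check=True)
```

For example, run_crate_nlp("drugs", multiprocess=True) would run crate_nlp_multiprocess --nlpdef drugs --incremental. Keeping --full behind an explicit keyword argument makes a destructive rebuild harder to trigger by accident.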

For more help, use:

crate_nlp --help

7.4.1. crate_nlp

This runs a single-process NLP controller.

Options:

usage: crate_nlp [-h] [--config CONFIG] [--nlpdef NLPDEF] [-i | -f]
                 [--dropremake] [--skipdelete] [--nlp] [--chunksize CHUNKSIZE]
                 [--verbose] [--report_every_fast REPORT_EVERY_FAST]
                 [--report_every_nlp REPORT_EVERY_NLP] [--echo] [--timing]
                 [--process PROCESS] [--nprocesses NPROCESSES]
                 [--processcluster PROCESSCLUSTER] [--version] [--democonfig]
                 [--listprocessors] [--describeprocessors] [--test_nlp]
                 [--print_local_processors] [--print_cloud_processors]
                 [--count] [--cloud] [--immediate] [--retrieve]
                 [--cancelrequest] [--cancelall] [--showqueue]

NLP manager. Version 0.19.4 (2022-05-24). By Rudolf Cardinal.

optional arguments:
  -h, --help            show this help message and exit

Config options:
  --config CONFIG       Config file (overriding environment variable
                        CRATE_NLP_CONFIG) (default: None)
  --nlpdef NLPDEF       NLP definition name (from config file) (default: None)
  -i, --incremental     Process only new/changed information, where possible
                        (default: True)
  -f, --full            Drop and remake everything (default: False)
  --dropremake          Drop/remake destination tables only (default: False)
  --skipdelete          For incremental updates, skip deletion of rows present
                        in the destination but not the source (default: False)
  --nlp                 Perform NLP processing only (default: False)
  --chunksize CHUNKSIZE
                        Number of records copied in a chunk when copying PKs
                        from one database to another (default: 100000)

Reporting options:
  --verbose, -v         Be verbose (use twice for extra verbosity) (default:
                        False)
  --report_every_fast REPORT_EVERY_FAST
                        Report insert progress (for fast operations) every n
                        rows in verbose mode (default: 100000)
  --report_every_nlp REPORT_EVERY_NLP
                        Report progress for NLP every n rows in verbose mode
                        (default: 500)
  --echo                Echo SQL (default: False)
  --timing              Show detailed timing breakdown (default: False)

Multiprocessing options:
  --process PROCESS     For multiprocess mode: specify process number
                        (default: 0)
  --nprocesses NPROCESSES
                        For multiprocess mode: specify total number of
                        processes (launched somehow, of which this is to be
                        one) (default: 1)
  --processcluster PROCESSCLUSTER
                        Process cluster name (default: )

Info actions:
  --version             show program's version number and exit
  --democonfig          Print a demo config file (default: False)
  --listprocessors      Show all possible built-in NLP processor names
                        (default: False)
  --describeprocessors  Show details of all built-in NLP processors (default:
                        False)
  --test_nlp            Test the NLP processor(s) for the selected definition,
                        by sending text from stdin to them (default: False)
  --print_local_processors
                        For the chosen NLP definition, establish which local
                        NLP processors are involved (if any). Show detailed
                        information about these processors (as NLPRP JSON),
                        then stop (default: False)
  --print_cloud_processors
                        For the chosen NLP definition, establish the relevant
                        cloud server, if applicable (from the 'cloud_config'
                        parameter). Ask that remote server about its available
                        NLP processors. Show detailed information about these
                        remote processors (as NLPRP JSON), then stop (default:
                        False)
  --count               Count records in source/destination databases, then
                        stop (default: False)

Cloud options:
  --cloud               Use cloud-based NLP processing tools. Queued mode by
                        default. (default: False)
  --immediate           To be used with 'cloud'. Process immediately.
                        (default: False)
  --retrieve            Retrieve NLP data from cloud (default: False)
  --cancelrequest       Cancel pending requests for the nlpdef specified
                        (default: False)
  --cancelall           Cancel all pending cloud requests. WARNING: this
                        option cancels all pending requests - not just those
                        for the nlp definition specified (default: False)
  --showqueue           Shows all pending cloud requests. (default: False)
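The multiprocessing options above suggest how a parallel run fits together: several copies of crate_nlp are started with the same --nprocesses value and distinct --process numbers (presumably what crate_nlp_multiprocess arranges for you). The Python sketch below builds those per-worker commands. How CRATE actually divides rows between workers is not stated here, so the rows_for_process split by primary key modulo --nprocesses is purely an illustrative assumption, and all function names are hypothetical.

```python
import subprocess
from typing import Iterable, List


def worker_commands(nlpdef: str, nprocesses: int) -> List[List[str]]:
    """One crate_nlp argv per worker, numbered 0 .. nprocesses - 1."""
    if nprocesses < 1:
        raise ValueError("nprocesses must be at least 1")
    return [
        ["crate_nlp", "--nlpdef", nlpdef,
         "--process", str(i), "--nprocesses", str(nprocesses)]
        for i in range(nprocesses)
    ]


def rows_for_process(pks: Iterable[int], process: int,
                     nprocesses: int) -> List[int]:
    """ASSUMED split, for illustration only: each worker takes the
    integer primary keys whose remainder mod nprocesses is its number."""
    return [pk for pk in pks if pk % nprocesses == process]


def launch_and_wait(commands: List[List[str]]) -> List[int]:
    """Start every worker at once, then collect their exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]
```

Whatever the real split is, the property worth preserving is the one the modulo rule has: every row goes to exactly one worker. For real runs, prefer crate_nlp_multiprocess, which may also handle coordination this sketch ignores (for example, around --full).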

7.4.2. Current NLP processors

NLP processors (from crate_nlp --describeprocessors):

+---------------------------+-------------------------------------------------+
| NLP name                  | Description                                     |
+---------------------------+-------------------------------------------------+
| Ace                       | COGNITIVE.                                      |
|                           |                                                 |
|                           | Addenbrooke's Cognitive Examination (ACE,       |
|                           | ACE-R, ACE-III) total score.                    |
|                           |                                                 |
|                           | The default denominator is 100, but it supports |
|                           | other values if given explicitly.               |
+---------------------------+-------------------------------------------------+
| AceValidator              | Validator for Ace (see help for explanation).   |
+---------------------------+-------------------------------------------------+
| Albumin                   | BIOCHEMISTRY (LFTs).                            |
|                           |                                                 |
|                           | Albumin (Alb). Units are g/L.                   |
+---------------------------+-------------------------------------------------+
| AlbuminValidator          | Validator for Albumin (see help for             |
|                           | explanation).                                   |
+---------------------------+-------------------------------------------------+
| AlkPhos                   | BIOCHEMISTRY (LFTs/BFTs).                       |
|                           |                                                 |
|                           | Alkaline phosphatase (ALP, AlkP, AlkPhos).      |
|                           | Units are U/L.                                  |
+---------------------------+-------------------------------------------------+
| AlkPhosValidator          | Validator for AlkPhos (see help for             |
|                           | explanation).                                   |
+---------------------------+-------------------------------------------------+
| ALT                       | BIOCHEMISTRY (LFTs).                            |
|                           |                                                 |
|                           | Alanine aminotransferase (ALT), a.k.a. alanine  |
|                           | transaminase (ALT). Units are U/L.              |
|                           |                                                 |
|                           | A.k.a. serum glutamate-pyruvate transaminase    |
|                           | (SGPT), or serum glutamic-pyruvic transaminase  |
|                           | (SGPT), but not a.k.a. those in recent memory!  |
+---------------------------+-------------------------------------------------+
| ALTValidator              | Validator for ALT (see help for explanation).   |
+---------------------------+-------------------------------------------------+
| Basophils                 | HAEMATOLOGY (FBC).                              |
|                           |                                                 |
|                           | Basophil count (absolute).                      |
|                           | Default units are 10^9 / L; also supports       |
|                           | cells/mm^3 = cells/μL.                          |
+---------------------------+-------------------------------------------------+
| BasophilsValidator        | Validator for Basophils (see help for           |
|                           | explanation).                                   |
+---------------------------+-------------------------------------------------+
| Bilirubin                 | BIOCHEMISTRY (LFTs).                            |
|                           |                                                 |
|                           | Total bilirubin. Units are μM.                  |
+---------------------------+-------------------------------------------------+
| BilirubinValidator        | Validator for Bilirubin (see help for           |
|                           | explanation).                                   |
+---------------------------+-------------------------------------------------+
| Bmi                       | CLINICAL EXAMINATION.                           |
|                           |                                                 |
|                           | Body mass index (BMI), in kg / m^2.             |
+---------------------------+-------------------------------------------------+
| BmiValidator              | Validator for Bmi (see help for explanation).   |
+---------------------------+-------------------------------------------------+
| Bp                        | CLINICAL EXAMINATION.                           |
|                           |                                                 |
|                           | Blood pressure, in mmHg. (Systolic and          |
|                           | diastolic.)                                     |
+---------------------------+-------------------------------------------------+
| BpValidator               | Validator for Bp (see help for explanation).    |
+---------------------------+-------------------------------------------------+
| Cloud                     | EXTERNAL.                                       |
|                           |                                                 |
|                           | Abstract NLP processor that passes information  |
|                           | to a remote (cloud-based) NLP system via the    |
|                           | NLPRP protocol. The processor at the other end  |
|                           | might be of any kind.                           |
+---------------------------+-------------------------------------------------+
| Creatinine                | BIOCHEMISTRY (U&E).                             |
|                           |                                                 |
|                           | Creatinine. Default units are micromolar (SI);  |
|                           | also supports mg/dL.                            |
+---------------------------+-------------------------------------------------+
| CreatinineValidator       | Validator for Creatinine (see help for          |
|                           | explanation).                                   |
+---------------------------+-------------------------------------------------+
| Crp                       | BIOCHEMISTRY.                                   |
|                           |                                                 |
|                           | C-reactive protein (CRP). Default units are     |
|                           | mg/L; also supports mg/dL.                      |
|                           |                                                 |
|                           | CRP units:                                      |
|                           |                                                 |
|                           | - mg/L is commonest in the UK (or at least      |
|                           |   standard at Addenbrooke's, Hinchingbrooke,    |
|                           |   and Dundee);                                  |
|                           |                                                 |
|                           | - values of <=6 mg/L or <10 mg/L are normal,    |
|                           |   and e.g. 70-250 mg/L in pneumonia.            |
|                           |                                                 |
|                           | - Refs include:                                 |
|                           |                                                 |
|                           |   - https://www.ncbi.nlm.nih.gov/pubmed/7705110 |
|                           |   - https://emedicine.medscape.com/article/2086 |
|                           | 909-overview                                    |
|                           |                                                 |
|                           | - 1 mg/dL = 10 mg/L, so normal in mg/dL is <=1  |
|                           |   roughly.                                      |
+---------------------------+-------------------------------------------------+
| CrpValidator              | Validator for Crp (see help for explanation).   |
+---------------------------+-------------------------------------------------+
| Eosinophils               | HAEMATOLOGY (FBC).                              |
|                           |                                                 |
|                           | Eosinophil count (absolute).                    |
|                           | Default units are 10^9 / L; also supports       |
|                           | cells/mm^3 = cells/μL.                          |
+---------------------------+-------------------------------------------------+
| EosinophilsValidator      | Validator for Eosinophils (see help for         |
|                           | explanation).                                   |
+---------------------------+-------------------------------------------------+
| Esr                       | HAEMATOLOGY (ESR).                              |
|                           |                                                 |
|                           | Erythrocyte sedimentation rate (ESR), in mm/h.  |
+---------------------------+-------------------------------------------------+
| EsrValidator              | Validator for Esr (see help for explanation).   |
+---------------------------+-------------------------------------------------+
| GammaGT                   | BIOCHEMISTRY (LFTs).                            |
|                           |                                                 |
|                           | Gamma-glutamyl transferase (gGT), in U/L.       |
+---------------------------+-------------------------------------------------+
| GammaGTValidator          | Validator for GammaGT (see help for             |
|                           | explanation).                                   |
+---------------------------+-------------------------------------------------+
| Gate                      | EXTERNAL.                                       |
|                           |                                                 |
|                           | Abstract NLP processor controlling an external  |
|                           | process, typically our Java interface to GATE   |
|                           | programs, ``CrateGatePipeline.java`` (but it    |
|                           | could be any external program).                 |
|                           |                                                 |
|                           | We send text to it, it parses the text, and it  |
|                           | sends us back results, which we return as       |
|                           | dictionaries. The specific text sought depends  |
|                           | on the configuration file and the specific GATE |
|                           | program used.                                   |
|                           |                                                 |
|                           | For details of GATE, see                        |
|                           | https://www.gate.ac.uk/.                        |
+---------------------------+-------------------------------------------------+
| Glucose                   | BIOCHEMISTRY.                                   |
|                           |                                                 |
|                           | Glucose. Default units are mM; also supports    |
|                           | mg/dL.                                          |
+---------------------------+-------------------------------------------------+
| GlucoseValidator          | Validator for Glucose (see help for             |
|                           | explanation).                                   |
+---------------------------+-------------------------------------------------+
| Haematocrit               | HAEMATOLOGY (FBC).                              |
|                           |                                                 |
|                           | Haematocrit (Hct).                              |
|                           | A dimensionless quantity (but supports L/L      |
|                           | notation).                                      |
+---------------------------+-------------------------------------------------+
| HaematocritValidator      | Validator for Haematocrit (see help for         |
|                           | explanation).                                   |
+---------------------------+-------------------------------------------------+
| Haemoglobin               | HAEMATOLOGY (FBC).                              |
|                           |                                                 |
|                           | Haemoglobin (Hb). Default units are g/L; also   |
|                           | supports g/dL.                                  |
|                           |                                                 |
|                           | UK reporting for haemoglobin switched in 2013   |
|                           | from g/dL to g/L; see e.g.                      |
|                           |                                                 |
|                           | - http://www.pathology.leedsth.nhs.uk/pathology |
|                           | /Portals/0/PDFs/BP-2013-02%20Hb%20units.pdf     |
|                           | - https://www.acb.org.uk/docs/default-source/co |
|                           | mmittees/scientific/guidelines/acb/pathology-   |
|                           | harmony-haematology.pdf                         |
|                           |                                                 |
|                           | The *DANGER* remains that "Hb 9" may have been  |
|                           | from someone assuming old-style units           |
|                           | (9 g/dL = 90 g/L), but this will be interpreted |
|                           | as 9 g/L. This problem is hard to avoid.        |
+---------------------------+-------------------------------------------------+
| HaemoglobinValidator      | Validator for Haemoglobin (see help for         |
|                           | explanation).                                   |
+---------------------------+-------------------------------------------------+
| HbA1c                     | BIOCHEMISTRY.                                   |
|                           |                                                 |
|                           | Glycosylated (glycated) haemoglobin (HbA1c).    |
|                           | Default units are mmol/mol; also supports %.    |
|                           |                                                 |
|                           | Note: HbA1 is different                         |
|                           | (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2 |
|                           | 541274).                                        |
+---------------------------+-------------------------------------------------+
| HbA1cValidator            | Validator for HbA1c (see help for explanation). |
+---------------------------+-------------------------------------------------+
| HDLCholesterol            | BIOCHEMISTRY (LIPID PROFILE).                   |
|                           |                                                 |
|                           | High-density lipoprotein (HDL) cholesterol.     |
|                           | Default units are mM; also supports mg/dL.      |
+---------------------------+-------------------------------------------------+
| HDLCholesterolValidator   | Validator for HDLCholesterol (see help for      |
|                           | explanation).                                   |
+---------------------------+-------------------------------------------------+
| Height                    | CLINICAL EXAMINATION.                           |
|                           |                                                 |
|                           | Height. Handles metric (e.g. "1.8m") and        |
|                           | imperial (e.g. "5 ft 2 in").                    |
+---------------------------+-------------------------------------------------+
| HeightValidator           | Validator for Height (see help for              |
|                           | explanation).                                   |
+---------------------------+-------------------------------------------------+
| LDLCholesterol            | BIOCHEMISTRY (LIPID PROFILE).                   |
|                           |                                                 |
|                           | Low-density lipoprotein (LDL) cholesterol.      |
|                           | Default units are mM; also supports mg/dL.      |
+---------------------------+-------------------------------------------------+
| LDLCholesterolValidator   | Validator for LDLCholesterol (see help for      |
|                           | explanation).                                   |
+---------------------------+-------------------------------------------------+
| Lithium                   | BIOCHEMISTRY (THERAPEUTIC DRUG MONITORING).     |
|                           |                                                 |
|                           | Lithium (Li) levels (for blood tests, not       |
|                           | doses), in mM.                                  |
+---------------------------+-------------------------------------------------+
| LithiumValidator          | Validator for Lithium (see help for             |
|                           | explanation).                                   |
+---------------------------+-------------------------------------------------+
| Lymphocytes               | HAEMATOLOGY (FBC).                              |
|                           |                                                 |
|                           | Lymphocyte count (absolute).                    |
|                           | Default units are 10^9 / L; also supports       |
|                           | cells/mm^3 = cells/μL.                          |
+---------------------------+-------------------------------------------------+
| LymphocytesValidator      | Validator for Lymphocytes (see help for         |
|                           | explanation).                                   |
+---------------------------+-------------------------------------------------+
| Medex                     | EXTERNAL.                                       |
|                           |                                                 |
|                           | Class controlling a MedEx-UIMA external         |
|                           | process, via our custom Java                    |
|                           | interface, ``CrateMedexPipeline.java``.         |
|                           |                                                 |
|                           | MedEx-UIMA is a medication-finding tool:        |
|                           | https://www.ncbi.nlm.nih.gov/pubmed/25954575.   |
+---------------------------+-------------------------------------------------+
| MiniAce                   | COGNITIVE.                                      |
|                           |                                                 |
|                           | Mini-Addenbrooke's Cognitive Examination        |
|                           | (M-ACE).                                        |
|                           |                                                 |
|                           | The default denominator is 30, but it supports  |
|                           | other values if given explicitly.               |
+---------------------------+-------------------------------------------------+
| MiniAceValidator          | Validator for MiniAce (see help for             |
|                           | explanation).                                   |
+---------------------------+-------------------------------------------------+
| Mmse                      | COGNITIVE.                                      |
|                           |                                                 |
|                           | Mini-mental state examination (MMSE).           |
|                           |                                                 |
|                           | The default denominator is 30, but it supports  |
|                           | other values if given explicitly.               |
+---------------------------+-------------------------------------------------+
| MmseValidator             | Validator for Mmse (see help for explanation).  |
+---------------------------+-------------------------------------------------+
| Moca                      | COGNITIVE.                                      |
|                           |                                                 |
|                           | Montreal Cognitive Assessment (MoCA).           |
|                           |                                                 |
|                           | The default denominator is 30, but it supports  |
|                           | other values if given explicitly.               |
+---------------------------+-------------------------------------------------+
| MocaValidator             | Validator for Moca (see help for explanation).  |
+---------------------------+-------------------------------------------------+
| Monocytes                 | HAEMATOLOGY (FBC).                              |
|                           |                                                 |
|                           | Monocyte count (absolute).                      |
|                           | Default units are 10^9 / L; also supports       |
|                           | cells/mm^3 = cells/μL.                          |
+---------------------------+-------------------------------------------------+
| MonocytesValidator        | Validator for Monocytes (see help for           |
|                           | explanation).                                   |
+---------------------------+-------------------------------------------------+
| Neutrophils               | HAEMATOLOGY (FBC).                              |
|                           |                                                 |
|                           | Neutrophil (polymorphonuclear leukocyte) count  |
|                           | (absolute).                                     |
|                           | Default units are 10^9 / L; also supports       |
|                           | cells/mm^3 = cells/μL.                          |
+---------------------------+-------------------------------------------------+
| NeutrophilsValidator      | Validator for Neutrophils (see help for         |
|                           | explanation).                                   |
+---------------------------+-------------------------------------------------+
| Platelets                 | HAEMATOLOGY (FBC).                              |
|                           |                                                 |
|                           | Platelet count.                                 |
|                           | Default units are 10^9 / L; also supports       |
|                           | cells/mm^3 = cells/μL.                          |
|                           |                                                 |
|                           | Not actually a white blood cell, of course, but |
|                           | can share the same base class; platelets are    |
|                           | expressed in the same units, of 10^9 / L.       |
|                           | Typical values are 150–450 ×10^9 / L (or        |
|                           | 150,000–450,000 per μL).                        |
+---------------------------+-------------------------------------------------+
| PlateletsValidator        | Validator for Platelets (see help for           |
|                           | explanation).                                   |
+---------------------------+-------------------------------------------------+
| Potassium                 | BIOCHEMISTRY (U&E).                             |
|                           |                                                 |
|                           | Potassium (K), in mM.                           |
+---------------------------+-------------------------------------------------+
| PotassiumValidator        | Validator for Potassium (see help for           |
|                           | explanation).                                   |
+---------------------------+-------------------------------------------------+
| RBC                       | HAEMATOLOGY (FBC).                              |
|                           |                                                 |
|                           | Red blood cell count.                           |
|                           | Default units are 10^12/L; also supports        |
|                           | cells/mm^3 = cells/μL.                          |
|                           |                                                 |
|                           | A typical excerpt from an FBC report:           |
|                           |                                                 |
|                           | .. code-block:: none                            |
|                           |                                                 |
|                           |     RBC, POC    4.84            10*12/L         |
|                           |     RBC, POC    9.99    (H)     10*12/L         |
+---------------------------+-------------------------------------------------+
| RBCValidator              | Validator for RBC (see help for explanation).   |
+---------------------------+-------------------------------------------------+
| Sodium                    | BIOCHEMISTRY (U&E).                             |
|                           |                                                 |
|                           | Sodium (Na), in mM.                             |
+---------------------------+-------------------------------------------------+
| SodiumValidator           | Validator for Sodium (see help for              |
|                           | explanation).                                   |
+---------------------------+-------------------------------------------------+
| TotalCholesterol          | BIOCHEMISTRY (LIPID PROFILE).                   |
|                           |                                                 |
|                           | Total or undifferentiated cholesterol.          |
|                           | Default units are mM; also supports mg/dL.      |
+---------------------------+-------------------------------------------------+
| TotalCholesterolValidator | Validator for TotalCholesterol (see help for    |
|                           | explanation).                                   |
+---------------------------+-------------------------------------------------+
| Triglycerides             | BIOCHEMISTRY (LIPID PROFILE).                   |
|                           |                                                 |
|                           | Triglycerides.                                  |
|                           | Default units are mM; also supports mg/dL.      |
+---------------------------+-------------------------------------------------+
| TriglyceridesValidator    | Validator for Triglycerides (see help for       |
|                           | explanation).                                   |
+---------------------------+-------------------------------------------------+
| Tsh                       | BIOCHEMISTRY (ENDOCRINOLOGY).                   |
|                           |                                                 |
|                           | Thyroid-stimulating hormone (TSH), in mIU/L (or |
|                           | μIU/mL).                                        |
+---------------------------+-------------------------------------------------+
| TshValidator              | Validator for TSH (see help for explanation).   |
+---------------------------+-------------------------------------------------+
| Urea                      | BIOCHEMISTRY (U&E).                             |
|                           |                                                 |
|                           | Urea, in mM.                                    |
+---------------------------+-------------------------------------------------+
| UreaValidator             | Validator for Urea (see help for explanation).  |
+---------------------------+-------------------------------------------------+
| Wbc                       | HAEMATOLOGY (FBC).                              |
|                           |                                                 |
|                           | White cell count (WBC, WCC).                    |
|                           | Default units are 10^9 / L; also supports       |
|                           | cells/mm^3 = cells/μL.                          |
+---------------------------+-------------------------------------------------+
| WbcValidator              | Validator for Wbc (see help for explanation).   |
+---------------------------+-------------------------------------------------+
| Weight                    | CLINICAL EXAMINATION.                           |
|                           |                                                 |
|                           | Weight. Handles metric (e.g. "57kg") and        |
|                           | imperial (e.g. "10 st 2 lb").                   |
|                           | Requires units to be specified.                 |
+---------------------------+-------------------------------------------------+
| WeightValidator           | Validator for Weight (see help for              |
|                           | explanation).                                   |
+---------------------------+-------------------------------------------------+
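The unit relationships in the table can be checked with a little arithmetic. Because 1 L = 10^6 μL, n cells/μL = n × 10^6 cells/L = (n / 1,000) × 10^9/L, and for red cells (n / 10^6) × 10^12/L. So, for example, 150,000 platelets/μL is 150 × 10^9/L. For imperial weights, 1 st = 14 lb and 1 lb = 0.45359237 kg. The sketch below, which assumes Python 3, shows these conversions and also extracts the value from the RBC excerpt using a regular expression. The pattern and function names are hypothetical: they are not CRATE's own regexes or API.

```python
import re

# Hypothetical pattern for lines like the RBC excerpt in the table; CRATE's
# real regular expressions are different and far more general.
RBC_LINE = re.compile(
    r"^\s*RBC\b.*?"                 # analyte name, then e.g. ", POC"
    r"(?P<value>\d+(?:\.\d+)?)"     # the numeric result
    r"\s*(?:\((?P<flag>[HL])\))?"   # optional (H)/(L) flag
    r"\s*10\*12/L\s*$"              # units as printed: 10*12/L
)

UL_PER_L = 1_000_000        # 1 L = 10^6 μL
LB_PER_STONE = 14
KG_PER_LB = 0.45359237      # exact, by definition of the pound


def per_ul_to_1e9_per_l(cells_per_ul: float) -> float:
    """cells/μL (= cells/mm^3) -> units of 10^9/L (WBC, platelets, etc.)."""
    return cells_per_ul * UL_PER_L / 1e9


def per_ul_to_1e12_per_l(cells_per_ul: float) -> float:
    """cells/μL (= cells/mm^3) -> units of 10^12/L (RBC)."""
    return cells_per_ul * UL_PER_L / 1e12


def stones_pounds_to_kg(stones: float, pounds: float) -> float:
    """Imperial weight (e.g. "10 st 2 lb") -> kg."""
    return (stones * LB_PER_STONE + pounds) * KG_PER_LB


match = RBC_LINE.match("RBC, POC    9.99    (H)     10*12/L")
if match:
    print(match.group("value"), match.group("flag"))  # prints: 9.99 H
print(per_ul_to_1e9_per_l(150_000))  # prints: 150.0
print(stones_pounds_to_kg(10, 2))    # 142 lb, roughly 64.41 kg
```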

Abbreviations not otherwise explained:

  • BFTs: bone function tests.

  • FBC: full blood count.

  • LFTs: liver function tests.

  • U&E: urea and electrolytes.

7.4.3. crate_nlp_multiprocess

This program runs multiple copies of crate_nlp in parallel.

Options:

usage: crate_nlp_multiprocess [-h] --nlpdef NLPDEF [--nproc [NPROC]]
                              [--verbose]

Runs the CRATE NLP manager in parallel. Version 0.19.4 (2022-05-24). Note that
all arguments not specified here are passed to the underlying script (see
crate_nlp --help).

optional arguments:
  -h, --help            show this help message and exit
  --nlpdef NLPDEF       NLP processing name, from the config file
  --nproc [NPROC], -n [NPROC]
                        Number of processes (default is the number of CPUs on
                        this machine)
  --verbose, -v         Be verbose
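crate_nlp_multiprocess starts several copies of crate_nlp and passes on any arguments it does not recognize. The following is a minimal sketch of such a launcher. It is illustrative only and is not the actual crate_nlp_multiprocess code. It assumes that each worker is told its share of the work through the --process and --nprocesses options listed in crate_nlp's usage, and that workers are numbered from 0. The names build_commands and run_all are hypothetical.

```python
import os
import subprocess
from typing import List, Optional, Sequence


def build_commands(nlpdef: str, nproc: Optional[int] = None,
                   extra_args: Sequence[str] = ()) -> List[List[str]]:
    """
    Build one crate_nlp command line per worker process.

    ASSUMPTION: work is split via crate_nlp's --process/--nprocesses options,
    with workers numbered from 0. This is a sketch, not CRATE's own launcher.
    """
    n = nproc or os.cpu_count() or 1  # default: one worker per CPU
    return [
        ["crate_nlp", "--nlpdef", nlpdef,
         "--process", str(i), "--nprocesses", str(n),
         *extra_args]  # pass everything else straight through
        for i in range(n)
    ]


def run_all(commands: List[List[str]]) -> int:
    """Start all workers in parallel; return 0 or the first non-zero exit code."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    codes = [p.wait() for p in procs]  # wait for every worker to finish
    return next((code for code in codes if code != 0), 0)
```

For example, run_all(build_commands("my_nlpdef", extra_args=["--incremental"])) would start one worker per CPU, where my_nlpdef is a placeholder for an NLP definition in your config file.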

7.4.4. Limiting the network bandwidth used by cloud NLP

Cloud-based NLP may involve sending large quantities of text (de-identified and encrypted en route) to a distant server. If you have limited network bandwidth, you may want to cap the bandwidth used by CRATE (at the price of speed).

Under Linux, use trickle. Here’s how:

# Install with e.g. "sudo apt install trickle", then see "man trickle".
# Source code is at https://github.com/mariusae/trickle.
# Example with limits of 500 KB/s download, 200 KB/s upload:
trickle -s -d 500 -u 200 crate_nlp <OPTIONS>
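trickle applies its cap from outside the program. To give a sense of what an in-process cap involves, here is a minimal token-bucket sketch in Python. This is a generic rate-limiting technique and not the code used by trickle or CRATE. TokenBucket and throttled_send are hypothetical names. Each chunk is sent only once enough allowance (in bytes) has built up at the configured rate. Such a limiter governs only the process it runs in.

```python
import time
from typing import Callable


class TokenBucket:
    """
    Generic token-bucket rate limiter (illustrative; not trickle's or CRATE's
    code). Tokens are bytes of allowance, refilled at rate_bytes_per_s up to
    capacity_bytes (the largest permitted burst).
    """

    def __init__(self, rate_bytes_per_s: float, capacity_bytes: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_bytes_per_s
        self.capacity = capacity_bytes
        self.tokens = capacity_bytes  # start full, allowing one initial burst
        self.clock = clock
        self.last = clock()

    def delay_for(self, n_bytes: int) -> float:
        """Reserve n_bytes; return how many seconds to wait before sending."""
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        self.tokens -= n_bytes  # may go negative: a debt repaid by waiting
        return max(0.0, -self.tokens / self.rate)


def throttled_send(data: bytes, send: Callable[[bytes], None],
                   bucket: TokenBucket, chunk_size: int = 16 * 1024,
                   sleep: Callable[[float], None] = time.sleep) -> None:
    """Send data in chunks, sleeping as needed to respect the bucket's rate."""
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        wait = bucket.delay_for(len(chunk))
        if wait > 0:
            sleep(wait)
        send(chunk)
```

You can pass in a fake clock and sleep function to exercise this without a network. For example, sending 3,000 bytes at 1,000 bytes/s from a full 1,000-byte bucket takes 2 s of simulated time.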

Under Windows, use NetLimiter. The rationale is as follows.

Under Windows, the choice is less obvious. A commercial option is NetLimiter, but there is no direct equivalent of trickle. Python-based options would require quite a bit of redesign of the network code. Apart from rewriting that code to use Twisted rather than requests, none of the open-source methods addresses the general-purpose bandwidth-limitation problem that trickle solves. The best option might be txrequests or treq plus bandwidth limitation via Twisted's ThrottlingFactory, but this doesn't look entirely simple. Even then, it would be hard to coordinate bandwidth limits across multiple processes.

Therefore, in favour of NetLimiter:

  • it’s cheap (~$30/licence in 2019);

  • it provides a per-host unlimited-duration licence;

  • if you’re using Windows you’re already in the domain of commercial software;

  • the cloud NLP facility of CRATE is the sort of thing you’re likely to run on one big computer rather than lots of computers (so one licence should suffice);

  • its filters are very flexible (including time-of-day restrictions and the ability to group applications);

  • the alternatives would involve substantial development effort for lesser benefit;

… so NetLimiter seems like the most cost-effective option.