..  crate_anon/docs/source/nlp/run_nlp.rst

..  Copyright (C) 2015, University of Cambridge, Department of Psychiatry.
    Created by Rudolf Cardinal (rnc1001@cam.ac.uk).
    .
    This file is part of CRATE.
    .
    CRATE is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.
    .
    CRATE is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    GNU General Public License for more details.
    .
    You should have received a copy of the GNU General Public License
    along with CRATE. If not, see .

.. _NetLimiter: https://www.netlimiter.com/
.. _requests: http://docs.python-requests.org
.. _ThrottlingFactory: https://twistedmatrix.com/documents/current/api/twisted.protocols.policies.ThrottlingFactory.html
.. _treq: https://treq.readthedocs.io/
.. _trickle: https://www.usenix.org/legacy/event/usenix05/tech/freenix/full_papers/eriksen/eriksen.pdf
.. _Twisted: https://twistedmatrix.com/
.. _txrequests: https://pypi.org/project/txrequests/


Run the NLP
-----------

Now you've created and edited your config file, you can run the NLP process in
one of the following ways:

.. code-block:: bash

    crate_nlp [--config CONFIG] --nlpdef NLP_NAME --incremental
    crate_nlp [--config CONFIG] --nlpdef NLP_NAME --full
    crate_nlp_multiprocess [--config CONFIG] --nlpdef NLP_NAME --incremental
    crate_nlp_multiprocess [--config CONFIG] --nlpdef NLP_NAME --full

where `NLP_NAME` is something you’ve configured in the :ref:`NLP config file `
(e.g. a drug-parsing NLP program or the GATE demonstration name/location NLP
app). You can specify the config file explicitly or default to one selected by
an environment variable (see below).

The ‘multiprocess’ versions are faster (if you have a multi-core/-CPU computer).
The ‘full’ option destroys the destination database and starts again. The
‘incremental’ one brings the destination database up to date (creating it if
necessary). The default is ‘incremental’, for safety reasons.

Get more help with

.. code-block:: bash

    crate_nlp --help


.. _crate_nlp:

crate_nlp
~~~~~~~~~

This runs a single-process NLP controller.

Options:

.. literalinclude:: _crate_nlp_help.txt
    :language: none


.. _crate_nlp_describeprocessors:

Current NLP processors
~~~~~~~~~~~~~~~~~~~~~~

NLP processors (from ``crate_nlp --describeprocessors``):

.. literalinclude:: _crate_nlp_describeprocessors.txt
    :language: none

Abbreviations not otherwise explained:

- BFTs: bone function tests.
- FBC: full blood count.
- LFTs: liver function tests.
- U&E: urea and electrolytes.


.. _crate_nlp_multiprocess:

crate_nlp_multiprocess
~~~~~~~~~~~~~~~~~~~~~~

This program runs multiple copies of ``crate_nlp`` in parallel.

Options:

.. literalinclude:: _crate_nlp_multiprocess_help.txt
    :language: none


Limiting the network bandwidth used by cloud NLP
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Cloud-based NLP may involve sending large quantities of text (de-identified and
encrypted en route) to a distant server. If you have limited network bandwidth,
you may want to cap the bandwidth used by CRATE (at the price of speed).

**Under Linux,** use trickle_. Here's how:

.. code-block:: bash

    # Install with e.g. "sudo apt install trickle", then see "man trickle".
    # Source code is at https://github.com/mariusae/trickle.
    # Example with limits of 500 KB/s download, 200 KB/s upload:
    trickle -s -d 500 -u 200 crate_nlp

**Under Windows,** use NetLimiter_. The rationale is as follows.

Under Windows, the choice is less obvious. A commercial option is NetLimiter_,
but there is no direct equivalent of trickle_. Python options require quite a
bit of network code redesign; e.g.


- https://stackoverflow.com/questions/3488616/bandwidth-throttling-in-python
- https://stackoverflow.com/questions/17691231/how-to-limit-download-rate-of-http-requests-in-requests-python-library
- https://stackoverflow.com/questions/20247354/limiting-throttling-the-rate-of-http-requests-in-grequests
- https://stackoverflow.com/questions/13047458/bandwidth-throttling-using-twisted

but with the exception of rewriting network code to use Twisted_ rather than
requests_, none of these open-source methods addresses the general-purpose
bandwidth limitation problem that trickle_ solves. The best option might be
txrequests_ or treq_ plus bandwidth limitation via Twisted_ through its
ThrottlingFactory_, but this doesn't look entirely simple (see links above).
Even with that, it'd be hard to coordinate bandwidth limits across multiple
processes.

Therefore, in favour of NetLimiter_:

- it's cheap (~$30/licence in 2019);
- it provides a per-host unlimited-duration licence;
- if you're using Windows you're already in the domain of commercial software;
- the cloud NLP facility of CRATE is the sort of thing you're likely to run on
  one big computer rather than lots of computers (so one licence should
  suffice);
- its filters are very flexible (including time-of-day restrictions and the
  ability to group applications);
- the alternatives would involve substantial development effort for lesser
  benefit;

... so NetLimiter_ seems like the most cost-effective option.
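
To make the "substantial development effort" point concrete, here is a minimal
sketch of what in-process throttling might look like in Python: a token-bucket
limiter that caps the average byte rate of a single process. This is an
illustration only and is not part of CRATE; the class name ``TokenBucket``, its
parameters, and the injectable ``clock``/``sleep`` arguments are all
hypothetical.

```python
import time


class TokenBucket:
    """Hypothetical per-process byte-rate limiter (not part of CRATE)."""

    def __init__(self, rate_bytes_per_s, capacity_bytes,
                 clock=time.monotonic, sleep=time.sleep):
        self.rate = rate_bytes_per_s      # long-run average cap, bytes/second
        self.capacity = capacity_bytes    # largest burst allowed, bytes
        self.tokens = capacity_bytes      # start full: one initial burst is free
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def _refill(self):
        # Credit tokens for the time elapsed since the last refill.
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now

    def consume(self, nbytes):
        """Block until nbytes may be sent; return the seconds spent waiting."""
        if nbytes > self.capacity:
            raise ValueError("chunk larger than bucket capacity")
        waited = 0.0
        while True:
            self._refill()
            if self.tokens >= nbytes:
                self.tokens -= nbytes
                return waited
            # Sleep just long enough for the shortfall to be refilled.
            delay = (nbytes - self.tokens) / self.rate
            self.sleep(delay)
            waited += delay
```

Every code path that sends or receives data would have to call ``consume()``
before each chunk, which is the network code redesign referred to above. Each
process would also hold its own bucket, so a multi-process run could use up to
N times the configured cap unless the processes shared state somehow. That is
the coordination problem that a system-level tool avoids.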