7.9. NLPRP web server
This is CRATE’s implementation of a full NLPRP web server. To use it:
1. Make sure you have the necessary other software installed and running, including Redis and (if you wish to use it as your Celery broker) RabbitMQ, plus a database such as MySQL.
2. Create a blank database, for storing documents and processing requests transiently.
3. Create a blank text file to contain details of your users (with their encrypted passwords).
4. Create a processor definition file with crate_nlp_webserver_print_demo. Edit it.
5. Create a config file with crate_nlp_webserver_print_demo. Edit it, including pointing it to the database(s), the users file, and the processors file, and setting an encryption key (e.g. with crate_nlp_webserver_generate_encryption_key). For more details, see below.
6. Initialize your empty database with crate_nlp_webserver_initialize_db, pointing it at your config file.
7. Add a test user with crate_nlp_webserver_manage_users.
8. Launch the web server, e.g. via crate_nlp_webserver_pserve or crate_nlp_webserver_launch_gunicorn.
9. Launch the Celery workers with crate_nlp_webserver_launch_celery.
To test it, set up your NLP client for a cloud processor, point it at your server, and try some NLP.
Suppose your NLP definition is called cloud_nlp_demo:
# Show what the server's offering:
crate_nlp --nlpdef cloud_nlp_demo --verbose --print_cloud_processors
# Run without queuing:
crate_nlp --nlpdef cloud_nlp_demo --verbose --full --cloud --immediate
# Run with queuing:
crate_nlp --nlpdef cloud_nlp_demo --verbose --full --cloud
crate_nlp --nlpdef cloud_nlp_demo --verbose --full --showqueue
# crate_nlp --nlpdef cloud_nlp_demo --verbose --full --cancelrequest
# crate_nlp --nlpdef cloud_nlp_demo --verbose --full --cancelall
crate_nlp --nlpdef cloud_nlp_demo --verbose --full --retrieve
7.9.1. crate_nlp_webserver_print_demo
Prints a demo NLP web server config.
USAGE: crate_nlp_webserver_print_demo [-h] [--config | --processors]
Print demo config file or demo processor constants file for server side cloud
nlp.
OPTIONS:
-h, --help show this help message and exit
--config Print a demo config file for server side cloud nlp.
--processors Print a demo processor constants file for server side cloud
nlp.
7.9.2. Config file format
The NLP web server’s config file is a PasteDeploy file. This system is used to define WSGI applications and servers.
Here’s a specimen config file:
# This is a "paste" configuration file for the CRATE NLPRP web server.
# =============================================================================
# The CRATE NLPRP server web application
# =============================================================================
[app:main]
use = egg:crate_anon#main
pyramid.reload_templates = true
# pyramid.includes =
# pyramid_debugtoolbar
nlp_webserver.secret = changethis
sqlalchemy.url = mysql://username:password@localhost/dbname?charset=utf8
# Absolute path of users file
users_file = /home/.../nlp_web_files/users.txt
# Absolute path of processors file - this must be a .py file in the correct
# format
processors_path = /home/.../nlp_web_files/processor_constants.py
# URLs for queueing
broker_url = amqp://localhost/
backend_url = db+mysql://username:password@localhost/backenddbname?charset=utf8
# Key for reversible encryption. Use 'crate_nlp_webserver_generate_encryption_key'.
encryption_key =
# =============================================================================
# The web server software
# =============================================================================
[server:main]
use = egg:waitress#main
listen = localhost:6543
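Before the options are discussed in detail, a minimal sketch of how such a file can be inspected. PasteDeploy loads these files with its own machinery (e.g. paste.deploy.loadapp); the snippet below merely uses the standard library's configparser to peek at a cut-down version of the specimen above (the users_file path here is illustrative only):

```python
# Minimal sketch: inspecting a PasteDeploy-style file with the standard
# library's configparser. PasteDeploy itself uses its own loader
# (e.g. paste.deploy.loadapp); this is for quick inspection only.
import configparser

SPECIMEN = """
[app:main]
use = egg:crate_anon#main
users_file = /home/user/nlp_web_files/users.txt

[server:main]
use = egg:waitress#main
listen = localhost:6543
"""

cp = configparser.ConfigParser()
cp.read_string(SPECIMEN)
print(cp["app:main"]["use"])        # egg:crate_anon#main
print(cp["server:main"]["listen"])  # localhost:6543
```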
7.9.2.1. Application section
The [app:main] section defines an application named main, which is the default name. Options within this section are provided as keyword arguments to the WSGI factory; see crate_anon.nlp_webserver.wsgi_app.make_wsgi_app() (and its settings argument) to see how this works.
These options include:
use, which is a PasteDeploy setting saying where the code for the WSGI application lives. For CRATE's NLP server, this should be egg:crate_anon or egg:crate_anon#main [1];
Pyramid settings, such as pyramid.reload_templates;
CRATE NLP web server settings, as follows.
7.9.2.1.1. nlp_webserver.secret
String.
A secret key for cookies (see Pyramid AuthTktAuthenticationPolicy; make one using crate_nlp_webserver_generate_encryption_key).
7.9.2.1.2. sqlalchemy.url
String.
The SQLAlchemy URL to your database; see database URLs.
Other SQLAlchemy parameters also work; all begin with sqlalchemy. For example, sqlalchemy.echo = True enables a debugging feature where all SQL is echoed.
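For example, echoing could be enabled like this (repeating the specimen URL from above):

```ini
[app:main]
# ...
sqlalchemy.url = mysql://username:password@localhost/dbname?charset=utf8
sqlalchemy.echo = True
```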
7.9.2.1.3. users_file
String.
The path to your user definition file; see crate_nlp_webserver_manage_users.
7.9.2.1.4. processors_path
String.
The path to your processor definition file; see Processors file format.
7.9.2.1.5. broker_url
String.
The URL to your Celery broker server, e.g. via AMQP, for back-end processing.
7.9.2.1.6. backend_url
String. Default: None.
The URL to your Celery backend database, used to store queuing information. For the format, see Celery database URL examples.
You can ignore this, as it is not necessary to configure a backend for Celery, since results are stored elsewhere. See Internals.
If you do want to enable a backend: you can use the same database as above, if you wish, or you can create a separate database for Celery.
7.9.2.1.7. encryption_key
String.
A secret key used for password encryption in the users file. You can make one with crate_nlp_webserver_generate_encryption_key.
7.9.2.1.8. redis_host
String. Default: localhost
.
Host for Redis database,
7.9.2.1.9. redis_port
Integer. Default: 6379.
Port for Redis.
7.9.2.1.10. redis_password
String. Default: None.
Password for Redis.
7.9.2.1.11. redis_db_number
Integer. Default: 0.
Database number for Redis.
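Taken together, the four Redis settings above describe a standard Redis connection. Purely as an illustration, they can be assembled into a redis:// URL; the helper below is hypothetical, not part of CRATE:

```python
# Hypothetical helper (not part of CRATE): build a redis:// URL from the
# four settings above, using their documented defaults.
def redis_url(host: str = "localhost", port: int = 6379,
              password: str = None, db_number: int = 0) -> str:
    auth = f":{password}@" if password else ""  # password only, no username
    return f"redis://{auth}{host}:{port}/{db_number}"

print(redis_url())                   # redis://localhost:6379/0
print(redis_url(password="s3cret"))  # redis://:s3cret@localhost:6379/0
```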
7.9.2.2. Web server section
The [server:main] section defines the web server configuration for the app named main.
The use setting determines which web server should be used. Other parameters are passed to the web server being used.
Examples include:
[server:main]
use = egg:waitress#main
# ... alternative: use = egg:crate_anon#waitress
listen = localhost:6543
For arguments, see usage and Arguments to waitress.serve.
[server:main]
use = egg:crate_anon#cherrypy
server.socket_host = 127.0.0.1
server.socket_port = 8080
For arguments, see CherryPy: Configure.
Gunicorn (Linux only):
[server:main]
use = egg:gunicorn#main
bind = localhost:6543
workers = 4
# certfile = /etc/ssl/certs/ca-certificates.crt
# ssl_version = 5
For arguments, see Gunicorn: Settings.
7.9.3. Processors file format
This is a Python file whose job is to define the PROCESSORS variable.
This is a list of dictionaries in the format shown below. Each dictionary defines a processor's:
name;
descriptive title;
version string;
whether this is the default version (used when the client doesn’t ask for a particular version);
processor type (e.g. GATE, CRATE);
schema (database table) information, if known.
As you will see below, CRATE does all this work for you, for its own processors, via crate_anon.nlp_manager.all_processors.all_crate_python_processors_nlprp_processor_info().
Specimen processors file:
#!/usr/bin/env python

"""
Autogenerated NLP processor definition file, to be imported by the CRATE
NLPRP web server. The PROCESSORS variable is the one of interest.
"""

# =============================================================================
# Imports
# =============================================================================

from crate_anon.common.constants import JSON_INDENT
from crate_anon.nlp_manager.all_processors import (
    all_crate_python_processors_nlprp_processor_info,
)
from crate_anon.nlprp.constants import NlprpValues, NlprpKeys as NKeys
from crate_anon.nlp_webserver.constants import (
    KEY_PROCTYPE,
    PROCTYPE_GATE,
)

# =============================================================================
# Processor definitions
# =============================================================================

# GATE processors correct as of 19/04/2019 for KCL server.
# Python processors are automatic, as below.

PROCESSORS = all_crate_python_processors_nlprp_processor_info() + [
    # -------------------------------------------------------------------------
    # GATE processors
    # -------------------------------------------------------------------------
    {
        NKeys.NAME: "medication",
        NKeys.TITLE: "GATE processor: Medication tagger",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: (
            "Finds mentions of drug prescriptions, including the dose, "
            "route and frequency."
        ),
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN,
    },
    {
        NKeys.NAME: "diagnosis",
        NKeys.TITLE: "GATE processor: Diagnosis finder",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: (
            "Finds mentions of diagnoses, in words or in coded form."
        ),
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN,
    },
    {
        NKeys.NAME: "blood-pressure",
        NKeys.TITLE: "GATE processor: Blood Pressure",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: "Finds mentions of blood pressure measurements.",
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN,
    },
    {
        NKeys.NAME: "cbt",
        NKeys.TITLE: "GATE processor: Cognitive Behavioural Therapy",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: (
            "Identifies mentions of cases where the patient has attended "
            "CBT sessions."
        ),
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN,
    },
    {
        NKeys.NAME: "lives-alone",
        NKeys.TITLE: "GATE processor: Lives Alone",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: "Identifies if the patient lives alone.",
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN,
    },
    {
        NKeys.NAME: "mmse",
        NKeys.TITLE: "GATE processor: Mini-Mental State Exam Result Extractor",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: (
            "The Mini-Mental State Exam (MMSE) Results Extractor finds the "
            "results of this common dementia screening test within documents "
            "along with the date on which the test was administered."
        ),
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN,
    },
    {
        NKeys.NAME: "bmi",
        NKeys.TITLE: "GATE processor: Body Mass Index",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: "Finds mentions of BMI scores.",
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN,
    },
    {
        NKeys.NAME: "smoking",
        NKeys.TITLE: "GATE processor: Smoking Status Annotator",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: (
            "Identifies instances of smoking being discussed and determines "
            "the status and subject (patient or someone else)."
        ),
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN,
    },
    {
        NKeys.NAME: "ADR",
        NKeys.TITLE: "GATE processor: ADR",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: "Adverse drug event mentions in clinical notes.",
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN,
    },
    {
        NKeys.NAME: "suicide",
        NKeys.TITLE: "GATE processor: Symptom finder - Suicide",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: "App derived from TextHunter project suicide.",
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN,
    },
    {
        NKeys.NAME: "appetite",
        NKeys.TITLE: "GATE processor: Symptom finder - Appetite",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: "Finds markers of good or poor appetite.",
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN,
    },
    {
        NKeys.NAME: "low_mood",
        NKeys.TITLE: "GATE processor: Symptom finder - Low_Mood",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: "App derived from TextHunter project low_mood.",
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN,
    },
]

# =============================================================================
# Convenience method: if you run the file, it prints its results.
# =============================================================================

if __name__ == "__main__":
    import json  # delayed import
    print(json.dumps(PROCESSORS, indent=JSON_INDENT, sort_keys=True))
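For orientation, here is the first GATE entry again with literal dictionary keys instead of the NKeys and KEY_PROCTYPE constants. The literal key strings and values below are assumptions inferred from the constant names; in a real processors file, always use the constants as in the specimen above.

```python
# One PROCESSORS entry with literal keys. The key strings and the "GATE" /
# "unknown" values are assumptions inferred from the constant names used in
# the specimen file; the real constant values may differ.
entry = {
    "name": "medication",
    "title": "GATE processor: Medication tagger",
    "version": "0.1",
    "is_default_version": True,
    "description": (
        "Finds mentions of drug prescriptions, including the dose, "
        "route and frequency."
    ),
    "proctype": "GATE",        # assumed value of KEY_PROCTYPE / PROCTYPE_GATE
    "schema_type": "unknown",  # assumed value of NlprpValues.UNKNOWN
}
```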
7.9.4. crate_nlp_webserver_initialize_db
USAGE: crate_nlp_webserver_initialize_db [-h] config_uri
Tool to initialize the database used by CRATE's implementation of an NLPRP
server.
POSITIONAL ARGUMENTS:
config_uri Config file to read (e.g. 'development.ini'); URL of database is
found here.
OPTIONS:
-h, --help show this help message and exit
7.9.5. crate_nlp_webserver_manage_users
usage: crate_nlp_webserver_manage_users [-h]
[--adduser USERNAME PASSWORD | --rmuser USERNAME | --changepw USERNAME PASSWORD]
Manage users for the CRATE nlp_web server.
optional arguments:
-h, --help show this help message and exit
--adduser USERNAME PASSWORD
Add a user and associated password.
--rmuser USERNAME Remove a user by specifying their username.
--changepw USERNAME PASSWORD
Change a user's password.
7.9.6. crate_nlp_webserver_generate_encryption_key
Generates a random encryption key and prints it to the screen.
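As an illustration only (an assumption about the general mechanism, not a description of CRATE's actual implementation): keys for reversible symmetric encryption of this kind are typically 32 random bytes, urlsafe-base64 encoded, in the style of Fernet keys:

```python
import base64
import os

# Assumption: a Fernet-style key -- 32 random bytes, urlsafe-base64 encoded.
# The real tool's output format may differ.
key = base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")
print(key)  # a 44-character urlsafe string
```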
7.9.7. crate_nlp_webserver_pserve
This is the standard Pyramid pserve command. At its most basic, it takes a single parameter, being the name of your NLP web server config file, and it starts the web server.
Note that its help (provided by Pyramid's pserve itself) talks about a file URI, which might mislead you into thinking you need something like file:///home/person/blah.ini, but actually it wants a filename, like /home/person/blah.ini.
usage: crate_nlp_webserver_pserve [-h] [-n NAME] [-s SERVER_TYPE]
[--server-name SECTION_NAME] [--reload]
[--reload-interval RELOAD_INTERVAL] [-b]
[-v] [-q]
[config_uri] [config_vars ...]
This command serves a web application that uses a PasteDeploy
configuration file for the server and application.
You can also include variable assignments like 'http_port=8080'
and then use %(http_port)s in your config files.
positional arguments:
config_uri The URI to the configuration file.
config_vars Variables required by the config file. For example,
`http_port=%(http_port)s` would expect
`http_port=8080` to be passed here.
options:
-h, --help show this help message and exit
-n NAME, --app-name NAME
Load the named application (default main)
-s SERVER_TYPE, --server SERVER_TYPE
Use the named server.
--server-name SECTION_NAME
Use the named server as defined in the configuration
file (default: main)
--reload Use auto-restart file monitor
--reload-interval RELOAD_INTERVAL
Seconds between checking files (low number can cause
significant CPU usage)
-b, --browser Open a web browser to the server url. The server url
is determined from the 'open_url' setting in the
'pserve' section of the configuration file.
-v, --verbose Set verbose level (default 1)
-q, --quiet Suppress verbose output
7.9.8. crate_nlp_webserver_launch_gunicorn
This is the preferred alternative to crate_nlp_webserver_pserve for launching the CRATE NLP web server via Gunicorn (it stops Gunicorn complaining but otherwise does the same thing).
USAGE: crate_nlp_webserver_launch_gunicorn [-h] [--crate_config CRATE_CONFIG]
Launch CRATE NLP web server via Gunicorn. (Any leftover arguments will be
passed to Gunicorn.)
OPTIONS:
-h, --help show this help message and exit
--crate_config CRATE_CONFIG
CRATE NLP web server config file (default is read from
environment variable CRATE_NLP_WEB_CONFIG)
7.9.9. crate_nlp_webserver_launch_celery
This launches the Celery back-end job controller for the CRATE NLP web server. It needs to be running for your NLP web server to do any proper work!
USAGE: crate_nlp_webserver_launch_celery [-h] [--command COMMAND]
[--cleanup_timeout_s CLEANUP_TIMEOUT_S]
[--debug]
Launch CRATE NLP web server Celery processes. (Any leftover arguments will be
passed to Celery.)
OPTIONS:
-h, --help show this help message and exit
--command COMMAND Celery command (default: worker)
--cleanup_timeout_s CLEANUP_TIMEOUT_S
Time to wait when shutting down Celery via Ctrl-C
(default: 10.0)
--debug Ask Celery to be verbose (default: False)
7.9.10. crate_nlp_webserver_launch_flower
This command has no options. It launches the Celery Flower tool, which is for monitoring Celery, and associates it with the CRATE NLP web server. It starts a local web server (by default on port 5555; see TCP/IP ports); if you browse to http://localhost:5555/ or http://127.0.0.1:5555/, you can monitor what’s happening.
7.9.11. Internal operations: where is your data stored?
CRATE’s NLP web server uses Redis to store web sessions (for user/session authentication). No document content is stored here.
It uses Celery for back-end jobs.
Celery is configured with a broker and a backend.
The broker is a messaging system, such as RabbitMQ via AMQP.
The backend is typically a database in which job results would be stored. However, CRATE does not use the Celery backend for storing job results; it uses a separate database (which stores, transiently, the potentially confidential incoming client information and outgoing NLP results).
If you want, the Celery backend database can be the same as your main CRATE NLP server database (Celery uses tables named celery_taskmeta and celery_tasksetmeta; these do not conflict with CRATE's NLP server table names).
All client data and all NLP results are stored in a single database.
Footnotes
[1] CRATE then defines paste.app_factory in its setup.py, which allows PasteDeploy to find the actual WSGI app factory.