5.9. NLPRP web server

This is CRATE’s implementation of a full NLPRP web server. To use it:

Make sure you have the necessary other software installed and running, including Redis and (if you wish to use it as your Celery broker) RabbitMQ, plus a database such as MySQL.
Create a blank database, for storing documents and processing requests transiently.
Create a blank text file to contain details of your users (with their encrypted passwords).
Create a processor definition file with crate_nlp_webserver_print_demo --processors. Edit it.
Create a config file with crate_nlp_webserver_print_demo --config. Edit it, pointing it to the database(s), the users file, and the processors file, and setting an encryption key (e.g. with crate_nlp_webserver_generate_encryption_key). For more details, see below.
Initialize your empty database with crate_nlp_webserver_initialize_db, pointing it at your config file.
Add a test user with crate_nlp_webserver_manage_users.
Launch the web server, e.g. via crate_nlp_webserver_pserve or crate_nlp_webserver_launch_gunicorn.
Launch the Celery workers with crate_nlp_webserver_launch_celery.
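
The command-line steps above can be sketched in Python as an ordered list of commands. This is a minimal sketch, not part of CRATE: the helper name setup_commands, its parameters, and the example path are hypothetical, and only flags shown in the help text in this section are used. How tools without a config argument locate the config file is not covered here. The last two commands are long-running, so the sketch prints the commands rather than running them one after another.

```python
import shlex
from typing import List


def setup_commands(config_path: str, username: str, password: str) -> List[List[str]]:
    """Return the documented setup commands, in order, as argument lists."""
    return [
        # Initialize the empty database named in the config file.
        ["crate_nlp_webserver_initialize_db", config_path],
        # Add a test user.
        ["crate_nlp_webserver_manage_users", "--adduser", username, password],
        # Launch the web server via Gunicorn (long-running).
        ["crate_nlp_webserver_launch_gunicorn", "--crate_config", config_path],
        # Launch the Celery workers (long-running).
        ["crate_nlp_webserver_launch_celery"],
    ]


if __name__ == "__main__":
    # Hypothetical path and credentials, for illustration only.
    for cmd in setup_commands("/path/to/nlp_web_config.ini", "testuser", "testpassword"):
        print(shlex.join(cmd))
```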

To test it, set up your NLP client for a cloud processor, point it at your server, and try some NLP. Suppose your NLP definition is called cloud_nlp_demo:

# Show what the server's offering:
crate_nlp --nlpdef cloud_nlp_demo --verbose --print_cloud_processors

# Run without queuing:
crate_nlp --nlpdef cloud_nlp_demo --verbose --full --cloud --immediate

# Run with queuing:
crate_nlp --nlpdef cloud_nlp_demo --verbose --full --cloud
crate_nlp --nlpdef cloud_nlp_demo --verbose --full --showqueue
# crate_nlp --nlpdef cloud_nlp_demo --verbose --full --cancelrequest
# crate_nlp --nlpdef cloud_nlp_demo --verbose --full --cancelall
crate_nlp --nlpdef cloud_nlp_demo --verbose --full --retrieve

5.9.1. crate_nlp_webserver_print_demo

Prints a demo NLP web server config file or a demo processors file.

USAGE: crate_nlp_webserver_print_demo [-h] [--config | --processors]

Print demo config file or demo processor constants file for server side cloud
nlp.

OPTIONS:
  -h, --help    show this help message and exit
  --config      Print a demo config file for server side cloud nlp.
  --processors  Print a demo processor constants file for server side cloud
                nlp.

5.9.2. Config file format

The NLP web server’s config file is a PasteDeploy file. This system is used to define WSGI applications and servers.

Here’s a specimen config file:

# This is a "paste" configuration file for the CRATE NLPRP web server.

# =============================================================================
# The CRATE NLPRP server web application
# =============================================================================

[app:main]

use = egg:crate_anon#main
pyramid.reload_templates = true
# pyramid.includes =
#     pyramid_debugtoolbar

nlp_webserver.secret = changethis
sqlalchemy.url = mysql://username:password@localhost/dbname?charset=utf8

# Absolute path of users file
users_file = /home/.../nlp_web_files/users.txt

# Absolute path of processors file - this must be a .py file in the correct
# format
processors_path = /home/.../nlp_web_files/processor_constants.py

# URLs for queueing
broker_url = amqp://localhost/
backend_url = db+mysql://username:password@localhost/backenddbname?charset=utf8

# Key for reversible encryption. Use 'crate_nlp_webserver_generate_encryption_key'.
encryption_key =

# =============================================================================
# The web server software
# =============================================================================

[server:main]

use = egg:waitress#main
listen = localhost:6543
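
Before launching the server, it can be useful to check that an edited config file still contains the settings it needs. The following is a minimal sketch using only the Python standard library; it is not how CRATE or PasteDeploy loads the file. The function names are hypothetical, and treating the settings documented without a default as required is an assumption.

```python
import configparser
from typing import Dict, List

# Settings documented without a default value. Treating them as required is
# an assumption made for this sketch.
ASSUMED_REQUIRED = [
    "nlp_webserver.secret",
    "sqlalchemy.url",
    "users_file",
    "processors_path",
    "broker_url",
    "encryption_key",
]


def read_app_settings(text: str, section: str = "app:main") -> Dict[str, str]:
    """Parse PasteDeploy-style INI text and return one section as a dict."""
    # Split only on "=", so URLs containing ":" are kept intact, and disable
    # "%" interpolation so values are returned exactly as written.
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string(text)
    return dict(parser[section])


def missing_settings(settings: Dict[str, str]) -> List[str]:
    """List the assumed-required settings that are absent or blank."""
    return [key for key in ASSUMED_REQUIRED if not settings.get(key, "").strip()]
```

Applied to the specimen file as printed above, missing_settings() would report encryption_key, because that value is left blank until you generate a key.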

5.9.2.1. Application section

The [app:main] section defines an application named main, which is the default name. Options within this section are provided as keyword arguments to the WSGI factory; see crate_anon.nlp_webserver.wsgi_app.make_wsgi_app() (and its settings argument) to see how this works.

These options include:

use, which is a PasteDeploy setting to say where the code for the WSGI application lives. For CRATE’s NLP server, this should be egg:crate_anon or egg:crate_anon#main [1].
Pyramid settings, such as pyramid.reload_templates;
CRATE NLP web server settings, as follows.

5.9.2.1.1. nlp_webserver.secret

String.

A secret key for cookies (see Pyramid AuthTktAuthenticationPolicy; make one using crate_nlp_webserver_generate_encryption_key).

5.9.2.1.2. sqlalchemy.url

String.

The SQLAlchemy URL to your database; see database URLs.

Other SQLAlchemy parameters also work; all begin with sqlalchemy. (including the dot). For example, sqlalchemy.echo = True enables a debugging feature where all SQL is echoed.
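
The prefix convention can be illustrated with a small standard-library sketch that picks out the prefixed settings and strips the prefix. SQLAlchemy's own engine_from_config() function applies the same convention; whether CRATE calls it internally is an assumption, and the helper name below is hypothetical.

```python
from typing import Dict


def settings_with_prefix(settings: Dict[str, str],
                         prefix: str = "sqlalchemy.") -> Dict[str, str]:
    """Select the keys that start with the prefix and strip it from each."""
    return {
        key[len(prefix):]: value
        for key, value in settings.items()
        if key.startswith(prefix)
    }


if __name__ == "__main__":
    app_settings = {
        "sqlalchemy.url": "mysql://username:password@localhost/dbname?charset=utf8",
        "sqlalchemy.echo": "True",
        "users_file": "/path/to/users.txt",  # hypothetical path; not selected
    }
    print(settings_with_prefix(app_settings))
```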

5.9.2.1.3. users_file

String.

The path to your user definition file; see crate_nlp_webserver_manage_users.

5.9.2.1.4. processors_path

String.

The path to your processor definition file; see Processors file format.

5.9.2.1.5. broker_url

String.

The URL to your Celery broker server, e.g. via AMQP, for back-end processing.

5.9.2.1.6. backend_url

String. Default: None.

The URL to your Celery backend database, used to store queuing information. For the format, see Celery database URL examples.

You can ignore this setting: CRATE stores results elsewhere, so Celery does not need a backend. See Internals.
If you do want to enable a backend: you can use the same database as above, if you wish, or you can create a separate database for Celery.

5.9.2.1.7. encryption_key

String.

A secret key used for password encryption in the users file. You can make one with crate_nlp_webserver_generate_encryption_key.

5.9.2.1.8. redis_host

String. Default: localhost.

Host for the Redis database.

5.9.2.1.9. redis_port

Integer. Default: 6379.

Port for Redis.

5.9.2.1.10. redis_password

String. Default: None.

Password for Redis.

5.9.2.1.11. redis_db_number

Integer. Default: 0.

Database number for Redis.
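
The four Redis settings and their documented defaults can be summarized in a short sketch. This is illustrative only: the class and function names are hypothetical, and the code does not show how CRATE itself reads these settings. Values arrive from the config file as strings, so the integer settings are converted.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RedisSettings:
    """Redis connection settings, with the defaults documented above."""
    host: str = "localhost"
    port: int = 6379
    password: Optional[str] = None
    db_number: int = 0


def redis_settings_from(settings: Mapping[str, str]) -> RedisSettings:
    """Build RedisSettings from raw string settings, applying the defaults."""
    defaults = RedisSettings()
    return RedisSettings(
        host=settings.get("redis_host", defaults.host),
        port=int(settings.get("redis_port", defaults.port)),
        # A missing or blank password is treated as "no password".
        password=settings.get("redis_password") or None,
        db_number=int(settings.get("redis_db_number", defaults.db_number)),
    )
```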

5.9.2.2. Web server section

The [server:main] section defines the web server configuration for the app named main.

The use setting determines which web server should be used.
Other parameters are passed to the web server in use.

Examples include:

Waitress:

[server:main]
use = egg:waitress#main
# ... alternative: use = egg:crate_anon#waitress
listen = localhost:6543

For arguments, see usage and Arguments to waitress.serve.

CherryPy:

[server:main]
use = egg:crate_anon#cherrypy
server.socket_host = 127.0.0.1
server.socket_port = 8080

For arguments, see CherryPy: Configure.

Gunicorn (Linux only):

[server:main]
use = egg:gunicorn#main
bind = localhost:6543
workers = 4
# certfile = /etc/ssl/certs/ca-certificates.crt
# ssl_version = 5

For arguments, see Gunicorn: Settings.

5.9.3. Processors file format

This is a Python file whose job is to define the PROCESSORS variable. This is a list of dictionaries in the format shown below. Each dictionary defines a processor’s:

name;
descriptive title;
version string;
whether this is the default version (used when the client doesn’t ask for a particular version);
processor type (e.g. GATE, CRATE);
schema (database table) information, if known.

As you will see below, CRATE does all this work for you, for its own processors, via crate_anon.nlp_manager.all_processors.all_crate_python_processors_nlprp_processor_info().

Specimen processors file:

#!/usr/bin/env python

"""
Autogenerated NLP processor definition file, to be imported by the CRATE
NLPRP web server. The PROCESSORS variable is the one of interest.
"""

# =============================================================================
# Imports
# =============================================================================

from crate_anon.common.constants import JSON_INDENT
from crate_anon.nlp_manager.all_processors import (
    all_crate_python_processors_nlprp_processor_info,
)
from crate_anon.nlprp.constants import NlprpValues, NlprpKeys as NKeys
from crate_anon.nlp_webserver.constants import (
    KEY_PROCTYPE,
    PROCTYPE_GATE,
)

# =============================================================================
# Processor definitions
# =============================================================================

# GATE processors correct as of 19/04/2019 for KCL server.
# Python processors are automatic, as below.

PROCESSORS = all_crate_python_processors_nlprp_processor_info() + [
    # -------------------------------------------------------------------------
    # GATE processors
    # -------------------------------------------------------------------------
    {
        NKeys.NAME: "medication",
        NKeys.TITLE: "GATE processor: Medication tagger",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: (
            "Finds mentions of drug prescriptions, including the dose, "
            "route and frequency."
        ),
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN,
    },
    {
        NKeys.NAME: "diagnosis",
        NKeys.TITLE: "GATE processor: Diagnosis finder",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: (
            "Finds mentions of diagnoses, in words or in coded form."
        ),
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN,
    },
    {
        NKeys.NAME: "blood-pressure",
        NKeys.TITLE: "GATE processor: Blood Pressure",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: "Finds mentions of blood pressure measurements.",
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN,
    },
    {
        NKeys.NAME: "cbt",
        NKeys.TITLE: "GATE processor: Cognitive Behavioural Therapy",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: (
            "Identifies mentions of cases where the patient has attended "
            "CBT sessions."
        ),
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN,
    },
    {
        NKeys.NAME: "lives-alone",
        NKeys.TITLE: "GATE processor: Lives Alone",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: "Identifies if the patient lives alone.",
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN,
    },
    {
        NKeys.NAME: "mmse",
        NKeys.TITLE: "GATE processor: Mini-Mental State Exam Result Extractor",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: (
            "The Mini-Mental State Exam (MMSE) Results Extractor finds the "
            "results of this common dementia screening test within documents "
            "along with the date on which the test was administered."
        ),
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN,
    },
    {
        NKeys.NAME: "bmi",
        NKeys.TITLE: "GATE processor: Body Mass Index",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: "Finds mentions of BMI scores.",
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN,
    },
    {
        NKeys.NAME: "smoking",
        NKeys.TITLE: "GATE processor: Smoking Status Annotator",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: (
            "Identifies instances of smoking being discussed and determines "
            "the status and subject (patient or someone else)."
        ),
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN,
    },
    {
        NKeys.NAME: "ADR",
        NKeys.TITLE: "GATE processor: ADR",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: "Adverse drug event mentions in clinical notes.",
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN,
    },
    {
        NKeys.NAME: "suicide",
        NKeys.TITLE: "GATE processor: Symptom finder - Suicide",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: "App derived from TextHunter project suicide.",
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN,
    },
    {
        NKeys.NAME: "appetite",
        NKeys.TITLE: "GATE processor: Symptom finder - Appetite",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: "Finds markers of good or poor appetite.",
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN,
    },
    {
        NKeys.NAME: "low_mood",
        NKeys.TITLE: "GATE processor: Symptom finder - Low_Mood",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: "App derived from TextHunter project low_mood.",
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN,
    },
]


# =============================================================================
# Convenience method: if you run the file, it prints its results.
# =============================================================================

if __name__ == "__main__":
    import json  # delayed import

    print(json.dumps(PROCESSORS, indent=JSON_INDENT, sort_keys=True))
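
When editing a processors file by hand, a quick consistency check can catch missing fields. The sketch below is not part of CRATE. The key strings are stand-ins chosen for this example (a real file uses the NlprpKeys constants and KEY_PROCTYPE, whose actual values are not shown in this section), and the rule that a processor name has at most one default version is an assumption based on the description of the default-version field.

```python
from collections import Counter
from typing import Any, Dict, List

# Stand-in key strings for this sketch only; a real processors file uses the
# NlprpKeys constants and KEY_PROCTYPE instead.
NAME = "name"
TITLE = "title"
VERSION = "version"
IS_DEFAULT_VERSION = "is_default_version"
PROCTYPE = "proctype"
REQUIRED_KEYS = (NAME, TITLE, VERSION, IS_DEFAULT_VERSION, PROCTYPE)


def check_processors(processors: List[Dict[str, Any]]) -> List[str]:
    """Return a list of human-readable problems (empty if none are found)."""
    problems = []
    for i, proc in enumerate(processors):
        for key in REQUIRED_KEYS:
            if key not in proc:
                problems.append(f"processor #{i}: missing key {key!r}")
    # Assumed sanity rule: at most one default version per processor name.
    defaults = Counter(
        p.get(NAME) for p in processors if p.get(IS_DEFAULT_VERSION)
    )
    for name, count in defaults.items():
        if count > 1:
            problems.append(f"{name!r}: {count} versions marked as default")
    return problems
```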

5.9.4. crate_nlp_webserver_initialize_db

USAGE: crate_nlp_webserver_initialize_db [-h] config_uri

Tool to initialize the database used by CRATE's implementation of an NLPRP
server.

POSITIONAL ARGUMENTS:
  config_uri  Config file to read (e.g. 'development.ini'); URL of database is
              found here.

OPTIONS:
  -h, --help  show this help message and exit

5.9.5. crate_nlp_webserver_manage_users

usage: crate_nlp_webserver_manage_users [-h]
                                        [--adduser USERNAME PASSWORD | --rmuser USERNAME | --changepw USERNAME PASSWORD]

Manage users for the CRATE nlp_web server.

optional arguments:
  -h, --help            show this help message and exit
  --adduser USERNAME PASSWORD
                        Add a user and associated password.
  --rmuser USERNAME     Remove a user by specifying their username.
  --changepw USERNAME PASSWORD
                        Change a user's password.

# Generated at 2019-09-28 18:29:51

5.9.6. crate_nlp_webserver_generate_encryption_key

Generates a random encryption key and prints it to the screen.

5.9.7. crate_nlp_webserver_pserve

This is the standard Pyramid pserve command. At its most basic, it takes a single parameter, being the name of your NLP web server config file, and it starts the web server.

Note that its help (provided by Pyramid’s pserve itself) talks about a file URI, which might mislead you into thinking you need something like file:///home/person/blah.ini, but actually it wants a filename, like /home/person/blah.ini.

usage: crate_nlp_webserver_pserve [-h] [-n NAME] [-s SERVER_TYPE]
                                  [--server-name SECTION_NAME] [--reload]
                                  [--reload-interval RELOAD_INTERVAL] [-b]
                                  [-v] [-q]
                                  [config_uri] [config_vars ...]

This command serves a web application that uses a PasteDeploy
configuration file for the server and application.

You can also include variable assignments like 'http_port=8080'
and then use %(http_port)s in your config files.

positional arguments:
  config_uri            The URI to the configuration file.
  config_vars           Variables required by the config file. For example,
                        `http_port=%(http_port)s` would expect
                        `http_port=8080` to be passed here.

options:
  -h, --help            show this help message and exit
  -n NAME, --app-name NAME
                        Load the named application (default main)
  -s SERVER_TYPE, --server SERVER_TYPE
                        Use the named server.
  --server-name SECTION_NAME
                        Use the named server as defined in the configuration
                        file (default: main)
  --reload              Use auto-restart file monitor
  --reload-interval RELOAD_INTERVAL
                        Seconds between checking files (low number can cause
                        significant CPU usage)
  -b, --browser         Open a web browser to the server url. The server url
                        is determined from the 'open_url' setting in the
                        'pserve' section of the configuration file.
  -v, --verbose         Set verbose level (default 1)
  -q, --quiet           Suppress verbose output
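
If you have a file:// URI to hand, the point made above about filenames can be handled with a small standard-library conversion before calling the command. This is a sketch with a hypothetical helper name; it is not part of CRATE or Pyramid.

```python
from urllib.parse import urlparse
from urllib.request import url2pathname


def config_filename(config_uri: str) -> str:
    """Return a plain filename, converting a file:// URI if one is given."""
    parsed = urlparse(config_uri)
    if parsed.scheme == "file":
        # Convert the URI path component to a local filesystem path.
        return url2pathname(parsed.path)
    # Anything else is assumed to be a filename already.
    return config_uri
```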

5.9.8. crate_nlp_webserver_launch_gunicorn

This is the preferred alternative to crate_nlp_webserver_pserve for launching the CRATE NLP web server via Gunicorn (it stops Gunicorn complaining but otherwise does the same thing).

USAGE: crate_nlp_webserver_launch_gunicorn [-h] [--crate_config CRATE_CONFIG]

Launch CRATE NLP web server via Gunicorn. (Any leftover arguments will be
passed to Gunicorn.)

OPTIONS:
  -h, --help            show this help message and exit
  --crate_config CRATE_CONFIG
                        CRATE NLP web server config file (default is read from
                        environment variable CRATE_NLP_WEB_CONFIG)
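
The fallback described in the help text (an explicit --crate_config value, otherwise the CRATE_NLP_WEB_CONFIG environment variable) can be sketched as follows. The function name and the error raised when neither is available are assumptions for illustration; the tool's actual behaviour in that case is not documented here.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "CRATE_NLP_WEB_CONFIG"


def resolve_config(crate_config: Optional[str] = None,
                   environ: Mapping[str, str] = os.environ) -> str:
    """Prefer an explicit config path; otherwise use the environment variable."""
    if crate_config:
        return crate_config
    try:
        return environ[ENV_VAR]
    except KeyError:
        # Assumed behaviour for this sketch: fail loudly if nothing is set.
        raise ValueError(f"No config given and {ENV_VAR} is not set") from None
```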

5.9.9. crate_nlp_webserver_launch_celery

This launches the Celery back-end job controller for the CRATE NLP web server. It needs to be running for your NLP web server to do any proper work!

USAGE: crate_nlp_webserver_launch_celery [-h] [--command COMMAND]
                                         [--cleanup_timeout_s CLEANUP_TIMEOUT_S]
                                         [--debug]

Launch CRATE NLP web server Celery processes. (Any leftover arguments will be
passed to Celery.)

OPTIONS:
  -h, --help            show this help message and exit
  --command COMMAND     Celery command (default: worker)
  --cleanup_timeout_s CLEANUP_TIMEOUT_S
                        Time to wait when shutting down Celery via Ctrl-C
                        (default: 10.0)
  --debug               Ask Celery to be verbose (default: False)

5.9.10. crate_nlp_webserver_launch_flower

This command has no options. It launches the Celery Flower tool, which is for monitoring Celery, and associates it with the CRATE NLP web server. It starts a local web server (by default on port 5555; see TCP/IP ports); if you browse to http://localhost:5555/ or http://127.0.0.1:5555/, you can monitor what’s happening.
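
To check from a script whether the Flower web server is up, you can test whether something is accepting TCP connections on its port. This is a generic standard-library sketch with a hypothetical function name; it only tests that the port is open, not that the listener is actually Flower.

```python
import socket


def is_listening(host: str = "127.0.0.1", port: int = 5555,
                 timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("Port 5555 open:", is_listening())
```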

5.9.11. Internal operations: where is your data stored?

CRATE’s NLP web server uses Redis to store web sessions (for user/session authentication). No content is stored here.
It uses Celery for back-end jobs.
- Celery is configured with a broker and a backend.
- The broker is a messaging system, such as RabbitMQ via AMQP.
- The backend is typically a database of jobs, in which Celery would normally store job results. CRATE does not use the backend for this; it stores job results in a separate database (used for storing, transiently, the potentially confidential incoming client information and outgoing NLP results).
- If you want, the Celery backend database can be the same as your main CRATE NLP server database (Celery uses tables named celery_taskmeta and celery_tasksetmeta; these do not conflict with CRATE’s NLP server table names).
All client data and all NLP results are stored in a single database.

Footnotes