7.9. NLPRP web server

This is CRATE’s implementation of a full NLPRP web server. To use it:

  1. Make sure you have the other necessary software installed and running, including Redis and (if you wish to use it as your Celery broker) RabbitMQ, plus a database such as MySQL.

  2. Create a blank database, for storing documents and processing requests transiently.

  3. Create a blank text file to contain details of your users (with their encrypted passwords).

  4. Create a processor definition file with crate_nlp_webserver_print_demo --processors. Edit it.

  5. Create a config file with crate_nlp_webserver_print_demo --config. Edit it, including pointing it to the database(s), the users file, and the processors file, and setting an encryption key (e.g. with crate_nlp_webserver_generate_encryption_key). For more details, see below.

  6. Initialize your empty database with crate_nlp_webserver_initialize_db, pointing it at your config file.

  7. Add a test user with crate_nlp_webserver_manage_users.

  8. Launch the web server, e.g. via crate_nlp_webserver_pserve or crate_nlp_webserver_launch_gunicorn.

  9. Launch the Celery workers with crate_nlp_webserver_launch_celery.
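
As a rough sketch of steps 6 to 9, the helper below (a hypothetical function, not part of CRATE) lists the commands in order as argument lists. The config path, username, and password are placeholders. It only prints the commands rather than running them, because the web server and Celery commands run until stopped and would normally each be started in their own terminal or service.

```python
import shlex
from typing import List


def setup_commands(
    config_path: str, username: str, password: str
) -> List[List[str]]:
    """Return the commands for steps 6-9, in order, as argument lists."""
    return [
        # Step 6: initialize the empty database named in the config file.
        ["crate_nlp_webserver_initialize_db", config_path],
        # Step 7: add a test user.
        ["crate_nlp_webserver_manage_users", "--adduser", username, password],
        # Step 8: launch the web server (long-running).
        ["crate_nlp_webserver_pserve", config_path],
        # Step 9: launch the Celery workers (long-running).
        ["crate_nlp_webserver_launch_celery"],
    ]


if __name__ == "__main__":
    # Placeholder values; substitute your own.
    for argv in setup_commands(
        "/path/to/nlp_web_config.ini", "testuser", "testpassword"
    ):
        print(shlex.join(argv))
```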

To test it, set up your NLP client for a cloud processor, point it at your server, and try some NLP. Suppose your NLP definition is called cloud_nlp_demo:

# Show what the server's offering:
crate_nlp --nlpdef cloud_nlp_demo --verbose --print_cloud_processors

# Run without queuing:
crate_nlp --nlpdef cloud_nlp_demo --verbose --full --cloud --immediate

# Run with queuing:
crate_nlp --nlpdef cloud_nlp_demo --verbose --full --cloud
crate_nlp --nlpdef cloud_nlp_demo --verbose --full --showqueue
# crate_nlp --nlpdef cloud_nlp_demo --verbose --full --cancelrequest
# crate_nlp --nlpdef cloud_nlp_demo --verbose --full --cancelall
crate_nlp --nlpdef cloud_nlp_demo --verbose --full --retrieve

7.9.1. crate_nlp_webserver_print_demo

Prints a demo NLP web server config file or a demo processors file.

USAGE: crate_nlp_webserver_print_demo [-h] [--config | --processors]

Print demo config file or demo processor constants file for server side cloud
nlp.

OPTIONAL ARGUMENTS:
  -h, --help    show this help message and exit
  --config      Print a demo config file for server side cloud nlp.
  --processors  Print a demo processor constants file for server side cloud
                nlp.

7.9.2. Config file format

The NLP web server’s config file is a PasteDeploy file. This system is used to define WSGI applications and servers.

Here’s a specimen config file:

# This is a "paste" configuration file for the CRATE NLPRP web server.

# =============================================================================
# The CRATE NLPRP server web application
# =============================================================================

[app:main]

use = egg:crate_anon#main
pyramid.reload_templates = true
# pyramid.includes =
#     pyramid_debugtoolbar

nlp_webserver.secret = changethis
sqlalchemy.url = mysql://username:password@localhost/dbname?charset=utf8

# Absolute path of users file
users_file = /home/.../nlp_web_files/users.txt

# Absolute path of processors file - this must be a .py file in the correct
# format
processors_path = /home/.../nlp_web_files/processor_constants.py

# URLs for queueing
broker_url = amqp://localhost/
backend_url = db+mysql://username:password@localhost/backenddbname?charset=utf8

# Key for reversible encryption. Use 'crate_nlp_webserver_generate_encryption_key'.
encryption_key =

# =============================================================================
# The web server software
# =============================================================================

[server:main]

use = egg:waitress#main
listen = localhost:6543

7.9.2.1. Application section

The [app:main] section defines an application named main, which is the default name. Options within this section are provided as keyword arguments to the WSGI factory; see crate_anon.nlp_webserver.wsgi_app.make_wsgi_app() (and its settings argument) to see how this works.
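
To illustrate the keyword-argument mechanism, here is a minimal sketch of a PasteDeploy-style application factory. It is a hypothetical stand-in, not CRATE's make_wsgi_app(): the function name and the trivial WSGI app are assumptions for illustration. PasteDeploy calls such a factory with the global options as a dictionary and the options from the application section as keyword arguments.

```python
from typing import Any, Callable, Dict


def demo_app_factory(global_config: Dict[str, str], **settings: Any) -> Callable:
    """
    Hypothetical app factory. Options from [app:main], such as
    users_file, arrive here in ``settings``.
    """

    def app(environ, start_response):
        # A trivial WSGI app that reports one of the settings it was given.
        body = ("users_file = " + str(settings.get("users_file"))).encode("utf-8")
        start_response(
            "200 OK",
            [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ],
        )
        return [body]

    return app


if __name__ == "__main__":
    # Simulate what PasteDeploy does after reading the config file.
    wsgi_app = demo_app_factory({}, users_file="/path/to/users.txt")
    print(wsgi_app({}, lambda status, headers: None))
```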

These options include:

  1. use, which is a PasteDeploy setting to say where the code for the WSGI application lives. For CRATE’s NLP server, this should be egg:crate_anon or egg:crate_anon#main (see footnote 1).

  2. Pyramid settings, such as pyramid.reload_templates.

  3. CRATE NLP web server settings, as follows.

7.9.2.1.1. nlp_webserver.secret

String.

A secret key for cookies (see Pyramid AuthTktAuthenticationPolicy; make one using crate_nlp_webserver_generate_encryption_key).

7.9.2.1.2. sqlalchemy.url

String.

The SQLAlchemy URL to your database; see database URLs.

Other SQLAlchemy parameters also work; all begin with the prefix "sqlalchemy." (including the dot). For example, sqlalchemy.echo = True enables a debugging feature where all SQL is echoed.
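
To illustrate the prefix convention, here is a minimal sketch in plain Python that picks out the prefixed settings and strips the prefix. The function name is hypothetical. SQLAlchemy's own engine_from_config() follows the same prefix convention, though whether CRATE calls it is not stated here.

```python
from typing import Dict


def settings_with_prefix(
    settings: Dict[str, str], prefix: str = "sqlalchemy."
) -> Dict[str, str]:
    """Return only the settings starting with ``prefix``, with it removed."""
    return {
        key[len(prefix):]: value
        for key, value in settings.items()
        if key.startswith(prefix)
    }


if __name__ == "__main__":
    demo = {
        "sqlalchemy.url": "mysql://username:password@localhost/dbname?charset=utf8",
        "sqlalchemy.echo": "True",
        "users_file": "/path/to/users.txt",  # not an SQLAlchemy setting
    }
    print(settings_with_prefix(demo))
```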

7.9.2.1.3. users_file

String.

The path to your user definition file; see crate_nlp_webserver_manage_users.

7.9.2.1.4. processors_path

String.

The path to your processor definition file; see Processors file format.

7.9.2.1.5. broker_url

String.

The URL to your Celery broker server, e.g. via AMQP, for back-end processing.

7.9.2.1.6. backend_url

String. Default: None.

The URL to your Celery backend database, used to store queuing information. For the format, see Celery database URL examples.

  • You can leave this unset: CRATE stores its results elsewhere, so Celery does not need a backend. See Internal operations: where is your data stored?

  • If you do want to enable a backend: you can use the same database as above, if you wish, or you can create a separate database for Celery.

7.9.2.1.7. encryption_key

String.

A secret key used for password encryption in the users file. You can make one with crate_nlp_webserver_generate_encryption_key.

7.9.2.1.8. redis_host

String. Default: localhost.

Host for the Redis database.

7.9.2.1.9. redis_port

Integer. Default: 6379.

Port for Redis.

7.9.2.1.10. redis_password

String. Default: None.

Password for Redis.

7.9.2.1.11. redis_db_number

Integer. Default: 0.

Database number for Redis.
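
As a quick way to check an edited config file against the settings described above, the following sketch reads the [app:main] section with Python's standard configparser and reports settings that are absent or empty. It is not CRATE's own validation. Treating every setting without a documented default as required is an assumption of this sketch, as are the function names.

```python
import configparser
from typing import Dict, List

# Settings documented without a default (assumed here to be required).
REQUIRED = [
    "nlp_webserver.secret",
    "sqlalchemy.url",
    "users_file",
    "processors_path",
    "broker_url",
    "encryption_key",
]
# Defaults as documented for the Redis settings.
DEFAULTS = {"redis_host": "localhost", "redis_port": "6379", "redis_db_number": "0"}

DEMO_CONFIG = """
[app:main]
use = egg:crate_anon#main
nlp_webserver.secret = changethis
sqlalchemy.url = mysql://username:password@localhost/dbname?charset=utf8
users_file = /path/to/users.txt
processors_path = /path/to/processor_constants.py
broker_url = amqp://localhost/
encryption_key =
"""


def read_app_settings(config_text: str) -> Dict[str, str]:
    """Parse the [app:main] section, filling in the documented defaults."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    settings = dict(DEFAULTS)
    settings.update(parser["app:main"])
    return settings


def missing_settings(settings: Dict[str, str]) -> List[str]:
    """Return the required settings that are absent or blank."""
    return [key for key in REQUIRED if not settings.get(key, "").strip()]


if __name__ == "__main__":
    print(missing_settings(read_app_settings(DEMO_CONFIG)))
```

With the demo text, the only setting reported is encryption_key, which the specimen file leaves blank for you to fill in.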

7.9.2.2. Web server section

The [server:main] section defines the web server configuration for the app named main.

  • The use setting determines which web server should be used.

  • Other parameters are passed to the web server in use.

Examples include:

  • Waitress:

    [server:main]
    use = egg:waitress#main
    # ... alternative: use = egg:crate_anon#waitress
    listen = localhost:6543
    

    For arguments, see usage and Arguments to waitress.serve.

  • CherryPy:

    [server:main]
    use = egg:crate_anon#cherrypy
    server.socket_host = 127.0.0.1
    server.socket_port = 8080
    

    For arguments, see CherryPy: Configure.

  • Gunicorn (Linux only):

    [server:main]
    use = egg:gunicorn#main
    bind = localhost:6543
    workers = 4
    # certfile = /etc/ssl/certs/ca-certificates.crt
    # ssl_version = 5
    

    For arguments, see Gunicorn: Settings.

7.9.3. Processors file format

This is a Python file whose job is to define the PROCESSORS variable. This is a list of dictionaries in the format shown below. Each dictionary defines a processor’s:

  • name;

  • descriptive title;

  • version string;

  • whether this is the default version (used when the client doesn’t ask for a particular version);

  • processor type (e.g. GATE, CRATE);

  • schema (database table) information, if known.

As you will see below, CRATE does all this work for you, for its own processors, via crate_anon.nlp_manager.all_processors.all_crate_python_processors_nlprp_processor_info().

Specimen processors file:

#!/usr/bin/env python

"""
Autogenerated NLP processor definition file, to be imported by the CRATE
NLPRP web server. The PROCESSORS variable is the one of interest.
"""

# =============================================================================
# Imports
# =============================================================================

from crate_anon.common.constants import JSON_INDENT
from crate_anon.nlp_manager.all_processors import (
    all_crate_python_processors_nlprp_processor_info,
)
from crate_anon.nlprp.constants import NlprpValues, NlprpKeys as NKeys
from crate_anon.nlp_webserver.constants import (
    KEY_PROCTYPE,
    PROCTYPE_GATE,
)


# =============================================================================
# Processor definitions
# =============================================================================

# GATE processors correct as of 19/04/2019 for KCL server.
# Python processors are automatic, as below.

PROCESSORS = all_crate_python_processors_nlprp_processor_info() + [
    # -------------------------------------------------------------------------
    # GATE processors
    # -------------------------------------------------------------------------
    {
        NKeys.NAME: "medication",
        NKeys.TITLE: "GATE processor: Medication tagger",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: (
            "Finds mentions of drug prescriptions, including the dose, "
            "route and frequency."
        ),
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN,
    },
    {
        NKeys.NAME: "diagnosis",
        NKeys.TITLE: "GATE processor: Diagnosis finder",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: (
            "Finds mentions of diagnoses, in words or in coded form."
        ),
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN,
    },
    {
        NKeys.NAME: "blood-pressure",
        NKeys.TITLE: "GATE processor: Blood Pressure",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: "Finds mentions of blood pressure measurements.",
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN,
    },
    {
        NKeys.NAME: "cbt",
        NKeys.TITLE: "GATE processor: Cognitive Behavioural Therapy",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: (
            "Identifies mentions of cases where the patient has attended "
            "CBT sessions."
        ),
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN,
    },
    {
        NKeys.NAME: "lives-alone",
        NKeys.TITLE: "GATE processor: Lives Alone",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: "Identifies if the patient lives alone.",
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN,
    },
    {
        NKeys.NAME: "mmse",
        NKeys.TITLE: "GATE processor: Mini-Mental State Exam Result Extractor",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: (
            "The Mini-Mental State Exam (MMSE) Results Extractor finds the "
            "results of this common dementia screening test within documents "
            "along with the date on which the test was administered."
        ),
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN,
    },
    {
        NKeys.NAME: "bmi",
        NKeys.TITLE: "GATE processor: Body Mass Index",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: "Finds mentions of BMI scores.",
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN,
    },
    {
        NKeys.NAME: "smoking",
        NKeys.TITLE: "GATE processor: Smoking Status Annotator",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: (
            "Identifies instances of smoking being discussed and determines "
            "the status and subject (patient or someone else)."
        ),
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN,
    },
    {
        NKeys.NAME: "ADR",
        NKeys.TITLE: "GATE processor: ADR",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: "Adverse drug event mentions in clinical notes.",
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN,
    },
    {
        NKeys.NAME: "suicide",
        NKeys.TITLE: "GATE processor: Symptom finder - Suicide",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: "App derived from TextHunter project suicide.",
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN,
    },
    {
        NKeys.NAME: "appetite",
        NKeys.TITLE: "GATE processor: Symptom finder - Appetite",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: "Finds markers of good or poor appetite.",
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN,
    },
    {
        NKeys.NAME: "low_mood",
        NKeys.TITLE: "GATE processor: Symptom finder - Low_Mood",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: "App derived from TextHunter project low_mood.",
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN,
    },
]


# =============================================================================
# Convenience method: if you run the file, it prints its results.
# =============================================================================

if __name__ == "__main__":
    import json  # delayed import

    print(json.dumps(PROCESSORS, indent=JSON_INDENT, sort_keys=True))
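
If you maintain several versions of a processor, a sanity check such as the sketch below can flag names for which more than one version is marked as the default, since the default is what a client gets when it does not ask for a particular version. The function is hypothetical and CRATE may or may not enforce this itself. The plain string keys are placeholders; in a real processors file you would pass NKeys.NAME and NKeys.IS_DEFAULT_VERSION instead.

```python
from collections import Counter
from typing import Any, Dict, List


def names_with_multiple_defaults(
    processors: List[Dict[str, Any]],
    name_key: str = "name",  # placeholder; use NKeys.NAME in practice
    default_key: str = "is_default_version",  # placeholder key
) -> List[str]:
    """Return processor names that have more than one default version."""
    default_counts: Counter = Counter()
    for processor in processors:
        if processor.get(default_key):
            default_counts[processor[name_key]] += 1
    return sorted(name for name, count in default_counts.items() if count > 1)


if __name__ == "__main__":
    demo = [
        {"name": "bmi", "version": "0.1", "is_default_version": True},
        {"name": "bmi", "version": "0.2", "is_default_version": True},
        {"name": "mmse", "version": "0.1", "is_default_version": True},
    ]
    print(names_with_multiple_defaults(demo))
```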

7.9.4. crate_nlp_webserver_initialize_db

USAGE: crate_nlp_webserver_initialize_db [-h] config_uri

Tool to initialize the database used by CRATE's implementation of an NLPRP
server.

POSITIONAL ARGUMENTS:
  config_uri  Config file to read (e.g. 'development.ini'); URL of database is
              found here.

OPTIONAL ARGUMENTS:
  -h, --help  show this help message and exit

7.9.5. crate_nlp_webserver_manage_users

usage: crate_nlp_webserver_manage_users [-h]
                                        [--adduser USERNAME PASSWORD | --rmuser USERNAME | --changepw USERNAME PASSWORD]

Manage users for the CRATE nlp_web server.

optional arguments:
  -h, --help            show this help message and exit
  --adduser USERNAME PASSWORD
                        Add a user and associated password.
  --rmuser USERNAME     Remove a user by specifying their username.
  --changepw USERNAME PASSWORD
                        Change a user's password.

# Generated at 2019-09-28 18:29:51

7.9.6. crate_nlp_webserver_generate_encryption_key

Generates a random encryption key and prints it to the screen.

7.9.7. crate_nlp_webserver_pserve

This is the standard Pyramid pserve command. At its most basic, it takes a single parameter, the name of your NLP web server config file, and starts the web server.

Note that its help (provided by Pyramid’s pserve itself) talks about a file URI, which might mislead you into thinking you need something like file:///home/person/blah.ini, but actually it wants a filename, like /home/person/blah.ini.

usage: crate_nlp_webserver_pserve [-h] [-n NAME] [-s SERVER_TYPE]
                                  [--server-name SECTION_NAME] [--reload]
                                  [--reload-interval RELOAD_INTERVAL] [-b]
                                  [-v] [-q]
                                  [config_uri] [config_vars [config_vars ...]]

This command serves a web application that uses a PasteDeploy
configuration file for the server and application.

You can also include variable assignments like 'http_port=8080'
and then use %(http_port)s in your config files.

positional arguments:
  config_uri            The URI to the configuration file.
  config_vars           Variables required by the config file. For example,
                        `http_port=%(http_port)s` would expect
                        `http_port=8080` to be passed here.

optional arguments:
  -h, --help            show this help message and exit
  -n NAME, --app-name NAME
                        Load the named application (default main)
  -s SERVER_TYPE, --server SERVER_TYPE
                        Use the named server.
  --server-name SECTION_NAME
                        Use the named server as defined in the configuration
                        file (default: main)
  --reload              Use auto-restart file monitor
  --reload-interval RELOAD_INTERVAL
                        Seconds between checking files (low number can cause
                        significant CPU usage)
  -b, --browser         Open a web browser to the server url. The server url
                        is determined from the 'open_url' setting in the
                        'pserve' section of the configuration file.
  -v, --verbose         Set verbose level (default 1)
  -q, --quiet           Suppress verbose output
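
The variable substitution described in the help text can be mimicked with Python's standard configparser, as in the sketch below. This only illustrates the %(name)s syntax; pserve itself loads the file through PasteDeploy, so details may differ. The function name and the config text are assumptions for illustration.

```python
import configparser

CONFIG_TEXT = """
[server:main]
use = egg:waitress#main
listen = localhost:%(http_port)s
"""


def resolve_listen(config_text: str, **config_vars: str) -> str:
    """Return the 'listen' value with %(name)s variables substituted."""
    # Variables supplied as defaults are available for interpolation
    # in every section.
    parser = configparser.ConfigParser(defaults=config_vars)
    parser.read_string(config_text)
    return parser.get("server:main", "listen")


if __name__ == "__main__":
    # Equivalent in spirit to passing http_port=8080 on the command line.
    print(resolve_listen(CONFIG_TEXT, http_port="8080"))
```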

7.9.8. crate_nlp_webserver_launch_gunicorn

This is the preferred alternative to crate_nlp_webserver_pserve for launching the CRATE NLP web server via Gunicorn (it stops Gunicorn complaining but otherwise does the same thing).

USAGE: crate_nlp_webserver_launch_gunicorn [-h] [--crate_config CRATE_CONFIG]

Launch CRATE NLP web server via Gunicorn. (Any leftover arguments will be
passed to Gunicorn.)

OPTIONAL ARGUMENTS:
  -h, --help            show this help message and exit
  --crate_config CRATE_CONFIG
                        CRATE NLP web server config file (default is read from
                        environment variable CRATE_NLP_WEB_CONFIG)
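
The documented fallback (an explicit --crate_config value, otherwise the CRATE_NLP_WEB_CONFIG environment variable) amounts to something like the following sketch. The function is hypothetical and is not taken from CRATE's source.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "CRATE_NLP_WEB_CONFIG"


def resolve_config(
    crate_config: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Prefer the explicit argument; otherwise use the environment variable."""
    if crate_config:
        return crate_config
    return environ.get(ENV_VAR)


if __name__ == "__main__":
    # Prints the environment variable's value, or None if it is not set.
    print(resolve_config())
```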

7.9.9. crate_nlp_webserver_launch_celery

This launches the Celery back-end job controller for the CRATE NLP web server. It needs to be running for your NLP web server to do any proper work!

USAGE: crate_nlp_webserver_launch_celery [-h] [--command COMMAND]
                                         [--cleanup_timeout_s CLEANUP_TIMEOUT_S]
                                         [--debug]

Launch CRATE NLP web server Celery processes. (Any leftover arguments will be
passed to Celery.)

OPTIONAL ARGUMENTS:
  -h, --help            show this help message and exit
  --command COMMAND     Celery command (default: worker)
  --cleanup_timeout_s CLEANUP_TIMEOUT_S
                        Time to wait when shutting down Celery via Ctrl-C
                        (default: 10.0)
  --debug               Ask Celery to be verbose (default: False)

7.9.10. crate_nlp_webserver_launch_flower

This command has no options. It launches the Celery Flower tool, which is for monitoring Celery, and associates it with the CRATE NLP web server. It starts a local web server (by default on port 5555; see TCP/IP ports); if you browse to http://localhost:5555/ or http://127.0.0.1:5555/, you can monitor what’s happening.

7.9.11. Internal operations: where is your data stored?

  • CRATE’s NLP web server uses Redis to store web sessions (for user/session authentication). No content is stored here.

  • It uses Celery for back-end jobs.

    • Celery is configured with a broker and a backend.

    • The broker is a messaging system, such as RabbitMQ via AMQP.

    • The backend is typically a database of jobs, where Celery would normally store job results. CRATE does not use it for that purpose; it stores results in a separate database (which holds, transiently, the potentially confidential incoming client information and outgoing NLP results).

    • If you want, the Celery backend database can be the same as your main CRATE NLP server database (Celery uses tables named celery_taskmeta and celery_tasksetmeta; these do not conflict with CRATE’s NLP server table names).

  • All client data and all NLP results are stored in a single database.


Footnotes

1

CRATE then defines paste.app_factory in its setup.py, which allows PasteDeploy to find the actual WSGI app factory.