7.8. NLPRP web server

This is CRATE’s implementation of a full NLPRP web server. To use it:

  1. Make sure you have the necessary other software installed and running, including Redis and (if you wish to use it as your Celery broker) RabbitMQ, plus a database such as MySQL.
  2. Create a blank database, for storing documents and processing requests transiently.
  3. Create a blank text file to contain details of your users (with their encrypted passwords).
  4. Create a processor definition file with crate_nlp_webserver_print_demo --processors. Edit it.
  5. Create a config file with crate_nlp_webserver_print_demo --config. Edit it: point it to the database(s), the users file and the processors file, and set an encryption key (e.g. with crate_nlp_webserver_generate_encryption_key). For more details, see below.
  6. Initialize your empty database with crate_nlp_webserver_initialize_db, pointing it at your config file.
  7. Add a test user with crate_nlp_webserver_manage_users.
  8. Launch the web server, e.g. via crate_nlp_webserver_pserve or crate_nlp_webserver_launch_gunicorn.
  9. Launch the Celery workers with crate_nlp_webserver_launch_celery.
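Steps 4 to 7 above are one-off setup commands, so they can be collected into a dry-run plan. The sketch below only prints the commands; it runs nothing. It uses only the flags shown in each tool's usage text, and the paths and test credentials are hypothetical. One gap it does not fill: the usage text for crate_nlp_webserver_manage_users takes no config-file argument, so it does not show how that tool finds your users file. Check this before relying on the plan.

```python
import shlex
from pathlib import Path
from typing import List, Optional, Tuple

# (argv, file to capture stdout into, or None)
Step = Tuple[List[str], Optional[Path]]


def setup_plan(config_file: Path, processors_file: Path,
               username: str, password: str) -> List[Step]:
    """Return the one-off setup commands (steps 4 to 7) in order."""
    return [
        (["crate_nlp_webserver_print_demo", "--processors"], processors_file),
        (["crate_nlp_webserver_print_demo", "--config"], config_file),
        # Paste the printed key into encryption_key in the config file.
        (["crate_nlp_webserver_generate_encryption_key"], None),
        # Edit both generated files before reaching this point.
        (["crate_nlp_webserver_initialize_db", str(config_file)], None),
        (["crate_nlp_webserver_manage_users", "--adduser", username, password],
         None),
    ]


def as_shell(step: Step) -> str:
    """Render one step as a shell command line, with output redirection."""
    argv, outfile = step
    command = shlex.join(argv)
    if outfile is not None:
        command += " > " + shlex.quote(str(outfile))
    return command


if __name__ == "__main__":
    # Hypothetical paths and credentials, for illustration only.
    plan = setup_plan(Path("/srv/nlp_web/config.ini"),
                      Path("/srv/nlp_web/processor_constants.py"),
                      "testuser", "testpassword")
    for step in plan:
        print(as_shell(step))
```

Steps 8 and 9 (launching the web server and the Celery workers) start long-running processes, so they are left out of the plan.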

To test it, set up your NLP client for a cloud processor, point it at your server, and try some NLP. Suppose your NLP definition is called cloud_nlp_demo:

# Show what the server's offering:
crate_nlp --nlpdef cloud_nlp_demo --verbose --print_cloud_processors

# Run without queuing:
crate_nlp --nlpdef cloud_nlp_demo --verbose --full --cloud --immediate

# Run with queuing:
crate_nlp --nlpdef cloud_nlp_demo --verbose --full --cloud
crate_nlp --nlpdef cloud_nlp_demo --verbose --full --showqueue
# crate_nlp --nlpdef cloud_nlp_demo --verbose --full --cancelrequest
# crate_nlp --nlpdef cloud_nlp_demo --verbose --full --cancelall
crate_nlp --nlpdef cloud_nlp_demo --verbose --full --retrieve

7.8.1. crate_nlp_webserver_print_demo

Prints a demo NLP web server config.

usage: crate_nlp_webserver_print_demo [-h] [--config | --processors]

Print demo config file or demo processor constants file for server side cloud nlp.

optional arguments:
  -h, --help    show this help message and exit
  --config      Print a demo config file for server side cloud nlp.
  --processors  Print a demo processor constants file for server side cloud
                nlp.

# Generated at 2019-10-10 10:23:34

7.8.2. Config file format

The NLP web server’s config file is a PasteDeploy file. This system is used to define WSGI applications and servers.

Here’s a specimen config file:

# This is a "paste" configuration file for the CRATE NLPRP web server.

# =============================================================================
# The CRATE NLPRP server web application 
# =============================================================================

[app:main]

use = egg:crate_anon#main
pyramid.reload_templates = true
# pyramid.includes =
#     pyramid_debugtoolbar

nlp_webserver.secret = changethis
sqlalchemy.url = mysql://username:password@localhost/dbname?charset=utf8

# Absolute path of users file
users_file = /home/.../nlp_web_files/users.txt

# Absolute path of processors file - this must be a .py file in the correct
# format
processors_path = /home/.../nlp_web_files/processor_constants.py

# URLs for queueing
broker_url = amqp://localhost/
backend_url = db+mysql://username:password@localhost/backenddbname?charset=utf8

# Key for reversible encryption. Use 'crate_nlp_webserver_generate_encryption_key'.
encryption_key =

# =============================================================================
# The web server software
# =============================================================================

[server:main]

use = egg:waitress#main
listen = localhost:6543


# Generated at 2019-10-10 10:23:35
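A PasteDeploy file is INI-style, so Python's standard configparser can read it for a quick sanity check before you launch anything. This is a sketch, not CRATE's loader. PasteDeploy itself interprets `use = egg:...` and variable substitution, and configparser does not. The list of "required" options is also an assumption: it simply collects the settings below that have no documented default.

```python
import configparser

# Settings from the specimen above (comments dropped; "..." paths kept as-is).
SPECIMEN = """\
[app:main]
use = egg:crate_anon#main
pyramid.reload_templates = true
nlp_webserver.secret = changethis
sqlalchemy.url = mysql://username:password@localhost/dbname?charset=utf8
users_file = /home/.../nlp_web_files/users.txt
processors_path = /home/.../nlp_web_files/processor_constants.py
broker_url = amqp://localhost/
backend_url = db+mysql://username:password@localhost/backenddbname?charset=utf8
encryption_key =

[server:main]
use = egg:waitress#main
listen = localhost:6543
"""

# Options without a documented default; treating them as required is an assumption.
REQUIRED = ("nlp_webserver.secret", "sqlalchemy.url", "users_file",
            "processors_path", "broker_url", "encryption_key")


def read_paste_ini(text: str) -> configparser.ConfigParser:
    """Parse INI text; interpolation is off so any %(...)s is left alone."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser


def config_problems(parser: configparser.ConfigParser) -> list:
    """List missing or empty required options, plus the demo secret."""
    app = parser["app:main"]
    problems = [f"{key} is not set" for key in REQUIRED
                if not app.get(key, "").strip()]
    if app.get("nlp_webserver.secret") == "changethis":
        problems.append("nlp_webserver.secret is still the demo value")
    return problems


if __name__ == "__main__":
    for problem in config_problems(read_paste_ini(SPECIMEN)):
        print(problem)
```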

7.8.2.1. Application section

The [app:main] section defines an application named main, which is the default name. Options within this section are provided as keyword arguments to the WSGI factory; see crate_anon.nlp_webserver.wsgi_app.make_wsgi_app() (and its settings argument) to see how this works.

These options include:

  1. use, which is a PasteDeploy setting to say where the code for the WSGI application lives. For CRATE’s NLP server, this should be egg:crate_anon or egg:crate_anon#main [1].
  2. Pyramid settings, such as pyramid.reload_templates;
  3. CRATE NLP web server settings, as follows.

7.8.2.1.1. nlp_webserver.secret

String.

A secret key for cookies (see Pyramid AuthTktAuthenticationPolicy; make one using crate_nlp_webserver_generate_encryption_key).
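The documented way to make this secret is crate_nlp_webserver_generate_encryption_key. If you just need a random string for signing cookies, the sketch below uses Python's secrets module instead. It is an alternative suggestion only. Do not assume its output format suits the separate encryption_key setting.

```python
import secrets


def make_cookie_secret(nbytes: int = 32) -> str:
    """Return a random, URL-safe string built from nbytes of randomness."""
    return secrets.token_urlsafe(nbytes)


if __name__ == "__main__":
    print(f"nlp_webserver.secret = {make_cookie_secret()}")
```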

7.8.2.1.2. sqlalchemy.url

String.

The SQLAlchemy URL to your database; see database URLs.

Other SQLAlchemy parameters also work; all begin with sqlalchemy. For example, sqlalchemy.echo = True enables a debugging feature where all SQL is echoed.
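A common way to handle options that share the sqlalchemy. prefix is to gather them into engine arguments and strip the prefix. SQLAlchemy's own engine_from_config(settings, prefix="sqlalchemy.") does this, and it also converts string values such as "True" to proper types. The stdlib-only sketch below shows just the prefix handling. This document does not say whether CRATE calls engine_from_config, so treat that as an assumption.

```python
from typing import Dict


def sqlalchemy_settings(settings: Dict[str, str],
                        prefix: str = "sqlalchemy.") -> Dict[str, str]:
    """Return the options that start with prefix, with the prefix removed."""
    return {key[len(prefix):]: value
            for key, value in settings.items()
            if key.startswith(prefix)}


if __name__ == "__main__":
    app_settings = {
        "sqlalchemy.url": "mysql://username:password@localhost/dbname?charset=utf8",
        "sqlalchemy.echo": "True",
        "broker_url": "amqp://localhost/",  # not a SQLAlchemy option: ignored
    }
    print(sqlalchemy_settings(app_settings))
```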

7.8.2.1.3. users_file

String.

The path to your user definition file; see crate_nlp_webserver_manage_users.

7.8.2.1.4. processors_path

String.

The path to your processor definition file; see Processors file format.

7.8.2.1.5. broker_url

String.

The URL to your Celery broker server, e.g. via AMQP, for back-end processing.

7.8.2.1.6. backend_url

String. Default: None.

The URL to your Celery backend database, used to store queuing information. For the format, see Celery database URL examples.

  • You can ignore this, as it is not necessary to configure a backend for Celery, since results are stored elsewhere. See Internals.
  • If you do want to enable a backend: you can use the same database as above, if you wish, or you can create a separate database for Celery.
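If you reuse your main database as the Celery backend, the backend URL is your sqlalchemy.url with Celery's db+ prefix added. This matches the form of backend_url in the specimen config. Here is a minimal helper, assuming Celery accepts the rest of your SQLAlchemy URL unchanged:

```python
def celery_db_backend_url(sqlalchemy_url: str) -> str:
    """Convert an SQLAlchemy URL to a Celery database-backend URL ("db+..." form)."""
    if sqlalchemy_url.startswith("db+"):
        return sqlalchemy_url  # already in Celery's form
    return "db+" + sqlalchemy_url


if __name__ == "__main__":
    print(celery_db_backend_url(
        "mysql://username:password@localhost/dbname?charset=utf8"))
```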

7.8.2.1.7. encryption_key

String.

A secret key used for password encryption in the users file. You can make one with crate_nlp_webserver_generate_encryption_key.

7.8.2.1.8. redis_host

String. Default: localhost.

Host for the Redis database.

7.8.2.1.9. redis_port

Integer. Default: 6379.

Port for Redis.

7.8.2.1.10. redis_password

String. Default: None.

Password for Redis.

7.8.2.1.11. redis_db_number

Integer. Default: 0.

Database number for Redis.
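All four Redis options have defaults, and config values arrive as strings. The sketch below applies those defaults and converts the types. It assumes the settings are available as a plain string mapping; this document does not show how CRATE reads them internally.

```python
from typing import Mapping, NamedTuple, Optional


class RedisSettings(NamedTuple):
    host: str
    port: int
    password: Optional[str]
    db_number: int


def redis_settings(app: Mapping[str, str]) -> RedisSettings:
    """Apply the documented defaults to the redis_* options."""
    return RedisSettings(
        host=app.get("redis_host", "localhost"),
        port=int(app.get("redis_port", "6379")),
        # Assumption: an empty redis_password means "no password".
        password=app.get("redis_password") or None,
        db_number=int(app.get("redis_db_number", "0")),
    )


if __name__ == "__main__":
    print(redis_settings({"redis_port": "6380"}))
```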

7.8.2.2. Web server section

The [server:main] section defines the web server configuration for the app named main.

  • The use setting determines which web server should be used.
  • Other parameters are passed to the web server in use.

Examples include:

  • Waitress:

    [server:main]
    use = egg:waitress#main
    # ... alternative: use = egg:crate_anon#waitress
    listen = localhost:6543
    

    For arguments, see usage and Arguments to waitress.serve.

  • CherryPy:

    [server:main]
    use = egg:crate_anon#cherrypy
    server.socket_host = 127.0.0.1
    server.socket_port = 8080
    

    For arguments, see CherryPy: Configure.

  • Gunicorn (Linux only):

    [server:main]
    use = egg:gunicorn#main
    bind = localhost:6543
    workers = 4
    # certfile = /etc/ssl/certs/ca-certificates.crt
    # ssl_version = 5
    

    For arguments, see Gunicorn: Settings.
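To see which options a server will receive, you can split a [server:main] section into its use value and everything else. This is a configparser sketch, not PasteDeploy's own loader. The values come back as strings, and each server decides how to interpret them.

```python
import configparser
from typing import Dict, Tuple

GUNICORN_EXAMPLE = """\
[server:main]
use = egg:gunicorn#main
bind = localhost:6543
workers = 4
# certfile = /etc/ssl/certs/ca-certificates.crt
# ssl_version = 5
"""


def server_options(ini_text: str,
                   section: str = "server:main") -> Tuple[str, Dict[str, str]]:
    """Return (use, other options) for one [server:...] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    options = dict(parser[section])
    use = options.pop("use")  # selects the server entry point
    return use, options


if __name__ == "__main__":
    print(server_options(GUNICORN_EXAMPLE))
```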

7.8.3. Processors file format

This is a Python file whose job is to define the PROCESSORS variable. This is a list of dictionaries in the format shown below. Each dictionary defines a processor’s:

  • name;
  • descriptive title;
  • version string;
  • whether this is the default version (used when the client doesn’t ask for a particular version);
  • processor type (e.g. GATE, CRATE);
  • schema (database table) information, if known.

As you will see below, CRATE does all this work for you, for its own processors, via crate_anon.nlp_manager.all_processors.all_crate_python_processors_nlprp_processor_info().

Specimen processors file:

#!/usr/bin/env python

"""
Autogenerated NLP processor definition file, to be imported by the CRATE
NLPRP web server. The PROCESSORS variable is the one of interest.
"""

# =============================================================================
# Imports
# =============================================================================

from crate_anon.nlp_manager.all_processors import (
    all_crate_python_processors_nlprp_processor_info
)
from crate_anon.nlprp.constants import NlprpValues, NlprpKeys as NKeys
from crate_anon.nlp_webserver.constants import (
    KEY_PROCTYPE,
    PROCTYPE_GATE,
)


# =============================================================================
# Processor definitions
# =============================================================================

# GATE processors correct as of 19/04/2019 for KCL server.
# Python processors are automatic, as below.

PROCESSORS = all_crate_python_processors_nlprp_processor_info() + [
    # -------------------------------------------------------------------------
    # GATE processors
    # -------------------------------------------------------------------------
    {
        NKeys.NAME: "medication",
        NKeys.TITLE: "GATE processor: Medication tagger",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: "Finds mentions of drug prescriptions, "
                           "including the dose, route and frequency.",
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN
    },
    {
        NKeys.NAME: "diagnosis",
        NKeys.TITLE: "GATE processor: Diagnosis finder",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: "Finds mentions of diagnoses, in words or "
                           "in coded form.",
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN
    },
    {
        NKeys.NAME: "blood-pressure",
        NKeys.TITLE: "GATE processor: Blood Pressure",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: "Finds mentions of blood pressure measurements.",
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN
    },
    {
        NKeys.NAME: "cbt",
        NKeys.TITLE: "GATE processor: Cognitive Behavioural Therapy",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: "Identifies mentions of cases where the patient "
                           "has attended CBT sessions.",
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN
    },
    {
        NKeys.NAME: "lives-alone",
        NKeys.TITLE: "GATE processor: Lives Alone",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: "Identifies if the patient lives alone.",
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN
    },
    {
        NKeys.NAME: "mmse",
        NKeys.TITLE: "GATE processor: Mini-Mental State Exam Result Extractor",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: "The Mini-Mental State Exam (MMSE) Results "
                           "Extractor finds the results of this common "
                           "dementia screening test within documents along "
                           "with the date on which the test was administered.",
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN
    },
    {
        NKeys.NAME: "bmi",
        NKeys.TITLE: "GATE processor: Body Mass Index",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: "Finds mentions of BMI scores.",
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN
    },
    {
        NKeys.NAME: "smoking",
        NKeys.TITLE: "GATE processor: Smoking Status Annotator",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: "Identifies instances of smoking being discussed "
                           "and determines the status and subject (patient or "
                           "someone else).",
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN
    },
    {
        NKeys.NAME: "ADR",
        NKeys.TITLE: "GATE processor: ADR",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: "Adverse drug event mentions in clinical notes.",
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN
    },
    {
        NKeys.NAME: "suicide",
        NKeys.TITLE: "GATE processor: Symptom finder - Suicide",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: "App derived from TextHunter project suicide.",
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN
    },
    {
        NKeys.NAME: "appetite",
        NKeys.TITLE: "GATE processor: Symptom finder - Appetite",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: "Finds markers of good or poor appetite.",
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN
    },
    {
        NKeys.NAME: "low_mood",
        NKeys.TITLE: "GATE processor: Symptom finder - Low_Mood",
        NKeys.VERSION: "0.1",
        NKeys.IS_DEFAULT_VERSION: True,
        NKeys.DESCRIPTION: "App derived from TextHunter project low_mood.",
        KEY_PROCTYPE: PROCTYPE_GATE,
        NKeys.SCHEMA_TYPE: NlprpValues.UNKNOWN
    },
]


# =============================================================================
# Convenience method: if you run the file, it prints its results.
# =============================================================================

if __name__ == "__main__":
    import json  # delayed import
    print(json.dumps(PROCESSORS, indent=4, sort_keys=True))


# Generated at 2019-10-10 10:23:36
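Before pointing the server at a processors file, it can help to check the list for duplicate (name, version) pairs and ambiguous defaults. The sketch below does this under two assumptions. First, the plain key strings stand in for the NKeys constants; in a real file, use NKeys.NAME, NKeys.VERSION and NKeys.IS_DEFAULT_VERSION. Second, each processor name should have exactly one default version, so that a request without a version is unambiguous.

```python
from collections import Counter
from typing import Any, Dict, List

# Hypothetical stand-ins for NKeys.NAME, NKeys.VERSION, NKeys.IS_DEFAULT_VERSION.
NAME, VERSION, IS_DEFAULT = "name", "version", "is_default_version"


def check_processors(processors: List[Dict[str, Any]]) -> List[str]:
    """Report duplicate (name, version) pairs and names without exactly one default."""
    problems = []
    pairs = Counter((p[NAME], p[VERSION]) for p in processors)
    for (name, version), count in sorted(pairs.items()):
        if count > 1:
            problems.append(f"{name} {version}: defined {count} times")
    defaults = Counter(p[NAME] for p in processors if p.get(IS_DEFAULT))
    for name in sorted({p[NAME] for p in processors}):
        if defaults[name] != 1:
            problems.append(f"{name}: {defaults[name]} default versions (expected 1)")
    return problems


if __name__ == "__main__":
    demo = [{NAME: "bmi", VERSION: "0.1", IS_DEFAULT: True},
            {NAME: "bmi", VERSION: "0.2", IS_DEFAULT: True}]
    print(check_processors(demo))
```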

7.8.4. crate_nlp_webserver_initialize_db

usage: crate_nlp_webserver_initialize_db [-h] config_uri

Tool to initialize the database used by CRATE's implementation of an NLPRP
server.

positional arguments:
  config_uri  Config file to read (e.g. 'development.ini'); URL of database is
              found here.

optional arguments:
  -h, --help  show this help message and exit

# Generated at 2019-10-10 10:23:33

7.8.5. crate_nlp_webserver_manage_users

usage: crate_nlp_webserver_manage_users [-h]
                                        [--adduser USERNAME PASSWORD | --rmuser USERNAME | --changepw USERNAME PASSWORD]

Manage users for the CRATE nlp_web server.

optional arguments:
  -h, --help            show this help message and exit
  --adduser USERNAME PASSWORD
                        Add a user and associated password.
  --rmuser USERNAME     Remove a user by specifying their username.
  --changepw USERNAME PASSWORD
                        Change a user's password.

# Generated at 2019-09-28 18:29:51

7.8.6. crate_nlp_webserver_generate_encryption_key

Generates a random encryption key and prints it to the screen.

7.8.7. crate_nlp_webserver_pserve

This is the standard Pyramid pserve command. At its most basic, it takes a single parameter, the name of your NLP web server config file, and starts the web server.

usage: crate_nlp_webserver_pserve [-h] [-n NAME] [-s SERVER_TYPE]
                                  [--server-name SECTION_NAME] [--reload]
                                  [--reload-interval RELOAD_INTERVAL] [-b]
                                  [-v] [-q]
                                  [config_uri] [config_vars [config_vars ...]]

This command serves a web application that uses a PasteDeploy
configuration file for the server and application.

You can also include variable assignments like 'http_port=8080'
and then use %(http_port)s in your config files.

positional arguments:
  config_uri            The URI to the configuration file.
  config_vars           Variables required by the config file. For example,
                        `http_port=%(http_port)s` would expect
                        `http_port=8080` to be passed here.

optional arguments:
  -h, --help            show this help message and exit
  -n NAME, --app-name NAME
                        Load the named application (default main)
  -s SERVER_TYPE, --server SERVER_TYPE
                        Use the named server.
  --server-name SECTION_NAME
                        Use the named server as defined in the configuration
                        file (default: main)
  --reload              Use auto-restart file monitor
  --reload-interval RELOAD_INTERVAL
                        Seconds between checking files (low number can cause
                        significant CPU usage)
  -b, --browser         Open a web browser to the server url. The server url
                        is determined from the 'open_url' setting in the
                        'pserve' section of the configuration file.
  -v, --verbose         Set verbose level (default 1)
  -q, --quiet           Suppress verbose output

# Generated at 2019-10-10 10:23:37
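The config_vars mechanism described in the help text is %-style substitution: a name=value pair on the command line fills in %(name)s in the config file. Python's configparser uses the same %(name)s syntax, so it can illustrate the idea. This is an analogy under that assumption, not PasteDeploy's own loader.

```python
import configparser
from typing import Dict, Iterable


def parse_assignments(args: Iterable[str]) -> Dict[str, str]:
    """Turn command-line style 'name=value' strings into a dict."""
    return dict(arg.split("=", 1) for arg in args)


def expand_option(ini_text: str, config_vars: Dict[str, str],
                  section: str, option: str) -> str:
    """Read one option, substituting %(name)s from config_vars."""
    # ConfigParser uses BasicInterpolation (%(name)s) by default.
    parser = configparser.ConfigParser(defaults=config_vars)
    parser.read_string(ini_text)
    return parser.get(section, option)


if __name__ == "__main__":
    text = "[server:main]\nuse = egg:waitress#main\nlisten = localhost:%(http_port)s\n"
    print(expand_option(text, parse_assignments(["http_port=8080"]),
                        "server:main", "listen"))
```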

7.8.8. crate_nlp_webserver_launch_gunicorn

This is the preferred alternative to crate_nlp_webserver_pserve for launching the CRATE NLP web server via Gunicorn. It does the same thing, but stops Gunicorn from complaining.

usage: crate_nlp_webserver_launch_gunicorn [-h] [--crate_config CRATE_CONFIG]

Launch CRATE NLP web server via Gunicorn. (Any leftover arguments will be
passed to Gunicorn.)

optional arguments:
  -h, --help            show this help message and exit
  --crate_config CRATE_CONFIG
                        CRATE NLP web server config file (default is read from
                        environment variable CRATE_NLP_WEB_CONFIG)

# Generated at 2019-10-10 10:23:38
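The --crate_config help text says the config file defaults to the value of the CRATE_NLP_WEB_CONFIG environment variable. That precedence can be sketched as below. The error raised when neither is given is an assumption, not documented behaviour.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "CRATE_NLP_WEB_CONFIG"  # named in the --crate_config help text


def resolve_crate_config(cli_value: Optional[str],
                         environ: Mapping[str, str] = os.environ) -> str:
    """Prefer the --crate_config value; otherwise use the environment variable."""
    if cli_value:
        return cli_value
    try:
        return environ[ENV_VAR]
    except KeyError:
        raise SystemExit(f"No config: pass --crate_config or set {ENV_VAR}") from None


if __name__ == "__main__":
    print(resolve_crate_config(None, {ENV_VAR: "/srv/nlp_web/config.ini"}))
```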

7.8.9. crate_nlp_webserver_launch_celery

This launches the Celery back-end job controller for the CRATE NLP web server. It needs to be running for your NLP web server to do any proper work!

usage: crate_nlp_webserver_launch_celery [-h] [--command COMMAND] [--debug]

Launch CRATE NLP web server Celery processes. (Any leftover arguments will be
passed to Celery.)

optional arguments:
  -h, --help         show this help message and exit
  --command COMMAND  Celery command (default: worker)
  --debug            Ask Celery to be verbose (default: False)

# Generated at 2019-10-10 10:23:40

7.8.10. crate_nlp_webserver_launch_flower

This command has no options. It launches the Celery Flower tool, which is for monitoring Celery, and associates it with the CRATE NLP web server. It starts a local web server (by default on port 5555; see TCP/IP ports); if you browse to http://localhost:5555/ or http://127.0.0.1:5555/, you can monitor what’s happening.

7.8.11. Internal operations: where is your data stored?

  • CRATE’s NLP web server uses Redis to store web sessions (for user/session authentication). No content is stored here.
  • It uses Celery for back-end jobs.
    • Celery is configured with a broker and a backend.
    • The broker is a messaging system, such as RabbitMQ via AMQP.
    • The backend is typically a database of jobs. Celery normally stores job results there, but CRATE does not use it for NLP results. Instead, CRATE uses a separate database to store, transiently, the potentially confidential incoming client information and the outgoing NLP results.
    • If you want, the Celery backend database can be the same as your main CRATE NLP server database (Celery uses tables named celery_taskmeta and celery_tasksetmeta; these do not conflict with CRATE’s NLP server table names).
  • All client data and all NLP results are stored in a single database.

Footnotes

[1] CRATE then defines paste.app_factory in its setup.py, which allows PasteDeploy to find the actual WSGI app factory.