.. crate_anon/docs/source/nlp/nlp_webserver.rst

.. Copyright (C) 2015, University of Cambridge, Department of Psychiatry.
   Created by Rudolf Cardinal (rnc1001@cam.ac.uk).
   .
   This file is part of CRATE.
   .
   CRATE is free software: you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation, either version 3 of the License, or
   (at your option) any later version.
   .
   CRATE is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
   GNU General Public License for more details.
   .
   You should have received a copy of the GNU General Public License
   along with CRATE. If not, see .

.. _AMQP: http://www.amqp.org
.. _Celery: http://www.celeryproject.org
.. _CherryPy: https://cherrypy.org
.. _Flower: http://flower.readthedocs.io/
.. _Gunicorn: https://gunicorn.org
.. _MySQL: https://www.mysql.com/
.. _Paste: https://pythonpaste.readthedocs.io/
.. _PasteDeploy: https://pastedeploy.readthedocs.io
.. _pserve: https://docs.pylonsproject.org/projects/pyramid/en/latest/pscripts/pserve.html
.. _RabbitMQ: https://www.rabbitmq.com
.. _Redis: https://redis.io
.. _Waitress: https://docs.pylonsproject.org/projects/waitress/
.. _WSGI: https://en.wikipedia.org/wiki/Web_Server_Gateway_Interface
.. _nlp_webserver:
NLPRP web server
----------------
This is CRATE's implementation of a full :ref:`NLPRP ` web server. To
use it:

#. Make sure you have the necessary other software installed and running,
   including Redis_ and (if you wish to use it as your Celery broker)
   RabbitMQ_, plus a database such as MySQL_.
#. Create a blank database for storing documents and processing requests
   transiently.
#. Create a blank text file to contain details of your users (with their
   encrypted passwords).
#. Create a processor definition file with crate_nlp_webserver_print_demo_.
   Edit it.
#. Create a config file with crate_nlp_webserver_print_demo_. Edit it,
   including pointing it to the database(s), the users file, and the
   processors file, and setting an encryption key (e.g. with
   crate_nlp_webserver_generate_encryption_key_). For more details, see below.
#. Initialize your empty database with crate_nlp_webserver_initialize_db_,
   pointing it at your config file.
#. Add a test user with crate_nlp_webserver_manage_users_.
#. Launch the web server, e.g. via crate_nlp_webserver_pserve_ or
   crate_nlp_webserver_launch_gunicorn_.
#. Launch the Celery workers with crate_nlp_webserver_launch_celery_.

To test it, set up your NLP client for a :ref:`cloud processor
`, point it at your server, and try some NLP.
Suppose your :ref:`NLP definition ` is called
``cloud_nlp_demo``:

.. code-block:: bash

    # Show what the server's offering:
    crate_nlp --nlpdef cloud_nlp_demo --verbose --print_cloud_processors

    # Run without queuing:
    crate_nlp --nlpdef cloud_nlp_demo --verbose --full --cloud --immediate

    # Run with queuing:
    crate_nlp --nlpdef cloud_nlp_demo --verbose --full --cloud
    crate_nlp --nlpdef cloud_nlp_demo --verbose --full --showqueue
    # Optional cancellation commands:
    # crate_nlp --nlpdef cloud_nlp_demo --verbose --full --cancelrequest
    # crate_nlp --nlpdef cloud_nlp_demo --verbose --full --cancelall
    crate_nlp --nlpdef cloud_nlp_demo --verbose --full --retrieve

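
If you want to script the queued workflow (for example, from a scheduled job),
here is a minimal Python sketch. It only assembles the ``crate_nlp`` commands
shown above, with the same flags, and either prints them (the default "dry
run") or runs them via ``subprocess``. The helper names are hypothetical and
not part of CRATE; in practice you may need to wait for the server to finish
processing before ``--retrieve`` returns everything.

.. code-block:: python

    import shlex
    import subprocess
    from typing import List

    # Flags common to the queued-workflow commands shown above.
    COMMON_FLAGS = ["--verbose", "--full"]


    def queued_workflow_commands(nlpdef: str) -> List[List[str]]:
        """Return the submit / show-queue / retrieve commands, in order."""
        base = ["crate_nlp", "--nlpdef", nlpdef]
        return [
            base + COMMON_FLAGS + ["--cloud"],      # submit requests to the queue
            base + COMMON_FLAGS + ["--showqueue"],  # see what is queued
            base + COMMON_FLAGS + ["--retrieve"],   # fetch results when ready
        ]


    def run_workflow(nlpdef: str, dry_run: bool = True) -> List[str]:
        """Print (dry run) or execute each command; return them as strings."""
        as_text = []
        for cmd in queued_workflow_commands(nlpdef):
            as_text.append(shlex.join(cmd))
            if dry_run:
                print(as_text[-1])
            else:
                subprocess.run(cmd, check=True)  # stop at the first failure
        return as_text


    if __name__ == "__main__":
        run_workflow("cloud_nlp_demo")  # dry run: just prints the commands
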
.. _crate_nlp_webserver_print_demo:
crate_nlp_webserver_print_demo
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Prints a demo NLP web server config.

.. literalinclude:: _crate_nlp_webserver_print_demo_help.txt
    :language: none

Config file format
~~~~~~~~~~~~~~~~~~
The NLP web server's config file is a PasteDeploy_ file. This system is used to
define WSGI_ applications and servers.
Here's a specimen config file:

.. literalinclude:: _nlp_webserver_demo_config.ini
    :language: ini

Application section
^^^^^^^^^^^^^^^^^^^
The ``[app:main]`` section defines an *application* named *main*, which is
the default name. Options within this section are passed as keyword arguments
to the WSGI factory; see
:func:`crate_anon.nlp_webserver.wsgi_app.make_wsgi_app` (and its ``settings``
argument) for how this works.
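
To illustrate the mechanism, here is a minimal sketch of a PasteDeploy-style
app factory: it is called with the global config plus, as keyword arguments,
every option from ``[app:main]``, each as a string. It is a stand-alone
illustration (``demo_app_factory`` is a hypothetical name), not CRATE's
``make_wsgi_app``; it simply echoes the ``redis_port`` setting to show where
the values end up.

.. code-block:: python

    from typing import Any, Callable, Dict, Iterable, List, Tuple
    from wsgiref.util import setup_testing_defaults

    StartResponse = Callable[[str, List[Tuple[str, str]]], Any]


    def demo_app_factory(global_config: Dict[str, str], **settings: str):
        """PasteDeploy-style factory: [app:main] options arrive as strings."""
        # Convert types yourself; 6379 is the documented default redis_port.
        redis_port = int(settings.get("redis_port", "6379"))

        def app(environ: Dict[str, Any],
                start_response: StartResponse) -> Iterable[bytes]:
            body = f"redis_port={redis_port}\n".encode("utf-8")
            start_response("200 OK", [("Content-Type", "text/plain"),
                                      ("Content-Length", str(len(body)))])
            return [body]

        return app


    if __name__ == "__main__":
        application = demo_app_factory({}, redis_port="6380")
        environ: Dict[str, Any] = {}
        setup_testing_defaults(environ)  # fills in a minimal valid WSGI environ
        print(b"".join(application(environ, lambda status, headers: None)))

Because every value arrives as a string, numeric settings such as
``redis_port`` have to be converted by the factory itself.
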
These options include:

#. ``use``, which is a PasteDeploy_ setting to say where the code for the
   WSGI application lives. For CRATE's NLP server, this should be
   ``egg:crate_anon`` or ``egg:crate_anon#main`` [#pastedeployuse]_.
#. Pyramid settings, such as ``pyramid.reload_templates``;
#. CRATE NLP web server settings, as follows.

nlp_webserver.secret
####################
*String.*
A secret key for cookies (see `Pyramid AuthTktAuthenticationPolicy
`_;
make one using crate_nlp_webserver_generate_encryption_key_).
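
crate_nlp_webserver_generate_encryption_key_ remains the documented way to make
this secret. Purely to show what a suitable value looks like, the sketch below
uses Python's standard ``secrets`` module to produce a 64-character random
string; the helper name is hypothetical.

.. code-block:: python

    import secrets


    def make_cookie_secret(nbytes: int = 48) -> str:
        """Return a random URL-safe string (48 bytes -> 64 characters)."""
        # token_urlsafe() base64-encodes nbytes of randomness using only
        # A-Z, a-z, 0-9, "-" and "_", so the value can be pasted into an
        # .ini file without "%" characters confusing interpolation.
        return secrets.token_urlsafe(nbytes)


    if __name__ == "__main__":
        print(f"nlp_webserver.secret = {make_cookie_secret()}")
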
sqlalchemy.url
##############
*String.*
The SQLAlchemy URL to your database; see `database URLs
`_.

**Other SQLAlchemy parameters also work;** all begin with ``sqlalchemy.``. For
example, ``sqlalchemy.echo = True`` enables a debugging feature where all SQL
is echoed.
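
Settings of this form follow the convention of SQLAlchemy's
``engine_from_config()``, which takes the keys beginning with a prefix (by
default ``sqlalchemy.``), strips the prefix, and uses the remainder as engine
options (``url``, ``echo``, and so on). The standard-library sketch below shows
only the prefix-stripping step (SQLAlchemy also coerces some values' types);
the helper name, URL and path are hypothetical.

.. code-block:: python

    from typing import Dict


    def options_with_prefix(settings: Dict[str, str],
                            prefix: str = "sqlalchemy.") -> Dict[str, str]:
        """Return settings whose keys start with prefix, prefix removed."""
        return {
            key[len(prefix):]: value
            for key, value in settings.items()
            if key.startswith(prefix)
        }


    if __name__ == "__main__":
        settings = {
            # Hypothetical credentials and database name:
            "sqlalchemy.url": "mysql+mysqldb://user:password@localhost/nlp_db",
            "sqlalchemy.echo": "True",
            "users_file": "/etc/crate/nlp_users.txt",  # not an SQLAlchemy option
        }
        print(options_with_prefix(settings))  # only "url" and "echo" remain
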
users_file
##########
*String.*
The path to your user definition file; see crate_nlp_webserver_manage_users_.
processors_path
###############
*String.*
The path to your processor definition file; see :ref:`Processors file format
`.
broker_url
##########
*String.*
The URL to your Celery_ broker server, e.g. via AMQP_, for back-end processing.
backend_url
###########
*String.* Default: None.
The URL to your Celery_ backend database, used to store queuing information.
For the format, see `Celery database URL examples
`_.

- You can leave this unset: Celery does not need a backend here, because
  CRATE stores results elsewhere. See :ref:`Internals
  `.
- If you do want to enable a backend, you can use the same database as above
  or create a separate database for Celery.
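
If you choose to share the database, note that Celery's SQLAlchemy-based
result backend expects the SQLAlchemy URL with a ``db+`` prefix (see the
Celery examples linked above). A minimal sketch, with a hypothetical helper
name and example URL:

.. code-block:: python

    def celery_backend_url_from_sqlalchemy(sqlalchemy_url: str) -> str:
        """Prefix an SQLAlchemy URL with "db+" for Celery's database backend."""
        if sqlalchemy_url.startswith("db+"):
            return sqlalchemy_url  # already in Celery's form
        return "db+" + sqlalchemy_url


    if __name__ == "__main__":
        # Hypothetical credentials and database name:
        print(celery_backend_url_from_sqlalchemy(
            "mysql+mysqldb://user:password@localhost/nlp_db"))
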
encryption_key
##############
*String.*
A secret key used for password encryption in the users file. You can make one
with crate_nlp_webserver_generate_encryption_key_.
redis_host
##########
*String.* Default: ``localhost``.
Host for the Redis_ database.
redis_port
##########
*Integer.* Default: 6379.
Port for Redis_.
redis_password
##############
*String.* Default: None.
Password for Redis_.
redis_db_number
###############
*Integer.* Default: 0.
Database number for Redis_.
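
The four Redis settings, with the defaults listed above, correspond to the
parts of a conventional ``redis://`` URL, which can be handy when checking the
same Redis server with other tools. CRATE reads them as separate settings; the
helper below is a hypothetical illustration, not CRATE code.

.. code-block:: python

    from typing import Mapping, Optional
    from urllib.parse import quote


    def redis_url(settings: Mapping[str, str]) -> str:
        """Build a redis:// URL from redis_* settings, using the doc defaults."""
        host = settings.get("redis_host", "localhost")
        port = int(settings.get("redis_port", "6379"))
        password: Optional[str] = settings.get("redis_password")  # default None
        db_number = int(settings.get("redis_db_number", "0"))
        # The password goes in the userinfo part (empty username),
        # percent-encoded so characters such as "/" or "@" are safe.
        auth = f":{quote(password, safe='')}@" if password else ""
        return f"redis://{auth}{host}:{port}/{db_number}"


    if __name__ == "__main__":
        print(redis_url({}))  # redis://localhost:6379/0
        print(redis_url({"redis_password": "s3cret", "redis_db_number": "2"}))
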
Web server section
^^^^^^^^^^^^^^^^^^
The ``[server:main]`` section defines the web server configuration for the app
named ``main``.

- The ``use`` setting determines which web server should be used.
- Other parameters are passed to the web server in use.

Examples include:

- Waitress_:

  .. code-block:: ini

      [server:main]
      use = egg:waitress#main
      # ... alternative: use = egg:crate_anon#waitress
      listen = localhost:6543

  For arguments, see `usage
  `_
  and `Arguments to waitress.serve
  `_.

- CherryPy_:

  .. code-block:: ini

      [server:main]
      use = egg:crate_anon#cherrypy
      server.socket_host = 127.0.0.1
      server.socket_port = 8080

  For arguments, see `CherryPy: Configure
  `_.

- Gunicorn_ (Linux only):

  .. code-block:: ini

      [server:main]
      use = egg:gunicorn#main
      bind = localhost:6543
      workers = 4
      # certfile = /etc/ssl/certs/ca-certificates.crt
      # ssl_version = 5

  For arguments, see `Gunicorn: Settings
  `_.
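
In each example above, ``use`` picks the server's entry point and everything
else in ``[server:main]`` is handed to that server. The sketch below mimics
that split with the standard ``configparser`` module, so you can see which
options a server would receive. It is an illustration only: PasteDeploy has its
own loader, which also supports ``%(here)s``-style substitution (disabled here
with ``interpolation=None``), and the helper name is hypothetical.

.. code-block:: python

    import configparser
    from typing import Dict, Tuple

    DEMO_INI = """
    [server:main]
    use = egg:gunicorn#main
    bind = localhost:6543
    workers = 4
    """


    def split_server_section(
            ini_text: str,
            section: str = "server:main") -> Tuple[str, Dict[str, str]]:
        """Return the "use" value and the remaining options (all strings)."""
        # interpolation=None: treat "%" literally, not as a substitution.
        parser = configparser.ConfigParser(interpolation=None)
        parser.read_string(ini_text)
        options = dict(parser[section])
        use = options.pop("use")  # e.g. "egg:gunicorn#main"
        return use, options       # the rest would go to the chosen server


    if __name__ == "__main__":
        use, kwargs = split_server_section(DEMO_INI)
        print(use, kwargs)
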
.. _nlp_webserver_processors:
Processors file format
~~~~~~~~~~~~~~~~~~~~~~

This is a Python file whose job is to define the ``PROCESSORS`` variable.
This is a list of dictionaries in the format shown below. Each dictionary
defines a processor's:

- name;
- descriptive title;
- version string;
- whether this is the default version (used when the client doesn't ask for
  a particular version);
- processor type (e.g. GATE, CRATE);
- schema (database table) information, if known.

As you will see below, CRATE does all this work for you, for its own
processors, via
:func:`crate_anon.nlp_manager.all_processors.all_crate_python_processors_nlprp_processor_info`.
Specimen processors file:

.. literalinclude:: _nlp_webserver_demo_processors.py
    :language: python

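
The "default version" rule described above can be sketched as follows: if the
client names a processor but no version, pick the single entry flagged as the
default; if it gives a version, pick that exact one. The same function doubles
as a sanity check that your own ``PROCESSORS`` list has exactly one default per
name. The dictionary keys (``name``, ``version``, ``is_default_version``) and
the entries are assumptions for illustration; take the real key names from the
specimen file above. This is not CRATE's server code.

.. code-block:: python

    from typing import Any, Dict, List, Optional

    Processor = Dict[str, Any]

    # Hypothetical entries; key names are assumptions (see the lead-in).
    PROCESSORS: List[Processor] = [
        {"name": "demo_proc", "version": "0.1", "is_default_version": False},
        {"name": "demo_proc", "version": "0.2", "is_default_version": True},
    ]


    def choose_processor(processors: List[Processor], name: str,
                         version: Optional[str] = None) -> Processor:
        """Pick a processor by name, using the default if no version given."""
        candidates = [p for p in processors if p["name"] == name]
        if version is not None:
            matches = [p for p in candidates if p["version"] == version]
        else:
            matches = [p for p in candidates if p.get("is_default_version")]
        if len(matches) != 1:
            raise LookupError(
                f"expected exactly one match for {name!r} "
                f"(version={version!r}), found {len(matches)}")
        return matches[0]


    if __name__ == "__main__":
        print(choose_processor(PROCESSORS, "demo_proc")["version"])         # 0.2
        print(choose_processor(PROCESSORS, "demo_proc", "0.1")["version"])  # 0.1
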
.. _crate_nlp_webserver_initialize_db:
crate_nlp_webserver_initialize_db
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. literalinclude:: _crate_nlp_webserver_initialize_db_help.txt
    :language: none

.. _crate_nlp_webserver_manage_users:
crate_nlp_webserver_manage_users
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. literalinclude:: _crate_nlp_webserver_manage_users_help.txt
    :language: none

.. _crate_nlp_webserver_generate_encryption_key:
crate_nlp_webserver_generate_encryption_key
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Generates a random encryption key and prints it to the screen.
.. _crate_nlp_webserver_pserve:
crate_nlp_webserver_pserve
~~~~~~~~~~~~~~~~~~~~~~~~~~
This is the standard Pyramid pserve_ command. At its most basic, it takes a
single parameter, being the name of your NLP web server config file, and it
starts the web server.
Note that its help (provided by Pyramid's ``pserve`` itself) talks about a file
URI, which might mislead you into thinking you need something like
``file:///home/person/blah.ini``, but actually it wants a filename, like
``/home/person/blah.ini``.
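
If you already have a ``file://`` URI (for example, copied from a file
manager), a small standard-library sketch can turn it into the plain filename
that ``pserve`` wants. The helper name is hypothetical, and the result shown
assumes a POSIX system (on Windows, ``url2pathname`` produces a Windows-style
path).

.. code-block:: python

    from urllib.parse import urlparse
    from urllib.request import url2pathname


    def file_uri_to_path(uri: str) -> str:
        """Turn a file:// URI into a plain filename; leave other text alone."""
        parsed = urlparse(uri)
        if parsed.scheme != "file":
            return uri  # already a plain filename (or not a file URI)
        # url2pathname also decodes percent-escapes such as %20.
        return url2pathname(parsed.path)


    if __name__ == "__main__":
        print(file_uri_to_path("file:///home/person/blah.ini"))
        # /home/person/blah.ini (on POSIX)
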

.. literalinclude:: _crate_nlp_webserver_pserve_help.txt
    :language: none

.. _crate_nlp_webserver_launch_gunicorn:
crate_nlp_webserver_launch_gunicorn
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is the preferred alternative to crate_nlp_webserver_pserve_ for launching
the CRATE NLP web server via Gunicorn_ (it stops Gunicorn complaining but
otherwise does the same thing).

.. literalinclude:: _crate_nlp_webserver_launch_gunicorn_help.txt
    :language: none

.. _crate_nlp_webserver_launch_celery:
crate_nlp_webserver_launch_celery
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This launches the Celery_ back-end job controller for the CRATE NLP web server.
It needs to be running for your NLP web server to do any proper work!

.. literalinclude:: _crate_nlp_webserver_launch_celery_help.txt
    :language: none

.. _crate_nlp_webserver_launch_flower:
crate_nlp_webserver_launch_flower
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This command has no options. It launches the Celery Flower_ tool, which is for
monitoring Celery, and associates it with the CRATE NLP web server. It starts a
local web server (by default on port 5555; see :ref:`TCP/IP ports
`); if you browse to http://localhost:5555/ or
http://127.0.0.1:5555/, you can monitor what's happening.
.. _nlp_webserver_internals:
Internal operations: where is your data stored?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- CRATE's NLP web server uses Redis_ to store web sessions (for user/session
  authentication). No content is stored there.
- It uses Celery_ for back-end jobs.
- Celery is configured with a *broker* and a *backend*.
- The `broker
  `_
  is a messaging system, such as RabbitMQ_ via AMQP_.
- The `backend
  `_
  is typically a database of jobs. Celery can store job results there, but
  CRATE does not use it for that purpose. Instead, CRATE uses a separate
  database to store, transiently, the potentially confidential incoming client
  information and the outgoing NLP results.
- If you want, the Celery backend database can be the same as your main CRATE
  NLP server database (Celery uses tables named ``celery_taskmeta`` and
  ``celery_tasksetmeta``; these do not conflict with CRATE's NLP server table
  names).
- All client data and all NLP results are stored in a single database (the
  one given by ``sqlalchemy.url``).


===============================================================================

.. rubric:: Footnotes

.. [#pastedeployuse]
   CRATE defines a ``paste.app_factory`` entry point in its ``setup.py``, which
   allows PasteDeploy_ to find the actual WSGI app factory.