.. crate_anon/docs/source/installation/installation.rst .. Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk). . This file is part of CRATE. . CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. . CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. . You should have received a copy of the GNU General Public License along with CRATE. If not, see . Installing CRATE without Docker ------------------------------- .. contents:: :local: URLs for CRATE source code ~~~~~~~~~~~~~~~~~~~~~~~~~~ - https://github.com/ucam-department-of-psychiatry/crate (for source) - https://pypi.io/project/crate-anon/ (for ``pip install crate-anon``) Installing CRATE itself is straightforward, but you probably want a lot of supporting tools. Here's a logical sequence. Python ~~~~~~ Install Python 3.10 or higher. If it's not already installed: **Linux** .. code-block:: bash sudo apt-get install python3.10-dev **Windows** - https://www.python.org/ → Downloads Python virtual environment and CRATE itself ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Create a Python virtual environment (an isolated set of Python programs that won’t interfere with any other Python things) and install CRATE. Choose your own directory names. **Linux** .. code-block:: bash python3.10 -m venv ~/venvs/crate source ~/venvs/crate/bin/activate python -m pip install --upgrade pip pip install crate-anon **Windows** .. code-block:: bat C:\Python39\python.exe -m ensurepip C:\Python39\python.exe -m venv C:\venvs\crate C:\venvs\crate\Scripts\activate C:\Python39\python.exe -m pip install --upgrade pip pip install crate-anon .. _activate_venv: Activating your virtual environment ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ **Every time you want to work within your virtual environment, you should activate it, by running (Windows) or sourcing (Linux) the ``activate`` script within it, as above.** Once activated, - the PATHs are set up for the programs in the virtual environment; - when you run Python, you will run the copy in the virtual environment; - the Python package installation tool, ``pip``, will be the one in the virtual environment and will modify the virtual environment (not the whole system). See: - https://docs.python.org/3/tutorial/venv.html - https://realpython.com/python-virtual-environments-a-primer/ RabbitMQ ~~~~~~~~ Install RabbitMQ, required by the CRATE web site. **Linux** .. code-block:: bash sudo apt-get install rabbitmq # Check it's working: sudo rabbitmqctl status **Windows** - Download/install Erlang from http://www.erlang.org/downloads. The 32-bit Windows download (Erlang/OTP 18.3) does not work on Windows XP, so everything that follows has been tested on Windows 10, 64-bit. - Download/install RabbitMQ from https://www.rabbitmq.com/ → Download. (If you use the default installer, it will find Erlang automatically.) - Check it’s working: :menuselection:`Start --> RabbitMQ Server --> RabbitMQ Command Prompt (sbin dir)`. Then type ``rabbitmqctl status``. It’s helpful to do this, because you need to tell Windows to allow the various bits of RabbitMQ/Erlang to communicate over internal networks, and (under Windows 10) this triggers the appropriate prompts. - For additional RabbitMQ help see https://cmatskas.com/getting-started-with-rabbitmq-on-windows/. Java ~~~~ Install a Java development kit, to compile support for GATE natural language processing (NLP). **Linux** - Usually built in. **Windows** - Download/run the Java Development Kit installer from Oracle. GATE ~~~~ Install GATE, for NLP. - Download and install GATE from https://gate.ac.uk/download/ .. _third_party_text_extractors: Third-party text extractors ~~~~~~~~~~~~~~~~~~~~~~~~~~~ Ensure any necessary third-party text extractor tools are installed and on the PATH. Good extractors are built into CRATE for: - Office Open XML (DOCX, DOCM), for Microsoft Word 2007 onwards; - HTM(L), XML; - Open Document text format (ODT), for OpenOffice/LibreOffice; - plain text (LOG, TXT). For some, there is a fallback converter built in, but third-party tools are faster: - PDF: speed improves by installing ``pdftotext`` [#pdftotext]_ - Rich Text Format (RTF): speed improves by installing ``unrtf`` [#unrtf]_ For some, you will need an external tool: - For Microsoft Word 97–2003 binary (DOC) files, you will need ``antiword`` [#antiword]_ - As a fallback tool (“extract text from anything”), CRATE will use ``strings`` or ``strings2`` [#strings]_, whichever it finds first. If you install any manually, check they run, as follows. To check that your text extractors are available and visible to CRATE via the ``PATH``, you can use the :ref:`crate_anon_check_text_extractor ` tool. C/C++ compiler ~~~~~~~~~~~~~~ .. note:: This is optional. If you want to install C-based Python libraries, you’ll need a C/C++ compiler. **Linux** Built in. **Windows** Install Visual C++ 14.x [#vs2015]_ (or later?), the official compiler for Python 3.10-3.11 under Windows [#pythonvstudio]_. Visual Studio Community is free [#vscommunity]_. Database and database drivers ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ You'll want drivers for at least one database. See :ref:`Recommended database drivers `. In the CPFT NHS environment, we use SQL Server and these: .. code-block:: none pip install pyodbc mssql-django Build the CRATE Java NLP interfaces ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code-block:: bash crate_nlp_build_gate_java_interface --help crate_nlp_build_gate_java_interface --javac JAVA_COMPILER_FILENAME --gatedir GATE_DIRECTORY For example, on Windows: .. code-block:: bat crate_nlp_build_gate_java_interface ^ --javac "C:\Program Files\Java\jdk1.8.0_91\bin\javac.exe" ^ --gatedir "C:\Program Files\GATE_Developer_8.1" Once built, you can run the script again with an additional ``--launch`` parameter to launch the GATE framework in an interactive demonstration mode (using GATE’s supplied “people and places” app). Configure CRATE for your system ------------------------------- The anonymiser and NLP manager are run on an ad-hoc or regularly scheduled basis, and do not need to be kept running continuously. For the anonymiser, you will need a .INI-style configuration file (see :ref:`the anonymiser config file ` that the `CRATE_ANON_CONFIG` environment variable points to when the anonymiser is run (and a .TSV format data dictionary that the configuration file points to -- see :ref:`data dictionary `). For the NLP manager, you will need another .INI-style configuration file (see :ref:`NLP config file `) that the `CRATE_NLP_CONFIG` environment variable points to when the NLP manager is run. For the web service, which you will want to run continuously, you will need a Python (Django) configuration file (see :ref:`web config file `) that the `CRATE_WEB_LOCAL_SETTINGS` environment variable points to when the web server processes are run. Use ``crate_print_demo_crateweb_config`` to make a new one, and edit it for your own settings. Set up the web site infrastructure ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Create the database yourself using your normal database management tool. Make sure that the config file pointed to by the `CRATE_WEB_LOCAL_SETTINGS` environment variable is set up to point to the database. From the activated Python virtual environment, you want to build the admin database, collect static files, populate relevant parts of the database, and create a superuser: .. code-block:: bash crate_django_manage migrate crate_django_manage collectstatic crate_django_manage populate crate_django_manage createsuperuser Test the web server and message queue ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In two separate command windows, with the virtual environment activated in each, run the following two programs: .. code-block:: bash crate_launch_cherrypy_server .. code-block:: bash crate_launch_celery --debug Browse to the web site. Choose ‘Test message queue by sending an e-mail to the RDBM’. If an e-mail arrives, that’s good. If you can’t see the web site, there’s a configuration problem. If you can see the web site but no e-mail arrives, check: - that e-mail server and the RDBM e-mail destination are correctly configured in the Django config file (as per the `CRATE_WEB_LOCAL_SETTINGS` environment variable); - check the Django log; - check the Celery log; - from the RabbitMQ administrative command prompt, run ``rabbitmqctl list_queues name messages consumers``; this shows each queue’s name along with the number of messages in the queue and the number of consumers. If the number of messages is stuck at >0, they’re not being consumed properly. - run ``crate_launch_flower`` and browse to http://localhost:5555/ to explore the messaging system. Configure the CRATE web service to run automatically ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ CRATE's web service has two parts: the web site itself runs Django, and the offline message handling part (e.g. to send emails) runs Celery. **Linux** Try to avoid managing this by hand! That’s what the `.deb` file is there for. **Windows: service method** Using a privileged command prompt [e.g. on Windows 10: :menuselection:`Winkey+X --> Command Prompt (Admin)`], activate the virtual environment and install the service: .. code-block:: bat C:\venvs\crate\Scripts\activate crate_windows_service install Set the following system (not user!) environment variables (if you can’t find the Environment Variables part of Control Panel, use the command ``sysdm.cpl``): - `CRATE_ANON_CONFIG` – to your main database’s CRATE anonymisation config file - `CRATE_CHERRYPY_ARGS` – e.g. to ``--port 8999 --root_path /`` (for relevant options, see ``crate_django_manage runcpserver --help``) - `CRATE_WEB_LOCAL_SETTINGS` – to your Django site-specific Python configuration file. - `CRATE_WINSERVICE_LOGDIR` – to a writable directory. In older versions of Windows you had to reboot or the service manager wouldn’t see it, but Windows 10 seems to cope happily. You can start the CRATE service manually, or configure it to start automatically on boot, with the Automatic or Automatic (Delayed Start) option [#servicedelayedstart]_, or (with the virtual environment activated) with ``crate_windows_service start``. Any messages will appear in the Windows ‘Application’ event log. **Windows: task scheduler method** In principle you could also run the scripts via the Windows Task Scheduler, rather than as a service [#taskscheduler]_, e.g. with tasks like .. code-block:: bat cmd /c C:\venvs\crate\Scripts\crate_launch_cherrypy_server >>C:\crate_logs\djangolog.txt 2>&1 .. code-block:: bat cmd /c C:\venvs\crate\Scripts\crate_launch_celery >>C:\crate_logs\celerylog.txt 2>&1 … but I’ve not bothered to test this, as the Service method works fine. Retest the web server and message queue ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Going to a “behind-the-scenes” (service) mode of operation has the potential to go wrong, so retest that the web server and the e-mail transmission task work. =============================================================================== .. rubric:: Footnotes .. [#servicedelayedstart] https://stackoverflow.com/questions/11015189/automatic-vs-automatic-delayed-start .. [#taskscheduler] See https://www.calazan.com/windows-tip-run-applications-in-the-background-using-task-scheduler/ .. [#pdftotext] ``pdftotext``: Ubuntu: ``sudo apt-get install poppler-utils``. Windows: see http://blog.alivate.com.au/poppler-windows/, then install it and add it to the PATH. .. [#unrtf] ``unrtf``: Ubuntu: ``sudo apt-get install unrtf``. Windows: see http://gnuwin32.sourceforge.net/packages/unrtf.htm, then install it and add it to the PATH. .. [#antiword] ``antiword``: Ubuntu: ``sudo apt-get install antiword``. Windows: see http://www.winfield.demon.nl/, then install it and add it to the PATH. .. [#strings] ``strings`` and ``strings2``: ``strings`` is part of Linux by default; for Windows, see https://technet.microsoft.com/en-us/sysinternals/strings.aspx or http://split-code.com/strings2.html (then install it and add it to the PATH. .. [#vs2010] Visual Studio 2010; VC++ 10.0; MSC_VER=1600 .. [#vs2015] Visual Studio 2015; VC++ 14.0; MSC_VER=1900 .. [#pythonvstudio] See https://wiki.python.org/moin/WindowsCompilers .. [#vstudiogeneral] To map Visual C++/Studio versions to compiler numbers, see https://stackoverflow.com/questions/2676763. For more detail see https://stackoverflow.com/questions/2817869. .. [#vscommunity] https://visualstudio.microsoft.com/vs/community/