2.5. Databases and database drivers

2.5.1. Database engines

Supported engines include:

  • MySQL: free, open source, simple administration including via MySQL Workbench.

  • PostgreSQL: free, open source, emphasis on standards compliance, advanced column types.

  • SQL Server: Microsoft; not free; common in UK NHS contexts.

2.5.3. More detail

Internally, CRATE is written primarily in Python 3. It uses SQLAlchemy for the anonymisation, and it uses Django for its web interface.

In general, software can talk to database via files (e.g. SQLite), via TCP/IP (e.g. MySQL, PostgreSQL, SQL Server), via custom interfaces (e.g. Windows-based authentication for SQL Server), and via standardized intermediate interfaces (e.g. ODBC, JDBC) in which the client software communicates with the intermediate interface which talks (somehow) to the final database.

Python libraries may be “pure Python”, in which case they install well via standard tools like pip, or use a mixture of Python and other code (e.g. C, Java), in which case appropriate tools (e.g. a C compiler or a Java virtual machine) must also be available on the destination machine. Since the latter are often specific to an operating system and/or CPU architecture, they are sometimes more complex to install.

None of Django, SQLAlchemy, or CRATE “bake in” support for specific databases [1]. This is because it’s easy to bake in old versions, making it hard to upgrade. Keep things modular.

2.5.4. A catalogue of Python database drivers

2.5.4.1. MySQL + mysqlclient

mysqlclient is an open-source fork of MySQLdb (MySQL-python) that adds Python 3 support and fixes bugs. (Similarly, it is a Python interface to the MySQL C API.) It is a drop-in replacement for MySQLdb, all Python references remain to MySQLdb.

Driver

mysqlclient

Home page

Database

MySQL

Installation

pip install mysqlclient

Import

import MySQLdb  # same as MySQLdb

Django ENGINE

django.db.backends.mysql

SQLAlchemy URL

mysql+mysqldb://user:password@host:port/database?charset=utf8

Licence

GPL

Note

This is Django’s recommended method for using MySQL [2].

Note

It’s quick, because it’s C-based [5].

Note

This module needs to compile itself from C/C++ source. Under Linux, this should just work. Under Windows, you may need to install a compiler, which must (to some degree) match the Python version you’re using. If you don’t have a compiler, you will get errors relating to “query_vcvarsall”. For Python 3.4, try Microsoft Visual Studio 2010 (Visual C++ 10.0) or Visual C++ Express 2010. Under 64-bit Windows, or if you have a later version of Visual Studio, see also https://stackoverflow.com/questions/28251314; you’ll need to set the VS100COMNTOOLS environment variable.

Note

Under Linux, you might see the error NameError: name '_mysql' is not defined, despite installing the relevant MySQL libraries in the operating system (e.g. via sudo apt install mysql-client). Higher up in the error trace, you may see this error: ImportError: libmysqlclient.so.21: cannot open shared object file: No such file or directory. (It may be buried in lots of other error messages; to see it by itself, just run Python and try import MySQLdb.) The MySQL libraries live somewhere within /usr/lib. This error can occur if mysqlclient was installed under a different operating system version to the one you’re using. You can fix it by reinstalling the Python library with pip uninstall mysqlclient && pip install --no-binary mysqlclient mysqlclient.

Under Windows, there can be additional compilation problems; see https://github.com/PyMySQL/mysqlclient-python/issues/54.

2.5.4.2. MySQL + MySQL Connector/Python

MySQL Connector/Python is a pure Python driver for MySQL from MySQL themselves (well, Oracle; Oracle own MySQL).

Driver

MySQL Connector/Python

Home page

Database

MySQL

Installation

Use pip with a custom URL. See below.

Import

import mysql.connector

Django ENGINE

mysql.connector.django

SQLAlchemy URL

mysql+mysqlconnector://...

Licence

GPL [4]

Installation: use pip with a custom URL. See https://stackoverflow.com/questions/34489271. They have a presence at https://pypi.python.org/pypi/mysql-connector-python, but you can’t do pip install mysql-connector-python; the subpage for a given version (e.g. 2.0.4) advertises a URL that you can use with pip install <URL>, and you can substitute https for http if required.

Re the Django ENGINE setting: see https://dev.mysql.com/doc/connector-python/en/connector-python-django-backend.html.

It’s slower than C-based interfaces, obviously [6].

2.5.4.3. MySQL + PyMySQL

PyMySQL is a pure-Python MySQL client library. It’s slower than mysqlclient as a result.

Driver

PyMySQL

Home page

Database

MySQL

Installation

pip install pymysql

Import

import pymysql

Django ENGINE

django.db.backends.mysql plus extras; see below

SQLAlchemy URL

mysql+pymysql://user:password@host:port/database?charset=utf8

Licence

MIT License

PyMySQL can masquerade as MySQLdb upon explicit request. The Django ENGINE setting remains django.db.backends.mysql, but a short extra import statement is required in manage.py. See:

CRATE implements this fix, though actually if you want to run Celery as well, you need the fix via the Celery entry point, so it’s easier to put one fix in settings.py.

2.5.4.4. SQL Server + pymssql

A Python interface to SQL Server via FreeTDS.

Driver

pymssql

Home page

http://www.pymssql.org/

Database

SQL Server

Installation

pip install pymssql

Import

import pymssql

Django ENGINE

SQLAlchemy URL

mssql+pymssql://username:password@freetdsname/database?charset=utf8

Licence

LGPL

Intended for Linux use. The pymssql library uses FreeTDS code to communicate with SQL Server. Under Ubuntu, there are prerequisites: sudo apt-get install freetds-dev first.

2.5.4.5. mssql-django

A replacement for django-mssql-backend (q.v.)

Driver

mssql-django

Home page

Database

Any with an ODBC connection

Installation

pip install mssql-django and also needs PyODBC

Import

Django ENGINE

mssql

SQLAlchemy URL

Licence

BSD License

2.5.4.6. PostgreSQL + psycopg2

Python interface to PostgreSQL.

Driver

psycopg2

Home page

http://initd.org/psycopg/docs/

Database

PostgreSQL

Installation

pip install psycopg2

Import

import psycopg2

Django ENGINE

django.db.backends.postgresql_psycopg2

SQLAlchemy URL

postgresql://user:password@host/database

Licence

LGPL

Note

This module needs to compile itself from C/C++ source. Under Linux, this should just work. Under Windows, you may need to install a compiler, which must (to some degree) match the Python version you’re using. If you don’t have a compiler, you will get errors relating to “query_vcvarsall”. For Python 3.4, try Microsoft Visual Studio 2010 (Visual C++ 10.0) or Visual C++ Express 2010. Under 64-bit Windows, or if you have a later version of Visual Studio, see also https://stackoverflow.com/questions/28251314; you’ll need to set the VS100COMNTOOLS environment variable.

2.5.4.7. Any database + PyODBC

A Python interface to any database via ODBC.

Driver

PyODBC

Home page

Database

Any with an ODBC connection

Installation

pip install pyodbc

Import

import pyodbc

Django ENGINE

SQLAlchemy URL

mssql+pyodbc://username:password@MY_DATABASE
mssql+pyodbc://@MY_DATABASE [for e.g. Windows authentication]

In the SQLAlchemy URL examples, MY_DATABASE is an example ODBC data source name (DSN).

When talking to SQL Server databases via SQLAlchemy, you can also specify the host/port directly in the connection string (rather than having to set up a DSN); see Hostname connections.

SQLAlchemy deprecates MySQL via this route: http://docs.sqlalchemy.org/en/rel_1_0/dialects/mysql.html.

  • Install an ODBC driver.

  • On Windows systems with SQL Server installed, you get driver choices like “SQL Server”, “SQL Server Native Client 10.0”, “SQL Server Native Client 10.0”. I tested with “SQL Server Native Client 11.0” talking to SQL Server 10.50.6000 [= SQL Server 2008 R2 SP3]. See https://support.microsoft.com/en-us/kb/321185

  • In creating the ODBC data source, you choose whether you want username/password authentication or Integrated Windows authentication, and you give the data source a name (DSN), e.g. MY_DATABASE. The SQLAlchemy URL is then mssql+pyodbc://@MY_DATABASE (for Windows authentication), or mssql+pyodbc://username:password@MY_DATABASE for username/password authentication.

  • Addendum 2017-01-16: Microsoft have deprecated the SQL Server Native Client; use the Microsoft ODBC Driver for SQL Server instead. (See https://msdn.microsoft.com/en-us/library/ms130828.aspx; https://blogs.msdn.microsoft.com/sqlnativeclient/2013/01/23/introducing-the-new-microsoft-odbc-drivers-for-sql-server/) This typically appears as “ODBC Driver 11 for SQL Server”.

2.5.4.8. Others to ignore

2.5.4.8.1. django-mssql-backend

[https://pypi.org/project/django-mssql-backend/]. Superseded by mssql-django.

2.5.4.8.2. django-pyodbc-azure

[https://pypi.org/project/django-pyodbc-azure/]. Superseded by django-mssql-backend and most recently mssql-django

2.5.4.8.3. django-pymssql

[https://pypi.org/project/django-pymssql/]. Not maintained since 2016.

2.5.4.8.4. django-mssql

[https://pypi.org/project/django-mssql/]. Not maintained since 2016.

2.5.4.8.5. MySQL-python / MySQLdb

[https://pypi.org/project/MySQL-python/]. Doesn’t support Python 3. Not maintained since 2014.

2.5.4.8.6. django-pyodbc

[https://pypi.python.org/pypi/django-pyodbc/]. Doesn’t support Python 3.

2.5.4.8.7. django-sqlserver

[https://github.com/denisenkom/django-sqlserver/]. Failed in 2015 with Django 1.9rc1,

and is documented as buggy [older version was https://bitbucket.org/denisenkom/django-pytds]. It was formerly known as django-pytds and uses the python-tds interface [https://pypi.python.org/pypi/python-tds].

2.5.4.8.8. django-jython

[https://pypi.org/project/django-jython/]. (via zxJDBC then JDBC, then jTDS or a

native driver): this requires running Django under Jython, adding complexity. Not maintained since 2012.

2.5.4.8.9. mxODBC

[http://www.egenix.com/products/python/mxODBC/]. Ignored; commercial

2.5.4.8.10. adodbapi

[https://pypi.org/project/adodbapi/]. Not implemented in SQLAlchemy. Not maintained since 2019. [http://docs.sqlalchemy.org/en/latest/dialects/mssql.html].

2.5.5. Other database connection components

2.5.5.1. ODBC

ODBC is a generic API for talking to databases; see https://en.wikipedia.org/wiki/Open_Database_Connectivity.

It’s normally easy for Python to talk to ODBC (via PyODBC). However, the ODBC system also requires a database-specific driver, and sometimes this is fiddly.

For Windows, there shouldn’t be many problems. ODBC is built into Windows, and database-specific drivers for e.g. SQL Server are installed with SQL Server itself, or built in, or readily available.

ODBC is provided for Linux via third-party products, notably unixODBC. However, the ODBC system also requires a database-specific driver, and sometimes this is fiddly under Linux (e.g. for SQL Server).

There is some support in Python for JDBC. Here we have better drivers for SQL Server. A full (“Type 4”) JDBC driver talks directly to the database, and does so on any platform, as it runs in a Java virtual machine (JVM). There are two of note: (1) Microsoft provide a Type 4 JDBC driver for SQL Server [https://www.microsoft.com/en-gb/download/details.aspx?id=11774]; (2) jTDS is an open-source Type 4 JDBC driver for SQL Server (based on FreeTDS) [http://jtds.sourceforge.net/]. That sounds great, but the use of JDBC requires Python to talk to Java. This brings some complexities. JDBC drivers for Python can run Jython (Python in a JVM), but that’s getting complex (e.g. Django under Jython → django-jython → zxJDBC → jTDS → SQL Server).

2.5.5.2. FreeTDS

FreeTDS is a set of Unix/Linux C libraries to communicate natively with SQL Server, via an open-source implementation of the TDS protocol. See http://www.freetds.org/. (There is also a Java equivalent, jTDS, not discussed further.)

Configuring? Under Linux, the system-wide file is /etc/freetds/freetds.conf; this contains server/database settings [http://www.freetds.org/userguide/freetdsconf.htm].

Testing.

  • Use the tsql command to test it. See man tsql.

  • For example, to connect to a server called wombatvmxp, try: tsql -L -H wombatvmxp to list instances, and tsql -H wombatvmxp -p 1433 -U user -P password to connect.

  • Once you have configured /etc/freetds/freetds.conf, you can use the config section as the server name: tsql -S my_sqlserver_connection -U user -P password, and you’ll need to get this version working before you can use SQLAlchemy with FreeTDS.

  • A command likely to work at the TSQL command prompt is SELECT * FROM information_schema.tables.

  • Remember to type GO (the default batch separator) to execute a command (https://stackoverflow.com/questions/2299249). To specify a particular database on the server: add -D database to the tsql command (it should say “Default database being set to …”).

  • Note that though http://www.freetds.org/userguide/freetdsconf.htm specifies a database parameter, man freetds.conf doesn’t (for tsql -C showing a version of 0.91). Mind you, v0.91 is from 2011 (http://www.freetds.org/news.html). So for v0.91, freetds.conf is not the place to specify the database, it seems. But specifying the database in the SQLAlchemy URL works.

WARNING. FreeTDS prior to version 0.95 does not support MARS. Without this, CRATE will fail (stopping during/after the first patient). Furthermore, FreeTDS only supports MARS via ODBC [http://www.freetds.org/mars.html], i.e. presumably not this way. Use pyodbc → FreeTDS instead.

To get this working under Linux:

TL;DR: install FreeTDS 1.0 or higher, compiled for MARS; use TDS protocol 7.2 or higher; ensure MARS_Connection = Yes.

Ubuntu 16.04 uses FreeTDS 0.91. However, this has one minor problem and one major one. The minor one is that freetds.conf doesn’t support the “database” parameter (but tsql -D does, and so does the SQL Alchemy URL). The major one is that it doesn’t support MARS, so you can’t use it with CRATE and SQL Server (CRATE will silently stop during processing of the first patient).

To install FreeTDS 1.0, you have to get fairly low-level:

# http://www.freetds.org/userguide/config.htm
# http://www.linuxforums.org/forum/applications/199305-wmvolman-2-0-1-install-problems.html
cd /tmp  # or somewhere
git clone https://github.com/FreeTDS/freetds.git
cd freetds
libtoolize --force
aclocal
autoheader
automake --force-missing --add-missing
autoconf
./configure --enable-mars=yes
make
sudo make install

Note in particular --enable-mars=yes.

Use tsql -C to check the compile-time options (you should see “MARS: yes”).

Be aware that FreeTDS only supports MARS via ODBC. So don’t expect an SQLAlchemy URL of

mssql+pymssql://username:password@freetds_name/database_name?charset=utf8

to work. Instead, use

mssql+pyodbc://username:password@odbc_name/?charset=utf8

If this fails, edit /etc/odbc.ini to add a logfile, e.g.

[crate_sqlserver_odbc]
description = "CRATE test SQL Server database on Wombat VMXP"
driver = FreeTDS
; this is looked up in /etc/odbcinst.ini
TDS_Version = 7.4
; see http://www.freetds.org/userguide/choosingtdsprotocol.htm
server = 192.168.1.13
port = 1433
Database = crate_test_src
MARS_Connection = Yes
DumpFile = /tmp/freedump.log

to check the actual FreeTDS software version and TDS protocol version being used; the lines are near the top and look like

log.c:196:Starting log file for FreeTDS 0.91
                                        ^^^ bad; need higher FreeTDS version
net.c:207:Connecting to 192.168.1.13 port 1433 (TDS version 7.1)
                                                            ^^^ bad; need TDS 7.2

Note that a TDS_Version of “8.0” in /etc/odbc.ini will be converted to “7.1”; if you specify a version that exceeds your SQL Server (e.g. specify “7.4” when you’re running SQL Server 2005) it will fall back to 4.2; and you need 7.2 or higher for MARS.

In this case, the problem was the TDS_Version = 7.4 (should have been 7.2, for a SQL Server 2005 backend), and the driver = FreeTDS referring to an installation of FreeTDS 0.91. Here’s a working version:

# /etc/odbcinst.ini

    [FreeTDS_v1_1]
Description = FreeTDS 1.1 (SQL Server protocol driver for Unix)
Driver = /usr/local/lib/libtdsodbc.so
# /etc/odbc.ini

[crate_sqlserver_odbc]
description = "CRATE test SQL Server database on Wombat VMXP"
driver = FreeTDS_v1_1
TDS_Version = 7.2
server = 192.168.1.13
port = 1433
Database = crate_test_src
MARS_Connection = Yes
DumpFile = /tmp/freedump.log
; remove this line after testing!

This worked.


Footnotes