11. FAQs and troubleshooting

11.1. Known bugs elsewhere affecting CRATE 

wkhtmltopdf font size bug
- See notes next to PATIENT_FONTSIZE in config/settings.py
- https://github.com/wkhtmltopdf/wkhtmltopdf/issues/2505
If you try to use django-debug-toolbar when proxying via a Unix domain socket, you need to use a custom INTERNAL_IPS setting; see the specimen config file.
SQL Server returns a rowcount of -1; this is normal. See https://code.google.com/p/pyodbc/wiki/Cursor.

11.2. General 

11.2.1. ImportError: No module named ‘{mysqldb, pyodbc, …}’

You haven’t installed the right drivers for the database URL(s) that you have specified in your CRATE configuration file. Make sure the virtualenv is activated (see Activating your virtual environment), and then install it (see Database drivers).

11.2.2. Which Python modules should be installed?

Use pip freeze to show what you have installed. The ones you have to install manually are marked with # MANUAL; CRATE installs the others.

django-pyodbc-azure==1.10.4.0  # MANUAL: for Django to talk to SQL Server
pyodbc==4.0.3  # MANUAL: for ODBC database connections

11.2.3. Pretty colours all gone (anonymiser, NLP, etc.)

That’s what happens when you pipe the tool through tee.

11.3. CRATE anonymiser 

11.3.1. Anonymisation is slow 

Make sure you have indexes created on all patient_id fields, because the tool will use this to find (a) values for scrubbing, and (b) records for anonymisation. Indexing here makes a huge difference!

11.3.2. CRATE uses lots of memory 

A normal run should see CRATE using roughly 60–80 Mb per process. Values much in excess of this likely relate to the text extraction process, which uses third-party software over which CRATE has no control (I’ve seen >1 Gb) [1].

11.3.3. “File is not a zip file”

In full: “Caught exception from document_to_text: File is not a zip file” when extracting text from DOCX documents

This error usually appears with encrypted, password-protected DOCX files. The anonymiser will not be able to read these, and this error can be ignored.

11.3.4. “UnRtf: … has stopped working”

If a third-party program is used by CRATE for text extraction and fails, you may get lots of messages from Windows like “UnRtf: convert document in RTF format to other formats has stopped working. Windows can check online for a solution to the problem…”

To disable this in Windows Server 2008, run the Server Manager, and in the main page scroll down to a section titled Resources and Support. There should be a “Windows Error Reporting” section. It’s probably set to “Ask me about sending reports every time an error occurs”; change this to “I don’t want to participate, and don’t ask me again”.

That gets rid of the options to tell Microsoft, but it still pops up some “close or debug?” dialog boxes. To fix that, add the following registry entries [2]:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting : “ForceQueue”=dword:00000000
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting : “DontShowUI”=dword:00000001
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting : “DontSendAdditionalData”=dword:00000001
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting\Consent : “DefaultConsent”=dword:00000002

11.3.5. “UnicodeDecodeError: ‘utf-8’ codec can’t decode byte 0xff in position 0: invalid start byte”

You can get this error when you try to use a config file in Windows, certainly by using Notepad to create/edit a CRATE config file. Notepad can default to “UTF-16 LE” encoding. Re-save the config file with UTF-8 encoding.

11.4. CRATE NLP 

11.4.1. How can I update a source file in KConnect/Bio-YODIE 

Example: replacing plugins/Tagger_ConText/src/gate/context/ContextFeaturesTagger.java.

Make sure Apache Ant is installed [3].
Replace the source file.
Delete the existing .jar file.
Run ant build.
If it fails, check the build.properties file, which contains local variables such as directories (e.g. gate.home); edit this and try again.

11.5. CRATE web site 

11.5.1. crate_launch_cherrypy_server can’t find its config files 

Use forward slashes (/) rather than backslashes (\) for filename paths in the environment variable CRATE_CHERRYPY_ARGS (e.g. C:/somepath/somefile.ext) or escape the backslashes by doubling them (e.g. C:\\somepath\\somefile.ext).

11.5.2. Port 443 not free on ‘127.0.0.1’

Problem: Under Windows Server: OSError(“Port 443 not free on ‘127.0.0.1’”)

If you launch the CRATE web server on port 443 (for HTTPS) and get this error, something else is probably using port 443. If that’s unexpected to you, it’s because Microsoft uses it by default [13]. Try:

netstat -ban

to see what’s using it. In our case we had this:

 Proto  Local Address      Foreign Address    State
...
[svchost.exe]
 TCP    0.0.0.0:443        0.0.0.0            LISTENING
...
[svchost.exe]
 TCP    [::]:443           [::]:0             LISTENING

That is, svchost.exe is using port 443. One question now is: which actual program is using this port via svchost.exe (which is a service host program that does all sorts of things) [14]? A possibility relates to VMWare [15].

If you have sufficient control over your machine to wrest port 443 away from whatever’s using it, fine. Otherwise, you may need to use an alternative port. A common choice might be 8443 [16].

11.5.3. OSError: No socket could be created… permission denied 

The error OSError: No socket could be created -- (('127.0.0.1', 443): [Errno 13] Permission denied) may be seen on Linux and is similar to port 443 not free.

Under Linux, programs without root authority cannot, by default, open port numbers below 1024. The best thing to do under Linux is to run a privileged “main” web server like Apache (which becomes responsible for encryption and so forth), and configure CRATE to operate behind Apache. See Configuring for Apache. Alternative approaches are outlined at https://superuser.com/questions/710253/allow-non-root-process-to-bind-to-port-80-and-443.

11.5.4. “Your connection is not private…” browser error 

You will always get this error until you get a proper HTTPS certificate. This error occurs when you self-sign a certificate. Browers will offer you a way round, usually in small print [e.g. in Chrome: Advanced ‣ Proceed… (unsafe)].

11.5.5. 403 Forbidden: CSRF verification failed 

There are at least two possible reasons:

Your browser must enable cookies, at least for ‘same-origin’ requests; this is a critical part of CRATE’s security to prevent cross-site request forgeries. If you’re using Firefox, try Chrome [7].
By default, the CRATE web site uses only HTTPS (secure HTTP). This is governed by the CRATE_HTTPS parameter in CRATE’s own crateweb/config/settings.py. When CRATE_HTTPS is True, then CSRF cookies are only permitted over HTTPS, so if you use plain HTTP, you will see this error. A quick hack is to set CRATE_HTTPS = False in your local settings, but this is a bad idea; set up HTTPS properly instead, as above.

11.5.6. “Unknown, invalid, or unsupported option… in a getsockopt or setsockopt call”

Problem: the front end might produce the error: “An unknown, invalid, or unsupported option or level was specified in a getsockopt or setsockopt call” while initiating a back-end task.

This is due to a bug in the Python AMQP package version 2.1.4 [4]. (AMQP is a protocol used by Celery to talk to a message broker such as RabbitMQ; CRATE uses Celery to manage its back-end asynchronous tasks, like sending e-mails.)

Solution: downgrade AMQP. From the activated CRATE virtual environment:

pip uninstall amqp
pip install -Iv amqp==2.1.3

This change has been hardcoded into CRATE’s setup scripts to prevent Celery from picking the buggy version of the Python amqp package. As a consequence, other requirements are also downgraded (celery to 4.0.1; kombu to 4.0.1).

11.5.7. Static files not served via CherryPy 

Problem: Static files are not served (e.g. broken icons rather than plus/minus symbols; broken-looking admin site) when using the CherryPy server for CRATE.

Solution: You should specify either (1) FORCE_SCRIPT_NAME = "/crate" in your config file, or (2) --root_path /crate in your CRATE_CHERRYPY_ARGS environment variable. (The default value for the latter is taken from the former.)

The cause of the problem is as follows: if you don’t do this, then https://mysite/ is meant to be your site, while https://mysite/crate_static/ is meant to be your static root. However, the latter comes under the former, so Django says “it’s for me” then “it doesn’t exist”. If you use https://mysite/crate/ as your site root, with https://mysite/crate_static/ as your static root, then the software is happy.

11.5.8. I can’t restart the CRATE Windows Service cleanly 

There may be a problem whereby the CRATE web service doesn’t entirely shut down when its service is stopped. You can manually kill leftover processes (which will appear as python.exe or python.exe *32) using taskmgr.

This should be fixed now.

11.5.9. CRATE service doesn’t start… errors in Windows Event Log 

If your CRATE service doesn’t start and you see this error in the Event Log:

Unable to read Server Queue performance data from the Server service.
The first four bytes (DWORD) of the Data section contains the status
code, the second four bytes contains the IOSB.Status and the next four
bytes contains the IOSB.Information.

Log Name: Application
Source: PerfNet
Event ID: 2006

then consider first if this might be a bug relating to HP ProLiant servers. See https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-a00041653en_us&docLocale=en_US and a fix at https://support.microsoft.com/en-in/help/4057142/windows-10-update-kb4057142. However, that is for Windows Server 2016.

The CPFT server from Dec 2016 is an HP ProLiant DL360 Gen9 server with 2 × 12-core/24-thread Intel Xeon E5-2687WV4 3 GHz CPUs (48 effective CPUs), with 96 Gb RAM later upgraded to 672 Gb RAM (in 2018), and 11.5 Tb SSD storage in a RAID configuration (8.9 Tb available); it was about £15k inc. VAT initially plus £8k for the extra RAM. It runs Windows Server 2008 R2.

11.5.10. “No connection could be made because the target machine actively refused it”

Problem: From the front end, you see: “No connection could be made because the target machine actively refused it”.

Check that RabbitMQ is running. Check also CRATE’s Celery log. If RabbitMQ is not running, you’ll see something like this in CRATE’s Celery log, giving a little more detail:

[2017-02-25 23:50:11,433: Error/MainProcess] consumer: Cannot connect to
amqp://guest:**@127.0.0.1:5672//: [WinError 10061] No connection could be
made because the target machine actively refused it.

This indicates that Celery (called by CRATE) is looking for RabbitMQ on port 5672, finding it, but being refused access. Make sure RabbitMQ is installed and its service started. Run rabbitmqctl, which on Windows machines is typically typically at

C:\Program Files\RabbitMQ Server\rabbitmq_server-3.6.6\sbin\rabbitmqctl.bat

Specifically, run:

rabbitmqctl status

If you see this unhappy output:

Status of node 'rabbit@cpft-crate-p01' ...
Error: unable to connect to node 'rabbit@cpft-crate-p01': nodedown

DIAGNOSTICS
===========

attempted to contact: ['rabbit@cpft-crate-p01']

rabbit@cpft-crate-p01:
  * connected to epmd (port 4369) on cpft-crate-p01
  * epmd reports: node 'rabbit' not running at all
                  other nodes on cpft-crate-p01: ['RabbitMQ']
  * suggestion: start the node

current node details:
- node name: 'rabbitmq-cli-11@cpft-crate-p01'
- home dir: P:\
- cookie hash: <...some hash...>

… then one possibility is that RabbitMQ was improperly installed. This can happen if installed by a non-administrative user [18], or if your Windows variables HOMEDRIVE and HOMESHARE are pointing to a network drive [19]. From an administrative command prompt, this was one solution:

REM Remove the old installation:
net stop rabbitmq
“C:\Program Files\RabbitMQ Server\uninstall.exe”

REM Set environment variables for the new installation:
SET HOMEDRIVE=C:\
SET HOMESHARE=C:\Users
SET ERLANG_HOME=C:\Program Files\erl8.2

REM Now reinstall:
C:\some_download_dir\rabbitmq-server-3.6.6.exe
REM ... and watch the detailed output closely to make sure there are no errors

C:\Program Files\RabbitMQ Server\rabbitmq_server_3.6.6\sbin\rabbitmqctl.bat status

Here’s some happy output:

Status of node 'rabbit@cpft-crate-p01' ...
[{pid,55372},
 {running_applications,[{rabbit,"RabbitMQ","3.6.6"},
                        {rabbit_common,[],"3.6.6"},
                        {mnesia,"MNESIA  CXC 138 12","4.14.2"},
                        {ranch,"Socket acceptor pool for TCP protocols.",
                               "1.2.1"},
                        {xmerl,"XML parser","1.3.12"},
                        {os_mon,"CPO  CXC 138 46","2.4.1"},
                        {sasl,"SASL  CXC 138 11","3.0.2"},
                        {stdlib,"ERTS  CXC 138 10","3.2"},
                        {kernel,"ERTS  CXC 138 10","5.1.1"}]},
 {os,{win32,nt}},
 {erlang_version,"Erlang/OTP 19 [erts-8.2] [64-bit] [smp:24:24] [async-threads:64]\n"},
 {memory,[{total,63923600},
          {connection_readers,0},
          {connection_writers,0},
          {connection_channels,0},
          {connection_other,0},
          {queue_procs,2736},
          {queue_slave_procs,0},
          {plugins,0},
          {other_proc,23674272},
          {mnesia,61784},
          {mgmt_db,0},
          {msg_index,42592},
          {other_ets,1003792},
          {binary,22848},
          {code,17795673},
          {atom,752561},
          {other_system,20567342}]},
 {alarms,[]},

{listeners,[{clustering,25672,"::"},{amqp,5672,"::"},{amqp,5672,"0.0.0.0"}]},
 {vm_memory_high_watermark,0.4},
 {vm_memory_limit,41174066790},
 {disk_free_limit,50000000},
 {disk_free,8951801614336},
 {file_descriptors,[{total_limit,8092},
                    {total_used,2},
                    {sockets_limit,7280},
                    {sockets_used,0}]},
 {processes,[{limit,1048576},{used,179}]},
 {run_queue,0},
 {uptime,28},
 {kernel,{net_ticktime,60}}]

If you see something like that, all should be well.

11.6. MySQL 

11.6.1. Can’t connect to MySQL, even manually 

See https://dev.mysql.com/doc/refman/5.5/en/problems-connecting.html.

11.6.2. How do I reconfigure MySQL?

Edit the MySQL configuration file.

Under Ubuntu Linux this is usually /etc/mysql/my.cnf.
Under Windows it can be in several locations [11]. If you’re not sure where yours is, find mysqld (typically C:\Program Files\MySQL\MySQL Server 5.7\bin\mysqld.exe) and run mysqld --verbose --help > helpoutput.txt and inspect the resulting file helpoutput.txt (which is quite long). There’ll be a pair of lines like:
```
Default options are read from the following files in the given order:
C:\Windows\my.ini C:\Windows\my.cnf C:\my.ini C:\my.cnf C:\Program
Files\MySQL\MySQL Server 5.7\my.ini C:\Program Files\MySQL\MySQL Server
5.7\my.cnf
```
Note that on our main test system (Windows Server 2008 R2, MySQL 5.7) the only file existing was C:\ProgramData\MySQL\MySQL Server 5.7\my.ini, but this was busy being ignored when mysqld was run manually (see below for why – it can be specified directly as an argument to the Windows service, so a manual call to mysqld doesn’t see the same file, which is confusing).
For running MySQL systems, you can also view the configuration filename via MySQL Workbench (under Server Status), but what you see here can be wrong. For example, we’ve had it showing C:\Program Files\MySQL\MySQL Server 5.0\my.ini, when there was no such directory (we were using MySQL 5.7), and the actual file was C:\my.ini.

Restart MySQL.

Under Ubuntu Linux, use sudo service mysql restart.
Under Windows, restart the MySQL service in the Services tool (it’s typically called e.g. MySQL57).

If MySQL fails to restart, run the mysqld program manually so you can see why.

If it sits there appearing to do nothing, it’s probably happy; check the log files, on Windows usually called HOSTNAME.err in the database data directory (where HOSTNAME is the name of your computer).
You can run mysqld --console --standalone --log-error-verbosity 3 to get it to write to the console. These options also allow you to abort it with CTRL-C.
Under Ubuntu, precede that with sudo -u mysql bash to get a shell running as the mysql user.
You might also try mysqld --print-defaults to see its options.
Under Ubuntu, try also journalctl -xe | grep -i mysql | less (sometimes apparmor will block access to MySQL files, if you’ve moved them from their default location, which can be very confusing; in this case, you’ll need to edit /etc/apparmor.d/usr.sbin.mysqld or /etc/apparmor.d/local/usr.sbin.mysqld).

If the Windows service is stuck in the ‘starting’ state, for example after you’ve reconfigured MySQL:

To kill a dead/stuck service: (1) Check the service short name by double-clicking it in Services. Let’s support it’s MySQL57. (2) sc queryex MySQL57 to see its process ID or PID. (3) taskkill /f /pid PIDNUM (where PIDNUM is the process ID from the previous step).
Inspect the Properties of the malfunctioning service carefully. These include a “path to executable” option, which can look like this: "C:\Program Files\MySQL\MySQL Server 5.7\bin\mysqld.exe" --defaults-file="C:\ProgramData\MySQL\MySQL Server 5.7\my.ini" MySQL57. This gives you the service name and also the hidden configuration path!
To reinstall the service: mysqld --install [12]. The default service name is ‘MySQL’, but you can override this. You’re probably best being explicit, like this: mysqld --install MySQL57 --defaults-file="C:\my.ini"
After creating a service, start it manually; if it fails, check Event Viewer ‣ Windows Logs ‣ Application.
If MySQL fails to start and you see errors like The innodb_system data file ‘ibdata1’ must be writable, the first thing to check is that another copy of mysqld is not already running.
To delete a defunct service: sc delete servicename. Exercise extreme caution with this!

Your target is a happy MySQL installation that restarts automatically when you reboot.

11.6.3. Table names are always lower case using MySQL under Windows 

Under Windows, MySQL converts table names to lower case by default (but is happy with mixed-case column names, and is happy with table and column names being mixed case under Linux). This is because of the default setting for lower_case_table_names in my.ini or my.cnf. In turn, this is because Windows can use a case-insensitive file system (and since tables can be stored by the name, this would result in an almighty mess) [5]. However, when using NTFS, Windows filenames are case-sensitive [6]. Therefore, under Windows with NTFS, you have more options for lower_case_table_names. Note, however, that it also affects the case-sensitivity of table names using SQL (but not of column names). So you’re probably better off always using lower_case_table_names = 1, as per the MySQL advice. This is the default behaviour under Windows.

11.6.4. “Got a packet bigger than ‘max_allowed_packet’ bytes” or “MySQL has gone away”

Problem: sqlalchem.exc.OperationalError: (mysql.connector.errors.OperationalError)… Got a packet bigger than ‘max_allowed_packet’ bytes or MySQL has gone away when sending large packets.

Using large “chunks” is efficient but you have to configure MySQL to allow it. The max_allowed_packet setting in the MySQL configuration file (see above) governs this. Try changing the default, e.g. from

max_allowed_packet=4M  # too small!

to

max_allowed_packet=40M

and restart MySQL (as above). You can also view current settings using MySQL Workbench (Management ‣ Status and System Variables ‣ System Variables; search for max_allowed_packet).

If you can’t get this working, reduce the --chunksize parameter to the CRATE anonymiser.

11.6.5. How do I hot-swap two MySQL databases?

Since anonymisation is slow, you may want a live research database and another that you can update offline. When you’re ready to swap, you’ll want to

create DEFUNCT
rename LIVE -> DEFUNCT
rename OFFLINE -> LIVE

then either revert:

rename LIVE -> OFFLINE
rename DEFUNCT -> LIVE

or commit:

drop DEFUNCT

How?

11.6.6. “MySQL server has gone away”

One possibility is that you are processing a big binary field, and MySQL’s max_allowed_packet parameter is too small. Try increasing it (e.g. from 16M to 32M). See also https://camcops.readthedocs.io/en/latest/administrator/server_troubleshooting.html?highlight=max_allowed_packet#mysql-server-has-gone-away

11.6.7. How to convert a database from SQL Server to MySQL?

This facility is provided by MySQL Workbench, which will connect to an SQL Server instance. Use the “ODBC via connection string” option if other methods aren’t working: DSN=XXX;UID=YYY;PWD=ZZZ.

If the schema definitions are not seen, it’s a permissions issue (https://stackoverflow.com/questions/17038716), in which case you can also copy copy the database using CRATE’s anonymiser, treating all tables as non-patient tables (i.e. doing no actual anonymisation).

11.6.8. What settings do I need in /etc/mysql/my.cnf?

Probably these:

[mysqld]
max_allowed_packet = 32M

innodb_strict_mode = 1
innodb_file_per_table = 1
innodb_file_format = Barracuda

# Only for MySQL prior to 5.7.5 (http://dev.mysql.com/doc/relnotes/mysql/5.6/en/news-5-6-20.html):
innodb_log_file_size = 320M

# For more performance, less safety:
innodb_flush_log_at_trx_commit = 2

# To save memory?
# Default is 8; suggestion is ncores * 2
# innodb_thread_concurrency = ...

[mysqldump]
max_allowed_packet = 32M

11.6.9. “_mysql_exceptions.OperationalError: (1118, ‘Row size too large (> 8126)”

In full, the error is:

_mysql_exceptions.OperationalError: (1118, 'Row size too large (> 8126).
Changing some columns to TEXT or BLOB or using ROW_FORMAT=DYNAMIC or
ROW_FORMAT=COMPRESSED may help. In current row format, BLOB prefix of 768
bytes is stored inline.')

See above. If you need to change the log file size, FOLLOW THIS PROCEDURE: https://dev.mysql.com/doc/refman/5.0/en/innodb-data-log-reconfiguration.html

11.6.10. “Segmentation fault (core dumped)…”

This error can be seen when using the Microsoft ODBC driver for Linux, which is buggy. In this situation, use the Microsoft JDBC driver instead.

11.6.11. “Killed.”

You may be out of memory, on a small computer. Try reducing MySQL’s memory footprint. (Steps have already been taken to reduce memory usage by the anonymiser itself.)

11.6.12. Can’t create FULLTEXT index(es)

MySQL v5.6 is required to use FULLTEXT indexes with InnoDB tables (as opposed to MyISAM tables, which don’t support transactions).

On Ubuntu 14.04, the default MySQL version is 5.5, so use:

sudo apt-get install mysql-server-5.6 mysql-server-core-5.6 \
    mysql-client-5.6 mysql-client-core-5.6

11.6.13. How to search with FULLTEXT indexes?

In conventional SQL, you would use:

... WHERE field LIKE '%word%'

In a field having a MySQL FULLTEXT index, you can use:

... WHERE MATCH(field) AGAINST ('word')

There are several variants. See https://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html

11.7. SQL Server 

11.7.1. […] “Connection is busy with results for another command” […]

Or: configuring Windows ODBC for MARS.

If you see this with Microsoft SQL Server via ODBC/pyodbc, you need to enable Multiple Active Result Sets (MARS), because for some reason Microsoft think it’s unusual to want more than one cursor open (more than one simultaneous query) to a single database at once. There are several ways:

Windows

(DOESN’T WORK.) Append ;MultipleActiveResultSets=True to the connection URL, e.g. mssql+pyodbc://@MYDSN;MultipleActiveResultSets=True. However, although this is documented [8], it didn’t work via pyodbc [9]!
(WORKS.)
- Fire up an Administrator command prompt (e.g. press the Start button, start typing “Command Prompt”, and when it appears, right-click it and choose “Run as Administrator”).
- Run the command: odbcconf /a {CONFIGSYSDSN "SQL Server Native Client 11.0" "DSN=MY_DSN|MARS_Connection=Yes"} (replacing the driver and DSN names with your own).
- You can re-run the ODBC configuration wizard, and it should now say Multiple Active Result Sets(MARS): YES where it said … NO before.
This does work. Use CONFIGDSN instead of CONFIGSYSDSN if you are using a user DSN. Your changes should be visible if you restart the ODBC control panel (e.g. with odbccp32.cpl) and go through the configuration wizard again; the MARS option (which you can’t edit) should have changed from “No” to “Yes”.
There’s also a registry hack [10].

Linux

Under Linux, in /etc/odbc.ini, for that DSN, set MARS_Connection = yes. See
- https://msdn.microsoft.com/en-us/library/cfa084cz(v=vs.110).aspx
- https://msdn.microsoft.com/en-us/library/h32h3abf(v=vs.110).aspx
- Rationale: We use gen_patient_ids() to iterate through patients, but then we fetch data for that patient via the same connection to the source database(s). Therefore, we’re operating multiple result sets through one connection.

11.7.2. “The data types nvarchar(max) and ntext are incompatible…”

Problem: Using SQL Server, you get an error from the CRATE web front end saying “The data types nvarchar(max) and ntext are incompatible in the equal to operator.”

Solution: Upgrade pyodbc.

(This error occurs with pyodbc 3.1.1 but not with pyodbc 4.0.3, for example.) The error relates to pyodbc passing text parameters to SQL Server as NTEXT rather than NVARCHAR(MAX).

11.7.3. “A default full-text catalog does not exist in database…”

Problem: Using SQL Server: “A default full-text catalog does not exist in database ‘XXX’ or user does not have permission to perform this action.”

Assuming the user does have permission, this means you need to run this SQL beforehand:

USE mydatabase;
CREATE FULLTEXT CATALOG default_fulltext_catalog AS DEFAULT;

See https://technet.microsoft.com/en-us/library/dd283095(v=sql.100).aspx.

11.7.4. New tables are named like mydb.[SERVERNAMEUSERNAME].mytable 

Problem: Under SQL Server, new tables (e.g. from NLP) look like mydatabase.[SERVERNAMEUSERNAME].tablename rather than mydatabase.dbo.tablename.

Under Microsoft SQL Server, the standard full notation for a table is database.schema.table. The default schema is often dbo, so tables look like mydatabase.dbo.mytable. However [17]:

The default schema for a user can be defined by using the DEFAULT_SCHEMA option of CREATE USER or ALTER USER. If no default schema is defined for a user account, SQL Server will assume dbo is the default schema. It is important [to] note that if the user is authenticated by SQL Server as a member of a group in the Windows operating system, no default schema will be associated with the user. If the user creates an object, a new schema will be created and named the same as the user, and the object will be associated with that user schema.

So, for example, if your username is RCardinal and you authenticate to SQL Server via Windows authentication, and then create a table, it is likely to be called something like [mydatabase].[myserver\RCardinal].[mytable]. You can try this:

USE mydatabase;
SELECT name, type_desc, default_schema_name FROM sys.database_principals;

USE mydatabase;
ALTER USER [myserver\RCardinal] WITH DEFAULT_SCHEMA = dbo;

11.7.5. “Restricted data type attribute violation”

We’ve seen this whilst performing an initial scan (data dictionary autogeneration) of a source database using the SQL Server Native Client 11.0 ODBC driver with SQL Server 2012 (version series 11 per https://sqlserverbuilds.blogspot.com/; unsure of specific version), via pyodbc==4.0.x, default ODBC settings (except MARS enabled, as above), and a read-only connection. The SQLAlchemy URL for this setup is mssql+pyodbc://@odbc_data_source_name. The error went away by using the ODBC Driver 17 for SQL Server driver instead.

11.8. Windows 

11.8.1. Control Panel looks blank in Windows 2008 Server 

This isn’t a CRATE problem. However, it’s something we encountered as a problem when using CRATE. If your Control Panel looks blank, (1) run gpedit.msc (Local Group Policy Editor), then User Configuration ‣ Administrative Templates ‣ Control Panel. Check the settings there.

When that doesn’t work, I’m a bit stuck; try running .cpl items from the command line instead.

11.9. MedEx-UIMA 

11.9.1. MedEx-UIMA gives Java errors 

Yes. Frankly, it’s just fairly badly written, from a programmer’s perspective. I’ve fixed a few of its bugs in a nasty patch-based way; see build_medex_itself.py. MedEx takes the approach of allowing bugs to throw exceptions, catching its own exceptions, printing the stack trace, and carrying on regardless; this can be off-putting, but I’ve not bothered to fix all its bugs. CRATE will abort if any exceptions make it out of MedEx, but you can just ignore ones that you see. Here are a couple I fixed:

Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 2
    at java.lang.String.charAt(Unknown Source)
    at org.apache.NLPTools.Document.<init>(Document.java:134)
    at org.apache.medex.MedTagger.run_batch_medtag(MedTagger.java:256)
    at CrateMedexPipeline.processInput(CrateMedexPipeline.java:302)
    at CrateMedexPipeline.<init>(CrateMedexPipeline.java:128)
    at CrateMedexPipeline.main(CrateMedexPipeline.java:320)

java.lang.StringIndexOutOfBoundsException: String index out of range: 1
    at java.lang.String.charAt(Unknown Source)
    at org.apache.algorithms.SuffixArray.construct_tree_word(SuffixArray.java:375)
    at org.apache.algorithms.SuffixArray.re_build(SuffixArray.java:97)
    at org.apache.algorithms.SuffixArray.<init>(SuffixArray.java:60)
    at org.apache.medex.MedTagger.medtagging(MedTagger.java:359)
    at org.apache.medex.MedTagger.run_batch_medtag(MedTagger.java:264)
    at CrateMedexPipeline.processInput(CrateMedexPipeline.java:302)
    at CrateMedexPipeline.<init>(CrateMedexPipeline.java:128)
    at CrateMedexPipeline.main(CrateMedexPipeline.java:320)

(the first being an example of using & when they meant && in a logic test). Here are a couple of MedEx bugs I haven’t fixed, so you might see them:

java.lang.ArrayIndexOutOfBoundsException: -1
    at java.util.Vector.elementData(Unknown Source)
    at java.util.Vector.get(Unknown Source)
    at org.apache.NLPTools.SentenceBoundary.detect_boundaries(SentenceBoundary.java:329)
    at org.apache.medex.MedTagger.medtagging(MedTagger.java:354)
    at org.apache.medex.MedTagger.run_batch_medtag(MedTagger.java:264)
    at CrateMedexPipeline.processInput(CrateMedexPipeline.java:312)
    at CrateMedexPipeline.runPipeline(CrateMedexPipeline.java:138)
    at CrateMedexPipeline.<init>(CrateMedexPipeline.java:112)
    at CrateMedexPipeline.main(CrateMedexPipeline.java:330)

java.lang.NullPointerException
    at org.apache.algorithms.SuffixArray.search(SuffixArray.java:636)
    at org.apache.medex.MedTagger.medtagging(MedTagger.java:362)
    at org.apache.medex.MedTagger.run_batch_medtag(MedTagger.java:264)
    at CrateMedexPipeline.processInput(CrateMedexPipeline.java:312)
    at CrateMedexPipeline.runPipeline(CrateMedexPipeline.java:138)
    at CrateMedexPipeline.<init>(CrateMedexPipeline.java:112)
    at CrateMedexPipeline.main(CrateMedexPipeline.java:330)

java.lang.NullPointerException

11.9.2. CRATE reports an encoding error when talking to MedEx 

You have probably missed out the -Dfile.encoding=UTF-8 setting for Java in your config file; see the example.

Footnotes