11. FAQs and troubleshooting
11.1. Known bugs elsewhere affecting CRATE
wkhtmltopdf font size bug
See notes next to PATIENT_FONTSIZE in config/settings.py
If you try to use django-debug-toolbar when proxying via a Unix domain socket, you need to use a custom INTERNAL_IPS setting; see the specimen config file.
SQL Server returns a rowcount of -1; this is normal. See https://code.google.com/p/pyodbc/wiki/Cursor.
11.2. General
11.2.1. ImportError: No module named ‘{mysqldb, pyodbc, …}’
You haven’t installed the right drivers for the database URL(s) that you have specified in your CRATE configuration file. Make sure the virtualenv is activated (see Activating your virtual environment), and then install it (see Database drivers).
11.2.2. Which Python modules should be installed?
Use pip freeze
to show what you have installed. The ones you have to
install manually are marked with # MANUAL
; CRATE installs the others.
django-pyodbc-azure==1.10.4.0 # MANUAL: for Django to talk to SQL Server
pyodbc==4.0.3 # MANUAL: for ODBC database connections
11.2.3. Pretty colours all gone (anonymiser, NLP, etc.)
That’s what happens when you pipe the tool through tee
.
11.3. CRATE anonymiser
11.3.1. Anonymisation is slow
Make sure you have indexes created on all patient_id fields, because the tool will use this to find (a) values for scrubbing, and (b) records for anonymisation. Indexing here makes a huge difference!
11.3.2. CRATE uses lots of memory
A normal run should see CRATE using roughly 60–80 Mb per process. Values much in excess of this likely relate to the text extraction process, which uses third-party software over which CRATE has no control (I’ve seen >1 Gb) 1.
11.3.3. “File is not a zip file”
In full: “Caught exception from document_to_text: File is not a zip file” when extracting text from DOCX documents
This error usually appears with encrypted, password-protected DOCX files. The anonymiser will not be able to read these, and this error can be ignored.
11.3.4. “UnRtf: … has stopped working”
If a third-party program is used by CRATE for text extraction and fails, you may get lots of messages from Windows like “UnRtf: convert document in RTF format to other formats has stopped working. Windows can check online for a solution to the problem…”
To disable this in Windows Server 2008, run the Server Manager, and in the main page scroll down to a section titled Resources and Support. There should be a “Windows Error Reporting” section. It’s probably set to “Ask me about sending reports every time an error occurs”; change this to “I don’t want to participate, and don’t ask me again”.
That gets rid of the options to tell Microsoft, but it still pops up some “close or debug?” dialog boxes. To fix that, add the following registry entries 2:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting : “ForceQueue”=dword:00000000
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting : “DontShowUI”=dword:00000001
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting : “DontSendAdditionalData”=dword:00000001
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting\Consent : “DefaultConsent”=dword:00000002
11.3.5. “UnicodeDecodeError: ‘utf-8’ codec can’t decode byte 0xff in position 0: invalid start byte”
You can get this error when you try to use a config file in Windows, certainly by using Notepad to create/edit a CRATE config file. Notepad can default to “UTF-16 LE” encoding. Re-save the config file with UTF-8 encoding.
11.4. CRATE NLP
11.4.1. How can I update a source file in KConnect/Bio-YODIE
Example: replacing plugins/Tagger_ConText/src/gate/context/ContextFeaturesTagger.java.
Make sure Apache Ant is installed 3.
Replace the source file.
Delete the existing .jar file.
Run
ant build
.If it fails, check the build.properties file, which contains local variables such as directories (e.g. gate.home); edit this and try again.
11.5. CRATE web site
11.5.1. crate_launch_cherrypy_server can’t find its config files
Use forward slashes (/
) rather than backslashes (\
) for filename paths
in the environment variable CRATE_CHERRYPY_ARGS (e.g.
C:/somepath/somefile.ext
) or escape the backslashes by doubling them (e.g.
C:\\somepath\\somefile.ext
).
11.5.2. Port 443 not free on ‘127.0.0.1’
Problem: Under Windows Server: OSError(“Port 443 not free on ‘127.0.0.1’”)
If you launch the CRATE web server on port 443 (for HTTPS) and get this error, something else is probably using port 443. If that’s unexpected to you, it’s because Microsoft uses it by default 13. Try:
netstat -ban
to see what’s using it. In our case we had this:
Proto Local Address Foreign Address State
...
[svchost.exe]
TCP 0.0.0.0:443 0.0.0.0 LISTENING
...
[svchost.exe]
TCP [::]:443 [::]:0 LISTENING
That is, svchost.exe is using port 443. One question now is: which actual program is using this port via svchost.exe (which is a service host program that does all sorts of things) 14? A possibility relates to VMWare 15.
If you have sufficient control over your machine to wrest port 443 away from whatever’s using it, fine. Otherwise, you may need to use an alternative port. A common choice might be 8443 16.
11.5.3. OSError: No socket could be created… permission denied
The error OSError: No socket could be created -- (('127.0.0.1', 443): [Errno
13] Permission denied)
may be seen on Linux and is similar to port 443
not free.
Under Linux, programs without root authority cannot, by default, open port numbers below 1024. The best thing to do under Linux is to run a privileged “main” web server like Apache (which becomes responsible for encryption and so forth), and configure CRATE to operate behind Apache. See Configuring for Apache. Alternative approaches are outlined at https://superuser.com/questions/710253/allow-non-root-process-to-bind-to-port-80-and-443.
11.5.4. “Your connection is not private…” browser error
You will always get this error until you get a proper HTTPS certificate. This error occurs when you self-sign a certificate. Browers will offer you a way round, usually in small print [e.g. in Chrome:
].11.5.5. 403 Forbidden: CSRF verification failed
There are at least two possible reasons:
Your browser must enable cookies, at least for ‘same-origin’ requests; this is a critical part of CRATE’s security to prevent cross-site request forgeries. If you’re using Firefox, try Chrome 7.
By default, the CRATE web site uses only HTTPS (secure HTTP). This is governed by the CRATE_HTTPS parameter in CRATE’s own crateweb/config/settings.py. When CRATE_HTTPS is True, then CSRF cookies are only permitted over HTTPS, so if you use plain HTTP, you will see this error. A quick hack is to set
CRATE_HTTPS = False
in your local settings, but this is a bad idea; set up HTTPS properly instead, as above.
11.5.6. “Unknown, invalid, or unsupported option… in a getsockopt or setsockopt call”
Problem: the front end might produce the error: “An unknown, invalid, or unsupported option or level was specified in a getsockopt or setsockopt call” while initiating a back-end task.
This is due to a bug in the Python AMQP package version 2.1.4 4. (AMQP is a protocol used by Celery to talk to a message broker such as RabbitMQ; CRATE uses Celery to manage its back-end asynchronous tasks, like sending e-mails.)
Solution: downgrade AMQP. From the activated CRATE virtual environment:
pip uninstall amqp
pip install -Iv amqp==2.1.3
This change has been hardcoded into CRATE’s setup scripts to prevent Celery from picking the buggy version of the Python amqp package. As a consequence, other requirements are also downgraded (celery to 4.0.1; kombu to 4.0.1).
11.5.7. Static files not served via CherryPy
Problem: Static files are not served (e.g. broken icons rather than plus/minus symbols; broken-looking admin site) when using the CherryPy server for CRATE.
Solution: You should specify either (1) FORCE_SCRIPT_NAME = "/crate"
in
your config file, or (2) --root_path /crate
in your CRATE_CHERRYPY_ARGS
environment variable. (The default value for the latter is taken from the
former.)
The cause of the problem is as follows: if you don’t do this, then https://mysite/ is meant to be your site, while https://mysite/crate_static/ is meant to be your static root. However, the latter comes under the former, so Django says “it’s for me” then “it doesn’t exist”. If you use https://mysite/crate/ as your site root, with https://mysite/crate_static/ as your static root, then the software is happy.
11.5.8. I can’t restart the CRATE Windows Service cleanly
There may be a problem whereby the CRATE web service doesn’t entirely shut down when its service is stopped. You can manually kill leftover processes (which will appear as python.exe or python.exe *32) using taskmgr.
This should be fixed now.
11.5.9. CRATE service doesn’t start… errors in Windows Event Log
If your CRATE service doesn’t start and you see this error in the Event Log:
Unable to read Server Queue performance data from the Server service.
The first four bytes (DWORD) of the Data section contains the status
code, the second four bytes contains the IOSB.Status and the next four
bytes contains the IOSB.Information.
Log Name: Application
Source: PerfNet
Event ID: 2006
then consider first if this might be a bug relating to HP ProLiant servers. See https://support.hpe.com/hpsc/doc/public/display?docId=emr_na-a00041653en_us&docLocale=en_US and a fix at https://support.microsoft.com/en-in/help/4057142/windows-10-update-kb4057142. However, that is for Windows Server 2016.
The CPFT server from Dec 2016 is an HP ProLiant DL360 Gen9 server with 2 × 12-core/24-thread Intel Xeon E5-2687WV4 3 GHz CPUs (48 effective CPUs), with 96 Gb RAM later upgraded to 672 Gb RAM (in 2018), and 11.5 Tb SSD storage in a RAID configuration (8.9 Tb available); it was about £15k inc. VAT initially plus £8k for the extra RAM. It runs Windows Server 2008 R2.
Other possible problems:
https://support.microsoft.com/en-my/help/2607486/windows-server-2008-r2-reports-perfnet-error-in-application-log-on-mac … but that relates to machines with >64 processors;
https://support.microsoft.com/en-us/help/2279566/32-bit-application-cannot-query-performance-server-work-queues-counter … that’s more likely since it relates to machines with >32 processors, and in turn this suggests that a 32-bit application is having trouble. However, we have 64-bit Python installed.
As it turned out, s per the CRATE Windows service help, we try
crate_windows_service
, and it reported that theservicemanager
module was missing; that’d explain it! The virtual environment had got messed up.
So the general rescue method:
remove the old virtual environment
recreate the virtual environment and reinstall, e.g.
cd \srv\crate "\Program Files\Python35\python.exe" -m venv crate_virtualenv crate_virtualenv\Scripts\activate.bat pip install crate_anon==0.18.51 pyodbc django-pyodbc-azure
remove and reinstall the CRATE service, using an Administrator command prompt:
crate_windows_service remove
… reboot…
crate_windows_service install
11.5.10. “No connection could be made because the target machine actively refused it”
Problem: From the front end, you see: “No connection could be made because the target machine actively refused it”.
Check that RabbitMQ is running. Check also CRATE’s Celery log. If RabbitMQ is not running, you’ll see something like this in CRATE’s Celery log, giving a little more detail:
[2017-02-25 23:50:11,433: Error/MainProcess] consumer: Cannot connect to
amqp://guest:**@127.0.0.1:5672//: [WinError 10061] No connection could be
made because the target machine actively refused it.
This indicates that Celery (called by CRATE) is looking for RabbitMQ on port
5672, finding it, but being refused access. Make sure RabbitMQ is installed and
its service started. Run rabbitmqctl
, which on Windows machines is typically
typically at
C:\Program Files\RabbitMQ Server\rabbitmq_server-3.6.6\sbin\rabbitmqctl.bat
Specifically, run:
rabbitmqctl status
If you see this unhappy output:
Status of node 'rabbit@cpft-crate-p01' ...
Error: unable to connect to node 'rabbit@cpft-crate-p01': nodedown
DIAGNOSTICS
===========
attempted to contact: ['rabbit@cpft-crate-p01']
rabbit@cpft-crate-p01:
* connected to epmd (port 4369) on cpft-crate-p01
* epmd reports: node 'rabbit' not running at all
other nodes on cpft-crate-p01: ['RabbitMQ']
* suggestion: start the node
current node details:
- node name: 'rabbitmq-cli-11@cpft-crate-p01'
- home dir: P:\
- cookie hash: <...some hash...>
… then one possibility is that RabbitMQ was improperly installed. This can happen if installed by a non-administrative user 18, or if your Windows variables HOMEDRIVE and HOMESHARE are pointing to a network drive 19. From an administrative command prompt, this was one solution:
REM Remove the old installation:
net stop rabbitmq
“C:\Program Files\RabbitMQ Server\uninstall.exe”
REM Set environment variables for the new installation:
SET HOMEDRIVE=C:\
SET HOMESHARE=C:\Users
SET ERLANG_HOME=C:\Program Files\erl8.2
REM Now reinstall:
C:\some_download_dir\rabbitmq-server-3.6.6.exe
REM ... and watch the detailed output closely to make sure there are no errors
C:\Program Files\RabbitMQ Server\rabbitmq_server_3.6.6\sbin\rabbitmqctl.bat status
Here’s some happy output:
Status of node 'rabbit@cpft-crate-p01' ...
[{pid,55372},
{running_applications,[{rabbit,"RabbitMQ","3.6.6"},
{rabbit_common,[],"3.6.6"},
{mnesia,"MNESIA CXC 138 12","4.14.2"},
{ranch,"Socket acceptor pool for TCP protocols.",
"1.2.1"},
{xmerl,"XML parser","1.3.12"},
{os_mon,"CPO CXC 138 46","2.4.1"},
{sasl,"SASL CXC 138 11","3.0.2"},
{stdlib,"ERTS CXC 138 10","3.2"},
{kernel,"ERTS CXC 138 10","5.1.1"}]},
{os,{win32,nt}},
{erlang_version,"Erlang/OTP 19 [erts-8.2] [64-bit] [smp:24:24] [async-threads:64]\n"},
{memory,[{total,63923600},
{connection_readers,0},
{connection_writers,0},
{connection_channels,0},
{connection_other,0},
{queue_procs,2736},
{queue_slave_procs,0},
{plugins,0},
{other_proc,23674272},
{mnesia,61784},
{mgmt_db,0},
{msg_index,42592},
{other_ets,1003792},
{binary,22848},
{code,17795673},
{atom,752561},
{other_system,20567342}]},
{alarms,[]},
{listeners,[{clustering,25672,"::"},{amqp,5672,"::"},{amqp,5672,"0.0.0.0"}]},
{vm_memory_high_watermark,0.4},
{vm_memory_limit,41174066790},
{disk_free_limit,50000000},
{disk_free,8951801614336},
{file_descriptors,[{total_limit,8092},
{total_used,2},
{sockets_limit,7280},
{sockets_used,0}]},
{processes,[{limit,1048576},{used,179}]},
{run_queue,0},
{uptime,28},
{kernel,{net_ticktime,60}}]
If you see something like that, all should be well.
11.6. MySQL
11.6.1. Can’t connect to MySQL, even manually
See https://dev.mysql.com/doc/refman/5.5/en/problems-connecting.html.
11.6.2. How do I reconfigure MySQL?
Edit the MySQL configuration file.
Under Ubuntu Linux this is usually /etc/mysql/my.cnf.
Under Windows it can be in several locations 11. If you’re not sure where yours is, find mysqld (typically
C:\Program Files\MySQL\MySQL Server 5.7\bin\mysqld.exe
) and runmysqld --verbose --help > helpoutput.txt
and inspect the resulting file helpoutput.txt (which is quite long). There’ll be a pair of lines like:Default options are read from the following files in the given order: C:\Windows\my.ini C:\Windows\my.cnf C:\my.ini C:\my.cnf C:\Program Files\MySQL\MySQL Server 5.7\my.ini C:\Program Files\MySQL\MySQL Server 5.7\my.cnf
Note that on our main test system (Windows Server 2008 R2, MySQL 5.7) the only file existing was
C:\ProgramData\MySQL\MySQL Server 5.7\my.ini
, but this was busy being ignored when mysqld was run manually (see below for why – it can be specified directly as an argument to the Windows service, so a manual call tomysqld
doesn’t see the same file, which is confusing).For running MySQL systems, you can also view the configuration filename via MySQL Workbench (under Server Status), but what you see here can be wrong. For example, we’ve had it showing
C:\Program Files\MySQL\MySQL Server 5.0\my.ini
, when there was no such directory (we were using MySQL 5.7), and the actual file wasC:\my.ini
.
Restart MySQL.
Under Ubuntu Linux, use
sudo service mysql restart
.Under Windows, restart the MySQL service in the Services tool (it’s typically called e.g. MySQL57).
If MySQL fails to restart, run the mysqld
program manually so you can see
why.
If it sits there appearing to do nothing, it’s probably happy; check the log files, on Windows usually called HOSTNAME.err in the database data directory (where HOSTNAME is the name of your computer).
You can run
mysqld --console --standalone --log-error-verbosity 3
to get it to write to the console. These options also allow you to abort it with CTRL-C.Under Ubuntu, precede that with
sudo -u mysql bash
to get a shell running as the mysql user.You might also try
mysqld --print-defaults
to see its options.Under Ubuntu, try also
journalctl -xe | grep -i mysql | less
(sometimes apparmor will block access to MySQL files, if you’ve moved them from their default location, which can be very confusing; in this case, you’ll need to edit /etc/apparmor.d/usr.sbin.mysqld or /etc/apparmor.d/local/usr.sbin.mysqld).
If the Windows service is stuck in the ‘starting’ state, for example after you’ve reconfigured MySQL:
To kill a dead/stuck service: (1) Check the service short name by double-clicking it in Services. Let’s support it’s MySQL57. (2)
sc queryex MySQL57
to see its process ID or PID. (3)taskkill /f /pid PIDNUM
(where PIDNUM is the process ID from the previous step).Inspect the Properties of the malfunctioning service carefully. These include a “path to executable” option, which can look like this:
"C:\Program Files\MySQL\MySQL Server 5.7\bin\mysqld.exe" --defaults-file="C:\ProgramData\MySQL\MySQL Server 5.7\my.ini" MySQL57
. This gives you the service name and also the hidden configuration path!To reinstall the service:
mysqld --install
12. The default service name is ‘MySQL’, but you can override this. You’re probably best being explicit, like this:mysqld --install MySQL57 --defaults-file="C:\my.ini"
After creating a service, start it manually; if it fails, check
.If MySQL fails to start and you see errors like The innodb_system data file ‘ibdata1’ must be writable, the first thing to check is that another copy of mysqld is not already running.
To delete a defunct service:
sc delete servicename
. Exercise extreme caution with this!
Your target is a happy MySQL installation that restarts automatically when you reboot.
11.6.3. Table names are always lower case using MySQL under Windows
Under Windows, MySQL converts table names to lower case by default (but is
happy with mixed-case column names, and is happy with table and column names
being mixed case under Linux). This is because of the default setting for
lower_case_table_names
in my.ini or my.cnf. In turn, this is because
Windows can use a case-insensitive file system (and since tables can be stored
by the name, this would result in an almighty mess) 5.
However, when using NTFS, Windows filenames are case-sensitive
6. Therefore, under Windows with NTFS, you have more options for
lower_case_table_names
. Note, however, that it also affects the
case-sensitivity of table names using SQL (but not of column names). So you’re
probably better off always using lower_case_table_names = 1
, as per the
MySQL advice. This is the default behaviour under Windows.
11.6.4. “Got a packet bigger than ‘max_allowed_packet’ bytes” or “MySQL has gone away”
Problem: sqlalchem.exc.OperationalError: (mysql.connector.errors.OperationalError)… Got a packet bigger than ‘max_allowed_packet’ bytes or MySQL has gone away when sending large packets.
Using large “chunks” is efficient but you have to configure MySQL to allow it.
The max_allowed_packet
setting in the MySQL configuration file (see above)
governs this. Try changing the default, e.g. from
max_allowed_packet=4M # too small!
to
max_allowed_packet=40M
and restart MySQL (as above). You can also view current settings using MySQL
Workbench (max_allowed_packet
).
If you can’t get this working, reduce the --chunksize
parameter to the
CRATE anonymiser.
11.6.5. How do I hot-swap two MySQL databases?
Since anonymisation is slow, you may want a live research database and another that you can update offline. When you’re ready to swap, you’ll want to
create DEFUNCT
rename LIVE -> DEFUNCT
rename OFFLINE -> LIVE
then either revert:
rename LIVE -> OFFLINE
rename DEFUNCT -> LIVE
or commit:
drop DEFUNCT
How?
11.6.6. “MySQL server has gone away”
One possibility is that you are processing a big binary field, and MySQL’s
max_allowed_packet
parameter is too small. Try increasing it (e.g. from 16M
to 32M). See also
https://camcops.readthedocs.io/en/latest/administrator/server_troubleshooting.html?highlight=max_allowed_packet#mysql-server-has-gone-away
11.6.7. How to convert a database from SQL Server to MySQL?
This facility is provided by MySQL Workbench, which will connect to an SQL
Server instance. Use the “ODBC via connection string” option if other methods
aren’t working: DSN=XXX;UID=YYY;PWD=ZZZ
.
If the schema definitions are not seen, it’s a permissions issue (https://stackoverflow.com/questions/17038716), in which case you can also copy copy the database using CRATE’s anonymiser, treating all tables as non-patient tables (i.e. doing no actual anonymisation).
11.6.8. What settings do I need in /etc/mysql/my.cnf?
Probably these:
[mysqld]
max_allowed_packet = 32M
innodb_strict_mode = 1
innodb_file_per_table = 1
innodb_file_format = Barracuda
# Only for MySQL prior to 5.7.5 (http://dev.mysql.com/doc/relnotes/mysql/5.6/en/news-5-6-20.html):
innodb_log_file_size = 320M
# For more performance, less safety:
innodb_flush_log_at_trx_commit = 2
# To save memory?
# Default is 8; suggestion is ncores * 2
# innodb_thread_concurrency = ...
[mysqldump]
max_allowed_packet = 32M
11.6.9. “_mysql_exceptions.OperationalError: (1118, ‘Row size too large (> 8126)”
In full, the error is:
_mysql_exceptions.OperationalError: (1118, 'Row size too large (> 8126).
Changing some columns to TEXT or BLOB or using ROW_FORMAT=DYNAMIC or
ROW_FORMAT=COMPRESSED may help. In current row format, BLOB prefix of 768
bytes is stored inline.')
See above. If you need to change the log file size, FOLLOW THIS PROCEDURE: https://dev.mysql.com/doc/refman/5.0/en/innodb-data-log-reconfiguration.html
11.6.10. “Segmentation fault (core dumped)…”
This error can be seen when using the Microsoft ODBC driver for Linux, which is buggy. In this situation, use the Microsoft JDBC driver instead.
11.6.11. “Killed.”
You may be out of memory, on a small computer. Try reducing MySQL’s memory footprint. (Steps have already been taken to reduce memory usage by the anonymiser itself.)
11.6.12. Can’t create FULLTEXT index(es)
MySQL v5.6 is required to use FULLTEXT indexes with InnoDB tables (as opposed to MyISAM tables, which don’t support transactions).
On Ubuntu 14.04, the default MySQL version is 5.5, so use:
sudo apt-get install mysql-server-5.6 mysql-server-core-5.6 \
mysql-client-5.6 mysql-client-core-5.6
11.6.13. How to search with FULLTEXT indexes?
In conventional SQL, you would use:
... WHERE field LIKE '%word%'
In a field having a MySQL FULLTEXT index, you can use:
... WHERE MATCH(field) AGAINST ('word')
There are several variants. See https://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
11.7. SQL Server
11.7.1. […] “Connection is busy with results for another command” […]
Or: configuring Windows ODBC for MARS.
If you see this with Microsoft SQL Server via ODBC/pyodbc, you need to enable Multiple Active Result Sets (MARS), because for some reason Microsoft think it’s unusual to want more than one cursor open (more than one simultaneous query) to a single database at once. There are several ways:
Windows
(DOESN’T WORK.) Append
;MultipleActiveResultSets=True
to the connection URL, e.g.mssql+pyodbc://@MYDSN;MultipleActiveResultSets=True
. However, although this is documented 8, it didn’t work via pyodbc 9!(WORKS.)
Fire up an Administrator command prompt (e.g. press the Start button, start typing “Command Prompt”, and when it appears, right-click it and choose “Run as Administrator”).
Run the command:
odbcconf /a {CONFIGSYSDSN "SQL Server Native Client 11.0" "DSN=MY_DSN|MARS_Connection=Yes"}
(replacing the driver and DSN names with your own).You can re-run the ODBC configuration wizard, and it should now say Multiple Active Result Sets(MARS): YES where it said … NO before.
This does work. Use
CONFIGDSN
instead ofCONFIGSYSDSN
if you are using a user DSN. Your changes should be visible if you restart the ODBC control panel (e.g. withodbccp32.cpl
) and go through the configuration wizard again; the MARS option (which you can’t edit) should have changed from “No” to “Yes”.There’s also a registry hack 10.
Linux
Under Linux, in
/etc/odbc.ini
, for that DSN, setMARS_Connection = yes
. Seehttps://msdn.microsoft.com/en-us/library/cfa084cz(v=vs.110).aspx
https://msdn.microsoft.com/en-us/library/h32h3abf(v=vs.110).aspx
Rationale: We use gen_patient_ids() to iterate through patients, but then we fetch data for that patient via the same connection to the source database(s). Therefore, we’re operating multiple result sets through one connection.
11.7.2. “The data types nvarchar(max) and ntext are incompatible…”
Problem: Using SQL Server, you get an error from the CRATE web front end saying “The data types nvarchar(max) and ntext are incompatible in the equal to operator.”
Solution: Upgrade pyodbc.
(This error occurs with pyodbc 3.1.1 but not with pyodbc 4.0.3, for example.) The error relates to pyodbc passing text parameters to SQL Server as NTEXT rather than NVARCHAR(MAX).
11.7.3. “A default full-text catalog does not exist in database…”
Problem: Using SQL Server: “A default full-text catalog does not exist in database ‘XXX’ or user does not have permission to perform this action.”
Assuming the user does have permission, this means you need to run this SQL beforehand:
USE mydatabase;
CREATE FULLTEXT CATALOG default_fulltext_catalog AS DEFAULT;
See https://technet.microsoft.com/en-us/library/dd283095(v=sql.100).aspx.
11.7.4. New tables are named like mydb.[SERVERNAMEUSERNAME].mytable
Problem: Under SQL Server, new tables (e.g. from NLP) look like mydatabase.[SERVERNAMEUSERNAME].tablename rather than mydatabase.dbo.tablename.
Under Microsoft SQL Server, the standard full notation for a table is database.schema.table. The default schema is often dbo, so tables look like mydatabase.dbo.mytable. However 17:
The default schema for a user can be defined by using the
DEFAULT_SCHEMA
option ofCREATE USER
orALTER USER
. If no default schema is defined for a user account, SQL Server will assumedbo
is the default schema. It is important [to] note that if the user is authenticated by SQL Server as a member of a group in the Windows operating system, no default schema will be associated with the user. If the user creates an object, a new schema will be created and named the same as the user, and the object will be associated with that user schema.
So, for example, if your username is RCardinal and you authenticate to SQL
Server via Windows authentication, and then create a table, it is likely to be
called something like [mydatabase].[myserver\RCardinal].[mytable]
. You can
try this:
USE mydatabase;
SELECT name, type_desc, default_schema_name FROM sys.database_principals;
USE mydatabase;
ALTER USER [myserver\RCardinal] WITH DEFAULT_SCHEMA = dbo;
11.7.5. “Restricted data type attribute violation”
We’ve seen this whilst performing an initial scan (data dictionary
autogeneration) of a source database using the SQL Server Native Client
11.0
ODBC driver with SQL Server 2012 (version series 11 per
https://sqlserverbuilds.blogspot.com/; unsure of specific version), via
pyodbc==4.0.x
, default ODBC settings (except MARS enabled, as above), and a
read-only connection. The SQLAlchemy URL for this setup is
mssql+pyodbc://@odbc_data_source_name
. The error went away by using the
ODBC Driver 17 for SQL Server
driver instead.
11.8. Windows
11.8.1. Control Panel looks blank in Windows 2008 Server
This isn’t a CRATE problem. However, it’s something we encountered as a problem
when using CRATE. If your Control Panel looks blank, (1) run gpedit.msc
(Local Group Policy Editor), then . Check the settings there.
When that doesn’t work, I’m a bit stuck; try running .cpl
items from the
command line instead.
11.9. MedEx-UIMA
11.9.1. MedEx-UIMA gives Java errors
Yes. Frankly, it’s just fairly badly written, from a programmer’s perspective.
I’ve fixed a few of its bugs in a nasty patch-based way; see
build_medex_itself.py
. MedEx takes the approach of allowing bugs to throw
exceptions, catching its own exceptions, printing the stack trace, and carrying
on regardless; this can be off-putting, but I’ve not bothered to fix all its
bugs. CRATE will abort if any exceptions make it out of MedEx, but you can just
ignore ones that you see. Here are a couple I fixed:
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 2
at java.lang.String.charAt(Unknown Source)
at org.apache.NLPTools.Document.<init>(Document.java:134)
at org.apache.medex.MedTagger.run_batch_medtag(MedTagger.java:256)
at CrateMedexPipeline.processInput(CrateMedexPipeline.java:302)
at CrateMedexPipeline.<init>(CrateMedexPipeline.java:128)
at CrateMedexPipeline.main(CrateMedexPipeline.java:320)
java.lang.StringIndexOutOfBoundsException: String index out of range: 1
at java.lang.String.charAt(Unknown Source)
at org.apache.algorithms.SuffixArray.construct_tree_word(SuffixArray.java:375)
at org.apache.algorithms.SuffixArray.re_build(SuffixArray.java:97)
at org.apache.algorithms.SuffixArray.<init>(SuffixArray.java:60)
at org.apache.medex.MedTagger.medtagging(MedTagger.java:359)
at org.apache.medex.MedTagger.run_batch_medtag(MedTagger.java:264)
at CrateMedexPipeline.processInput(CrateMedexPipeline.java:302)
at CrateMedexPipeline.<init>(CrateMedexPipeline.java:128)
at CrateMedexPipeline.main(CrateMedexPipeline.java:320)
(the first being an example of using & when they meant && in a logic test). Here are a couple of MedEx bugs I haven’t fixed, so you might see them:
java.lang.ArrayIndexOutOfBoundsException: -1
at java.util.Vector.elementData(Unknown Source)
at java.util.Vector.get(Unknown Source)
at org.apache.NLPTools.SentenceBoundary.detect_boundaries(SentenceBoundary.java:329)
at org.apache.medex.MedTagger.medtagging(MedTagger.java:354)
at org.apache.medex.MedTagger.run_batch_medtag(MedTagger.java:264)
at CrateMedexPipeline.processInput(CrateMedexPipeline.java:312)
at CrateMedexPipeline.runPipeline(CrateMedexPipeline.java:138)
at CrateMedexPipeline.<init>(CrateMedexPipeline.java:112)
at CrateMedexPipeline.main(CrateMedexPipeline.java:330)
java.lang.NullPointerException
at org.apache.algorithms.SuffixArray.search(SuffixArray.java:636)
at org.apache.medex.MedTagger.medtagging(MedTagger.java:362)
at org.apache.medex.MedTagger.run_batch_medtag(MedTagger.java:264)
at CrateMedexPipeline.processInput(CrateMedexPipeline.java:312)
at CrateMedexPipeline.runPipeline(CrateMedexPipeline.java:138)
at CrateMedexPipeline.<init>(CrateMedexPipeline.java:112)
at CrateMedexPipeline.main(CrateMedexPipeline.java:330)
java.lang.NullPointerException
11.9.2. CRATE reports an encoding error when talking to MedEx
You have probably missed out the -Dfile.encoding=UTF-8
setting for Java in
your config file; see the example.
Footnotes
- 1
For tracking it down: https://www.huyng.com/posts/python-performance-analysis; http://chase-seibert.github.io/blog/2013/08/03/diagnosing-memory-leaks-python.html
- 2
https://stackoverflow.com/questions/396369/how-do-i-disable-the-debug-close-application-dialog-on-windows-vista; https://msdn.microsoft.com/en-us/library/windows/desktop/bb513638(v=vs.85).aspx; https://stackoverflow.com/questions/3561545/how-to-terminate-a-program-when-it-crashes-which-should-just-fail-a-unit-test/3637710#3637710
- 3
Apache Ant uses build.xml files to build Java .jar files from Java .java source files. From https://ant.apache.org, menuselection:Download –> Binary distributions, fetch apache-ant-1.10.1.zip or similar, and unzip it (e.g. to
C:\Program Files
). Set the JAVA_HOME environment variable to the Java JDK root directory. Set the ANT_HOME environment variable to the Apache Ant root directory.- 4
https://github.com/celery/py-amqp/issues/135; https://stackoverflow.com/questions/41775353; https://github.com/celery/py-amqp/issues/130
- 5
http://dev.mysql.com/doc/refman/5.7/en/identifier-case-sensitivity.html
- 6
- 7
Or maybe see http://superuser.com/questions/461608
- 8
https://msdn.microsoft.com/en-us/library/h32h3abf(v=vs.110).aspx
- 9
Connection string emitted by SQLAlchemy (found by placing a trace within sqlalchemy.engine.default.DefaultDialect.connect):
dsn=MY_DSN;MultipleActiveResultSets=True;Trusted_Connection=Yes
.- 10
- 11
- 12
http://dev.mysql.com/doc/refman/5.7/en/windows-start-service.html
- 14
- 15
http://superuser.com/questions/125455/why-is-the-system-process-listening-on-port-443
- 16
- 17
https://technet.microsoft.com/en-us/library/dd283095(v=sql.100).aspx
- 18