7.7. MedEx-UIMA drug NLP
MedEx-UIMA NLP (for drugs and drug doses) [1] is supported via an external program. MedEx-UIMA runs in Java. CRATE supplies an external front-end Java program (CrateMedexPipeline.java) that loads the MedEx app, sends text to it (via a temporary disk file, for reasons relating to MedEx-UIMA’s internal workings), and returns the answers.
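To make the hand-off concrete, here is a minimal sketch of the parent side of a file-plus-signal exchange. It is not CRATE's own code. It assumes the front end waits for a "data ready" line on stdin, reads text from an input directory, writes results to an output directory and then prints a "results ready" line, as the CrateMedexPipeline help text describes. The file name doc0.txt, the signal strings and the process_text/demo helpers are hypothetical, and a tiny Python stand-in (which just upper-cases the text) replaces the Java program so the example runs without MedEx installed.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical signal strings; the real ones are configurable via
# -data_ready_signal / -results_ready_signal.
DATA_READY = "DATA_READY"
RESULTS_READY = "RESULTS_READY"

# Stand-in for the Java front end: waits for DATA_READY on stdin, "processes"
# every .txt file in the input directory (here: upper-cases it), writes the
# result to the output directory, then prints RESULTS_READY on stdout.
STAND_IN = r"""
import sys, pathlib
in_dir, out_dir, data_ready, results_ready = sys.argv[1:5]
while True:
    line = sys.stdin.readline()
    if not line:
        break  # parent closed stdin: shut down
    if line.strip() != data_ready:
        continue
    for f in pathlib.Path(in_dir).glob("*.txt"):
        result = f.read_text(encoding="utf-8").upper()
        (pathlib.Path(out_dir) / f.name).write_text(result, encoding="utf-8")
    print(results_ready, flush=True)
"""


def process_text(proc, in_dir, out_dir, text):
    """Send one text through the file-plus-signal handshake; return the result."""
    infile = Path(in_dir) / "doc0.txt"  # hypothetical file name
    infile.write_text(text, encoding="utf-8")
    proc.stdin.write(DATA_READY + "\n")
    proc.stdin.flush()
    # Block until the child says the results are on disk.
    while True:
        line = proc.stdout.readline()
        if not line:
            raise RuntimeError("child exited before signalling results")
        if line.strip() == RESULTS_READY:
            break
    outfile = Path(out_dir) / infile.name
    result = outfile.read_text(encoding="utf-8")
    infile.unlink()  # tidy up so the next call starts with empty directories
    outfile.unlink()
    return result


def demo(text):
    """Run the handshake once against the stand-in child process."""
    with tempfile.TemporaryDirectory() as in_dir, \
            tempfile.TemporaryDirectory() as out_dir:
        proc = subprocess.Popen(
            [sys.executable, "-c", STAND_IN, in_dir, out_dir,
             DATA_READY, RESULTS_READY],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
        )
        try:
            return process_text(proc, in_dir, out_dir, text)
        finally:
            proc.stdin.close()  # child sees EOF and exits
            proc.wait(timeout=10)


if __name__ == "__main__":
    print(demo("Aspirin 75mg daily"))  # prints: ASPIRIN 75MG DAILY
```

A real integration would also need timeouts and fuller error handling; CRATE's actual implementation may differ in its details.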
7.7.1. Installation
Download MedEx-UIMA [2] from https://sbmi.uth.edu/ccb/resources/medex.htm.
CRATE provides Java code (see CrateMedexPipeline.java) to talk to MedEx-UIMA. Before using it for the first time, build it with:
crate_nlp_build_medex_java_interface
CRATE also fixes some bugs in MedEx-UIMA. To rebuild MedEx with these fixes, run:
crate_nlp_build_medex_itself
7.7.2. Output columns
In addition to the standard NLP output columns, the CRATE MedEx processor produces these output columns:
Column | SQL type | Description |
---|---|---|
sentence_index | INT | One-based index of sentence in text |
sentence_text | TEXT | Text recognized as a sentence by MedEx |
drug | TEXT | Drug name, as in the text |
drug_startpos | INT | Start position of drug |
drug_endpos | INT | End position of drug |
brand | TEXT | Drug brand name (unconfirmed whether looked up, or given only if present in the text) |
brand_startpos | INT | Start position of brand |
brand_endpos | INT | End position of brand |
form | VARCHAR(255) | Drug/dose form (e.g. ‘tablet’) |
form_startpos | INT | Start position of form |
form_endpos | INT | End position of form |
strength | VARCHAR(50) | Strength (e.g. ‘75mg’) |
strength_startpos | INT | Start position of strength |
strength_endpos | INT | End position of strength |
dose_amount | VARCHAR(50) | Dose amount (e.g. ‘2 tablets’) |
dose_amount_startpos | INT | Start position of dose_amount |
dose_amount_endpos | INT | End position of dose_amount |
route | VARCHAR(50) | Route (e.g. ‘by mouth’) |
route_startpos | INT | Start position of route |
route_endpos | INT | End position of route |
frequency | VARCHAR(50) | Frequency (e.g. ‘every 12 hours’) |
frequency_startpos | INT | Start position of frequency |
frequency_endpos | INT | End position of frequency |
frequency_timex3 | VARCHAR(50) | Normalized frequency in TIMEX3 format (e.g. ‘R1P12H’) |
duration | VARCHAR(50) | Duration (e.g. ‘for 10 days’) |
duration_startpos | INT | Start position of duration |
duration_endpos | INT | End position of duration |
necessity | VARCHAR(50) | Necessity (e.g. ‘prn’) |
necessity_startpos | INT | Start position of necessity |
necessity_endpos | INT | End position of necessity |
umls_code | VARCHAR(8) | UMLS CUI |
rx_code | INT | RxNorm RxCUI for drug |
generic_code | INT | RxNorm RxCUI for generic name |
generic_name | TEXT | Generic drug name (associated with RxCUI code) |
Start positions are the zero-based index of the first relevant character. End positions are the zero-based index of one beyond the last relevant character.
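Because start positions are zero-based and end positions point one beyond the last character, each span is half-open and maps directly onto Python slicing. The sketch below checks this on a hand-made one-sentence example; the row values were worked out by hand and are not real MedEx output, and mismatched_spans is a hypothetical helper.

```python
# Illustrative row: column names follow the table above; the values were
# worked out by hand for this one-sentence text (not real MedEx output).
TEXT = "Take aspirin 75mg by mouth every 12 hours."

ROW = {
    "drug": "aspirin", "drug_startpos": 5, "drug_endpos": 12,
    "strength": "75mg", "strength_startpos": 13, "strength_endpos": 17,
    "route": "by mouth", "route_startpos": 18, "route_endpos": 26,
    "frequency": "every 12 hours",
    "frequency_startpos": 27, "frequency_endpos": 41,
}


def mismatched_spans(text, row, fields):
    """Return the fields whose [startpos, endpos) span does not reproduce
    the text stored in the corresponding column."""
    bad = []
    for field in fields:
        start = row[field + "_startpos"]
        end = row[field + "_endpos"]
        # Zero-based start and exclusive end match Python slicing exactly,
        # so the span length is simply end - start.
        if text[start:end] != row[field]:
            bad.append(field)
    return bad


if __name__ == "__main__":
    print(mismatched_spans(TEXT, ROW, ["drug", "strength", "route", "frequency"]))
    # prints: []
```

The same convention means the character count of any field is endpos minus startpos, with no off-by-one correction.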
7.7.3. crate_nlp_build_medex_java_interface
Options:
USAGE: crate_nlp_build_medex_java_interface [-h] [--builddir BUILDDIR]
[--medexdir MEDEXDIR]
[--java JAVA] [--javac JAVAC]
[--verbose] [--launch]
Compile Java classes for CRATE's interface to MedEx-UIMA
OPTIONS:
-h, --help show this help message and exit
--builddir BUILDDIR Output directory for compiled .class files (default:
/path/to/crate/crate_anon/nlp_manager/compiled_nlp_classes)
--medexdir MEDEXDIR Root directory of MedEx installation (default:
/path/to/Medex/installation)
--java JAVA Java executable (default: java)
--javac JAVAC Java compiler (default: javac)
--verbose, -v Be verbose (use twice for extra verbosity) (default: 0)
--launch Launch script in demonstration mode (having previously
compiled it) (default: False)
7.7.4. crate_nlp_build_medex_itself
This program builds MedEx, applying some bug fixes and improvements for UK use.
Options:
USAGE: crate_nlp_build_medex_itself [-h] [--medexdir MEDEXDIR] [--javac JAVAC]
[--deletefirst] [--verbose]
Compile MedEx-UIMA itself (in Java)
OPTIONS:
-h, --help show this help message and exit
--medexdir MEDEXDIR Root directory of MedEx installation (default:
/path/to/Medex/installation)
--javac JAVAC Java compiler (default: javac)
--deletefirst Delete existing .class files first (optional) (default:
False)
--verbose, -v Be verbose (default: False)
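If you script the installation, the two build steps might be wrapped as below. This is a sketch: it uses only the options listed in the two help texts above, follows the order in which the Installation section mentions the tools (whether the order matters is not stated), and defaults to a dry run that only prints the commands. The helper names medex_build_commands and run_build are hypothetical.

```python
import subprocess
from typing import List, Optional


def medex_build_commands(medexdir: str, builddir: Optional[str] = None,
                         deletefirst: bool = False) -> List[List[str]]:
    """Return argument lists for CRATE's two MedEx build tools, using only
    options shown in their help text."""
    interface = ["crate_nlp_build_medex_java_interface", "--medexdir", medexdir]
    if builddir is not None:
        interface += ["--builddir", builddir]  # otherwise the tool's default
    medex = ["crate_nlp_build_medex_itself", "--medexdir", medexdir]
    if deletefirst:
        medex.append("--deletefirst")  # delete existing .class files first
    # Same order as the Installation section mentions the tools.
    return [interface, medex]


def run_build(medexdir: str, builddir: Optional[str] = None,
              deletefirst: bool = False, dry_run: bool = True) -> None:
    """Print each command; actually run it only when dry_run is False."""
    for cmd in medex_build_commands(medexdir, builddir, deletefirst):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raise on the first failure


if __name__ == "__main__":
    run_build("/path/to/Medex/installation")  # dry run: prints two commands
```

Passing dry_run=False runs the tools for real, stopping with CalledProcessError if either build fails.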
7.7.5. CrateMedexPipeline
The following specimen script assumes specific locations for the compiled Java (CrateMedexPipeline.class); edit it as required.
Asking CrateMedexPipeline to show its command-line options:
crate_show_crate_medex_pipeline_options
The resulting output:
usage: CrateMedexPipeline -i DIR -o DIR
[-h] [-v [-v]] [-lt LOGTAG]
[-data_ready_signal DATA_READY]
[-results_ready_signal RESULTS_READY]
Java front end to MedEx-UIMA natural language processor for drugs.
Takes signals on stdin, and data on disk.
Writes signals to stdout, and data to disk.
required arguments:
-i DIR (*) Specifies the input directory to read text from.
-o DIR (*) Specifies the output directory to write results to.
optional arguments:
--help Show this help message and exit.
-h
-v Verbose (use twice to be more verbose).
-lt LOGTAG Use an additional tag for stderr logging.
Helpful in multiprocess environments.
-data_ready_signal DATA_READY
Sets the 'data ready' signal that this program waits for
on stdin before scanning for data.
-results_ready_signal RESULTS_READY
Sets the 'results ready' signal that this program sends on
stdout once results are ready on disk.
(*) MedEx argument
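The documented flags can be assembled into a launch command as sketched below. The classpath entries, and launching the class by the bare name CrateMedexPipeline (that is, assuming it sits in the default package), are assumptions; medex_pipeline_command is a hypothetical helper, not part of CRATE.

```python
import os
from typing import List, Optional, Sequence


def medex_pipeline_command(classpath: Sequence[str], input_dir: str,
                           output_dir: str, verbosity: int = 0,
                           logtag: Optional[str] = None,
                           data_ready: Optional[str] = None,
                           results_ready: Optional[str] = None,
                           java: str = "java") -> List[str]:
    """Build a command line for CrateMedexPipeline from its documented flags.

    The classpath entries and the bare class name are assumptions.
    """
    cmd = [java, "-cp", os.pathsep.join(classpath), "CrateMedexPipeline",
           "-i", input_dir, "-o", output_dir]
    cmd += ["-v"] * min(max(verbosity, 0), 2)  # help text shows at most -v -v
    if logtag:
        cmd += ["-lt", logtag]
    if data_ready:
        cmd += ["-data_ready_signal", data_ready]
    if results_ready:
        cmd += ["-results_ready_signal", results_ready]
    return cmd


if __name__ == "__main__":
    print(medex_pipeline_command(["/hypothetical/classes"], "/tmp/in",
                                 "/tmp/out", verbosity=1, logtag="proc1"))
```

Joining entries with os.pathsep keeps the classpath correct on both Unix (colon) and Windows (semicolon).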
Footnotes
[1] MedEx-UIMA reference publication: https://www.ncbi.nlm.nih.gov/pubmed/25954575
[2] MedEx-UIMA downloads: https://sbmi.uth.edu/ccb/resources/medex.htm