7.6. MedEx-UIMA drug NLP

CRATE supports MedEx-UIMA [1], an NLP tool for drugs and drug doses, via an external program. MedEx-UIMA runs in Java. CRATE supplies a Java front-end program (CrateMedexPipeline.java) that loads MedEx-UIMA, sends text to it (via a temporary disk file, for reasons relating to MedEx-UIMA’s internal workings), and returns the results.

7.6.1. Installation

  • Download MedEx-UIMA from https://sbmi.uth.edu/ccb/resources/medex.htm [2].
  • CRATE provides Java code (see CrateMedexPipeline.java) to talk to MedEx-UIMA. Use crate_nlp_build_medex_java_interface to build this before you use it for the first time.
  • CRATE fixes some bugs in MedEx-UIMA. Run crate_nlp_build_medex_itself to rebuild MedEx and fix them.

7.6.2. Output columns

In addition to the standard NLP output columns, the CRATE MedEx processor produces these output columns:

Column                SQL type      Description
sentence_index        INT           One-based index of sentence in text
sentence_text         TEXT          Text recognized as a sentence by MedEx
drug                  TEXT          Drug name, as in the text
drug_startpos         INT           Start position of drug
drug_endpos           INT           End position of drug
brand                 TEXT          Drug brand name (?lookup ?only if given)
brand_startpos        INT           Start position of brand
brand_endpos          INT           End position of brand
form                  VARCHAR(255)  Drug/dose form (e.g. ‘tablet’)
form_startpos         INT           Start position of form
form_endpos           INT           End position of form
strength              VARCHAR(50)   Strength (e.g. ‘75mg’)
strength_startpos     INT           Start position of strength
strength_endpos       INT           End position of strength
dose_amount           VARCHAR(50)   Dose amount (e.g. ‘2 tablets’)
dose_amount_startpos  INT           Start position of dose_amount
dose_amount_endpos    INT           End position of dose_amount
route                 VARCHAR(50)   Route (e.g. ‘by mouth’)
route_startpos        INT           Start position of route
route_endpos          INT           End position of route
frequency             VARCHAR(50)   Frequency (e.g. ‘twice daily’)
frequency_startpos    INT           Start position of frequency
frequency_endpos      INT           End position of frequency
frequency_timex3      VARCHAR(50)   Normalized frequency in TIMEX3 format (e.g. ‘R1P12H’)
duration              VARCHAR(50)   Duration (e.g. ‘for 10 days’)
duration_startpos     INT           Start position of duration
duration_endpos       INT           End position of duration
necessity             VARCHAR(50)   Necessity (e.g. ‘prn’)
necessity_startpos    INT           Start position of necessity
necessity_endpos      INT           End position of necessity
umls_code             VARCHAR(8)    UMLS CUI
rx_code               INT           RxNorm RxCUI for drug
generic_code          INT           RxNorm RxCUI for generic name
generic_name          TEXT          Generic drug name (associated with RxCUI code)

Start positions are the zero-based index of the first relevant character. End positions are the zero-based index of one beyond the last relevant character.
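Because start positions are inclusive and end positions are exclusive, each pair behaves like a half-open range, and a term's length is its end position minus its start position. The following Python sketch illustrates this. The sample text and positions are invented for illustration (they are not real MedEx output), extract_span is a hypothetical helper, and the sketch assumes that the positions index into the string being sliced.

```python
def extract_span(text: str, startpos: int, endpos: int) -> str:
    """Return the characters from startpos (inclusive) to endpos (exclusive)."""
    if not 0 <= startpos <= endpos <= len(text):
        raise ValueError(
            f"Bad span [{startpos}, {endpos}) for text of length {len(text)}"
        )
    return text[startpos:endpos]


# Hypothetical example values (not produced by MedEx):
text = "Start aspirin 75mg tablets by mouth daily."
drug_startpos, drug_endpos = 6, 13
strength_startpos, strength_endpos = 14, 18

drug = extract_span(text, drug_startpos, drug_endpos)
strength = extract_span(text, strength_startpos, strength_endpos)
print(drug, strength)
```

This convention matches Python slicing directly, so no off-by-one adjustment is needed when recovering a term from its positions.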

7.6.3. crate_nlp_build_medex_java_interface

Options:

usage: crate_nlp_build_medex_java_interface [-h] [--builddir BUILDDIR]
                                            [--medexdir MEDEXDIR]
                                            [--java JAVA] [--javac JAVAC]
                                            [--verbose] [--launch]

Compile Java classes for CRATE's interface to MedEx-UIMA

optional arguments:
  -h, --help           show this help message and exit
  --builddir BUILDDIR  Output directory for compiled .class files (default: /h
                       ome/rudolf/Documents/code/crate/crate_anon/nlp_manager/
                       compiled_nlp_classes)
  --medexdir MEDEXDIR  Root directory of MedEx installation (default:
                       /home/rudolf/dev/Medex_UIMA_1.3.6)
  --java JAVA          Java executable (default: java)
  --javac JAVAC        Java compiler (default: javac)
  --verbose, -v        Be verbose (use twice for extra verbosity) (default: 0)
  --launch             Launch script in demonstration mode (having previously
                       compiled it) (default: False)

# Generated at 2019-10-10 10:23:27

7.6.4. crate_nlp_build_medex_itself

This program builds MedEx and implements some bug fixes and improvements for the UK.

Options:

usage: crate_nlp_build_medex_itself [-h] [--medexdir MEDEXDIR] [--javac JAVAC]
                                    [--deletefirst] [--verbose]

Compile MedEx-UIMA itself (in Java)

optional arguments:
  -h, --help           show this help message and exit
  --medexdir MEDEXDIR  Root directory of MedEx installation (default:
                       /home/rudolf/dev/Medex_UIMA_1.3.6)
  --javac JAVAC        Java compiler (default: javac)
  --deletefirst        Delete existing .class files first (optional) (default:
                       False)
  --verbose, -v        Be verbose (default: False)

# Generated at 2019-10-10 10:23:26
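
If you want to script the two build tools rather than type the commands by hand, one possible approach is sketched below in Python. It uses only command-line options that appear in the help text above. The function name medex_build_commands and the example path are hypothetical, and the order shown (MedEx itself first, then the interface) is an assumption, since this section does not prescribe one.

```python
import shlex
from typing import List, Optional


def medex_build_commands(medexdir: str,
                         builddir: Optional[str] = None,
                         javac: str = "javac",
                         deletefirst: bool = False) -> List[List[str]]:
    """Return argument lists for the two CRATE MedEx build tools."""
    build_medex = [
        "crate_nlp_build_medex_itself",
        "--medexdir", medexdir,
        "--javac", javac,
    ]
    if deletefirst:
        build_medex.append("--deletefirst")
    build_interface = [
        "crate_nlp_build_medex_java_interface",
        "--medexdir", medexdir,
        "--javac", javac,
    ]
    if builddir is not None:
        # Otherwise the tool falls back to its own default build directory.
        build_interface += ["--builddir", builddir]
    return [build_medex, build_interface]


# Hypothetical installation path; substitute your own.
for cmd in medex_build_commands("/path/to/Medex_UIMA_1.3.6", deletefirst=True):
    print(shlex.join(cmd))
```

Each argument list could then be passed to subprocess.run(cmd, check=True) to run the build and stop on the first failure.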

7.6.5. CrateMedexPipeline

The following specimen script assumes specific locations for the compiled Java (CrateMedexPipeline.class); edit it as required.

Asking CrateMedexPipeline to show its command-line options:

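#!/usr/bin/env bash

# Assumes CRATE_SOURCE_ROOT is set to the root of the CRATE source tree.
CRATE_NLP_JAVA_CLASS_DIR="${CRATE_SOURCE_ROOT}/crate_anon/nlp_manager/compiled_nlp_classes"
MEDEX_DIR="${HOME}/dev/Medex_UIMA_1.3.6"

# The classpath covers CRATE's compiled classes plus MedEx's classes and JARs.
# "lib/*" is quoted so that Java, not the shell, expands the wildcard.
java \
    -classpath "${CRATE_NLP_JAVA_CLASS_DIR}":"${MEDEX_DIR}/bin":"${MEDEX_DIR}/lib/*" \
    CrateMedexPipeline \
    --help \
    -v -v

exit 0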

The resulting output:

usage: CrateMedexPipeline -i DIR -o DIR
                          [-h] [-v [-v]] [-lt LOGTAG]
                          [-data_ready_signal DATA_READY]
                          [-results_ready_signal RESULTS_READY]

Java front end to MedEx-UIMA natural language processor for drugs.
Takes signals on stdin, and data on disk.
Writes signals to stdout, and data to disk.

required arguments:
  -i DIR           (*) Specifies the input directory to read text from.
  -o DIR           (*) Specifies the output directory to write results to.

optional arguments:
  -h               Show this help message and exit.
  -v               Verbose (use twice to be more verbose).
  -lt LOGTAG       Use an additional tag for stderr logging.
                   Helpful in multiprocess environments.
  -data_ready_signal DATA_READY
                   Sets the 'data ready' signal that this program waits for
                   on stdin before scanning for data.
  -results_ready_signal RESULTS_READY
                   Sets the 'results ready' signal that this program sends on
                   stdout once results are ready on disk.

(*) MedEx argument

# Generated at 2019-10-10 10:23:25

Footnotes

[1] MedEx-UIMA reference publication: https://www.ncbi.nlm.nih.gov/pubmed/25954575
[2] MedEx-UIMA downloads: https://sbmi.uth.edu/ccb/resources/medex.htm