7.7. MedEx-UIMA drug NLP

MedEx-UIMA NLP (for drugs and drug doses) [1] is supported via an external program. MedEx-UIMA runs in Java, so CRATE supplies an external front-end Java program (CrateMedexPipeline.java) that loads the MedEx app, passes text to it (via a temporary disk file, for reasons relating to MedEx-UIMA’s internal workings), and returns the results.

7.7.1. Installation

  • Download MedEx-UIMA from https://sbmi.uth.edu/ccb/resources/medex.htm

  • CRATE provides Java code (CrateMedexPipeline.java) to talk to MedEx-UIMA. Before using it for the first time, build it with crate_nlp_build_medex_java_interface.

  • CRATE fixes some bugs in MedEx-UIMA. Run crate_nlp_build_medex_itself to apply these fixes and rebuild MedEx.
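The two build steps can be scripted. The Python sketch below only assembles and runs the two commands, using options documented later in this section (--medexdir, --deletefirst). Running crate_nlp_build_medex_itself before crate_nlp_build_medex_java_interface is our assumption about a sensible order, and the helper names build_commands and run_builds are hypothetical, not part of CRATE.

```python
"""Sketch: run CRATE's two MedEx build steps in sequence.

Assumption (not stated by CRATE): rebuilding MedEx itself first, then
compiling the Java interface, is a sensible order. The helper names are
hypothetical.
"""
import subprocess
from typing import List


def build_commands(medexdir: str, delete_first: bool = False) -> List[List[str]]:
    """Return the argv lists for both build steps, using documented options."""
    rebuild_medex = ["crate_nlp_build_medex_itself", "--medexdir", medexdir]
    if delete_first:
        # Ask the build to delete existing .class files before recompiling.
        rebuild_medex.append("--deletefirst")
    build_interface = [
        "crate_nlp_build_medex_java_interface", "--medexdir", medexdir,
    ]
    return [rebuild_medex, build_interface]


def run_builds(medexdir: str, delete_first: bool = False) -> None:
    """Run each build step in turn, stopping at the first one that fails."""
    for argv in build_commands(medexdir, delete_first):
        # check=True raises subprocess.CalledProcessError on a non-zero exit.
        subprocess.run(argv, check=True)
```

For example, run_builds("/path/to/Medex/installation", delete_first=True) would recompile MedEx from scratch and then compile CRATE's interface, raising subprocess.CalledProcessError if either step fails.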

7.7.2. Output columns

In addition to the standard NLP output columns, the CRATE MedEx processor produces these output columns:

====================  ============  ============================================================
Column                SQL type      Description
====================  ============  ============================================================
sentence_index        INT           One-based index of sentence in text
sentence_text         TEXT          Text recognized as a sentence by MedEx
drug                  TEXT          Drug name, as in the text
drug_startpos         INT           Start position of drug
drug_endpos           INT           End position of drug
brand                 TEXT          Drug brand name (unclear whether looked up or only if given)
brand_startpos        INT           Start position of brand
brand_endpos          INT           End position of brand
form                  VARCHAR(255)  Drug/dose form (e.g. ‘tablet’)
form_startpos         INT           Start position of form
form_endpos           INT           End position of form
strength              VARCHAR(50)   Strength (e.g. ‘75mg’)
strength_startpos     INT           Start position of strength
strength_endpos       INT           End position of strength
dose_amount           VARCHAR(50)   Dose amount (e.g. ‘2 tablets’)
dose_amount_startpos  INT           Start position of dose_amount
dose_amount_endpos    INT           End position of dose_amount
route                 VARCHAR(50)   Route (e.g. ‘by mouth’)
route_startpos        INT           Start position of route
route_endpos          INT           End position of route
frequency             VARCHAR(50)   Frequency (e.g. ‘twice a day’)
frequency_startpos    INT           Start position of frequency
frequency_endpos      INT           End position of frequency
frequency_timex3      VARCHAR(50)   Normalized frequency in TIMEX3 format (e.g. ‘R1P12H’)
duration              VARCHAR(50)   Duration (e.g. ‘for 10 days’)
duration_startpos     INT           Start position of duration
duration_endpos       INT           End position of duration
necessity             VARCHAR(50)   Necessity (e.g. ‘prn’)
necessity_startpos    INT           Start position of necessity
necessity_endpos      INT           End position of necessity
umls_code             VARCHAR(8)    UMLS CUI
rx_code               INT           RxNorm RxCUI for drug
generic_code          INT           RxNorm RxCUI for generic name
generic_name          TEXT          Generic drug name (associated with RxCUI code)
====================  ============  ============================================================
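The frequency_timex3 example ‘R1P12H’ appears to mean one dose every 12 hours. As a sketch of how such values might be post-processed, the hypothetical helper below converts values of the simple form R<n>P<k><unit> (units H, D or W) into doses per day. This reading of the format is an assumption extrapolated from that single example, not a description of everything MedEx can emit; anything outside the pattern returns None.

```python
"""Sketch: convert a simple frequency_timex3 value into doses per day.

Assumption: values of the form R<n>P<k><unit> mean "n doses every k units",
as suggested by the example 'R1P12H' (once every 12 hours). This is a
hypothetical helper, not part of CRATE, and covers only that simple form.
"""
import re
from typing import Optional

# Hours per unit, for the small set of units this sketch understands.
_HOURS_PER_UNIT = {"H": 1.0, "D": 24.0, "W": 168.0}

# R<count>P<period><unit>, e.g. R1P12H, R2P1D.
_SIMPLE_SET = re.compile(r"^R(\d+)P(\d+(?:\.\d+)?)([HDW])$")


def doses_per_day(timex3: str) -> Optional[float]:
    """Return doses per day for a simple R<n>P<k><unit> value, else None."""
    match = _SIMPLE_SET.match(timex3.strip())
    if match is None:
        return None  # not in the simple form this sketch handles
    count = int(match.group(1))
    period_hours = float(match.group(2)) * _HOURS_PER_UNIT[match.group(3)]
    if period_hours == 0:
        return None  # a zero-length period has no sensible rate
    return count * 24.0 / period_hours
```

Under this reading, doses_per_day("R1P12H") gives 2.0 and doses_per_day("R1P1D") gives 1.0.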

Start positions are the zero-based index of the first relevant character. End positions are the zero-based index of one beyond the last relevant character.
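Because end positions are exclusive, the positions behave like Python slice bounds: text[startpos:endpos] gives the matched text, and endpos - startpos gives its length. The sketch below illustrates this with a made-up note and hand-computed positions; span_text is a hypothetical helper, not a CRATE function.

```python
"""Sketch: recover a matched span using CRATE-style half-open positions
(startpos inclusive, endpos exclusive, both zero-based). The note text and
positions are made up for illustration.
"""


def span_text(text: str, startpos: int, endpos: int) -> str:
    """Return the characters covered by [startpos, endpos)."""
    if not 0 <= startpos <= endpos <= len(text):
        raise ValueError(
            f"Bad span ({startpos}, {endpos}) for text of length {len(text)}")
    # Python slicing uses the same convention, so no +1/-1 adjustment is needed.
    return text[startpos:endpos]


note = "Aspirin 75mg by mouth once daily"
print(span_text(note, 0, 7))    # Aspirin
print(span_text(note, 8, 12))   # 75mg
print(span_text(note, 13, 21))  # by mouth
```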

7.7.3. crate_nlp_build_medex_java_interface

Options:

USAGE: crate_nlp_build_medex_java_interface [-h] [--builddir BUILDDIR]
                                            [--medexdir MEDEXDIR]
                                            [--java JAVA] [--javac JAVAC]
                                            [--verbose] [--launch]

Compile Java classes for CRATE's interface to MedEx-UIMA

OPTIONS:
  -h, --help           show this help message and exit
  --builddir BUILDDIR  Output directory for compiled .class files (default:
                       /path/to/crate/crate_anon/nlp_manager/compiled_nlp_classes)
  --medexdir MEDEXDIR  Root directory of MedEx installation (default:
                       /path/to/Medex/installation)
  --java JAVA          Java executable (default: java)
  --javac JAVAC        Java compiler (default: javac)
  --verbose, -v        Be verbose (use twice for extra verbosity) (default: 0)
  --launch             Launch script in demonstration mode (having previously
                       compiled it) (default: False)

7.7.4. crate_nlp_build_medex_itself

This program builds MedEx, applying some bug fixes and some improvements for UK use.

Options:

USAGE: crate_nlp_build_medex_itself [-h] [--medexdir MEDEXDIR] [--javac JAVAC]
                                    [--deletefirst] [--verbose]

Compile MedEx-UIMA itself (in Java)

OPTIONS:
  -h, --help           show this help message and exit
  --medexdir MEDEXDIR  Root directory of MedEx installation (default:
                       /path/to/Medex/installation)
  --javac JAVAC        Java compiler (default: javac)
  --deletefirst        Delete existing .class files first (optional) (default:
                       False)
  --verbose, -v        Be verbose (default: False)

7.7.5. CrateMedexPipeline

The following specimen script assumes specific locations for the compiled Java (CrateMedexPipeline.class); edit it as required.

Asking CrateMedexPipeline to show its command-line options:

crate_show_crate_medex_pipeline_options

The resulting output:

usage: CrateMedexPipeline -i DIR -o DIR
                          [-h] [-v [-v]] [-lt LOGTAG]
                          [-data_ready_signal DATA_READY]
                          [-results_ready_signal RESULTS_READY]

Java front end to MedEx-UIMA natural language processor for drugs.
Takes signals on stdin, and data on disk.
Writes signals to stdout, and data to disk.

required arguments:
  -i DIR           (*) Specifies the input directory to read text from.
  -o DIR           (*) Specifies the output directory to write results to.

optional arguments:
  --help           Show this help message and exit.
  -h

  -v               Verbose (use twice to be more verbose).

  -lt LOGTAG       Use an additional tag for stderr logging.
                   Helpful in multiprocess environments.

  -data_ready_signal DATA_READY
                   Sets the 'data ready' signal that this program waits for
                   on stdin before scanning for data.

  -results_ready_signal RESULTS_READY
                   Sets the 'results ready' signal that this program sends on
                   stdout once results are ready on disk.

(*) MedEx argument
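The help text above describes a simple protocol: data travels via files in the -i and -o directories, and signals travel via stdin and stdout. The Python sketch below shows how a parent process might drive such a child. It illustrates the protocol only and is not CRATE's own driver code: the signal strings, the input file name, the classpath and the bare class name CrateMedexPipeline in pipeline_argv are hypothetical placeholders, and output files are returned as raw text because the sketch assumes nothing about their format.

```python
"""Sketch: drive a CrateMedexPipeline-style child process.

Data goes to disk (-i/-o directories); signals go over stdin/stdout.
Signal strings, file name and classpath are hypothetical placeholders.
"""
import subprocess
from pathlib import Path
from typing import Dict, List

# Hypothetical signal strings; the child must be given the same values via
# -data_ready_signal and -results_ready_signal.
DATA_READY = "DATA_READY"
RESULTS_READY = "RESULTS_READY"


def pipeline_argv(java: str, classpath: str,
                  input_dir: Path, output_dir: Path) -> List[str]:
    """Build a command line using the documented CrateMedexPipeline options.

    The classpath and bare class name are assumptions; adjust them to
    wherever CrateMedexPipeline.class was compiled.
    """
    return [
        java, "-classpath", classpath, "CrateMedexPipeline",
        "-i", str(input_dir), "-o", str(output_dir),
        "-data_ready_signal", DATA_READY,
        "-results_ready_signal", RESULTS_READY,
    ]


def process_text(proc: subprocess.Popen, input_dir: Path, output_dir: Path,
                 text: str, filename: str = "note.txt") -> Dict[str, str]:
    """Send one text through the child; return {output filename: contents}.

    Expects proc to have been started with text-mode stdin/stdout pipes.
    """
    # 1. Data goes to disk.
    (input_dir / filename).write_text(text, encoding="utf-8")
    # 2. Tell the child that data is ready.
    proc.stdin.write(DATA_READY + "\n")
    proc.stdin.flush()
    # 3. Block until the child announces that results are on disk.
    while True:
        line = proc.stdout.readline()
        if not line:  # EOF: the child exited without signalling
            raise RuntimeError("Child exited before signalling results")
        if line.rstrip("\n") == RESULTS_READY:
            break
    # 4. Results come back from disk.
    return {p.name: p.read_text(encoding="utf-8")
            for p in sorted(output_dir.iterdir())}
```

A parent would start the child with subprocess.Popen(pipeline_argv(...), stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True) and then call process_text once per document. The sketch does not clear stale files from the output directory between calls; a real driver would need to decide who removes them.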

Footnotes

[1] MedEx-UIMA reference publication: https://www.ncbi.nlm.nih.gov/pubmed/25954575

[2] MedEx-UIMA downloads: https://sbmi.uth.edu/ccb/resources/medex.htm