15. Things to do

Todo

Check minimal anonymiser config example works.

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/crateanon/checkouts/latest/docs/source/anonymisation/anon_config.rst, line 1373.)

Todo

Check minimal data dictionary example works.

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/crateanon/checkouts/latest/docs/source/anonymisation/data_dictionary.rst, line 497.)

Todo

Also: use Celery beat to refresh regularly, and optionally trigger withdrawal of consent if the consent mode has changed; see http://docs.celeryproject.org/en/latest/userguide/periodic-tasks.html

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/crateanon/envs/latest/lib/python3.6/site-packages/crate_anon-0.18.92-py3.6.egg/crate_anon/crateweb/consent/models.py:docstring of crate_anon.crateweb.consent.models.ConsentMode.refresh_from_primary_clinical_record, line 18.)

Todo

Also make an automatic opt-out list.

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/crateanon/envs/latest/lib/python3.6/site-packages/crate_anon-0.18.92-py3.6.egg/crate_anon/crateweb/consent/models.py:docstring of crate_anon.crateweb.consent.models.ConsentMode.refresh_from_primary_clinical_record, line 22.)

Todo

Might it be better to feed the resulting query back into the main Query system, allowing users to turn columns on/off, etc.?

At present it forces query_id to None, and query_result.html detects this.

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/crateanon/envs/latest/lib/python3.6/site-packages/crate_anon-0.18.92-py3.6.egg/crate_anon/crateweb/research/views.py:docstring of crate_anon.crateweb.research.views.pe_one_table, line 14.)

Todo

cloud_parser: handle new tabular_schema info from server

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/crateanon/envs/latest/lib/python3.6/site-packages/crate_anon-0.18.92-py3.6.egg/crate_anon/nlp_manager/cloud_parser.py:docstring of crate_anon.nlp_manager.cloud_parser, line 24.)

Todo

preprocess_rio: a specific supposed PK fails (non-unique) on incremental updates.

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/crateanon/envs/latest/lib/python3.6/site-packages/crate_anon-0.18.92-py3.6.egg/crate_anon/preprocess/preprocess_rio.py:docstring of crate_anon.preprocess.preprocess_rio, line 31.)
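One way to confirm which values break a supposed PK is a GROUP BY ... HAVING COUNT(*) > 1 query against the source table. The sketch below is illustrative only: it uses an in-memory SQLite database so that it is self-contained, and the helper find_duplicate_keys and the table and column names are hypothetical, not part of preprocess_rio. The same SQL pattern should work on SQL Server.

```python
import sqlite3
from typing import List, Tuple


def find_duplicate_keys(conn: sqlite3.Connection, table: str,
                        column: str) -> List[Tuple[object, int]]:
    """Return (value, count) for every value that occurs more than once.

    Hypothetical helper. Identifiers cannot be bound as SQL parameters, so
    they are quoted directly; this assumes trusted table/column names.
    """
    sql = (
        f'SELECT "{column}", COUNT(*) FROM "{table}" '
        f'GROUP BY "{column}" HAVING COUNT(*) > 1 ORDER BY "{column}"'
    )
    return conn.execute(sql).fetchall()


# Demonstration with made-up data: value 2 appears twice.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example_rio_table (supposed_pk INTEGER, note TEXT)")
conn.executemany(
    "INSERT INTO example_rio_table VALUES (?, ?)",
    [(1, "a"), (2, "b"), (2, "c"), (3, "d")],
)
dupes = find_duplicate_keys(conn, "example_rio_table", "supposed_pk")
print(dupes)  # [(2, 2)]
```

An empty result would suggest the column is unique in the current data; a non-empty one lists the offending values to inspect before treating the column as a PK in incremental mode.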

Todo

preprocess_rio: imperfectly tested: Audit_Created_Date, Audit_Updated_Date … There is some data for Audit_Created_Date, but the audit table is incomplete.

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/crateanon/envs/latest/lib/python3.6/site-packages/crate_anon-0.18.92-py3.6.egg/crate_anon/preprocess/preprocess_rio.py:docstring of crate_anon.preprocess.preprocess_rio, line 34.)

Todo

preprocess_rio: similarly, all cross-checks against RCEP output are imperfectly tested (currently limited by data availability).

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/crateanon/envs/latest/lib/python3.6/site-packages/crate_anon-0.18.92-py3.6.egg/crate_anon/preprocess/preprocess_rio.py:docstring of crate_anon.preprocess.preprocess_rio, line 38.)

Todo

Upgrade to django-pyodbc-azure==2.0.6.1, which will require Django 2.0.6 and pyodbc 3.0+.

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/crateanon/checkouts/latest/docs/source/installation/database_drivers.rst, line 488.)

Todo

RNC to ask FS for an explanation of record_truncated_values: when should it be used?

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/crateanon/checkouts/latest/docs/source/nlp/nlp_config.rst, line 290.)

Todo

explain

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/crateanon/checkouts/latest/docs/source/nlp/nlp_config.rst, line 705.)

Todo

NLPRP: consider supra-document processing requirements

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/crateanon/checkouts/latest/docs/source/nlp/nlprp.rst, line 1465.)

Todo

Is this one currently unused? It looks like it.

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/crateanon/checkouts/latest/docs/source/website_config/web_config_file.rst, line 1020.)

Todo

archive: consider Windows authentication to Django

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/crateanon/checkouts/latest/docs/source/website_using/archive.rst, line 267.)

Todo

Optional launch page for the archive (e.g. allowing a JSON POST for the patient ID).

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/crateanon/checkouts/latest/docs/source/website_using/archive.rst, line 268.)

Todo

add screenshots

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/crateanon/checkouts/latest/docs/source/website_using/research_queries.rst, line 39.)

  • Fix the bug (reported by JL, 6/11/2018) where the RiO preprocessor tries to put the same column into the same index more than once (see RNC email, 6/11/2018).
  • BENCHMARK name blacklisting (forenames plus surnames, minus English words, minus eponyms) for speed, precision and recall. Share results with MB.
  • option to filter out all free text automatically (as part of full anonymisation)
  • Personally configurable highlight colours (with a default set if none is configured)? Or just more colours? Look at a standard highlighter pack, e.g.
  • More of JL’s ideas from 8 Jan 2018:
    • A series of functions like fn_age(rid), fn_is_alive(rid), fn_open_referral(rid)
    • Friendly names for the top 10 most used tables, which might appear at the top of the tables listing.
  • When the Windows service stops, it is still failing to kill child processes. See crate_anon/tools/winservice.py.
  • NLP protocol revision whereby processors describe their output fields and say which SQL dialect they use, plus an (automatic) implementation of this for our built-in NLP.
  • There’s some placeholder junk in consent_lookup_result.html.
  • Option to add MRID to every table, to make cross-database queries simpler?
    • The column would have to support NULL values; not all patients with a PID (e.g. local identifier) will have an MPID (e.g. national identifier).
    • Would not require sequencing of tables during anonymisation, since the MRID should be found via crate_anon.anonymise.patient.Patient._build_scrubber().
    • Would involve modifying crate_anon.anonymise.anonymise.process_table() to call crate_anon.anonymise.patient.Patient.get_mrid(), possibly where it checks for a column being the primary PID, and adding an extra row there subject to a flag.
    • The flag relates to the whole database rather than a specific row, so it should probably be in the config file, e.g. named add_mrid_wherever_rid_added, within the [main] section, in the “Output fields and formatting” subsection.
    • Might also need an option to index that field automatically (true by default).
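A minimal sketch of the proposed per-row behaviour for the MRID option, assuming a database-wide boolean flag (named add_mrid_wherever_rid_added, as suggested in the list) and a patient object that exposes its MRID, which is None when the patient has no MPID. HypotheticalPatient, add_mrid_to_row and the default field name "mrid" are all hypothetical illustrations, not CRATE's actual API; the real change would live in process_table() and use Patient.get_mrid().

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class HypotheticalPatient:
    """Hypothetical stand-in for crate_anon.anonymise.patient.Patient."""
    rid: str
    mrid: Optional[str]  # None when the patient has no MPID


def add_mrid_to_row(row: Dict[str, Any],
                    patient: HypotheticalPatient,
                    add_mrid_wherever_rid_added: bool,
                    mrid_fieldname: str = "mrid") -> Dict[str, Any]:
    """Return a copy of an output row, with the MRID added if the flag is set.

    Hypothetical helper: the flag would come from the config file (it applies
    to the whole database), and the destination column must accept NULL,
    represented here by None.
    """
    out = dict(row)  # don't mutate the caller's row
    if add_mrid_wherever_rid_added:
        out[mrid_fieldname] = patient.mrid
    return out


local_only = HypotheticalPatient(rid="R001", mrid=None)
with_mpid = HypotheticalPatient(rid="R002", mrid="M002")
print(add_mrid_to_row({"rid": "R001", "note": "text"}, local_only, True))
# {'rid': 'R001', 'note': 'text', 'mrid': None}
print(add_mrid_to_row({"rid": "R002"}, with_mpid, True))
# {'rid': 'R002', 'mrid': 'M002'}
print(add_mrid_to_row({"rid": "R002"}, with_mpid, False))
# {'rid': 'R002'}
```

Returning a copy rather than mutating the row keeps the helper easy to test; an automatically created index on the new column (the last bullet) would be a separate, schema-level step.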