12.4.15. crate_anon.linkage.tests.fuzzy_id_match_tests

crate_anon/linkage/tests/fuzzy_id_match_tests.py

Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).

This file is part of CRATE.

CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with CRATE. If not, see <https://www.gnu.org/licenses/>.

Unit tests.

class crate_anon.linkage.tests.fuzzy_id_match_tests.DummyTemporalIdentifierTests(methodName='runTest')[source]: Unit tests for DummyTemporalIdentifier.

class crate_anon.linkage.tests.fuzzy_id_match_tests.FuzzyLinkageTests(*args, **kwargs)[source]

Tests of the fuzzy linkage system.

__init__(*args, **kwargs) → None[source]: Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.

test_exact_match() → None[source]: Test the exact-match system.

test_identifier_transformations() → None[source]: Creating hashed and plaintext JSON representation and loading an identifier back from them.

test_shortlist() → None[source]: Our shortlisting process typically permits people with completely matching or partially matching DOBs, but not those with mismatched DOBs (for efficiency). Test that.

class crate_anon.linkage.tests.fuzzy_id_match_tests.MultipleComparisonTestBase(methodName='runTest')[source]

class crate_anon.linkage.tests.fuzzy_id_match_tests.OrderedMultipleComparisonTests(methodName='runTest')[source]

test_order_correct_with_duplicate_names_1() → None[source]

Compare “A A” to “A A” in ordered fashion.

Think of this as proband A_P1, A_P2 and candidate A_C1, A_C2.

Should give a “correctly ordered” match, A_P1:A_C1 and A_C2:A_C2, with correction for P_O.

Should not treat it as an incorrectly ordered match, A_P1:A_C2 and A_P2:A_C1, and apply a different correction for P_U etc.

This might work without the “distance” sort in ComparisonInfo (it does, in fact), but that is a safety. See below for a test that does depend on that distance metric.

test_order_correct_with_duplicate_names_2() → None[source]

Compare “A B” to “B B” in ordered fashion.

We want this to give A_P1:B_P1 (mismatch) and B_P2:B_C2 (ordered match).

It should not give A_P1:B_P2 (mismatch) and B_P2:B_C1 (unordered match).

This does not work without the “distance” part of the sort in ComparisonInfo.

class crate_anon.linkage.tests.fuzzy_id_match_tests.TestCondition(cfg: MatchConfig, person_a: Person, person_b: Person, should_match: bool, debug: bool = True)[source]

Two representations of a person and whether they should match.

__init__(cfg: MatchConfig, person_a: Person, person_b: Person, should_match: bool, debug: bool = True) → None[source]

Parameters:

cfg – the main MatchConfig object
person_a – one representation of a person
person_b – another representation of a person
should_match – should they be treated as the same person?
debug – be verbose?

check_comparison_as_expected() → None[source]: Asserts that both the raw and hashed versions match, or don’t match, according to self.should_match.

log_odds_same_hashed() → float[source]

Checks whether the hashed versions match.

Returns:: the log odds that they are the same person
Return type:: float

log_odds_same_plaintext() → float[source]

Checks whether the plaintext person objects match.

Returns:: the log odds that they are the same person
Return type:: float

matches_hashed() → Tuple[bool, float][source]

Do the raw versions match, by threshold?

Returns:: is there a match?
Return type:: bool

matches_plaintext() → Tuple[bool, float][source]

Do the plaintext versions match, by threshold?

Returns:: (matches, log_odds)
Return type:: tuple

class crate_anon.linkage.tests.fuzzy_id_match_tests.UnorderedMultipleComparisonTests(methodName='runTest')[source]

test_with_incomparable_identifiers() → None[source]

Use identifiers that aren’t allowed to be compared, e.g. names with non-overlapping timestamps. This will give a comparison that is None, and make the code coverage checks happy.

pip install pytest-cov
pytest --cov --cov-report html

crate_anon.linkage.tests.fuzzy_id_match_tests.mk_test_config(**kwargs) → MatchConfig[source]: Create a dummy config, using dummy name/postcode info.