14.4.14. crate_anon.linkage.tests.fuzzy_id_match_tests

crate_anon/linkage/tests/fuzzy_id_match_tests.py


Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).

This file is part of CRATE.

CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with CRATE. If not, see <https://www.gnu.org/licenses/>.


Unit tests.

class crate_anon.linkage.tests.fuzzy_id_match_tests.DummyTemporalIdentifierTests(methodName='runTest')[source]

Unit tests for DummyTemporalIdentifier.

class crate_anon.linkage.tests.fuzzy_id_match_tests.FuzzyLinkageTests(*args, **kwargs)[source]

Tests of the fuzzy linkage system.

__init__(*args, **kwargs) None[source]

Create an instance of the class that will use the named test method when executed. Raises a ValueError if the instance does not have a method with the specified name.

test_exact_match() None[source]

Test the exact-match system.

test_identifier_transformations() None[source]

Test creating hashed and plaintext JSON representations of an identifier, and loading the identifier back from them.
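As a rough illustration of the round trip this test exercises, the following sketch serializes a plaintext and a hashed identifier to JSON and reloads each. The class `DemoIdentifier`, its methods, and the HMAC-SHA256 hashing are hypothetical stand-ins chosen for this example; they are not CRATE's identifier classes or its hashing scheme.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DemoIdentifier:
    """Hypothetical identifier; NOT a CRATE class."""

    value: str
    is_hashed: bool = False

    def hashed(self, key: bytes) -> "DemoIdentifier":
        # Replace the plaintext value with a keyed digest (assumed scheme).
        digest = hmac.new(key, self.value.encode("utf-8"), hashlib.sha256)
        return DemoIdentifier(digest.hexdigest(), is_hashed=True)

    def to_json(self) -> str:
        return json.dumps({"value": self.value, "is_hashed": self.is_hashed})

    @classmethod
    def from_json(cls, text: str) -> "DemoIdentifier":
        d = json.loads(text)
        return cls(value=d["value"], is_hashed=d["is_hashed"])


plain = DemoIdentifier("SMITH")
hashed = plain.hashed(b"demo-key")
# Each representation should survive a JSON round trip unchanged.
plain_copy = DemoIdentifier.from_json(plain.to_json())
hashed_copy = DemoIdentifier.from_json(hashed.to_json())
```

The point of such a test is that the reloaded object compares equal to the original in both forms, and that the hashed form never carries the plaintext value.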

test_shortlist() None[source]

Our shortlisting process typically permits people with completely matching or partially matching DOBs, but not those with mismatched DOBs (for efficiency). Test that.
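The shortlisting idea can be sketched as a cheap DOB filter applied before any detailed comparison. The rule used here for a "partial" match (at least two of year, month, and day agree) is an assumption made for illustration, and `dob_partial_match` and `shortlist` are hypothetical names, not CRATE functions.

```python
from datetime import date
from typing import List


def dob_partial_match(a: date, b: date) -> bool:
    """True if at least two of year/month/day agree (assumed rule)."""
    agreements = (a.year == b.year) + (a.month == b.month) + (a.day == b.day)
    return agreements >= 2  # a full match (3 agreements) also passes


def shortlist(proband_dob: date, candidate_dobs: List[date]) -> List[date]:
    """Keep candidates whose DOB fully or partially matches the proband's."""
    return [c for c in candidate_dobs if dob_partial_match(proband_dob, c)]


proband = date(1980, 5, 17)
candidates = [
    date(1980, 5, 17),  # full match: kept
    date(1980, 5, 18),  # partial match (year and month agree): kept
    date(1975, 1, 2),   # mismatch: excluded
]
kept = shortlist(proband, candidates)
```

Excluding mismatched DOBs early keeps the number of expensive full comparisons small, which is the efficiency motive the test description mentions.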

class crate_anon.linkage.tests.fuzzy_id_match_tests.MultipleComparisonTestBase(methodName='runTest')[source]

class crate_anon.linkage.tests.fuzzy_id_match_tests.OrderedMultipleComparisonTests(methodName='runTest')[source]

test_order_correct_with_duplicate_names_1() None[source]

Compare “A A” to “A A” in ordered fashion.

Think of this as proband A_P1, A_P2 and candidate A_C1, A_C2.

Should give a “correctly ordered” match, A_P1:A_C1 and A_P2:A_C2, with correction for P_O.

Should not treat it as an incorrectly ordered match, A_P1:A_C2 and A_P2:A_C1, and apply a different correction for P_U etc.

This test passes even without the “distance” sort in ComparisonInfo, but that sort is kept as a safety measure. See below for a test that does depend on the distance metric.

test_order_correct_with_duplicate_names_2() None[source]

Compare “A B” to “B B” in ordered fashion.

We want this to give A_P1:B_C1 (mismatch) and B_P2:B_C2 (ordered match).

It should not give A_P1:B_C2 (mismatch) and B_P2:B_C1 (unordered match).

This does not work without the “distance” part of the sort in ComparisonInfo.
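To see why a distance tie-breaker matters for the two duplicate-name cases above, here is a minimal sketch of greedy pairing. It assumes comparisons are sorted best-first, with ties broken by the positional distance between the two names, and that each name is used at most once. The function `pair_names` and its exact sort key are hypothetical; they only approximate the kind of sort the docstrings attribute to ComparisonInfo. Indices are zero-based, so proband index 0 corresponds to P1.

```python
from typing import List, Tuple


def pair_names(
    proband: List[str], candidate: List[str], use_distance: bool = True
) -> List[Tuple[int, int, bool]]:
    """
    Greedily pair proband and candidate names, best comparisons first.
    Returns sorted (proband_index, candidate_index, is_match) tuples.
    """
    comparisons = [
        (i, j, p == c)
        for i, p in enumerate(proband)
        for j, c in enumerate(candidate)
    ]

    def sort_key(comparison: Tuple[int, int, bool]) -> Tuple[int, int]:
        i, j, is_match = comparison
        # Matches sort before mismatches; ties are broken by distance.
        distance = abs(i - j) if use_distance else 0
        return (0 if is_match else 1, distance)

    used_p, used_c, chosen = set(), set(), []
    for i, j, is_match in sorted(comparisons, key=sort_key):
        if i not in used_p and j not in used_c:
            used_p.add(i)
            used_c.add(j)
            chosen.append((i, j, is_match))
    return sorted(chosen)


with_distance = pair_names(["A", "B"], ["B", "B"])
without_distance = pair_names(["A", "B"], ["B", "B"], use_distance=False)
duplicates = pair_names(["A", "A"], ["A", "A"], use_distance=False)
```

In this sketch, “A B” versus “B B” pairs index 1 with index 1 (an ordered match) only when the distance tie-breaker is on; without it, the first matching pair encountered is index 1 with index 0, an unordered match. “A A” versus “A A” pairs in order either way, mirroring the observation in the first test.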

class crate_anon.linkage.tests.fuzzy_id_match_tests.TestCondition(cfg: crate_anon.linkage.matchconfig.MatchConfig, person_a: crate_anon.linkage.person.Person, person_b: crate_anon.linkage.person.Person, should_match: bool, debug: bool = True)[source]

Two representations of a person and whether they should match.

__init__(cfg: crate_anon.linkage.matchconfig.MatchConfig, person_a: crate_anon.linkage.person.Person, person_b: crate_anon.linkage.person.Person, should_match: bool, debug: bool = True) None[source]

Parameters
  • cfg – the main MatchConfig object

  • person_a – one representation of a person

  • person_b – another representation of a person

  • should_match – should they be treated as the same person?

  • debug – be verbose?

check_comparison_as_expected() None[source]

Asserts that both the plaintext and hashed versions match, or don’t match, according to self.should_match.

log_odds_same_hashed() float[source]

Compares the hashed versions of the two person objects.

Returns

the log odds that they are the same person

Return type

float

log_odds_same_plaintext() float[source]

Compares the plaintext versions of the two person objects.

Returns

the log odds that they are the same person

Return type

float

matches_hashed() Tuple[bool, float][source]

Do the hashed versions match, by threshold?

Returns

(matches, log_odds)

Return type

tuple

matches_plaintext() Tuple[bool, float][source]

Do the plaintext versions match, by threshold?

Returns

(matches, log_odds)

Return type

tuple
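The relationship between the log-odds methods and the threshold-based `matches_*` methods above can be sketched generically. This example assumes a standard Bayesian update (posterior log odds equal prior log odds plus the sum of log likelihood ratios) and a simple minimum-log-odds threshold. The functions `log_odds_same` and `matches`, and all numbers used, are hypothetical and do not reproduce CRATE's MatchConfig logic.

```python
import math
from typing import List, Tuple


def log_odds_same(prior_p: float, likelihood_ratios: List[float]) -> float:
    """Posterior log odds = prior log odds + sum of log likelihood ratios."""
    prior_log_odds = math.log(prior_p / (1 - prior_p))
    return prior_log_odds + sum(math.log(lr) for lr in likelihood_ratios)


def matches(log_odds: float, min_log_odds: float) -> Tuple[bool, float]:
    """Return (matches, log_odds), declaring a match at or above threshold."""
    return log_odds >= min_log_odds, log_odds


# Illustrative numbers only: an even prior and two pieces of evidence,
# each ten times likelier if the records describe the same person.
log_odds = log_odds_same(prior_p=0.5, likelihood_ratios=[10.0, 10.0])
is_match, reported = matches(log_odds, min_log_odds=3.0)
```

Returning the log odds alongside the boolean lets a test assert both the decision and, when debugging, how far the pair sits from the threshold.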

class crate_anon.linkage.tests.fuzzy_id_match_tests.UnorderedMultipleComparisonTests(methodName='runTest')[source]

test_with_incomparable_identifiers() None[source]

Use identifiers that aren’t allowed to be compared, e.g. names with non-overlapping timestamps. This will give a comparison that is None, and make the code coverage checks happy.
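The idea of a comparison that is None for incomparable identifiers can be sketched with time-stamped names. The class `TimedName` and function `compare` are hypothetical illustrations of the concept, assuming inclusive date ranges; they are not CRATE's identifier or comparison classes.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TimedName:
    """Hypothetical name valid over an inclusive date range."""

    name: str
    start: date
    end: date

    def overlaps(self, other: "TimedName") -> bool:
        return self.start <= other.end and other.start <= self.end


def compare(a: TimedName, b: TimedName) -> Optional[bool]:
    """Return None if the names may not be compared, else whether they match."""
    if not a.overlaps(b):
        return None  # non-overlapping periods: no comparison is made
    return a.name == b.name


early = TimedName("SMITH", date(2000, 1, 1), date(2005, 12, 31))
late = TimedName("SMITH", date(2010, 1, 1), date(2015, 12, 31))
bridging = TimedName("JONES", date(2004, 1, 1), date(2012, 1, 1))
incomparable = compare(early, late)
mismatch = compare(early, bridging)
```

Here None means that no evidence was obtained, which is distinct from False, where a comparison was made and the names disagreed.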

To run the code coverage checks:

    pip install pytest-cov
    pytest --cov --cov-report html

crate_anon.linkage.tests.fuzzy_id_match_tests.mk_test_config(**kwargs) crate_anon.linkage.matchconfig.MatchConfig[source]

Create a dummy config, using dummy name/postcode info.