14.4.3. crate_anon.linkage.comparison

crate_anon/linkage/comparison.py

Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).

This file is part of CRATE.

CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with CRATE. If not, see <https://www.gnu.org/licenses/>.

Comparison classes for linkage tools.

These implement the maths without regard to the kind of identifier being compared. Includes classes for full/partial matches, and a function to iterate through a bunch of comparisons as part of a Bayesian probability calculation. The hypothesis H throughout is that two people being compared are in fact the same person.
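For orientation, the update behind these classes can be written out as Bayes' theorem in odds form. This is the standard textbook statement rather than a quotation from the CRATE source; the odds notation $O(\cdot)$ follows the method documentation in this module.

$$
O(H | D) = O(H) \cdot \frac{P(D | H)}{P(D | \neg H)}
\quad \Longrightarrow \quad
\ln O(H | D) = \ln O(H) + \ln \frac{P(D | H)}{P(D | \neg H)}
$$

Working in log odds turns each comparison into an addition of its log likelihood ratio, so several comparisons can be chained by summing, provided they are treated as conditionally independent.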

class crate_anon.linkage.comparison.AdjustLogOddsComparison(log_odds_delta: float, description: str = '?')[source]

Used to adjust log odds (via the log likelihood ratio) directly. See crate_anon.linkage.identifiers.gen_best_comparisons_unordered().

__init__(log_odds_delta: float, description: str = '?') → None[source]

property d_description: str: A description of D, the data (e.g. “match” or “mismatch”).

property p_d_given_h: float: Returns $P(D | H)$ , the probability of the observed data given the hypothesis of a match.

property p_d_given_not_h: float: Returns $P(D | \neg H)$ , the probability of the observed data given no match.

posterior_log_odds(prior_log_odds: float) → float[source]

Returns the posterior log odds, given the prior log odds. Often overridden in derived classes for a faster version.

Parameters: prior_log_odds – prior log odds that they’re the same person
Returns: posterior log odds, i.e. the log of the posterior odds $O(H | D)$
Return type: float
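A minimal standalone sketch of the idea, not the CRATE implementation: adjusting log odds directly amounts to adding a fixed delta, which is the same as multiplying the odds by exp(delta). The helper name `adjust_log_odds` and the numbers are assumptions chosen for illustration.

```python
import math


def adjust_log_odds(prior_log_odds: float, log_odds_delta: float) -> float:
    """Hypothetical helper: shift the prior log odds by a fixed delta."""
    return prior_log_odds + log_odds_delta


# Illustrative numbers only: prior odds of 1:4, and a delta of ln(2),
# which doubles the odds.
prior_log_odds = math.log(0.25)
posterior_log_odds = adjust_log_odds(prior_log_odds, math.log(2.0))
posterior_odds = math.exp(posterior_log_odds)  # approximately 0.5, i.e. 1:2
```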

class crate_anon.linkage.comparison.CertainComparison[source]

Special comparison to denote certainty, i.e. for when $P(D | \neg H) = 0$ (with $P(D | H) > 0$), that skips the calculations involved in arriving at an infinite likelihood ratio.

property d_description: str: A description of D, the data (e.g. “match” or “mismatch”).

property p_d_given_h: float: Returns $P(D | H)$ , the probability of the observed data given the hypothesis of a match.

property p_d_given_not_h: float: Returns $P(D | \neg H)$ , the probability of the observed data given no match.

posterior_log_odds(prior_log_odds: float) → float[source]

Returns the posterior log odds, given the prior log odds. Often overridden in derived classes for a faster version.

Parameters: prior_log_odds – prior log odds that they’re the same person
Returns: posterior log odds, i.e. the log of the posterior odds $O(H | D)$
Return type: float

class crate_anon.linkage.comparison.Comparison[source]

Abstract base class for comparing two pieces of information and calculating the posterior probability of a person match.

This code must be fast, so avoid extraneous parameters.

__init__() → None[source]

property d_description: str: A description of D, the data (e.g. “match” or “mismatch”).

property p_d_given_h: float: Returns $P(D | H)$ , the probability of the observed data given the hypothesis of a match.

property p_d_given_not_h: float: Returns $P(D | \neg H)$ , the probability of the observed data given no match.

posterior_log_odds(prior_log_odds: float) → float[source]

Returns the posterior log odds, given the prior log odds. Often overridden in derived classes for a faster version.

Parameters: prior_log_odds – prior log odds that they’re the same person
Returns: posterior log odds, i.e. the log of the posterior odds $O(H | D)$
Return type: float
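The interface above can be sketched as an abstract base class whose generic `posterior_log_odds` adds the log likelihood ratio to the prior. This is a standalone illustration under stated assumptions, not CRATE's code: the class names `SketchComparison` and `SketchSurnameMatch` and the probabilities are hypothetical, and both probabilities are assumed to be strictly positive.

```python
import math
from abc import ABC, abstractmethod


class SketchComparison(ABC):
    """Hypothetical stand-in for an abstract comparison (not CRATE's class)."""

    @property
    @abstractmethod
    def p_d_given_h(self) -> float:
        """P(D | H)."""

    @property
    @abstractmethod
    def p_d_given_not_h(self) -> float:
        """P(D | not H)."""

    def posterior_log_odds(self, prior_log_odds: float) -> float:
        # Generic (slow) path: recompute the log likelihood ratio on each call.
        # Assumes both probabilities are strictly positive.
        llr = math.log(self.p_d_given_h / self.p_d_given_not_h)
        return prior_log_odds + llr


class SketchSurnameMatch(SketchComparison):
    """Illustrative subclass with made-up probabilities."""

    @property
    def p_d_given_h(self) -> float:
        return 0.9

    @property
    def p_d_given_not_h(self) -> float:
        return 0.001


posterior = SketchSurnameMatch().posterior_log_odds(prior_log_odds=-8.0)
```

A derived class that knows its log likelihood ratio in advance could override `posterior_log_odds` to skip the division and logarithm, which is the kind of faster version the method description mentions.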

class crate_anon.linkage.comparison.DirectComparison(p_d_given_same_person: float, p_d_given_diff_person: float, d_description: str = '?')[source]

Represents a comparison where the user supplies $P(D | H)$ and $P(D | \neg H)$ directly. This is the fastest real comparison. It precalculates the log likelihood ratio, so a single comparison object can be re-used cheaply across many calculations.

__init__(p_d_given_same_person: float, p_d_given_diff_person: float, d_description: str = '?') → None[source]

Parameters:

p_d_given_same_person – $P(D | H)$
p_d_given_diff_person – $P(D | \neg H)$

property d_description: str: A description of D, the data (e.g. “match” or “mismatch”).

property p_d_given_h: float: Returns $P(D | H)$ , the probability of the observed data given the hypothesis of a match.

property p_d_given_not_h: float: Returns $P(D | \neg H)$ , the probability of the observed data given no match.

posterior_log_odds(prior_log_odds: float) → float[source]

Returns the posterior log odds, given the prior log odds. Often overridden in derived classes for a faster version.

Parameters: prior_log_odds – prior log odds that they’re the same person
Returns: posterior log odds, i.e. the log of the posterior odds $O(H | D)$
Return type: float
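The precalculation described for this class might look roughly like the following. It is a standalone sketch, assuming both probabilities are strictly positive; `SketchDirectComparison` and the example figures are hypothetical and do not reproduce the CRATE implementation.

```python
import math


class SketchDirectComparison:
    """Hypothetical illustration of caching the log likelihood ratio."""

    def __init__(
        self,
        p_d_given_same_person: float,
        p_d_given_diff_person: float,
        d_description: str = "?",
    ) -> None:
        self.d_description = d_description
        # Computed once here, so each later call is a single addition.
        self._log_likelihood_ratio = math.log(
            p_d_given_same_person / p_d_given_diff_person
        )

    def posterior_log_odds(self, prior_log_odds: float) -> float:
        return prior_log_odds + self._log_likelihood_ratio


# One object re-used against several priors (made-up numbers).
dob_match = SketchDirectComparison(0.99, 0.01, "hypothetical DOB match")
posteriors = [dob_match.posterior_log_odds(p) for p in (-10.0, -5.0, 0.0)]
```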

class crate_anon.linkage.comparison.FullPartialNoMatchComparison(full_match: bool, p_f: float, p_e: float, partial_match: bool, p_p: float)[source]

Represents a comparison where there can be a full or a partial match. (If there is neither a full nor a partial match, the hypothesis is rejected.)

As with MatchNoMatchComparison, this class exists for clarity. Code that produces one of these could equally produce one of three DirectComparison objects, conditional upon full_match and partial_match, but this is generally much clearer.

Not currently used in main code.

__init__(full_match: bool, p_f: float, p_e: float, partial_match: bool, p_p: float) → None[source]

Parameters:

full_match – was there a full match?
p_f – $p_f = P(\text{full match} | \neg H)$
p_e – $p_e = P(\text{partial but not full match} | H)$
partial_match – was there a partial match?
p_p – $p_p = P(\text{partial match} | \neg H)$

property d_description: str: A description of D, the data (e.g. “match” or “mismatch”).

property p_d_given_h: float: Returns $P(D | H)$ , the probability of the observed data given the hypothesis of a match.

property p_d_given_not_h: float: Returns $P(D | \neg H)$ , the probability of the observed data given no match.

posterior_log_odds(prior_log_odds: float) → float[source]

Returns the posterior log odds, given the prior log odds. Often overridden in derived classes for a faster version.

Parameters: prior_log_odds – prior log odds that they’re the same person
Returns: posterior log odds, i.e. the log of the posterior odds $O(H | D)$
Return type: float
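The three outcomes can be laid out explicitly. The sketch below follows from the parameter definitions above plus two assumptions that the text does not confirm: that $p_p$ covers partial-but-not-full matches (so the three outcomes under $\neg H$ sum to 1), and that a full match takes precedence when both flags are set. Since a non-match rejects the hypothesis, $P(\text{full match} | H)$ is taken as $1 - p_e$. The function names are hypothetical.

```python
import math
from typing import Tuple


def full_partial_probabilities(
    full_match: bool, p_f: float, p_e: float, partial_match: bool, p_p: float
) -> Tuple[float, float]:
    """Hypothetical helper returning (P(D | H), P(D | not H))."""
    if full_match:
        return 1.0 - p_e, p_f
    if partial_match:
        return p_e, p_p
    # Neither kind of match: impossible under H in this model.
    return 0.0, 1.0 - p_f - p_p


def log_likelihood_ratio(p_d_given_h: float, p_d_given_not_h: float) -> float:
    """ln of the likelihood ratio, with minus infinity for a ratio of 0."""
    if p_d_given_h == 0.0:
        return -math.inf
    return math.log(p_d_given_h / p_d_given_not_h)


# Made-up parameters for illustration.
llr_full = log_likelihood_ratio(*full_partial_probabilities(True, 0.01, 0.1, True, 0.05))
llr_partial = log_likelihood_ratio(*full_partial_probabilities(False, 0.01, 0.1, True, 0.05))
llr_none = log_likelihood_ratio(*full_partial_probabilities(False, 0.01, 0.1, False, 0.05))
```

With these illustrative numbers a full match gives the strongest support, a partial match gives weaker support, and no match yields minus infinity.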

class crate_anon.linkage.comparison.ImpossibleComparison[source]

Special comparison to denote impossibility/failure, i.e. for when $P(D | H) = 0$. It skips the calculations that would otherwise be needed to arrive at a likelihood ratio of 0.

property d_description: str: A description of D, the data (e.g. “match” or “mismatch”).

property p_d_given_h: float: Returns $P(D | H)$ , the probability of the observed data given the hypothesis of a match.

property p_d_given_not_h: float: Returns $P(D | \neg H)$ , the probability of the observed data given no match.

posterior_log_odds(prior_log_odds: float) → float[source]

Returns the posterior log odds, given the prior log odds. Often overridden in derived classes for a faster version.

Parameters: prior_log_odds – prior log odds that they’re the same person
Returns: posterior log odds, i.e. the log of the posterior odds $O(H | D)$
Return type: float
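A likelihood ratio of 0 corresponds to a log likelihood ratio of minus infinity, so the posterior log odds are minus infinity whatever the prior. The sketch below illustrates that shortcut in isolation; the constant and function names are assumptions, and the value CRATE actually returns is not stated in this section.

```python
import math

# Assumed representation of "log odds of an impossible match".
MINUS_INFINITY = -math.inf


def impossible_posterior_log_odds(prior_log_odds: float) -> float:
    """Hypothetical shortcut: ln(0) is minus infinity, whatever the prior."""
    return MINUS_INFINITY


after_failure = impossible_posterior_log_odds(prior_log_odds=3.0)

# Further finite evidence in favour cannot rescue the hypothesis:
# minus infinity plus any finite number is still minus infinity.
after_more_evidence = after_failure + math.log(1000.0)
```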

class crate_anon.linkage.comparison.MatchNoMatchComparison(match: bool, p_match_given_same_person: float, p_match_given_diff_person: float)[source]

Represents a comparison when there can be a match or not.

The purpose of this is to represent this choice CLEARLY. Code that produces one of these could equally produce one of two DirectComparison objects, conditional upon match, but this is often clearer.

Not currently used in main code.

__init__(match: bool, p_match_given_same_person: float, p_match_given_diff_person: float) → None[source]

Parameters:

match – D; is there a match?
p_match_given_same_person – If match: $P(D | H) = P(\text{match given same person}) = 1 - p_e$ . If no match: $P(D | H) = 1 - P(\text{match given same person}) = p_e$ .
p_match_given_diff_person – If match: $P(D | \neg H) = P(\text{match given different person}) = p_f$ . If no match: $P(D | \neg H) = 1 - P(\text{match given different person}) = 1 - p_f$ .

property d_description: str: A description of D, the data (e.g. “match” or “mismatch”).

property p_d_given_h: float: Returns $P(D | H)$ , the probability of the observed data given the hypothesis of a match.

property p_d_given_not_h: float: Returns $P(D | \neg H)$ , the probability of the observed data given no match.
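The parameter descriptions above translate into a simple two-way choice. This is a standalone sketch with hypothetical function names and made-up probabilities, assuming both are strictly between 0 and 1.

```python
import math
from typing import Tuple


def match_no_match_probabilities(
    match: bool,
    p_match_given_same_person: float,
    p_match_given_diff_person: float,
) -> Tuple[float, float]:
    """Hypothetical helper returning (P(D | H), P(D | not H))."""
    if match:
        return p_match_given_same_person, p_match_given_diff_person
    # No match: use the complementary probabilities.
    return 1.0 - p_match_given_same_person, 1.0 - p_match_given_diff_person


def log_likelihood_ratio(p_d_given_h: float, p_d_given_not_h: float) -> float:
    return math.log(p_d_given_h / p_d_given_not_h)


# Illustrative values: 1 - p_e = 0.95 and p_f = 0.02.
llr_match = log_likelihood_ratio(*match_no_match_probabilities(True, 0.95, 0.02))
llr_mismatch = log_likelihood_ratio(*match_no_match_probabilities(False, 0.95, 0.02))
```

Here a match raises the log odds and a mismatch lowers them, which is the same result as building one of two DirectComparison-style objects depending on `match`.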

crate_anon.linkage.comparison.bayes_compare(log_odds: float, comparisons: Iterable[Comparison | None]) → float[source]

Works through multiple comparisons and returns posterior log odds. Ignores comparisons that are None.

Parameters:

log_odds – prior log odds
comparisons – an iterable of Comparison objects (entries that are None are skipped)

Returns:

posterior log odds

Return type:

float
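The iteration described above can be sketched as a loop that feeds each posterior back in as the next prior. This is a simplified standalone illustration, not the CRATE function: `FixedLlrComparison` and `sketch_bayes_compare` are hypothetical names, and any early-exit behaviour the real function may have is not modelled.

```python
from typing import Iterable, Optional


class FixedLlrComparison:
    """Hypothetical comparison carrying a fixed log likelihood ratio."""

    def __init__(self, log_likelihood_ratio: float) -> None:
        self.log_likelihood_ratio = log_likelihood_ratio

    def posterior_log_odds(self, prior_log_odds: float) -> float:
        return prior_log_odds + self.log_likelihood_ratio


def sketch_bayes_compare(
    log_odds: float, comparisons: Iterable[Optional[FixedLlrComparison]]
) -> float:
    """Apply each comparison in turn, skipping entries that are None."""
    for comparison in comparisons:
        if comparison is None:
            continue
        # The posterior from one comparison becomes the prior for the next.
        log_odds = comparison.posterior_log_odds(log_odds)
    return log_odds


result = sketch_bayes_compare(
    -10.0, [FixedLlrComparison(4.0), None, FixedLlrComparison(-1.5)]
)
```

Because each step is an addition in log odds, the order of the comparisons does not affect the result in this sketch.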