14.4.3. crate_anon.linkage.comparison
crate_anon/linkage/comparison.py
Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).
This file is part of CRATE.
CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with CRATE. If not, see <https://www.gnu.org/licenses/>.
Comparison classes for linkage tools.
These implement the maths without regard to the kind of identifier being compared. The module includes classes for full and partial matches, and a function that iterates through a series of comparisons as part of a Bayesian probability calculation. Throughout, the hypothesis H is that the two people being compared are in fact the same person.
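The Bayesian update behind these classes can be written in log-odds form: the posterior log odds equal the prior log odds plus the log likelihood ratio, log[P(D | H) / P(D | ¬H)]. Below is a minimal sketch of that arithmetic. It assumes natural logarithms, and the helper names are hypothetical rather than part of CRATE's API.

```python
import math


def log_odds_from_probability(p: float) -> float:
    """Convert a probability in (0, 1) to natural-log odds (hypothetical helper)."""
    return math.log(p / (1.0 - p))


def probability_from_log_odds(log_odds: float) -> float:
    """Convert natural-log odds back to a probability (hypothetical helper)."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def update_log_odds(
    prior_log_odds: float, p_d_given_h: float, p_d_given_not_h: float
) -> float:
    """Bayes' rule in log-odds form: posterior = prior + log likelihood ratio."""
    log_likelihood_ratio = math.log(p_d_given_h / p_d_given_not_h)
    return prior_log_odds + log_likelihood_ratio


# Illustrative numbers only: prior P(H) = 0.001, and the observed data are
# 99 times as likely if the two records belong to the same person.
prior = log_odds_from_probability(0.001)
posterior = update_log_odds(prior, p_d_given_h=0.99, p_d_given_not_h=0.01)
print(f"{probability_from_log_odds(posterior):.4f}")  # 0.0902
```

Working in log odds turns each update into a single addition, which suits code that must apply many comparisons quickly.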
- class crate_anon.linkage.comparison.AdjustLogOddsComparison(log_odds_delta: float, description: str = '?')
Used to adjust log odds (via the log likelihood ratio) directly. See crate_anon.linkage.identifiers.gen_best_comparisons_unordered().
- property d_description: str
A description of D, the data (e.g. “match” or “mismatch”).
- property p_d_given_h: float
Returns P(D | H), the probability of the observed data given the hypothesis of a match.
- property p_d_given_not_h: float
Returns P(D | ¬H), the probability of the observed data given no match.
- posterior_log_odds(prior_log_odds: float) -> float
Returns the posterior log odds, given the prior log odds. Often overridden in derived classes with a faster version.
- Parameters:
prior_log_odds – prior log odds that they’re the same person
- Returns:
posterior log odds, log O(H | D)
- Return type:
float
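Because AdjustLogOddsComparison works on the log odds directly, its effect on the posterior amounts to adding log_odds_delta to the prior. The sketch below shows only that step. The class name AdjustLogOddsSketch is hypothetical, and it deliberately does not model the p_d_given_h and p_d_given_not_h properties, whose values for this class the documentation does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdjustLogOddsSketch:
    """Hypothetical stand-in: shifts the log odds by a fixed amount."""

    log_odds_delta: float  # a log likelihood ratio, added as-is
    description: str = "?"

    def posterior_log_odds(self, prior_log_odds: float) -> float:
        # Adding to the log odds is the same as multiplying the odds by
        # exp(log_odds_delta), i.e. applying a likelihood ratio.
        return prior_log_odds + self.log_odds_delta


adjust = AdjustLogOddsSketch(log_odds_delta=-2.5, description="penalty")
print(adjust.posterior_log_odds(1.0))  # -1.5
```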
- class crate_anon.linkage.comparison.CertainComparison
Special comparison to denote certainty, i.e. for when P(D | ¬H) = 0, that doesn’t bother with all the calculations involved in calculating an infinite likelihood ratio.
- property d_description: str
A description of D, the data (e.g. “match” or “mismatch”).
- property p_d_given_h: float
Returns P(D | H), the probability of the observed data given the hypothesis of a match.
- property p_d_given_not_h: float
Returns P(D | ¬H), the probability of the observed data given no match.
- posterior_log_odds(prior_log_odds: float) -> float
Returns the posterior log odds, given the prior log odds. Often overridden in derived classes with a faster version.
- Parameters:
prior_log_odds – prior log odds that they’re the same person
- Returns:
posterior log odds, log O(H | D)
- Return type:
float
- class crate_anon.linkage.comparison.Comparison
Abstract base class for comparing two pieces of information and calculating the posterior probability of a person match.
This code must be fast, so avoid extraneous parameters.
- property d_description: str
A description of D, the data (e.g. “match” or “mismatch”).
- property p_d_given_h: float
Returns P(D | H), the probability of the observed data given the hypothesis of a match.
- property p_d_given_not_h: float
Returns P(D | ¬H), the probability of the observed data given no match.
- posterior_log_odds(prior_log_odds: float) -> float
Returns the posterior log odds, given the prior log odds. Often overridden in derived classes with a faster version.
- Parameters:
prior_log_odds – prior log odds that they’re the same person
- Returns:
posterior log odds, log O(H | D)
- Return type:
float
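The abstract interface above can be sketched with Python's abc module. This is an illustrative reconstruction, not CRATE's source. It assumes that the generic posterior_log_odds derives the log likelihood ratio from the two probability properties on every call, which is the slow path that derived classes may override. ComparisonSketch and ExampleComparison are hypothetical names.

```python
import math
from abc import ABC, abstractmethod


class ComparisonSketch(ABC):
    """Hypothetical reconstruction of the Comparison interface."""

    @property
    @abstractmethod
    def d_description(self) -> str:
        """A description of D, the data (e.g. "match" or "mismatch")."""

    @property
    @abstractmethod
    def p_d_given_h(self) -> float:
        """P(D | H)."""

    @property
    @abstractmethod
    def p_d_given_not_h(self) -> float:
        """P(D | not H)."""

    def posterior_log_odds(self, prior_log_odds: float) -> float:
        # Generic path: recompute the log likelihood ratio on every call.
        return prior_log_odds + math.log(self.p_d_given_h / self.p_d_given_not_h)


class ExampleComparison(ComparisonSketch):
    """Hypothetical concrete subclass with fixed, illustrative probabilities."""

    @property
    def d_description(self) -> str:
        return "match"

    @property
    def p_d_given_h(self) -> float:
        return 0.95

    @property
    def p_d_given_not_h(self) -> float:
        return 0.05


# Likelihood ratio 0.95 / 0.05 = 19, so from even odds the result is log(19).
print(ExampleComparison().posterior_log_odds(0.0))
```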
- class crate_anon.linkage.comparison.DirectComparison(p_d_given_same_person: float, p_d_given_diff_person: float, d_description: str = '?')
Represents a comparison where the user supplies P(D | H) and P(D | ¬H) directly. This is the fastest real comparison. It precalculates the log likelihood ratio for speed, so the same comparison object can be reused quickly.
- __init__(p_d_given_same_person: float, p_d_given_diff_person: float, d_description: str = '?') -> None
- Parameters:
p_d_given_same_person – P(D | H)
p_d_given_diff_person – P(D | ¬H)
- property d_description: str
A description of D, the data (e.g. “match” or “mismatch”).
- property p_d_given_h: float
Returns P(D | H), the probability of the observed data given the hypothesis of a match.
- property p_d_given_not_h: float
Returns P(D | ¬H), the probability of the observed data given no match.
- posterior_log_odds(prior_log_odds: float) -> float
Returns the posterior log odds, given the prior log odds. Often overridden in derived classes with a faster version.
- Parameters:
prior_log_odds – prior log odds that they’re the same person
- Returns:
posterior log odds, log O(H | D)
- Return type:
float
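The precalculation described for DirectComparison can be sketched as follows: the logarithm is computed once in the constructor, and each later call to posterior_log_odds is a single addition. DirectComparisonSketch is a hypothetical stand-in, not CRATE's code, and it assumes both probabilities are strictly positive (a zero would make math.log fail).

```python
import math


class DirectComparisonSketch:
    """Hypothetical stand-in for DirectComparison; assumes both p > 0."""

    def __init__(
        self,
        p_d_given_same_person: float,
        p_d_given_diff_person: float,
        d_description: str = "?",
    ) -> None:
        self._p_d_given_h = p_d_given_same_person
        self._p_d_given_not_h = p_d_given_diff_person
        self._d_description = d_description
        # Pay for the logarithm once, at construction time.
        self._log_likelihood_ratio = math.log(
            p_d_given_same_person / p_d_given_diff_person
        )

    @property
    def d_description(self) -> str:
        return self._d_description

    @property
    def p_d_given_h(self) -> float:
        return self._p_d_given_h

    @property
    def p_d_given_not_h(self) -> float:
        return self._p_d_given_not_h

    def posterior_log_odds(self, prior_log_odds: float) -> float:
        # Each call is now just one addition.
        return prior_log_odds + self._log_likelihood_ratio


# One object, reused against several prior log odds (illustrative values).
same_value = DirectComparisonSketch(0.98, 0.2, "match")
for prior in (-6.0, -2.0, 0.0):
    print(prior, "->", same_value.posterior_log_odds(prior))
```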
- class crate_anon.linkage.comparison.FullPartialNoMatchComparison(full_match: bool, p_f: float, p_e: float, partial_match: bool, p_p: float)
Represents a comparison where there can be a full or a partial match. (If there is neither a full nor a partial match, the hypothesis is rejected.)
As with MatchNoMatchComparison, this is for clarity. Code that produces one of these could equally produce one of three DirectComparison objects, conditional upon full_match and partial_match, but this is generally much clearer. Not currently used in main code.
- __init__(full_match: bool, p_f: float, p_e: float, partial_match: bool, p_p: float) -> None
- Parameters:
full_match – was there a full match?
p_f –
p_e –
partial_match – was there a partial match?
p_p –
- property d_description: str
A description of D, the data (e.g. “match” or “mismatch”).
- property p_d_given_h: float
Returns P(D | H), the probability of the observed data given the hypothesis of a match.
- property p_d_given_not_h: float
Returns P(D | ¬H), the probability of the observed data given no match.
- posterior_log_odds(prior_log_odds: float) -> float
Returns the posterior log odds, given the prior log odds. Often overridden in derived classes with a faster version.
- Parameters:
prior_log_odds – prior log odds that they’re the same person
- Returns:
posterior log odds, log O(H | D)
- Return type:
float
- class crate_anon.linkage.comparison.ImpossibleComparison
Special comparison to denote impossibility/failure, i.e. for when P(D | H) = 0, that doesn’t bother with all the calculations involved in calculating a likelihood ratio of 0.
- property d_description: str
A description of D, the data (e.g. “match” or “mismatch”).
- property p_d_given_h: float
Returns P(D | H), the probability of the observed data given the hypothesis of a match.
- property p_d_given_not_h: float
Returns P(D | ¬H), the probability of the observed data given no match.
- posterior_log_odds(prior_log_odds: float) -> float
Returns the posterior log odds, given the prior log odds. Often overridden in derived classes with a faster version.
- Parameters:
prior_log_odds – prior log odds that they’re the same person
- Returns:
posterior log odds, log O(H | D)
- Return type:
float
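Why a special class helps when P(D | H) = 0: a likelihood ratio of 0 has a log of minus infinity, and the general formula would call math.log(0.0), which raises ValueError. The sketch below returns minus infinity directly instead. Both naive_posterior_log_odds and ImpossibleComparisonSketch are hypothetical illustrations, not CRATE's implementation.

```python
import math


def naive_posterior_log_odds(
    prior_log_odds: float, p_d_given_h: float, p_d_given_not_h: float
) -> float:
    # General formula; breaks when P(D | H) = 0 because log(0) is undefined.
    return prior_log_odds + math.log(p_d_given_h / p_d_given_not_h)


class ImpossibleComparisonSketch:
    """Hypothetical stand-in: P(D | H) = 0 rules the match out."""

    def posterior_log_odds(self, prior_log_odds: float) -> float:
        # Likelihood ratio 0 -> log likelihood ratio of -inf, whatever the prior.
        return -math.inf


try:
    naive_posterior_log_odds(3.0, 0.0, 0.5)
except ValueError:
    print("general formula fails for P(D | H) = 0")
print(ImpossibleComparisonSketch().posterior_log_odds(3.0))  # -inf
```

Minus infinity then propagates through later additions (any finite number added to it stays minus infinity), so a single impossible comparison rejects the hypothesis.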
- class crate_anon.linkage.comparison.MatchNoMatchComparison(match: bool, p_match_given_same_person: float, p_match_given_diff_person: float)
Represents a comparison when there can be a match or not.
The purpose of this is to represent this choice CLEARLY. Code that produces one of these could equally produce one of two DirectComparison objects, conditional upon match, but this is often clearer. Not currently used in main code.
- __init__(match: bool, p_match_given_same_person: float, p_match_given_diff_person: float) -> None
- Parameters:
match – D; is there a match?
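p_match_given_same_person – P(match | H). If match: P(D | H) is this value. If no match: P(D | H) is 1 minus this value.
p_match_given_diff_person – P(match | ¬H). If match: P(D | ¬H) is this value. If no match: P(D | ¬H) is 1 minus this value.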
- property d_description: str
A description of D, the data (e.g. “match” or “mismatch”).
- property p_d_given_h: float
Returns P(D | H), the probability of the observed data given the hypothesis of a match.
- property p_d_given_not_h: float
Returns P(D | ¬H), the probability of the observed data given no match.
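A minimal sketch of how the match flag might select the probabilities: on a match, P(D | H) and P(D | ¬H) are the supplied match probabilities; on a mismatch, they are their complements. MatchNoMatchSketch is a hypothetical stand-in, not CRATE's class, and it assumes none of the resulting probabilities is zero.

```python
import math


class MatchNoMatchSketch:
    """Hypothetical stand-in: `match` chooses P(D | H) and P(D | not H)."""

    def __init__(
        self,
        match: bool,
        p_match_given_same_person: float,
        p_match_given_diff_person: float,
    ) -> None:
        self.match = match
        self.p_match_given_same_person = p_match_given_same_person
        self.p_match_given_diff_person = p_match_given_diff_person

    @property
    def d_description(self) -> str:
        return "match" if self.match else "mismatch"

    @property
    def p_d_given_h(self) -> float:
        p = self.p_match_given_same_person
        return p if self.match else 1.0 - p

    @property
    def p_d_given_not_h(self) -> float:
        q = self.p_match_given_diff_person
        return q if self.match else 1.0 - q

    def posterior_log_odds(self, prior_log_odds: float) -> float:
        return prior_log_odds + math.log(self.p_d_given_h / self.p_d_given_not_h)


# Illustrative probabilities: a match multiplies the odds by 0.9 / 0.1 = 9,
# and a mismatch divides them by (1 - 0.1) / (1 - 0.9) = 9.
hit = MatchNoMatchSketch(True, 0.9, 0.1)
miss = MatchNoMatchSketch(False, 0.9, 0.1)
print(hit.posterior_log_odds(0.0), miss.posterior_log_odds(0.0))
```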
- crate_anon.linkage.comparison.bayes_compare(log_odds: float, comparisons: Iterable[Comparison | None]) -> float
Works through multiple comparisons and returns the posterior log odds. Comparisons that are None are ignored.
- Parameters:
log_odds – prior log odds
comparisons – an iterable of Comparison objects; entries that are None are skipped
- Returns:
posterior log odds
- Return type:
float
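The loop that bayes_compare describes can be sketched as below: each comparison's posterior becomes the prior for the next, and None entries are skipped. bayes_compare_sketch, SupportsPosteriorLogOdds and AddDelta are hypothetical names for illustration; this is not the library's own code.

```python
from typing import Iterable, Optional, Protocol


class SupportsPosteriorLogOdds(Protocol):
    """Anything with a posterior_log_odds() method (hypothetical protocol)."""

    def posterior_log_odds(self, prior_log_odds: float) -> float: ...


def bayes_compare_sketch(
    log_odds: float,
    comparisons: Iterable[Optional[SupportsPosteriorLogOdds]],
) -> float:
    """Apply each comparison in turn, skipping None entries."""
    for comparison in comparisons:
        if comparison is None:
            continue  # no comparison available here; leave the odds unchanged
        # The posterior from this step is the prior for the next one.
        log_odds = comparison.posterior_log_odds(log_odds)
    return log_odds


class AddDelta:
    """Hypothetical comparison that adds a fixed log likelihood ratio."""

    def __init__(self, delta: float) -> None:
        self.delta = delta

    def posterior_log_odds(self, prior_log_odds: float) -> float:
        return prior_log_odds + self.delta


print(bayes_compare_sketch(-3.0, [AddDelta(2.0), None, AddDelta(1.5)]))  # 0.5
```

Chaining updates this way treats the comparisons as conditionally independent given H (and given ¬H), the usual naive-Bayes assumption.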