14.4.2. crate_anon.linkage.bulk_hash
crate_anon/linkage/bulk_hash.py
Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).
This file is part of CRATE.
CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with CRATE. If not, see <https://www.gnu.org/licenses/>.
Tool to hash multiple IDs from the command line.
Test code to look at different types of digest:
import hashlib
import hmac
msg = "This is an ex-parrot!"
key = "voom"
key_bytes = str(key).encode('utf-8')
msg_bytes = str(msg).encode('utf-8')
digestmod = hashlib.sha256
hmac_obj = hmac.new(key=key_bytes, msg=msg_bytes, digestmod=digestmod)
# These are the two default kinds of digest:
print(hmac_obj.digest()) # 8-bit binary
print(hmac_obj.hexdigest()) # hexadecimal
# Hex carries 4 bits per character. There are other possibilities,
# notably:
# - Base64 with 6 bits per character;
# - Base32 with 5 bits per character.
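The Base64 and Base32 alternatives mentioned in the comments above can be sketched with the standard library's base64 module. This is only an illustration of the encodings' relative lengths, reusing the same message and key as the test code; the variable names are hypothetical, and nothing here is claimed to be what bulk_hash itself does.

```python
import base64
import hashlib
import hmac

key_bytes = "voom".encode("utf-8")
msg_bytes = "This is an ex-parrot!".encode("utf-8")

# Raw HMAC-SHA256 digest: always 32 bytes (256 bits).
raw = hmac.new(key=key_bytes, msg=msg_bytes, digestmod=hashlib.sha256).digest()

# Three text encodings of the same 32 bytes (names are illustrative).
hex_digest = raw.hex()                              # 4 bits per character
b32_digest = base64.b32encode(raw).decode("ascii")  # 5 bits per character
b64_digest = base64.b64encode(raw).decode("ascii")  # 6 bits per character

# Lengths include the "=" padding that Base32 and Base64 add.
print(len(hex_digest), len(b32_digest), len(b64_digest))  # 64 56 44
```

Denser encodings give shorter strings, at the cost of a larger alphabet (Base64 uses "+", "/" and "=", which may need care in filenames or URLs).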
- crate_anon.linkage.bulk_hash.bulk_hash(input_filename: str, output_filename: str, hash_method: str, key: str, keep_id: bool = True)
Hash lines from one file to another.
- Parameters:
input_filename – input filename, or “-” for stdin
output_filename – output filename, or “-” for stdout
hash_method – method to use; e.g. HMAC_SHA256
key – secret key for hasher
keep_id – produce CSV with hash,id pairs, rather than just lines with the hashes?
Note that with the keep_id option, the hash precedes the ID in each row, which works best if the ID might contain commas.
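The hash-before-ID ordering can be illustrated with a minimal sketch. The function hash_lines_sketch below is hypothetical and is not CRATE's implementation: it assumes HMAC-SHA256 with hexadecimal output and one ID per input line, and it works on in-memory text rather than on filenames.

```python
import hashlib
import hmac
import io
from typing import Iterable, TextIO


def hash_lines_sketch(lines: Iterable[str], out: TextIO,
                      key: str, keep_id: bool = True) -> None:
    """Hypothetical helper: write one hashed line per input ID."""
    key_bytes = key.encode("utf-8")
    for line in lines:
        identifier = line.rstrip("\n")
        # Assumption: HMAC-SHA256, rendered as hexadecimal.
        digest = hmac.new(key_bytes, identifier.encode("utf-8"),
                          hashlib.sha256).hexdigest()
        if keep_id:
            out.write(f"{digest},{identifier}\n")  # hash first, then ID
        else:
            out.write(f"{digest}\n")


buffer = io.StringIO()
hash_lines_sketch(["123\n", "Smith, J\n"], buffer, key="voom")
rows = buffer.getvalue().splitlines()

# A hex digest never contains a comma, so splitting on the FIRST comma
# recovers the ID intact even when the ID itself contains commas.
hashed, original = rows[1].split(",", 1)
print(original)  # Smith, J
```

Had the ID come first, a reader of the output could not tell which comma separates the ID from the hash without quoting rules.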