14.4.2. crate_anon.linkage.bulk_hash

crate_anon/linkage/bulk_hash.py


Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).

This file is part of CRATE.

CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with CRATE. If not, see <https://www.gnu.org/licenses/>.


Tool to hash multiple IDs from the command line.

Test code to look at different types of digest:

import hashlib
import hmac

msg = "This is an ex-parrot!"
key = "voom"

key_bytes = str(key).encode('utf-8')
msg_bytes = str(msg).encode('utf-8')
digestmod = hashlib.sha256
hmac_obj = hmac.new(key=key_bytes, msg=msg_bytes, digestmod=digestmod)

# These are the two default kinds of digest:
print(hmac_obj.digest())  # 8-bit binary
print(hmac_obj.hexdigest())  # hexadecimal

# Hex carries 4 bits per character. There are other possibilities,
# notably:
# - Base64 with 6 bits per character;
# - Base32 with 5 bits per character.
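
The Base64 and Base32 alternatives mentioned in the comments above can be sketched as follows. This is only an illustration using the standard library's base64 module; it is not taken from CRATE itself, and it reuses the example key and message from the test code.

```python
import base64
import hashlib
import hmac

key_bytes = "voom".encode("utf-8")
msg_bytes = "This is an ex-parrot!".encode("utf-8")

# Raw HMAC-SHA256 digest: always 32 bytes (256 bits).
raw = hmac.new(key=key_bytes, msg=msg_bytes, digestmod=hashlib.sha256).digest()

# Three text encodings of the same 32 bytes.
hex_digest = raw.hex()                              # 4 bits per character
b64_digest = base64.b64encode(raw).decode("ascii")  # 6 bits per character
b32_digest = base64.b32encode(raw).decode("ascii")  # 5 bits per character

# Lengths include any "=" padding added by Base64/Base32.
print(len(hex_digest), len(b64_digest), len(b32_digest))  # 64 44 56
```

Denser encodings give shorter strings for the same digest, at the cost of a larger alphabet (Base64 is case-sensitive and includes "+" and "/").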
crate_anon.linkage.bulk_hash.bulk_hash(input_filename: str, output_filename: str, hash_method: str, key: str, keep_id: bool = True)

Hash lines from one file to another.

Parameters:
  • input_filename – input filename, or “-” for stdin

  • output_filename – output filename, or “-” for stdout

  • hash_method – method to use; e.g. HMAC_SHA256

  • key – secret key for hasher

  • keep_id – produce CSV with hash,id pairs, rather than just lines with the hashes?

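Note that with the keep_id option, the hash precedes the ID on each output line. This ordering works best if the ID might contain commas: the hash itself contains none, so everything after the first comma is the ID.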
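
Why the hash-first ordering helps can be shown with a minimal sketch. It assumes HMAC-SHA256 with a hexadecimal digest; the helper name hash_ids_sketch is hypothetical and this is not CRATE's implementation, only an illustration of the output layout.

```python
import hashlib
import hmac
from typing import Iterable, Iterator


def hash_ids_sketch(ids: Iterable[str], key: str,
                    keep_id: bool = True) -> Iterator[str]:
    """Hypothetical helper: yield one output line per input ID."""
    key_bytes = key.encode("utf-8")
    for raw_id in ids:
        id_ = raw_id.rstrip("\n")  # drop the trailing newline, if any
        digest = hmac.new(
            key_bytes, id_.encode("utf-8"), hashlib.sha256
        ).hexdigest()
        # Hash first, then the ID; or just the hash if keep_id is False.
        yield f"{digest},{id_}" if keep_id else digest


lines = list(hash_ids_sketch(["12345", "Smith, John"], key="voom"))
for line in lines:
    # A hex digest has no commas, so splitting on the FIRST comma
    # recovers the ID intact even when the ID contains commas.
    hashed, original_id = line.split(",", 1)
    print(hashed[:8], repr(original_id))
```

Had the ID come first, a reader would instead need to split on the last comma or rely on CSV quoting.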

crate_anon.linkage.bulk_hash.main() → None

Command-line entry point.