14.2.13. crate_anon.common.parallel

crate_anon/common/parallel.py


Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).

This file is part of CRATE.

CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with CRATE. If not, see <https://www.gnu.org/licenses/>.


Assistance functions for “embarrassingly parallel” job assignment.

crate_anon.common.parallel.is_my_job_by_hash(value: Any, tasknum: int, ntasks: int) bool[source]

“Is it my job to do this work?”

Parameters
  • value – anything that’s hashable

  • tasknum – which task number am I?

  • ntasks – how many tasks are there in total?

Returns

is it my job?

Algorithm:

  • We convert some non-integer thing into a deterministic but roughly randomly distributed integer using hash64(). That produces a signed integer, which is OK because % works nonetheless.

When we use it:

  • We use this function to parallelize for non-integer PKs.

  • This is less efficient than dividing the work up via SQL, because we have to fetch/hash something.

  • Perform this test ASAP in loops, for speed.

crate_anon.common.parallel.is_my_job_by_hash_prehashed(hashed_value: int, tasknum: int, ntasks: int) bool[source]

A version of is_my_job_by_hash() for use when you have pre-hashed the value, and ntasks is guaranteed to be >1.

Parameters
  • hashed_value – integer hashed value

  • tasknum – which task number am I?

  • ntasks – how many tasks are there in total?

Returns

is it my job?

crate_anon.common.parallel.is_my_job_by_int(value: int, tasknum: int, ntasks: int) bool[source]

“Is it my job to do this work?”

Parameters
  • value – some integer value that is fairly evenly distributed, to spread the workload

  • tasknum – which task number am I?

  • ntasks – how many tasks are there in total?

Returns

is it my job?

Algorithm:

  • if there’s only one task: yes

  • otherwise, return value % ntasks == tasknum