14.2.15. crate_anon.common.regex_helpers

crate_anon/common/regex_helpers.py


Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).

This file is part of CRATE.

CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with CRATE. If not, see <https://www.gnu.org/licenses/>.


Constants and helper functions for use with regexes.

crate_anon.common.regex_helpers.anchor(x: str, start: bool = True, end: bool = True) → str

Anchor a regex at the start and/or end.
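The effect of anchoring can be illustrated with the standard re module alone. The helper anchor_sketch below is a hypothetical stand-in written for this example, not CRATE's implementation; it assumes that anchoring means adding ^ at the start and/or $ at the end.

```python
import re


def anchor_sketch(x: str, start: bool = True, end: bool = True) -> str:
    # Hypothetical stand-in: assumes "^" marks the start and "$" the end.
    return ("^" if start else "") + x + ("$" if end else "")


pattern = anchor_sketch(r"\d+")
print(pattern)                              # ^\d+$
print(bool(re.search(pattern, "123")))      # True
print(bool(re.search(pattern, "abc 123")))  # False: text precedes the digits

# Anchoring only at the end lets leading text through.
end_only = anchor_sketch(r"\d+", start=False)
print(bool(re.search(end_only, "abc 123")))  # True
```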

crate_anon.common.regex_helpers.assert_alphabetical(x: Union[str, Iterable[str]]) → None

Asserts that the string (or, going by the signature, each string in an iterable of strings) is not empty and contains only alphabetical characters.

crate_anon.common.regex_helpers.at_start_wb(regex_str: str) → str

Returns a version of the regex starting with a word boundary.

Beware, though: e.g. “3kg” is a reasonable thing to write, and it does NOT contain a word boundary between the “3” and the “kg”.
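The “3kg” caveat can be checked with plain re. The pattern below is an assumption chosen for illustration (a literal “kg” preceded by \b); it is not taken from CRATE.

```python
import re

# Illustrative pattern: "kg" that must start at a word boundary.
kg_at_word_start = r"\bkg"

# A space is a non-word character, so a boundary exists before "k".
print(bool(re.search(kg_at_word_start, "3 kg")))  # True

# "3" and "k" are both word characters, so there is no boundary between them.
print(bool(re.search(kg_at_word_start, "3kg")))   # False
```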

crate_anon.common.regex_helpers.at_wb_start_end(regex_str: str) → str

Returns a version of the regex starting and ending with a word boundary.

Use this with caution. Digits count as word characters, so there is no word boundary between “mm” and “3”, and “mm3” will not match if your “mm” group ends in a word boundary.
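The “mm3” case can be reproduced with plain re. The patterns are illustrative assumptions, not CRATE's own unit regexes.

```python
import re

# Illustrative pattern: "mm" with a word boundary on both sides.
mm_with_boundaries = r"\bmm\b"

print(bool(re.search(mm_with_boundaries, "5 mm")))   # True
# No boundary between "m" and "3", so the trailing \b fails.
print(bool(re.search(mm_with_boundaries, "5 mm3")))  # False

# Requiring a boundary only at the start lets "mm3" through.
mm_start_only = r"\bmm"
print(bool(re.search(mm_start_only, "5 mm3")))       # True
```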

crate_anon.common.regex_helpers.escape_literal_for_regex_allowing_flexible_whitespace(s: str) → str

Escapes literal characters, but creates a regex that allows flexible whitespace (e.g. a double space) wherever whitespace appears in the original.

For example, maps Hello there. to Hello\s+there\.
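One way to get this behaviour is sketched below, using only the standard library. The function flexible_whitespace_sketch is hypothetical and may differ from CRATE's implementation; it assumes each run of whitespace becomes \s+ and that everything else is escaped with re.escape.

```python
import re


def flexible_whitespace_sketch(s: str) -> str:
    # Split on runs of whitespace, escape the literal pieces,
    # and rejoin them with a "one or more whitespace" token.
    parts = re.split(r"\s+", s)
    return r"\s+".join(re.escape(p) for p in parts)


pattern = flexible_whitespace_sketch("Hello there.")
print(pattern)                                        # Hello\s+there\.
print(bool(re.fullmatch(pattern, "Hello there.")))    # True
print(bool(re.fullmatch(pattern, "Hello  there.")))   # True: double space
print(bool(re.fullmatch(pattern, "Hello there!")))    # False: "." is literal
```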

crate_anon.common.regex_helpers.escape_literal_for_regex_giving_charlist(s: str) → List[str]

Escape any regex characters. Returns a list of characters or escaped characters.

Start with \ -> \\; this should be the first replacement in REGEX_METACHARS.
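Why the backslash should come first can be seen in a sketch that escapes by sequential string replacement. Both the list METACHARS_SKETCH and the function escape_by_replacement are hypothetical, written for this example; the real contents of REGEX_METACHARS and the real escaping strategy may differ.

```python
from typing import List

# Hypothetical metacharacter list for illustration; backslash is first.
METACHARS_SKETCH = ["\\", "^", "$", ".", "|", "?", "*", "+", "(", ")", "[", "{"]


def escape_by_replacement(s: str, metachars: List[str]) -> str:
    # Replace each metacharacter, in list order, by its escaped form.
    for c in metachars:
        s = s.replace(c, "\\" + c)
    return s


# Backslash first: the backslash added for "." is never touched again.
print(escape_by_replacement("a.b", METACHARS_SKETCH))  # a\.b

# Backslash last: the backslash added for "." is itself escaped,
# giving a regex for a literal backslash followed by any character.
print(escape_by_replacement("a.b", [".", "\\"]))       # a\\.b
```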

crate_anon.common.regex_helpers.escape_literal_string_for_regex(s: str) → str

Escape any regex characters. Returns a string.

For example, maps Hello there. to Hello\ there\.

Start with \ -> \\; this should be the first replacement in REGEX_METACHARS.
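For comparison, the standard library's re.escape gives the same result for this particular input. It is shown only as a familiar reference point; nothing here implies that CRATE uses re.escape or that the two agree on every input.

```python
import re

escaped = re.escape("Hello there.")
print(escaped)                                       # Hello\ there\.

# The escaped pattern matches the original text literally.
print(bool(re.fullmatch(escaped, "Hello there.")))   # True
# The "." no longer acts as a wildcard.
print(bool(re.fullmatch(escaped, "Hello there!")))   # False
```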

crate_anon.common.regex_helpers.first_n_characters_required(x: str, n: int) → str

Returns a regex string that requires the first n characters, and then allows the rest as optional as long as they are in sequence.

Parameters
  • x – String

  • n – Minimum number of characters required at the start
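A regex with this property can be built from nested optional groups. The sketch below is one possible construction, not necessarily the one CRATE uses; first_n_required_sketch is a hypothetical name, and escaping each character with re.escape is an assumption.

```python
import re


def first_n_required_sketch(x: str, n: int) -> str:
    # The first n characters are mandatory.
    required = re.escape(x[:n])
    # Each later character is optional, and is nested inside the group of
    # the character before it, so none can appear out of sequence.
    optional = ""
    for c in reversed(x[n:]):
        optional = "(?:" + re.escape(c) + optional + ")?"
    return required + optional


pattern = first_n_required_sketch("abcde", 2)
print(pattern)                               # ab(?:c(?:d(?:e)?)?)?
print(bool(re.fullmatch(pattern, "ab")))     # True
print(bool(re.fullmatch(pattern, "abcd")))   # True
print(bool(re.fullmatch(pattern, "abd")))    # False: "d" without "c"
print(bool(re.fullmatch(pattern, "a")))      # False: fewer than n characters
```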

crate_anon.common.regex_helpers.named_capture_group(regex_str: str, name: str) → str

Wraps the string in a named capture group, (?P<name>...). The P is for Python extensions; see https://docs.python.org/3/howto/regex.html#non-capturing-and-named-groups
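The (?P<name>...) syntax can be tried directly with re. The wrapper named_group_sketch, the group name "dose" and the sample text are all made up for this example.

```python
import re


def named_group_sketch(regex_str: str, name: str) -> str:
    # Hypothetical stand-in producing the (?P<name>...) form described above.
    return f"(?P<{name}>{regex_str})"


pattern = named_group_sketch(r"\d+", "dose") + r"\s*mg"
print(pattern)              # (?P<dose>\d+)\s*mg

m = re.search(pattern, "take 50 mg daily")
print(m.group("dose"))      # 50
print(m.groupdict())        # {'dose': '50'}
```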

crate_anon.common.regex_helpers.noncapture_group(regex_str: str) → str

Wraps the string in a non-capture group, (?: ... )

crate_anon.common.regex_helpers.optional_named_capture_group(regex_str: str, name: str) → str

As for named_capture_group(), but optional.

crate_anon.common.regex_helpers.optional_noncapture_group(regex_str: str) → str

Wraps the string in an optional non-capture group, (?: ... )?
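A short illustration of what an optional non-capture group does in practice. The helper optional_noncapture_sketch is a hypothetical stand-in and the number pattern is an assumption for the example.

```python
import re


def optional_noncapture_sketch(regex_str: str) -> str:
    # Hypothetical stand-in producing the (?: ... )? form described above.
    return f"(?:{regex_str})?"


# An integer with an optional decimal part.
number = r"\d+" + optional_noncapture_sketch(r"\.\d+")
print(number)                               # \d+(?:\.\d+)?

m = re.fullmatch(number, "12.5")
print(m.group(0))                           # 12.5
print(m.groups())                           # (): nothing is captured
print(bool(re.fullmatch(number, "12")))     # True: the group is optional
```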

crate_anon.common.regex_helpers.regex_or(*regex_strings: str, wrap_each_in_noncapture_group: bool = False, wrap_result_in_noncapture_group: bool = False) → str

Returns a regex representing an “or” join of the components.

Parameters
  • regex_strings – The strings to join with |.

  • wrap_each_in_noncapture_group – Whether to convert each component into (?:component) before joining.

  • wrap_result_in_noncapture_group – Whether to convert the final result into (?:result).
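The wrapping options matter because | has the lowest precedence in a regex. The sketch below reimplements the documented behaviour under a hypothetical name, regex_or_sketch, to show the difference; it is not CRATE's code.

```python
import re


def regex_or_sketch(
    *regex_strings: str,
    wrap_each_in_noncapture_group: bool = False,
    wrap_result_in_noncapture_group: bool = False,
) -> str:
    # Hypothetical stand-in following the parameter descriptions above.
    parts = [
        f"(?:{s})" if wrap_each_in_noncapture_group else s
        for s in regex_strings
    ]
    result = "|".join(parts)
    if wrap_result_in_noncapture_group:
        result = f"(?:{result})"
    return result


units = regex_or_sketch("mg", "kg", wrap_result_in_noncapture_group=True)
print(units)                                        # (?:mg|kg)
print(bool(re.fullmatch(r"\d+" + units, "5kg")))    # True

# Without the outer group, the regex reads as "\d+mg" OR "kg".
bare = r"\d+" + regex_or_sketch("mg", "kg")
print(bare)                                         # \d+mg|kg
print(bool(re.fullmatch(bare, "5kg")))              # False
print(bool(re.fullmatch(bare, "kg")))               # True
```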