14.2.16. crate_anon.common.regex_helpers
crate_anon/common/regex_helpers.py
Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).
This file is part of CRATE.
CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with CRATE. If not, see <https://www.gnu.org/licenses/>.
Constants and helper functions for use with regexes.
- crate_anon.common.regex_helpers.anchor(x: str, start: bool = True, end: bool = True) → str
Anchor a regex at the start and/or end.
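A minimal stand-alone sketch of this behaviour, assuming that "anchoring" means prefixing `^` and appending `$`. It is not CRATE's source, and the real function may use different anchors:

```python
import re


def anchor(x: str, start: bool = True, end: bool = True) -> str:
    """Sketch only: add "^" at the start and/or "$" at the end of a regex."""
    # Assumption: "^" and "$" are the anchors used; CRATE's code may differ.
    if start:
        x = "^" + x
    if end:
        x = x + "$"
    return x


pattern = anchor(r"\d+")
print(pattern)                           # ^\d+$
print(bool(re.search(pattern, "123")))   # True
print(bool(re.search(pattern, "a123")))  # False: "^" forbids the leading "a"
```

In Python, `$` also matches just before a trailing newline; `\Z` is the strict end-of-string anchor if that distinction matters.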
- crate_anon.common.regex_helpers.assert_alphabetical(x: str | Iterable[str]) → None
Asserts that the string is not empty and contains only alphabetical characters.
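A minimal sketch of such a check. Checking an iterable element by element is an assumption based on the `str | Iterable[str]` signature, and the error message is hypothetical:

```python
from typing import Iterable, Union


def assert_alphabetical(x: Union[str, Iterable[str]]) -> None:
    """Sketch only: fail unless x (or every string in x) is non-empty and
    purely alphabetical."""
    if isinstance(x, str):
        # str.isalpha() is False for "" and for any digit, space or symbol.
        assert x.isalpha(), f"Not a non-empty alphabetical string: {x!r}"
    else:
        # Assumption: an iterable is checked element by element.
        for item in x:
            assert_alphabetical(item)


assert_alphabetical("kg")          # passes silently
assert_alphabetical(["mg", "ml"])  # passes silently
try:
    assert_alphabetical("mm3")     # contains a digit
except AssertionError as exc:
    print("rejected:", exc)
```

Because it relies on `assert`, a check written this way is skipped entirely when Python runs with `-O`.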
- crate_anon.common.regex_helpers.at_start_wb(regex_str: str) → str
Returns a version of the regex starting with a word boundary.
Beware, though: “3kg”, for example, is reasonable text, but it has NO word boundary before the “kg”, so a word-boundary-prefixed “kg” regex will not match it.
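A minimal sketch, assuming the boundary is the standard `\b` and that the regex is wrapped in a non-capturing group so that an alternation such as `mg|g` stays intact. It reproduces the “3kg” pitfall:

```python
import re


def at_start_wb(regex_str: str) -> str:
    """Sketch only: require a word boundary immediately before the regex."""
    # The group makes "\b" apply to every alternative, e.g. "mg|g".
    return rf"\b(?:{regex_str})"


kg = at_start_wb("kg")
print(bool(re.search(kg, "weight 70 kg")))  # True: a space precedes "k"
print(bool(re.search(kg, "weight 70kg")))   # False: "0" and "k" are both word characters
```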
- crate_anon.common.regex_helpers.at_wb_start_end(regex_str: str) → str
Returns a version of the regex starting and ending with a word boundary.
Use this with caution. Digits do not end a word, so “mm3” will not match if your “mm” group ends in a word boundary.
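A minimal sketch, again assuming standard `\b` boundaries around a non-capturing group. It shows the “mm3” problem: there is no boundary between “m” and “3”, because both are word characters:

```python
import re


def at_wb_start_end(regex_str: str) -> str:
    """Sketch only: require word boundaries before and after the regex."""
    return rf"\b(?:{regex_str})\b"


mm = at_wb_start_end("mm")
print(bool(re.search(mm, "5 mm")))             # True
print(bool(re.search(mm + "3", "5 mm3")))      # False: no \b between "m" and "3"
print(bool(re.search(r"\b(?:mm)3", "5 mm3")))  # True once the trailing \b is dropped
```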
- crate_anon.common.regex_helpers.escape_literal_for_regex_allowing_flexible_whitespace(s: str) → str
Escapes literal characters, but creates a regex that allows flexible whitespace (e.g. a double space) wherever the original contains whitespace.
For example, maps
Hello there.
to
Hello\s+there\.
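A minimal sketch that reproduces the documented example. It uses the standard library's `re.escape()` for the literal parts, which is an assumption (CRATE has its own escaping helpers), and turns every run of whitespace into `\s+`:

```python
import re


def escape_literal_for_regex_allowing_flexible_whitespace(s: str) -> str:
    """Sketch only: escape the literal text, but let each run of whitespace
    match one or more whitespace characters."""
    # re.split keeps empty strings at the ends, so leading or trailing
    # whitespace in s also becomes \s+.
    pieces = re.split(r"\s+", s)
    return r"\s+".join(re.escape(piece) for piece in pieces)


pattern = escape_literal_for_regex_allowing_flexible_whitespace("Hello there.")
print(pattern)                                        # Hello\s+there\.
print(bool(re.fullmatch(pattern, "Hello   there.")))  # True
print(bool(re.fullmatch(pattern, "Hello\nthere.")))   # True
print(bool(re.fullmatch(pattern, "Hello thereX")))    # False: "." was escaped
```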
- crate_anon.common.regex_helpers.escape_literal_for_regex_giving_charlist(s: str) → List[str]
Escapes any regex metacharacters. Returns a list of characters or escaped characters.
Start with the backslash (\ -> \\); this should be the first replacement in REGEX_METACHARS.
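A minimal sketch of the list-returning form. The metacharacter set is a hypothetical stand-in for CRATE's `REGEX_METACHARS`. Because this version examines one input character at a time, an escape it has inserted is never re-examined, so replacement order does not affect this particular sketch:

```python
from typing import List

# Hypothetical stand-in for CRATE's REGEX_METACHARS (not its actual value).
REGEX_METACHARS = set("\\^$.|?*+()[]{}")


def escape_literal_for_regex_giving_charlist(s: str) -> List[str]:
    """Sketch only: one list entry per input character, with regex
    metacharacters prefixed by a backslash."""
    return ["\\" + c if c in REGEX_METACHARS else c for c in s]


chars = escape_literal_for_regex_giving_charlist("a.b")
print(chars)           # ['a', '\\.', 'b']
print("".join(chars))  # a\.b
```

Keeping one entry per character presumably lets callers treat each (possibly escaped) character as a unit, for example when making trailing characters optional one by one.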
- crate_anon.common.regex_helpers.escape_literal_string_for_regex(s: str) → str
Escapes any regex metacharacters. Returns a string.
For example, maps
Hello there.
to
Hello\ there\.
Start with the backslash (\ -> \\); this should be the first replacement in REGEX_METACHARS.
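A minimal sketch using chained `str.replace()` calls, which is where the backslash-first rule matters. The ordered metacharacter list is a hypothetical stand-in for CRATE's `REGEX_METACHARS`. It includes the space because the documented example escapes it; an escaped space still matches under `re.VERBOSE`, whereas a bare one is ignored:

```python
import re

# Hypothetical, ordered stand-in for CRATE's REGEX_METACHARS.
# The backslash MUST come first: escaping it later would double up the
# backslashes already inserted in front of the other characters.
REGEX_METACHARS = ["\\", "^", "$", ".", "|", "?", "*", "+",
                   "(", ")", "[", "]", "{", "}", " "]


def escape_literal_string_for_regex(s: str) -> str:
    """Sketch only: escape each metacharacter by successive replacements."""
    for char in REGEX_METACHARS:
        s = s.replace(char, "\\" + char)
    return s


pattern = escape_literal_string_for_regex("Hello there.")
print(pattern)                                                  # Hello\ there\.
print(bool(re.fullmatch(pattern, "Hello there.")))              # True
print(bool(re.fullmatch(pattern, "Hello there.", re.VERBOSE)))  # True
```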
- crate_anon.common.regex_helpers.first_n_characters_required(x: str, n: int) → str
Returns a regex string that requires the first n characters of x, and then allows each of the remaining characters as optional, as long as they appear in sequence.
- Parameters:
x – String
n – Minimum number of characters required at the start
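A minimal sketch of one way to build such a regex. The first `n` characters are mandatory, and the tail becomes nested optional groups, so a later character can only appear if every earlier one did. Escaping with `re.escape()` is an assumption; CRATE may use its own helpers:

```python
import re


def first_n_characters_required(x: str, n: int) -> str:
    """Sketch only: x[:n] is mandatory; each later character is optional,
    but only if all the characters before it are present."""
    required = re.escape(x[:n])
    optional = ""
    # Build the nested optional tail from the end backwards:
    # "abc" -> "(?:a(?:b(?:c)?)?)?"
    for char in reversed(x[n:]):
        optional = f"(?:{re.escape(char)}{optional})?"
    return required + optional


pattern = first_n_characters_required("dose", 2)
print(pattern)  # do(?:s(?:e)?)?
for text in ["do", "dos", "dose", "doe", "d"]:
    # "do", "dos" and "dose" match; "doe" (skips "s") and "d" do not.
    print(text, bool(re.fullmatch(pattern, text)))
```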
- crate_anon.common.regex_helpers.named_capture_group(regex_str: str, name: str) → str
Wraps the string in a named capture group,
(?P<name>...)
The P is for Python extensions; https://docs.python.org/3/howto/regex.html#non-capturing-and-named-groups
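A minimal sketch of the wrapper and of reading the group back by name; the pattern and the group names `value` and `units` are purely illustrative:

```python
import re


def named_capture_group(regex_str: str, name: str) -> str:
    """Sketch only: wrap regex_str as (?P<name>regex_str)."""
    return f"(?P<{name}>{regex_str})"


pattern = (
    named_capture_group(r"\d+(?:\.\d+)?", "value")
    + r"\s*"
    + named_capture_group("mg|g", "units")
)
m = re.search(pattern, "dose 2.5 mg daily")
print(m.group("value"), m.group("units"))  # 2.5 mg
```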
- crate_anon.common.regex_helpers.noncapture_group(regex_str: str) → str
Wraps the string in a non-capture group,
(?: ... )
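A minimal sketch. The documented form shows spaces inside the brackets; without `re.VERBOSE` such spaces would be matched literally, so this sketch leaves them out. The group lets a quantifier apply to a whole sub-pattern without adding a numbered capture:

```python
import re


def noncapture_group(regex_str: str) -> str:
    """Sketch only: wrap regex_str as (?:regex_str)."""
    return f"(?:{regex_str})"


pattern = noncapture_group("ab") + "+"  # (?:ab)+
m = re.fullmatch(pattern, "ababab")
print(m is not None, m.groups())  # True ()
```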
- crate_anon.common.regex_helpers.optional_named_capture_group(regex_str: str, name: str) → str
As for named_capture_group(), but optional.
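A minimal sketch, assuming the optional form is simply the named group followed by `?`. When the group does not take part in the match, `group(name)` returns `None`:

```python
import re


def optional_named_capture_group(regex_str: str, name: str) -> str:
    """Sketch only: an optional named group, (?P<name>regex_str)?"""
    return f"(?P<{name}>{regex_str})?"


pattern = r"(?P<value>\d+)\s*" + optional_named_capture_group("mg|g", "units")
print(re.fullmatch(pattern, "5 mg").group("units"))  # mg
print(re.fullmatch(pattern, "5").group("units"))     # None: group did not take part
```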
- crate_anon.common.regex_helpers.optional_noncapture_group(regex_str: str) → str
Wraps the string in an optional non-capture group,
(?: ... )?
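A minimal sketch (again without the padding spaces). The group matters because a bare `?` binds only to the single item before it:

```python
import re


def optional_noncapture_group(regex_str: str) -> str:
    """Sketch only: wrap regex_str as (?:regex_str)?"""
    return f"(?:{regex_str})?"


pattern = "mg" + optional_noncapture_group("/day")  # mg(?:/day)?
print(bool(re.fullmatch(pattern, "mg")))      # True
print(bool(re.fullmatch(pattern, "mg/day")))  # True
print(bool(re.fullmatch(pattern, "mg/da")))   # False
# Without the group, "?" would bind to the last character only:
print(bool(re.fullmatch("mg/day?", "mg/da")))  # True
```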
- crate_anon.common.regex_helpers.regex_or(*regex_strings: str, wrap_each_in_noncapture_group: bool = False, wrap_result_in_noncapture_group: bool = False) → str
Returns a regex representing an “or” join of the components.
- Parameters:
regex_strings – The strings to join with |.
wrap_each_in_noncapture_group – Convert each component into (?:component) before joining?
wrap_result_in_noncapture_group – Convert the final result into (?:result)?
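A minimal sketch that follows the documented parameters; it is not CRATE's source. The example shows why wrapping the result matters: `|` has the lowest precedence of any regex operator, so an unwrapped alternation splits whatever is concatenated with it:

```python
import re


def regex_or(*regex_strings: str,
             wrap_each_in_noncapture_group: bool = False,
             wrap_result_in_noncapture_group: bool = False) -> str:
    """Sketch only: join the components with "|", optionally grouping."""
    parts = list(regex_strings)
    if wrap_each_in_noncapture_group:
        parts = [f"(?:{part})" for part in parts]
    result = "|".join(parts)
    if wrap_result_in_noncapture_group:
        result = f"(?:{result})"
    return result


bad = r"\d+ " + regex_or("mg", "g")  # \d+ mg|g, i.e. (\d+ mg) OR (g)
good = r"\d+ " + regex_or("mg", "g", wrap_result_in_noncapture_group=True)
print(bool(re.fullmatch(bad, "g")))   # True: the "g" branch stands alone
print(bool(re.fullmatch(good, "g")))  # False
print(good)                           # \d+ (?:mg|g)
print(regex_or("a", "b", wrap_each_in_noncapture_group=True))  # (?:a)|(?:b)
```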