14.5.30. crate_anon.nlp_manager.regex_func

crate_anon/nlp_manager/regex_func.py


Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).

This file is part of CRATE.

CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with CRATE. If not, see <https://www.gnu.org/licenses/>.


Functions to assist in building regular expressions.

2019-01-01: RM notes Ragel (https://en.wikipedia.org/wiki/Ragel) for embedding actions within a regex parser. Not immediately applicable here, I don’t think, but bear in mind.

crate_anon.nlp_manager.regex_func.compile_regex(regex_str: str) → Pattern

Compiles a regular expression with our standard flags.
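A minimal sketch of the idea, using the standard library's re module as a stand-in. The flags that CRATE actually applies are not listed in this section, so ASSUMED_FLAGS and the name compile_regex_sketch are assumptions made for illustration only.

```python
import re
from typing import Pattern

# Assumption: the real "standard flags" are not given in this section.
# IGNORECASE and MULTILINE are placeholders chosen for illustration.
ASSUMED_FLAGS = re.IGNORECASE | re.MULTILINE


def compile_regex_sketch(regex_str: str) -> Pattern:
    """Compile a regex string with one fixed, shared set of flags."""
    return re.compile(regex_str, ASSUMED_FLAGS)


# Every pattern compiled this way behaves consistently (here: case-insensitive).
crp = compile_regex_sketch(r"\bCRP\b")
found = crp.search("Bloods today: crp 12 mg/L")
```

Routing all compilation through one function keeps the flag choice in a single place, so every pattern in a project is interpreted the same way.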

crate_anon.nlp_manager.regex_func.compile_regex_dict(regexstr_to_value_dict: Dict[str, Any]) → Dict[Pattern, Any]

Converts a dictionary {regex_str: value} to a dictionary {compiled_regex: value}.
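The conversion can be sketched as a dictionary comprehension. This is an illustration, not the library's code: compile_regex_dict_sketch, the use of re.IGNORECASE, and the unit patterns are all hypothetical.

```python
import re
from typing import Any, Dict, Pattern


def compile_regex_dict_sketch(
    regexstr_to_value_dict: Dict[str, Any]
) -> Dict[Pattern, Any]:
    """Return {compiled_regex: value} for a {regex_str: value} input."""
    # Assumption: re.IGNORECASE stands in for the library's standard flags.
    return {
        re.compile(regex_str, re.IGNORECASE): value
        for regex_str, value in regexstr_to_value_dict.items()
    }


# Hypothetical example: map unit spellings to a scaling factor.
units = compile_regex_dict_sketch({
    r"\bmg\s*/\s*L\b": 1.0,
    r"\bg\s*/\s*L\b": 1000.0,
})
```

Compiled pattern objects are hashable, so they can serve directly as dictionary keys, and the values are carried over unchanged.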

crate_anon.nlp_manager.regex_func.get_regex_dict_match(text: str | None, regex_to_value_dict: Dict[Pattern, Any], default: Any | None = None) → Tuple[bool, Any]

Checks text against a set of regular expressions. Returns whether any of them matched and, if so, the value associated (in the dictionary) with the matching regex; if none matched, returns the default instead.

(Note: “match”, as usual, means “match at the beginning of the string”.)

Parameters:
  • text – text to test

  • regex_to_value_dict – dictionary mapping {compiled_regex: value}

  • default – value to return if there is no match

Returns:
  matched, associated_value_or_default

Return type:
  tuple
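The behaviour described above can be sketched as follows. This is a stand-alone approximation under stated assumptions, not the library's implementation: get_regex_dict_match_sketch and the UNITS dictionary are hypothetical, and treating None or empty text as "no match" is an assumption based on the str | None parameter type.

```python
import re
from typing import Any, Dict, Optional, Pattern, Tuple


def get_regex_dict_match_sketch(
    text: Optional[str],
    regex_to_value_dict: Dict[Pattern, Any],
    default: Any = None,
) -> Tuple[bool, Any]:
    """Return (True, value) for the first regex matching at the start of text."""
    if text:  # assumption: None or empty text never matches
        for regex, value in regex_to_value_dict.items():
            if regex.match(text):  # match() only succeeds at the start
                return True, value
    return False, default


# Hypothetical lookup table of compiled patterns.
UNITS = {
    re.compile(r"\bmg\s*/\s*L\b", re.IGNORECASE): 1.0,
    re.compile(r"\bg\s*/\s*L\b", re.IGNORECASE): 1000.0,
}

at_start = get_regex_dict_match_sketch("mg/L today", UNITS)
not_at_start = get_regex_dict_match_sketch("CRP 12 mg/L", UNITS)
no_text = get_regex_dict_match_sketch(None, UNITS, default=-1)
```

The second call fails to match because the unit does not appear at the beginning of the string, which is the point of the note about "match" above.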

As for get_regex_dict_match(), but performs a search (finds a match anywhere in the string) rather than a match (anchored at the beginning of the string).
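The difference from the matching variant comes down to calling search() instead of match() on each compiled pattern. A minimal sketch, in which regex_dict_search_sketch and the UNITS dictionary are hypothetical names and the handling of None text is an assumption:

```python
import re
from typing import Any, Dict, Optional, Pattern, Tuple


def regex_dict_search_sketch(
    text: Optional[str],
    regex_to_value_dict: Dict[Pattern, Any],
    default: Any = None,
) -> Tuple[bool, Any]:
    """Return (True, value) for the first regex found anywhere in text."""
    if text:  # assumption: None or empty text never matches
        for regex, value in regex_to_value_dict.items():
            if regex.search(text):  # search() scans the whole string
                return True, value
    return False, default


# Hypothetical lookup table of compiled patterns.
UNITS = {
    re.compile(r"\bmg\s*/\s*L\b", re.IGNORECASE): 1.0,
    re.compile(r"\bg\s*/\s*L\b", re.IGNORECASE): 1000.0,
}

mid_string = regex_dict_search_sketch("CRP 12 g/L", UNITS)
upper_case = regex_dict_search_sketch("CRP 12 MG/L", UNITS)
absent = regex_dict_search_sketch("no units here", UNITS)
```

In this sketch, when several patterns could be found in the same text, the first one in dictionary order wins, so the patterns are best kept mutually exclusive (here, via the word boundaries).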