14.2.19. crate_anon.common.stringfunc
crate_anon/common/stringfunc.py
Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).
This file is part of CRATE.
CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with CRATE. If not, see <https://www.gnu.org/licenses/>.
Simple string functions.
- crate_anon.common.stringfunc.compress_docstring(docstring: str) -> str
Splats a docstring onto a single line, compressing all whitespace.
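As an illustration of what "compressing all whitespace" can mean, here is a minimal sketch. The function name compress_whitespace_sketch is hypothetical and this is not necessarily how CRATE implements compress_docstring; it assumes that every run of whitespace (including newlines and tabs) collapses to a single space and that leading/trailing whitespace is dropped.

```python
def compress_whitespace_sketch(docstring: str) -> str:
    """
    Hypothetical stand-in (not the CRATE implementation): collapse every
    run of whitespace to one space and strip both ends.
    """
    # str.split() with no arguments splits on any whitespace run and
    # discards empty strings, so joining with " " gives a single line.
    return " ".join(docstring.split())


example = """
    First line.
        Second line,\twith a tab.
"""
one_line = compress_whitespace_sketch(example)
print(one_line)
```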
- crate_anon.common.stringfunc.does_text_contain_word_chars(text: str) -> bool
Is a string worth treating as interesting text – does it contain “word” characters?
- crate_anon.common.stringfunc.get_digit_string_from_vaguely_numeric_string(s: str) -> str
Strips non-digit characters from a string.
For example, converts "(01223) 123456" to "01223123456".
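The behaviour described above can be approximated with a one-line regular expression. The name digits_only_sketch is hypothetical, and the sketch assumes that "digit" means the ASCII characters 0-9; the CRATE function may differ in detail.

```python
import re


def digits_only_sketch(s: str) -> str:
    """
    Hypothetical stand-in (not the CRATE implementation): remove every
    character that is not an ASCII digit.
    """
    return re.sub(r"[^0-9]", "", s)


phone = digits_only_sketch("(01223) 123456")
print(phone)
```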
- crate_anon.common.stringfunc.get_docstring(cls: Type) -> str
Fetches a docstring from a class.
- crate_anon.common.stringfunc.get_spec_match_regex(spec: str) -> Pattern
Returns a compiled, case-insensitive regular expression representing a shell-style pattern (using *, ? and similar wildcards; see https://docs.python.org/3.5/library/fnmatch.html).
- Parameters:
spec – the pattern to pass to fnmatch, e.g. "patient_addr*".
- Returns:
the compiled regular expression
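A shell-style pattern can be turned into a case-insensitive regular expression with the standard library alone. The sketch below uses fnmatch.translate and re.compile; the name spec_match_regex_sketch and the column names in the usage lines are hypothetical, and the CRATE function may be implemented differently.

```python
import fnmatch
import re


def spec_match_regex_sketch(spec: str) -> re.Pattern:
    """
    Hypothetical stand-in (not the CRATE implementation): convert a
    shell-style pattern to a compiled, case-insensitive regex.
    """
    # fnmatch.translate() converts wildcards such as * and ? into regex
    # syntax and anchors the result at the end of the string.
    return re.compile(fnmatch.translate(spec), re.IGNORECASE)


regex = spec_match_regex_sketch("patient_addr*")
matches_upper = bool(regex.match("PATIENT_ADDRESS_LINE1"))  # illustrative name
matches_other = bool(regex.match("doctor_addr"))  # illustrative name
print(matches_upper, matches_other)
```

Using match() rather than search() matters here: the translated pattern is anchored at its end, and match() anchors it at the start, so the whole string must fit the pattern.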
- crate_anon.common.stringfunc.make_twocol_table(colnames: List[str], rows: List[List[str]], max_table_width: int = 79, padding_width: int = 1, vertical_lines: bool = True, rewrap_right_col: bool = True) -> str
Formats a two-column table. Tries not to split/wrap the left-hand column, but resizes the right-hand column.
- crate_anon.common.stringfunc.reduce_to_alphanumeric(s: str) -> str
Strips non-alphanumeric characters from a string.
For example, converts "PE12 3AB" to "PE123AB".
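For comparison with the digit-only case, a minimal sketch of alphanumeric reduction follows. The name alphanumeric_only_sketch is hypothetical, and the sketch assumes that "alphanumeric" means ASCII letters and digits only; whether the CRATE function also keeps non-ASCII letters is not stated here.

```python
import re


def alphanumeric_only_sketch(s: str) -> str:
    """
    Hypothetical stand-in (not the CRATE implementation): remove every
    character that is not an ASCII letter or digit.
    """
    return re.sub(r"[^A-Za-z0-9]", "", s)


postcode = alphanumeric_only_sketch("PE12 3AB")
print(postcode)
```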
- crate_anon.common.stringfunc.relevant_for_nlp(x: str | None) -> bool
Does this string contain content that’s relevant for NLP? We want to eliminate None values, and strings that do not contain relevant content. A string containing only whitespace is not relevant.
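The following sketch shows one way such a filter could be written. It assumes that "relevant content" means at least one regex word character (\w), which ties it to the idea behind does_text_contain_word_chars; both function names below are hypothetical and the CRATE implementations may apply different rules.

```python
import re
from typing import Optional

# Matches any single "word" character (letters, digits, underscore).
WORD_CHAR_REGEX = re.compile(r"\w")


def contains_word_chars_sketch(text: str) -> bool:
    """Hypothetical stand-in: does the text contain any word character?"""
    return bool(WORD_CHAR_REGEX.search(text))


def relevant_for_nlp_sketch(x: Optional[str]) -> bool:
    """
    Hypothetical stand-in (not the CRATE implementation): reject None,
    empty strings, and strings with no word characters.
    """
    if not x:  # covers None and ""
        return False
    return contains_word_chars_sketch(x)


candidates = [None, "", "   \n\t", "--- ***", "Patient seen today."]
results = [relevant_for_nlp_sketch(c) for c in candidates]
print(results)
```

A whitespace-only string is rejected without a separate check, because whitespace contains no word characters.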
- crate_anon.common.stringfunc.remove_whitespace(s: str) -> str
Removes whitespace from a string.
- crate_anon.common.stringfunc.trim_docstring(docstring: str) -> str
Removes initial/terminal blank lines and leading whitespace from docstrings.
This is the PEP257 implementation (https://peps.python.org/pep-0257/), except with sys.maxint replaced by sys.maxsize (see https://docs.python.org/3.1/whatsnew/3.0.html#integers).
Demonstration:
    from crate_anon.common.stringfunc import trim_docstring
    print(trim_docstring.__doc__)
    print(trim_docstring(trim_docstring.__doc__))
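For readers without CRATE installed, the PEP 257 trimming algorithm can be reproduced in a self-contained form. The sketch below follows the reference code in PEP 257 with sys.maxsize in place of sys.maxint; the name trim_sketch is hypothetical, and the CRATE source may differ in minor details.

```python
import sys


def trim_sketch(docstring: str) -> str:
    """Sketch of the PEP 257 docstring-trimming algorithm."""
    if not docstring:
        return ""
    # Convert tabs to spaces and split into lines.
    lines = docstring.expandtabs().splitlines()
    # Find the minimum indentation of non-blank lines after the first.
    indent = sys.maxsize
    for line in lines[1:]:
        stripped = line.lstrip()
        if stripped:
            indent = min(indent, len(line) - len(stripped))
    # The first line is stripped fully; the others lose the common indent.
    trimmed = [lines[0].strip()]
    if indent < sys.maxsize:
        for line in lines[1:]:
            trimmed.append(line[indent:].rstrip())
    # Remove trailing and leading blank lines.
    while trimmed and not trimmed[-1]:
        trimmed.pop()
    while trimmed and not trimmed[0]:
        trimmed.pop(0)
    return "\n".join(trimmed)


raw = "\n    Summary line.\n\n      Indented detail.\n    "
cleaned = trim_sketch(raw)
print(cleaned)
```

Relative indentation is preserved: only the indentation common to all non-blank lines after the first is removed.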
- crate_anon.common.stringfunc.uprint(*objects: Any, sep: str = ' ', end: str = '\n', file: TextIO = sys.stdout) -> None
Prints strings to outputs that support UTF-8 encoding, but also to those that do not (e.g. Windows stdout, sometimes).
- Parameters:
*objects – things to print
sep – separator between those objects
end – print this at the end
file – file-like object to print to
Examples:
Linux, Python 3.6.8 console:
sys.stdout.encoding == "UTF-8"
Windows, Python 3.7.4 console:
sys.stdout.encoding == "utf-8"
Windows, Python 3.7.4, from script:
sys.stdout.encoding == "cp1252"
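The encodings listed above explain why a plain print() can fail: a cp1252 stream cannot represent every Unicode character. One possible approach, sketched below, is to escape unencodable characters before writing. The name uprint_sketch is hypothetical, the choice of the "backslashreplace" error handler is an assumption, and the in-memory cp1252 stream is only a stand-in for a Windows script's stdout.

```python
import io
import sys
from typing import Any, Optional, TextIO


def uprint_sketch(*objects: Any, sep: str = " ", end: str = "\n",
                  file: Optional[TextIO] = None) -> None:
    """
    Hypothetical stand-in (not the CRATE implementation): print to a
    stream, escaping characters that its encoding cannot represent.
    """
    if file is None:
        file = sys.stdout
    encoding = (getattr(file, "encoding", None) or "utf-8").lower()
    if encoding in ("utf-8", "utf8"):
        # The stream can take any character; print normally.
        print(*objects, sep=sep, end=end, file=file)
        return

    def make_safe(obj: Any) -> str:
        # Round-trip through the target encoding so that unencodable
        # characters become backslash escapes such as \u2603.
        return str(obj).encode(encoding, errors="backslashreplace").decode(encoding)

    print(*(make_safe(o) for o in objects), sep=sep, end=end, file=file)


# Simulate a cp1252 output stream in memory.
buffer = io.BytesIO()
legacy_stream = io.TextIOWrapper(buffer, encoding="cp1252", newline="\n")
uprint_sketch("café", "\u2603", file=legacy_stream)
legacy_stream.flush()
written = buffer.getvalue()
print(written)
```

Here "é" exists in cp1252 and is written as a single byte, whereas the snowman character (U+2603) does not and is written as a literal escape sequence instead of raising UnicodeEncodeError.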