14.2.3. crate_anon.common.bugfix_flashtext
crate_anon/common/bugfix_flashtext.py
Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).
This file is part of CRATE.
CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with CRATE. If not, see <https://www.gnu.org/licenses/>.
THIS FILE, however, is by another author: from https://github.com/vi3k6i5/flashtext/issues/44, by Ihor Bobak; added to Flashtext code; licensed under the MIT License as per https://github.com/vi3k6i5/flashtext/blob/master/LICENSE.
Rationale:
There is currently a bug in the method replace_keywords() in the external module flashtext, in which certain characters provoke an ‘index out of range’ error when working in case-insensitive mode. This is because some non-ASCII characters are longer in their lower-case form than in their original form, so the lower-cased text has more characters than the original. Thanks to Ihor Bobak for this bugfix.
Edits for PyCharm linter.
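The length change described in the rationale can be reproduced with the standard library alone. The following is a minimal sketch, not flashtext's code or the bugfix itself: the helper name chars_that_grow_on_lower and the sample string are assumptions chosen for illustration. It shows that lower-casing can lengthen a string, so a position computed on the lower-cased copy may fall outside the original.

```python
# Illustrative sketch only (standard library; not flashtext or CRATE code).
# The helper name and the sample text are hypothetical.


def chars_that_grow_on_lower(text: str) -> list:
    """Return the characters whose lower-case form is more than one code point."""
    return [ch for ch in text if len(ch.lower()) > 1]


original = "İstanbul"        # begins with U+0130, capital I with dot above
lowered = original.lower()   # U+0130 becomes "i" followed by U+0307

# The lower-cased copy is one code point longer than the original.
print(len(original), len(lowered))
print(chars_that_grow_on_lower(original))

# A position that is valid in the lower-cased copy can be out of range
# in the original string.
last_index_in_lowered = len(lowered) - 1
try:
    original[last_index_in_lowered]
    index_error_raised = False
except IndexError:
    index_error_raised = True
print(index_error_raised)
```

Plain ASCII input such as 'Big Apple' is unaffected, which would be consistent with the error appearing only for certain characters.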
- class crate_anon.common.bugfix_flashtext.KeywordProcessorFixed(case_sensitive=False)[source]
- replace_keywords(a_sentence: str) → str [source]
Searches in the string for all keywords present in corpus. Keywords present are replaced by the clean name and a new string is returned.
- Parameters:
a_sentence (str) – Line of text where we will replace keywords
- Returns:
new_sentence – Line of text with replaced keywords
- Return type:
str
Examples
>>> from flashtext import KeywordProcessor
>>> keyword_processor = KeywordProcessor()
>>> keyword_processor.add_keyword('Big Apple', 'New York')
>>> keyword_processor.add_keyword('Bay Area')
>>> new_sentence = keyword_processor.replace_keywords('I love Big Apple and bay area.')
>>> new_sentence
'I love New York and Bay Area.'