14.2.3. crate_anon.common.bugfix_flashtext

crate_anon/common/bugfix_flashtext.py


Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).

This file is part of CRATE.

CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with CRATE. If not, see <https://www.gnu.org/licenses/>.


THIS FILE, however, is by another author: from https://github.com/vi3k6i5/flashtext/issues/44, by Ihor Bobak; added to Flashtext code; licensed under the MIT License as per https://github.com/vi3k6i5/flashtext/blob/master/LICENSE.

Rationale:

The external module flashtext currently has a bug in its replace_keywords() method: in case-insensitive mode, certain characters provoke an ‘index out of range’ error. This is because some non-ASCII characters become longer when converted to lower case (for example, 'İ', U+0130, lower-cases to two code points in Python), so the lower-cased text no longer has the same length as the original. Thanks to Ihor Bobak for this bugfix.
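The length mismatch behind this bug can be reproduced with the standard library alone. The sketch below is an illustration only and is not flashtext's code: the helper char_at_lowered_position is hypothetical, and the sample word is an assumption chosen because it starts with U+0130.

```python
# Illustration only: this is NOT flashtext's implementation, just the
# underlying length mismatch between a string and its lower-cased copy.
original = "\u0130stanbul"   # 'İstanbul', starting with U+0130
lowered = original.lower()   # U+0130 lower-cases to 'i' + U+0307

print(len(original), len(lowered))  # 8 9


def char_at_lowered_position(text: str, position: int) -> str:
    """Hypothetical helper: index the original text using a position
    that was computed on the lower-cased copy."""
    return text[position]


try:
    # The last valid index of the lower-cased copy is past the end
    # of the original string.
    char_at_lowered_position(original, len(lowered) - 1)
    index_error_raised = False
except IndexError:
    index_error_raised = True

print(index_error_raised)  # True
```

Any code that walks the lower-cased text but slices the original by the same positions can fail in this way, which is consistent with the ‘index out of range’ error described above.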

Also edited to satisfy the PyCharm linter.

class crate_anon.common.bugfix_flashtext.KeywordProcessorFixed(case_sensitive=False)[source]
replace_keywords(a_sentence: str) → str[source]

Searches in the string for all keywords present in corpus. Keywords present are replaced by the clean name and a new string is returned.

Parameters

a_sentence (str) – Line of text in which keywords will be replaced

Returns

Line of text with replaced keywords

Return type

new_sentence (str)

Examples

>>> from flashtext import KeywordProcessor
>>> keyword_processor = KeywordProcessor()
>>> keyword_processor.add_keyword('Big Apple', 'New York')
>>> keyword_processor.add_keyword('Bay Area')
>>> new_sentence = keyword_processor.replace_keywords('I love Big Apple and bay area.')
>>> new_sentence
'I love New York and Bay Area.'