string
Initializes string package of arandomness
- Copyright:
__init__.py initializes string package of arandomness Copyright (C) 2017 Alex Hyer
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.
Introduction
The string subpackage of arandomness contains a couple of functions that analyze or manipulate strings in some way. That’s about as specific as this subpackage gets. Enjoy!
autocorrect
The autocorrect function takes a single query string and a list of “correct” strings and identifies which string in the list the query most closely matches. There are far more robust autocorrect algorithms written in Python than this one, but they generally require a list of words ranked by frequency in a given language. Such algorithms are aimed at correcting words in a specific language and are thus better suited to language-processing software, e.g. texting apps. This algorithm accepts any list of strings and is order-agnostic. Thus, my autocorrect is better suited to matching queries against small lists of arbitrary strings.
As a concrete example, I have used this function in a program that presented information from a database about the programs available on a given system. The query was the user’s request, and the possible strings were simply the list of program names in the database. Thus, if a user misspelled a program name, the program would still likely return the proper entry.
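The lookup described above can be sketched roughly as follows. This is a minimal illustration that uses `difflib.get_close_matches()` from the standard library directly rather than arandomness itself; the `PROGRAMS` dictionary, its placeholder entries, and the `lookup` helper are hypothetical names invented for this example.

```python
from difflib import get_close_matches

# Hypothetical stand-in for the database: program name -> stored entry.
PROGRAMS = {
    'bowtie2': 'entry for bowtie2',
    'samtools': 'entry for samtools',
    'blastn': 'entry for blastn',
}


def lookup(query, cutoff=0.75):
    """Return (name, entry) for the program name most similar to query."""
    # n=1 keeps only the single best match at or above the cutoff.
    matches = get_close_matches(query.lower(), list(PROGRAMS), n=1,
                                cutoff=cutoff)
    if not matches:
        raise KeyError('no program resembling {!r}'.format(query))
    name = matches[0]
    return name, PROGRAMS[name]


# A misspelled request still resolves to the intended entry.
print(lookup('samtols'))  # ('samtools', 'entry for samtools')
```

Because the candidate list is just the dictionary's keys, no word-frequency data is needed, which is the point made above about small lists of arbitrary strings.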
API Documentation
arandomness.string.autocorrect(query, possibilities, delta=0.75)
Attempts to figure out what possibility the query is
This autocorrect function is rather simple for now, with plans for later improvement. Currently, it attempts to finish spelling the query as far as possible and then determines which possibility is closest to the result.
Parameters:
- query (unicode) – query to attempt to complete
- possibilities (list) – list of unicode strings of possible answers for query
- delta (float) – minimum similarity between query and any given possibility for that possibility to be considered. Delta is used by difflib.get_close_matches().
Returns: best guess of correct answer
Return type: unicode
Raises: AssertionError – raised if no matches found
Example
>>> autocorrect('bowtei', ['bowtie2', 'bot'])
'bowtie2'
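The two steps the docstring describes (finish spelling the query, then pick the closest possibility) might look something like the sketch below. This is one interpretation of that description, not the package's source: `complete_then_match` is a hypothetical name, the prefix-completion step relies on `os.path.commonprefix()`, and the assumption that `delta` maps onto the `cutoff` argument of `difflib.get_close_matches()` is mine.

```python
from difflib import get_close_matches
from os.path import commonprefix


def complete_then_match(query, possibilities, delta=0.75):
    """Sketch: extend query by prefix completion, then pick the closest string."""
    # Step 1: if any possibilities start with the query, narrow the field to
    # those and extend the query to the prefix they all share.
    completions = [p for p in possibilities if p.startswith(query)]
    if completions:
        possibilities = completions
        query = commonprefix(completions)

    # Step 2: keep the single most similar possibility scoring at least delta.
    matches = get_close_matches(query, possibilities, n=1, cutoff=delta)
    if not matches:
        raise AssertionError('no matches for {!r} found'.format(query))
    return matches[0]


print(complete_then_match('bowtei', ['bowtie2', 'bot']))  # bowtie2
print(complete_then_match('sam', ['samtools', 'bowtie2']))  # samtools
```

In the first call nothing starts with `'bowtei'`, so only the similarity step applies. In the second, the query is first completed to `'samtools'` and then matched exactly.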
max_substring
The max_substring function takes a list of strings and finds the longest run of characters that they all share at the same position. By default, max_substring starts at the beginning of each string, but it can optionally start at a later position, as demonstrated in the docstring examples.
API Documentation
arandomness.string.max_substring(words, position=0, _last_letter='')
Finds max substring shared by all strings starting at position
Parameters:
- words (list) – list of unicode strings of all words to compare
- position (int) – starting position in each word to begin analyzing for substring
- _last_letter (unicode) – last common letter, only for use internally unless you really know what you are doing
Returns: max string common to all words
Return type: unicode
Examples
>>> max_substring(['aaaa', 'aaab', 'aaac'])
'aaa'
>>> max_substring(['abbb', 'bbbb', 'cbbb'], position=1)
'bbb'
>>> max_substring(['abc', 'bcd', 'cde'])
''
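The behavior shown in these examples can be reproduced with a short iterative sketch. It is not the package's implementation (it has no `_last_letter` parameter), and `common_run` is a hypothetical name chosen for illustration. It walks the words in lockstep from `position` and stops at the first column where they disagree.

```python
def common_run(words, position=0):
    """Return the run of characters every word shares, starting at position."""
    shared = []
    # zip() stops at the shortest slice, so no explicit length checks are needed.
    for letters in zip(*(word[position:] for word in words)):
        if len(set(letters)) != 1:  # the words disagree at this column
            break
        shared.append(letters[0])
    return ''.join(shared)


print(repr(common_run(['aaaa', 'aaab', 'aaac'])))              # 'aaa'
print(repr(common_run(['abbb', 'bbbb', 'cbbb'], position=1)))  # 'bbb'
print(repr(common_run(['abc', 'bcd', 'cde'])))                 # ''
```

The last example returns an empty string even though every word contains `c`, because the comparison is aligned by position rather than searching for a shared substring anywhere in each word.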