string

Initializes string package of arandomness

Copyright:

__init__.py initializes string package of arandomness

Copyright (C) 2017 Alex Hyer

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.

Introduction

The string subpackage of arandomness contains a couple of functions that analyze or manipulate strings in some way. That’s about as specific as this subpackage gets. Enjoy!

autocorrect

The autocorrect function takes a single query string and a list of “correct” strings and identifies which string in the list the query most closely matches. There are many far more robust autocorrect algorithms written in Python than this one, but they all require a list of words organized by their frequency in a given language. Basically, these autocorrect algorithms are aimed at correcting words specific to a language and are thus better suited for use in language processing software, e.g. texting apps. This algorithm uses any list of strings and is order-agnostic. Thus, my autocorrect is better suited for attempting to match queries to small lists of arbitrary strings.

To illustrate the concept, I have used this function in a program that presented data from a database about the programs available on a given system. The query was the user’s request and the possible strings were simply the list of program names in the database. Thus, if a user misspelled a program name, the program would still likely return the proper entry.

API Documentation

arandomness.string.autocorrect(query, possibilities, delta=0.75)[source]

Attempts to figure out which possibility the query is

This autocorrect function is currently rather simple, with improvements planned for later. It attempts to finish spelling the query as far as possible and then determines which possibility is closest to the result.

Parameters:
  • query (unicode) – query to attempt to complete
  • possibilities (list) – list of unicode strings that are possible answers for query
  • delta (float) – minimum similarity between query and any given possibility for that possibility to be considered. Delta is used by difflib.get_close_matches().
Returns:

best guess of correct answer

Return type:

unicode

Raises:

AssertionError – raised if no matches found

Example

>>> autocorrect('bowtei', ['bowtie2', 'bot'])
'bowtie2'
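
The two steps described above (finish spelling the query, then pick the closest possibility) can be sketched roughly as follows. This is a minimal illustration and not the package's actual source: the name autocorrect_sketch is hypothetical, and using os.path.commonprefix for the completion step and passing delta as the cutoff argument of difflib.get_close_matches() are assumptions.

```python
import difflib
import os.path


def autocorrect_sketch(query, possibilities, delta=0.75):
    """Hypothetical sketch of a two-step autocorrect (not the real source)."""
    # Step 1 (assumed): if any possibility starts with the query, extend
    # the query to the longest prefix those possibilities have in common.
    candidates = [p for p in possibilities if p.startswith(query)]
    if candidates:
        query = os.path.commonprefix(candidates)

    # Step 2: ask difflib for the single closest possibility whose
    # similarity ratio is at least delta.
    matches = difflib.get_close_matches(query, possibilities,
                                        n=1, cutoff=delta)
    if not matches:
        raise AssertionError('No matches for "{0}" found'.format(query))
    return matches[0]


if __name__ == '__main__':
    print(autocorrect_sketch('bowtei', ['bowtie2', 'bot']))
    print(autocorrect_sketch('bow', ['bowtie2', 'bot']))
```

In this sketch the completion step matters for short queries: 'bow' alone is not similar enough to either possibility at the default delta, but it is first completed to 'bowtie2' and then matches exactly.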

max_substring

The max_substring function takes a list of strings and finds the longest substring that they all share at the same positions. By default, max_substring starts at the beginning of each string, but it can optionally start at a later position, as demonstrated in the docstring examples.

API Documentation

arandomness.string.max_substring(words, position=0, _last_letter='')[source]

Finds max substring shared by all strings starting at position

Parameters:
  • words (list) – list of unicode strings containing all words to compare
  • position (int) – starting position in each word to begin analyzing for substring
  • _last_letter (unicode) – last common letter, only for use internally unless you really know what you are doing
Returns:

max string common to all words

Return type:

unicode

Examples

>>> max_substring(['aaaa', 'aaab', 'aaac'])
'aaa'
>>> max_substring(['abbb', 'bbbb', 'cbbb'], position=1)
'bbb'
>>> max_substring(['abc', 'bcd', 'cde'])
''
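
The behaviour shown in the examples above can be reproduced with a short iterative sketch. This is an assumption-laden illustration rather than the package's implementation: the signature suggests the real function is recursive (hence _last_letter), whereas the hypothetical max_substring_sketch below simply walks the words column by column.

```python
def max_substring_sketch(words, position=0):
    """Hypothetical iterative sketch of max_substring (not the real source)."""
    shared = []
    # zip stops at the shortest word, so each tuple holds the letters
    # found at one position across every word.
    for letters in zip(*(word[position:] for word in words)):
        if len(set(letters)) != 1:
            break  # the words disagree at this position
        shared.append(letters[0])
    return ''.join(shared)


if __name__ == '__main__':
    print(max_substring_sketch(['aaaa', 'aaab', 'aaac']))
    print(max_substring_sketch(['abbb', 'bbbb', 'cbbb'], position=1))
```

Note that the comparison is positional: ['abc', 'bcd', 'cde'] yields an empty string because the words differ at the first position, even though all three contain the letter 'c'.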