As a part of my Python learning endeavor I have tossed up this fun script. It finds all anagrams in given dictionary in O(nlogn) time. The algorithm goes like this:
Beside the algorithm, the script also features command line option parsing for telling it where is the dictionary file and where should it store anagrams. It does output a lot of words with just one switched letter. It could therefore be made better, if hamming distance was attached to them. You could sort them again in decreasing order of hamming distance, which would make more interesting anagrams come first in the list. I didn't go there. Here is the basic script:
import argparse
import codecs
import sys
from operator import itemgetter
parser = argparse.ArgumentParser()
parser.add_argument("-f", "--file", help="dictionary file")
parser.add_argument("-o", "--output", help="output file")
args = parser.parse_args()
def append_to_file(filename,text):
if filename is None or filename == "":
return
file = codecs.open(filename, "a", "utf8")
file.write("{0}".format(text))
file.close()
def order (line):
chars = list(line)
chars.sort(key=itemgetter(0))
ordered = "".join(chars)
return ordered
# read from file and prepare
words = []
with open(args.file, "r") as f:
for line in f:
word = []
word.append(line.lower())
word.append(order(word[0]))
words.append(word)
# sort
words.sort(key=itemgetter(1))
anagram = False
for i in range(1,len(words)):
if (words[i-1][1] == words[i][1]) and (words[i-1][0] != words[i][0]): # if duplicates in dictionary
if anagram == False:
anagram = True
append_to_file(args.output,"----\n")
append_to_file(args.output,words[i-1][0])
append_to_file(args.output,words[i][0])
else:
anagram = False
Previous:
Beginner's view on Python
Next:
Cosmic ray threat?