Skip to content

How to find anagrams in Python

As a part of my Python learning endeavor I have tossed up this fun script. It finds all anagrams in given dictionary in O(nlogn) time. The algorithm goes like this:

  1. Read dictionary from file.
  2. Create a copy of all words and store them in 2D array, along with originals.
  3. Sort letters in these copies of words.
  4. Sort 2D array by sorted letters of words. This is the O(nlogn) part.
    • Words that consist of the same letters, anagrams, should now be adjacent.
  5. Traverse the array. If the same sorted letter “word” appears, the original word is anagram of it and it has been found.

Beside the algorithm, the script also features command line option parsing for telling it where is the dictionary file and where should it store anagrams. It does output a lot of words with just one switched letter. It could therefore be made better, if hamming distance was attached to them. You  could sort them again in decreasing order of hamming distance, which would make more interesting anagrams come first in the list. I didn’t go there. Here is the basic script:

import argparse
import codecs
import sys
from operator import itemgetter

parser = argparse.ArgumentParser()
parser.add_argument("-f", "--file", help="dictionary file")
parser.add_argument("-o", "--output", help="output file")
args = parser.parse_args()

def append_to_file(filename,text):
	if filename is None or filename == "":
		return
	file = codecs.open(filename, "a", "utf8")
	file.write("{0}".format(text))
	file.close()

def order (line):
	chars = list(line)
	chars.sort(key=itemgetter(0))
	ordered = "".join(chars)
	return ordered

# read from file and prepare
words = []
with open(args.file, "r") as f:
	for line in f:
		word = []
		word.append(line.lower())
		word.append(order(word[0]))
		words.append(word)

# sort
words.sort(key=itemgetter(1))

anagram = False
for i in range(1,len(words)):
	if (words[i-1][1] == words[i][1]) and (words[i-1][0] != words[i][0]): # if duplicates in dictionary
		if anagram == False:
			anagram = True
			append_to_file(args.output,"----\n")
			append_to_file(args.output,words[i-1][0])
		append_to_file(args.output,words[i][0])
	else:
		anagram = False

I had some fun with Slovene dictionary. I made a whole sentence almost exclusively from anagrams:

Raketirajva akvaterarij zaradi anektiranja Karantanije. /Let’s rocket the aquaterrarium for annexing Carantania.

And a poem exclusively from anagrams! But frankly, it only vaguely means something:

Lagodni dolgani godalni dognali,
dolgina nadlogi gladino odgnali.

I don’t even dare to translate it. I’d be happy if anyone posted some of their interesting/funny/stupid results in comments ;)

Enhanced by Zemanta

Categories: How to.

Tags: , , , ,