About Andy's anagramiser

I did this because my favourite newspaper crossword has plenty of anagram clues (even though they are a bit cryptic) and I often thought about how to get a computer to help me solve them.

Then late one night I had quite a good idea how to go about devising a suitable algorithm. The idea went like this: store all the words in a tree structure, with the front of each word on the trunk of the tree and the ends of each word forming the leaves. The basic node structure is just a set of 26 pointers, each pointing to the next node along the branch (or a null pointer). Now this was all fine, except that the complete tree took up quite a bit more memory than a simple list of the words contained in it. I found the code ran quite well on a fast UN*X box, but my 1MB 8086 PC ran out of memory pretty quick. So I worked out a way of crunching the tree structure into about 1/20th the original size, which wasnt too tricky, but resulted in more code.

The actual anagram engine is recursive and incredibly simple. It starts with a list of all the letters in the phrase to be anagramised and is pointed to the root of the word tree. It pops off the first letter in the list and then calls itself with the location of the branch starting with that letter. A solution is found when there are no letters left in the list. When it returns it pushes the letter it last used back onto the list. The depth of recursion never exceeds the number of letters in the phrase to be anagramised. If you dont understand this then tough!

It's all written in C (apart from the bits written in Perl). In fact I dont think that the bits written in C could be written in anything else apart from a lower-level language. I certainly dont think they could be done in Perl, certainly not Perl 4. Speed of execution is pretty important here.

Anyway, after a bit of tinkering around I managed to get the thing running on my 1 MB PC and it seemed pretty quick and I found I was solving those crosswords a lot quicker than before. However, the 1 MB limit on RAM meant I couldn't use a dictionary bigger than about 18,000 words.

The basic generator uses the word list from /usr/dict/words on this fast UN*X box (a Sparc 10). Since operating the anagram service I have been kindly pointed to some much larger word lists by some good netizens out there. The big dictionary has lots of inflected forms such as plurals and comparatives and is much more fun. It even has words like phoneboxes, which makes one of the longest single word anagrams I know.

There were a few other things to get tidied up before I put the thing on the Web. The main one was that if you gave the thing a longish phrase, it could run for (literally) ages and so all I have done is make it stop after 12 seconds of elapsed (not cpu) time. If this happens, you should see "**time limit exceeded**" at the end of the list.

I'm sorry that everything appears in lower case, I'm working on that!

Other improvements in the pipeline:

Solve that crossword!
My home page