Demivec shows the right way here.
Building on his approach, I found a way to find all anagrams of a given word, regardless of length (up to 15 letters) in around 12-15 milliseconds. You have to use a bit more diskspace, but nothing too major. Here is the method:
1) Start with a list of legal words, I use sowpods.txt with 216555 entries up to 15 letters long.
2) Create a copy of the list with each word accompanied by its position in the file.
i.e. " DANGER003244"
3) Create a copy of the second list with the entries all in the same order but sort the letters of the words.
i.e. " ADEGNR003244" (the numbers are left alone)
4) Sort the third list by words only. This way all anagrams of a given word will be grouped together
and each duplicate will have its own pointer to the word it represents. Here is an example of the
word "easter" in this file:
AEERST180615
AEERST167428
AEERST157997
AEERST163296
AEERST152860
AEERST009349
AEERST056677
AEERST056710
AEERST189775
5) Any entry in this index file which doesn't have at least one duplicate represents a word with no anagrams. It doesn't need to be there. So run through the file and remove any single entries.
6) Create a SQLITE3 database from this fourth list to use as an index into the original word list. Now you're ready to cook with gas.
Method for retrieving anagrams:
1) Read sowpods2.txt into an array of 216555 strings
2) Open the index database you created
3) when a word is supplied to check for anagrams, sort its letters and search as follows:
Code: Select all
s$ = Prepare_Input_String_For_Search( GetGadgetText(0) ) ; sort letters and convert to UCase
sql$="SELECT * FROM Table1 WHERE wordid = '" + s$ + "'"
DatabaseQuery(0, sql$)
While NextDatabaseRow(0)
AddGadgetItem(2,-1,words(GetDatabaseLong(0,1)))
Wend
FinishDatabaseQuery(0)
The diskspace footprint of this approach is around 3mb,2.3 for sowpods2.txt and 0.8 for the index database. A bit more than the diskspace of the original word list, but the payoff is that you can find all anagrams for any length word up to 15 letters in 12 ms or so. That's basically less than one timeslice on most Windows OS's. Put TENTATIVENESS in and you get ATTENTIVENESS back immediately.
Here is a zip file containing sowpods.txt, wordindex.db and anagrams.pb:
http://www.lloydsplace.com/anagrams2.zip
Now that single-word anagrams are attainable quickly, the next challenge is to come up with a method that will generate multi-word anagrams, such as TOM CRUISE = SO I'M CUTER or GEORGE BUSH = HE BUGS GORE etc. This is where it gets a lot more interesting with the potential for plenty of amazing results.
So - Any ideas on how to go forward from here?