netmaestro wrote:Talk about cooking with gas. I looked at the code I had utilizing a SQLite database for the indexes and I thought something else might be faster. So I replaced it with a binary search procedure and it cut the search time by a factor of ten. An average search now takes from 0 - 0.6 milliseconds! I had to resort to QueryPerformanceCounter_() to measure it. No matter how long the search word or how many anagrams are found, I can't make a search take longer than 0.6 milliseconds. Updated code is in the download.
Now, truly, on to phrases. I'm never going to beat these times.
@netmaestro: I have another surprise for you, I've cut your times in
half!
I'll post the code here, instead of a link, for easier viewing of the methods. A casual note on the coding, I've added more structure to the code as it's length has increased. To make this version I took the procedures Search() and TicksHQ() from your code. I also rewrote my sortLetters() routine and build the index array anew each time the program runs. This way you only have to supply a dictionary file and you are good to go. It also means you can add to your dictionary file and you'll automatically get a working index.
That brings me to another point, I discovered your index file has 2 inaccuracies in it. To discover one of them, find the anagrams for sawblade using your program. The other is that you are missing indexes for the words AHA and AAH so you won't be able to find them in your program either even though they are in your dictionary file. Discovering this was a real aha moment.
To compare the speed results with yours I had to increase the reported time resolution from tenths to hundreths of a second. Yours reported 0.17 milliseconds for a search on "easter" and mine reported a time of 0.07 milliseconds. So to paraphrase you, now you are cooking with
more gas.

I think the speed improvements that are involved mainly have to do with eliminating more of the string functions, such as Left() and Mid(). I did this by using a structured index array instead of the combined method you had used, the storage used by the array is only 44% of it's former size (for both memory and disk space).
Here's the code:
Code: Select all
;-constants
#DictionaryFile = "sowpods2.txt"
Enumeration
;windows
#mainWindow
;gadgets
#Word_str = 0
#List_btn
#AnagramList_lvw
EndEnumeration
;-structures
Structure pattern
letters.s
index.i
EndStructure
;-global variables
Global Dim words.s(0) ;resized later in init procedure
Global Dim index.pattern(0) ;resized later in init procedure
;-procedures
Procedure handleError(p_result.l, p_text.s) ;display error message if result= 0 and then END program
;sample use HandleError(InitSprite(), "Could not initialize sprites.")
;Procedure borrowed from Kale in PureBasic forum
If p_result = 0
MessageRequester("Error", p_text, #PB_MessageRequester_Ok)
End
EndIf
EndProcedure
Procedure.i TicksHQ()
;Procedure borrowed from netmaestro in PureBasic forum
Static maxfreq.q
Protected T.q
If maxfreq=0
QueryPerformanceFrequency_(@maxfreq)
maxfreq=maxfreq/100000
EndIf
QueryPerformanceCounter_(@T)
ProcedureReturn T/maxfreq ;Result is in hundreths-of-a-millisecond
EndProcedure
Procedure.s sortLetters(*word.Character, length)
Protected Dim letters.c(length)
CopyMemory(*word, @letters(), length * SizeOf(Character))
SortArray(letters(), #PB_Sort_Ascending, 0, length - 1)
ProcedureReturn PeekS(@letters(), length)
EndProcedure
Procedure initDictionaryPatterns() ;returns #True if successful
Protected i, j, isFirst, numWords, fileNumber
Protected NewList patterns.pattern()
numWords = 200000
Redim words.s(numWords)
fileNumber = ReadFile(#PB_Any, #DictionaryFile) ; Read all legal words into array words()
If Not fileNumber
ProcedureReturn #False ;failed to initialize dictionary and patterns
EndIf
i = 1
While Not Eof(fileNumber)
words(i) = UCase(Trim(ReadString(fileNumber)))
i + 1
If i > numWords
numWords + 20000
Redim words.s(numWords)
EndIf
Wend
numWords = i - 1
CloseFile(fileNumber)
Redim words.s(numWords)
For i = 1 To numWords
AddElement(patterns())
patterns()\letters = sortLetters(@words(i), Len(words(i)))
patterns()\index = i
Next
SortStructuredList(patterns(), #PB_Sort_Ascending, OffsetOf(pattern\letters), #PB_Sort_String)
;build array of pattern index for only duplicate patterns (no singles)
Redim index.pattern(numWords)
i = 0
isFirst = #True
ForEach patterns()
If isFirst
index(i)\letters = patterns()\letters
index(i)\index = patterns()\index
j = i
isFirst = #False
ElseIf patterns()\letters = index(i)\letters
j + 1
index(j)\letters = patterns()\letters
index(j)\index = patterns()\index
ElseIf j = i ;test for a single
isFirst = #True
PreviousElement(patterns())
Else
i = j + 1
isFirst = #True
PreviousElement(patterns())
EndIf
Next
If j = i
i = i - 1
EndIf
Redim index.pattern(i)
ProcedureReturn #True ;success completion
EndProcedure
Procedure Search(Array patterns.pattern(1), key$)
Protected lo = 1
Protected hi = ArraySize(patterns())
Protected mid
While lo <= hi
mid = lo + (hi-lo) / 2
If key$ = patterns(mid)\letters ; match
While patterns(mid - 1)\letters = key$
mid - 1
Wend
ProcedureReturn mid
ElseIf key$ < patterns(mid)\letters
hi = mid - 1 ; search low end of array
Else
lo = mid + 1 ; search high end of array
EndIf
Wend
ProcedureReturn 0
EndProcedure
Procedure findAnagram(word.s, List wordList.s())
Protected matchLength = Len(word)
If matchLength = 0
ProcedureReturn 0
EndIf
word = UCase(word)
Protected matchLetters.s = sortLetters(@word, matchLength)
Protected patternIndex, wordIndex
ClearList(wordList())
patternIndex = Search(index(), matchLetters)
If patternIndex
While index(patternIndex)\letters = matchLetters
wordIndex = index(patternIndex)\index
If words(wordIndex) <> word
AddElement(wordList())
wordList() = words(wordIndex)
EndIf
patternIndex + 1
Wend
EndIf
ProcedureReturn ListSize(wordList())
EndProcedure
;-main
handleError(initDictionaryPatterns(), "Unable To initialize word List")
handleError(OpenWindow(#mainWindow, 0, 0, 290, 500, "Anagram Finder", #PB_Window_ScreenCentered | #PB_Window_SystemMenu), "Unable to open Window")
StringGadget(#Word_str, 10, 10, 160, 20, "", #PB_String_LowerCase)
ButtonGadget(#List_btn, 180, 10, 100, 20, "List")
ListViewGadget(#AnagramList_lvw, 10, 40, 270, 455)
Define event, beginTime, totalTime
NewList wordList.s()
Repeat
event = WaitWindowEvent(10)
Select event
Case #PB_Event_Gadget
Select EventGadget()
Case #Word_str
If EventType() = #PB_EventType_Focus
SendMessage_(GadgetID(#Word_str), #EM_SETSEL, 0, -1)
EndIf
Case #List_btn
ClearGadgetItems(#AnagramList_lvw)
Define StartTime = TicksHQ()
If findAnagram(GetGadgetText(#Word_str), wordList())
ForEach wordList()
AddGadgetItem(#AnagramList_lvw, -1, wordList())
Next
Define totalTime = TicksHQ() - StartTime
Else
AddGadgetItem(#AnagramList_lvw, -1, "< NONE FOUND >")
EndIf
AddGadgetItem(#AnagramList_lvw, -1, "")
AddGadgetItem(#AnagramList_lvw, -1, "Search time: " + Str(totalTime) + " hundreths of a millisecond")
EndSelect
EndSelect
Until event = #PB_Event_CloseWindow
I'm going to see what kind of results come back for phrases (see my previous post). It might be ready in a day or so. Refining it will be the more difficult part.