Page 1 of 1

Is there a quick PB alternative to Associative Arrays?

Posted: Mon Jun 08, 2009 2:14 pm
by akj
On this web page:
http://thraxil.org/users/anders/posts/2 ... nt-Lenses/
is posed the problem "Take the names of two U.S. States, mix them all together, then rearrange the letters to form the names of two other U.S. States. What states are these?"

The Python solution proposed there is amazingly efficient, mainly because of it's good use of an Associative Array (Dictionary). This got me thinking as to which tools/techniques (perhaps even SQLite) would result in the most efficient solution in PureBasic. How super-quick a solution can you find?

If you wish to have a try ...
So that benchmark comparisions are possible, I suggest you determine the ratio of the elapsed time of your solution with that of mine below (even though mine only finds one of the two pairs of U.S. states) and of all the proposed solutions we we can see which has the best time ratio with my version.

My version is:

Code: Select all

; U.S. States
; thraxil.org/users/anders/posts/2007/10/30/A-Simple-Programming-Puzzle-Seen-Through-Three-Different-Lenses

DataSection
Data.i 50
Data.s "alabama","alaska","arizona","arkansas","california","colorado"
Data.s "connecticut","delaware","florida","georgia","hawaii","idaho"
Data.s "illinois","indiana","iowa","kansas","kentucky","louisiana"
Data.s "maine","maryland","massachusetts","michigan","minnesota"
Data.s "mississippi","missouri","montana","nebraska","nevada"
Data.s "newhampshire","newjersey","newmexico","newyork","northcarolina"
Data.s "northdakota","ohio","oklahoma","oregon","pennsylvania","rhodeisland"
Data.s "southcarolina","southdakota","tennessee","texas","utah","vermont"
Data.s "virginia","washington","westvirginia","wisconsin","wyoming"
Data.s ""
EndDataSection

Declare.s Sort(txt$)

Define time, states, state, state1, state2, seen$=" ", key$, txt$
time = ElapsedMilliseconds()
; Get the data
Read states
Dim states$(states)
For state = 1 To states
  Read.s states$(state)
Next state
; Solve (find only the first solution)
For state1 = 2 To states
  For state2 = 1 To state1-1
    key$ = states$(state1)+states$(state2)
    key$ = Sort(key$)
    If FindString(seen$, " "+key$+" ", 1)
      time - ElapsedMilliseconds()
      txt$ = "Found:  "+states$(state1)+" and "+states$(state2)+#CRLF$
      txt$ + "The corresponding other two states are undetermined"+#CRLF$+#CRLF$
      txt$ + "Elapsed time = "+Str(-time)+" milliseconds"
      MessageRequester("U.S. States Solution", txt$)
      Break
    EndIf
    seen$ + key$+" "
  Next state2
Next state1
End

Procedure.s Sort(txt$)
; Sort individual characters within txt$
  Protected size, p
  size = Len(txt$)
  Dim a$(size)
  ; Copy txt$ to array
  For p = 1 To size
    a$(p) = Mid(txt$, p, 1)
  Next p
  SortArray(a$(), #PB_Sort_Ascending)
  ; Copy array to txt$
  txt$ = ""
  For p = 1 To size
    txt$ + a$(p)
  Next p
  ProcedureReturn txt$
EndProcedure

Posted: Mon Jun 08, 2009 3:17 pm
by Demivec
It's not a good idea to do a benchmark with the debugger running. :wink:

Posted: Mon Jun 08, 2009 3:29 pm
by akj
@Demivec:

True. I have now modified the program in the first post to use a message box instead.

Posted: Mon Jun 08, 2009 4:16 pm
by Mistrel
What you want is a hash list. This isn't built into PureBasic but there are working examples on the forum.

Posted: Mon Jun 08, 2009 4:29 pm
by Hroudtwolf

Posted: Mon Jun 08, 2009 5:58 pm
by Demivec
I increased the resolution of the timer.

Your version took ~515 tenths of a millisecond,
my version takes ~25 tenths of a millisecond.

My version also will report all possible matches as well as what they match with.
I reviewed the link you posted after I wrote the code below, it appears that I used a method similar to the "cluster_by" one mentioned later on that site.

Here's the code:

Code: Select all

DataSection
  Data.i 50
  Data.s "alabama","alaska","arizona","arkansas","california","colorado"
  Data.s "connecticut","delaware","florida","georgia","hawaii","idaho"
  Data.s "illinois","indiana","iowa","kansas","kentucky","louisiana"
  Data.s "maine","maryland","massachusetts","michigan","minnesota"
  Data.s "mississippi","missouri","montana","nebraska","nevada"
  Data.s "newhampshire","newjersey","newmexico","newyork","northcarolina"
  Data.s "northdakota","ohio","oklahoma","oregon","pennsylvania","rhodeisland"
  Data.s "southcarolina","southdakota","tennessee","texas","utah","vermont"
  Data.s "virginia","washington","westvirginia","wisconsin","wyoming"
  Data.s ""
EndDataSection

Declare.s Sort(*word.Character, wordLength) ;returns a string with the letters of a word sorted
Declare.i TicksHQ()

Define time, states, state, state1, state2, key$, txt$
time = TicksHQ()

; Get the data
Read states
Dim states$(states)
For state = 1 To states
  Read.s states$(state)
Next state

Structure statePair
  key$
  state_1.i
  state_2.i
EndStructure

; Solve
NewList statePairs.statePair()
For state1 = 1 To states - 1
  For state2 = state1 + 1 To states
    key$ = states$(state1)+states$(state2)
    Sort(@key$,Len(key$))
    AddElement(statePairs())
    statePairs()\key$ = key$
    statePairs()\state_1 = state1
    statePairs()\state_2 = state2
  Next state2
Next state1
SortStructuredList(statePairs(),#PB_Sort_Ascending,OffsetOf(statePair\key$),#PB_Sort_String)

ForEach statePairs()
  With statePairs()
    If \key$ = prevStatePair$
      If prevMatch
        txt$ = "       " + states$(\state_1) + " = " + states$(\state_1) + " + " + states$(\state_2) + #CRLF$
      Else
        txt$ = states$(prevState_1) + " + " + states$(prevState_2) + " = " + states$(\state_1) + " + " + states$(\state_2) + #CRLF$
        prevMatch = #True
      EndIf
    Else
      prevStatePair$ = \key$
      prevState_1 = \state_1
      prevState_2 = \state_2
      prevMatch = #False
    EndIf
  EndWith
Next
; Report elapsed time
time - TicksHQ()
txt$ + #CRLF$ + #CRLF$ + "Elapsed time = "+Str(-time)+" tenths of a millisecond" 
MessageRequester("Results",txt$)
End

Procedure.s Sort(*word.Character, wordLength) ;sorts the letters of a word without creating a new string
  Protected Dim letters.c(wordLength)
  Protected *letAdr = @letters()
  
  CopyMemoryString(*word, @*letAdr)
  SortArray(letters(), #PB_Sort_Ascending, 0, wordLength - 1)
  CopyMemoryString(@letters(), @*word)
EndProcedure

Procedure.i TicksHQ()
  ;Procedure borrowed from netmaestro in PureBasic forum
  Static maxfreq.q 
  Protected T.q 
  If maxfreq=0 
    QueryPerformanceFrequency_(@maxfreq) 
    maxfreq=maxfreq/10000
  EndIf 
  QueryPerformanceCounter_(@T) 
  ProcedureReturn T/maxfreq ;Result is in tenths-of-a-millisecond
EndProcedure 
Concerning anagrams, here's another post that relates: link.

Posted: Mon Jun 08, 2009 7:07 pm
by Trond
Running the program only once is silly for benchmarking when the program runs so fast. Better run it multiple times.

Also, there are only two states, not two pairs of states, so your program correctly finds everything there is to find. :)

I modified the original program to run multiple times (hope I didn't break it), here it is with timing code (run with debugger off):

Code: Select all

; U.S. States
; thraxil.org/users/anders/posts/2007/10/30/A-Simple-Programming-Puzzle-Seen-Through-Three-Different-Lenses

; ORIGINAL PROGRAM MODIFIED TO RUN SEVERAL TIMES FOR BENCHMARKING

DataSection
  DataStart:
Data.i 50
Data.s "alabama","alaska","arizona","arkansas","california","colorado"
Data.s "connecticut","delaware","florida","georgia","hawaii","idaho"
Data.s "illinois","indiana","iowa","kansas","kentucky","louisiana"
Data.s "maine","maryland","massachusetts","michigan","minnesota"
Data.s "mississippi","missouri","montana","nebraska","nevada"
Data.s "newhampshire","newjersey","newmexico","newyork","northcarolina"
Data.s "northdakota","ohio","oklahoma","oregon","pennsylvania","rhodeisland"
Data.s "southcarolina","southdakota","tennessee","texas","utah","vermont"
Data.s "virginia","washington","westvirginia","wisconsin","wyoming"
Data.s ""
EndDataSection

Declare.s Sort(txt$)

Procedure DoIt()
  
  Global states, state, state1, state2, seen$=" ", key$, txt$
  ;time = ElapsedMilliseconds()
  ; Get the data
  Restore DataStart
  Read states
  Dim states$(states)
  For state = 1 To states
    Read.s states$(state)
  Next state
  ; Solve (find only the first solution)
  For state1 = 2 To states
    For state2 = 1 To state1-1
      key$ = states$(state1)+states$(state2)
      key$ = Sort(key$)
      If FindString(seen$, " "+key$+" ", 1)
        txt$ = "Found:  "+states$(state1)+" and "+states$(state2)+#CRLF$
        txt$ + "The corresponding other two states are undetermined"+#CRLF$+#CRLF$
        ;txt$ + "Elapsed time = "+Str(-time)+" milliseconds"
        ;MessageRequester("U.S. States Solution", txt$)
        Break
      EndIf
      seen$ + key$+" "
    Next state2
  Next state1
EndProcedure

Procedure.s Sort(txt$)
; Sort individual characters within txt$
  Protected size, p
  size = Len(txt$)
  Dim a$(size)
  ; Copy txt$ to array
  For p = 1 To size
    a$(p) = Mid(txt$, p, 1)
  Next p
  SortArray(a$(), #PB_Sort_Ascending)
  ; Copy array to txt$
  txt$ = ""
  For p = 1 To size
    txt$ + a$(p)
  Next p
  ProcedureReturn txt$
EndProcedure


#Tries = 20

time = ElapsedMilliseconds()
For U = 0 To #Tries
  DoIt()
Next
MessageRequester("", Str(#Tries) + " iterations took:" + #CRLF$ + Str(ElapsedMilliseconds()-time))

My version which I made without reading anyone elses:

Code: Select all

DataSection
  DataStart:
  Data.s "alabama","alaska","arizona","arkansas","california","colorado"
  Data.s "connecticut","delaware","florida","georgia","hawaii","idaho"
  Data.s "illinois","indiana","iowa","kansas","kentucky","louisiana"
  Data.s "maine","maryland","massachusetts","michigan","minnesota"
  Data.s "mississippi","missouri","montana","nebraska","nevada"
  Data.s "newhampshire","newjersey","newmexico","newyork","northcarolina"
  Data.s "northdakota","ohio","oklahoma","oregon","pennsylvania","rhodeisland"
  Data.s "southcarolina","southdakota","tennessee","texas","utah","vermont"
  Data.s "virginia","washington","westvirginia","wisconsin","wyoming"
EndDataSection

Structure Charray
  C.c[0]
EndStructure

Procedure QuickSortStringAux(*String.Charray, Start.i, Nd.i)
  If Start < Nd
    Protected Pivot.c = *String\c[Nd]
    Protected I.i = Start
    Protected J.i = Nd
    While I <> J
      If *String\C[I] < Pivot
        I + 1
      Else
        *String\C[J] = *String\c[I]
        *String\C[I] = *String\c[J-1]
        J - 1
      EndIf
    Wend
    *String\c[J] = Pivot
    QuickSortStringAux(*String, Start, J-1)
    QuickSortStringAux(*String, J+1, Nd)
  EndIf
EndProcedure

Procedure.s QuickSortString(String.s)
  QuickSortStringAux(@String, 0, Len(String)-1)
  ProcedureReturn String
EndProcedure

Structure SPair
  S1.s
  S2.s
EndStructure

Procedure DoIt2()
  
  Dim States.s(49)
  Dim StatePairs.SPair(49*25)
  Protected I.i
  
  Restore DataStart
  For I = 0 To 49
    Read.s States(I)
  Next
  For I = 0 To 49
    For J = I+1 To 49
      StatePairs(K)\S1 = states(I) + ", " + states(J)
      StatePairs(K)\S2 = QuickSortString(states(I) + States(J))
      K + 1
    Next
  Next
  
  SortStructuredArray(StatePairs(), #PB_Sort_Ascending, OffsetOf(SPair\S2), #PB_Sort_String)
  
  For I = 0 To 49*25-1
    If StatePairs(I)\S2 = StatePairs(I+1)\S2
      ;MessageRequester("Found", StatePairs(I)\S1 + " -> " + StatePairs(I+1)\S1)
      Break
    EndIf
  Next
EndProcedure

#Tries = 500

time = ElapsedMilliseconds()
For U = 0 To #Tries
  DoIt2()
Next
MessageRequester("", Str(#Tries) + " iterations took:" + #CRLF$ + Str(ElapsedMilliseconds()-time))

My results are:
Original, 20 iterations: 1969 ms
Mine, 500 iterations: 1953 ms

I think my its my sort procedure which is faster.

How to use a hash table for this is beyond me, would be nice if someone could make a python version that solved the problem many times so we could properly time it.

Posted: Tue Jun 09, 2009 12:36 am
by Demivec
@Trond: I rearranged my previous version posted to match your run-it-500-times special. Yours took 2484 ms and mine took 1500ms. And yes, it seems your computer is a bit faster than mine.:wink:

My sort is at least twice as fast as yours. The code for it was made in a creative moment a month or so ago.8)

Trond wrote:Also, there are only two states, not two pairs of states, so your program correctly finds everything there is to find.
There are two pairs of states. If there was only one you wouldn't find a match. Both your version and mine, prints the solution as one pair of states that is equal to another pair of states (that's 2 pairs total). akj's only prints one.

Code: Select all

DataSection
  statesData:
  Data.i 50
  Data.s "alabama","alaska","arizona","arkansas","california","colorado"
  Data.s "connecticut","delaware","florida","georgia","hawaii","idaho"
  Data.s "illinois","indiana","iowa","kansas","kentucky","louisiana"
  Data.s "maine","maryland","massachusetts","michigan","minnesota"
  Data.s "mississippi","missouri","montana","nebraska","nevada"
  Data.s "newhampshire","newjersey","newmexico","newyork","northcarolina"
  Data.s "northdakota","ohio","oklahoma","oregon","pennsylvania","rhodeisland"
  Data.s "southcarolina","southdakota","tennessee","texas","utah","vermont"
  Data.s "virginia","washington","westvirginia","wisconsin","wyoming"
  Data.s ""
EndDataSection

Structure statePair
  key$
  state_1.i
  state_2.i
EndStructure

Procedure.s Sort(*word.Character, wordLength) ;sorts the letters of a word without creating a new string
  Protected Dim letters.c(wordLength)
  Protected *letAdr = @letters()
  
  CopyMemoryString(*word, @*letAdr)
  SortArray(letters(), #PB_Sort_Ascending, 0, wordLength - 1)
  CopyMemoryString(@letters(), @*word)
EndProcedure

Procedure doIt3()
  Define time, states, i, state1, state2, key$, txt$
  ; Get the data
  Restore statesData
  Read states
  Protected Dim states$(states)
  For i = 1 To states
    Read.s states$(i)
  Next i
  
  Protected NewList statePairs.statePair()
  For state1 = 1 To states - 1
    For state2 = state1 + 1 To states
      key$ = states$(state1)+states$(state2)
      Sort(@key$,Len(key$))
      AddElement(statePairs())
      statePairs()\key$ = key$
      statePairs()\state_1 = state1
      statePairs()\state_2 = state2
    Next state2
  Next state1
  SortStructuredList(statePairs(),#PB_Sort_Ascending,OffsetOf(statePair\key$),#PB_Sort_String)
  
  ForEach statePairs()
    With statePairs()
      If \key$ = prevStatePair$
        If prevMatch
          txt$ = "       " + states$(\state_1) + " = " + states$(\state_1) + " + " + states$(\state_2) + #CRLF$
        Else
          txt$ = states$(prevState_1) + " + " + states$(prevState_2) + " = " + states$(\state_1) + " + " + states$(\state_2) + #CRLF$
          prevMatch = #True
        EndIf
      Else
        prevStatePair$ = \key$
        prevState_1 = \state_1
        prevState_2 = \state_2
        prevMatch = #False
      EndIf
    EndWith
  Next
  ;MessageRequester("Results",txt$)
EndProcedure

#Tries = 500

Define time, u

time = ElapsedMilliseconds()
For u = 0 To #Tries
  doIt3()
Next
MessageRequester("", Str(#Tries) + " iterations took:" + #CRLF$ + Str(ElapsedMilliseconds()-time))

End

Posted: Tue Jun 09, 2009 9:07 am
by Trond
@Trond: I rearranged my previous version posted to match your run-it-500-times special. Yours took 2484 ms and mine took 1500ms. And yes, it seems your computer is a bit faster than mine.
Yours with debugger is about as fast as mine without here, which is very impressive.
Demivec wrote:My sort is at least twice as fast as yours. The code for it was made in a creative moment a month or so ago.8)
Yes, I saw it, very clever.
Trond wrote:Also, there are only two states, not two pairs of states, so your program correctly finds everything there is to find.
There are two pairs of states. If there was only one you wouldn't find a match. Both your version and mine, prints the solution as one pair of states that is equal to another pair of states (that's 2 pairs total). akj's only prints one.
You're right, I must have been asleep!

I think that my version is slower because I use a string to store all the possible state combinations when I just need to display two of the combinations. You only generate strings for the combinations you need based on integers.

Posted: Wed Jun 10, 2009 4:44 pm
by akj
I was hoping that something useful would arise from this posting and it most certainly has.
Many thanks Demivec and Trond for the REALLY useful string sorting routines.

Also thank you Hroudtwolf for the Associative Array postings.
I really hope that one day Associative Arrays will be native to PureBasic as they are the natural (and very fast) data structure to use for quite a wide range of programming problems.

Posted: Wed Jun 10, 2009 8:07 pm
by Trond
I really hope that one day Associative Arrays will be native to PureBasic as they are the natural (and very fast) data structure to use for quite a wide range of programming problems.
I still don't understand how to solve this problem with associative arrays, can you explain the idea quickly?

Posted: Thu Jun 11, 2009 11:52 am
by akj
@Trond:

The crucial concepts about Associative Arrays is that you can:

1. Store data (the names of two US states) under a key (the sorted anagram of those two concatenated state names). A flexible implementation of Associative Arrays will permit that key to be numeric or non-numeric.

2. Later, on trying to store the same key again (the sorted concatenation of two OTHER state names) you will immediately discover that duplication has occurred and you can then easily retrieve the original two state names from the Array to solve the problem.

Due [normally] to hashing, all the above is very fast.

Posted: Thu Jun 11, 2009 12:45 pm
by djes
Associative arrays are very useful! In PHP, you use them all the time, it's really powerful, especially when it comes to string handling and databases.

Posted: Thu Jun 11, 2009 4:03 pm
by Trond
akj wrote:@Trond:

The crucial concepts about Associative Arrays is that you can:

1. Store data (the names of two US states) under a key (the sorted anagram of those two concatenated state names). A flexible implementation of Associative Arrays will permit that key to be numeric or non-numeric.

2. Later, on trying to store the same key again (the sorted concatenation of two OTHER state names) you will immediately discover that duplication has occurred and you can then easily retrieve the original two state names from the Array to solve the problem.

Due [normally] to hashing, all the above is very fast.
Aha, so you still need the sorting. Thanks for the explanation.