Is there a quick PB alternative to Associative Arrays?

Everything else that doesn't fall into one of the other PB categories.
akj
Enthusiast
Enthusiast
Posts: 668
Joined: Mon Jun 09, 2003 10:08 pm
Location: Nottingham

Is there a quick PB alternative to Associative Arrays?

Post by akj »

On this web page:
http://thraxil.org/users/anders/posts/2 ... nt-Lenses/
is posed the problem "Take the names of two U.S. States, mix them all together, then rearrange the letters to form the names of two other U.S. States. What states are these?"

The Python solution proposed there is amazingly efficient, mainly because of it's good use of an Associative Array (Dictionary). This got me thinking as to which tools/techniques (perhaps even SQLite) would result in the most efficient solution in PureBasic. How super-quick a solution can you find?

If you wish to have a try ...
So that benchmark comparisions are possible, I suggest you determine the ratio of the elapsed time of your solution with that of mine below (even though mine only finds one of the two pairs of U.S. states) and of all the proposed solutions we we can see which has the best time ratio with my version.

My version is:

Code: Select all

; U.S. States
; thraxil.org/users/anders/posts/2007/10/30/A-Simple-Programming-Puzzle-Seen-Through-Three-Different-Lenses

DataSection
Data.i 50
Data.s "alabama","alaska","arizona","arkansas","california","colorado"
Data.s "connecticut","delaware","florida","georgia","hawaii","idaho"
Data.s "illinois","indiana","iowa","kansas","kentucky","louisiana"
Data.s "maine","maryland","massachusetts","michigan","minnesota"
Data.s "mississippi","missouri","montana","nebraska","nevada"
Data.s "newhampshire","newjersey","newmexico","newyork","northcarolina"
Data.s "northdakota","ohio","oklahoma","oregon","pennsylvania","rhodeisland"
Data.s "southcarolina","southdakota","tennessee","texas","utah","vermont"
Data.s "virginia","washington","westvirginia","wisconsin","wyoming"
Data.s ""
EndDataSection

Declare.s Sort(txt$)

Define time, states, state, state1, state2, seen$=" ", key$, txt$
time = ElapsedMilliseconds()
; Get the data
Read states
Dim states$(states)
For state = 1 To states
  Read.s states$(state)
Next state
; Solve (find only the first solution)
For state1 = 2 To states
  For state2 = 1 To state1-1
    key$ = states$(state1)+states$(state2)
    key$ = Sort(key$)
    If FindString(seen$, " "+key$+" ", 1)
      time - ElapsedMilliseconds()
      txt$ = "Found:  "+states$(state1)+" and "+states$(state2)+#CRLF$
      txt$ + "The corresponding other two states are undetermined"+#CRLF$+#CRLF$
      txt$ + "Elapsed time = "+Str(-time)+" milliseconds"
      MessageRequester("U.S. States Solution", txt$)
      Break
    EndIf
    seen$ + key$+" "
  Next state2
Next state1
End

Procedure.s Sort(txt$)
; Sort individual characters within txt$
  Protected size, p
  size = Len(txt$)
  Dim a$(size)
  ; Copy txt$ to array
  For p = 1 To size
    a$(p) = Mid(txt$, p, 1)
  Next p
  SortArray(a$(), #PB_Sort_Ascending)
  ; Copy array to txt$
  txt$ = ""
  For p = 1 To size
    txt$ + a$(p)
  Next p
  ProcedureReturn txt$
EndProcedure
Last edited by akj on Mon Jun 08, 2009 3:32 pm, edited 2 times in total.
Anthony Jordan
User avatar
Demivec
Addict
Addict
Posts: 4283
Joined: Mon Jul 25, 2005 3:51 pm
Location: Utah, USA

Post by Demivec »

It's not a good idea to do a benchmark with the debugger running. :wink:
akj
Enthusiast
Enthusiast
Posts: 668
Joined: Mon Jun 09, 2003 10:08 pm
Location: Nottingham

Post by akj »

@Demivec:

True. I have now modified the program in the first post to use a message box instead.
Anthony Jordan
Mistrel
Addict
Addict
Posts: 3415
Joined: Sat Jun 30, 2007 8:04 pm

Post by Mistrel »

What you want is a hash list. This isn't built into PureBasic but there are working examples on the forum.
User avatar
Demivec
Addict
Addict
Posts: 4283
Joined: Mon Jul 25, 2005 3:51 pm
Location: Utah, USA

Post by Demivec »

I increased the resolution of the timer.

Your version took ~515 tenths of a millisecond,
my version takes ~25 tenths of a millisecond.

My version also will report all possible matches as well as what they match with.
I reviewed the link you posted after I wrote the code below, it appears that I used a method similar to the "cluster_by" one mentioned later on that site.

Here's the code:

Code: Select all

DataSection
  Data.i 50
  Data.s "alabama","alaska","arizona","arkansas","california","colorado"
  Data.s "connecticut","delaware","florida","georgia","hawaii","idaho"
  Data.s "illinois","indiana","iowa","kansas","kentucky","louisiana"
  Data.s "maine","maryland","massachusetts","michigan","minnesota"
  Data.s "mississippi","missouri","montana","nebraska","nevada"
  Data.s "newhampshire","newjersey","newmexico","newyork","northcarolina"
  Data.s "northdakota","ohio","oklahoma","oregon","pennsylvania","rhodeisland"
  Data.s "southcarolina","southdakota","tennessee","texas","utah","vermont"
  Data.s "virginia","washington","westvirginia","wisconsin","wyoming"
  Data.s ""
EndDataSection

Declare.s Sort(*word.Character, wordLength) ;returns a string with the letters of a word sorted
Declare.i TicksHQ()

Define time, states, state, state1, state2, key$, txt$
time = TicksHQ()

; Get the data
Read states
Dim states$(states)
For state = 1 To states
  Read.s states$(state)
Next state

Structure statePair
  key$
  state_1.i
  state_2.i
EndStructure

; Solve
NewList statePairs.statePair()
For state1 = 1 To states - 1
  For state2 = state1 + 1 To states
    key$ = states$(state1)+states$(state2)
    Sort(@key$,Len(key$))
    AddElement(statePairs())
    statePairs()\key$ = key$
    statePairs()\state_1 = state1
    statePairs()\state_2 = state2
  Next state2
Next state1
SortStructuredList(statePairs(),#PB_Sort_Ascending,OffsetOf(statePair\key$),#PB_Sort_String)

ForEach statePairs()
  With statePairs()
    If \key$ = prevStatePair$
      If prevMatch
        txt$ = "       " + states$(\state_1) + " = " + states$(\state_1) + " + " + states$(\state_2) + #CRLF$
      Else
        txt$ = states$(prevState_1) + " + " + states$(prevState_2) + " = " + states$(\state_1) + " + " + states$(\state_2) + #CRLF$
        prevMatch = #True
      EndIf
    Else
      prevStatePair$ = \key$
      prevState_1 = \state_1
      prevState_2 = \state_2
      prevMatch = #False
    EndIf
  EndWith
Next
; Report elapsed time
time - TicksHQ()
txt$ + #CRLF$ + #CRLF$ + "Elapsed time = "+Str(-time)+" tenths of a millisecond" 
MessageRequester("Results",txt$)
End

Procedure.s Sort(*word.Character, wordLength) ;sorts the letters of a word without creating a new string
  Protected Dim letters.c(wordLength)
  Protected *letAdr = @letters()
  
  CopyMemoryString(*word, @*letAdr)
  SortArray(letters(), #PB_Sort_Ascending, 0, wordLength - 1)
  CopyMemoryString(@letters(), @*word)
EndProcedure

Procedure.i TicksHQ()
  ;Procedure borrowed from netmaestro in PureBasic forum
  Static maxfreq.q 
  Protected T.q 
  If maxfreq=0 
    QueryPerformanceFrequency_(@maxfreq) 
    maxfreq=maxfreq/10000
  EndIf 
  QueryPerformanceCounter_(@T) 
  ProcedureReturn T/maxfreq ;Result is in tenths-of-a-millisecond
EndProcedure 
Concerning anagrams, here's another post that relates: link.
Trond
Always Here
Always Here
Posts: 7446
Joined: Mon Sep 22, 2003 6:45 pm
Location: Norway

Post by Trond »

Running the program only once is silly for benchmarking when the program runs so fast. Better run it multiple times.

Also, there are only two states, not two pairs of states, so your program correctly finds everything there is to find. :)

I modified the original program to run multiple times (hope I didn't break it), here it is with timing code (run with debugger off):

Code: Select all

; U.S. States
; thraxil.org/users/anders/posts/2007/10/30/A-Simple-Programming-Puzzle-Seen-Through-Three-Different-Lenses

; ORIGINAL PROGRAM MODIFIED TO RUN SEVERAL TIMES FOR BENCHMARKING

DataSection
  DataStart:
Data.i 50
Data.s "alabama","alaska","arizona","arkansas","california","colorado"
Data.s "connecticut","delaware","florida","georgia","hawaii","idaho"
Data.s "illinois","indiana","iowa","kansas","kentucky","louisiana"
Data.s "maine","maryland","massachusetts","michigan","minnesota"
Data.s "mississippi","missouri","montana","nebraska","nevada"
Data.s "newhampshire","newjersey","newmexico","newyork","northcarolina"
Data.s "northdakota","ohio","oklahoma","oregon","pennsylvania","rhodeisland"
Data.s "southcarolina","southdakota","tennessee","texas","utah","vermont"
Data.s "virginia","washington","westvirginia","wisconsin","wyoming"
Data.s ""
EndDataSection

Declare.s Sort(txt$)

Procedure DoIt()
  
  Global states, state, state1, state2, seen$=" ", key$, txt$
  ;time = ElapsedMilliseconds()
  ; Get the data
  Restore DataStart
  Read states
  Dim states$(states)
  For state = 1 To states
    Read.s states$(state)
  Next state
  ; Solve (find only the first solution)
  For state1 = 2 To states
    For state2 = 1 To state1-1
      key$ = states$(state1)+states$(state2)
      key$ = Sort(key$)
      If FindString(seen$, " "+key$+" ", 1)
        txt$ = "Found:  "+states$(state1)+" and "+states$(state2)+#CRLF$
        txt$ + "The corresponding other two states are undetermined"+#CRLF$+#CRLF$
        ;txt$ + "Elapsed time = "+Str(-time)+" milliseconds"
        ;MessageRequester("U.S. States Solution", txt$)
        Break
      EndIf
      seen$ + key$+" "
    Next state2
  Next state1
EndProcedure

Procedure.s Sort(txt$)
; Sort individual characters within txt$
  Protected size, p
  size = Len(txt$)
  Dim a$(size)
  ; Copy txt$ to array
  For p = 1 To size
    a$(p) = Mid(txt$, p, 1)
  Next p
  SortArray(a$(), #PB_Sort_Ascending)
  ; Copy array to txt$
  txt$ = ""
  For p = 1 To size
    txt$ + a$(p)
  Next p
  ProcedureReturn txt$
EndProcedure


#Tries = 20

time = ElapsedMilliseconds()
For U = 0 To #Tries
  DoIt()
Next
MessageRequester("", Str(#Tries) + " iterations took:" + #CRLF$ + Str(ElapsedMilliseconds()-time))

My version which I made without reading anyone elses:

Code: Select all

DataSection
  DataStart:
  Data.s "alabama","alaska","arizona","arkansas","california","colorado"
  Data.s "connecticut","delaware","florida","georgia","hawaii","idaho"
  Data.s "illinois","indiana","iowa","kansas","kentucky","louisiana"
  Data.s "maine","maryland","massachusetts","michigan","minnesota"
  Data.s "mississippi","missouri","montana","nebraska","nevada"
  Data.s "newhampshire","newjersey","newmexico","newyork","northcarolina"
  Data.s "northdakota","ohio","oklahoma","oregon","pennsylvania","rhodeisland"
  Data.s "southcarolina","southdakota","tennessee","texas","utah","vermont"
  Data.s "virginia","washington","westvirginia","wisconsin","wyoming"
EndDataSection

Structure Charray
  C.c[0]
EndStructure

Procedure QuickSortStringAux(*String.Charray, Start.i, Nd.i)
  If Start < Nd
    Protected Pivot.c = *String\c[Nd]
    Protected I.i = Start
    Protected J.i = Nd
    While I <> J
      If *String\C[I] < Pivot
        I + 1
      Else
        *String\C[J] = *String\c[I]
        *String\C[I] = *String\c[J-1]
        J - 1
      EndIf
    Wend
    *String\c[J] = Pivot
    QuickSortStringAux(*String, Start, J-1)
    QuickSortStringAux(*String, J+1, Nd)
  EndIf
EndProcedure

Procedure.s QuickSortString(String.s)
  QuickSortStringAux(@String, 0, Len(String)-1)
  ProcedureReturn String
EndProcedure

Structure SPair
  S1.s
  S2.s
EndStructure

Procedure DoIt2()
  
  Dim States.s(49)
  Dim StatePairs.SPair(49*25)
  Protected I.i
  
  Restore DataStart
  For I = 0 To 49
    Read.s States(I)
  Next
  For I = 0 To 49
    For J = I+1 To 49
      StatePairs(K)\S1 = states(I) + ", " + states(J)
      StatePairs(K)\S2 = QuickSortString(states(I) + States(J))
      K + 1
    Next
  Next
  
  SortStructuredArray(StatePairs(), #PB_Sort_Ascending, OffsetOf(SPair\S2), #PB_Sort_String)
  
  For I = 0 To 49*25-1
    If StatePairs(I)\S2 = StatePairs(I+1)\S2
      ;MessageRequester("Found", StatePairs(I)\S1 + " -> " + StatePairs(I+1)\S1)
      Break
    EndIf
  Next
EndProcedure

#Tries = 500

time = ElapsedMilliseconds()
For U = 0 To #Tries
  DoIt2()
Next
MessageRequester("", Str(#Tries) + " iterations took:" + #CRLF$ + Str(ElapsedMilliseconds()-time))

My results are:
Original, 20 iterations: 1969 ms
Mine, 500 iterations: 1953 ms

I think my its my sort procedure which is faster.

How to use a hash table for this is beyond me, would be nice if someone could make a python version that solved the problem many times so we could properly time it.
User avatar
Demivec
Addict
Addict
Posts: 4283
Joined: Mon Jul 25, 2005 3:51 pm
Location: Utah, USA

Post by Demivec »

@Trond: I rearranged my previous version posted to match your run-it-500-times special. Yours took 2484 ms and mine took 1500ms. And yes, it seems your computer is a bit faster than mine.:wink:

My sort is at least twice as fast as yours. The code for it was made in a creative moment a month or so ago.8)

Trond wrote:Also, there are only two states, not two pairs of states, so your program correctly finds everything there is to find.
There are two pairs of states. If there was only one you wouldn't find a match. Both your version and mine, prints the solution as one pair of states that is equal to another pair of states (that's 2 pairs total). akj's only prints one.

Code: Select all

DataSection
  statesData:
  Data.i 50
  Data.s "alabama","alaska","arizona","arkansas","california","colorado"
  Data.s "connecticut","delaware","florida","georgia","hawaii","idaho"
  Data.s "illinois","indiana","iowa","kansas","kentucky","louisiana"
  Data.s "maine","maryland","massachusetts","michigan","minnesota"
  Data.s "mississippi","missouri","montana","nebraska","nevada"
  Data.s "newhampshire","newjersey","newmexico","newyork","northcarolina"
  Data.s "northdakota","ohio","oklahoma","oregon","pennsylvania","rhodeisland"
  Data.s "southcarolina","southdakota","tennessee","texas","utah","vermont"
  Data.s "virginia","washington","westvirginia","wisconsin","wyoming"
  Data.s ""
EndDataSection

Structure statePair
  key$
  state_1.i
  state_2.i
EndStructure

Procedure.s Sort(*word.Character, wordLength) ;sorts the letters of a word without creating a new string
  Protected Dim letters.c(wordLength)
  Protected *letAdr = @letters()
  
  CopyMemoryString(*word, @*letAdr)
  SortArray(letters(), #PB_Sort_Ascending, 0, wordLength - 1)
  CopyMemoryString(@letters(), @*word)
EndProcedure

Procedure doIt3()
  Define time, states, i, state1, state2, key$, txt$
  ; Get the data
  Restore statesData
  Read states
  Protected Dim states$(states)
  For i = 1 To states
    Read.s states$(i)
  Next i
  
  Protected NewList statePairs.statePair()
  For state1 = 1 To states - 1
    For state2 = state1 + 1 To states
      key$ = states$(state1)+states$(state2)
      Sort(@key$,Len(key$))
      AddElement(statePairs())
      statePairs()\key$ = key$
      statePairs()\state_1 = state1
      statePairs()\state_2 = state2
    Next state2
  Next state1
  SortStructuredList(statePairs(),#PB_Sort_Ascending,OffsetOf(statePair\key$),#PB_Sort_String)
  
  ForEach statePairs()
    With statePairs()
      If \key$ = prevStatePair$
        If prevMatch
          txt$ = "       " + states$(\state_1) + " = " + states$(\state_1) + " + " + states$(\state_2) + #CRLF$
        Else
          txt$ = states$(prevState_1) + " + " + states$(prevState_2) + " = " + states$(\state_1) + " + " + states$(\state_2) + #CRLF$
          prevMatch = #True
        EndIf
      Else
        prevStatePair$ = \key$
        prevState_1 = \state_1
        prevState_2 = \state_2
        prevMatch = #False
      EndIf
    EndWith
  Next
  ;MessageRequester("Results",txt$)
EndProcedure

#Tries = 500

Define time, u

time = ElapsedMilliseconds()
For u = 0 To #Tries
  doIt3()
Next
MessageRequester("", Str(#Tries) + " iterations took:" + #CRLF$ + Str(ElapsedMilliseconds()-time))

End
Trond
Always Here
Always Here
Posts: 7446
Joined: Mon Sep 22, 2003 6:45 pm
Location: Norway

Post by Trond »

@Trond: I rearranged my previous version posted to match your run-it-500-times special. Yours took 2484 ms and mine took 1500ms. And yes, it seems your computer is a bit faster than mine.
Yours with debugger is about as fast as mine without here, which is very impressive.
Demivec wrote:My sort is at least twice as fast as yours. The code for it was made in a creative moment a month or so ago.8)
Yes, I saw it, very clever.
Trond wrote:Also, there are only two states, not two pairs of states, so your program correctly finds everything there is to find.
There are two pairs of states. If there was only one you wouldn't find a match. Both your version and mine, prints the solution as one pair of states that is equal to another pair of states (that's 2 pairs total). akj's only prints one.
You're right, I must have been asleep!

I think that my version is slower because I use a string to store all the possible state combinations when I just need to display two of the combinations. You only generate strings for the combinations you need based on integers.
akj
Enthusiast
Enthusiast
Posts: 668
Joined: Mon Jun 09, 2003 10:08 pm
Location: Nottingham

Post by akj »

I was hoping that something useful would arise from this posting and it most certainly has.
Many thanks Demivec and Trond for the REALLY useful string sorting routines.

Also thank you Hroudtwolf for the Associative Array postings.
I really hope that one day Associative Arrays will be native to PureBasic as they are the natural (and very fast) data structure to use for quite a wide range of programming problems.
Anthony Jordan
Trond
Always Here
Always Here
Posts: 7446
Joined: Mon Sep 22, 2003 6:45 pm
Location: Norway

Post by Trond »

I really hope that one day Associative Arrays will be native to PureBasic as they are the natural (and very fast) data structure to use for quite a wide range of programming problems.
I still don't understand how to solve this problem with associative arrays, can you explain the idea quickly?
akj
Enthusiast
Enthusiast
Posts: 668
Joined: Mon Jun 09, 2003 10:08 pm
Location: Nottingham

Post by akj »

@Trond:

The crucial concepts about Associative Arrays is that you can:

1. Store data (the names of two US states) under a key (the sorted anagram of those two concatenated state names). A flexible implementation of Associative Arrays will permit that key to be numeric or non-numeric.

2. Later, on trying to store the same key again (the sorted concatenation of two OTHER state names) you will immediately discover that duplication has occurred and you can then easily retrieve the original two state names from the Array to solve the problem.

Due [normally] to hashing, all the above is very fast.
Anthony Jordan
User avatar
djes
Addict
Addict
Posts: 1806
Joined: Sat Feb 19, 2005 2:46 pm
Location: Pas-de-Calais, France

Post by djes »

Associative arrays are very useful! In PHP, you use them all the time, it's really powerful, especially when it comes to string handling and databases.
Trond
Always Here
Always Here
Posts: 7446
Joined: Mon Sep 22, 2003 6:45 pm
Location: Norway

Post by Trond »

akj wrote:@Trond:

The crucial concepts about Associative Arrays is that you can:

1. Store data (the names of two US states) under a key (the sorted anagram of those two concatenated state names). A flexible implementation of Associative Arrays will permit that key to be numeric or non-numeric.

2. Later, on trying to store the same key again (the sorted concatenation of two OTHER state names) you will immediately discover that duplication has occurred and you can then easily retrieve the original two state names from the Array to solve the problem.

Due [normally] to hashing, all the above is very fast.
Aha, so you still need the sorting. Thanks for the explanation.
Post Reply