Page 1 of 1

Fuzzy Search

Posted: Fri Apr 08, 2016 4:32 pm
by em_uk
Hi guys and gals.

I'm making an app that copies files to an HDF image. One of the things I would like to do is to be able to filter my source files.

To do this I'd like to use a fuzzy search, so for example I have a folder with 10,000 titles inside. I want to find all the "ocean" games, I type Ocean in to my filter box and it will filter the list to show any titles with Ocean in the title.

I'm using an Explorer form at the moment for my source but think I need to write something custom.

Any idea on how this would be achieved (the fuzzy search bit)

Thanks

Re: Fuzzy Search

Posted: Fri Apr 08, 2016 7:16 pm
by kenmo
A basic start...

Code: Select all

Path.s = GetTemporaryDirectory() + "Fuzzy/"
CreateDirectory(Path)

RandomSeed(0)
For i = 1 To 100
  File.s = Str(Random(1000)) + " " + Str(Random(1000)) + ".dat"
  File = ReplaceString(File, "9", " ocean ")
  If CreateFile(0, Path + File)
    CloseFile(0)
  EndIf
Next i

; (see below)
CreateRegularExpression(0, ".*ocean.*", #PB_RegularExpression_NoCase)

If ExamineDirectory(0, Path, "")
  While NextDirectoryEntry(0)
    File.s = DirectoryEntryName(0)
    
    ; Simple FindString
    If (FindString(File, "ocean", 1, #PB_String_NoCase))
      Debug File
    EndIf
    
    ; RegEx (more powerful)
    If (MatchRegularExpression(0, File))
      Debug File
    EndIf
    
  Wend
  FinishDirectory(0)
EndIf

Re: Fuzzy Search

Posted: Fri Apr 08, 2016 7:50 pm
by infratec
Why not:

Code: Select all

If ExamineDirectory(0, Path, "*ocean*.*")
  While NextDirectoryEntry(0)
    Debug DirectoryEntryName(0)
  Wend
  FinishDirectory(0)
EndIf
But this has nothing todo with Fuzzy :wink:

Bernd

Re: Fuzzy Search

Posted: Sat Apr 09, 2016 2:06 am
by kenmo
That works too. I didn't do that in my example because
(a) I wanted to show a RegEx match in the loop
(b) on Linux/Mac, I think ExamineDirectory matches case-sensitive, which I assume em_uk does not want(?)

Re: Fuzzy Search

Posted: Sat Apr 09, 2016 2:40 am
by Deluxe0321
A good way to perform a fuzzy search is to use the algo provided by Levenshtein.
Wiki: https://en.wikipedia.org/wiki/Levenshtein_distance
Code: https://github.com/acmeism/RosettaCodeD ... .purebasic

In combination with a prefilled list ( iterate all elements ) or life (check while iterating ), depending on how deep the structure is, the search itself is quite fast --> O(m*n).
You may have to tinker with the distance ( I commonly use a distance of <= 2 ---> means two different letters found in the compared string).

Re: Fuzzy Search

Posted: Sat Apr 09, 2016 9:33 am
by Keya
em, by "fuzzy search" do you actually mean complex search patterns (like regular expressions etc)? because the "search for ocean in the title" example you gave indicates it could be as simple as having the user enter a search word, and then simply enumerating through the list using FindString(nextword, wordtofind)

Re: Fuzzy Search

Posted: Sat Apr 09, 2016 11:36 am
by em_uk
Hi, thanks for the ideas so far.

Fuzzy search as in simple search terms, not complex regex, just so that titles can be filtered easily.

The source windows is an explorer gadget which the user would browse to a folder with the source files in, but say he wants to add just games by codemasters he could type "codemasters" into the filter and the explorer gadget would filter the rest out.

infratec solution seems to work quite well in my testing and means I wont have to create my own file explorer.

Thanks Keya for the quick framework, if my app needs finer control that will come in very handy.