SQLite case insensitive query with greek utf8 chars

For everything that's not in any way related to PureBasic. General chat etc...
User avatar
doctorized
Addict
Addict
Posts: 856
Joined: Fri Mar 27, 2009 9:41 am
Location: Athens, Greece

SQLite case insensitive query with greek utf8 chars

Post by doctorized »

I have a db with uft8 values in greek like:

Code: Select all

ΓΙΑΝΝΗΣ
Γιάννης
Γιαννης
(it is the word John in greek) and I want to write a query to be able to find all these 3 instances. In other words, I have words in upper case, words in lower case, words with or without tonos (= the mark ' on letter α, second row in my example). I tried adding " COLLATE NOCASE" at the end of the query and didn't help. Tried:

Code: Select all

SELECT * FROM Table WHERE UPPER(item) LIKE 'UPPER(%text%')
and now I can find ΓΙΑΝΝΗΣ when searching for "γιαννησ". The last letter "ς" is handled as different than "Σ" as the lower case of "Σ" is "σ" (when a word ends with "σ", we use "ς" instead). What can I do to deal with mess?
Marc56us
Addict
Addict
Posts: 1479
Joined: Sat Feb 08, 2014 3:26 pm

Re: SQLite case insensitive query with greek utf8 chars

Post by Marc56us »

and now I can find ΓΙΑΝΝΗΣ when searching for "γιαννησ". The last letter "ς" is handled as different than "Σ" as the lower case of "Σ" is "σ" (when a word ends with "σ", we use "ς" instead). What can I do to deal with mess?
First of all, if they are names, do not put a wildcard (%) at the beginning.
use text%

Then, to solve the problem, use the phonetics functions (Soundex)
They are different according to the databases and languages.
From memory there were also some PB tests. You should find examples in the forum.
User avatar
doctorized
Addict
Addict
Posts: 856
Joined: Fri Mar 27, 2009 9:41 am
Location: Athens, Greece

Re: SQLite case insensitive query with greek utf8 chars

Post by doctorized »

Marc56us wrote:
and now I can find ΓΙΑΝΝΗΣ when searching for "γιαννησ". The last letter "ς" is handled as different than "Σ" as the lower case of "Σ" is "σ" (when a word ends with "σ", we use "ς" instead). What can I do to deal with mess?
First of all, if they are names, do not put a wildcard (%) at the beginning.
use text%

Then, to solve the problem, use the phonetics functions (Soundex)
They are different according to the databases and languages.
From memory there were also some PB tests. You should find examples in the forum.
I use % at the beginning because the full name may begin with the first name or the last name. I checked for soundex but no demo code showed up. What keywords should I use?
Marc56us
Addict
Addict
Posts: 1479
Joined: Sat Feb 08, 2014 3:26 pm

Re: SQLite case insensitive query with greek utf8 chars

Post by Marc56us »

doctorized wrote: I checked for soundex but no demo code showed up.
What keywords should I use?
Purebasic soundex

Sample
https://www.rosettacode.org/wiki/Soundex#PureBasic

The alternative method is to add a phonetic field to the database.
User avatar
doctorized
Addict
Addict
Posts: 856
Joined: Fri Mar 27, 2009 9:41 am
Location: Athens, Greece

Re: SQLite case insensitive query with greek utf8 chars

Post by doctorized »

Marc56us wrote:
doctorized wrote: I checked for soundex but no demo code showed up.
What keywords should I use?
Purebasic soundex

Sample
https://www.rosettacode.org/wiki/Soundex#PureBasic

The alternative method is to add a phonetic field to the database.
I think that it is not what I need. I need some sort of collation but I do not know how to create one. I have found the way to call sqlite3_create_collation but don't know how to use it.
thanos
Enthusiast
Enthusiast
Posts: 422
Joined: Sat Jan 12, 2008 3:25 pm
Location: Greece
Contact:

Re: SQLite case insensitive query with greek utf8 chars

Post by thanos »

@doctorized
Although it's been a long time passed (I just read the post), did you find a solution to the issue?

Thanos
» myPersonal Banker :: Because you do not need to have a master degree in economics in order to organize your finances!
User avatar
doctorized
Addict
Addict
Posts: 856
Joined: Fri Mar 27, 2009 9:41 am
Location: Athens, Greece

Re: SQLite case insensitive query with greek utf8 chars

Post by doctorized »

thanos wrote:@doctorized
Although it's been a long time passed (I just read the post), did you find a solution to the issue?

Thanos
I apologize for the delay, the notification mail was in the spam folder and saw it now. Yes, I found a solution. Not quite sure who gave me the code.

Code: Select all

#PatCmp_MatchAll = '%'
#PatCmp_MatchOne = '_'

Procedure.i PatCmp(*str.Unicode, *pat.Unicode, Array lut.u(1), esc.l = 0)
 
  ; wildcard matching algorithm based on code from
  ; http://dodobyte.com/wildcard.html
 
  Protected *pat_, *str_
 
  ; check arguments
  If *pat = #Null Or *str = #Null
    ProcedureReturn #False
  EndIf
 
  ; while *str has characters
  While *str\u
    If *pat\u
     
      ; handle 'match all'
      If *pat\u = #PatCmp_MatchAll
        *pat + 2
        If *pat\u = 0
          ProcedureReturn #True
        EndIf
        *pat_ = *pat
        *str_ = *str
        Continue
      EndIf
     
      ; handle escape character
      If *pat\u = esc
        *pat + 2
        If *pat\u = 0
          ProcedureReturn #False
        EndIf
      EndIf
		;Debug Chr(lut(*pat\u))
		;Debug Chr(lut(*str\u))
		;Debug "====="
      ; lookup characters and compare
      If lut(*pat\u) = lut(*str\u) Or *pat\u = #PatCmp_MatchOne
        *pat + 2
        *str + 2
        Continue
      EndIf
     
    EndIf
   
    ; characters didn't match
    If *pat_ = #Null
      ProcedureReturn #False
    EndIf
    *str_ + 2
    *str = *str_
    *pat = *pat_
   
  Wend
 
  ; *str has no more characters
  ; check if pattern ends with 'match all'
  While *pat\u = #PatCmp_MatchAll
    *pat + 2
  Wend
  If *pat\u = 0
    ProcedureReturn #True
  Else
    ProcedureReturn #False
  EndIf
 
EndProcedure



;EnableExplicit

;- Constants

#SQLITE_UTF16 = 4
#SQLITE_UTF8          = 1 

;- Imports

ImportC ""
  sqlite3_create_function16.i(*db, zFunctionName.p-unicode, nArg, eTextRep, *pApp, *xFunc, *xStep, *xFinal)
  sqlite3_result_int(*sqlite3_context, int)
  sqlite3_value_text16.i(*sqlite3_value)
EndImport


;- Globals

Global Dim LUT_Like.u(65535)


;- Procedures

Procedure LUT_Init()
 
  Protected c1.s = "ΑΆαάΒΓΔΕΈεέΖΗΉηήΘΙΪιίϊΐΚΛΜΝΞΟΌοόΠΡΣσςΤΥΎΫυύϋΰΦΧΨΩΏωώ"
  Protected c2.s = "ααααβγδεεεεζηηηηθιιιιιικλμνξοοοοπρσσστυυυυυυυφχψωωωω"
 ;CheckDatabaseUpdate(0, "INSERT INTO persons (name, weight) VALUES ('γιαννησ 1', '10')")
 ;CheckDatabaseUpdate(0, "INSERT INTO persons (name, weight) VALUES ('Γιάννης 2', '5')")
 ;CheckDatabaseUpdate(0, "INSERT INTO persons (name, weight) VALUES ('3 Γιαννησ', '20')")
 ;CheckDatabaseUpdate(0, "INSERT INTO persons (name, weight) VALUES ('4 ΓΙΑΝΝΗΣ', '30')")
  Protected.Unicode *c1=@c1, *c2=@c2
  Protected.i c
 
  For c = 0 To 65535
    LUT_Like(c) = c
  Next
 
  For c = 'A' To 'Z'
    LUT_Like(c) = c + 32
  Next
   
  While *c1\u And *c2\u
    LUT_Like(*c1\u) = *c2\u
    *c1 + 2
    *c2 + 2
  Wend
 
EndProcedure





ProcedureC Like(*context, argc, *argv.Integer)
 
  Protected.Unicode *pat, *str, *esc
  Protected.l esc
 
  ; get pattern and string
  *pat = sqlite3_value_text16(*argv\i)
  *argv + SizeOf(Integer)
  *str = sqlite3_value_text16(*argv\i)
 
  ; three argument LIKE
  If argc = 3
    *argv + SizeOf(Integer)
    *esc = sqlite3_value_text16(*argv\i)
    If *esc = #Null
      ProcedureReturn
    EndIf
    esc = *esc\u
    *esc + 2
    If *esc\u
      ProcedureReturn
    EndIf
  EndIf   
 
  ; set result
  sqlite3_result_int(*context, PatCmp(*str, *pat, LUT_Like(), esc))
 
EndProcedure




;- Main code
CompilerIf #PB_Compiler_IsMainFile
	LUT_Init()
	Define DB.i, StartTime.i, EndTime.i, i.i
	
	UseSQLiteDatabase()
	
	;DB = OpenDatabase(#PB_Any, "c:\users\doctorized\desktop\libdata.file", "", "", #PB_Database_SQLite)
	DB = OpenDatabase(#PB_Any, ":memory:", "", "", #PB_Database_SQLite)
	If DB
	
	 DatabaseUpdate(DB, "CREATE TABLE students (fname text)")
	 ; CompilerIf Not #PB_Compiler_Debugger
	 ;   For i = 0 To 10000
	 ; CompilerEndIf
	  DatabaseUpdate(DB, "INSERT INTO students VALUES ('Μαιουσύ')")
	  DatabaseUpdate(DB, "INSERT INTO students VALUES ('Μαΐουςΰ')")
	  DatabaseUpdate(DB, "INSERT INTO students VALUES ('μαίουσυ')")
	  DatabaseUpdate(DB, "INSERT INTO students VALUES ('ΜΑΪΟΥΣΥ')")
	 ; CompilerIf Not #PB_Compiler_Debugger
	 ;   Next i
	 ; CompilerEndIf
	 
	  sqlite3_create_function16(DatabaseID(DB), "like", -1, #SQLITE_UTF8, #Null, @Like(), #Null, #Null)
	 
	  StartTime = ElapsedMilliseconds()
	  If DatabaseQuery(DB, "SELECT fname FROM Students WHERE fname LIKE '%μαιουσ%'")
	    While NextDatabaseRow(DB)
	      ;Debug GetDatabaseString(DB, 0)
	    Wend
	    FinishDatabaseQuery(DB)
	  Else
	    ;Debug DatabaseError()
	  EndIf
	  EndTime = ElapsedMilliseconds()
	 
	 
	    MessageRequester("Needed", Str(EndTime - StartTime))
	
	 
	  CloseDatabase(DB)
	EndIf
CompilerEndIf
thanos
Enthusiast
Enthusiast
Posts: 422
Joined: Sat Jan 12, 2008 3:25 pm
Location: Greece
Contact:

Re: SQLite case insensitive query with greek utf8 chars

Post by thanos »

@doctorized
I apologize for the same reason :D
The notification mail was in the spam folder.
I will check your code today.
I am using a quite simpler approach. It is not optimized but it is working well for many years:

Code: Select all

Procedure.s Upper4SQL(ColName$, Text$)
  Protected Result$
  Protected i
  Protected Pos
  Protected Char$
  Protected CapitalLetters$   = "ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩΣΆΈΉΊΎΏABCDEFGHIJKLMNOPQRSTUVWXYZ"
  Protected SmallLetters$     = "αβγδεζηθικλμνξοπρστυφχψωςάέήίύώabcdefghijklmnopqrstuvwxyz"
  Protected NewList Letters.s()

  For i = 1 To Len(Text$)
    Char$ = Mid(Text$, i, 1)
    Pos   = FindString(SmallLetters$, Char$)

    AddElement(Letters())
    If (Pos > 0)
      Letters() = "'" + Char$ + "','" + Mid(CapitalLetters$, Pos, 1) + "'"
    Else
      Letters() = "'" + Char$ + "','" + Char$ + "'"
    EndIf
  Next
  ;~ Start to ConCat the string
  Result$ = ""
  ForEach Letters()
    Result$ + "Replace("
  Next

  Result$ + ColName$

  ForEach Letters()
    Result$ + ", " + Letters() + ")"
  Next

  ProcedureReturn Result$
EndProcedure

; Example
Define sField.s = "LastName"
Define sQuery.s = Upper4SQL(sField, "βαΚ")
Define sSQL.s   = "SELECT * FROM Clients WHERE " + sQuery + " LIKE '%ΒΑΚ%' ORDER BY LastName COLLATE NOCASE;"
Debug(sSQL)
It displays:
SELECT * FROM Clients WHERE Replace(Replace(Replace(LastName, 'β','Β'), 'α','Α'), 'Κ','Κ') LIKE '%ΒΑΚ%' ORDER BY LastName COLLATE NOCASE;

Check if it is useful to you.
» myPersonal Banker :: Because you do not need to have a master degree in economics in order to organize your finances!
Post Reply