It is currently Sat Sep 26, 2020 1:01 pm

All times are UTC + 1 hour




Post new topic Reply to topic  [ 7 posts ] 
Author Message
 Post subject: SQLite case insensitive query with greek utf8 chars
PostPosted: Sat Nov 23, 2019 8:47 pm 
Offline
Enthusiast
Enthusiast
User avatar

Joined: Fri Mar 27, 2009 9:41 am
Posts: 752
Location: Athens, Greece
I have a db with uft8 values in greek like:
Code:
ΓΙΑΝΝΗΣ
Γιάννης
Γιαννης
(it is the word John in greek) and I want to write a query to be able to find all these 3 instances. In other words, I have words in upper case, words in lower case, words with or without tonos (= the mark ' on letter α, second row in my example). I tried adding " COLLATE NOCASE" at the end of the query and didn't help. Tried:
Code:
SELECT * FROM Table WHERE UPPER(item) LIKE 'UPPER(%text%')
and now I can find ΓΙΑΝΝΗΣ when searching for "γιαννησ". The last letter "ς" is handled as different than "Σ" as the lower case of "Σ" is "σ" (when a word ends with "σ", we use "ς" instead). What can I do to deal with mess?


Top
 Profile  
Reply with quote  
 Post subject: Re: SQLite case insensitive query with greek utf8 chars
PostPosted: Mon Nov 25, 2019 7:20 am 
Offline
Addict
Addict

Joined: Sat Feb 08, 2014 3:26 pm
Posts: 925
Quote:
and now I can find ΓΙΑΝΝΗΣ when searching for "γιαννησ". The last letter "ς" is handled as different than "Σ" as the lower case of "Σ" is "σ" (when a word ends with "σ", we use "ς" instead). What can I do to deal with mess?

First of all, if they are names, do not put a wildcard (%) at the beginning.
use text%

Then, to solve the problem, use the phonetics functions (Soundex)
They are different according to the databases and languages.
From memory there were also some PB tests. You should find examples in the forum.

_________________
(English is not my native language, I use an online translator.)


Top
 Profile  
Reply with quote  
 Post subject: Re: SQLite case insensitive query with greek utf8 chars
PostPosted: Mon Nov 25, 2019 2:53 pm 
Offline
Enthusiast
Enthusiast
User avatar

Joined: Fri Mar 27, 2009 9:41 am
Posts: 752
Location: Athens, Greece
Marc56us wrote:
Quote:
and now I can find ΓΙΑΝΝΗΣ when searching for "γιαννησ". The last letter "ς" is handled as different than "Σ" as the lower case of "Σ" is "σ" (when a word ends with "σ", we use "ς" instead). What can I do to deal with mess?

First of all, if they are names, do not put a wildcard (%) at the beginning.
use text%

Then, to solve the problem, use the phonetics functions (Soundex)
They are different according to the databases and languages.
From memory there were also some PB tests. You should find examples in the forum.
I use % at the beginning because the full name may begin with the first name or the last name. I checked for soundex but no demo code showed up. What keywords should I use?


Top
 Profile  
Reply with quote  
 Post subject: Re: SQLite case insensitive query with greek utf8 chars
PostPosted: Mon Nov 25, 2019 4:28 pm 
Offline
Addict
Addict

Joined: Sat Feb 08, 2014 3:26 pm
Posts: 925
doctorized wrote:
I checked for soundex but no demo code showed up.
What keywords should I use?
Purebasic soundex

Sample
https://www.rosettacode.org/wiki/Soundex#PureBasic

The alternative method is to add a phonetic field to the database.

_________________
(English is not my native language, I use an online translator.)


Top
 Profile  
Reply with quote  
 Post subject: Re: SQLite case insensitive query with greek utf8 chars
PostPosted: Tue Nov 26, 2019 9:19 pm 
Offline
Enthusiast
Enthusiast
User avatar

Joined: Fri Mar 27, 2009 9:41 am
Posts: 752
Location: Athens, Greece
Marc56us wrote:
doctorized wrote:
I checked for soundex but no demo code showed up.
What keywords should I use?
Purebasic soundex

Sample
https://www.rosettacode.org/wiki/Soundex#PureBasic

The alternative method is to add a phonetic field to the database.

I think that it is not what I need. I need some sort of collation but I do not know how to create one. I have found the way to call sqlite3_create_collation but don't know how to use it.


Top
 Profile  
Reply with quote  
 Post subject: Re: SQLite case insensitive query with greek utf8 chars
PostPosted: Thu Aug 20, 2020 5:37 pm 
Offline
Enthusiast
Enthusiast
User avatar

Joined: Sat Jan 12, 2008 3:25 pm
Posts: 345
Location: Greece
@doctorized
Although it's been a long time passed (I just read the post), did you find a solution to the issue?

Thanos

_________________
» myPersonal Banker :: Because you do not need to have a master degree in economics in order to organize your finances!


Top
 Profile  
Reply with quote  
 Post subject: Re: SQLite case insensitive query with greek utf8 chars
PostPosted: Thu Sep 10, 2020 4:43 pm 
Offline
Enthusiast
Enthusiast
User avatar

Joined: Fri Mar 27, 2009 9:41 am
Posts: 752
Location: Athens, Greece
thanos wrote:
@doctorized
Although it's been a long time passed (I just read the post), did you find a solution to the issue?

Thanos

I apologize for the delay, the notification mail was in the spam folder and saw it now. Yes, I found a solution. Not quite sure who gave me the code.
Code:
#PatCmp_MatchAll = '%'
#PatCmp_MatchOne = '_'

Procedure.i PatCmp(*str.Unicode, *pat.Unicode, Array lut.u(1), esc.l = 0)
 
  ; wildcard matching algorithm based on code from
  ; http://dodobyte.com/wildcard.html
 
  Protected *pat_, *str_
 
  ; check arguments
  If *pat = #Null Or *str = #Null
    ProcedureReturn #False
  EndIf
 
  ; while *str has characters
  While *str\u
    If *pat\u
     
      ; handle 'match all'
      If *pat\u = #PatCmp_MatchAll
        *pat + 2
        If *pat\u = 0
          ProcedureReturn #True
        EndIf
        *pat_ = *pat
        *str_ = *str
        Continue
      EndIf
     
      ; handle escape character
      If *pat\u = esc
        *pat + 2
        If *pat\u = 0
          ProcedureReturn #False
        EndIf
      EndIf
      ;Debug Chr(lut(*pat\u))
      ;Debug Chr(lut(*str\u))
      ;Debug "====="
      ; lookup characters and compare
      If lut(*pat\u) = lut(*str\u) Or *pat\u = #PatCmp_MatchOne
        *pat + 2
        *str + 2
        Continue
      EndIf
     
    EndIf
   
    ; characters didn't match
    If *pat_ = #Null
      ProcedureReturn #False
    EndIf
    *str_ + 2
    *str = *str_
    *pat = *pat_
   
  Wend
 
  ; *str has no more characters
  ; check if pattern ends with 'match all'
  While *pat\u = #PatCmp_MatchAll
    *pat + 2
  Wend
  If *pat\u = 0
    ProcedureReturn #True
  Else
    ProcedureReturn #False
  EndIf
 
EndProcedure



;EnableExplicit

;- Constants

#SQLITE_UTF16 = 4
#SQLITE_UTF8          = 1

;- Imports

ImportC ""
  sqlite3_create_function16.i(*db, zFunctionName.p-unicode, nArg, eTextRep, *pApp, *xFunc, *xStep, *xFinal)
  sqlite3_result_int(*sqlite3_context, int)
  sqlite3_value_text16.i(*sqlite3_value)
EndImport


;- Globals

Global Dim LUT_Like.u(65535)


;- Procedures

Procedure LUT_Init()
 
  Protected c1.s = "ΑΆαάΒΓΔΕΈεέΖΗΉηήΘΙΪιίϊΐΚΛΜΝΞΟΌοόΠΡΣσςΤΥΎΫυύϋΰΦΧΨΩΏωώ"
  Protected c2.s = "ααααβγδεεεεζηηηηθιιιιιικλμνξοοοοπρσσστυυυυυυυφχψωωωω"
 ;CheckDatabaseUpdate(0, "INSERT INTO persons (name, weight) VALUES ('γιαννησ 1', '10')")
 ;CheckDatabaseUpdate(0, "INSERT INTO persons (name, weight) VALUES ('Γιάννης 2', '5')")
 ;CheckDatabaseUpdate(0, "INSERT INTO persons (name, weight) VALUES ('3 Γιαννησ', '20')")
 ;CheckDatabaseUpdate(0, "INSERT INTO persons (name, weight) VALUES ('4 ΓΙΑΝΝΗΣ', '30')")
  Protected.Unicode *c1=@c1, *c2=@c2
  Protected.i c
 
  For c = 0 To 65535
    LUT_Like(c) = c
  Next
 
  For c = 'A' To 'Z'
    LUT_Like(c) = c + 32
  Next
   
  While *c1\u And *c2\u
    LUT_Like(*c1\u) = *c2\u
    *c1 + 2
    *c2 + 2
  Wend
 
EndProcedure





ProcedureC Like(*context, argc, *argv.Integer)
 
  Protected.Unicode *pat, *str, *esc
  Protected.l esc
 
  ; get pattern and string
  *pat = sqlite3_value_text16(*argv\i)
  *argv + SizeOf(Integer)
  *str = sqlite3_value_text16(*argv\i)
 
  ; three argument LIKE
  If argc = 3
    *argv + SizeOf(Integer)
    *esc = sqlite3_value_text16(*argv\i)
    If *esc = #Null
      ProcedureReturn
    EndIf
    esc = *esc\u
    *esc + 2
    If *esc\u
      ProcedureReturn
    EndIf
  EndIf   
 
  ; set result
  sqlite3_result_int(*context, PatCmp(*str, *pat, LUT_Like(), esc))
 
EndProcedure




;- Main code
CompilerIf #PB_Compiler_IsMainFile
   LUT_Init()
   Define DB.i, StartTime.i, EndTime.i, i.i
   
   UseSQLiteDatabase()
   
   ;DB = OpenDatabase(#PB_Any, "c:\users\doctorized\desktop\libdata.file", "", "", #PB_Database_SQLite)
   DB = OpenDatabase(#PB_Any, ":memory:", "", "", #PB_Database_SQLite)
   If DB
   
    DatabaseUpdate(DB, "CREATE TABLE students (fname text)")
    ; CompilerIf Not #PB_Compiler_Debugger
    ;   For i = 0 To 10000
    ; CompilerEndIf
     DatabaseUpdate(DB, "INSERT INTO students VALUES ('Μαιουσύ')")
     DatabaseUpdate(DB, "INSERT INTO students VALUES ('Μαΐουςΰ')")
     DatabaseUpdate(DB, "INSERT INTO students VALUES ('μαίουσυ')")
     DatabaseUpdate(DB, "INSERT INTO students VALUES ('ΜΑΪΟΥΣΥ')")
    ; CompilerIf Not #PB_Compiler_Debugger
    ;   Next i
    ; CompilerEndIf
   
     sqlite3_create_function16(DatabaseID(DB), "like", -1, #SQLITE_UTF8, #Null, @Like(), #Null, #Null)
   
     StartTime = ElapsedMilliseconds()
     If DatabaseQuery(DB, "SELECT fname FROM Students WHERE fname LIKE '%μαιουσ%'")
       While NextDatabaseRow(DB)
         ;Debug GetDatabaseString(DB, 0)
       Wend
       FinishDatabaseQuery(DB)
     Else
       ;Debug DatabaseError()
     EndIf
     EndTime = ElapsedMilliseconds()
   
   
       MessageRequester("Needed", Str(EndTime - StartTime))
   
   
     CloseDatabase(DB)
   EndIf
CompilerEndIf


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 7 posts ] 

All times are UTC + 1 hour


Who is online

Users browsing this forum: No registered users and 7 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum

Search for:
Jump to:  

 


Powered by phpBB © 2008 phpBB Group
subSilver+ theme by Canver Software, sponsor Sanal Modifiye