mbsicmp_() Ä <> ä ?!
Posted: Tue Feb 16, 2016 11:06 am
by nco2k
can someone please explain why mbsicmp_() keeps failing?
Code: Select all
Import "MSVCRT.LIB"
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    setlocale_(Category, Locale.p-ascii) As "_setlocale"
    stricmp_(*String1, *String2) As "__stricmp"
    wcsicmp_(*String1, *String2) As "__wcsicmp"
    mbsicmp_(*String1, *String2) As "__mbsicmp"
  CompilerElse
    setlocale_(Category, Locale.p-ascii) As "setlocale"
    stricmp_(*String1, *String2) As "_stricmp"
    wcsicmp_(*String1, *String2) As "_wcsicmp"
    mbsicmp_(*String1, *String2) As "_mbsicmp"
  CompilerEndIf
EndImport
*Ascii1 = AllocateMemory(10)
PokeS(*Ascii1, "Ä", -1, #PB_Ascii)
*Ascii2 = AllocateMemory(10)
PokeS(*Ascii2, "ä", -1, #PB_Ascii)
*Unicode1 = AllocateMemory(10)
PokeS(*Unicode1, "Ä", -1, #PB_Unicode)
*Unicode2 = AllocateMemory(10)
PokeS(*Unicode2, "ä", -1, #PB_Unicode)
*UTF81 = AllocateMemory(10)
PokeS(*UTF81, "Ä", -1, #PB_UTF8)
*UTF82 = AllocateMemory(10)
PokeS(*UTF82, "ä", -1, #PB_UTF8)
setlocale_(0, "")
Debug stricmp_(*Ascii1, *Ascii2) ; = 0 (equal)
Debug wcsicmp_(*Unicode1, *Unicode2) ; = 0 (equal)
Debug mbsicmp_(*UTF81, *UTF82) ; <> 0 (not equal) why?
c ya,
nco2k
Re: mbsicmp_() Ä <> ä ?!
Posted: Tue Feb 16, 2016 8:50 pm
by Michael Vogel
Try...
Code: Select all
Debug mbsicmp_("ä","Ä")
; setlocale_(#Null,"de_DE.utf8"); Linux
; setlocale_(#Null,"de_DE"); OSX
setlocale_(#Null,"german"); Windows
Debug mbsicmp_("ä","Ä")
Re: mbsicmp_() Ä <> ä ?!
Posted: Wed Feb 17, 2016 3:44 am
by nco2k
that's not working. you are using a pb unicode/ascii string instead of a utf8 string. if it works, then it's pretty much a coincidence. use a buffer instead and you will see.
c ya,
nco2k
Re: mbsicmp_() Ä <> ä ?!
Posted: Wed Feb 17, 2016 12:07 pm
by Michael Vogel
nco2k wrote: that's not working. you are using a pb unicode/ascii string instead of a utf8 string. if it works, then it's pretty much a coincidence. use a buffer instead and you will see.
Could you give me some example code?
Re: mbsicmp_() Ä <> ä ?!
Posted: Wed Feb 17, 2016 12:31 pm
by nco2k
"ä" and "Ä" are pb ascii or unicode strings, depending on your compiler setting.
stricmp_() expects an ascii string.
wcsicmp_() expects a unicode string.
mbsicmp_() expects a utf8 string.
but you are passing a pointer to an ascii/unicode string instead.
compile in unicode-mode and it will return 0 (equal) which is wrong:
Code: Select all
setlocale_(#Null, "german")
Debug mbsicmp_("äö","ää")
and here is the reason why:
Code: Select all
"äö" in unicode:
$E4, $00, $F6, $00, $00, $00
"ää" in unicode:
$E4, $00, $E4, $00, $00, $00
unicode strings are terminated by 2 null bytes. utf8 strings are terminated by 1 null byte. the function stops as soon as it hits the first zero byte, which here is the high byte of the very first character, and thinks the strings are equal, since it never even gets to the mismatching character.
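to see those bytes for yourself, here is a minimal sketch that dumps the first bytes of "äö" as pb stores it in unicode (UTF-16) and in utf8. the buffer size and the variable names are only illustrative, and Chr() is used instead of string literals so the result does not depend on how the source file is saved.

```purebasic
; Dump the raw bytes of "äö" in both encodings.
; Names and buffer sizes are illustrative only.
Text$ = Chr($E4) + Chr($F6) ; "äö"

*Wide = AllocateMemory(16)
*Utf8 = AllocateMemory(16)
PokeS(*Wide, Text$, -1, #PB_Unicode)
PokeS(*Utf8, Text$, -1, #PB_UTF8)

Wide$ = ""
For i = 0 To 5 ; 2 characters * 2 bytes + 2 terminating null bytes
  Wide$ + "$" + RSet(Hex(PeekA(*Wide + i)), 2, "0") + " "
Next

Utf8$ = ""
For i = 0 To 4 ; 2 characters * 2 bytes + 1 terminating null byte
  Utf8$ + "$" + RSet(Hex(PeekA(*Utf8 + i)), 2, "0") + " "
Next

Debug Wide$ ; $E4 $00 $F6 $00 $00 $00
Debug Utf8$ ; $C3 $A4 $C3 $B6 $00

FreeMemory(*Wide)
FreeMemory(*Utf8)
```

in the unicode buffer the second byte is already $00, so any function that reads byte by byte and stops at the first zero only ever sees $E4. in the utf8 buffer there is no zero byte before the terminator.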
c ya,
nco2k
Re: mbsicmp_() Ä <> ä ?!
Posted: Wed Feb 17, 2016 5:54 pm
by Michael Vogel
oh, I see..
I'm still trying to find a solution, but without success. I even failed to find the correct values for the different LC_... constants (so I just passed 0 to setlocale, which seems to return a non-zero value when everything is fine).
Re: mbsicmp_() Ä <> ä ?!
Posted: Fri Apr 22, 2016 4:55 pm
by nco2k
here are the constants:
Code: Select all
#LC_ALL = 0
#LC_COLLATE = 1
#LC_CTYPE = 2
#LC_MONETARY = 3
#LC_NUMERIC = 4
#LC_TIME = 5
#MB_CP_SBCS = 0
#MB_CP_OEM = -2
#MB_CP_ANSI = -3
#MB_CP_LOCALE = -4
i even tried setmbcp_(), but the result is always the same.
Code: Select all
setlocale_(#LC_CTYPE, "")
setmbcp_(#MB_CP_LOCALE)
Debug mbsicmp_(*UTF81, *UTF82)
what am i missing?
c ya,
nco2k
Re: mbsicmp_() Ä <> ä ?!
Posted: Sat Apr 23, 2016 9:46 pm
by Thunder93
nco2k, do you think there's a problem with PB's PokeS() when working with UTF8?
Here's what the memory looks like after poking the two characters.
;Ää[NULL]
; Character Table
; 00000000 195 132 195 164 0 0 0 0 0 0
Code: Select all
Debug Chr(195)+Chr(132)
Debug Chr(195)+Chr(164)
Re: mbsicmp_() Ä <> ä ?!
Posted: Sat Apr 23, 2016 10:07 pm
by nco2k
Thunder93 wrote: do you think there's a problem with PB's PokeS() when working with UTF8?
no? this thread has nothing to do with Peek/PokeS(), CompareMemoryString() or any other purebasic function. i simply asked a question about mbsicmp_().
c ya,
nco2k
Re: mbsicmp_() Ä <> ä ?!
Posted: Sun May 08, 2016 9:03 pm
by JHPJHP
Hi nco2k,
From what I read, UTF8 is not directly supported by the multibyte-character string functions. See what you make of the following.
http://www.globalyzer.com/gzserver/help ... ricmp.html
mbcsicmp wrote:The arguments of _mbsicmp are multibyte-character strings
...
I18n Issues
... On Windows platforms, call _mbcsicmp or _wcsicmp. On ANSI UTF-8 platforms, convert the UTF-8 strings to wide character strings and then call wcscasecmp on the wide strings.
... For Windows MBCS support, ensure that the multibyte code page is set correctly. See _setmbcp for information on setting up the multibyte code page.
https://msdn.microsoft.com/en-us/library/883tf19a.aspx
setmbcp wrote:... or other operating-system-supported code page (except UTF-7 and UTF-8, which are not supported).
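Following the advice quoted above (convert the UTF-8 strings to wide character strings, then compare the wide strings), here is a rough sketch of how that could look in PB on Windows. It is only a sketch under assumptions: it needs a unicode-mode compile so that PB strings are wide strings, CompareUTF8NoCase() is a made-up helper name, and it uses ImportC instead of the Import from the first post because the CRT functions are cdecl.

```purebasic
; Sketch only: assumes Windows, msvcrt.lib and a unicode-mode compile.
CompilerIf #PB_Compiler_Unicode = 0
  CompilerError "this sketch relies on PB strings being wide (unicode) strings"
CompilerEndIf

; ImportC (cdecl) is an assumption of this sketch; the first post uses Import.
ImportC "msvcrt.lib"
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    setlocale_(Category, Locale.p-ascii) As "_setlocale"
    wcsicmp_.l(*String1, *String2) As "__wcsicmp"
  CompilerElse
    setlocale_(Category, Locale.p-ascii) As "setlocale"
    wcsicmp_.l(*String1, *String2) As "_wcsicmp"
  CompilerEndIf
EndImport

; Hypothetical helper: decode both UTF-8 buffers into PB strings,
; then let wcsicmp_() do the case-insensitive comparison.
Procedure.i CompareUTF8NoCase(*Utf8A, *Utf8B)
  Protected a$ = PeekS(*Utf8A, -1, #PB_UTF8)
  Protected b$ = PeekS(*Utf8B, -1, #PB_UTF8)
  ProcedureReturn wcsicmp_(@a$, @b$)
EndProcedure

setlocale_(0, "") ; 0 = LC_ALL, "" = the user's default locale

*UTF81 = AllocateMemory(10)
PokeS(*UTF81, Chr($C4), -1, #PB_UTF8) ; "Ä"
*UTF82 = AllocateMemory(10)
PokeS(*UTF82, Chr($E4), -1, #PB_UTF8) ; "ä"

Debug CompareUTF8NoCase(*UTF81, *UTF82) ; 0 means equal
```

If wcsicmp_() behaves as reported in the first post, this should treat Ä and ä as equal. The price is one temporary string copy per argument for every comparison.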
Re: mbsicmp_() Ä <> ä ?!
Posted: Thu May 19, 2016 3:15 pm
by nco2k
thanks, that might actually explain it.
c ya,
nco2k