
mbsicmp_() Ä <> ä ?!

Posted: Tue Feb 16, 2016 11:06 am
by nco2k
can someone please explain why mbsicmp_() keeps failing?

Code:

Import "MSVCRT.LIB"
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    setlocale_(Category, Locale.p-ascii) As "_setlocale"
    stricmp_(*String1, *String2) As "__stricmp"
    wcsicmp_(*String1, *String2) As "__wcsicmp"
    mbsicmp_(*String1, *String2) As "__mbsicmp"
  CompilerElse
    setlocale_(Category, Locale.p-ascii) As "setlocale"
    stricmp_(*String1, *String2) As "_stricmp"
    wcsicmp_(*String1, *String2) As "_wcsicmp"
    mbsicmp_(*String1, *String2) As "_mbsicmp"
  CompilerEndIf
EndImport

*Ascii1 = AllocateMemory(10)
PokeS(*Ascii1, "Ä", -1, #PB_Ascii)
*Ascii2 = AllocateMemory(10)
PokeS(*Ascii2, "ä", -1, #PB_Ascii)

*Unicode1 = AllocateMemory(10)
PokeS(*Unicode1, "Ä", -1, #PB_Unicode)
*Unicode2 = AllocateMemory(10)
PokeS(*Unicode2, "ä", -1, #PB_Unicode)

*UTF81 = AllocateMemory(10)
PokeS(*UTF81, "Ä", -1, #PB_UTF8)
*UTF82 = AllocateMemory(10)
PokeS(*UTF82, "ä", -1, #PB_UTF8)

setlocale_(0, "")
Debug stricmp_(*Ascii1, *Ascii2) ; = 0 (equal)
Debug wcsicmp_(*Unicode1, *Unicode2) ; = 0 (equal)
Debug mbsicmp_(*UTF81, *UTF82) ; <> 0 (not equal) why?

c ya,
nco2k

Re: mbsicmp_() Ä <> ä ?!

Posted: Tue Feb 16, 2016 8:50 pm
by Michael Vogel
Try...

Code:

Debug mbsicmp_("ä","Ä")
; setlocale_(#Null,"de_DE.utf8");	Linux
; setlocale_(#Null,"de_DE");		OSX
setlocale_(#Null,"german"); 		Windows
Debug mbsicmp_("ä","Ä")

Re: mbsicmp_() Ä <> ä ?!

Posted: Wed Feb 17, 2016 3:44 am
by nco2k
that's not working. you are using a pb unicode/ascii string instead of a utf8 string. if it works, then it's pretty much coincidence. use a buffer instead and you will see.

c ya,
nco2k

Re: mbsicmp_() Ä <> ä ?!

Posted: Wed Feb 17, 2016 12:07 pm
by Michael Vogel
nco2k wrote: that's not working. you are using a pb unicode/ascii string instead of a utf8 string. if it works, then it's pretty much coincidence. use a buffer instead and you will see.

Could you give me some example code?

Re: mbsicmp_() Ä <> ä ?!

Posted: Wed Feb 17, 2016 12:31 pm
by nco2k
"ä" and "Ä" are pb ascii or unicode strings, depending on your compiler setting.

stricmp_() expects an ascii string.
wcsicmp_() expects a unicode string.
mbsicmp_() expects a utf8 string.

but you are passing a pointer to an ascii/unicode string instead.

compile in unicode-mode and it will return 0 (equal) which is wrong:

Code:

setlocale_(#Null, "german")
Debug mbsicmp_("äö","ää")

and here is the reason why:

Code:

"äö" in unicode:
$E4, $00, $F6, $00, $00, $00

"ää" in unicode:
$E4, $00, $E4, $00, $00, $00

unicode strings are terminated by 2 null bytes. utf8 strings are terminated by 1 null byte. the function will stop as soon as it hits the first zero byte and think the strings are equal, since it didn't even get to the mismatching character.
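
The effect described above can be reproduced outside PureBasic. The following is a minimal sketch in Python, not the C runtime itself: the helper `c_string` is a hypothetical stand-in for how a byte-oriented routine treats a buffer, namely as everything before the first zero byte. It assumes the PB unicode-mode layout is UTF-16LE, which matches the byte dumps listed above.

```python
def c_string(buffer: bytes) -> bytes:
    """Hypothetical helper: the bytes a byte-oriented C routine would read,
    i.e. everything before the first zero byte."""
    return buffer.split(b"\x00", 1)[0]


# PB unicode-mode layout (assumed UTF-16LE) plus the two-byte terminator.
first = "\u00e4\u00f6".encode("utf-16-le") + b"\x00\x00"   # "äö"
second = "\u00e4\u00e4".encode("utf-16-le") + b"\x00\x00"  # "ää"

print(first.hex(" "))   # e4 00 f6 00 00 00
print(second.hex(" "))  # e4 00 e4 00 00 00

# Both buffers collapse to the single byte E4, so they look identical.
seen_first = c_string(first)
seen_second = c_string(second)
print(seen_first == seen_second)  # True

# Comparing the whole buffers shows the real difference.
print(first == second)  # False
```

Only the first byte of each buffer is ever inspected, which is why the comparison reports equality even though the second characters differ.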

c ya,
nco2k

Re: mbsicmp_() Ä <> ä ?!

Posted: Wed Feb 17, 2016 5:54 pm
by Michael Vogel
oh, I see..
I'm still trying to find a solution, but without success. I even failed to find the correct values for the different LC... constants (so I just tried 0 for setlocale, which seems to return a non-zero value when everything is fine).

Re: mbsicmp_() Ä <> ä ?!

Posted: Fri Apr 22, 2016 4:55 pm
by nco2k
here are the constants:

Code:

#LC_ALL = 0
#LC_COLLATE = 1
#LC_CTYPE = 2
#LC_MONETARY = 3
#LC_NUMERIC = 4
#LC_TIME = 5

#MB_CP_SBCS = 0
#MB_CP_OEM = -2
#MB_CP_ANSI = -3
#MB_CP_LOCALE = -4

i even tried setmbcp_(), but the result is always the same.

Code:

setlocale_(#LC_CTYPE, "")
setmbcp_(#MB_CP_LOCALE)
Debug mbsicmp_(*UTF81, *UTF82)

what am i missing?

c ya,
nco2k

Re: mbsicmp_() Ä <> ä ?!

Posted: Sat Apr 23, 2016 9:46 pm
by Thunder93
nco2k, do you think there's a problem with PB's PokeS() when working with UTF8?

Here's what it looks like in memory after poking the two characters.

;Ää[NULL]
; Character Table
; 00000000 195 132 195 164 0 0 0 0 0 0

Code:

Debug Chr(195)+Chr(132)
Debug Chr(195)+Chr(164)

Re: mbsicmp_() Ä <> ä ?!

Posted: Sat Apr 23, 2016 10:07 pm
by nco2k
Thunder93 wrote: do you think there's a problem with PB's PokeS() when working with UTF8?
no? this thread has nothing to do with Peek/PokeS(), CompareMemoryString() or any other purebasic function. i simply asked a question about mbsicmp_().

c ya,
nco2k

Re: mbsicmp_() Ä <> ä ?!

Posted: Sun May 08, 2016 9:03 pm
by JHPJHP
Hi nco2k,

From what I've read, UTF8 is not directly supported by the multibyte-character string functions. See what you make of the following.

http://www.globalyzer.com/gzserver/help ... ricmp.html
mbcsicmp wrote:The arguments of _mbsicmp are multibyte-character strings
...
I18n Issues
... On Windows platforms, call _mbcsicmp or _wcsicmp. On ANSI UTF-8 platforms, convert the UTF-8 strings to wide character strings and then call wcscasecmp on the wide strings.
... For Windows MBCS support, ensure that the multibyte code page is set correctly. See _setmbcp for information on setting up the multibyte code page.
https://msdn.microsoft.com/en-us/library/883tf19a.aspx
setmbcp wrote:... or other operating-system-supported code page (except UTF-7 and UTF-8, which are not supported).
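
The advice quoted above (convert the UTF-8 strings to wide strings first, then compare case-insensitively) can be sketched as follows. This is an illustration in Python rather than PureBasic or the C runtime: `utf8_equal_ignoring_case` is a hypothetical helper, and `str.casefold()` stands in for a wide-character comparison such as wcsicmp_(). Unlike the C runtime, it applies Unicode case folding and does not depend on the locale.

```python
def utf8_equal_ignoring_case(left: bytes, right: bytes) -> bool:
    """Hypothetical helper: decode both UTF-8 buffers to text first,
    then compare using Unicode case folding."""
    return left.decode("utf-8").casefold() == right.decode("utf-8").casefold()


upper = "\u00c4".encode("utf-8")  # "Ä" -> c3 84 (195 132)
lower = "\u00e4".encode("utf-8")  # "ä" -> c3 a4 (195 164)

print(upper.hex(" "), lower.hex(" "))  # c3 84 c3 a4

# A byte-wise case fold only knows ASCII letters, so the second bytes
# (84 vs a4) stay different and the buffers do not match.
print(upper.lower() == lower.lower())  # False

# Decoding first turns each two-byte sequence into one character,
# which can then be case-folded properly.
print(utf8_equal_ignoring_case(upper, lower))  # True
```

In PureBasic terms this corresponds to reading the buffers back with PeekS(*UTF81, -1, #PB_UTF8) in a unicode build and comparing the resulting strings, instead of handing the raw UTF-8 bytes to mbsicmp_().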

Re: mbsicmp_() Ä <> ä ?!

Posted: Thu May 19, 2016 3:15 pm
by nco2k
thanks, that might actually explain it.

c ya,
nco2k