mbsicmp_() Ä <> ä ?!

Post by nco2k »

can someone please explain why mbsicmp_() keeps failing?

Code: Select all

Import "MSVCRT.LIB"
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    setlocale_(Category, Locale.p-ascii) As "_setlocale"
    stricmp_(*String1, *String2) As "__stricmp"
    wcsicmp_(*String1, *String2) As "__wcsicmp"
    mbsicmp_(*String1, *String2) As "__mbsicmp"
  CompilerElse
    setlocale_(Category, Locale.p-ascii) As "setlocale"
    stricmp_(*String1, *String2) As "_stricmp"
    wcsicmp_(*String1, *String2) As "_wcsicmp"
    mbsicmp_(*String1, *String2) As "_mbsicmp"
  CompilerEndIf
EndImport

*Ascii1 = AllocateMemory(10)
PokeS(*Ascii1, "Ä", -1, #PB_Ascii)
*Ascii2 = AllocateMemory(10)
PokeS(*Ascii2, "ä", -1, #PB_Ascii)

*Unicode1 = AllocateMemory(10)
PokeS(*Unicode1, "Ä", -1, #PB_Unicode)
*Unicode2 = AllocateMemory(10)
PokeS(*Unicode2, "ä", -1, #PB_Unicode)

*UTF81 = AllocateMemory(10)
PokeS(*UTF81, "Ä", -1, #PB_UTF8)
*UTF82 = AllocateMemory(10)
PokeS(*UTF82, "ä", -1, #PB_UTF8)

setlocale_(0, "")
Debug stricmp_(*Ascii1, *Ascii2); = 0 (equal)
Debug wcsicmp_(*Unicode1, *Unicode2); = 0 (equal)
Debug mbsicmp_(*UTF81, *UTF82); <> 0 (not equal) why?
c ya,
nco2k
If OSVersion() = #PB_OS_Windows_ME : End : EndIf

Post by Michael Vogel »

Try...

Code: Select all

Debug mbsicmp_("ä","Ä")
; setlocale_(#Null,"de_DE.utf8");	Linux
; setlocale_(#Null,"de_DE");		OSX
setlocale_(#Null,"german"); 		Windows
Debug mbsicmp_("ä","Ä")

Post by nco2k »

that's not working. you are using a pb unicode/ascii string instead of a utf8 string. if it works, then it's pretty much a coincidence. use a buffer instead and you will see.

Post by Michael Vogel »

nco2k wrote: that's not working. you are using a pb unicode/ascii string instead of a utf8 string. if it works, then it's pretty much a coincidence. use a buffer instead and you will see.

Could you give me some example code?

Post by nco2k »

"ä" and "Ä" are pb ascii or unicode strings, depending on your compiler setting.

stricmp_() expects an ascii string.
wcsicmp_() expects a unicode string.
mbsicmp_() expects a utf8 string.

but you are passing a pointer to an ascii/unicode string instead.

compile in unicode-mode and it will return 0 (equal) which is wrong:

Code: Select all

setlocale_(#Null, "german")
Debug mbsicmp_("äö","ää")
and here is the reason why:

Code: Select all

"äö" in unicode:
$E4, $00, $F6, $00, $00, $00

"ää" in unicode:
$E4, $00, $E4, $00, $00, $00
unicode strings are terminated by 2 null bytes. utf8 strings are terminated by 1 null byte. the function stops as soon as it hits the first zero byte, which here is already the second byte ($00 right after $E4), and thinks the strings are equal, since it never even gets to the mismatching character.
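The early stop can be reproduced outside PB. Below is a minimal sketch in Python (picked only for illustration), assuming the unicode strings are stored as UTF-16LE as in the byte dumps above. `compare_until_nul` is a hypothetical helper that merely mimics how a byte-oriented C string compare treats the first zero byte as the end of the string; it is not the real _mbsicmp implementation.

```python
def utf16_buffer(text: str) -> bytes:
    """Return text as UTF-16LE bytes followed by the 2-byte terminator."""
    return text.encode("utf-16-le") + b"\x00\x00"


def compare_until_nul(a: bytes, b: bytes) -> int:
    """Hypothetical byte-wise compare that ends at the first zero byte."""
    i = 0
    while True:
        x, y = a[i], b[i]
        if x != y:
            return -1 if x < y else 1
        if x == 0:
            # Both buffers hit a zero byte at the same position: "equal".
            return 0
        i += 1


buf1 = utf16_buffer("\u00e4\u00f6")  # "äö"
buf2 = utf16_buffer("\u00e4\u00e4")  # "ää"

print(buf1.hex(" "))  # e4 00 f6 00 00 00
print(buf2.hex(" "))  # e4 00 e4 00 00 00

# The compare ends at index 1, before the differing bytes f6 / e4 are reached.
print(compare_until_nul(buf1, buf2))  # 0
```

Only the first two bytes of each buffer are ever looked at, so any two unicode strings that start with the same character in this range would compare as equal.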

Post by Michael Vogel »

oh, I see..
I'm still trying to find a solution, but without success. I even failed to find the correct values for the different LC... constants, so I just passed 0 to setlocale, which seems to return a non-zero value when everything is fine.

Post by nco2k »

here are the constants:

Code: Select all

#LC_ALL = 0
#LC_COLLATE = 1
#LC_CTYPE = 2
#LC_MONETARY = 3
#LC_NUMERIC = 4
#LC_TIME = 5

#MB_CP_SBCS = 0
#MB_CP_OEM = -2
#MB_CP_ANSI = -3
#MB_CP_LOCALE = -4
i even tried setmbcp_(), but the result is always the same.

Code: Select all

setlocale_(#LC_CTYPE, "")
setmbcp_(#MB_CP_LOCALE)
Debug mbsicmp_(*UTF81, *UTF82)
what am i missing?

Post by Thunder93 »

nco2k, do you think there's a problem with PB's PokeS() when working with UTF-8?

Here's what it looks like in memory after poking the two characters.

;Ää[NULL]
; Character Table
; 00000000 195 132 195 164 0 0 0 0 0 0

Code: Select all

Debug Chr(195)+Chr(132)
Debug Chr(195)+Chr(164)
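The same byte values can be cross-checked outside PB. The sketch below uses Python (chosen only for illustration) and also shows how those bytes look when read as Windows-1252 instead of UTF-8. That Windows-1252 is the active code page is an assumption about a typical Western/German system, not something established in this thread.

```python
upper = "\u00c4".encode("utf-8")  # "Ä"
lower = "\u00e4".encode("utf-8")  # "ä"

print(list(upper))  # [195, 132]
print(list(lower))  # [195, 164]

# Assumption: a single-byte ANSI code page (Windows-1252) is in effect.
# Read that way, each 2-byte UTF-8 sequence becomes two unrelated characters.
upper_as_cp1252 = upper.decode("cp1252")
lower_as_cp1252 = lower.decode("cp1252")
print([hex(ord(c)) for c in upper_as_cp1252])  # ['0xc3', '0x201e']
print([hex(ord(c)) for c in lower_as_cp1252])  # ['0xc3', '0xa4']

# The first characters match, but the second ones are punctuation/symbols,
# not an upper/lower pair, so case folding does not make them equal.
print(upper_as_cp1252.lower() == lower_as_cp1252.lower())  # False
```

So the buffers written by PokeS() hold valid UTF-8. If the comparison interprets them byte by byte in a single-byte code page, 132 and 164 are simply two different, unrelated characters, which could be one reason a mismatch is reported.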

Post by nco2k »

Thunder93 wrote:You think there's problem with PB PokeS() regarding working with UTF8?
no? this thread has nothing to do with Peek/PokeS(), CompareMemoryString() or any other purebasic function. i simply asked a question about mbsicmp_().

Post by JHPJHP »

Hi nco2k,

From what I've read, UTF-8 is not directly supported by the multibyte-character string functions. See what you make of the following.

http://www.globalyzer.com/gzserver/help ... ricmp.html
mbcsicmp wrote:The arguments of _mbsicmp are multibyte-character strings
...
I18n Issues
... On Windows platforms, call _mbcsicmp or _wcsicmp. On ANSI UTF-8 platforms, convert the UTF-8 strings to wide character strings and then call wcscasecmp on the wide strings.
... For Windows MBCS support, ensure that the multibyte code page is set correctly. See _setmbcp for information on setting up the multibyte code page.
https://msdn.microsoft.com/en-us/library/883tf19a.aspx
setmbcp wrote:... or other operating-system-supported code page (except UTF-7 and UTF-8, which are not supported).
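The "convert the UTF-8 strings to wide character strings, then compare" advice from the first quote can be sketched roughly as follows. This is Python, so `str` stands in for a wide-character string and `casefold()` stands in for the case-insensitive compare; neither is the CRT function itself, and `utf8_equal_ignore_case` is a hypothetical helper name.

```python
def utf8_equal_ignore_case(a: bytes, b: bytes) -> bool:
    """Decode both UTF-8 buffers to text first, then compare ignoring case."""
    return a.decode("utf-8").casefold() == b.decode("utf-8").casefold()


upper = b"\xc3\x84"  # UTF-8 bytes for "Ä" (195 132)
lower = b"\xc3\xa4"  # UTF-8 bytes for "ä" (195 164)

print(utf8_equal_ignore_case(upper, lower))  # True

# "äö" vs "ää": the second character differs, so this must not match.
print(utf8_equal_ignore_case(b"\xc3\xa4\xc3\xb6", b"\xc3\xa4\xc3\xa4"))  # False
```

In PB terms this would presumably mean reading the buffers back with PeekS() and #PB_UTF8 in a unicode build and handing the results to wcsicmp_(), though that route is untested here.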

Post by nco2k »

thanks, that might actually explain it.
