I thought this might be of interest to anyone working with Unicode.
I ran into a problem earlier caused by the fact that different sequences of Unicode code points can represent the same character.
For example, the strings Chr($00C4) and Chr($0041) + Chr($0308) (in Unicode mode) both render on screen as Ä, yet they are encoded differently: the first is the single precomposed code point U+00C4, while the second is the letter A (U+0041) followed by a combining diaeresis (U+0308). Comparing these two strings yields #False which, in some circumstances, is undesirable.
To see this, run the following code (enable the Unicode compiler option):
Code:
; Two encodings of the same visible character (Ä).
a1$ = Chr($00C4)                ; precomposed: U+00C4
a2$ = Chr($0041) + Chr($0308)   ; decomposed: A (U+0041) + combining diaeresis (U+0308)
If a1$ = a2$
  MessageRequester("", "a1$ = " + a1$ + ", a2$ = " + a2$ + #LF$ + #LF$ + "a1$ = a2$ returns #True!")
Else
  MessageRequester("", "a1$ = " + a1$ + ", a2$ = " + a2$ + #LF$ + #LF$ + "a1$ = a2$ returns #False!")
EndIf
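To make the difference between the two strings visible, it can help to dump their code units. The following is only a small sketch, assuming the Unicode compiler option is enabled and the program is run with the debugger (the output goes to the debug window). The loop variable i is just an illustrative name.

```purebasic
; Sketch: list the code units of both strings (Unicode compiler option assumed).
a1$ = Chr($00C4)
a2$ = Chr($0041) + Chr($0308)

Debug "a1$ length: " + Str(Len(a1$))
For i = 1 To Len(a1$)
  ; Print each code unit as a four-digit hex value.
  Debug "  $" + RSet(Hex(Asc(Mid(a1$, i, 1))), 4, "0")
Next

Debug "a2$ length: " + Str(Len(a2$))
For i = 1 To Len(a2$)
  Debug "  $" + RSet(Hex(Asc(Mid(a2$, i, 1))), 4, "0")
Next
```

a1$ should be reported with one code unit and a2$ with two, which is why a plain string comparison treats them as different even though they look identical on screen.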
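One way around this is to convert both strings to the same Unicode normalization form before comparing them. On Windows Vista and later, the NormalizeString() API function in Normaliz.dll does this.
Try the following to see this in action (enable the Unicode compiler option; Vista / Win 7 only):
Code: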
; Values of the Windows NORM_FORM enumeration.
Enumeration
  #NormalizationOther = 0
  #NormalizationC = 1
  #NormalizationD = 2
  #NormalizationKC = 5
  #NormalizationKD = 6
EndEnumeration

; Need to load the NormalizeString() function at runtime.
Prototype.i protNormalizeString(NormForm, lpSrcString, cwSrcLength, lpDstString, cwDstLength)
Global NormalizeString.protNormalizeString

If OpenLibrary(1, "Normaliz.dll") = 0
  MessageRequester("Unicode normalization...", "Could not load Normaliz.dll.")
  End
EndIf
NormalizeString = GetFunction(1, "NormalizeString")
If NormalizeString = 0
  MessageRequester("Unicode normalization...", "Could not load the NormalizeString() function.")
  CloseLibrary(1)
  End
EndIf

a1$ = Chr($00C4)
a2$ = Chr($0041) + Chr($0308)

; First call: with no destination buffer, the function returns an estimated buffer
; length in characters (including the terminating null, since the source length is -1).
estimatedLength = NormalizeString(#NormalizationC, @a2$, -1, 0, 0)
If estimatedLength > 0 ; zero or a negative value signals an error
  newa2$ = Space(estimatedLength)
  ; Second call: write the composed (form C) version of a2$ into the buffer.
  NormalizeString(#NormalizationC, @a2$, -1, @newa2$, estimatedLength)
  If a1$ = newa2$
    MessageRequester("Unicode normalization...", "a1$ = " + a1$ + ", newa2$ (normalized version of a2$) = " + newa2$ + #LF$ + #LF$ + "a1$ = newa2$ returns #True!")
  Else
    MessageRequester("Unicode normalization...", "a1$ = " + a1$ + ", newa2$ (normalized version of a2$) = " + newa2$ + #LF$ + #LF$ + "a1$ = newa2$ returns #False!")
  EndIf
EndIf
CloseLibrary(1)
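If you need this in more than one place, the two NormalizeString() calls could be wrapped in a small procedure. The following is only a rough sketch under the same assumptions (Unicode compiler option, Vista or later). NormalizedCopy() is a made-up helper name, not a PureBasic or Windows function, and it simply falls back to returning the input unchanged if anything goes wrong.

```purebasic
; Sketch only: NormalizedCopy() is a hypothetical helper, not part of PureBasic or Windows.
#NormalizationC = 1

Prototype.i protNormalizeString(NormForm, lpSrcString, cwSrcLength, lpDstString, cwDstLength)
Global NormalizeString.protNormalizeString

Procedure.s NormalizedCopy(text$, form = #NormalizationC)
  Protected result$, length
  ; Nothing to do if the function was not loaded or the string is empty.
  If NormalizeString = 0 Or text$ = ""
    ProcedureReturn text$
  EndIf
  ; Ask for the estimated buffer length, then normalize into that buffer.
  length = NormalizeString(form, @text$, -1, 0, 0)
  If length > 0
    result$ = Space(length)
    If NormalizeString(form, @text$, -1, @result$, length) > 0
      ProcedureReturn result$
    EndIf
  EndIf
  ProcedureReturn text$ ; fall back to the original string on error
EndProcedure

If OpenLibrary(1, "Normaliz.dll")
  NormalizeString = GetFunction(1, "NormalizeString")
  ; Normalize both sides so the comparison no longer depends on the encoding.
  If NormalizedCopy(Chr($00C4)) = NormalizedCopy(Chr($0041) + Chr($0308))
    Debug "Equal after normalization"
  Else
    Debug "Still different"
  EndIf
  CloseLibrary(1)
EndIf
```

Normalizing both operands, rather than only one, means the comparison does not rely on either string already being in the chosen form.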
