
CatchXML(): Encoding can be misunderstood

Posted: Sun Apr 17, 2016 4:42 pm
by Sicro
PB help wrote:

Code:

Result = CatchXML(#XML, *Address, Size [, Flags [, Encoding]])
Encoding (optional)
The encoding to use when loading the XML tree (this overwrites the encoding set in the XML declaration). Valid values are:
#PB_UTF8 (default)
#PB_Ascii
#PB_Unicode
The "Encoding" parameter does not define the encoding in which the characters in memory are read, as it does with PeekS(). Instead, it defines the encoding SaveXML() uses when it saves them later.
CatchXML() always reads the string from memory as ASCII in ASCII mode and as Unicode in Unicode mode; the encoding parameter does not change that.

The description should be improved so that it cannot be misunderstood.

Re: CatchXML(): Encoding can be misunderstood

Posted: Mon Apr 18, 2016 3:00 am
by Demivec
Sicro wrote:
CatchXML() reads the string from the memory always as ascii in ascii mode and always in unicode in unicode mode, the encoding parameter doesn't change that.

The description should be improved so that it is not misunderstood.
The parameter makes it possible to read Unicode-encoded data in ASCII mode and convert it to ASCII, and to read ASCII-encoded data in Unicode mode and convert it to Unicode.

The encoding parameter is useful. For instance, if the parameter were not used when reading Unicode data in ASCII mode, reading would abort as soon as a Unicode character contained a zero byte. That happens frequently: in UTF-16, every character below U+0100 has a zero byte.


A program compiled in ASCII mode does not have to store or create its text in ASCII encoding. It can read Unicode- or UTF-8-encoded data from a network or file, or the data can be created by other means such as PokeS(). Any of these can be the source of the buffer that CatchXML() reads. The same applies to programs compiled in Unicode mode.
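A minimal sketch of the case described above, assuming a Unicode compilation: UTF-8 text that would normally come from a file or the network is simulated by writing it into a raw buffer with PokeS(), and CatchXML() is told with #PB_UTF8 how those bytes are encoded. The names Source$, Size, *Buffer and Text$ are illustrative only, and this has not been checked against every PureBasic version.

Code:
```purebasic
CompilerIf Not #PB_Compiler_Unicode
  CompilerError "Run it in unicode mode!"
CompilerEndIf

; Simulate UTF-8 data from a file or network in a raw memory buffer
Source$ = "<text>Hello" + Chr(1000) + Chr(1200) + "</text>"
Size    = StringByteLength(Source$, #PB_UTF8)   ; byte count without the terminating zero
*Buffer = AllocateMemory(Size + 1)              ; +1 because PokeS() also writes a zero byte
PokeS(*Buffer, Source$, -1, #PB_UTF8)           ; buffer now holds UTF-8 bytes, not UTF-16

#XML = 0
; Encoding parameter: tells CatchXML() how the bytes in *Buffer are encoded
If CatchXML(#XML, *Buffer, Size, 0, #PB_UTF8)
  If XMLStatus(#XML) = #PB_XML_Success
    ; Node text is returned as a native (Unicode) PureBasic string
    Text$ = GetXMLNodeText(MainXMLNode(#XML))
    Debug Text$
  Else
    Debug "Parse error: " + XMLError(#XML)
  EndIf
  FreeXML(#XML)
EndIf
FreeMemory(*Buffer)
```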


Because the strings are converted into the encoding of the compilation mode (ASCII or Unicode), this is a good reason to switch to Unicode compilation sooner rather than later.

Re: CatchXML(): Encoding can be misunderstood

Posted: Sat May 07, 2016 3:33 pm
by Sicro
Yes, I know the usefulness of the encoding parameter.

In the meantime, I have realized that the encoding parameter does affect how the string is read from memory.

At first I was confused and thought the encoding parameter only mattered for SaveXML(), because in the following code the output is correct in all three cases.
Hint: #PB_Unicode uses two bytes per character and #PB_UTF8 uses one to four bytes per character.

Code:

CompilerIf Not #PB_Compiler_Unicode
  CompilerError "Run it in unicode mode!"
CompilerEndIf

XML$ = "<text>Hello?" + Chr(1000) + Chr(1200) + "</text>"
#XML = 0
CatchXML(#XML, @XML$, StringByteLength(XML$, #PB_Unicode), 0, #PB_Ascii)
Debug "CatchXML(#PB_Ascii):   " + ComposeXML(#XML)
FreeXML(#XML)
CatchXML(#XML, @XML$, StringByteLength(XML$, #PB_Unicode), 0, #PB_UTF8)
Debug "CatchXML(#PB_UTF8):    " + ComposeXML(#XML)
FreeXML(#XML)
CatchXML(#XML, @XML$, StringByteLength(XML$, #PB_Unicode), 0, #PB_Unicode)
Debug "CatchXML(#PB_Unicode): " + ComposeXML(#XML)
FreeXML(#XML)

Debug "---------------"
Debug "StringByteLength():            " + Str(StringByteLength(XML$))
Debug "StringByteLength(#PB_Ascii):   " + Str(StringByteLength(XML$, #PB_Ascii))
Debug "StringByteLength(#PB_UTF8):    " + Str(StringByteLength(XML$, #PB_UTF8))
Debug "StringByteLength(#PB_Unicode): " + Str(StringByteLength(XML$, #PB_Unicode))

Debug "----------------"
Debug "PeekS(#PB_Ascii):   " + PeekS(@XML$, -1, #PB_Ascii)
Debug "PeekS(#PB_UTF8):    " + PeekS(@XML$, -1, #PB_UTF8)
Debug "PeekS(#PB_Unicode): " + PeekS(@XML$, -1, #PB_Unicode)

Code:

[20:18:28] CatchXML(#PB_Ascii):   <?xml version="1.0" encoding="UTF-16"?><text>Hello?ϨҰ</text>
[20:18:28] CatchXML(#PB_UTF8):    <?xml version="1.0" encoding="UTF-16"?><text>Hello?ϨҰ</text>
[20:18:28] CatchXML(#PB_Unicode): <?xml version="1.0" encoding="UTF-16"?><text>Hello?ϨҰ</text>
[20:18:28] ---------------
[20:18:28] StringByteLength():            42
[20:18:28] StringByteLength(#PB_Ascii):   19
[20:18:28] StringByteLength(#PB_UTF8):    23
[20:18:28] StringByteLength(#PB_Unicode): 42
[20:18:28] ----------------
[20:18:28] PeekS(#PB_Ascii):   <
[20:18:28] PeekS(#PB_UTF8):    <
[20:18:28] PeekS(#PB_Unicode): <text>Hello?ϨҰ</text>
As you can see, CatchXML() doesn't work like PeekS().

Re: CatchXML(): Encoding can be misunderstood

Posted: Sat Apr 21, 2018 9:49 pm
by Andre
I'm not familiar with this, sorry.

Could you please suggest what exactly should be written or added to the docs?
Otherwise, I think this is something for Fred/freak...