CatchXML(): Encoding can be misunderstood

Post by Sicro »

PB help wrote:

Code:

Result = CatchXML(#XML, *Address, Size [, Flags [, Encoding]])
Encoding (optional)
The encoding to use when loading the XML tree (this overwrites the encoding set in the XML declaration). Valid values are:
#PB_UTF8 (default)
#PB_Ascii
#PB_Unicode
The "Encoding" parameter does not define the encoding in which the characters are read from memory, as is the case with PeekS(), but the encoding that SaveXML() uses to save them later.
CatchXML() always reads the string from memory as ASCII in ASCII mode and as Unicode in Unicode mode; the encoding parameter doesn't change that.

The description should be improved so that it is not misunderstood.

Post by Demivec »

Sicro wrote:
CatchXML() reads the string from the memory always as ascii in ascii mode and always in unicode in unicode mode, the encoding parameter doesn't change that.

The description should be improved so that it is not misunderstood.
The parameter makes it possible to read Unicode-encoded data while in ASCII mode and then convert and store it as ASCII, and to read ASCII-encoded data while in Unicode mode and then convert and store it as Unicode.

The encoding parameter is useful. For instance, if the parameter were not used while reading Unicode data in ASCII mode, reading would abort as soon as a Unicode character contained a zero byte (which happens frequently).
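
To illustrate the zero-byte point, here is a minimal sketch that dumps the raw bytes of a string as it sits in memory. It assumes a Unicode build (strings stored as UTF-16); the sample text and variable names are only for illustration.

```purebasic
; Sketch only: assumes a Unicode build, where strings are stored as UTF-16.
Text$ = "<a/>"
ByteCount = StringByteLength(Text$, #PB_Unicode)
Dump$ = ""
For Offset = 0 To ByteCount - 1
  ; PeekA() reads one unsigned byte at the given address
  Dump$ + RSet(Hex(PeekA(@Text$ + Offset)), 2, "0") + " "
Next
Debug Dump$ ; one two-digit hex value per byte
```

For plain ASCII characters such as these, every second byte should come out as 00, which is exactly where a reader that expects single-byte, null-terminated text would stop.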


In a program compiled in ASCII mode, you are not limited to storing or creating text in ASCII encoding. You can read Unicode or UTF-8 encoded data from a network or file, or it can be created using other methods such as PokeS(). Any of those may be the source for the buffer of data that CatchXML() uses. A similar statement can be made regarding programs compiled in Unicode mode.
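
As a rough sketch of that idea, the buffer below is filled with UTF-8 data via PokeS() and then passed to CatchXML() with #PB_UTF8. The XML content, the XML number 0 and the variable names are made up for illustration, and I have not checked the result in every compiler mode.

```purebasic
; Sketch only: build a UTF-8 encoded buffer and let CatchXML() parse it.
Text$ = "<text>Hello " + Chr(1000) + "</text>"
Size = StringByteLength(Text$, #PB_UTF8)   ; byte count without the terminating null
*Buffer = AllocateMemory(Size + 1)         ; +1 because PokeS() also writes a null byte
If *Buffer
  PokeS(*Buffer, Text$, -1, #PB_UTF8)      ; store the string UTF-8 encoded
  If CatchXML(0, *Buffer, Size, 0, #PB_UTF8)
    If XMLStatus(0) = #PB_XML_Success
      Debug GetXMLNodeText(MainXMLNode(0)) ; text content of the root node
    Else
      Debug XMLError(0)                    ; parser message if the data was not accepted
    EndIf
    FreeXML(0)
  EndIf
  FreeMemory(*Buffer)
EndIf
```

The buffer could just as well come from a file or a network read; the point is only that its encoding is independent of how the program itself was compiled.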


Because the strings are converted into the encoding of the compilation mode (ASCII/Unicode), this is a good reason to switch to Unicode compilation sooner rather than later.

Post by Sicro »

Yes, I know the usefulness of the encoding parameter.

Meanwhile, I have realized that the encoding parameter does affect how the string is read from memory.

At first I was confused and thought the encoding parameter was only relevant for SaveXML(), because in the following code the output is correct in all three cases.
Hint: #PB_Unicode uses two bytes for each character and #PB_UTF8 uses one to four bytes for each character.

Code:

CompilerIf Not #PB_Compiler_Unicode
  CompilerError "Run it in unicode mode!"
CompilerEndIf

XML$ = "<text>Hello?" + Chr(1000) + Chr(1200) + "</text>"
#XML = 0
CatchXML(#XML, @XML$, StringByteLength(XML$, #PB_Unicode), 0, #PB_Ascii)
Debug "CatchXML(#PB_Ascii):   " + ComposeXML(#XML)
FreeXML(#XML)
CatchXML(#XML, @XML$, StringByteLength(XML$, #PB_Unicode), 0, #PB_UTF8)
Debug "CatchXML(#PB_UTF8):    " + ComposeXML(#XML)
FreeXML(#XML)
CatchXML(#XML, @XML$, StringByteLength(XML$, #PB_Unicode), 0, #PB_Unicode)
Debug "CatchXML(#PB_Unicode): " + ComposeXML(#XML)
FreeXML(#XML)

Debug "---------------"
Debug "StringByteLength():            " + Str(StringByteLength(XML$))
Debug "StringByteLength(#PB_Ascii):   " + Str(StringByteLength(XML$, #PB_Ascii))
Debug "StringByteLength(#PB_UTF8):    " + Str(StringByteLength(XML$, #PB_UTF8))
Debug "StringByteLength(#PB_Unicode): " + Str(StringByteLength(XML$, #PB_Unicode))

Debug "----------------"
Debug "PeekS(#PB_Ascii):   " + PeekS(@XML$, -1, #PB_Ascii)
Debug "PeekS(#PB_UTF8):    " + PeekS(@XML$, -1, #PB_UTF8)
Debug "PeekS(#PB_Unicode): " + PeekS(@XML$, -1, #PB_Unicode)

Code:

[20:18:28] CatchXML(#PB_Ascii):   <?xml version="1.0" encoding="UTF-16"?><text>Hello?ϨҰ</text>
[20:18:28] CatchXML(#PB_UTF8):    <?xml version="1.0" encoding="UTF-16"?><text>Hello?ϨҰ</text>
[20:18:28] CatchXML(#PB_Unicode): <?xml version="1.0" encoding="UTF-16"?><text>Hello?ϨҰ</text>
[20:18:28] ---------------
[20:18:28] StringByteLength():            42
[20:18:28] StringByteLength(#PB_Ascii):   19
[20:18:28] StringByteLength(#PB_UTF8):    23
[20:18:28] StringByteLength(#PB_Unicode): 42
[20:18:28] ----------------
[20:18:28] PeekS(#PB_Ascii):   <
[20:18:28] PeekS(#PB_UTF8):    <
[20:18:28] PeekS(#PB_Unicode): <text>Hello?ϨҰ</text>
As you can see, CatchXML() doesn't work like PeekS().

Post by Andre »

I'm not familiar with this, sorry.

Could you please suggest what exactly should be written or added to the docs?
Otherwise, I think this would be something for Fred/freak...
Bye,
...André