ExportXML() missing characters

NicTheQuick
Addict
Posts: 1504
Joined: Sun Jun 22, 2003 7:43 pm
Location: Germany, Saarbrücken

Post by NicTheQuick »

Hi,

I am working on a simple XML merger which reads lots of XML files and merges them into one. At the end I have to write the XML into a memory buffer which looks like this:

Code: Select all

size = ExportXMLSize(*mData)
*buffer = AllocateMemory(size)
If ExportXML(*mData, *buffer, size)
	Debug PeekS(*buffer, -1, #PB_UTF8)
	AddPackMemory(*mZip, *buffer, size, "Data.xml")
EndIf
FreeMemory(*buffer)
But in the end there is something missing. The resulting Data.xml should end in this:

Code: Select all

</Recipe></Recipes></hixz>
But this happens:

Code: Select all

</Recipe></Recipes
The same happens when I use #PB_XML_StringFormat to write a BOM, or when I omit the declaration using #PB_XML_NoDeclaration.

I also tried to always make the buffer 1000 bytes bigger, but it did not change anything:

Code: Select all

size = ExportXMLSize(*mData) + 1000
There is something wrong with ExportXML(), maybe an issue with UTF-8 or something completely different. Unfortunately there seems to be no workaround for this. If examples are needed, I can send them to the developers. I was not able to reproduce the issue with small examples.

Maybe some interesting things I do:
- I use CopyXMLNode() to copy nodes from other XML trees to the *mData tree
- I change attributes on some nodes in the source tree before I copy them to *mData
- *mData is a completely fresh tree, created with CreateXML(#PB_Any, #PB_UTF8)
- The other trees were created using CatchXML() and default encoding
The english grammar is freeware, you can use it freely - But it's not Open Source, i.e. you can not change it or publish it in altered way.
Fred
Administrator
Posts: 18162
Joined: Fri May 17, 2002 4:39 pm
Location: France

Re: [PB5.7 B4] ExportXML() missing characters

Post by Fred »

You should call AllocateMemory() with size+1, or pass 'size' to PeekS() instead of -1, because with -1 it looks for the null char. ExportXMLSize() doesn't include the null char, IIRC. Do you have code to reproduce it?
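
A minimal sketch of the two options mentioned above, using a tiny stand-in tree rather than the recipe data from this thread. It assumes that ExportXMLSize() returns the byte count without a terminating null (as suggested above, not verified here) and that AllocateMemory() is called without #PB_Memory_NoClear, so the block is zero-filled. Adding #PB_ByteLength in the second PeekS() call is an assumption on top of the advice above, so that 'size' is read as a byte count.

```purebasic
; Sketch only: a small stand-in tree, not the merged recipe tree.
Define xml.i = CreateXML(#PB_Any, #PB_UTF8)
If xml
  Define *root = CreateXMLNode(RootXMLNode(xml), "hixz")
  SetXMLAttribute(*root, "Source", "Merged Cookbook")

  Define size.i = ExportXMLSize(xml)

  ; Option 1: one extra byte. AllocateMemory() zero-fills the block by default,
  ; so the byte after the exported data stays 0 and PeekS(..., -1, ...) stops there.
  Define *buffer = AllocateMemory(size + 1)
  If *buffer
    If ExportXML(xml, *buffer, size)
      Debug PeekS(*buffer, -1, #PB_UTF8)

      ; Option 2: pass the length explicitly instead of searching for a null byte.
      ; #PB_ByteLength makes PeekS() treat 'size' as bytes, not characters.
      Debug PeekS(*buffer, size, #PB_UTF8 | #PB_ByteLength)
    EndIf
    FreeMemory(*buffer)
  EndIf
  FreeXML(xml)
EndIf
```

Either option only affects how the buffer is read back for debugging; it does not change what ExportXML() writes.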

Post by NicTheQuick »

PeekS() is not the main problem here, I used it for debugging only. But I get your point.

The created file inside the ZIP container was also truncated.

I will try to create a bigger example that shows the issue.

Post by NicTheQuick »

I tried a random XML generator, but the issue is not reproducible this way.

Code: Select all

*xml = CreateXML(#PB_Any, #PB_UTF8)

Procedure addRandomChildren(*node, depth.i = 1, currentDepth.i = 0)
	currentDepth + 1
	If depth <= currentDepth
		ProcedureReturn
	EndIf
	
	Protected i.i = Random(6) + 1
	While i
		SetXMLAttribute(*node, "attr" + i, "value" + i)
		i - 1
	Wend
	
	i.i = Random(6) + 1
	While i
		addRandomChildren(CreateXMLNode(*node, "div" + i), depth, currentDepth)
		i - 1
	Wend
EndProcedure

*rootNode = CreateXMLNode(RootXMLNode(*xml), "pb")

addRandomChildren(*rootNode, 10)

Define size.i = ExportXMLSize(*xml)
Debug size
Define *buffer = AllocateMemory(size)
If ExportXML(*xml, *buffer, size)
	Debug Right(PeekS(*buffer, size, #PB_UTF8), 150)
EndIf
FreeMemory(*buffer)
At the moment I don't know if I am allowed to post the XML files I am working with in the forum. If I can, I will be able to show you the issue better.

Post by NicTheQuick »

Hi Fred,

in this archive you will find some zipped recipes and a small PureBasic program which opens all of these files and loads the Data.xml inside. Then all these XML trees are traversed; some parts are changed and some are copied to a freshly created XML tree, which is a merged version of all the recipes. In the end (search for "BUG") the merged XML tree is written to a memory block and zipped again. For debugging purposes I use PeekS() to see what the XML looks like. And there you will see the error at the end, hopefully.

Here is the link to the data: https://cloud.goeddel.net/index.php/s/TLQz2WGzrMApKJQ

And here is the resulting XML tree I get on my machine. Because there are 2500+ lines, I removed the middle.

Code: Select all

hixz Source="Merged Cookbook" FileVersion="1.0" date="2018-12-17"><Cookbooks/><CookbookChapters/><Recipes><Recipe Name="Beef Wellington" ID="70803" CookbookID="52" CookbookChapterID="0" Comments="Looking for an impressive dish for the holidays?  It takes a little time to put this dish together, but it&apos;s simple to do...and the results are outstanding and delicious." Servings="10" PreparationTime="15" OvenTemperatureF="425" OvenTemperatureC="218" Source="Puff Pastry.com" WebPage="https://www.puffpastry.com/recipe/beef-wellington/" UserData4="Green beans amandine. For dessert serve cheesecake topped with sliced strawberries." CreateDate="2018-12-10" ColorFlag="<None>">
      <RecipeIngredients>
        <RecipeIngredient Quantity="2 1/2" Unit="" Ingredient="pounds center cut beef tenderloin" Heading="N" LinkType="Unlinked" IngredientID=""/>
        <RecipeIngredient Quantity="1/2" Unit="teaspoon" Ingredient="ground black pepper (optional)" Heading="N" LinkType="Unlinked" IngredientID=""/>
        <RecipeIngredient Quantity="1" Unit="" Ingredient="egg" Heading="N" LinkType="Unlinked" IngredientID=""/>
        <RecipeIngredient Quantity="1" Unit="tablespoon" Ingredient="water" Heading="N" LinkType="Unlinked" IngredientID=""/>
        <RecipeIngredient Quantity="1" Unit="tablespoon" Ingredient="butter" Heading="N" LinkType="Unlinked" IngredientID=""/>
        <RecipeIngredient Quantity="2" Unit="cups" Ingredient="finely chopped mushrooms" Heading="N" LinkType="Unlinked" IngredientID=""/>
        <RecipeIngredient Quantity="1" Unit="" Ingredient="medium onion, finely chopped (about 1/2 cup)" Heading="N" LinkType="Unlinked" IngredientID=""/>
        <RecipeIngredient Quantity="2" Unit="tablespoons" Ingredient="all-purpose flour" Heading="N" LinkType="Unlinked" IngredientID=""/>
        <RecipeIngredient Quantity="1/2" Unit="" Ingredient="of a 17.3-ounce package Pepperidge Farm® Puff Pastry Sheets (1 sheet), thawed" Heading="N" LinkType="Unlinked" IngredientID=""/>
      </RecipeIngredients>
      <RecipeProcedures>
        <RecipeProcedure Heading="N">
          <ProcedureText>Heat the oven to 425°F.  Place the beef into a lightly greased roasting pan. Season with the black pepper, if desired.  Roast for 30 minutes or until an instant-read thermometer inserted into the beef reads 130°F.  Cover the pan and refrigerate for 1 hour.</ProcedureText>
        </RecipeProcedure>
....
<snip>
....
      </RecipeIngredients>
      <RecipeProcedures>
        <RecipeProcedure Heading="N">
          <ProcedureText>Preheat oven to 425°. Place potatoes in a large pot and pour in water to cover by 2". Season water generously with salt and bring to a simmer over medium-high heat. Reduce heat and simmer gently until potatoes are tender on the outside but still very firm in the center, 8–10 minutes.</ProcedureText>
        </RecipeProcedure>
        <RecipeProcedure Heading="N">
          <ProcedureText>Pour off water in pot, holding potatoes back. Toss potatoes just enough to rough up their outsides and give them a floury starchy coating (do not toss so vigorously that they fall apart); season with salt.</ProcedureText>
        </RecipeProcedure>
        <RecipeProcedure Heading="N">
          <ProcedureText>Meanwhile, combine both oils in a large roasting pan and heat in oven 10 minutes.</ProcedureText>
        </RecipeProcedure>
        <RecipeProcedure Heading="N">
          <ProcedureText>Carefully remove pan from oven; add potatoes, turning each one to coat and moisten exterior. Return pan to oven and roast potatoes, turning every 10 minutes, for 30 minutes.</ProcedureText>
        </RecipeProcedure>
      </RecipeProcedures>
      <RecipeImages>
        <RecipeImage FileName="image93.png" FileType="PNG" Description="" OriginalFileName="Recipe70300.png"/>
      </RecipeImages>
      <SourceImage FileName="image92.JPG" FileType="JPG" OriginalFileName="RecipeSource70300.JPG"/>
    </Recipe></Recipes
#NULL
Addict
Posts: 1497
Joined: Thu Aug 30, 2007 11:54 pm
Location: right here

Post by #NULL »

Not sure if this is of any help, but I see that PeekS() in the previous post is given a byte size instead of a char length, in which case the format flag should include #PB_ByteLength.
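
To illustrate the byte-size versus char-length point, here is a small sketch with a made-up sample string (not taken from the recipe files). The degree sign takes two bytes in UTF-8, so the byte count is one larger than the character count. The variable names are hypothetical.

```purebasic
; Sketch only: Chr(176) is the degree sign, which needs two bytes in UTF-8.
Define text$ = "Heat the oven to 425" + Chr(176) + "F."
Define chars.i = Len(text$)                        ; length in characters
Define bytes.i = StringByteLength(text$, #PB_UTF8) ; length in bytes, without the terminating null

Define *mem = AllocateMemory(bytes + 1)            ; +1 for the terminating null
If *mem
  PokeS(*mem, text$, -1, #PB_UTF8)

  Debug chars
  Debug bytes

  ; Length given in characters (the default meaning of the Length parameter)
  Debug PeekS(*mem, chars, #PB_UTF8)

  ; Length given in bytes, which is what ExportXMLSize() reports
  Debug PeekS(*mem, bytes, #PB_UTF8 | #PB_ByteLength)

  FreeMemory(*mem)
EndIf
```

With a byte size passed as a character length, PeekS() would be asked for more characters than the data holds, so it relies on finding a null byte to stop.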

Post by NicTheQuick »

#NULL wrote: Not sure if this is of any help but I see the PeekS() in the previous post is given a byte size instead of a char length in which case the format flag should include #PB_ByteLength
Good point, but it doesn't solve the issue either, because the content of the resulting file is also truncated.

Post by #NULL »

I wonder why PeekS() only works with UTF-8. That's the format to read, and PeekS() returns UTF-16, i.e. a standard PB string, right? So why does ExportXML() encode as UTF-8?

Post by NicTheQuick »

#NULL wrote: I wonder why the PeekS only works with UTF8. That's the format to read and PeekS returns utf16, i.e. standard PB string, right? So why does ExportXML() encode as utf8?
You can define the encoding with CreateXML(). All operations will then work with this encoding internally. ExportXML() then exports the XML tree in the given encoding. It would make no sense to encode it in the format the program was compiled with because normally you want to write it to a file.

Post by #NULL »

I understand. I missed the first line.

Post by #NULL »

I can reproduce it with the following. The output differs if you remove any of the line separator HTML entities in the Comments attribute. According to the Linux 'file' command, the original Data.xml had mixed line separators, CRLF and LF. But I don't know if that was because of the HTML entities or if it matters at all. The XML file is UTF-8 with BOM.

Code: Select all

<?xml version="1.0" encoding="utf-8"?>
<hixz Source="Living Cookbook 4.0" FileVersion="1.0" date="2013-09-21">
  <Cookbooks>
    <Cookbook Name="Imported" ID="29" />
  </Cookbooks>
  <CookbookChapters>
    <CookbookChapter Name="Bisques, Chili, Chowders, Gumbows, Soups & Stews" ID="364" CookbookID="29" ParentChapterID="0" />
    <CookbookChapter Name="Stews *" ID="365" CookbookID="29" ParentChapterID="364" />
  </CookbookChapters>
  <Recipes>
    <Recipe Name="Chickpea Stew" ID="11007" CookbookID="29" CookbookChapterID="365" Comments="rice.&#xD;&#xA;Garam masala " Source="Williams-Sonoma" WebPage="http://www.williams-sonoma.com/recipe/chickpea-stew.html?cm_src=RECIPESEARCH" CreateDate="2012-02-19" ColorFlag="<None>">
    </Recipe>
  </Recipes>
</hixz>

Code: Select all

xml = LoadXML(#PB_Any, "/home/user/Downloads/Recipes/Bisques, Chili, Chowders, Gumbows, Soups & Stews.fdxz_FILES/Data_2.xml")

Define size.i = ExportXMLSize(xml) ; + 100
Debug size
Define *buffer = AllocateMemory(size + 100)
Debug *buffer
If ExportXML(xml, *buffer, size)
  ;ShowMemoryViewer(*buffer, 10)
  ;Debug Right(PeekS(*buffer, size, #PB_UTF8), 150)
  ;Debug PeekS(*buffer, size, #PB_Unicode | #PB_ByteLength)
  Debug PeekS(*buffer, -1, #PB_UTF8)
  ;Debug PeekS(*buffer)
EndIf
FreeMemory(*buffer)

Post by NicTheQuick »

Nice. Thank you. Then this should be a good minimal example for Fred to check.
normeus
Enthusiast
Posts: 470
Joined: Fri Apr 20, 2012 8:09 pm

Post by normeus »

On Windows 7 with PB 5.62 x86, it works if you export using "#PB_XML_NoDeclaration" (the sample code from #NULL works anyway):

Code: Select all

If ExportXML(xml, *buffer, size,#PB_XML_NoDeclaration)
The input file was a UTF-8 text file without a BOM.

Norm.
google Translate;Makes my jokes fall flat- Fait mes blagues tombent à plat- Machte meine Witze verpuffen- Eh cumpari ci vo sunari

Post by NicTheQuick »

I am testing on Ubuntu 18.04 x64.

Post by NicTheQuick »

Fred, can you confirm the bug?