ExportXML() missing characters
Posted: Mon Dec 17, 2018 5:31 pm
by NicTheQuick
Hi,
I am working on a simple XML merger which reads lots of XML files and merges them into one. At the end I write the merged XML into a memory buffer like this:
Code:
size = ExportXMLSize(*mData)
*buffer = AllocateMemory(size)
If ExportXML(*mData, *buffer, size)
  Debug PeekS(*buffer, -1, #PB_UTF8)
  AddPackMemory(*mZip, *buffer, size, "Data.xml")
EndIf
FreeMemory(*buffer)
But something is missing at the end. The resulting Data.xml should end like this:
But this happens:
The same happens when I use #PB_XML_StringFormat to write a BOM, or #PB_XML_NoDeclaration to omit the declaration.
I also tried making the buffer 1000 bytes bigger, but it did not change anything:
Code:
size = ExportXMLSize(*mData) + 1000
There is something wrong with ExportXML(), maybe an issue with UTF-8 or something completely different. Unfortunately, there seems to be no workaround for this. If examples are needed, I can send them to the developers. I was not able to reproduce the issue with small examples.
Some things I do that might be relevant:
- I use CopyXMLNode() to copy nodes from other XML trees to the *mData tree
- I change attributes on some nodes in the source tree before I copy them to *mData
- *mData is a completely fresh tree, created with CreateXML(#PB_Any, #PB_UTF8)
- The other trees were created using CatchXML() and the default encoding
Re: [PB5.7 B4] ExportXML() missing characters
Posted: Mon Dec 17, 2018 5:35 pm
by Fred
You should call AllocateMemory() with size+1, or pass 'size' to PeekS() instead of -1, because with -1 it looks for the null char. ExportXMLSize() doesn't include the null char, IIRC. Do you have code to reproduce it?
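A minimal sketch of both suggestions, using a small, hypothetical tree in place of the thread's *mData (the node name "pb" and the attribute "note" are made up for illustration). It assumes, as recalled above, that ExportXMLSize() does not count a terminating null byte. The multi-byte "ü" and "ß" show why a byte count passed to PeekS() needs #PB_ByteLength rather than being read as a character count.

```purebasic
; Hypothetical tree standing in for *mData from the first post
xml = CreateXML(#PB_Any, #PB_UTF8)
If xml
  *root = CreateXMLNode(RootXMLNode(xml), "pb")
  SetXMLAttribute(*root, "note", "Grüße")

  size = ExportXMLSize(xml)

  ; Suggestion 1: reserve one extra byte. AllocateMemory() zero-fills the
  ; block by default, so that byte stays 0 and PeekS(..., -1) stops there.
  *buffer = AllocateMemory(size + 1)
  If *buffer
    If ExportXML(xml, *buffer, size)
      viaTerminator$ = PeekS(*buffer, -1, #PB_UTF8)

      ; Suggestion 2: pass the exact byte count instead of -1.
      ; #PB_ByteLength makes PeekS() treat the length as bytes, not characters.
      viaLength$ = PeekS(*buffer, size, #PB_UTF8 | #PB_ByteLength)

      Debug viaTerminator$
      Debug viaLength$
    EndIf
    FreeMemory(*buffer)
  EndIf
  FreeXML(xml)
EndIf
```

Neither variant changes what ExportXML() writes into the buffer, so both only affect how the buffer is read back with PeekS(), not a truncated export.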
Re: [PB5.7 B4] ExportXML() missing characters
Posted: Mon Dec 17, 2018 5:47 pm
by NicTheQuick
PeekS() is not the main problem here; I used it for debugging only. But I get your point.
The created file inside the ZIP container was also truncated.
I'll try to create a bigger example that shows the issue.
Re: [PB5.7 B4] ExportXML() missing characters
Posted: Tue Dec 18, 2018 10:31 am
by NicTheQuick
I tried a random XML generator, but I cannot reproduce the issue this way.
Code:
*xml = CreateXML(#PB_Any, #PB_UTF8)
Procedure addRandomChildren(*node, depth.i = 1, currentDepth.i = 0)
  currentDepth + 1
  If depth <= currentDepth
    ProcedureReturn
  EndIf
  Protected i.i = Random(6) + 1
  While i
    SetXMLAttribute(*node, "attr" + i, "value" + i)
    i - 1
  Wend
  i.i = Random(6) + 1
  While i
    addRandomChildren(CreateXMLNode(*node, "div" + i), depth, currentDepth)
    i - 1
  Wend
EndProcedure
*rootNode = CreateXMLNode(RootXMLNode(*xml), "pb")
addRandomChildren(*rootNode, 10)
Define size.i = ExportXMLSize(*xml)
Debug size
Define *buffer = AllocateMemory(size)
If ExportXML(*xml, *buffer, size)
  Debug Right(PeekS(*buffer, size, #PB_UTF8), 150)
EndIf
FreeMemory(*buffer)
At the moment I don't know whether I am allowed to post the XML files I am working with to the forum. If I am, I can show you the issue better.
Re: [PB5.7 B4] ExportXML() missing characters
Posted: Tue Dec 18, 2018 12:16 pm
by NicTheQuick
Hi Fred,
in this archive you will find some zipped recipes and a small PureBasic program which opens all of these files and loads the Data.xml inside each of them. Then all these XML trees are traversed; some parts are changed and some are copied to a freshly created XML file, which is a merged version of all the recipes. At the end (search for "BUG") the merged XML tree is written to a memory block and zipped again. For debugging purposes I use PeekS() to see what the XML looks like, and there you will hopefully see the error at the end.
Here is the link to the data:
https://cloud.goeddel.net/index.php/s/TLQz2WGzrMApKJQ
And here is the resulting XML tree I get on my machine. Because it has 2500+ lines, I removed the middle.
Code:
hixz Source="Merged Cookbook" FileVersion="1.0" date="2018-12-17"><Cookbooks/><CookbookChapters/><Recipes><Recipe Name="Beef Wellington" ID="70803" CookbookID="52" CookbookChapterID="0" Comments="Looking for an impressive dish for the holidays? It takes a little time to put this dish together, but it's simple to do...and the results are outstanding and delicious." Servings="10" PreparationTime="15" OvenTemperatureF="425" OvenTemperatureC="218" Source="Puff Pastry.com" WebPage="https://www.puffpastry.com/recipe/beef-wellington/" UserData4="Green beans amandine. For dessert serve cheesecake topped with sliced strawberries." CreateDate="2018-12-10" ColorFlag="<None>">
<RecipeIngredients>
<RecipeIngredient Quantity="2 1/2" Unit="" Ingredient="pounds center cut beef tenderloin" Heading="N" LinkType="Unlinked" IngredientID=""/>
<RecipeIngredient Quantity="1/2" Unit="teaspoon" Ingredient="ground black pepper (optional)" Heading="N" LinkType="Unlinked" IngredientID=""/>
<RecipeIngredient Quantity="1" Unit="" Ingredient="egg" Heading="N" LinkType="Unlinked" IngredientID=""/>
<RecipeIngredient Quantity="1" Unit="tablespoon" Ingredient="water" Heading="N" LinkType="Unlinked" IngredientID=""/>
<RecipeIngredient Quantity="1" Unit="tablespoon" Ingredient="butter" Heading="N" LinkType="Unlinked" IngredientID=""/>
<RecipeIngredient Quantity="2" Unit="cups" Ingredient="finely chopped mushrooms" Heading="N" LinkType="Unlinked" IngredientID=""/>
<RecipeIngredient Quantity="1" Unit="" Ingredient="medium onion, finely chopped (about 1/2 cup)" Heading="N" LinkType="Unlinked" IngredientID=""/>
<RecipeIngredient Quantity="2" Unit="tablespoons" Ingredient="all-purpose flour" Heading="N" LinkType="Unlinked" IngredientID=""/>
<RecipeIngredient Quantity="1/2" Unit="" Ingredient="of a 17.3-ounce package Pepperidge Farm® Puff Pastry Sheets (1 sheet), thawed" Heading="N" LinkType="Unlinked" IngredientID=""/>
</RecipeIngredients>
<RecipeProcedures>
<RecipeProcedure Heading="N">
<ProcedureText>Heat the oven to 425°F. Place the beef into a lightly greased roasting pan. Season with the black pepper, if desired. Roast for 30 minutes or until an instant-read thermometer inserted into the beef reads 130°F. Cover the pan and refrigerate for 1 hour.</ProcedureText>
</RecipeProcedure>
....
<snip>
....
</RecipeIngredients>
<RecipeProcedures>
<RecipeProcedure Heading="N">
<ProcedureText>Preheat oven to 425°. Place potatoes in a large pot and pour in water to cover by 2". Season water generously with salt and bring to a simmer over medium-high heat. Reduce heat and simmer gently until potatoes are tender on the outside but still very firm in the center, 8–10 minutes.</ProcedureText>
</RecipeProcedure>
<RecipeProcedure Heading="N">
<ProcedureText>Pour off water in pot, holding potatoes back. Toss potatoes just enough to rough up their outsides and give them a floury starchy coating (do not toss so vigorously that they fall apart); season with salt.</ProcedureText>
</RecipeProcedure>
<RecipeProcedure Heading="N">
<ProcedureText>Meanwhile, combine both oils in a large roasting pan and heat in oven 10 minutes.</ProcedureText>
</RecipeProcedure>
<RecipeProcedure Heading="N">
<ProcedureText>Carefully remove pan from oven; add potatoes, turning each one to coat and moisten exterior. Return pan to oven and roast potatoes, turning every 10 minutes, for 30 minutes.</ProcedureText>
</RecipeProcedure>
</RecipeProcedures>
<RecipeImages>
<RecipeImage FileName="image93.png" FileType="PNG" Description="" OriginalFileName="Recipe70300.png"/>
</RecipeImages>
<SourceImage FileName="image92.JPG" FileType="JPG" OriginalFileName="RecipeSource70300.JPG"/>
</Recipe></Recipes
Re: [PB5.7 B4] ExportXML() missing characters
Posted: Tue Dec 18, 2018 2:01 pm
by #NULL
Not sure if this is of any help but I see the PeekS() in the previous post is given a byte size instead of a char length in which case the format flag should include #PB_ByteLength?
Re: [PB5.7 B4] ExportXML() missing characters
Posted: Tue Dec 18, 2018 2:21 pm
by NicTheQuick
#NULL wrote:Not sure if this is of any help but I see the PeekS() in the previous post is given a byte size instead of a char length in which case the format flag should include #PB_ByteLength
Good point, but it doesn't solve the issue, because the content of the resulting file is truncated as well.
Re: [PB5.7 B4] ExportXML() missing characters
Posted: Tue Dec 18, 2018 2:35 pm
by #NULL
I wonder why the PeekS only works with UTF8. That's the format to read and PeekS returns utf16, i.e. standard PB string, right? So why does ExportXML() encode as utf8?
Re: [PB5.7 B4] ExportXML() missing characters
Posted: Tue Dec 18, 2018 2:42 pm
by NicTheQuick
#NULL wrote:I wonder why the PeekS only works with UTF8. That's the format to read and PeekS returns utf16, i.e. standard PB string, right? So why does ExportXML() encode as utf8?
You can define the encoding with CreateXML(). All operations will then work with this encoding internally. ExportXML() then exports the XML tree in the given encoding. It would make no sense to encode it in the format the program was compiled with because normally you want to write it to a file.
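A sketch of that file-oriented workflow, assuming a hypothetical output file named merged.xml in the temporary directory. Because the tree is created with #PB_UTF8, the buffer filled by ExportXML() already holds UTF-8 bytes and can be written to disk unchanged with WriteData(), without a PeekS() round-trip through a native PB string.

```purebasic
; The tree's encoding is fixed when it is created
xml = CreateXML(#PB_Any, #PB_UTF8)
If xml
  CreateXMLNode(RootXMLNode(xml), "pb")

  size = ExportXMLSize(xml)
  *buffer = AllocateMemory(size)
  If *buffer
    ; The buffer receives UTF-8 bytes, not PB's native string format
    If ExportXML(xml, *buffer, size)
      file$ = GetTemporaryDirectory() + "merged.xml" ; hypothetical file name
      If CreateFile(0, file$)
        WriteData(0, *buffer, size) ; write the exported bytes as-is
        CloseFile(0)
      EndIf
    EndIf
    FreeMemory(*buffer)
  EndIf
  FreeXML(xml)
EndIf
```

Writing the raw bytes keeps the file in exactly the encoding chosen in CreateXML(), which is why no conversion step is needed here.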
Re: [PB5.7 B4] ExportXML() missing characters
Posted: Tue Dec 18, 2018 2:48 pm
by #NULL
I understand. I missed the first line.
Re: [PB5.7 B4] ExportXML() missing characters
Posted: Tue Dec 18, 2018 4:38 pm
by #NULL
I can reproduce it with the following. The output differs if you remove any of the line-separator HTML entities in the Comments attribute. According to the Linux 'file' command, the original Data.xml had mixed CRLF and LF line separators, but I don't know whether that was because of the HTML entities or whether it matters at all. The XML file is UTF-8 with BOM.
Code:
<?xml version="1.0" encoding="utf-8"?>
<hixz Source="Living Cookbook 4.0" FileVersion="1.0" date="2013-09-21">
<Cookbooks>
<Cookbook Name="Imported" ID="29" />
</Cookbooks>
<CookbookChapters>
<CookbookChapter Name="Bisques, Chili, Chowders, Gumbows, Soups &amp; Stews" ID="364" CookbookID="29" ParentChapterID="0" />
<CookbookChapter Name="Stews *" ID="365" CookbookID="29" ParentChapterID="364" />
</CookbookChapters>
<Recipes>
<Recipe Name="Chickpea Stew" ID="11007" CookbookID="29" CookbookChapterID="365" Comments="rice.
Garam masala " Source="Williams-Sonoma" WebPage="http://www.williams-sonoma.com/recipe/chickpea-stew.html?cm_src=RECIPESEARCH" CreateDate="2012-02-19" ColorFlag="&lt;None&gt;">
</Recipe>
</Recipes>
</hixz>
Code:
xml = LoadXML(#PB_Any, "/home/user/Downloads/Recipes/Bisques, Chili, Chowders, Gumbows, Soups & Stews.fdxz_FILES/Data_2.xml")
Define size.i = ExportXMLSize(xml) ; + 100
Debug size
Define *buffer = AllocateMemory(size + 100)
Debug *buffer
If ExportXML(xml, *buffer, size)
  ;ShowMemoryViewer(*buffer, 10)
  ;Debug Right(PeekS(*buffer, size, #PB_UTF8), 150)
  ;Debug PeekS(*buffer, size, #PB_Unicode | #PB_ByteLength)
  Debug PeekS(*buffer, -1, #PB_UTF8)
  ;Debug PeekS(*buffer)
EndIf
FreeMemory(*buffer)
Re: [PB5.7 B4] ExportXML() missing characters
Posted: Tue Dec 18, 2018 5:10 pm
by NicTheQuick
Nice. Thank you. Then this should be a good minimal example for Fred to check.
Re: [PB5.7 B4] ExportXML() missing characters
Posted: Tue Dec 18, 2018 8:08 pm
by normeus
On Windows 7 with PB 5.62 x86 it works if you export using #PB_XML_NoDeclaration (the sample code from #NULL works anyway):
Code:
If ExportXML(xml, *buffer, size,#PB_XML_NoDeclaration)
The input file was a UTF-8 text file without a BOM.
Norm.
Re: [PB5.7 B4] ExportXML() missing characters
Posted: Tue Dec 18, 2018 10:18 pm
by NicTheQuick
I am testing on Ubuntu 18.04 x64.
Re: [PB5.7 B4] ExportXML() missing characters
Posted: Sat Dec 22, 2018 2:20 pm
by NicTheQuick
Fred, can you confirm the bug?