Hi folks,
is it possible to load XML as a stream? It seems that
LoadXML() always loads the whole file. I have a 1.8 GB file with
data from OpenStreetMap and would like to parse the ways and nodes
of buildings, but LoadXML() crashes after 6 hours.
Unfortunately, the amount of data cannot be reduced unless I load
the file with ReadFile() and parse everything with FindString() etc.
What is recommended for bigger XML files?
XML and big files
"Daddy, I'll run faster, then it is not so far..."
Re: XML and big files
You can open the file with ReadFile() and load it in blocks with ReadData(), for example 10 MB at a time.
Then you can use CatchXML(#XML, *Address, Length [, Flags [, Encoding]]) with one of these flags:
#PB_XML_StreamStart
#PB_XML_StreamNext
#PB_XML_StreamEnd
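A minimal, untested sketch of how the block-wise loading described above might look. The file name, the block size, and the way the last block is detected are assumptions for illustration. It also assumes the file spans at least two blocks, so that the first block can be passed with #PB_XML_StreamStart and the last one with #PB_XML_StreamEnd. Note that the parsed tree is still held in memory as a whole once loading has finished.

```purebasic
; Sketch only: block-wise loading via the CatchXML() streaming flags.
; Assumption: the file spans at least two blocks, so the first block
; gets #PB_XML_StreamStart and the last one #PB_XML_StreamEnd.
#BlockSize = 10 * 1024 * 1024 ; 10 MB per block (illustrative value)

If ReadFile(0, "c:\test.xml") ; hypothetical path
  *Buffer = AllocateMemory(#BlockSize)
  If *Buffer
    BlockIndex = 0
    While Not Eof(0)
      BytesRead = ReadData(0, *Buffer, #BlockSize)
      If BlockIndex = 0
        Flag = #PB_XML_StreamStart ; first block
      ElseIf Eof(0)
        Flag = #PB_XML_StreamEnd   ; nothing left to read: last block
      Else
        Flag = #PB_XML_StreamNext  ; any block in between
      EndIf
      CatchXML(0, *Buffer, BytesRead, Flag)
      BlockIndex + 1
    Wend
    If IsXML(0)
      ; check the overall result once all blocks have been passed in
      If XMLStatus(0) = #PB_XML_Success
        Debug "Root node: " + GetXMLNodeName(MainXMLNode(0))
      Else
        Debug "XML error: " + XMLError(0)
      EndIf
      FreeXML(0)
    EndIf
    FreeMemory(*Buffer)
  EndIf
  CloseFile(0)
Else
  Debug "Cannot open file"
EndIf
```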
PB 6.01 ― Win 10, 21H2 ― Ryzen 9 3900X, 32 GB ― NVIDIA GeForce RTX 3080 ― Vivaldi 6.0 ― www.unionbytes.de
Lizard - Script language for symbolic calculations and more ― Typeface - Sprite-based font include/module
Re: XML and big files
Cooool!!!
Thank you, STARGÅTE!

"Daddy, I'll run faster, then it is not so far..."
Re: XML and big files
Note that while this reads the input in blocks, the previously scanned data still remains in memory so at the end you have the entire XML file in memory. With such a large file that is not practical.
You can use the expat parser directly: It is a "streaming parser", which means you register callbacks for the information that you need and then you can parse the file in blocks and don't need to keep the entire file around.
The expat functions are available directly in PB with a "pb_" prefix.
The documentation is available here: http://expat.cvs.sourceforge.net/viewvc ... rence.html
Here is a quick example:
; Expat returns UTF8-Strings in ascii mode and unicode strings in unicode mode
CompilerIf #PB_Compiler_Unicode
  Macro PeekExpat(ptr)
    PeekS(ptr)
  EndMacro
CompilerElse
  Macro PeekExpat(ptr)
    PeekS(ptr, -1, #PB_UTF8)
  EndMacro
CompilerEndIf

ProcedureC StartElementHandler(user_data, *name, *args)
  Debug "Start: " + PeekExpat(*name)
  ; Attribute values are an array of pointers with alternating name and value entries
  ; Terminated by null pointer
  *arg.INTEGER = *args
  While *arg\i <> 0
    Name$ = PeekExpat(*arg\i)
    *arg + SizeOf(Integer)
    Value$ = PeekExpat(*arg\i)
    *arg + SizeOf(Integer)
    Debug " " + Name$ + "=" + Value$
  Wend
EndProcedure

ProcedureC EndElementHandler(user_data, *name)
  Debug "End: " + PeekExpat(*name)
EndProcedure

If ReadFile(0, "c:\test.xml")
  ; initialize parser
  Parser = pb_XML_ParserCreate_(0)
  pb_XML_SetStartElementHandler_(Parser, @StartElementHandler())
  pb_XML_SetEndElementHandler_(Parser, @EndElementHandler())
  ; block size for streaming. this is very small as an example. Use something larger like 1Mb here for real files!
  BufferSize = 20
  *Buffer = AllocateMemory(BufferSize)
  While Not Eof(0)
    BytesRead = ReadData(0, *Buffer, BufferSize)
    If BytesRead > 0
      If pb_XML_Parse_(Parser, *Buffer, BytesRead, #False) = #XML_STATUS_ERROR
        ; parser error (message is in ascii)
        Debug "Parser Error (Line " + Str(pb_XML_GetCurrentLineNumber_(Parser)) + "): " + PeekS(pb_XML_ErrorString_(pb_XML_GetErrorCode_(Parser)), -1, #PB_Ascii)
        Break
      EndIf
    EndIf
  Wend
  ; important: finish the parsing process
  pb_XML_Parse_(Parser, *Buffer, 0, #True)
  pb_XML_ParserFree_(Parser)
  FreeMemory(*Buffer)
  CloseFile(0)
Else
  Debug "Cannot open file"
EndIf
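For the OpenStreetMap case from the first post, the same callbacks could filter while parsing instead of printing everything. The following is an untested sketch, not part of the original example. It uses only the expat calls shown above and assumes the usual OSM XML layout, in which a <way> element carries <tag k="..." v="..."/> children and a building is marked by a tag with k="building". The global variable names, the file path and the 1 MB block size are illustrative.

```purebasic
; Sketch only: count OSM ways tagged as buildings while streaming.
CompilerIf #PB_Compiler_Unicode
  Macro PeekExpat(ptr)
    PeekS(ptr)
  EndMacro
CompilerElse
  Macro PeekExpat(ptr)
    PeekS(ptr, -1, #PB_UTF8)
  EndMacro
CompilerEndIf

Global InWay, WayIsBuilding, BuildingCount ; hypothetical state variables

ProcedureC StartElementHandler(user_data, *name, *args)
  Protected Name$ = PeekExpat(*name)
  Protected *arg.INTEGER = *args
  Protected Key$, Value$
  If Name$ = "way"
    InWay = #True
    WayIsBuilding = #False
  ElseIf Name$ = "tag" And InWay
    ; attributes alternate name, value, ... and end with a null pointer
    While *arg\i <> 0
      Key$ = PeekExpat(*arg\i)
      *arg + SizeOf(Integer)
      Value$ = PeekExpat(*arg\i)
      *arg + SizeOf(Integer)
      If Key$ = "k" And Value$ = "building"
        WayIsBuilding = #True
      EndIf
    Wend
  EndIf
EndProcedure

ProcedureC EndElementHandler(user_data, *name)
  If PeekExpat(*name) = "way"
    If WayIsBuilding
      BuildingCount + 1
    EndIf
    InWay = #False
  EndIf
EndProcedure

If ReadFile(0, "c:\test.xml") ; hypothetical path
  Parser = pb_XML_ParserCreate_(0)
  pb_XML_SetStartElementHandler_(Parser, @StartElementHandler())
  pb_XML_SetEndElementHandler_(Parser, @EndElementHandler())
  BufferSize = 1024 * 1024 ; 1 MB blocks
  *Buffer = AllocateMemory(BufferSize)
  Failed = #False
  While Not Eof(0)
    BytesRead = ReadData(0, *Buffer, BufferSize)
    If BytesRead > 0
      If pb_XML_Parse_(Parser, *Buffer, BytesRead, #False) = #XML_STATUS_ERROR
        Debug "Parser Error (Line " + Str(pb_XML_GetCurrentLineNumber_(Parser)) + ")"
        Failed = #True
        Break
      EndIf
    EndIf
  Wend
  If Not Failed
    pb_XML_Parse_(Parser, *Buffer, 0, #True) ; finish the parsing process
    Debug "Building ways: " + Str(BuildingCount)
  EndIf
  pb_XML_ParserFree_(Parser)
  FreeMemory(*Buffer)
  CloseFile(0)
Else
  Debug "Cannot open file"
EndIf
```

Because only three integers of state are kept, memory use should stay roughly constant regardless of the file size. Collecting the node references of each building way would need additional storage, for example a list filled from the <nd> children.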
quidquid Latine dictum sit altum videtur
Re: XML and big files
Thank you, freak, it fits my needs.

"Daddy, I'll run faster, then it is not so far..."
Re: XML and big files
Thank you, freak, for this trick. It is very helpful (and very much needed when dealing with big XML). Small question: is this cross-platform?