XML structure generator

Lunasole
Addict
Posts: 1091
Joined: Mon Oct 26, 2015 2:55 am
Location: UA

XML structure generator

Post by Lunasole

Hi.
It's dirty, quickly made, and so on ^^
But it's still better than declaring structures by hand.

* How to use?
1) Save the code to a file.
2) Put your XML into an "xml.txt" file next to it.
3) Run the code and copy the structure declarations you need from the Debug output.
4) Use ExtractXMLStructure() with those declarations to make XML parsing a bit easier.

Also check the code for things you may want to change.
For now it can't build one structure describing the whole XML tree; it just produces a lot of separate structures.
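For step 4, here is a minimal sketch of feeding such declarations to ExtractXMLStructure(). The structure names, field names and sample XML are made up for illustration; since the generator prints separate structures, the nesting (a field whose type is another structure) is added by hand here. It assumes the field names exactly match the XML element names, which is how the generator names them.

```purebasic
EnableExplicit

; Hypothetical declarations: shaped like the generator output,
; but nested by hand (XML_PRICE is used as the type of a field)
Structure XML_PRICE
  amount.f
  currency.s
EndStructure

Structure XML_ITEM
  title.s
  price.XML_PRICE
EndStructure

Define Item.XML_ITEM
Define xml$ = "<item><title>Apple</title><price><amount>1.5</amount><currency>EUR</currency></price></item>"

If ParseXML(0, xml$)
  If XMLStatus(0) = #PB_XML_Success
    ; MainXMLNode() returns <item>; its child elements are matched to the
    ; structure fields by name, and <price> fills the nested XML_PRICE field
    ExtractXMLStructure(MainXMLNode(0), @Item, XML_ITEM)
    Debug Item\title
    Debug Item\price\currency
    Debug StrF(Item\price\amount, 2)
  Else
    Debug "XML error: " + XMLError(0)
  EndIf
  FreeXML(0)
EndIf
```

The values here live in element text; data kept in XML attributes has to be read separately.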

PS. Maybe something better already exists; I just didn't find it, and it was simpler to make my own than to search for a long time.

UPDATE: the code below is outdated; here is a UI tool instead:
http://geocities.ws/lunasole/data/_4pb/xmlstgen/l

Code:

EnableExplicit

;{ XML and such stuff }
	
	Structure ITEM 
		Name$
		Params$
	EndStructure
	
	; Returns type of Data contained in Dat$
	; RETURN:		detected type, represented as string with PB code
	Procedure$ XmlContentType(Dat$, NoDetect = #False)
		; if disabled detection
		If NoDetect
			ProcedureReturn ".s"
		EndIf
		
		; analyze data and try to detect its type
		Protected t, L = Len(dat$)
		Protected c$
		
		Protected isFloat, isInt, isStr
		Protected tVal.q
		
		For T = 1 To L
			c$ = Mid(Dat$, T, 1)
			
			Select c$
				Case "0", "1", "2", "3", "4", "5", "6", "7", "8", "9":
					isInt + 1
				Case ".", ",":
					; the float value expected to start from a digit, not from ".", ","
					If isInt >= 1
						isFloat + 1
					Else
						isStr + 1
					EndIf
				Case "-":
					; "-" is allowed only as the first char of a numeric value;
					; it is not counted as a digit, so a lone "-" ends up as a string
					If T <> 1
						isStr + 1
					EndIf
					
				Default
					isStr + 1
			EndSelect
			
		Next T
		
		; the value can't be int or float if it contains any non-numeric char
		If isStr
			ProcedureReturn ".s"
			
			; the value can be float if it has digits (optionally with a leading "-" sign) and exactly one separator ("," or ".")
		ElseIf isInt And isFloat = 1
			;ProcedureReturn ".d"
			ProcedureReturn ".f"
			
			; if it contains only digits (no separator), it is an int value
		ElseIf isInt And isFloat = 0
			;ProcedureReturn ".l"
			ProcedureReturn ".q"
			
			; empty value, or more than one separator (e.g. "1.2.3"): let it be string
		Else
			ProcedureReturn ".s"
		EndIf
	EndProcedure
	
	
	; Recursively goes through XML tree and collects data types
	; Node			any node of XML object to start from it
	; ChildID		used internally
	; Types()		list to receive results
	; NoDetect  	if True, string type is used for all values (instead of attempt to detect data type)
	; RETURN:		none, list modified
	Procedure XmlScanRecursive (Node, ChildID, List Types.ITEM(), NoDetect = #False)
		Static XmlDbgRegExp, Dim RegRes.s(0)
		Protected CCount = XMLChildCount (Node), NNode = NextXMLNode (Node), NType = XMLNodeType (Node) 
		
		If NType = #PB_XML_Root 
			; init
			ClearList(Types())
			XmlDbgRegExp = CreateRegularExpression(#PB_Any, "(?s)([^ \f\n\r\t\v]|.*).*[^ \f\n\r\t\v]") ; probably this still needed ^^
		Else
			; check if it is list
			; 		Define isList 
			; 		If ExamineXMLAttributes(Node)
			; 			While NextXMLAttribute(Node)
			; 				If LCase(XMLAttributeName(Node)) = "list" And LCase(XMLAttributeValue(Node)) = "true"
			;  					Debug GetXMLNodeName (Node) + Str(CCount)
			; 					Break
			; 				EndIf
			; 			Wend
			; 		EndIf
			
			If CCount
				; add new type
				AddElement(Types())
				Types()\Name$ = "XML_" + UCase(GetXMLNodeName (Node))
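			ElseIf NType = #PB_XML_Normal And ListIndex(Types()) >= 0
				; add new param to the current type
				; (comments and other non-element nodes, or leaves with no parent type yet, are skipped)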
				ExtractRegularExpression(XmlDbgRegExp, GetXMLNodeText (Node), RegRes())
				Types()\Params$ + #TAB$ + GetXMLNodeName (Node) + XmlContentType(RegRes(0), NoDetect) + #CRLF$
			EndIf
		EndIf
		
		; counting items on current level
		ChildID + 1
		If CCount >= ChildID 
			; go deeper
			PushListPosition(Types())
			XmlScanRecursive (ChildXMLNode (Node), ChildID, Types(), NoDetect)
			PopListPosition(Types())
		EndIf
		; continue at the current depth
		If NNode
			XmlScanRecursive(NNode, 0, Types(), NoDetect)
		EndIf
		
		; fin
		If NType = #PB_XML_Root
			FreeRegularExpression(XmlDbgRegExp)
		EndIf
	EndProcedure
	
;}


; main
Define NewList Res.ITEM()
Define NewMap Tmp()

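; load and analyze xml from file ("xml.txt", as described above)
Define tXML = LoadXML(#PB_Any, "xml.txt")
If tXML = 0
	Debug "Can't open xml.txt"
	End
ElseIf XMLStatus(tXML) <> #PB_XML_Success
	Debug "XML error: " + XMLError(tXML) + " (line " + Str(XMLErrorLine(tXML)) + ")"
	End
EndIf
XmlScanRecursive (RootXMLNode (tXML), 0, Res(), #False)
FreeXML(tXML)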

; display found structure declarations
; TODO: nested structures, lists/arrays and so on
Define tHash$
SortStructuredList(Res(), #PB_Sort_Ascending, OffsetOf(ITEM\Name$), #PB_String)
ForEach Res()
	If Res()\Name$ And Res()\Params$
		; also ignore duplicates
		tHash$ = Res()\Name$ + Chr(1) + Res()\Params$
		If FindMapElement(Tmp(), tHash$) = 0
			Debug "Structure " + Res()\Name$
			Debug Res()\Params$ + "EndStructure" + #CRLF$
			Tmp(tHash$) = #True
		EndIf
	EndIf
Next Res()
Last edited by Lunasole on Sat Apr 15, 2017 4:57 am, edited 3 times in total.
"W̷i̷s̷h̷i̷n̷g o̷n a s̷t̷a̷r"
Lunasole

Re: XML structure generator

Post by Lunasole

Fixed a bug in the removal of duplicate declarations.

However, I'm still too lazy to turn this code into a finished tool (I use it too rarely), so that's it for now ^^
Kwai chang caine
Always Here
Posts: 5342
Joined: Sun Nov 05, 2006 11:42 pm
Location: Lyon - France

Re: XML structure generator

Post by Kwai chang caine

Cool, I was just trying to analyse and use Excel XLSX XML, and I found your code. :shock:
A little bit late :oops:
And it gives a very interesting result.
Thanks for sharing 8)

For this input:

Code:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
	<fonts count="6">
		<font>
			<sz val="11"/>
			<color theme="1"/>
			<name val="Calibri"/>
			<family val="2"/>
			<scheme val="minor"/>
		</font>
		<font>
			<sz val="11"/>
			<color theme="0"/>
			<name val="Calibri"/>
			<family val="2"/>
			<scheme val="minor"/>
		</font>
		<font>
			<sz val="11"/>
			<color rgb="FFFFFF00"/>
			<name val="Calibri"/>
			<family val="2"/>
			<scheme val="minor"/>
		</font>
		<font>
			<sz val="11"/>
			<color rgb="FFFF0000"/>
			<name val="Calibri"/>
			<family val="2"/>
			<scheme val="minor"/>
		</font>
		<font>
			<sz val="11"/>
			<color theme="6" tint="-0.499984740745262"/>
			<name val="Calibri"/>
			<family val="2"/>
			<scheme val="minor"/>
		</font>
		<font>
			<sz val="11"/>
			<color theme="9" tint="-0.499984740745262"/>
			<name val="Calibri"/>
			<family val="2"/>
			<scheme val="minor"/>
		</font>
	</fonts>
	<fills count="7">
		<fill>
			<patternFill patternType="none"/>
		</fill>
		<fill>
			<patternFill patternType="gray125"/>
		</fill>
		<fill>
			<patternFill patternType="solid">
				<fgColor rgb="FFFFFF00"/>
				<bgColor indexed="64"/>
			</patternFill>
		</fill>
		<fill>
			<patternFill patternType="solid">
				<fgColor theme="3"/>
				<bgColor indexed="64"/>
			</patternFill>
		</fill>
		<fill>
			<patternFill patternType="solid">
				<fgColor rgb="FFFF0000"/>
				<bgColor indexed="64"/>
			</patternFill>
		</fill>
		<fill>
			<patternFill patternType="solid">
				<fgColor theme="9" tint="0.39997558519241921"/>
				<bgColor indexed="64"/>
			</patternFill>
		</fill>
		<fill>
			<patternFill patternType="solid">
				<fgColor theme="6"/>
				<bgColor indexed="64"/>
			</patternFill>
		</fill>
	</fills>
	<borders count="1">
		<border>
			<left/>
			<right/>
			<top/>
			<bottom/>
			<diagonal/>
		</border>
	</borders>
	<cellStyleXfs count="1">
		<xf numFmtId="0" fontId="0" fillId="0" borderId="0"/>
	</cellStyleXfs>
	<cellXfs count="8">
		<xf numFmtId="0" fontId="0" fillId="0" borderId="0" xfId="0"/>
		<xf numFmtId="0" fontId="0" fillId="2" borderId="0" xfId="0" applyFill="1" applyAlignment="1">
			<alignment horizontal="center"/>
		</xf>
		<xf numFmtId="0" fontId="1" fillId="3" borderId="0" xfId="0" applyFont="1" applyFill="1" applyAlignment="1">
			<alignment vertical="center"/>
		</xf>
		<xf numFmtId="0" fontId="3" fillId="3" borderId="0" xfId="0" applyFont="1" applyFill="1"/>
		<xf numFmtId="0" fontId="4" fillId="5" borderId="0" xfId="0" applyFont="1" applyFill="1" applyAlignment="1">
			<alignment horizontal="center"/>
		</xf>
		<xf numFmtId="0" fontId="5" fillId="6" borderId="0" xfId="0" applyFont="1" applyFill="1" applyAlignment="1">
			<alignment vertical="center"/>
		</xf>
		<xf numFmtId="0" fontId="2" fillId="4" borderId="0" xfId="0" applyFont="1" applyFill="1" applyAlignment="1">
			<alignment horizontal="center" vertical="center"/>
		</xf>
		<xf numFmtId="2" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
	</cellXfs>
	<cellStyles count="1">
		<cellStyle name="Normal" xfId="0" builtinId="0"/>
	</cellStyles>
	<dxfs count="0"/>
	<tableStyles count="0" defaultTableStyle="TableStyleMedium9" defaultPivotStyle="PivotStyleLight16"/>
</styleSheet>

Code:

Structure XML_WORKSHEET
	dimension.s
EndStructure

Structure XML_SHEETVIEWS
	sheetFormatPr.s
EndStructure

Structure XML_MERGECELLS
	mergeCell.q
	pageMargins.q
	pageSetup.q
EndStructure

Structure XML_C
	v.q
EndStructure

Structure XML_C
	c.q
	c.q
EndStructure
The happiness is a road...
Not a destination
Lunasole

Re: XML structure generator

Post by Lunasole

Kwai chang caine wrote:Cool, justly i try to analyse en use EXCEL XLSX XML and i found your code. :shock:
A little bit late :oops:
And it give an very interesting result
Thanks for sharing 8)

For
Hi. You probably missed something: for the XML you posted, it returns the following:

Code:

Structure XML_BORDER
	left.s
	right.s
	top.s
	bottom.s
	diagonal.s
EndStructure

Structure XML_CELLSTYLES
	cellStyle.s
	dxfs.s
	tableStyles.s
EndStructure

Structure XML_CELLSTYLEXFS
	xf.s
EndStructure

Structure XML_CELLXFS
	xf.s
EndStructure

Structure XML_FILL
	patternFill.s
EndStructure

Structure XML_FONT
	sz.s
	color.s
	name.s
	family.s
	scheme.s
EndStructure

Structure XML_PATTERNFILL
	fgColor.s
	bgColor.s
EndStructure

Structure XML_XF
	alignment.s
EndStructure

Structure XML_XF
	alignment.s
	xf.s
EndStructure

That's OK: the code just builds structures in one pass. Afterwards you need to improve them manually (like fixing the fields of XML_XF here) and then use the result manually too. It doesn't recreate the whole XML tree, which would be nice & cool ^^
I just wrote that dirty thing to make parsing some website XML responses a bit easier.

PS. Also, the problem with your MS XML is that it keeps a lot of data in XML attributes instead of regular element values. To me it looks complicated to parse such crap, and anyway my code doesn't handle it.
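Attribute values can still be read with the regular XML library functions. A minimal sketch, using a trimmed <font> element from the styles XML above; which attribute to read ("val" here) is an assumption you would adapt per element:

```purebasic
EnableExplicit

; A trimmed <font> element from the styles XML quoted above
; (single-quoted attribute values are valid XML and avoid escaping in PB strings)
Define xml$ = "<font><sz val='11'/><color theme='1'/><name val='Calibri'/></font>"
Define Node, Size$, FontName$

If ParseXML(0, xml$)
  If XMLStatus(0) = #PB_XML_Success
    ; walk the children of <font> and read the "val" attribute where it is used
    Node = ChildXMLNode(MainXMLNode(0))
    While Node
      If XMLNodeType(Node) = #PB_XML_Normal
        Select GetXMLNodeName(Node)
          Case "sz"
            Size$ = GetXMLAttribute(Node, "val")
          Case "name"
            FontName$ = GetXMLAttribute(Node, "val")
        EndSelect
      EndIf
      Node = NextXMLNode(Node)
    Wend
    Debug "size = " + Size$ + ", font = " + FontName$
  Else
    Debug "XML error: " + XMLError(0)
  EndIf
  FreeXML(0)
EndIf
```

To list every attribute of a node without knowing the names in advance, ExamineXMLAttributes() / NextXMLAttribute() / XMLAttributeName() / XMLAttributeValue() can be used instead, as in the commented-out block of the first post.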
Kwai chang caine

Re: XML structure generator

Post by Kwai chang caine

Thanks for your answer :wink:
Lunasole

Re: XML structure generator

Post by Lunasole

Recently I've replaced that dirty code with a somewhat less dirty GUI tool :)
It still doesn't handle XML attributes, is probably not bug-free, and can't parse XML that has more than one "main" node (that's a PB limitation, and also how it should be according to the XML standard, which allows only one root element), but it's useful in many cases.
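To illustrate the "one main node" limitation: a document with two top-level elements is not well-formed, so the parser should report an error for it. The sketch below checks that with XMLStatus(). Wrapping the fragments in an artificial root element (the name "wrapper" is arbitrary) is a common workaround; it is an assumption here, not something the tool does.

```purebasic
EnableExplicit

; Two top-level elements: not well-formed XML (a document must have exactly one root element)
Define xml$ = "<first/><second/>"
Define Status = -1, WrappedStatus = -1

If ParseXML(0, xml$)
  Status = XMLStatus(0)
  If Status <> #PB_XML_Success
    Debug "Rejected: " + XMLError(0) + " (line " + Str(XMLErrorLine(0)) + ")"
  EndIf
  FreeXML(0)
EndIf

; Workaround sketch: wrap the fragments in one artificial root element
; ("wrapper" is a hypothetical, arbitrary name)
If ParseXML(1, "<wrapper>" + xml$ + "</wrapper>")
  WrappedStatus = XMLStatus(1)
  If WrappedStatus = #PB_XML_Success
    Debug "Wrapped version parsed, main node: " + GetXMLNodeName(MainXMLNode(1))
  EndIf
  FreeXML(1)
EndIf
```

Wrapping only works if the fragments carry no XML declaration or DOCTYPE of their own.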

http://geocities.ws/lunasole/data/_4pb/xmlstgen/l
Lunasole

Re: XML structure generator

Post by Lunasole

Well, yesterday I got angry and tired of my own dirty code in that tool (to be exact, it annoyed me when it returned incorrect results for one file), so I decided to spend some time and make it better ^_^

I just uploaded v.1.0.0.7, where XML processing is more thoughtful. It now generates a set of nested structures representing the whole tree (instead of the separate structures of the previous version), provides extra info about arrays and attributes (sometimes useful), and should work correctly with any XML, unlike the previous version.

Here is, for example, the result for the file "PB\Examples\Sources\Data\ui.xml":

Input:

Code:

<?xml version="1.0"?>

<!-- Window -->
<window id="0" name="hello" text="Window" minwidth="auto" minheight="auto" flags="#PB_Window_SizeGadget | #PB_Window_MaximizeGadget | #PB_Window_MinimizeGadget">
  <hbox expand="item:2">
    <vbox expand="no">
      <checkbox name="OneInstanceCheckbox" text="Run only one instance ?" disabled="yes" Flags=""/>
      <progressbar height="25"/>
      <trackbar invisible="no" Flags="#PB_TrackBar_Ticks" height="25"/>
      <option text="option 1" name="option1"/>
      <option text="option 2"/>
      <checkbox name="EnableAlphaBlendingCheckbox" text="Enable alpha-blending" Flags="" onevent="EnableAlphaBlendingEvent()"/>
      <option text="scale x2" name="scale"/>
      <option text="scale x3"/>
    </vbox>
    <editor name="editor" width="200" height="50"/>
  </hbox>
</window>
Output:

Code:

; /window/hbox/vbox
; attributes: expand
Structure XML_VBOX
	checkbox.s        ; [] attributes: text,flags,disabled,name,onevent
	option.s          ; [] attributes: text,name
	progressbar.s     ;    attributes: height
	trackbar.s        ;    attributes: height,flags,invisible
EndStructure

; /window/hbox
; attributes: expand
Structure XML_HBOX
	editor.s          ;    attributes: height,width,name
	vbox.XML_VBOX     ;    attributes: expand
EndStructure

; /window
; attributes: id,text,flags,name,minwidth,minheight
Structure XML_WINDOW
	hbox.XML_HBOX     ;    attributes: expand
EndStructure

; /
Structure XML_MAIN
	window.XML_WINDOW ;    attributes: id,text,flags,name,minwidth,minheight
EndStructure

For me this is useful when parsing unknown XML (especially large files). I'd be glad if someone else shared how they use this tool (or a better way to do what it does ^^).
dagcrack
Addict
Posts: 1868
Joined: Sun Mar 07, 2004 8:47 am
Location: Argentina

Re: XML structure generator

Post by dagcrack

Very handy, thanks for sharing!
It seems to have issues with certain files where identically named structures with different elements are repeated. I could supply sample files if you'd be willing to revisit the code. Otherwise, very useful as-is for most schemas.
! Black holes are where God divided by zero !
My little blog!
(Not for the faint hearted!)