XML structure generator

Posted: Thu Dec 22, 2016 3:20 pm
by Lunasole
Hi.
It's dirty, quickly made, and so on ^^
But it's still better than declaring structures manually.

* How to use:
1) Save the code to a file
2) Put your XML into an "xml.txt" file next to it
3) Run the code and copy the structure declarations you need from the Debug output
4) Use ExtractXMLStructure() with those declarations to make XML parsing a bit easier

Also look through the code for things you may want to change.
For now it can't build one structure describing the whole XML tree; it just produces a lot of separate structures.
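Step 4 could look roughly like the sketch below. It assumes a hypothetical `XML_PERSON` structure shaped like the generator's output (field names taken from the element names) and parses an inline string with `ParseXML()` instead of reading `xml.txt`. `ExtractXMLStructure()` fills each field from the child element with the same name; values stored in attributes are not picked up this way.

```purebasic
EnableExplicit

; Hypothetical structure, declared the way the generator prints it
Structure XML_PERSON
  name.s
  age.q
EndStructure

Define Xml$ = "<person><name>John</name><age>42</age></person>"
Define P.XML_PERSON

If ParseXML(0, Xml$) And XMLStatus(0) = #PB_XML_Success
  ; Copies the text of <name> and <age> into the fields with the same names
  ExtractXMLStructure(MainXMLNode(0), @P, XML_PERSON)
  FreeXML(0)
  Debug P\name ; John
  Debug P\age  ; 42
Else
  Debug "Could not parse the XML"
EndIf
```

The numeric field is converted from the element text, which is why the generator's type guess (`.q`, `.f` or `.s`) matters.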

PS. Maybe something better was made already; I just didn't find it, and it was simpler to make my own than to search a lot.

UPDATE: the code below is outdated; here is a UI tool instead:
http://geocities.ws/lunasole/data/_4pb/xmlstgen/l

Code:

EnableExplicit

;{ XML and such stuff }
	
	Structure ITEM 
		Name$
		Params$
	EndStructure
	
	; Returns type of Data contained in Dat$
	; RETURN:		detected type, represented as string with PB code
	Procedure$ XmlContentType(Dat$, NoDetect = #False)
		; if disabled detection
		If NoDetect
			ProcedureReturn ".s"
		EndIf
		
		; analyze data and try to detect its type
		Protected t, L = Len(dat$)
		Protected c$
		
		Protected isFloat, isInt, isStr
		
		For T = 1 To L
			c$ = Mid(Dat$, T, 1)
			
			Select c$
				Case "0", "1", "2", "3", "4", "5", "6", "7", "8", "9":
					isInt + 1
				Case ".", ",":
					; the float value expected to start from a digit, not from ".", ","
					If isInt >= 1
						isFloat + 1
					Else
						isStr + 1
					EndIf
				Case "-":
					; "-" is allowed only as the first char of a numeric value;
					; it is not counted as a digit, so a lone "-" stays a string
					If T <> 1
						isStr + 1
					EndIf
					
				Default
					isStr + 1
			EndSelect
			
		Next T
		
		; the value can't be int or float if it is a string or has more than one separator
		If isStr Or isFloat > 1
			ProcedureReturn ".s"
			
			; the value can be float if it has digits (optionally with a leading "-" sign) and exactly one separator ("," or ".")
		ElseIf isInt And isFloat = 1
			;ProcedureReturn ".d"
			ProcedureReturn ".f"
			
			; if it contains only digits, it is an int value
		ElseIf isInt
			;ProcedureReturn ".l"
			ProcedureReturn ".q"
			
			; empty value, let it be string
		Else
			ProcedureReturn ".s"
		EndIf
	EndProcedure
	
	
	; Recursively goes through XML tree and collects data types
	; Node			any node of XML object to start from it
	; ChildID		used internally
	; Types()		list to receive results
	; NoDetect  	if True, string type is used for all values (instead of attempt to detect data type)
	; RETURN:		none, list modified
	Procedure XmlScanRecursive (Node, ChildID, List Types.ITEM(), NoDetect = #False)
		Static XmlDbgRegExp, Dim RegRes.s(0)
		Protected CCount = XMLChildCount (Node), NNode = NextXMLNode (Node), NType = XMLNodeType (Node) 
		
		If NType = #PB_XML_Root 
			; init
			ClearList(Types())
			XmlDbgRegExp = CreateRegularExpression(#PB_Any, "(?s)([^ \f\n\r\t\v]|.*).*[^ \f\n\r\t\v]") ; probably this still needed ^^
		Else
			; check if it is list
			; 		Define isList 
			; 		If ExamineXMLAttributes(Node)
			; 			While NextXMLAttribute(Node)
			; 				If LCase(XMLAttributeName(Node)) = "list" And LCase(XMLAttributeValue(Node)) = "true"
			;  					Debug GetXMLNodeName (Node) + Str(CCount)
			; 					Break
			; 				EndIf
			; 			Wend
			; 		EndIf
			
			If CCount
				; add new type
				AddElement(Types())
				Types()\Name$ = "XML_" + UCase(GetXMLNodeName (Node))
			Else
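				; add new param (text matched by the regexp above, or "" if nothing matched,
				; so a match from a previous node is never reused)
				Protected Text$ = ""
				If ExtractRegularExpression(XmlDbgRegExp, GetXMLNodeText (Node), RegRes())
					Text$ = RegRes(0)
				EndIf
				Types()\Params$ + #TAB$ + GetXMLNodeName (Node) + XmlContentType(Text$, NoDetect) + #CRLF$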
			EndIf
		EndIf
		
		; counting items on current level
		ChildID + 1
		If CCount >= ChildID 
			; go deeper
			PushListPosition(Types())
			XmlScanRecursive (ChildXMLNode (Node), ChildID, Types(), NoDetect)
			PopListPosition(Types())
		EndIf
		; continue at the current depth
		If NNode
			XmlScanRecursive(NNode, 0, Types(), NoDetect)
		EndIf
		
		; fin
		If NType = #PB_XML_Root
			FreeRegularExpression(XmlDbgRegExp)
		EndIf
	EndProcedure
	
;}


; main
Define NewList Res.ITEM()
Define NewMap Tmp()

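; load and analyze xml from file
Define tXML = LoadXML(#PB_Any, "xml.txt")
If tXML = 0
	Debug "Can't open xml.txt"
	End
ElseIf XMLStatus(tXML) <> #PB_XML_Success
	Debug "XML error: " + XMLError(tXML) + " (line " + Str(XMLErrorLine(tXML)) + ")"
	FreeXML(tXML)
	End
EndIf
XmlScanRecursive (RootXMLNode (tXML), 0, Res(), #False)
FreeXML(tXML)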

; display found structure declarations
; TODO: nested structures, lists/arrays and so on
Define tHash$
SortStructuredList(Res(), #PB_Sort_Ascending, OffsetOf(ITEM\Name$), #PB_String)
ForEach Res()
	If Res()\Name$ And Res()\Params$
		; also ignore duplicates
		tHash$ = Res()\Name$ + Chr(1) + Res()\Params$
		If FindMapElement(Tmp(), tHash$) = 0
			Debug "Structure " + Res()\Name$
			Debug Res()\Params$ + "EndStructure" + #CRLF$
			Tmp(tHash$) = #True
		EndIf
	EndIf
Next Res()

Re: XML structure generator

Posted: Sun Dec 25, 2016 11:59 pm
by Lunasole
Fixed a bug in the removal of duplicate declarations.

However, I'm still too lazy to turn this code into a finished, functional tool (I use it too rarely), so that's it for now ^^

Re: XML structure generator

Posted: Tue Feb 07, 2017 1:48 pm
by Kwai chang caine
Cool, I was just trying to analyse and use Excel XLSX XML, and I found your code. :shock:
A little bit late :oops:
And it gives a very interesting result.
Thanks for sharing 8)

For this XML:

Code:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
	<fonts count="6">
		<font>
			<sz val="11"/>
			<color theme="1"/>
			<name val="Calibri"/>
			<family val="2"/>
			<scheme val="minor"/>
		</font>
		<font>
			<sz val="11"/>
			<color theme="0"/>
			<name val="Calibri"/>
			<family val="2"/>
			<scheme val="minor"/>
		</font>
		<font>
			<sz val="11"/>
			<color rgb="FFFFFF00"/>
			<name val="Calibri"/>
			<family val="2"/>
			<scheme val="minor"/>
		</font>
		<font>
			<sz val="11"/>
			<color rgb="FFFF0000"/>
			<name val="Calibri"/>
			<family val="2"/>
			<scheme val="minor"/>
		</font>
		<font>
			<sz val="11"/>
			<color theme="6" tint="-0.499984740745262"/>
			<name val="Calibri"/>
			<family val="2"/>
			<scheme val="minor"/>
		</font>
		<font>
			<sz val="11"/>
			<color theme="9" tint="-0.499984740745262"/>
			<name val="Calibri"/>
			<family val="2"/>
			<scheme val="minor"/>
		</font>
	</fonts>
	<fills count="7">
		<fill>
			<patternFill patternType="none"/>
		</fill>
		<fill>
			<patternFill patternType="gray125"/>
		</fill>
		<fill>
			<patternFill patternType="solid">
				<fgColor rgb="FFFFFF00"/>
				<bgColor indexed="64"/>
			</patternFill>
		</fill>
		<fill>
			<patternFill patternType="solid">
				<fgColor theme="3"/>
				<bgColor indexed="64"/>
			</patternFill>
		</fill>
		<fill>
			<patternFill patternType="solid">
				<fgColor rgb="FFFF0000"/>
				<bgColor indexed="64"/>
			</patternFill>
		</fill>
		<fill>
			<patternFill patternType="solid">
				<fgColor theme="9" tint="0.39997558519241921"/>
				<bgColor indexed="64"/>
			</patternFill>
		</fill>
		<fill>
			<patternFill patternType="solid">
				<fgColor theme="6"/>
				<bgColor indexed="64"/>
			</patternFill>
		</fill>
	</fills>
	<borders count="1">
		<border>
			<left/>
			<right/>
			<top/>
			<bottom/>
			<diagonal/>
		</border>
	</borders>
	<cellStyleXfs count="1">
		<xf numFmtId="0" fontId="0" fillId="0" borderId="0"/>
	</cellStyleXfs>
	<cellXfs count="8">
		<xf numFmtId="0" fontId="0" fillId="0" borderId="0" xfId="0"/>
		<xf numFmtId="0" fontId="0" fillId="2" borderId="0" xfId="0" applyFill="1" applyAlignment="1">
			<alignment horizontal="center"/>
		</xf>
		<xf numFmtId="0" fontId="1" fillId="3" borderId="0" xfId="0" applyFont="1" applyFill="1" applyAlignment="1">
			<alignment vertical="center"/>
		</xf>
		<xf numFmtId="0" fontId="3" fillId="3" borderId="0" xfId="0" applyFont="1" applyFill="1"/>
		<xf numFmtId="0" fontId="4" fillId="5" borderId="0" xfId="0" applyFont="1" applyFill="1" applyAlignment="1">
			<alignment horizontal="center"/>
		</xf>
		<xf numFmtId="0" fontId="5" fillId="6" borderId="0" xfId="0" applyFont="1" applyFill="1" applyAlignment="1">
			<alignment vertical="center"/>
		</xf>
		<xf numFmtId="0" fontId="2" fillId="4" borderId="0" xfId="0" applyFont="1" applyFill="1" applyAlignment="1">
			<alignment horizontal="center" vertical="center"/>
		</xf>
		<xf numFmtId="2" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1"/>
	</cellXfs>
	<cellStyles count="1">
		<cellStyle name="Normal" xfId="0" builtinId="0"/>
	</cellStyles>
	<dxfs count="0"/>
	<tableStyles count="0" defaultTableStyle="TableStyleMedium9" defaultPivotStyle="PivotStyleLight16"/>
</styleSheet>

Code:

Structure XML_WORKSHEET
	dimension.s
EndStructure

Structure XML_SHEETVIEWS
	sheetFormatPr.s
EndStructure

Structure XML_MERGECELLS
	mergeCell.q
	pageMargins.q
	pageSetup.q
EndStructure

Structure XML_C
	v.q
EndStructure

Structure XML_C
	c.q
	c.q
EndStructure

Re: XML structure generator

Posted: Wed Feb 08, 2017 12:34 am
by Lunasole
Kwai chang caine wrote: Cool, I was just trying to analyse and use Excel XLSX XML, and I found your code. :shock:
A little bit late :oops:
And it gives a very interesting result.
Thanks for sharing 8)

Hi. You probably missed something; for the XML you posted, it returns the following:

Code:

Structure XML_BORDER
	left.s
	right.s
	top.s
	bottom.s
	diagonal.s
EndStructure

Structure XML_CELLSTYLES
	cellStyle.s
	dxfs.s
	tableStyles.s
EndStructure

Structure XML_CELLSTYLEXFS
	xf.s
EndStructure

Structure XML_CELLXFS
	xf.s
EndStructure

Structure XML_FILL
	patternFill.s
EndStructure

Structure XML_FONT
	sz.s
	color.s
	name.s
	family.s
	scheme.s
EndStructure

Structure XML_PATTERNFILL
	fgColor.s
	bgColor.s
EndStructure

Structure XML_XF
	alignment.s
EndStructure

Structure XML_XF
	alignment.s
	xf.s
EndStructure

Which is OK: that code just forms structures in one pass (after that you need to improve them manually, like fixing the fields of XML_XF here, and then use the result manually). It doesn't recreate the whole XML tree, which would be nice & cool ^^
I just wrote that dirty thing to make parsing some website XML responses a bit easier.
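Fixing duplicate declarations like the two `XML_XF` ones above could be partly automated. A minimal sketch, assuming the generator's `ITEM` list (one field per line in `Params$`, each line ending with `#CRLF$`); `MergeDuplicates()`, `Decl()` and `Result()` are hypothetical names, not part of the posted code. It only takes the union of field lines, so a field detected with two different types (say `v.q` and `v.s`) would still appear twice and has to be fixed by hand.

```purebasic
EnableExplicit

Structure ITEM
  Name$
  Params$
EndStructure

; Hypothetical post-processing step: merge declarations that share a name
; by collecting every distinct field line into Merged(Name$)
Procedure MergeDuplicates(List Types.ITEM(), Map Merged.s())
  Protected i, Line$
  ForEach Types()
    For i = 1 To CountString(Types()\Params$, #LF$)
      ; split on LF and drop the CR left over from #CRLF$
      Line$ = RemoveString(StringField(Types()\Params$, i, #LF$), #CR$)
      ; append the field only if this exact line is not there yet
      If Line$ <> "" And FindString(#LF$ + Merged(Types()\Name$), #LF$ + Line$ + #CRLF$) = 0
        Merged(Types()\Name$) + Line$ + #CRLF$
      EndIf
    Next
  Next
EndProcedure

Define NewList Decl.ITEM()
Define NewMap Result.s()

AddElement(Decl()) : Decl()\Name$ = "XML_XF" : Decl()\Params$ = #TAB$ + "alignment.s" + #CRLF$
AddElement(Decl()) : Decl()\Name$ = "XML_XF" : Decl()\Params$ = #TAB$ + "alignment.s" + #CRLF$ + #TAB$ + "xf.s" + #CRLF$

MergeDuplicates(Decl(), Result())
ForEach Result()
  Debug "Structure " + MapKey(Result())
  Debug Result() + "EndStructure"
Next
```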

PS. Another problem with your MS XML is that it keeps a lot of data in XML attributes instead of regular element values. To me it looks complicated to parse such crap; anyway, my code doesn't handle this.
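Since the generator ignores attributes, attribute-heavy files like the Excel styles XML above have to be read by hand. A rough sketch on a trimmed inline fragment (not the full file) that walks the `<font>` nodes and reads each child's `val` attribute with `GetXMLAttribute()`; the `Result$` format is just for illustration.

```purebasic
EnableExplicit

; Trimmed fragment of the styles XML: the data lives in "val" attributes
Define Xml$ = "<fonts count='2'><font><sz val='11'/><name val='Calibri'/></font><font><sz val='14'/><name val='Arial'/></font></fonts>"
Define Font, Child, Size$, Name$, Result$

If ParseXML(0, Xml$) And XMLStatus(0) = #PB_XML_Success
  Font = ChildXMLNode(MainXMLNode(0))
  While Font
    If XMLNodeType(Font) = #PB_XML_Normal And GetXMLNodeName(Font) = "font"
      Size$ = ""
      Name$ = ""
      ; read the attribute of each child instead of its (empty) text
      Child = ChildXMLNode(Font)
      While Child
        Select GetXMLNodeName(Child)
          Case "sz"
            Size$ = GetXMLAttribute(Child, "val")
          Case "name"
            Name$ = GetXMLAttribute(Child, "val")
        EndSelect
        Child = NextXMLNode(Child)
      Wend
      Result$ + Name$ + " " + Size$ + ";"
    EndIf
    Font = NextXMLNode(Font)
  Wend
  FreeXML(0)
EndIf

Debug Result$ ; Calibri 11;Arial 14;
```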

Re: XML structure generator

Posted: Thu Feb 09, 2017 2:28 pm
by Kwai chang caine
Thanks for your answer :wink:

Re: XML structure generator

Posted: Tue Apr 04, 2017 2:09 pm
by Lunasole
Recently I've replaced that dirty code with a somewhat less dirty GUI tool :)
It still does nothing with XML attributes, probably isn't bug-free, and can't parse XML that has more than one "main" node (that's a PB limitation, and it's also how it should be according to the standard), but it's useful in many cases.

http://geocities.ws/lunasole/data/_4pb/xmlstgen/l
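When the input has several top-level elements, a common workaround is to wrap it in a synthetic root element before parsing. A small sketch, assuming the fragment has no `<?xml ...?>` declaration or DOCTYPE of its own (those would have to be stripped first); the `<root>` name is arbitrary.

```purebasic
EnableExplicit

; Two top-level elements: not a well-formed XML document on its own
Define Fragment$ = "<item>1</item><item>2</item>"
Define Node, Sum

; wrap everything in one synthetic root element, then walk its children
If ParseXML(0, "<root>" + Fragment$ + "</root>") And XMLStatus(0) = #PB_XML_Success
  Node = ChildXMLNode(MainXMLNode(0))
  While Node
    Sum + Val(GetXMLNodeText(Node))
    Node = NextXMLNode(Node)
  Wend
  FreeXML(0)
EndIf

Debug Sum ; 3
```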

Re: XML structure generator

Posted: Sat Apr 15, 2017 4:50 am
by Lunasole
Well, yesterday I got annoyed with my own dirty code in that tool (to be exact, it returned incorrect results for one file), so I decided to spend some time and make it better ^_^

I just uploaded v.1.0.0.7, where XML processing is more thoughtful. It now generates a set of nested structures representing the whole tree (instead of the separate structures it produced before), provides extra info about arrays and attributes (sometimes useful), and should work correctly with any XML, unlike the previous version.

Here is an example result for the file "PB\Examples\Sources\Data\ui.xml":

Input:

Code:

<?xml version="1.0"?>

<!-- Window -->
<window id="0" name="hello" text="Window" minwidth="auto" minheight="auto" flags="#PB_Window_SizeGadget | #PB_Window_MaximizeGadget | #PB_Window_MinimizeGadget">
  <hbox expand="item:2">
    <vbox expand="no">
      <checkbox name="OneInstanceCheckbox" text="Run only one instance ?" disabled="yes" Flags=""/>
      <progressbar height="25"/>
      <trackbar invisible="no" Flags="#PB_TrackBar_Ticks" height="25"/>
      <option text="option 1" name="option1"/>
      <option text="option 2"/>
      <checkbox name="EnableAlphaBlendingCheckbox" text="Enable alpha-blending" Flags="" onevent="EnableAlphaBlendingEvent()"/>
      <option text="scale x2" name="scale"/>
      <option text="scale x3"/>
    </vbox>
    <editor name="editor" width="200" height="50"/>
  </hbox>
</window>

Output:

Code:

; /window/hbox/vbox
; attributes: expand
Structure XML_VBOX
	checkbox.s        ; [] attributes: text,flags,disabled,name,onevent
	option.s          ; [] attributes: text,name
	progressbar.s     ;    attributes: height
	trackbar.s        ;    attributes: height,flags,invisible
EndStructure

; /window/hbox
; attributes: expand
Structure XML_HBOX
	editor.s          ;    attributes: height,width,name
	vbox.XML_VBOX     ;    attributes: expand
EndStructure

; /window
; attributes: id,text,flags,name,minwidth,minheight
Structure XML_WINDOW
	hbox.XML_HBOX     ;    attributes: expand
EndStructure

; /
Structure XML_MAIN
	window.XML_WINDOW ;    attributes: id,text,flags,name,minwidth,minheight
EndStructure
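The tool's own source isn't shown here, but the "[]" marks and attribute lists in the output above can be approximated by counting child names per parent. The sketch below is a guess at the idea, not the tool's implementation; `DescribeChildren()` is a hypothetical helper and the input is a shortened copy of the `<vbox>` node from ui.xml. Lines come out in map order, not document order.

```purebasic
EnableExplicit

; Hypothetical helper: for each child element name of Node, report "[]" if the
; name occurs more than once, plus the distinct attribute names seen on it
Procedure.s DescribeChildren(Node)
  Protected NewMap Occurrences.i()
  Protected NewMap AttrNames.s()
  Protected Child = ChildXMLNode(Node)
  Protected Name$, Attr$, Out$

  While Child
    If XMLNodeType(Child) = #PB_XML_Normal
      Name$ = GetXMLNodeName(Child)
      Occurrences(Name$) + 1
      If ExamineXMLAttributes(Child)
        While NextXMLAttribute(Child)
          Attr$ = XMLAttributeName(Child)
          ; store each attribute name once, comma-separated
          If FindString("," + AttrNames(Name$), "," + Attr$ + ",") = 0
            AttrNames(Name$) + Attr$ + ","
          EndIf
        Wend
      EndIf
    EndIf
    Child = NextXMLNode(Child)
  Wend

  ForEach Occurrences()
    Name$ = MapKey(Occurrences())
    Out$ + Name$
    If Occurrences() > 1
      Out$ + "[]"
    EndIf
    Out$ + " (" + RTrim(AttrNames(Name$), ",") + ")" + #LF$
  Next
  ProcedureReturn Out$
EndProcedure

Define Xml$ = "<vbox expand='no'><checkbox name='a' text='A'/><progressbar height='25'/><option text='o1' name='option1'/><option text='o2'/></vbox>"
Define Summary$

If ParseXML(0, Xml$) And XMLStatus(0) = #PB_XML_Success
  Summary$ = DescribeChildren(MainXMLNode(0))
  FreeXML(0)
EndIf
Debug Summary$
```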

For me this stuff is useful when parsing unknown XML (especially large files). I'd be glad if someone else shared how they use this tool (or a better way to do what it does ^^)

Re: XML structure generator

Posted: Tue Mar 01, 2022 9:09 pm
by dagcrack
Very handy, thanks for sharing!
It seems to have issues with certain files where equally named structures with different elements are repeated. I could supply sample files if you'd be willing to revisit the code. Otherwise it's very useful as-is for most schemas.