XML-like data parser

Share your advanced PureBasic knowledge/code with the community.
LuckyLuke
Enthusiast
Enthusiast
Posts: 181
Joined: Fri Jun 06, 2003 2:41 pm
Location: Belgium

XML-like data parser

Post by LuckyLuke »

Code updated for 5.20+

This function parses some XML-like data.
Still needs some error checking, changes to make it XML compliant, etc...
Feel free to improve this code and share it with the fantastic PureBasic community.

Code: Select all

; Based on Poor Man XML parser! by Marco Pontello 2003
;
; (c) LuckyLuke,07/07/2005
;
Structure _Tag
  vName.s       ;Tag name
  vLevel.l      ;Tag level
  vValue.s      ;Tag value
  pParentTag.l  ;Pointer to parent tag
  pAttr.l       ;Pointer to first attribute
EndStructure

Structure _Attribute
  pTag.l    ;Pointer to tag (index of linkedlist tag)
  vName.s   ;Attribute name
  vValue.s  ;Attribute value
EndStructure

Global NewList lTag._Tag()
Global NewList lAttr._Attribute()

Procedure ParseXml(aXml.s)
  vTmp.s
  vChar.s
  vTmpAttr.s
  vPos.l=1  ;Global position in aXml
  vLevel.l=0
  vOldLevel.l
  vPrevLevel.l=0
  vParentTag.l=0
  vPrevParentTag.l=0
  While 1=1
    vTmp="" ;Reset tmp
    i = FindString(aXml,"<", vPos)
    j = FindString(aXml,">", vPos)
    If i>=j Or i=0
      i = j
    EndIf 
    If i=0
      Break
    EndIf   
    vChar = Mid(aXml,i,1) ;First char (> or <) from selection
    vTmp = Mid(aXml, vPos, i - vPos)
    vPos = i + 1
    vTmp = Trim(vTmp)
   
    If vTmp<>""
      If vChar=">"  ;End of tag
        If Left(vTmp,1)<>"?" And Right(vTmp,1)<>"?"
          If Left(vTmp,1)<>"!" And Right(vTmp,1)<>"/"
            vOldLevel = vLevel
            If Left(vTmp,1)="/" ;End Tag
              vLevel = vLevel - 1
            Else   
              AddElement(lTag())
              If CountString(vTmp," ") > 0
                lTag()\pAttr = ListSize(lAttr()) + 1
              Else
                lTag()\pAttr = 0             
              EndIf
              ;Attributes
              For k=2 To CountString(vTmp," ") + 1
                vTmpAttr = StringField(vTmp, k, " ")
                AddElement(lAttr())             
                lAttr()\pTag = ListSize(lTag())           
                lAttr()\vName = StringField(vTmpAttr,1,"=")
                lAttr()\vValue = StringField(vTmpAttr,2,"=")           
              Next               
              vTmp = StringField(vTmp, 1, " ")               
              vLevel = vLevel + 1   
              lTag()\vName = vTmp
              lTag()\vLevel=vLevel 
              lTag()\pParentTag=vParentTag
            EndIf
            If vOldLevel<=vLevel
              If vLevel > vPrevLevel
                vParentTag = vPrevParentTag
                vPrevLevel = vLevel
                vPrevParentTag = ListSize(lTag())
              EndIf
            EndIf
            lTag()\pParentTag=vParentTag
          EndIf
        EndIf         
      Else
        lTag()\vValue = vTmp
      EndIf
    EndIf
  Wend
EndProcedure

;Example
xml.s = "<XML><DATA id=1 name=test><NAME>PureBasic</NAME><VERSION>00.00.01</VERSION></DATA></XML>"
;Call function
ParseXml(xml)

;Show values
i=0
ForEach lTag()
  i=i+1
;  Debug  "Tag (Parent) : " + Str(lTag()\pParentTag)
  Debug  "Tag (Name)   : " + lTag()\vName
  Debug  "Tag (Value)  : " + lTag()\vValue
;  Debug  "Tag (Level)  : " + Str(lTag()\vLevel) 
  ;Tag has attributes ?
  If lTag()\pAttr <> 0
    ForEach lAttr()
      If lAttr()\pTag = i
        Debug "Attribute (Name) : " + lAttr()\vName + " = " + lAttr()\vValue
      EndIf 
    Next   
  EndIf 
Next
User avatar
Kiffi
Addict
Addict
Posts: 1504
Joined: Tue Mar 02, 2004 1:20 pm
Location: Amphibios 9

Post by Kiffi »

Good job, LuckyLuke!

here are some proposals from a poor english speaking guy ;)

1. Attributes always should be enclosed in " (Chr(34)) oder ' (Chr(39))
Delimiters (see also: [1]), so if there is a tag like this:

Code: Select all

<tagname attributename="attributevalue" />
the value of the attribute is attributevalue and not "attributevalue"

2. XML-Tags can contain [CDATA]-Sections (see also: [2]). In this sections
opening- and closing-brackets are allowed, so your Code try to parse this
brackets a normal XML-Tags.

Code: Select all

<tagname><![CDATA[<this and that>]]></tagname>
Thanks for sharing your code & Greetings ... Kiffi

[1]: http://www.w3.org/TR/2004/REC-xml-20040204/#dt-attr
[2]: http://www.w3.org/TR/2004/REC-xml-20040204/#attdecls

[Edit]

Here are my first suggestion for improvement.

i have changed your method to retrieve attributenames and -values in the following way:

Code: Select all

[...]
 lAttr()\vName  = GetAttributeName(vTmpAttr) 
 lAttr()\vValue = GetAttributeValue(vTmpAttr)
[...]
and here are the procedures for getting the attributenames and -values:

Code: Select all

Procedure.s GetAttributeName(sAttr.s)
  ; sAttr contains a string like this: myAttr=myValue, myAttr='myValue' or myAttr="myValue"
  sAttr = Trim(StringField(sAttr,1,"="))
  ProcedureReturn sAttr
EndProcedure

Procedure.s GetAttributeValue(sAttr.s)
  ; sAttr contains a string like this: myAttr=myValue, myAttr='myValue' or myAttr="myValue"
  sAttr = Trim(StringField(sAttr,2,"="))
  If Left(sAttr,1)=Chr(34) Or Left(sAttr,1)=Chr(39)
    sAttr=Mid(sAttr,2,Len(sAttr)-1)
  EndIf
  If Right(sAttr,1)=Chr(34) Or Right(sAttr,1)=Chr(39)
    sAttr=Left(sAttr,Len(sAttr)-1)
  EndIf
  ProcedureReturn sAttr
EndProcedure
Post Reply