
Making HTML code safer (removing tags etc.)

Posted: 26.02.2013 11:14
by Kukulkan
Hello,

Prompted by a forum post, I have just copied the part that sanitizes HTML code out of our product. I think it could be helpful.

It can:
- Remove individual tags from HTML.
- Remove tag blocks (including the code between the opening and closing tag).
- Defuse executing attributes such as event handlers by replacing them with a random string.

It can be used to defuse HTML code somewhat before displaying it. I am sure it cannot prevent every kind of attack, but it is better than simply removing all tags...

Code:

; Strip tags from html code
; Secure html code by removing tags and tag blocks.
; (w) 2012 V. Schmid
; PureBasic V4.61 and 5.0
; Windows, Mac, Linux
; Works single byte and unicode

EnableExplicit

; generate some random code (50% numeric, 50% chars with upper/lowercase)
Procedure.s GenerateCode(CodeLength.i)
  Protected output.s = "", x.i
  For x.i = 1 To CodeLength.i
    If Random(100) < 50
      ; number
      output.s = output.s + Str(Random(9))
    Else
      ; char
      Protected r.i = Random(25) + 65 ; A-Z
      If Random(100) < 50
        output.s = output.s + Chr(r.i)
      Else
        output.s = output.s + LCase(Chr(r.i))
      EndIf
    EndIf
  Next
  ProcedureReturn output.s
EndProcedure


; Case insensitive version of FindString() function
; Start position is optional (default is 1).
Procedure.i FindStringCIS(strOriginal.s, strToFind.s, lngStartPos.i = 1)
  ProcedureReturn FindString(UCase(strOriginal.s), UCase(strToFind.s), lngStartPos.i)
EndProcedure

; Remove all html tags listed in TagsToRemove.s.
; Separate the tag names with a pipe (|).
; Example: "form|input|script"
; 
; BlocksToRemove.s additionally strips complete blocks, i.e. the opening
; tag, the closing tag and everything between them, like <script> or <style>.
; Example: "script|style|noframes"
Procedure.s RemoveHTMLTags(OriginalHTML.s, TagsToRemove.s, BlocksToRemove.s = "")
  
  ; prepare the string
  Protected s.s = OriginalHTML.s, Element.s, ex.s, x.i
  Protected len1.i, len2.i, SearchPos.i, pos1.i, pos2.i
  
  ; remove tags with content between them
  If Len(BlocksToRemove.s) > 0
    For x.i = 1 To CountString(BlocksToRemove.s, "|") + 1
      Element.s = StringField(BlocksToRemove.s, x.i, "|")
      len1.i = Len("<" + Element.s)
      SearchPos.i = 1
      While FindStringCIS(s.s, "<" + Element.s, SearchPos.i) > 0
        pos1.i = FindStringCIS(s.s, "<" + Element.s, SearchPos.i)
        pos2.i = FindStringCIS(s.s, Element.s + ">", pos1.i + len1.i)
        If pos2.i > pos1.i
          len2.i = pos2.i - pos1.i + len1.i
          ex.s   = Mid(s.s, pos1.i, len2.i)
          s.s    = ReplaceString(s.s, ex.s, "", #PB_String_NoCase)
          SearchPos.i = 1 ; restart the search from the beginning
        Else
          SearchPos.i = pos1.i + 1 ; no closing tag found: continue after this match
        EndIf
      Wend
    Next ; x.i
  EndIf
  
  ; remove single tags
  If Len(TagsToRemove.s) > 0
    For x.i = 1 To CountString(TagsToRemove.s, "|") + 1
      Element.s = StringField(TagsToRemove.s, x.i, "|")
      ; start tag
      While FindStringCIS(s.s, "<" + Element.s, 1) > 0
        pos1.i = FindStringCIS(s.s, "<" + Element.s, 1)
        pos2.i = FindStringCIS(s.s, ">", pos1.i)
        If pos2.i = 0
          pos2.i = Len(s.s) ; unterminated tag: cut to the end so the loop always makes progress
        EndIf
        len2.i = pos2.i - pos1.i + 1
        ex.s   = Mid(s.s, pos1.i, len2.i)
        s.s    = ReplaceString(s.s, ex.s, "", #PB_String_NoCase)
      Wend
      ; end tag
      While FindStringCIS(s.s, "</" + Element.s, 1) > 0
        pos1.i = FindStringCIS(s.s, "</" + Element.s, 1)
        pos2.i = FindStringCIS(s.s, ">", pos1.i)
        If pos2.i = 0
          pos2.i = Len(s.s) ; unterminated tag: cut to the end so the loop always makes progress
        EndIf
        len2.i = pos2.i - pos1.i + 1
        ex.s   = Mid(s.s, pos1.i, len2.i)
        s.s    = ReplaceString(s.s, ex.s, "", #PB_String_NoCase)
      Wend
    Next ; x.i
  EndIf
  
  ProcedureReturn s.s
  
EndProcedure


; Defuses all dangerous attributes in the html tags of this document.
; Neutralizes onload, onerror etc. by replacing them with a random token.
Procedure.s SecureHTMLAttributes(OriginalHTML.s)
  ; these attribute names and script keywords get replaced inside every tag
  Protected Attributes.s = "onabort,onblur,onchange,onclick,ondblclick,onerror,onfocus,onkeydown,onkeypress,onkeyup,onload,onmousedown,onmousemove,onmouseout,onmouseover,onmouseup,onreset,onselect,onsubmit,onunload,javascript:,javascript :,eval,script:"
  
  Protected output.s = ""
  Protected Code.s   = GenerateCode(6) ; used to replace the attribute
  Protected EndPos.i = 1
  Repeat
    Protected i.i = FindString(OriginalHTML.s, "<", EndPos.i)
    If i.i > 0
      output.s = output.s + Mid(OriginalHTML.s, EndPos.i, i.i - EndPos.i)
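      EndPos.i = FindString(OriginalHTML.s, ">", i.i + 1)
      If EndPos.i = 0
        EndPos.i = Len(OriginalHTML.s) ; unterminated tag: treat the rest as one tag (avoids an endless loop)
      EndIf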
      Protected found.s = Mid(OriginalHTML.s, i.i, EndPos.i - i.i + 1)
      ; check the content of the tag
      Protected j.i
      For j.i = 1 To CountString(Attributes.s, ",") + 1
        Protected Element.s = StringField(Attributes.s, j.i, ",")
        found.s = ReplaceString(found.s, Element.s, "s" + Code.s, #PB_String_NoCase)
      Next
      output.s = output.s + found.s
      EndPos.i = EndPos.i + 1
    EndIf
    
  Until i.i < 1 Or i.i > Len(OriginalHTML.s)
  
  ; add the rest (if some text is behind the last tag)
  If EndPos.i <= Len(OriginalHTML.s)
    output.s = output.s + Mid(OriginalHTML.s, EndPos.i) ; add the rest (no tags inside)
  EndIf
  
  ProcedureReturn output.s  
EndProcedure

; EXAMPLE CALLS

Define html.s = "<html><body><div>div content</div><script>alert('test');</script><img src="+Chr(34)+"abc.jpg"+Chr(34)+" onblur="+Chr(34)+"alert('bad boy');"+Chr(34)+"></body></html>"
Debug "Original: " + html.s

Define nodiv.s = RemoveHTMLTags(html.s, "", "div")
Debug "no div blocks: " + nodiv.s

Define noimg.s = RemoveHTMLTags(html.s, "img", "")
Debug "no img tags: " + noimg.s
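
; Editor's sketch (not part of the original post): tag names are matched as
; prefixes, so a short name also hits longer tags. prefixDemo.s is a
; hypothetical variable used only for this illustration. With "b" in the list,
; <br> and <body> are stripped as well; only the text "bold" should remain.
Define prefixDemo.s = RemoveHTMLTags("<b>bold</b><br><body>", "b", "")
Debug "prefix match: " + prefixDemo.s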

Define secure.s = SecureHTMLAttributes(html.s)
secure.s = RemoveHTMLTags(secure.s, "applet|script|object|iframe", "script|noframes|noscript|object|iframe")
Debug "secured: " + secure.s
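
; Editor's sketch (not part of the original post): SecureHTMLAttributes()
; replaces its keywords anywhere inside a tag, including attribute values.
; attrDemo.s is a hypothetical variable used only for this illustration. The
; "eval" in the alt text "medieval" gets replaced too, while the word
; "evaluate" outside of any tag is left alone.
Define attrDemo.s = SecureHTMLAttributes("<img alt=" + Chr(34) + "medieval" + Chr(34) + "><p>evaluate</p>")
Debug "attribute caveat: " + attrDemo.s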

If you have any improvements, bring them on :-)

Regards,

Kukulkan

Re: Making HTML code safer (removing tags etc.)

Posted: 26.02.2013 16:26
by Thorium
This would be useful as a proxy; that would be an interesting PB project.