Strip away all HTML - Regular Expression
Posted: Mon Sep 14, 2009 4:37 pm
Hi, I posted this as an answer to another thread, but it's a useful trick and shows how powerful Regular Expressions are.
Basically, this short routine strips away all the HTML leaving only the content.
RESULT1$ only strips off the HTML tags (so there may be spacing problems, such as words running together)
RESULT2$ replaces each HTML tag with a single space, then collapses any double spaces and trims the ends
Code:
CreateRegularExpression(0, "\<[^\>]+\>") ; a "<", then everything up to the next ">", so a stray ">" in the text is left alone
STRING$="hELLO<P ALIGN=left HEIGHT=22>Hello World</p> wORLD <Br>" ; Yes I know this is bullsh*t HTML
RESULT1$=ReplaceRegularExpression(0, STRING$,"")
RESULT2$=trim(ReplaceString(ReplaceString(ReplaceRegularExpression(0, STRING$," "),"  "," "),"  "," ")) ; each ReplaceString turns two spaces into one
Debug RESULT1$
Debug RESULT2$
End
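If you want to reuse this, the same idea can be wrapped in a procedure. This is only a rough sketch, not part of the routine above: the name StripHTML is made up for illustration, it assumes the pattern "<[^>]+>" for a tag, and it loops until no double space is left instead of calling ReplaceString a fixed number of times. Like the original, it knows nothing about comments, scripts or entities such as &amp;amp;.

```purebasic
; Hypothetical helper (the name StripHTML is an assumption for illustration).
; Replaces every HTML tag with a space, then collapses runs of spaces.
Procedure.s StripHTML(html.s)
  Protected re, text.s

  ; #PB_Any lets PureBasic pick a free regular expression number
  re = CreateRegularExpression(#PB_Any, "<[^>]+>")
  If re = 0
    ProcedureReturn html ; pattern could not be compiled, return the input unchanged
  EndIf

  text = ReplaceRegularExpression(re, html, " ")
  FreeRegularExpression(re)

  ; Keep halving double spaces until none remain, however long the run was
  While FindString(text, "  ", 1)
    text = ReplaceString(text, "  ", " ")
  Wend

  ProcedureReturn Trim(text)
EndProcedure

Debug StripHTML("hELLO<P ALIGN=left HEIGHT=22>Hello World</p> wORLD <Br>")
```

With the test string from the post this should leave just the three text fragments separated by single spaces.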