ReplaceRegularExpression

lionel_om
User
Posts: 31
Joined: Wed Jul 18, 2007 4:14 pm
Location: France

Post by lionel_om »

Hi all,

I have an issue: I'd like to use advanced regex replacements.
Normally, when we write a regex, we can define capture groups (with parentheses). A replace function can then insert a captured group into the replacement string.
In PHP we use "\\x", where x is the index of the group, but this doesn't work in PB.

Here is my code:

Code: Select all

Procedure.s Ereg_Replace(Text$, Pattern$, Replace$ = "", Options.l = #PB_RegularExpression_DotAll |  #PB_RegularExpression_Extended |  #PB_RegularExpression_AnyNewLine)
  hRegex = CreateRegularExpression(#PB_Any, Pattern$, Options)
  If hRegex
    Text$ = ReplaceRegularExpression(hRegex, Text$, Replace$)
    FreeRegularExpression(hRegex)
  Else
    Debug "Can't create a Regex with this pattern : " + Pattern$
  EndIf
  ProcedureReturn Text$
EndProcedure

; HTML code : removes tags properties (id, class, name, onXXX, ...)
Text$ = "<a onclick='test'></a>"
Text$ = Ereg_Replace(Text$, "<([a-zA-Z]+)\ *[^>]+>", "<\\1>")
Debug Text$

It doesn't work: every tag that has attributes is replaced by the literal string "<\\1>".

Please share any tips or fixes you have.

Regards
/Lionel
Webmaster of Basic-univers

Post by lionel_om »

Solved thanks to AND51 and this post: http://www.purebasic.fr/english/viewtop ... c&start=15

Replacement strings should be used this way: "/\1".

/Lio
Last edited by lionel_om on Thu Dec 18, 2008 6:15 pm, edited 1 time in total.

Post by lionel_om »

Hi all,

This doesn't seem to work anymore.
Here is an example:

Code: Select all

Options.l = #PB_RegularExpression_DotAll |  #PB_RegularExpression_Extended |  #PB_RegularExpression_AnyNewLine
Pattern$ = "<([a-zA-Z]+)\ *[^>]+>"
Text$ = " <p class=hello> <a href=test><script> text inside </script></a> after"
Replace$ = "</\1>"


hRegex = CreateRegularExpression(#PB_Any, Pattern$, Options)
If hRegex
  Debug ReplaceRegularExpression(hRegex, Text$, Replace$)
  FreeRegularExpression(hRegex)
Else
  Debug "Can't create a Regex with this pattern : " + Pattern$
EndIf

It's a simple regex to remove the attributes of every HTML tag.
Does anyone have an idea how this could be fixed?

Thanks
/Lio
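
A possible workaround, sketched under the assumption that ReplaceRegularExpression() inserts the replacement string literally and does not expand back references such as \1 (the PureBasic manual states that back references are not supported there). The idea is to collect the matching tags with ExtractRegularExpression() and swap each one with ReplaceString(). The names StripTagAttributes, TagRegex and NameRegex are made up for this illustration, and the pattern requires whitespace after the tag name so that tags without attributes are left alone.

```purebasic
; Sketch only: emulates "<\1>" without back references.
; StripTagAttributes, TagRegex and NameRegex are hypothetical names.
Procedure.s StripTagAttributes(Text$)
  Protected i, Count
  Protected Dim Tags$(0)   ; receives every tag that carries attributes
  Protected Dim Names$(0)  ; receives the letter runs found inside one tag
  ; An opening tag: "<", a name, at least one whitespace, anything up to ">"
  Protected TagRegex = CreateRegularExpression(#PB_Any, "<[a-zA-Z]+\s[^>]*>")
  ; The first run of letters inside a matched tag is its name
  Protected NameRegex = CreateRegularExpression(#PB_Any, "[a-zA-Z]+")

  If TagRegex And NameRegex
    Count = ExtractRegularExpression(TagRegex, Text$, Tags$())
    For i = 0 To Count - 1
      If ExtractRegularExpression(NameRegex, Tags$(i), Names$()) > 0
        ; Swap the full tag for "<name>"
        Text$ = ReplaceString(Text$, Tags$(i), "<" + Names$(0) + ">")
      EndIf
    Next
  Else
    Debug RegularExpressionError()
  EndIf

  If TagRegex : FreeRegularExpression(TagRegex) : EndIf
  If NameRegex : FreeRegularExpression(NameRegex) : EndIf
  ProcedureReturn Text$
EndProcedure

Debug StripTagAttributes(" <p class=hello> <a href=test><script> text inside </script></a> after")
```

ReplaceString() replaces every occurrence of a given tag text in one call, so identical tags that appear several times are all handled the first time that text is processed.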
AND51
Addict
Posts: 1040
Joined: Sun Oct 15, 2006 8:56 pm
Location: Germany

Post by AND51 »

lionel_om wrote:It's a simple Regex to remove the properties of every HTML tag.

Do I understand you right?
You just want to eliminate the attributes?

<a href="..."> -> <a>
<img src="..."> -> <img>
<p class="hello"> -> <p>

PB 4.30

Code: Select all

onErrorGoto(?Fred)

Post by lionel_om »

AND51 wrote:Do I understand you right?
You just want to eliminate the attributes?

You're right!

Post by AND51 »

I'm working on it...

Post by AND51 »

This could have been a possible solution, but the pattern is rejected: I didn't know that lookbehind assertions must have a fixed length, and \w+ does not.

Code: Select all

Procedure.s RemoveHtmlAttributes(html$)
	Protected exp=CreateRegularExpression(#PB_Any, "(?Us)(?<=<\w+).*(?=>)")
	If Not exp
		Debug RegularExpressionError()
		End
	EndIf
	html$=ReplaceRegularExpression(exp, html$, "")
	FreeRegularExpression(exp)
	ProcedureReturn html$
EndProcedure


Define test.s="<a href=http://www.and51.de>Click this <hr size=6>image to get there <img src='images/logo.png' border=0></a>"
Debug RemoveHtmlAttributes(test)

Post by AND51 »

This version gives my lookbehind a fixed length: a For loop counts from 1 to 15 and builds one pattern per tag-name length (15 should be enough).

The lookbehind makes sure that tag names such as A, P, CENTER, BODY, etc. remain, while their attributes are eliminated. The longest HTML tag that quickly came to my mind is BLOCKQUOTE (10 letters). Is there any tag that is longer? To catch all tags, my For loop counts up to 15.

Code: Select all

Procedure.s RemoveHtmlAttributes(html$)
	Protected exp, n
	For n=1 To 15 ; <BLOCKQUOTE> has 10 letters, but to be sure we take 15
		exp=CreateRegularExpression(#PB_Any, "(?Us)(?<=<\w{"+Str(n)+"})\s.*(?=>)")
		html$=ReplaceRegularExpression(exp, html$, "")
		FreeRegularExpression(exp)
	Next
	ProcedureReturn html$
EndProcedure


Define test.s="<a href=http://www.and51.de>Click this <hr "+#CRLF$+"size=6>image To get there <img src='images/logo.png' border=0></a>"
Debug RemoveHtmlAttributes(test)

Post by AND51 »

Sorry for spamming. :)

You can also use a Repeat loop to keep replacing until there is nothing left to replace.
Although this solution copies the string (and thus needs twice the memory), it might be the fastest, because it only counts as far as necessary.
For example, if there is no tag longer than 6 letters (e.g. CENTER), then my loop only counts up to 7. Note that the loop also stops at the first length for which nothing is replaced, so it relies on every shorter tag-name length occurring with attributes in the input.

Code: Select all

Procedure.s RemoveHtmlAttributes(html$)
	Protected exp, n=1, old.s
	Repeat
		old=html$
		exp=CreateRegularExpression(#PB_Any, "(?Us)(?<=<\w{"+Str(n)+"})\s.*(?=>)")
		html$=ReplaceRegularExpression(exp, html$, "")
		FreeRegularExpression(exp)
		n+1
	Until html$ = old ; until there's nothing left to replace
	ProcedureReturn html$
EndProcedure


Define test.s="<a href=http://www.and51.de>Click this <hr "+#CRLF$+"size=6>image To get there <img src='images/logo.png' border=0></a>"
Debug RemoveHtmlAttributes(test)

Post by lionel_om »

Thanks for your help AND51.
I know what a lookahead is. I prefer the approach I tried because it is faster and consumes fewer resources. It was working before, but it isn't working anymore, which is strange.

I've read the documentation of the regex library, but I can't find the syntax for referencing a captured group.

I hope someone can come up with a solution.
/Lio
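
For reference, here is a minimal sketch of how a captured group could be read directly. It assumes a newer PureBasic release that provides ExamineRegularExpression(), NextRegularExpressionMatch() and RegularExpressionGroup() (these are not available in 4.30, as far as I know). The procedure name StripTagAttributesByGroup is invented for this example. Instead of relying on \1 in the replacement string, it rebuilds the text match by match and inserts group 1 itself.

```purebasic
; Sketch only: rebuilds the string around each match, emitting "<" + group 1 + ">".
; StripTagAttributesByGroup is a hypothetical name.
Procedure.s StripTagAttributesByGroup(Text$)
  Protected Result$, Pos
  Protected LastEnd = 1  ; 1-based position of the first character not yet copied
  ; Group 1 captures the tag name; \s requires at least one attribute separator
  Protected Regex = CreateRegularExpression(#PB_Any, "<([a-zA-Z]+)\s[^>]*>")

  If Regex = 0
    Debug RegularExpressionError()
    ProcedureReturn Text$
  EndIf

  If ExamineRegularExpression(Regex, Text$)
    While NextRegularExpressionMatch(Regex)
      Pos = RegularExpressionMatchPosition(Regex)
      If Pos > LastEnd
        ; Copy the untouched text between the previous match and this one
        Result$ + Mid(Text$, LastEnd, Pos - LastEnd)
      EndIf
      ; Emit the tag with its name only
      Result$ + "<" + RegularExpressionGroup(Regex, 1) + ">"
      LastEnd = Pos + RegularExpressionMatchLength(Regex)
    Wend
  EndIf

  ; Copy whatever follows the last match
  Result$ + Mid(Text$, LastEnd)
  FreeRegularExpression(Regex)
  ProcedureReturn Result$
EndProcedure

Debug StripTagAttributesByGroup("<p class=hello> <a href=test>text</a> after")
```

This makes a single pass over the text with one compiled expression, which avoids creating a separate pattern for each tag-name length.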

Post by lionel_om »

AND51 wrote:Sorry for spamming. :)

Your help is welcome!
/Lio