Help with RegExp library for PB

neomember
User
Posts: 14
Joined: Tue Jul 27, 2004 12:17 am

Post by neomember »

I'm trying to use regular expressions to parse an HTML file.

I did some tests with the RegExp library for PB by 'FloHimself', found at the link below:
http://www.purearea.net/pb/english/userlibs.php

The expression <HTML\b[^>]*> should match the tag <HTML> in:
<HTML><HEAD><TITLE>PureBasic : ...</TITLE></HEAD><BODY></BODY></HTML>

I want to remove the tag.

This is the code I tried:

Code:

; Simple example on using the RegExp PureBasic library
; FloHimself (FloHimself@web.de) - Oct 14, 2003

;*Reg = RegComp ("(<TITLE>|<title>)(.*)(</TITLE>|</title>)") ; compiles a regular expression 

*Reg = RegComp ("<HTML\b[^>]*>") ; compiles a regular expression 
Html$ = "<HTML><HEAD><TITLE>PureBasic : visual basic compiler, easy & optimized basic programming language, basic, compiler</TITLE></HEAD><BODY></BODY></HTML>"
RegExec(*Reg, Html$)  ; Returns 1 for success (match) and 0 for failure (no match)
Title$ = Space(500)          ; Size destination buffer to store substitution
RegSub(*Reg, "\2", Title$)   ; Copy substitution to destination buffer
Debug Title$
It doesn't work.

I don't understand the second argument of the RegSub() function ("\2").
The help file says:
RegSub()

Syntax

RegSub(*Reg, Source$, Dest$)
Description

RegSub(*Reg, Source$, Dest$) copies Source$ to Dest$, making substitutions according to the most recent regexec performed using *Reg. Size the Dest$ buffer large enough to store the substitution, otherwise a runtime error may occur!
Each instance of '&' in Source$ is replaced by the substring. Each instance of '\n', where n is a digit, is replaced by a stored substring. To get a literal '&' or '\n' into Dest$, prefix it with \; to get a literal \ preceding '&' or \n, prefix it with another \.
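
The substitution rules quoted above can be tried out in another engine. The following is a minimal sketch in Python's `re` module, not the RegExp userlib itself: the helper `regsub` is a hypothetical name, and treating a missing group as an empty string is an assumption rather than documented library behavior. It also shows why "\2" has nothing to copy when the pattern contains no parentheses.

```python
import re

def regsub(match, template):
    """Expand template using the rules quoted from the RegSub() help text."""
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "&":
            # '&' stands for the whole matched substring.
            out.append(match.group(0))
        elif ch == "\\" and i + 1 < len(template):
            nxt = template[i + 1]
            if nxt in "0123456789":
                # '\n' stands for stored substring n. Assumption: a group that
                # is missing or did not take part in the match adds nothing.
                n = int(nxt)
                if n <= match.re.groups:
                    out.append(match.group(n) or "")
            else:
                # A backslash before any other character yields it literally.
                out.append(nxt)
            i += 1  # skip the character consumed after the backslash
        else:
            out.append(ch)
        i += 1
    return "".join(out)

html = "<HTML><HEAD><TITLE>PureBasic</TITLE></HEAD><BODY></BODY></HTML>"

with_groups = re.search(r"(<TITLE>)(.*)(</TITLE>)", html)
print(regsub(with_groups, r"\2"))    # text captured by the second group
print(regsub(with_groups, "[&]"))    # whole match wrapped in brackets

no_groups = re.search(r"<HTML\b[^>]*>", html)
print(repr(regsub(no_groups, r"\2")))  # no second group, so nothing to copy
```

In this sketch the pattern without parentheses still matches `<HTML>`, but only "&" and "\0" have anything to refer to.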
Has anybody had any luck with that library?

Here's a link for testing regular expressions:
http://www.javaregex.com/test.html
Last edited by neomember on Wed May 03, 2006 10:22 pm, edited 1 time in total.

Post by neomember »

OK, I found out that the second argument is for something called a "backreference".

I still need help getting any expression to work.

From:
http://www.regular-expressions.info/brackets.html

Backreference
Besides grouping part of a regular expression together, round brackets also create a "backreference". A backreference stores the part of the string matched by the part of the regular expression inside the parentheses.

To figure out the number of a particular backreference, scan the regular expression from left to right and count the opening round brackets.
Example:
(<TITLE>|<title>)(.*)(</TITLE>|</title>)

Applied to:
<HTML><HEAD><TITLE>PureBasic : visual basic compiler, easy & optimized basic programming language, basic, compiler</TITLE></HEAD><BODY></BODY></HTML>


"&" = the entire regex match (can be used multiple times)
"\0" = will select the entire regex match as backreference zero
"\1" = will select the match of the first backreference (group)
"\2" = will select the match of the second backreference (group)
"\3" = will select the match of the third backreference (group)
... = and so on...

so...

"&" = <TITLE>PureBasic : visual basic compiler, easy & optimized basic programming language, basic, compiler</TITLE>

"&&" = <TITLE>PureBasic : visual basic compiler, easy & optimized basic programming language, basic, compiler</TITLE><TITLE>PureBasic : visual basic compiler, easy & optimized basic programming language, basic, compiler</TITLE>

"\0" = <TITLE>PureBasic : visual basic compiler, easy & optimized basic programming language, basic, compiler</TITLE>

"\1" = <TITLE>

"\2" = PureBasic : visual basic compiler, easy & optimized basic programming language, basic, compiler

"\3" = </TITLE>
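
The group numbering described above can be checked in any engine that supports capture groups. This is a minimal sketch using Python's `re` module as a stand-in, so it says nothing about what the PB RegExp library itself supports. Note that the whole match (group 0) covers only the text from `<TITLE>` to `</TITLE>`, not the entire input string.

```python
import re

# Same pattern and input as in the example above.
pattern = re.compile(r"(<TITLE>|<title>)(.*)(</TITLE>|</title>)")
html = ("<HTML><HEAD><TITLE>PureBasic : visual basic compiler, easy & optimized "
        "basic programming language, basic, compiler</TITLE></HEAD><BODY></BODY></HTML>")

m = pattern.search(html)

# Group 0 is the whole match; groups 1..3 follow the order of the
# opening parentheses, counted from left to right.
for n in range(pattern.groups + 1):
    print(n, repr(m.group(n)))
```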

Post by neomember »

Well... I guess the library doesn't support all expressions then.
Last edited by neomember on Wed May 03, 2006 10:23 pm, edited 1 time in total.
Armoured
Enthusiast
Posts: 365
Joined: Mon Jan 26, 2004 11:39 am
Location: ITALY

Post by Armoured »

Hi neomember. :)

Code:

; Simple example on using the RegExp PureBasic library
; FloHimself (FloHimself@web.de) - Oct 14, 2003

;*Reg = RegComp ("(<TITLE>|<title>)(.*)(</TITLE>|</title>)") ; compiles a regular expression

*Reg = RegComp ("(<HTML>)(.*)") ; compiles a regular expression
Html$ = "<HTML><HEAD><TITLE>PureBasic : visual basic compiler, easy & optimized basic programming language, basic, compiler</TITLE></HEAD><BODY></BODY></HTML>"
RegExec(*Reg, Html$)  ; Returns 1 for success (match) and 0 for failure (no match)
Title$ = Space(500)          ; Size destination buffer to store substitution
RegSub(*Reg, "\2", Title$)   ; Copy substitution to destination buffer
Debug Title$

And if you want to remove both "<HTML>" and "</HTML>":

Code:

; Simple example on using the RegExp PureBasic library
; FloHimself (FloHimself@web.de) - Oct 14, 2003

;*Reg = RegComp ("(<TITLE>|<title>)(.*)(</TITLE>|</title>)") ; compiles a regular expression

*Reg = RegComp ("(<HTML>)(.*)(</HTML>)") ; compiles a regular expression
Html$ = "<HTML><HEAD><TITLE>PureBasic : visual basic compiler, easy & optimized basic programming language, basic, compiler</TITLE></HEAD><BODY></BODY></HTML>"
RegExec(*Reg, Html$) ; Returns 1 for success (match) and 0 for failure (no match)
Title$ = Space(500) ; Size destination buffer to store substitution
RegSub(*Reg, "\2", Title$) ; Copy substitution to destination buffer
Debug Title$
bye!
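
The examples above remove the tags by capturing everything else in a group and copying that group out. Another way to express the same goal is to replace each tag match with an empty string. The following is a rough sketch in Python's `re` module, shown only to illustrate the idea; whether the PB RegExp library accepts `\b` or offers a comparable replace call is not confirmed here.

```python
import re

html = "<HTML><HEAD><TITLE>PureBasic</TITLE></HEAD><BODY></BODY></HTML>"

# Delete only the opening tag, as in the original question.
without_open = re.sub(r"<HTML\b[^>]*>", "", html, count=1)

# Delete the opening and closing tags, in either letter case.
without_both = re.sub(r"</?html\b[^>]*>", "", html, flags=re.IGNORECASE)

print(without_open)
print(without_both)
```

Unlike `(<HTML>)(.*)(</HTML>)`, this approach does not depend on the tags sitting at the very start and end of the match.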