PB RegularExpression Limitations - groups()?
Posted: Sun Jun 23, 2013 2:36 am
by zxtunes.com
Python sample:
Code:
import re

s = '<div class="t_i_date"> Yesterday <span class="t_i_time">14:47</span> </div>'
r = r'<div class="t_i_date"> (.*) <span class="t_i_time">(.*)</span> </div>'
print(s)
print(r)
m = re.match(r, s)
g = m.groups()           # only the captured groups, not the whole match
print(g[0], g[1])        # Yesterday 14:47
Python result: Yesterday 14:47 (good)
PB result: <div class="t_i_date"> Yesterday <span class="t_i_time">14:47</span> </div> (bad)
PB returns the entire matched text instead of the captured groups.
How can I get only the contents of the parentheses (the capture groups)?
Re: PB RegularExpression Limitations - groups()?
Posted: Sun Jun 23, 2013 4:06 am
by Zach
Use some logic involving FindString / RemoveString to identify the HTML characters or whole tags and strip them?
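A minimal sketch of that find-and-strip idea, written in Python to match the original sample (a PureBasic version would use FindString() in the same way). `strip_tags` is a hypothetical helper; it assumes no attribute value contains a `>` character, which real HTML does not guarantee.

```python
def strip_tags(html):
    """Remove every '<...>' tag by repeated find(), keeping the text between tags."""
    out = []
    pos = 0
    while True:
        start = html.find("<", pos)
        if start == -1:
            out.append(html[pos:])      # no more tags: keep the tail
            break
        out.append(html[pos:start])     # text before the tag
        end = html.find(">", start)
        if end == -1:
            break                       # unterminated tag: drop the remainder
        pos = end + 1
    return "".join(out)


s = '<div class="t_i_date"> Yesterday <span class="t_i_time">14:47</span> </div>'
print(strip_tags(s).split())            # ['Yesterday', '14:47']
```

This works for the sample, but it cannot tell which text belonged to which tag; capture groups keep that information.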
Re: PB RegularExpression Limitations - groups()?
Posted: Sun Jun 23, 2013 6:47 am
by JHPJHP
Try COMatePLUS:
http://purecoder.net/comate.htm
In the [Basic demos] folder see Demo_RegExp.pb
Re: PB RegularExpression Limitations - groups()?
Posted: Sun Jun 23, 2013 8:52 am
by Little John
With PureBasic 5.11, for instance, the following code yields the desired result for your example, using a so-called "positive lookbehind" (if needed, replace ' with " in the strings):
Code:
source$ = "<div class='t_i_date'> Yesterday <span class='t_i_time'>14:47</span> </div>"
regex$ = "(?<=<div class='t_i_date'>|<span class='t_i_time'>)[^<]*"
If CreateRegularExpression(0, regex$)
  Dim Result$(0)
  NbFound = ExtractRegularExpression(0, source$, Result$())
  For k = 0 To NbFound - 1
    Debug "'" + Result$(k) + "'"
  Next
Else
  Debug RegularExpressionError()
EndIf
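For comparison, the same lookbehind extraction in Python. One caveat, stated as my understanding rather than anything from this thread: PCRE (which PureBasic uses) accepts a lookbehind whose top-level alternatives have different fixed lengths, as here (22 vs. 23 characters), but Python's `re` requires a single fixed width, so the sketch splits the alternation into two separate lookbehinds.

```python
import re

source = "<div class='t_i_date'> Yesterday <span class='t_i_time'>14:47</span> </div>"

# One lookbehind per tag, because Python's re needs each lookbehind to be fixed-width.
regex = r"(?:(?<=<div class='t_i_date'>)|(?<=<span class='t_i_time'>))[^<]*"

for item in re.findall(regex, source):
    print(repr(item))
# ' Yesterday '
# '14:47'
```

As in the PureBasic output, the surrounding spaces belong to the match; call `.strip()` on each item if they are unwanted.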
Re: PB RegularExpression Limitations - groups()?
Posted: Sun Jun 23, 2013 9:21 am
by NicknameFJ
Hello,
I had the same question:
http://www.purebasic.fr/german/viewtopi ... =3&t=25124.
In that thread, edel showed me how to get the group results. I wrapped it in a procedure that works like ExtractRegularExpression().
The procedure puts the full match ("full solution") and the contents of the groups into a structured array. Each array element contains only a list: the first element of every list holds the whole match, and each following element holds the content of one group.
Here is the code:
Code:
ImportC ""
  pb_pcre_exec(*pcre, *extra, subject.p-ascii, length, startoffset, options, *ovector, ovecsize)
  pb_pcre_get_substring(subject.p-ascii, *ovector, stringcount, stringnumber, *stringptr)
  pb_pcre_free_substring(*stringptr)
EndImport

Structure RegEx_Klammern          ; "Klammern" = brackets: the full match and groups of one hit
  List RegEx_SubItem.s()
EndStructure

EnableExplicit

Procedure ExtractRegExItems(Regex, subject.s, Array Items.RegEx_Klammern(1))
  Protected len = Len(subject), offset = 0, first_sub = 0, count = 0
  Protected TrefferCounter = -1, Erg = 0, numcount = 0, Text$ = ""   ; Treffer = hit, Erg = result
  Protected Dim ovec(30)          ; PCRE expects 32-bit ints: .i matches that in 32-bit PureBasic only
  If Regex
    count = pb_pcre_exec(PeekL(Regex), 0, subject, len, offset, 0, @ovec(), 30)
    While count > 0
      TrefferCounter + 1
      ReDim Items(TrefferCounter)
      For numcount = 0 To count - 1
        Erg = pb_pcre_get_substring(subject, @ovec(), count, numcount, @first_sub)
        If Erg >= 0
          Text$ = PeekS(first_sub, -1, #PB_Ascii)
          AddElement(Items(TrefferCounter)\RegEx_SubItem())
          Items(TrefferCounter)\RegEx_SubItem() = Text$
          pb_pcre_free_substring(first_sub)   ; free every substring, not only the last one
        EndIf
      Next
      ; step one character past an empty match, otherwise continue at the end of the match
      If ovec(0) = ovec(1)
        offset = ovec(1) + 1
      Else
        offset = ovec(1)
      EndIf
      count = pb_pcre_exec(PeekL(Regex), 0, subject, len, offset, 0, @ovec(), 30)
    Wend
  EndIf
  ProcedureReturn TrefferCounter + 1
EndProcedure
; DEMO
Dim Items.RegEx_Klammern(0)
Define subject.s = "abc123abc def456def"
Define pattern.s = "([a-z]+)([0-9]+)\1"
Define RegEx, Groesse, i, Text$, Counter
RegEx = CreateRegularExpression(#PB_Any, pattern)
Groesse = ExtractRegExItems(RegEx, subject, Items())
For i = 0 To Groesse - 1
  Counter = 0
  ForEach Items(i)\RegEx_SubItem()
    If Counter = 0
      Text$ = "full solution: "
    Else
      Text$ = "Content of " + Str(Counter) + ". group: "
    EndIf
    Debug Text$ + Items(i)\RegEx_SubItem()
    Counter + 1
  Next
  Debug "--------------------------------"
Next
NicknameFJ
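The array-of-lists result described above corresponds to what Python gives per match through `re.finditer` and `Match.groups()`. A minimal sketch; `extract_regex_items` is a hypothetical name, and mapping groups that did not participate (Python's `None`) to an empty string is my own choice, not necessarily what the PureBasic procedure does.

```python
import re


def extract_regex_items(pattern, subject):
    """Return one list per match: the whole match first, then one entry per group."""
    items = []
    for m in re.finditer(pattern, subject):
        # m.group(0) is the whole match, m.groups() the capture groups in order.
        # Groups that did not take part in the match are None; map them to "".
        items.append([m.group(0)] + [g if g is not None else "" for g in m.groups()])
    return items


for groups in extract_regex_items(r"([a-z]+)([0-9]+)\1", "abc123abc def456def"):
    for n, text in enumerate(groups):
        label = "full solution: " if n == 0 else "Content of %d. group: " % n
        print(label + text)
    print("-" * 32)
```

For the demo subject this yields `["abc123abc", "abc", "123"]` and `["def456def", "def", "456"]`, the same contents the PureBasic demo lists.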
Re: PB RegularExpression Limitations - groups()?
Posted: Sun Jun 23, 2013 10:55 pm
by zxtunes.com
Thanks!!
Some fixes from me:
Code:
pb_pcre_exec(*pcre, *extra, subject.p-utf8, length, startoffset, options, *ovector, ovecsize)
pb_pcre_get_substring(subject.p-utf8, *ovector, stringcount, stringnumber, *stringptr)
...
Text$ = PeekS(first_sub,-1,#PB_UTF8)
And it works fine.
However, I don't understand why Fred ignores such an important feature.
Anyone who has programmed in JS / PHP / Python gets frustrated on coming to PB.
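One consequence of switching to `.p-utf8`: as far as I know, PCRE's `length` argument and its ovector offsets count bytes of the string it receives, not characters. So with a UTF-8 subject, `Len(subject)` and positions taken from the ovector only agree with PureBasic character positions for ASCII-only text. The Python sketch below shows the difference; `byte_to_char_offset` is a hypothetical helper.

```python
import re

text = "h\u00e9llo w\u00f6rld"         # "héllo wörld": 11 characters
data = text.encode("utf-8")            # roughly what a .p-utf8 parameter hands to PCRE

print(len(text), len(data))            # 11 13

m_text = re.search("w", text)          # offsets counted in characters
m_data = re.search(b"w", data)         # offsets counted in bytes, like PCRE's ovector
print(m_text.start(), m_data.start())  # 6 7


def byte_to_char_offset(data, byte_offset):
    """Convert a byte offset into UTF-8 data back into a character index."""
    return len(data[:byte_offset].decode("utf-8"))


print(byte_to_char_offset(data, m_data.start()))  # 6
```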
Re: PB RegularExpression Limitations - groups()?
Posted: Mon Jun 24, 2013 7:47 pm
by NicknameFJ
*** LITTLE BUG FIXED ***
The new code is in my previous post.
NicknameFJ
Re: PB RegularExpression Limitations - groups()?
Posted: Tue Jun 25, 2013 2:13 am
by citystate
zxtunes.com wrote: However, I do not understand. Why FRED ignore such an important thing?
My guess is that it's because Fred isn't an expert in everything (sorry, Fred).
I find it's better to be amazed at everything that PureBasic can do than to bemoan the things it can't do (yet).

Re: PB RegularExpression Limitations - groups()?
Posted: Tue Jun 25, 2013 2:29 am
by JHPJHP
Interview 2012 with Frédéric ‘AlphaSND’ Laboureur
http://www.purearea.net/pb/english/index.htm
26.
Some people are complaining about feature requests which were made years ago and are renewed from time to time, and because of a missing (positive) answer they feel ignored by you… What do you say to them – are some things too hard to implement, are there other reasons…?
What I can say is we are reading all the feature requests. We don't answer because we don't want the users to have expectations on a feature when we have no time frame for the implementation...
Re: PB RegularExpression Limitations - groups()?
Posted: Sun Oct 06, 2013 1:22 am
by eddy
x64 fixup
updated code below:
Code:
EnableExplicit

ImportC ""
  pb_pcre_exec(*pcre, *extra, subject.p-utf8, length, startoffset, options, *ovector, ovecsize)
  pb_pcre_get_substring(subject.p-utf8, *ovector, stringcount, stringnumber, *stringptr)
  pb_pcre_free_substring(*stringptr)
EndImport

Structure REGEX_MATCH
  StartPosition.i      ; 1-based byte position in the UTF-8 subject (same as the PB position for ASCII text)
  List Groups.s()      ; list of substrings: whole match first, then each group
  Map NamedGroups.s()  ; map of named substrings (TODO)
EndStructure

Procedure ExtractRegexItems(Regex, Subject.s, Array Matches.REGEX_MATCH(1), StartPosition = 1)
  ; PCRE expects the subject length and offsets in bytes of the UTF-8 string, not in characters
  Protected MatchCounter = -1, len = StringByteLength(Subject, #PB_UTF8), offset = StartPosition - 1
  Protected subCount = 0, subIndex = 0, subString = 0, subLength = 0
  Protected Dim ovec.Long(30)  ; PCRE's ovector holds 32-bit ints, also on x64
  If Regex And StartPosition > 0
    While offset < len
      ; PeekI: assumes the first field of PB's internal regex object is the pcre pointer (undocumented)
      subCount = pb_pcre_exec(PeekI(Regex), 0, Subject, len, offset, 0, @ovec(), ArraySize(ovec()))
      If subCount <= 0 : Break : EndIf  ; < 0: no (more) matches or error, 0: ovector too small
      ; register new match and its position
      MatchCounter + 1
      ReDim Matches(MatchCounter)
      Matches(MatchCounter)\StartPosition = ovec(0)\l + 1
      ; register substrings of the new match
      For subIndex = 0 To subCount - 1
        subLength = pb_pcre_get_substring(Subject, @ovec(), subCount, subIndex, @subString)
        If subLength >= 0
          AddElement(Matches(MatchCounter)\Groups())
          Matches(MatchCounter)\Groups() = PeekS(subString, -1, #PB_UTF8)
          pb_pcre_free_substring(subString)  ; free every substring, not only the last one
        EndIf
      Next
      ; find next offset: step one byte past an empty match to avoid an endless loop
      offset = ovec(1)\l + Bool(ovec(0)\l = ovec(1)\l)
    Wend
  EndIf
  ProcedureReturn MatchCounter + 1
EndProcedure
; DEMO
Dim Matches.REGEX_MATCH(0)
Define subject.s = "abc123abc def456def"
Define pattern.s = "([a-z]+)([0-9]+)\1"
Define regex, matchCount, groupIndex, i, Text$
regex = CreateRegularExpression(#PB_Any, pattern)
matchCount = ExtractRegexItems(regex, subject, Matches())
For i = 0 To matchCount - 1
  ForEach Matches(i)\Groups()
    groupIndex = ListIndex(Matches(i)\Groups())
    If groupIndex = 0
      Text$ = "Full solution at position " + Str(Matches(i)\StartPosition) + ": "
    Else
      Text$ = "Content of " + Str(groupIndex) + ". group: "
    EndIf
    Debug Text$ + Matches(i)\Groups()
  Next
  Debug "--------------------------------"
Next
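The scanning loop in these procedures (run the regex from an offset, record the 1-based start position and the groups, move the offset to the end of the match, step one further after an empty match) can be mirrored in Python with `Pattern.search(string, pos)`. This is a hypothetical sketch, not a translation of the PureBasic code: it tests for an empty match by comparing the match start and end, and it also reports an empty match at the very end of the subject, as `re.finditer` does.

```python
import re


def extract_matches(pattern, subject, start_position=1):
    """Return (1-based start position, [whole match, group1, ...]) for each match."""
    regex = re.compile(pattern)
    matches = []
    offset = start_position - 1
    while offset <= len(subject):
        m = regex.search(subject, offset)
        if m is None:                   # no further match: stop
            break
        matches.append((m.start() + 1, [m.group(0), *m.groups()]))
        # An empty match would be found again at the same place, so step past it.
        offset = m.end() + 1 if m.end() == m.start() else m.end()
    return matches


for pos, groups in extract_matches(r"([a-z]+)([0-9]+)\1", "abc123abc def456def"):
    print("Full solution at position %d: %s" % (pos, groups[0]))
    for n, text in enumerate(groups[1:], start=1):
        print("Content of %d. group: %s" % (n, text))
    print("-" * 32)
```

Without the extra step after an empty match, a pattern such as `a*` would match the empty string at the same offset forever.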