PB RegularExpression Limitations - groups()?

zxtunes.com
Enthusiast
Posts: 375
Joined: Wed Apr 23, 2008 7:51 am
Location: Saint-Petersburg, Russia

PB RegularExpression Limitations - groups()?

Post by zxtunes.com »

Python sample:

Code: Select all


import re

s = '<div class="t_i_date"> Yesterday <span class="t_i_time">14:47</span> </div>'

# raw string, so the regex needs no extra escaping
r = r'<div class="t_i_date"> (.*) <span class="t_i_time">(.*)</span> </div>'

print(s)
print(r)

m = re.match(r, s)
g = m.groups()

print(g[0], g[1])

Python result = Yesterday 14:47 - Good

PB result = <div class="t_i_date"> Yesterday <span class="t_i_time">14:47</span> </div> - Bad :(

PB returns the entire matched line, including the text of the surrounding pattern.

How can I get only the contents of the groups (the parts in parentheses)?
Zach
Addict
Posts: 1675
Joined: Sun Dec 12, 2010 12:36 am
Location: Somewhere in the midwest

Re: PB RegularExpression Limitations - groups()?

Post by Zach »

Use some logic involving FindString / RemoveString to identify the HTML characters or whole tags and strip them?
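To illustrate the find-and-strip idea, here is a minimal sketch in Python (the language of the opening sample), not PureBasic. The helper name `between` is hypothetical; it only uses plain string searching, which is roughly what FindString would do in PB.

```python
def between(text, start_marker, end_marker):
    """Return the text between two markers, or None if either is missing."""
    start = text.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    end = text.find(end_marker, start)
    if end == -1:
        return None
    return text[start:end]


s = '<div class="t_i_date"> Yesterday <span class="t_i_time">14:47</span> </div>'
date = between(s, '<div class="t_i_date">', '<span').strip()
time = between(s, '<span class="t_i_time">', '</span>').strip()
print(date, time)
```

This avoids regular expressions entirely, at the cost of hard-coding the surrounding tags.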
JHPJHP
Addict
Posts: 2253
Joined: Sat Oct 09, 2010 3:47 am

Re: PB RegularExpression Limitations - groups()?

Post by JHPJHP »

Try COMatePLUS: http://purecoder.net/comate.htm

In the [Basic demos] folder see Demo_RegExp.pb

If you're not investing in yourself, you're falling behind.

My PureBasic StuffFREE STUFF, Scripts & Programs.
My PureBasic Forum ➤ Questions, Requests & Comments.
Little John
Addict
Posts: 4779
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Re: PB RegularExpression Limitations - groups()?

Post by Little John »

With PureBasic 5.11, for instance, the following code yields the desired result for your example, using a so-called "positive lookbehind" (if needed, replace ' with " in the strings):

Code: Select all

source$ = "<div class='t_i_date'> Yesterday <span class='t_i_time'>14:47</span> </div>"
regex$  = "(?<=<div class='t_i_date'>|<span class='t_i_time'>)[^<]*"

If CreateRegularExpression(0, regex$)
   Dim Result$(0)
   NbFound = ExtractRegularExpression(0, source$, Result$())
   For k = 0 To NbFound-1
      Debug "'" + Result$(k) + "'"
   Next
Else
   Debug RegularExpressionError()
EndIf
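For comparison with the Python sample in the first post, the same lookbehind idea can be sketched in Python. One difference is worth knowing: Python's `re` module requires every lookbehind to have a fixed width, so the single lookbehind with two alternatives of different lengths is split into two lookbehinds here. This is an illustration only, not part of the PureBasic solution.

```python
import re

source = "<div class='t_i_date'> Yesterday <span class='t_i_time'>14:47</span> </div>"

# Python's re needs each lookbehind to have a fixed width, so the two
# alternatives of the PCRE pattern are written as two separate lookbehinds.
pattern = re.compile(
    r"(?:(?<=<div class='t_i_date'>)|(?<=<span class='t_i_time'>))[^<]*"
)

found = pattern.findall(source)
for item in found:
    print(repr(item))
```

As in the PureBasic version, the first result keeps the spaces around the word, so trim it if needed.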
NicknameFJ
User
Posts: 90
Joined: Tue Mar 17, 2009 6:36 pm
Location: Germany

Re: PB RegularExpression Limitations - groups()?

Post by NicknameFJ »

Hello,

I had the same question http://www.purebasic.fr/german/viewtopi ... =3&t=25124.

In that thread edel showed me how to get the group results. I wrapped it in a procedure so it can be used like ExtractRegularExpression.

The procedure puts the full match and the contents of the groups into a structured array. Each array element contains only a list. The first element of each list holds the whole match, and each following element holds the content of one group.

Here is the code:

Code: Select all

 ImportC ""
  pb_pcre_exec(*pcre, *extra, subject.p-ascii, length, startoffset, options, *ovector, ovecsize)
  pb_pcre_get_substring(subject.p-ascii, *ovector, stringcount, stringnumber, *stringptr)
  pb_pcre_free_substring(*stringptr)
EndImport


Structure RegEx_Klammern
  List RegEx_SubItem.s() 
EndStructure


EnableExplicit

Procedure ExtractRegExItems(Regex,subject.s,Array Items.RegEx_Klammern(1))
 
  Protected len = Len(subject), offset = 0, first_sub = 0, count = 0
  Protected TrefferCounter = -1, Erg = 0, numcount  = 0, Text$ = "" 
  Protected Dim ovec(30)
 
  If regex
   
    count = pb_pcre_exec(PeekL(regex), 0, subject, len, offset, 0, @ovec(), 30)
   
    While count>0
     
      TrefferCounter +1
      ReDim Items(TrefferCounter)
      Erg = 0
     
      For numCount = 0 To count-1
        Erg=  pb_pcre_get_substring(subject, ovec(), count, numcount, @first_sub)
       
       
        If Erg >= 0
          Text$ = PeekS(first_sub,-1,#PB_Ascii)
          AddElement(Items(TrefferCounter)\RegEx_SubItem())
          Items(TrefferCounter)\RegEx_SubItem() = Text$
          ; free each substring right after copying it (PCRE allocates one per call)
          pb_pcre_free_substring(first_sub)
        EndIf
       
      Next
     
      
      If Offset = ovec(1)
        offset = ovec(1)+1
      Else
        offset = ovec(1)
      EndIf
      
      
      count = pb_pcre_exec(PeekL(regex), 0, subject, len, offset, 0, @ovec(), 30)
    Wend
  EndIf
 
 
  ProcedureReturn TrefferCounter +1
 
EndProcedure



; DEMO



Dim Items.RegEx_Klammern(0)

Define subject.s = "abc123abc def456def"
Define pattern.s = "([a-z]+)([0-9]+)\1"
Define RegEx, Groesse, i, Text$, Counter

RegEx   = CreateRegularExpression(#PB_Any, pattern)
Groesse = ExtractRegExItems (RegEx,subject, Items())

For i = 0 To Groesse-1 
  Counter = 0
  ForEach Items(i)\RegEx_SubItem()
   
    If Counter = 0
       Text$ = "full solution:  " 
    Else
        Text$ = "Content of "+Str(Counter)+". group:  "
    EndIf
   
     
    Debug Text$ + Items(i)\RegEx_SubItem()     
    counter +1
  Next
 
  Debug "--------------------------------"
 
Next
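For readers coming from the Python sample in the first post, the result layout described above (one list per match, full match first, then the groups) can be sketched as follows. The function name `extract_regex_items` is only borrowed from the procedure above for illustration; this is Python's own `re` module, not the PB wrapper.

```python
import re


def extract_regex_items(pattern, subject):
    """Return one list per match: [full match, group 1, group 2, ...]."""
    return [[m.group(0), *m.groups()] for m in re.finditer(pattern, subject)]


items = extract_regex_items(r"([a-z]+)([0-9]+)\1", "abc123abc def456def")
for match in items:
    for index, text in enumerate(match):
        label = "full match" if index == 0 else f"group {index}"
        print(f"{label}: {text}")
    print("-" * 32)
```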


NicknameFJ
Last edited by NicknameFJ on Mon Jun 24, 2013 7:46 pm, edited 1 time in total.
PS: Sorry for my weird english, but english is not my native language.



zxtunes.com
Enthusiast
Posts: 375
Joined: Wed Apr 23, 2008 7:51 am
Location: Saint-Petersburg, Russia

Re: PB RegularExpression Limitations - groups()?

Post by zxtunes.com »

NicknameFJ wrote:
Hello,

I had the same question http://www.purebasic.fr/german/viewtopi ... =3&t=25124.

THX!!

Some fixes from me:

Code: Select all

pb_pcre_exec(*pcre, *extra, subject.p-utf8, length, startoffset, options, *ovector, ovecsize)
pb_pcre_get_substring(subject.p-utf8, *ovector, stringcount, stringnumber, *stringptr)
...
Text$ = PeekS(first_sub,-1,#PB_UTF8)

And it works fine.
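A note on the UTF-8 variant, offered as an assumption rather than something verified in this thread: the PCRE API takes the subject length and reports match offsets in bytes, while Len() counts characters. For pure ASCII text the two agree, but for other text they presumably differ. The small Python sketch below only shows the size of that difference; the sample string is made up.

```python
subject = "Grüße 14:47"

# Offset of "14:47" counted in characters (what Python's str.find reports).
char_offset = subject.find("14:47")

# Offset of the same text counted in bytes of the UTF-8 encoding,
# which is the unit a byte-oriented matcher works in.
data = subject.encode("utf-8")
byte_offset = data.find(b"14:47")

print(len(subject), len(data))  # character count vs. byte count
print(char_offset, byte_offset)
```

If subjects can contain non-ASCII characters, it may be safer to pass the byte length of the UTF-8 data instead of Len(subject).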

However, I do not understand why Fred ignores such an important thing.

Anyone who has programmed in JS / PHP / Python gets frustrated when they come to PB.
NicknameFJ
User
Posts: 90
Joined: Tue Mar 17, 2009 6:36 pm
Location: Germany

Re: PB RegularExpression Limitations - groups()?

Post by NicknameFJ »

*** LITTLE BUG FIXED ***

The new code is in my previous post.

NicknameFJ
citystate
Enthusiast
Posts: 638
Joined: Sun Feb 12, 2006 10:06 pm

Re: PB RegularExpression Limitations - groups()?

Post by citystate »

zxtunes.com wrote: However, I do not understand why Fred ignores such an important thing.

My guess is that it's because Fred isn't an expert in everything (sorry Fred).

I find it's better to be amazed at everything that PureBasic can do than to bemoan the things it can't (yet).
there is no sig, only zuul (and the following disclaimer)

WARNING: may be talking out of his hat
JHPJHP
Addict
Posts: 2253
Joined: Sat Oct 09, 2010 3:47 am

Re: PB RegularExpression Limitations - groups()?

Post by JHPJHP »

Interview 2012 with Frédéric ‘AlphaSND’ Laboureur

http://www.purearea.net/pb/english/index.htm
26. Some people are complaining about feature requests which were made years ago and are renewed from time to time, and because of a missing (positive) answer they feel ignored by you… What do you say to them: are some things too hard to implement, are there other reasons…?

What I can say is we are reading all the feature requests. We don't answer because we don't want the users to have expectations about a feature when we have no time frame for the implementation...

eddy
Addict
Posts: 1479
Joined: Mon May 26, 2003 3:07 pm
Location: Nantes

Re: PB RegularExpression Limitations - groups()?

Post by eddy »

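x64 fixup: PCRE's ovector is an array of 32-bit C ints, so on x64 it has to be declared as Long rather than as the default (64-bit) integer: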

Code: Select all

Protected Dim ovec.Long(30)

Updated code below:

Code: Select all

EnableExplicit

ImportC ""
  pb_pcre_exec(*pcre, *extra, subject.p-utf8, length, startoffset, options, *ovector, ovecsize)
  pb_pcre_get_substring(subject.p-utf8, *ovector, stringcount, stringnumber, *stringptr)
  pb_pcre_free_substring(*stringptr)
EndImport

Structure REGEX_MATCH
  StartPosition.i     ; PB string position index (first index is 1)
  List Groups.s()     ; list of substrings: full match first, then one element per group
  Map NamedGroups.s() ; map of sub strings (TODO)
EndStructure

Procedure ExtractRegexItems(Regex, Subject.s, Array Matches.REGEX_MATCH(1), StartPosition = 1)
  
  Protected MatchCounter = -1, len = Len(Subject), offset=StartPosition-1
  Protected subCount = 0, subIndex = 0, subString = 0, subLength = 0
  Protected Dim ovec.Long(30)
  
  If Regex And StartPosition>0
    While offset<len
      subCount=pb_pcre_exec(PeekL(Regex), 0, Subject, len, offset, 0, ovec(), ArraySize(ovec()))
      If subCount<=0 : Break : EndIf ;0 = ovector too small, negative = no match or error
      
      ;register new match and its position
      MatchCounter+1
      ReDim Matches(MatchCounter)
      Matches(MatchCounter)\StartPosition=ovec(0)\l+1
      
      ;register sub strings of new match
      For subIndex=0 To subCount-1
        subLength=pb_pcre_get_substring(Subject, ovec(), subCount, subIndex, @subString)
        
        If subLength >= 0
          AddElement(Matches(MatchCounter)\Groups())
          Matches(MatchCounter)\Groups()=PeekS(subString, -1, #PB_UTF8)
          ;free each substring once copied (PCRE allocates one per call)
          pb_pcre_free_substring(subString)
        EndIf
      Next
      
      ;find next offset (step one past an empty match so the loop cannot stall)
      offset=ovec(1)\l+1*Bool(offset = ovec(1)\l)
    Wend
  EndIf
  
  ProcedureReturn MatchCounter+1
EndProcedure

; DEMO


Dim Matches.REGEX_MATCH(0)

Define subject.s = "abc123abc def456def"
Define pattern.s = "([a-z]+)([0-9]+)\1"
Define regex, matchCount, groupIndex, i, Text$

regex = CreateRegularExpression(#PB_Any, pattern)
matchCount = ExtractRegexItems(regex, subject, Matches())

For i = 0 To matchCount-1
  ForEach Matches(i)\Groups()
    
    groupIndex=ListIndex(Matches(i)\Groups())
    If groupIndex = 0
      Text$ = "Full solution at position "+Str(Matches(i)\StartPosition)+":  "
    Else
      Text$ = "Content of "+Str(groupIndex)+". group:  "
    EndIf
    
    
    Debug Text$ + Matches(i)\Groups()
  Next
  
  Debug "--------------------------------"  
Next
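The NamedGroups map is still marked TODO in the structure above. As a point of comparison only, here is how the same three pieces (1-based start position, list of groups, map of named groups) look with Python's `re` module. The class name `RegexMatch` and its field names are hypothetical, and the `(?P<name>...)` syntax is Python's; whether the PB wrapper can expose PCRE's named groups is not covered in this thread.

```python
import re
from dataclasses import dataclass, field


@dataclass
class RegexMatch:
    start_position: int                               # 1-based, like PB string positions
    groups: list = field(default_factory=list)        # full match first, then each group
    named_groups: dict = field(default_factory=dict)  # group name -> matched text


def extract_regex_items(pattern, subject, start_position=1):
    """Collect every match found at or after the 1-based start_position."""
    compiled = re.compile(pattern)
    return [
        RegexMatch(m.start() + 1, [m.group(0), *m.groups()], m.groupdict())
        for m in compiled.finditer(subject, start_position - 1)
    ]


pattern = r"(?P<word>[a-z]+)(?P<num>[0-9]+)(?P=word)"
matches = extract_regex_items(pattern, "abc123abc def456def")
for match in matches:
    print(match.start_position, match.groups, match.named_groups)
```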