PureBasic Forums - English

Posted: **Sun Jun 23, 2013 2:36 am**

Python sample:


import re

s = '<div class="t_i_date"> Yesterday <span class="t_i_time">14:47</span> </div>'

# Raw string: quotes and slashes need no escaping inside the pattern.
r = r'<div class="t_i_date"> (.*) <span class="t_i_time">(.*)</span> </div>'

print(s)
print(r)

m = re.match(r, s)
g = m.groups()

print(g[0], g[1])

Python result: Yesterday 14:47 (good)

PB result: <div class="t_i_date"> Yesterday <span class="t_i_time">14:47</span> </div> (bad)

PB returns the entire line, including the text matched by the rest of the pattern.

How can I get only the contents of the parentheses (the capture groups)?

Posted: **Sun Jun 23, 2013 4:06 am**

Use some logic involving FindString / RemoveString to identify the HTML characters or whole tags and strip them?

Posted: **Sun Jun 23, 2013 6:47 am**

Try COMatePLUS: http://purecoder.net/comate.htm

In the [Basic demos] folder see Demo_RegExp.pb

Posted: **Sun Jun 23, 2013 8:52 am**

With PureBasic 5.11, for instance, the following code yields the desired result for your example by using a so-called "positive lookbehind" (if needed, replace ' with " in the strings):

Code: Select all

source$ = "<div class='t_i_date'> Yesterday <span class='t_i_time'>14:47</span> </div>"
regex$  = "(?<=<div class='t_i_date'>|<span class='t_i_time'>)[^<]*"

If CreateRegularExpression(0, regex$)
   Dim Result$(0)
   NbFound = ExtractRegularExpression(0, source$, Result$())
   For k = 0 To NbFound-1
      Debug "'" + Result$(k) + "'"
   Next
Else
   Debug RegularExpressionError()
EndIf
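
For comparison with the Python sample at the top of the thread, here is a rough sketch of the same lookbehind idea using Python's re module. One adaptation is needed and is not part of the PureBasic code: Python's re only accepts fixed-width lookbehinds, and the two tags differ in length, so the alternation is split into two separate lookbehinds.

```python
import re

source = "<div class='t_i_date'> Yesterday <span class='t_i_time'>14:47</span> </div>"

# Two fixed-width lookbehinds joined by alternation, then everything up to the next '<'.
pattern = re.compile(
    r"(?:(?<=<div class='t_i_date'>)|(?<=<span class='t_i_time'>))[^<]*"
)

found = pattern.findall(source)
for item in found:
    print(repr(item))  # repr() makes the surrounding spaces visible
```

As in the PureBasic version, the first result keeps the spaces around "Yesterday", because the pattern takes everything between the tag and the next "<".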

Posted: **Sun Jun 23, 2013 9:21 am**

Hello,

I had the same question http://www.purebasic.fr/german/viewtopi ... =3&t=25124.

In the thread linked above, edel showed me how to get the group results. I put that into a procedure so it can be used like ExtractRegularExpression.

The procedure stores the full match and the contents of the groups in a structured array. Each array element contains only a list: the first element of each list holds the whole match, and each following element holds the content of one group.

Here is the code:

Code: Select all

ImportC ""
  pb_pcre_exec(*pcre, *extra, subject.p-ascii, length, startoffset, options, *ovector, ovecsize)
  pb_pcre_get_substring(subject.p-ascii, *ovector, stringcount, stringnumber, *stringptr)
  pb_pcre_free_substring(*stringptr)
EndImport


Structure RegEx_Klammern
  List RegEx_SubItem.s() 
EndStructure


EnableExplicit

Procedure ExtractRegExItems(Regex,subject.s,Array Items.RegEx_Klammern(1))
 
  Protected len = Len(subject), offset = 0, first_sub = 0, count = 0
  Protected TrefferCounter = -1, Erg = 0, numcount  = 0, Text$ = "" 
  Protected Dim ovec(30)
 
  If regex
   
    count = pb_pcre_exec(PeekL(regex), 0, subject, len, offset, 0, @ovec(), 30)
   
    While count>0
     
      TrefferCounter +1
      ReDim Items(TrefferCounter)
      Erg = 0
     
      For numCount = 0 To count-1
        Erg=  pb_pcre_get_substring(subject, ovec(), count, numcount, @first_sub)
       
       
        If Erg >= 0
          Text$ = PeekS(first_sub,-1,#PB_Ascii)
          AddElement(Items(TrefferCounter)\RegEx_SubItem())
          Items(TrefferCounter)\RegEx_SubItem() = Text$
        EndIf
       
      Next
     
      pb_pcre_free_substring(first_sub)
     
      
      If Offset = ovec(1)
        offset = ovec(1)+1
      Else
        offset = ovec(1)
      EndIf
      
      
      count = pb_pcre_exec(PeekL(regex), 0, subject, len, offset, 0, @ovec(), 30)
    Wend
  EndIf
 
 
  ProcedureReturn TrefferCounter +1
 
EndProcedure



; DEMO



Dim Items.RegEx_Klammern(0)

Define subject.s = "abc123abc def456def"
Define pattern.s = "([a-z]+)([0-9]+)\1"
Define RegEx, Groesse, i, Text$, Counter

RegEx   = CreateRegularExpression(#PB_Any, pattern)
Groesse = ExtractRegExItems (RegEx,subject, Items())

For i = 0 To Groesse-1 
  Counter = 0
  ForEach Items(i)\RegEx_SubItem()
   
    If Counter = 0
       Text$ = "full solution:  " 
    Else
        Text$ = "Content of "+Str(Counter)+". group:  "
    EndIf
   
     
    Debug Text$ + Items(i)\RegEx_SubItem()     
    counter +1
  Next
 
  Debug "--------------------------------"
 
Next
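
The layout this procedure fills in (one entry per match, holding the whole match followed by each group's content) can be sketched in Python for comparison. The function name extract_regex_items is made up for this illustration; it is a minimal sketch of the data shape, not a port of the PCRE calls.

```python
import re

def extract_regex_items(pattern, subject):
    """Return one list per match: [whole match, group 1, group 2, ...]."""
    items = []
    for m in re.finditer(pattern, subject):
        # group(0) is the whole match; groups() holds the captured groups in order.
        items.append([m.group(0), *m.groups()])
    return items

items = extract_regex_items(r"([a-z]+)([0-9]+)\1", "abc123abc def456def")
for entry in items:
    print(entry)
```

With the demo subject this yields two entries, one for "abc123abc" and one for "def456def", each followed by its two groups.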

NicknameFJ

Posted: **Sun Jun 23, 2013 10:55 pm**

NicknameFJ wrote: Hello,

I had the same question http://www.purebasic.fr/german/viewtopi ... =3&t=25124.

Thanks!!

Some fixes from me:

Code: Select all

pb_pcre_exec(*pcre, *extra, subject.p-utf8, length, startoffset, options, *ovector, ovecsize)
pb_pcre_get_substring(subject.p-utf8, *ovector, stringcount, stringnumber, *stringptr)
...
Text$ = PeekS(first_sub,-1,#PB_UTF8)

And it works fine.
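
One thing to keep in mind with the UTF-8 variant: PCRE works on bytes, so the lengths and offsets it deals with are byte counts in the UTF-8 buffer. These differ from character positions as soon as the text contains non-ASCII characters. A small Python sketch of the difference (the sample string is an arbitrary illustration):

```python
text = "Größe 42"
encoded = text.encode("utf-8")

char_index = text.index("42")       # position counted in characters
byte_index = encoded.index(b"42")   # position counted in UTF-8 bytes

print(len(text), len(encoded))      # character count vs. byte count
print(char_index, byte_index)

# Converting a byte offset back to a character index:
recovered = len(encoded[:byte_index].decode("utf-8"))
print(recovered)
```

For pure ASCII input both counts are identical, which is why the difference is easy to miss in simple tests.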

However, I do not understand. Why does Fred ignore such an important thing?

Anyone who has programmed in JS / PHP / Python and then comes to PB gets frustrated.

Posted: **Mon Jun 24, 2013 7:47 pm**

*** LITTLE BUG FIXED ***

The new code is in my last post.

NicknameFJ

Posted: **Tue Jun 25, 2013 2:13 am**

zxtunes.com wrote: However, I do not understand. Why does Fred ignore such an important thing?

My guess is that it's because Fred isn't an expert in everything (sorry, Fred).

I find it's better to be amazed at everything that PureBasic can do than to bemoan the things it can't (yet).

Posted: **Tue Jun 25, 2013 2:29 am**

Interview 2012 with Frédéric ‘AlphaSND’ Laboureur

http://www.purearea.net/pb/english/index.htm

26. Some people are complaining about feature requests, which are made already years ago and are renewed from time to time, and because of a missing (positive) answer they feel ignored by you… What do you say to them – are some things too hard to implement, are there other reasons…?

What I can say is we are reading all the features requests. We don't answer because we don't want the users to have expectation on a feature when we have no time frame about the implementation...

Posted: **Sun Oct 06, 2013 1:22 am**

x64 fixup

Code: Select all

Protected Dim ovec.Long(30)
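
Some background on why the element type matters: PCRE writes the offsets as 32-bit C ints, while a plain PureBasic integer is pointer-sized (64-bit on x64), so an untyped array would see two offsets packed into each element. A small Python sketch of that misreading, using made-up offset values:

```python
import struct

# Four 32-bit offsets as PCRE would write them (values are illustrative only).
raw = struct.pack("<4i", 0, 9, 0, 3)

as_int32 = struct.unpack("<4i", raw)   # correct view: one offset per element
as_int64 = struct.unpack("<2q", raw)   # what an array of 64-bit integers would see

print(as_int32)
print(as_int64)
```

Reading the same bytes as 64-bit values merges each pair of offsets into one large number, which is what the typed array avoids.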

updated code below:

Code: Select all

EnableExplicit

ImportC ""
  pb_pcre_exec(*pcre, *extra, subject.p-utf8, length, startoffset, options, *ovector, ovecsize)
  pb_pcre_get_substring(subject.p-utf8, *ovector, stringcount, stringnumber, *stringptr)
  pb_pcre_free_substring(*stringptr)
EndImport

Structure REGEX_MATCH
  StartPosition.i     ; PB string position index (first index is 1)
  List Groups.s()     ; array of sub strings
  Map NamedGroups.s() ; map of sub strings (TODO)
EndStructure

Procedure ExtractRegexItems(Regex, Subject.s, Array Matches.REGEX_MATCH(1), StartPosition = 1)
  
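  ; PCRE works on the UTF-8 buffer, so the length is given in bytes (same as Len() for pure ASCII text)
  Protected MatchCounter = -1, len = StringByteLength(Subject, #PB_UTF8), offset=StartPosition-1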
  Protected subCount = 0, subIndex = 0, subString = 0, subLength = 0
  Protected Dim ovec.Long(30)
  
  If Regex And StartPosition>0
    While offset<len
      ; PeekI reads a pointer-sized value, so the pcre handle is not truncated on x64
      subCount=pb_pcre_exec(PeekI(Regex), 0, Subject, len, offset, 0, ovec(), ArraySize(ovec()))
      If subCount<=0 : Break : EndIf ; negative = no match or error, 0 = ovec too small
      
      ;register new match and its position
      MatchCounter+1
      ReDim Matches(MatchCounter)
      Matches(MatchCounter)\StartPosition=ovec(0)\l+1
      
      ;register sub strings of new match
      For subIndex=0 To subCount-1
        subLength=pb_pcre_get_substring(Subject, ovec(), subCount, subIndex, @subString)
        
        If subLength >= 0
          AddElement(Matches(MatchCounter)\Groups())
          Matches(MatchCounter)\Groups()=PeekS(subString, -1, #PB_UTF8)
          pb_pcre_free_substring(subString) ; free each substring once it has been copied
        EndIf
      Next
      
      ;find next offset (step one further if the match ended where the search started)
      offset=ovec(1)\l+1*Bool(offset = ovec(1)\l)
    Wend
  EndIf
  
  ProcedureReturn MatchCounter+1
EndProcedure

; DEMO


Dim Matches.REGEX_MATCH(0)

Define subject.s = "abc123abc def456def"
Define pattern.s = "([a-z]+)([0-9]+)\1"
Define regex, matchCount, groupIndex, i, Text$

regex = CreateRegularExpression(#PB_Any, pattern)
matchCount = ExtractRegexItems(regex, subject, Matches())

For i = 0 To matchCount-1
  ForEach Matches(i)\Groups()
    
    groupIndex=ListIndex(Matches(i)\Groups())
    If groupIndex = 0
      Text$ = "Full solution at position "+Matches(i)\StartPosition+":  "
    Else
      Text$ = "Content of "+Str(groupIndex)+". group:  "
    EndIf
    
    
    Debug Text$ + Matches(i)\Groups()
  Next
  
  Debug "--------------------------------"  
Next
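
The "find next offset" step in the procedure exists so that a zero-length match cannot stall the loop: if a match ends exactly where the search started, the offset is pushed one position further. A minimal Python sketch of the same scanning idea, where the function name scan and the sample pattern are chosen only for illustration:

```python
import re

def scan(pattern, subject):
    """Collect (start, text) for each match, stepping past zero-length matches."""
    regex = re.compile(pattern)
    results = []
    pos = 0
    while pos < len(subject):
        m = regex.search(subject, pos)
        if m is None:
            break  # no further match: stop instead of looping forever
        results.append((m.start(), m.group(0)))
        # Continue after the match; if it ended where the search began, advance by one.
        pos = m.end() + (1 if m.end() == pos else 0)
    return results

matches = scan(r"[0-9]*", "a1b22")
print(matches)
```

The pattern `[0-9]*` can match the empty string, so without the extra step the loop would keep finding the same empty match at the same offset.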

PB RegularExpression Limitations - groups()?
