Regular Expressions: backreference

mdp
Enthusiast
Posts: 115
Joined: Mon Apr 18, 2005 8:28 pm


Post by mdp »

(version 4.30)
Bug or unexpected behaviour (either way, serious in this context): regular expression backreferences do not seem to work in the replacement string.
The following code

Code:

s.s = "This is a string"
CreateRegularExpression(0,"\b([^ ]*?i[^ ]*?)\b")
Debug ReplaceRegularExpression(0,s,"-\1-")

outputs

Code:

-\1- -\1- a -\1-

instead of the expected

Code:

-This- -is- a -string-

Is there a proper workaround?
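
For comparison, here is a minimal sketch of the two behaviours in question. It is written in Python only because its re module is easy to run; it is not PureBasic's regex library, and the variable names are made up for illustration. A replacement treated as plain text reproduces the output reported above, while a replacement treated as a template gives the expected line.

```python
import re

subject = "This is a string"
pattern = r"\b([^ ]*?i[^ ]*?)\b"

# Replacement treated as plain text: the two characters "\1" are inserted as-is.
literal = re.sub(pattern, lambda m: r"-\1-", subject)

# Replacement treated as a template: \1 expands to the first capture group.
expanded = re.sub(pattern, r"-\1-", subject)

print(literal)
print(expanded)
```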
IceSoft
Addict
Posts: 1690
Joined: Thu Jun 24, 2004 8:51 am
Location: Germany

Post by IceSoft »

try this one:

Code:

ImportC "" 
  pcre_exec(*pcre, *extra, subject.s, length, startoffset, options, *ovector, ovecsize) 
  pcre_get_substring(subject.s, *ovector, stringcount, stringnumber, *stringptr) 
  pcre_free_substring(*stringptr) 
EndImport 


subject.s = "This is a string" 

pattern.s = "([^ ]*?i[^ ]*?)" 
len = Len(subject) 
offset = 0 
first_sub = 0 
count = 0 

Dim ovec(30) 

regex = CreateRegularExpression(#PB_Any, pattern) 
Debug regex 
count = pcre_exec(PeekL(regex), 0, subject, len, offset, 0, @ovec(), 30) 

While count>0 
  pcre_get_substring(subject, ovec(), count, 1, @first_sub) 
  Debug "-" + PeekS(first_sub) +"-" 
  pcre_free_substring(first_sub) 
  offset = ovec(1) 
  count = pcre_exec(PeekL(regex), 0, subject, len, offset, 0, @ovec(), 30) 
Wend 
pcfreak
User
Posts: 75
Joined: Sat May 22, 2004 1:38 am

Post by pcfreak »

try it in unicode mode...
A workaround is still just a workaround.
Teng
User
Posts: 21
Joined: Thu Aug 27, 2009 12:13 pm

Post by Teng »

IceSoft wrote: try this one: (same code as in IceSoft's post above)

Debug output:
3808920
-Thi-
-i-
-stri-
:?: :?:

Using Purebasic 4.31 Demo version x86. What's ImportC? Why isn't this in the pb helpfile?
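
A likely reason for the truncated matches: IceSoft's pattern drops the \b anchors from the original post, so each lazy [^ ]*? stops as early as it can. The sketch below uses Python's re module as a stand-in for PCRE (an assumption on my part that both treat these two patterns the same way); the variable names are illustrative.

```python
import re

subject = "This is a string"

# IceSoft's pattern: no \b anchors, so the lazy quantifiers match as little as possible.
unanchored = re.findall(r"([^ ]*?i[^ ]*?)", subject)

# The pattern from the first post: the trailing \b forces each match to run to the word end.
anchored = re.findall(r"\b([^ ]*?i[^ ]*?)\b", subject)

print(unanchored)
print(anchored)
```

The first list corresponds to the fragments in the debug output above; the second contains the whole words.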
srod
PureBasic Expert
Posts: 10589
Joined: Wed Oct 29, 2003 4:35 pm
Location: Beyond the pale...

Post by srod »

Using Purebasic 4.31 Demo version x86. What's ImportC? Why isn't this in the pb helpfile?
Because you are obviously using a different help manual than the rest of us! ImportC is in the manual:

For advanced programmers. Import : EndImport makes it easy to declare external functions and variables from a library (.lib) or an object (.obj) file.

Once declared, the imported functions are directly available for use in the program, like any other commands. The compiler doesn't check if the functions really exist in the imported file, so if an error occurs, it will be reported by the linker.

This feature can replace the OpenLibrary()/CallFunction() sequence as it has some advantages: type checking is done and the number of parameters is validated. Unlike CallFunction(), it can deal with double, float and quad without any problem.

The last parameters can have a default value (which needs to be a constant expression), so if these parameters are omitted when the function is called, the default value will be used.

By default the imported function symbol is 'decorated' in the following way: _FunctionName@callsize. That should work for most of the functions which use the standard call convention (stdcall). If the library is a C one and the functions are not stdcall, the ImportC variant should be used instead. In this case, the default function symbol is decorated like: _FunctionName.

The pseudotypes can be used for the parameters, but not for the returned value.

PCRE uses UTF-8 internally, so the ImportC code will not function correctly in Unicode mode unless you arrange to pass it UTF-8 strings.
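
To illustrate why the encoding matters, here is a small sketch in Python (not PureBasic; the names are illustrative). It assumes that an 8-bit matcher such as PCRE reports match positions in bytes, so once the subject contains a non-ASCII character the byte offset no longer equals the character offset.

```python
text = "Caf\u00e9 is a string"     # contains one non-ASCII character
data = text.encode("utf-8")        # the 8-bit form a byte-oriented matcher would see

char_pos = text.find("is")         # offset counted in characters
byte_pos = data.find(b"is")        # offset counted in bytes

print(char_pos, byte_pos)

# A byte offset has to be converted before it can index the original string.
converted = len(data[:byte_pos].decode("utf-8"))
print(converted == char_pos)
```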
mdp
Enthusiast
Posts: 115
Joined: Mon Apr 18, 2005 8:28 pm

Re: Regular Expressions: backreference

Post by mdp »

Had the need again, found a solution.

Procedure.s BackrefReplaceRegularExpression( regexp_handle , string.s , replacement.s )
(Limited to three backreferences, since I did not need anything more extensive or clever.)

Code:

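ImportC "" 
  pcre_exec(*pcre,*extra,subject.s,length,startoffset,options,*ovector,ovecsize)
EndImport

; Replaces every match in string, expanding \1, \2 and \3 in replacement.
; Assumes an ASCII (non-Unicode) build: pcre_exec reports byte offsets.
Procedure.s BackrefReplaceRegularExpression( regexp_handle , string.s , replacement.s )
   Static Dim pcre_results(12)
   offset = 0 ; always start at the beginning (the Static array keeps old offsets between calls)
   While pcre_exec(PeekL(regexp_handle),0,string,Len(string),offset,0,@pcre_results(),12) > 0
      rpl.s = replacement
      p=pcre_results(0) ; start of the whole match
      q=pcre_results(1) ; end of the whole match
      If FindString(replacement,"\1",1)
         p1=pcre_results(2) ; start and end of group 1
         q1=pcre_results(3)
         rpl=ReplaceString(rpl,"\1",PeekS(@string+p1,q1-p1)) 
      EndIf
      If FindString(replacement,"\2",1)
         p1=pcre_results(4) ; start and end of group 2
         q1=pcre_results(5)
         rpl=ReplaceString(rpl,"\2",PeekS(@string+p1,q1-p1)) 
      EndIf
      If FindString(replacement,"\3",1)
         p1=pcre_results(6) ; start and end of group 3
         q1=pcre_results(7)
         rpl=ReplaceString(rpl,"\3",PeekS(@string+p1,q1-p1)) 
      EndIf
      string=Left(string,p)+rpl+Right(string,Len(string)-q) 
      offset = p+Len(rpl) ; resume after the inserted text, whose length may differ from the match
      If q = p : offset + 1 : EndIf ; an empty match would otherwise be found again forever
   Wend 
   ProcedureReturn string
EndProcedure


rh = CreateRegularExpression(0," t(.*?)e(.*?) ")

Debug BackrefReplaceRegularExpression(rh,"As they functions tried importing them through the tipteptop","_.T\1E\2._")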

@IceSoft: thanks for the feedback. I hope you like my "strategy", should you ever need it.

(@Fred: shouldn't the built-in replace function work more like this?)
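
The same idea can be written without the three-group limit. What follows is only a sketch in Python, not a drop-in for the procedure above: expand_template and backref_replace are hypothetical names, and the offset list is built to mimic the layout pcre_exec fills in (start and end of the whole match at indexes 0 and 1, then start and end of group n at indexes 2n and 2n+1, with -1 for a group that did not take part). Unlike the loop above, it makes a single pass over the original subject instead of rescanning the modified string.

```python
import re

def expand_template(template, subject, ovector):
    # Replace each backslash-digit pair (1 to 9) with the text of that group,
    # looked up through the (start, end) pairs stored in ovector.
    def group_text(m):
        n = int(m.group(1))
        if 2 * n + 1 >= len(ovector):
            return m.group(0)              # no such group: leave the text untouched
        start, end = ovector[2 * n], ovector[2 * n + 1]
        return subject[start:end] if start >= 0 else ""
    return re.sub(r"\\([1-9])", group_text, template)

def backref_replace(pattern, subject, template):
    out, pos = [], 0
    for m in re.finditer(pattern, subject):
        # PCRE-style offset vector: [start0, end0, start1, end1, ...]
        ovector = [x for g in range(m.re.groups + 1) for x in m.span(g)]
        out.append(subject[pos:m.start()])  # text between matches is copied unchanged
        out.append(expand_template(template, subject, ovector))
        pos = m.end()
    out.append(subject[pos:])
    return "".join(out)

print(backref_replace(r"\b([^ ]*?i[^ ]*?)\b", "This is a string", r"-\1-"))
```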
Fred
Administrator
Posts: 18162
Joined: Fri May 17, 2002 4:39 pm
Location: France

Re: Regular Expressions: backreference

Post by Fred »

Backreferences are not supported yet. Moved to feature requests and updated the doc.