
Regular Expressions Engine (deelx) with PureBasic

Posted: Tue Jul 31, 2012 7:45 am
by applePi
Here is how to use PureBasic with an excellent Perl-compatible regular expression engine called DEELX (freeware). First download the DLL version binary from CodeProject http://www.codeproject.com/Articles/159 ... gine-for-C or, if you don't have a free CodeProject account, from here http://www.mediafire.com/?33basaihl2e5xez together with the PB code and the text file. The author's site for the C++ version is http://www.regexlab.com/en/
The DLL contains 20 functions, which we can list with this code:

Code:

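; List every function exported by the DEELX DLL
Define dll.i = OpenLibrary(#PB_Any, "libdeelx.dll")
If dll
  If ExamineLibraryFunctions(dll)
    While NextLibraryFunction()   ; loop until all exports are listed instead of assuming 20
      Debug LibraryFunctionName()
    Wend
  EndIf
  CloseLibrary(dll)   ; PB's CloseLibrary(), not the WinAPI FreeLibrary_()
EndIf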

Searching for a pattern:
Suppose we have the string "ww12v345PY24536567" and we want the beginning and end positions of every match of the pattern "2.*?3" over the whole string, i.e. the digit 2 followed by any characters up to the nearest digit 3. The lazy quantifier *? keeps each match as short as possible.

Code:

; Find every match of a pattern, calling the DEELX DLL functions by pointer
txt.s = "ww12v345PY24536567"
regex.s = "2.*?3"
If OpenLibrary(0, "libdeelx.dll")
  ; .i (not .l) so the handles can hold a pointer on both 32- and 64-bit
  *func = GetFunction(0, "regexp_create")
  handle.i = CallFunctionFast(*func)          ; compiled-pattern object
  *func = GetFunction(0, "result_create")
  result.i = CallFunctionFast(*func)          ; match-result object
  *func = GetFunction(0, "regexp_compile")
  CallFunctionFast(*func, handle, @regex, 0)
  Debug "text    = " + txt
  Debug "pattern = " + regex
  Debug ".........................................."
  ; look up the per-match functions once, outside the loop
  *match     = GetFunction(0, "regexp_match")
  *isMatched = GetFunction(0, "result_ismatched")
  *resStart  = GetFunction(0, "result_start")
  *resEnd    = GetFunction(0, "result_end")
  startpos.i = -1
  For i = 1 To Len(txt)   ; upper bound on the number of matches
    CallFunctionFast(*match, handle, @txt, startpos, result)
    If CallFunctionFast(*isMatched, result) <> 0
      ; start/end are 0-based (end exclusive), Mid() is 1-based
      resultStart.i = CallFunctionFast(*resStart, result)
      resultEnd.i = CallFunctionFast(*resEnd, result)
      txtpart.s = Mid(txt, resultStart + 1, resultEnd - resultStart)
      Debug "pattern begins at " + Str(resultStart)
      Debug "pattern ends at   " + Str(resultEnd - 1)
      Debug "substring matched " + txtpart
      Debug "==================================="
      startpos = resultEnd   ; continue searching after this match
    Else
      Break   ; no more matches
    EndIf
  Next
  ; free both objects, then close the library
  *func = GetFunction(0, "regexp_free")
  CallFunctionFast(*func, handle)
  *func = GetFunction(0, "result_free")
  CallFunctionFast(*func, result)
  CloseLibrary(0)   ; PB's CloseLibrary(), not the WinAPI FreeLibrary_()
EndIf


The results should be:
pattern begins at 3
pattern ends at 5
substring matched 2v3
===================================
pattern begins at 10
pattern ends at 13
substring matched 2453
===================================
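
For comparison, the same search can be done with PureBasic's built-in PCRE-based RegularExpression library, without the DLL. This is a minimal sketch, assuming a PureBasic version whose library has ExamineRegularExpression() and NextRegularExpressionMatch() (check your version's help). Those commands report 1-based positions, so the code converts them to the same 0-based offsets the DEELX loop prints; found$ is a helper variable added only for this sketch.

```purebasic
; Sketch: find every match of "2.*?3" with PB's built-in regex library (no DLL)
Define txt$ = "ww12v345PY24536567"
Define found$ = ""
Define count.i = 0
Define pos.i

If CreateRegularExpression(0, "2.*?3")
  If ExamineRegularExpression(0, txt$)
    While NextRegularExpressionMatch(0)
      count = count + 1
      pos = RegularExpressionMatchPosition(0)   ; 1-based start of the match
      Debug "pattern begins at " + Str(pos - 1)
      Debug "pattern ends at   " + Str(pos + RegularExpressionMatchLength(0) - 2)
      Debug "substring matched " + RegularExpressionMatchString(0)
      found$ = found$ + "[" + RegularExpressionMatchString(0) + "]"
    Wend
  EndIf
  FreeRegularExpression(0)
Else
  Debug RegularExpressionError()
EndIf
```

It should print the same two matches (2v3 at 3..5 and 2453 at 10..13).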

Replacing text:
Download this text file to use with the following code:
http://www.textfiles.com/stories/abbey.txt
We want to replace every "the", in capital or small letters, with "CATSZ".

Code:

Enumeration
  #DLL
  #FILE
  #OUT
EndEnumeration
regex.s = "[Tt][Hh][Ee]"
ReplacedTo.s = "CATSZ"
Global null.s
If OpenLibrary(#DLL, "libdeelx.dll")
  *func = GetFunction(#DLL, "regexp_create")
  handle.i = CallFunctionFast(*func)
  *func = GetFunction(#DLL, "regexp_compile")
  CallFunctionFast(*func, handle, @regex, 0)
  *replace = GetFunction(#DLL, "regexp_replace")   ; look it up once, outside the loop

  If ReadFile(#FILE, "abbey.txt")
    If CreateFile(#OUT, "out.txt")
      While Eof(#FILE) = 0
        txt.s = ReadString(#FILE)
        If Len(txt) = 0
          WriteString(#OUT, Chr(13) + Chr(10))   ; keep blank lines in the output
          Continue
        EndIf
        ; call once with a zero-size buffer to get the result length,
        ; using the same replacement string as the real call below
        result_length.i = CallFunctionFast(*replace, handle, @txt, @ReplacedTo, -1, -1, @null, 0)
        result.s = Space(result_length * 2)
        CallFunctionFast(*replace, handle, @txt, @ReplacedTo, -1, -1, @result, result_length * 2)
        WriteString(#OUT, result)
        WriteString(#OUT, Chr(13) + Chr(10))
      Wend
      CloseFile(#OUT)
    EndIf
    CloseFile(#FILE)   ; only close files that were actually opened
  EndIf

  *func = GetFunction(#DLL, "regexp_free")
  CallFunctionFast(*func, handle)
  CloseLibrary(#DLL)   ; PB's CloseLibrary(), not the WinAPI FreeLibrary_()
EndIf
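
A minimal sketch of the same replacement using PureBasic's built-in ReplaceRegularExpression(), run on a made-up sample sentence instead of abbey.txt. It also shows a side effect of the pattern "[Tt][Hh][Ee]": it replaces "the" inside longer words such as "other" and "theme" too. The (?i) modifier and \b word boundaries (standard PCRE syntax) restrict the match to the standalone word in any letter case.

```purebasic
; Sketch: "the" -> "CATSZ" with PB's built-in ReplaceRegularExpression()
Define txt$ = "The other theme: the end."   ; hypothetical sample text
Define loose$ = ""
Define strict$ = ""

; same character-class pattern as the DEELX code: also hits "the" inside other words
If CreateRegularExpression(0, "[Tt][Hh][Ee]")
  loose$ = ReplaceRegularExpression(0, txt$, "CATSZ")
  FreeRegularExpression(0)
EndIf

; (?i) makes the pattern case-insensitive; \b limits it to the whole word "the"
If CreateRegularExpression(1, "(?i)\bthe\b")
  strict$ = ReplaceRegularExpression(1, txt$, "CATSZ")
  FreeRegularExpression(1)
EndIf

Debug loose$    ; CATSZ oCATSZr CATSZme: CATSZ end.
Debug strict$   ; CATSZ other theme: CATSZ end.
```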


The VB demo from CodeProject contains a funny example: it searches for words that end in a digit and moves that digit to the beginning of the word.
The string is: "x1 yy2 zzz3"
The pattern is (\w+)(\d)
The replacement pattern $2$1 means group 2 (the digit) followed by group 1 (the rest of the word), i.e. the two captured parts swap places,
so the result will be "1x 2yy 3zzz".
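
A minimal sketch of the same "$2$1" swap with PureBasic's built-in library, assuming a version that has ExamineRegularExpression() and RegularExpressionGroup(). Rather than relying on backreference support in ReplaceRegularExpression(), it rebuilds the string by hand: text between matches is copied unchanged and each match is replaced by group 2 followed by group 1.

```purebasic
; Sketch: emulate the replacement "$2$1" for the pattern (\w+)(\d)
Define txt$ = "x1 yy2 zzz3"
Define out$ = ""
Define last.i = 1   ; 1-based position just after the previous match

If CreateRegularExpression(0, "(\w+)(\d)")
  If ExamineRegularExpression(0, txt$)
    While NextRegularExpressionMatch(0)
      ; copy the text between the previous match and this one unchanged
      out$ = out$ + Mid(txt$, last, RegularExpressionMatchPosition(0) - last)
      ; group 2 (the digit) first, then group 1 (the word part)
      out$ = out$ + RegularExpressionGroup(0, 2) + RegularExpressionGroup(0, 1)
      last = RegularExpressionMatchPosition(0) + RegularExpressionMatchLength(0)
    Wend
  EndIf
  out$ = out$ + Mid(txt$, last)   ; copy whatever follows the last match
  FreeRegularExpression(0)
EndIf

Debug out$   ; 1x 2yy 3zzz
```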
The best introductory book is "Sams Teach Yourself Regular Expressions in 10 Minutes":
http://www.forta.com/books/0672325667/
By "10 minutes" he means a month or more.
Note that I have used only a small number of the DLL's functions; I don't know yet how to use the others, but the functions above are sufficient for me.

More references:
Regular Expressions for DarkBasic:
http://forum.thegamecreators.com/?m=for ... 196596&b=1

Re: Regular Expressions Engine (deelx) with PureBasic

Posted: Tue Jul 31, 2012 5:25 pm
by Tenaja
What advantage does this library have over the PCRE library that PB uses?

Re: Regular Expressions Engine (deelx) with PureBasic

Posted: Tue Jul 31, 2012 10:06 pm
by applePi
I don't know the difference between Perl regular expressions and PCRE, but here http://www.regular-expressions.info/lookaround.html they say "PCRE is not fully Perl-compatible when it comes to lookbehind". For example, search the text "zxy yzxy zxye uzxy" for the pattern "(?<!y)zxy\b", i.e. we want zxy at the end of a word but not preceded by y. The code attached above returns the first "zxy" and the "zxy" in "uzxy".
I need to read more about the subject.
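
The lookbehind in this pattern has a fixed length (one character), which PCRE supports, so PureBasic's built-in library should handle it as well. A minimal sketch to check this, assuming a PureBasic version with ExamineRegularExpression(); the positions it collects are 1-based, and positions$ is a helper variable added only for this sketch.

```purebasic
; Sketch: test "(?<!y)zxy\b" with PB's built-in PCRE engine
Define txt$ = "zxy yzxy zxye uzxy"
Define positions$ = ""
Define count.i = 0

If CreateRegularExpression(0, "(?<!y)zxy\b")
  If ExamineRegularExpression(0, txt$)
    While NextRegularExpressionMatch(0)
      count = count + 1
      ; record the 1-based position of each "zxy" that passed both tests
      positions$ = positions$ + Str(RegularExpressionMatchPosition(0)) + " "
    Wend
  EndIf
  FreeRegularExpression(0)
Else
  Debug RegularExpressionError()
EndIf

Debug "matches: " + Str(count) + " at positions " + Trim(positions$)   ; matches: 2 at positions 1 16
```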

Re: Regular Expressions Engine (deelx) with PureBasic

Posted: Wed Dec 05, 2012 7:09 pm
by applePi
After using Lib2PBImport http://www.purebasic.fr/english/viewtop ... 14&t=25353 to make a *.pbi file, importing libdeelx.LIB makes the code shorter and more readable (see below). The libdeelx.LIB, libdeelx.dll, libdeelx.PBI and the code for searching patterns and replacing text are attached here:
http://www.mediafire.com/?de4lexlle19j48i
The following code searches for a pattern:

Code:

IncludeFile "libdeelx.pbi"   ; imports generated by Lib2PBImport from libdeelx.LIB

; .i (not .l) so the handles can hold a pointer on both 32- and 64-bit
handle.i = regexp_create()   ; compiled-pattern object
result.i = result_create()   ; match-result object
txt.s = "ww12v345PY24536567"
regex.s = "2.*?3"
regexp_compile(handle, @regex, 0)
Debug "text    = " + txt
Debug "pattern = " + regex
Debug ".........................................."
startpos = -1
For i = 1 To Len(txt)   ; upper bound on the number of matches
  regexp_match(handle, @txt, startpos, result)
  If result_ismatched(result) <> 0
    ; start/end are 0-based (end exclusive), Mid() is 1-based
    resultStart = result_start(result)
    resultEnd = result_end(result)
    txtpart.s = Mid(txt, resultStart + 1, resultEnd - resultStart)
    Debug "pattern begins at " + Str(resultStart)
    Debug "pattern ends at   " + Str(resultEnd - 1)
    Debug "substring matched " + txtpart
    Debug "==================================="
    startpos = resultEnd   ; continue searching after this match
  Else
    Break   ; no more matches
  EndIf
Next
regexp_free(handle)
result_free(result)

And the code for replacing text (it opens a file, then replaces every occurrence of "the", in capital or small letters, with "CATSZ" in the attached text file):

Code:

IncludeFile "libdeelx.pbi"   ; imports generated by Lib2PBImport from libdeelx.LIB

Enumeration
  #FILE
  #OUT
EndEnumeration
regex.s = "[Tt][Hh][Ee]"
ReplacedTo.s = "CATSZ"
Global null.s
handle.i = regexp_create()
regexp_compile(handle, @regex, 0)

If ReadFile(#FILE, "abbey.txt")
  If CreateFile(#OUT, "out.txt")
    While Eof(#FILE) = 0
      txt.s = ReadString(#FILE)
      If Len(txt) = 0
        WriteString(#OUT, Chr(13) + Chr(10))   ; keep blank lines in the output
        Continue
      EndIf
      ; call once with a zero-size buffer to get the result length,
      ; using the same replacement string as the real call below
      result_length.i = regexp_replace(handle, @txt, @ReplacedTo, -1, -1, @null, 0)
      result.s = Space(result_length * 2)
      regexp_replace(handle, @txt, @ReplacedTo, -1, -1, @result, result_length * 2)
      WriteString(#OUT, result)
      WriteString(#OUT, Chr(13) + Chr(10))
    Wend
    CloseFile(#OUT)
  EndIf
  CloseFile(#FILE)   ; only close files that were actually opened
EndIf
regexp_free(handle)