IsAlpha IsNumeric
-
- Enthusiast
- Posts: 229
- Joined: Wed May 14, 2003 3:38 pm
- Location: Lüneburg - Germany
Kale wrote: Thats exactly what i was pointing at earlier on in this thread.
I know, but does naw?

naw wrote: wow! Brilliant - I bet there's only a Lib for Windows, though, Linux would be very nice.
Yes, it is a Windows Lib. I couldn't compile it for Linux, because I've no Linux system running atm...

naw wrote: (I cant check reelmedia because the reelmedia server is refusing connections again)
Try: http://www.florian-s.com/download/PureBasic/
- tinman
- PureBasic Expert
- Posts: 1102
- Joined: Sat Apr 26, 2003 4:56 pm
- Location: Level 5 of Robot Hell
naw wrote: wow! Brilliant - I bet there's only a Lib for Windows, though, Linux would be very nice. (I cant check reelmedia because the reelmedia server is refusing connections again)
There's PCRE (Perl Compatible Regular Expressions), which is open source (under a BSD-style licence rather than the GPL) and has been ported as both static and dynamic libraries for many platforms. If you can understand the API (and there's not much to it, maybe 6 or 8 functions you need to use) then you should be able to use it in PureBasic, either through the Library library or through the DLL importer.
If you paint your butt blue and glue the hole shut you just themed your ass but lost the functionality.
(WinXPhSP3 PB5.20b14)
mmmm, a bit more work than i expected
and some things i don't get (yet) about the syntax...
[A-Z]* means any char from A to Z, any number of times?
[A-Z]+ means the same, but at least one time?
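Both readings hold in the common regex dialects: `*` allows zero or more repetitions, `+` requires at least one. A minimal check, using Python's `re` module purely as a stand-in (an assumption; the RegExpr library discussed in this thread may differ in details), with `whole` as a hypothetical helper name:

```python
import re

# [A-Z]* : zero or more uppercase letters, so the empty string qualifies.
star = re.compile(r"[A-Z]*")
# [A-Z]+ : one or more uppercase letters, so the empty string does not.
plus = re.compile(r"[A-Z]+")


def whole(pattern, text):
    """True if the compiled pattern matches the entire text (hypothetical helper)."""
    return pattern.fullmatch(text) is not None


print(whole(star, ""), whole(plus, ""))        # True False
print(whole(star, "ABC"), whole(plus, "ABC"))  # True True
print(whole(star, "AbC"), whole(plus, "AbC"))  # False False
```

The only difference between the two shows up on the empty string.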

( PB6.00 LTS Win11 x64 Asrock AB350 Pro4 Ryzen 5 3600 32GB GTX1060 6GB)
( The path to enlightenment and the PureBasic Survival Guide right here... )
how to include a '[' then? [[] ?
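A literal `[` is usually written either with a backslash or as the member of a one-character class. A small sketch, again assuming Python's `re` as a stand-in; POSIX-style engines generally accept `[[]` as written in the question, while Python may warn about it, so the bracket is escaped inside the class here:

```python
import re

escaped = re.compile(r"\[")     # the backslash removes the special meaning
in_class = re.compile(r"[\[]")  # a one-character class that holds '['

text = "a[1]"

# Both patterns find the bracket at index 1 of the sample text.
print(escaped.search(text).start())
print(in_class.search(text).start())
```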
I've started looking at the RegExpr Library, but the example given below is a bit of a mind blower for the uninitiated...
I can't begin to make sense of it - I was kind of hoping for syntax like:
result$=RegExprReplace(string$,regexpr$)
position=RegExprFind(string$,regexpr$)
- oh well...
Code: Select all
; RegCompEx Test
*compiled.REGEXP
Debug RegCompEx(@*compiled, "(<TITLE>|<title>)(.*)(</TITLE>|</title>)")
; RegErrorEx Test
error$ = Space(80)
PeekS(RegErrorEx(#REGEXP_ESPACE, *compiled, error$, 80))
Debug error$
; RegNSubExpEx Test
Debug "Number of SubExpressions: " + Str(RegNSubExpEx(*compiled))
; RegExecEx Test
Test$ = "<HTML><HEAD><TITLE>PureBasic : visual basic compiler, easy & optimized basic programming language, basic, compiler</TITLE></HEAD><BODY></BODY></HTML>"
Dim test.REGMATCH(RegNSubExpEx(*compiled))
Debug RegExecEx(*compiled, Test$, RegNSubExpEx(*compiled), @test(0))
Debug PeekS(@Test$ + test(0)\subexp_begin) ; Test REGMATCH offset
Debug PeekS(@Test$ + test(1)\subexp_begin)
Debug PeekS(@Test$ + test(2)\subexp_begin)
Debug PeekS(@Test$ + test(3)\subexp_begin)
Debug PeekS(@Test$ + test(4)\subexp_begin)
; RegSubEx Test
*buffer = AllocateMemory(1, 200, 0)
Debug RegSubEx(*compiled, Test$, "\2", @*buffer)
Debug "The Buffer contains: " + PeekS(*buffer)
; RegFreeEx Test
RegFreeEx(*compiled)
Ta - N
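The simpler find/replace interface wished for above is easy to layer over a lower-level engine. A rough sketch in Python, assuming its `re` module as the engine; `regexpr_find` and `regexpr_replace` are hypothetical names (not functions of the RegExpr library), the 1-based position is an assumption chosen to resemble PureBasic string positions, and the replacement argument is added because a replace needs something to substitute:

```python
import re


def regexpr_find(string, regexpr):
    """Return the 1-based position of the first match, or 0 if there is none."""
    m = re.search(regexpr, string)
    return m.start() + 1 if m else 0


def regexpr_replace(string, regexpr, replacement):
    """Return string with every match of regexpr replaced by replacement."""
    return re.sub(regexpr, replacement, string)


# Same pattern as in the library test above, on a shortened page.
pattern = r"(<TITLE>|<title>)(.*)(</TITLE>|</title>)"
page = "<HTML><HEAD><TITLE>PureBasic</TITLE></HEAD><BODY></BODY></HTML>"

print(regexpr_find(page, pattern))            # 13
print(regexpr_replace(page, pattern, r"\2"))  # keeps only the title text
```

The `\2` in the replacement refers to the second parenthesised group, the same idea as the `"\2"` passed to `RegSubEx` in the test code.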
naw, plz tell me what the following do:
abc\.
abc\\
abc.+
abc.*
abc.[
Code: Select all
abc\. "." is a special character in RE meaning repetitions of the previous character - so "A." will match "AA" but not "AB". The "\" escapes the special meaning of the next character which is "." so effectively "abc\." matches "abc."
abc\\ "\" = escape, so "abc\\" matches "abc\"
abc.+ - sorry, don't know what "+" means - never used it...
abc.* will match "abccccc" or "abcccdd234234" but not "abdefg"
abc.[ is a badly formed RE - ie "abc.[e-z]" would match "abccd" or "abccccce" or "abccccccccx"
if you wanted to match "abc.[" you would have to use "abc\.\["
Ta - N
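Readings like these are worth checking against an actual engine, since dialects differ. A minimal sketch using Python's `re` module as a stand-in (an assumption; other engines may behave differently). In this engine `.` stands for any single character rather than a repetition, and `+` means "one or more of the preceding element". `matches` is a hypothetical helper name:

```python
import re


def matches(pattern, text):
    """True if pattern matches the whole of text (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None


print(matches(r"A.", "AB"))          # True: '.' is any one character
print(matches(r"abc\.", "abc."))     # True: escaped dot is a literal dot
print(matches(r"abc\.", "abcd"))     # False
print(matches(r"abc\\", "abc\\"))    # True: escaped backslash
print(matches(r"abc.+", "abc"))      # False: '+' needs at least one more character
print(matches(r"abc.+", "abcd"))     # True
print(matches(r"abc.*", "abc"))      # True: '*' also accepts nothing
print(matches(r"abc.*", "abdefg"))   # False: the text does not start with "abc"
print(matches(r"abc\.\[", "abc.["))  # True

# An unclosed '[' is rejected when the pattern is compiled.
try:
    re.compile(r"abc.[")
    well_formed = True
except re.error:
    well_formed = False
print(well_formed)                   # False
```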
is this valid?
"this is a test"
is matched by
"th.*a test"
pretty nasty, by the way: full support of patterns could lead to massive iterations... i think the following pseudo code should do it, still have to code it though
some speed improvement is possible by doing a first pass to detect the startpos / endpos of some blocks... hmmm... better to code this first, but now it's bedtime

Code: Select all
; concept
;
; take pattern apart, split it up in blocks
; per block: type (0 exact match to a number of chars, 1 fancy stuff)
; per block: min (0 or 1) and max (1 or n) characters
; put this stuff in a table
;
; and now the real stuff...
; l = len(string)
; startpos(1) = 1, endpos(1) = l
; n = 1
;
; repeat
;   again = false              ; reset on every pass, otherwise the loop never ends
;   try to match block(n) *as far away as possible* aka. up to endpos(n)
;   if match
;     p = found pos (last character of match, in range startpos(n) to endpos(n))
;     endpos(n) = p
;     inc n
;     if n <= nr of blocks     ; there is still a block left to match
;       startpos(n) = p+1
;       endpos(n) = l
;       again = true
;     endif
;   else
;     no match, damn: step back one block and make it give up a character
;     dec n
;     if n >= 1
;       endpos(n) = endpos(n)-1
;       again = true
;     endif
;   endif
; until again = false
;
; if n < 1 no match
; if n > nr of blocks and endpos(nr of blocks) = l then there is a match
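The same backtracking idea can also be written recursively. The sketch below is not the table-driven design from the pseudo code; it swaps in a small recursive matcher, written in Python, that handles only literal characters, `.` and `*` after a single character. `match_here` is a hypothetical name, and the whole thing is an illustration under those assumptions rather than a full regex engine:

```python
def match_here(pattern: str, text: str) -> bool:
    """Return True if pattern matches the whole of text.

    Supports only literal characters, '.' (any one character) and
    '*' after a single character. Hypothetical helper for illustration.
    """
    if not pattern:
        return not text  # pattern used up: succeed only if text is too

    first, rest = pattern[0], pattern[1:]

    if rest.startswith("*"):
        # Count how many leading characters the starred element could take.
        longest = 0
        while longest < len(text) and first in (".", text[longest]):
            longest += 1
        # Try the longest run first, then give back one character at a
        # time until the remainder of the pattern fits (backtracking).
        for taken in range(longest, -1, -1):
            if match_here(rest[1:], text[taken:]):
                return True
        return False

    # Plain element: must match exactly one character.
    if text and first in (".", text[0]):
        return match_here(rest, text[1:])
    return False


print(match_here("th.*a test", "this is a test"))  # True
print(match_here("th.*a test", "this is a text"))  # False
```

Giving back one character at a time plays the same role as the `endpos(n) = endpos(n)-1` step in the pseudo code; the recursion simply keeps the per-block positions on the call stack instead of in a table.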
there's a little thing unclear about the dot:
Quote: "." Matches any single character except newline. In awk, dot can match newline also.
naw wrote: abc\. "." is a special character in RE meaning repetitions of the previous character - so "A." will match "AA" but not "AB". The "\" escapes the special meaning of the next character which is "." so effectively "abc\." matches "abc."
so, what is it now?
-
- Enthusiast
- Posts: 229
- Joined: Wed May 14, 2003 3:38 pm
- Location: Lüneburg - Germany
Quote: "." Matches any single character except newline. In awk, dot can match newline also.
that's correct. many unix programs deal with regexps, like grep, sed, awk, vi, some shells... every one has its own special metacharacters and some are implemented differently. so it's up to you to decide which implementation you will follow.
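The newline difference is easy to see in an engine that offers both behaviours. Python's `re` module, used here only as a stand-in, excludes newline from `.` by default and includes it when the `re.DOTALL` flag is set:

```python
import re

text = "a\nc"  # 'a', a newline, 'c'

# Default: '.' matches any character except newline.
default_dot = re.fullmatch(r"a.c", text) is not None
# With DOTALL: '.' matches newline as well (the awk-like behaviour).
dotall_dot = re.fullmatch(r"a.c", text, flags=re.DOTALL) is not None

print(default_dot)  # False
print(dotall_dot)   # True
```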
ok... now, another question 
is this valid? [ABC|CDE|X*]*
that's *very* nasty to code
*edit*
hmmm... looking at the descriptions i can find on the net, there are indeed different variations
ok, the following appears to be valid:
(abd|cde) which means that segment has either to match abd or cde, forcing me into a two dimensional array on my current approach
the following i haven't seen so i assume it isn't valid, or is it?
(abc|[a-z])
this is nasty to code: backtracking gets almost unmanageable, as i would have to backtrack for every possibility. oh, it can be done recursively, but it can take ages to resolve
an alternative approach would be to check the match on the 'known' (fixed) segments first, then try to fit the variable ones in between... brrr... what did i start...

-
- Enthusiast
- Posts: 229
- Joined: Wed May 14, 2003 3:38 pm
- Location: Lüneburg - Germany
Quote: is this valid? [ABC|CDE|X*]*
yes it is. it matches any string, because the last asterisk (*) means: the expression [ABC|CDE|X*] has to appear "0" or "n" times.
Edit:
Quote: the following i haven't seen so i assume it isn't valid, or is it?
(abc|[a-z])
sure, this is valid, too. this is a very simple expression: it matches "abc" or any lowercase character "a", "b", "c", ..., "z".
if Fred would send me the PB Linux version i could compile my regexp lib for Linux, but he hasn't replied yet...
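One detail about the two patterns just discussed: in the common dialects `|` and `*` lose their special meaning inside square brackets, so `[ABC|CDE|X*]` is a set of single characters (A, B, C, D, E, X, `|` and `*`) rather than three alternatives, whereas `(abc|[a-z])` is a real alternation. A short check, assuming Python's `re` module as a stand-in and `whole` as a hypothetical helper name:

```python
import re

char_class = re.compile(r"[ABC|CDE|X*]*")  # any run of the listed characters
group = re.compile(r"(abc|[a-z])")         # "abc" or one lowercase letter


def whole(pattern, text):
    """True if the compiled pattern matches the entire text (hypothetical helper)."""
    return pattern.fullmatch(text) is not None


print(whole(char_class, ""))       # True: zero repetitions are allowed
print(whole(char_class, "X*|A"))   # True: '|' and '*' are plain members of the set
print(whole(char_class, "ABCDE"))  # True
print(whole(char_class, "abc"))    # False: lowercase letters are not in the set

print(whole(group, "abc"))         # True
print(whole(group, "q"))           # True
print(whole(group, "ab"))          # False
```

To get "ABC or CDE or any number of X" as alternatives, parentheses would be needed instead of brackets, for example `(ABC|CDE|X*)*`.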