IsAlpha IsNumeric

Share your advanced PureBasic knowledge/code with the community.
naw
Enthusiast
Posts: 573
Joined: Fri Apr 25, 2003 4:57 pm

Post by naw »

Wow! Brilliant. I bet there's only a lib for Windows, though; Linux would be very nice. (I can't check reelmedia because the reelmedia server is refusing connections again.)
Ta - N
FloHimself
Enthusiast
Posts: 229
Joined: Wed May 14, 2003 3:38 pm
Location: Lüneburg - Germany

Post by FloHimself »

Kale wrote: Thats exactly what i was pointing at earlier on in this thread. 8)
I know, but does naw? ;)
naw wrote:wow! Brilliant - I bet there's only a Lib for Windows, though, Linux would be very nice.
Yes, it is a Windows lib. I couldn't compile it for Linux because I have no Linux system running at the moment...
naw wrote:(I cant check reelmedia because the reelmedia server is refusing connections again)
Try: http://www.florian-s.com/download/PureBasic/
tinman
PureBasic Expert
Posts: 1102
Joined: Sat Apr 26, 2003 4:56 pm
Location: Level 5 of Robot Hell

Post by tinman »

naw wrote:wow! Brilliant - I bet there's only a Lib for Windows, though, Linux would be very nice. (I cant check reelmedia because the reelmedia server is refusing connections again)
There's PCRE (Perl Compatible Regular Expressions), which is open source (BSD-licensed, as far as I know, rather than GPL) and has been ported as both static and dynamic libraries for many platforms. If you can understand the API (there's not much to it, maybe 6 or 8 functions you need to use), then you should be able to use it in PureBasic, either through the Library library or through the DLL importer.

Post by naw »

More easily said than done, tinman. I'm only a casual programmer, so building a Linux library is really a little beyond my ability...
Ta - N
blueznl
PureBasic Expert
Posts: 6166
Joined: Sat May 17, 2003 11:31 am

Post by blueznl »

mmmm a bit more work than i expected :-)

and some things i don't get (yet) about the syntax...

[A-Z]* means any char from A to Z, any number of times?
[A-Z]+ means the same, but at least one time?
( PB6.00 LTS Win11 x64 Asrock AB350 Pro4 Ryzen 5 3600 32GB GTX1060 6GB)
( The path to enlightenment and the PureBasic Survival Guide right here... )

Post by naw »

That's it, blueznl, you've got the idea!
Ta - N
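
To make the `*` versus `+` distinction concrete, here is a minimal sketch using Python's `re` module as a stand-in engine (an assumption: the thread's own library is not used here, and the helper name `whole_match` is made up for illustration). Other regex dialects should agree on these two quantifiers, but that is not checked here.

```python
import re

# [A-Z]* : zero or more uppercase letters, so the empty string also matches.
# [A-Z]+ : one or more uppercase letters, so at least one letter is required.
star = re.compile(r"[A-Z]*")
plus = re.compile(r"[A-Z]+")


def whole_match(pattern, text):
    """Return True if the entire text matches the compiled pattern."""
    return pattern.fullmatch(text) is not None


# Print one row per sample: text, result for '*', result for '+'.
for text in ["", "A", "HELLO", "Hello"]:
    print(repr(text), whole_match(star, text), whole_match(plus, text))
```

The only sample where the two differ is the empty string, which `*` accepts and `+` rejects.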

Post by blueznl »

how to include a '[' then? [[] ?

Post by naw »

I've started looking at the RegExpr Library, but the example given below is a bit of a mind blower for the uninitiated...

I can't begin to make sense of it - I was kind of hoping for syntax like:

result$=RegExprReplace(string$,regexpr$)
position=RegExprFind(string$,regexpr$)

- oh well...

Code: Select all

; RegCompEx Test
*compiled.REGEXP
Debug RegCompEx(@*compiled, "(<TITLE>|<title>)(.*)(</TITLE>|</title>)")

; RegErrorEx Test
error$ = Space(80)
PeekS(RegErrorEx(#REGEXP_ESPACE, *compiled, error$, 80))
Debug error$

; RegNSubExpEx Test
Debug "Number of SubExpressions: " + Str(RegNSubExpEx(*compiled))

; RegExecEx Test
Test$ = "<HTML><HEAD><TITLE>PureBasic : visual basic compiler, easy & optimized basic programming language, basic, compiler</TITLE></HEAD><BODY></BODY></HTML>"
Dim test.REGMATCH(RegNSubExpEx(*compiled))  

Debug RegExecEx(*compiled, Test$, RegNSubExpEx(*compiled), @test(0))

Debug PeekS(@Test$ + test(0)\subexp_begin) ; Test REGMATCH offset
Debug PeekS(@Test$ + test(1)\subexp_begin)
Debug PeekS(@Test$ + test(2)\subexp_begin)
Debug PeekS(@Test$ + test(3)\subexp_begin)
Debug PeekS(@Test$ + test(4)\subexp_begin)

; RegSubEx Test
*buffer = AllocateMemory(1, 200, 0) 
Debug RegSubEx(*compiled, Test$, "\2", @*buffer)
Debug "The Buffer contains: " + PeekS(*buffer)

; RegFreeEx Test
RegFreeEx(*compiled)
Ta - N
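
The simpler interface naw was hoping for can be sketched as two thin wrappers. This is only an illustration in Python's `re`, not the RegExpr library's API: the names `regexpr_find` and `regexpr_replace` are hypothetical, a replacement argument has been added (the wished-for signature has none), the 1-based position is an assumption chosen to mirror PureBasic's `FindString()`, and the test string is a shortened version of the one in the example above.

```python
import re


def regexpr_find(string, regexpr):
    """Return the 1-based position of the first match, or 0 if there is none."""
    m = re.search(regexpr, string)
    return m.start() + 1 if m else 0


def regexpr_replace(string, regexpr, replacement):
    """Return the string with every match replaced (backreferences like \\2 allowed)."""
    return re.sub(regexpr, replacement, string)


# Shortened test string; the pattern is the one from the library example.
html = "<HTML><HEAD><TITLE>PureBasic</TITLE></HEAD><BODY></BODY></HTML>"
pattern = r"(<TITLE>|<title>)(.*)(</TITLE>|</title>)"

print(regexpr_find(html, pattern))            # position where the title element starts
print(re.search(pattern, html).group(2))      # second subexpression: the title text
print(regexpr_replace(html, pattern, r"\2"))  # title element replaced by its text
```

The `\2` replacement plays the same role as the `"\2"` passed to `RegSubEx` in the example: it refers to the second parenthesised subexpression.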

Post by blueznl »

naw, plz tell me what the following do:


abc\.
abc\\
abc.+
abc.*
abc.[

Post by naw »

Code: Select all

abc\.     "." is a special character in RE meaning repetitions of the previous character - so "A." will match "AA" but not "AB". The "\" escapes the special meaning of the next character which is "." so effectively "abc\." matches "abc."

abc\\     "\" = escape, so "abc\\" matches "abc\"
abc.+     - sorry, I don't know what "+" means - never used it...
abc.*     will match "abccccc" or "abcccdd234234" but not "abdefg"
abc.[      is a badly formed RE - i.e. "abc.[e-z]" would match "abccd" or "abccccce" or "abccccccccx" 

if you wanted to match "abc.[" you would have to use "abc\.\["
Ta - N
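
For comparison, the five patterns can be run through an actual engine. The sketch below uses Python's `re` (a Perl-style dialect, chosen here as an assumption; the helper names `found` and `is_valid` are made up). In this dialect the dot stands for any single character rather than a repetition of the previous one, and `+` means one or more of the preceding item. A literal `[` is written `\[` or `[\[]` here; many engines also accept `[[]`.

```python
import re


def found(pattern, text):
    """True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None


def is_valid(pattern):
    """True if the engine accepts the pattern at all."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Unescaped dot: any single character, so "A." matches "AB" as well as "AA".
print(found(r"A.", "AB"), found(r"A.", "AA"))
# abc\.  : escaped dot, so only a literal "abc." matches.
print(found(r"abc\.", "abc."), found(r"abc\.", "abcd"))
# abc\\  : escaped backslash, matches "abc" followed by one backslash.
print(found(r"abc\\", "abc\\"))
# abc.+  : "abc" followed by at least one more character.
print(found(r"abc.+", "abcd"), found(r"abc.+", "abc"))
# abc.*  : "abc" followed by anything, including nothing.
print(found(r"abc.*", "abc"), found(r"abc.*", "abdefg"))
# abc.[  : unterminated character class, rejected by the engine.
print(is_valid(r"abc.["))
# abc\.\[ and [\[] : two ways to ask for a literal "[".
print(found(r"abc\.\[", "abc.["), found(r"[\[]", "a[b"))
```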

Post by blueznl »

is this valid?

"this is a test"

is matched by

"th.*a test"

pretty nasty, by the way, full support of patterns could lead to massive iterations... i think the following pseudo code should do it, still have to code it though :-)

Code: Select all

  ; concept
  ;
  ; take pattern apart, split it up in blocks
  ; per block: type (0 exact match to a number of chars, 1 fancy stuff)
  ; per block: min (0 or 1) and max (1 or n) characters
  ; put this stuff in a table
  ; and now the real stuff... l = len(string)
  ; startpos(1) = 1,  endpos(1) = l
  ; n = 1
  ;
  ; again = false
  ; repeat
  ;   try to match block(n) *as far away as possible* aka. up to endpos(n)
  ;   if match
  ;     p = found pos (last character of match, in range startpos(n) to endpos(n))
  ;     endpos(n) = p
  ;     inc n
  ;     if n< nr of blocks
  ;       startpos(n) = p+1
  ;       endpos(n) = l
  ;       again = true
  ;     endif
  ;   else
  ;     no match, damn
  ;     dec n
  ;     if n>1
  ;       endpos(n) = endpos(n)-1
  ;       again = true
  ;     endif
  ;   endif
  ; until again = false
  ;
  ; if n < 1 no match
  ; if n > nr of blocks and endpos(nr of blocks) = l then there is a match
  

some speed improvement is possible by doing a walk through first and detect startpos / endpos of some blocks... hmmm... better first code this, but now it's bedtime
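
The block idea above can be sketched in a few lines. This is a recursive formulation in Python, not blueznl's startpos/endpos table loop, and the block layout `(allowed, lo, hi)` plus the names `match_blocks`, `literal` and `ANY_STAR` are assumptions made for illustration. Each block first takes as many characters as it can, then gives them back one at a time until the remaining blocks fit.

```python
def match_blocks(blocks, text, pos=0):
    """Return True if text[pos:] is consumed exactly by the list of blocks.

    Each block is (allowed, lo, hi): allowed is a set of characters or None
    for "any character"; lo/hi are the min/max repeat counts (hi=None means
    unbounded).
    """
    if not blocks:
        return pos == len(text)
    allowed, lo, hi = blocks[0]
    remaining = len(text) - pos
    limit = remaining if hi is None else min(hi, remaining)
    # Count how many consecutive characters this block could consume.
    run = 0
    while run < limit and (allowed is None or text[pos + run] in allowed):
        run += 1
    # Try the longest run first ("as far away as possible"), then back off.
    for take in range(run, lo - 1, -1):
        if match_blocks(blocks[1:], text, pos + take):
            return True
    return False


def literal(s):
    """Turn an exact string into one single-character block per character."""
    return [({c}, 1, 1) for c in s]


ANY_STAR = (None, 0, None)  # the ".*" block: any character, zero or more times

# The pattern "th.*a test" split into blocks.
pattern = literal("th") + [ANY_STAR] + literal("a test")

print(match_blocks(pattern, "this is a test"))
print(match_blocks(pattern, "this is a toast"))
```

Unlike a search, this sketch requires the whole string to match, which corresponds to the final `endpos(nr of blocks) = l` check in the concept.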

Post by blueznl »

there's a little thing unclear about the dot. one description says:
"." Matches any single character except newline. In awk, dot can match newline also.
naw wrote: abc\. "." is a special character in RE meaning repetitions of the previous character - so "A." will match "AA" but not "AB". The "\" escapes the special meaning of the next character which is "." so effectively "abc\." matches "abc."
so, which is it?

Post by FloHimself »

"." Matches any single character except newline. In awk, dot can match newline also.
that's correct. many unix programs deal with regexps, like grep, sed, awk, vi and some shells. each one has its own special metacharacters, and some are implemented differently, so it's up to you to decide which implementation you will follow.
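
As one example of such a dialect difference, Python's `re` follows the "any character except newline" rule by default and only lets the dot cross a newline when the `re.DOTALL` flag is given. The sample text below is made up for illustration.

```python
import re

text = "line one\nline two"

# Default: the dot refuses to match the newline between the two lines.
default_dot = re.search(r"one.line", text)

# With DOTALL the dot matches any character, newline included.
dotall_dot = re.search(r"one.line", text, re.DOTALL)

print(default_dot is not None)
print(dotall_dot is not None)
```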

Post by blueznl »

ok... now, another question :-)

is this valid? [ABC|CDE|X*]*

that's *very* nasty to code :-)

*edit*

hmmm... looking at the descriptions i can find on the net, there are indeed different variations :-)

ok, the following appears to be valid:

(abd|cde) which means that segment has either to match abd or cde, forcing me into a two dimensional array on my current approach

the following i haven't seen so i assume it isn't valid, or is it?

(abc|[a-z])

this is nasty to code: for every possibility i would have to backtrack. oh, it can be done recursively, but it can take ages to resolve

an alternative approach would be to check the match on the 'known' (fixed) segments, then try to fix the variable ones in between... brrr... what did i start...

Post by FloHimself »

is this valid? [ABC|CDE|X*]*
yes it is. it matches any expression, because the last asterisk (*) means: the expression [ABC|CDE|X*] has to appear "0" or "n" times.
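
One detail worth checking in an engine: inside square brackets, `|` and `*` are ordinary characters, so `[ABC|CDE|X*]` is a single character class rather than three alternatives. A minimal sketch in Python's `re` (used here as an assumption about the dialect):

```python
import re

# One character class holding A, B, C, D, E, X, "|" and "*", repeated 0..n times.
cls = re.compile(r"[ABC|CDE|X*]*")

print(cls.fullmatch("") is not None)       # zero repetitions are allowed
print(cls.fullmatch("A|*X") is not None)   # "|" and "*" are literal members
print(cls.fullmatch("ABC") is not None)    # any mix of the listed characters
print(cls.fullmatch("abc") is not None)    # lowercase letters are not in the class
print(repr(cls.search("zzz").group()))     # a search still succeeds, on an empty match
```

The last line is what "matches any expression" comes down to: because zero repetitions are allowed, a search always finds at least an empty match.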

Edit:
the following i haven't seen so i assume it isn't valid, or is it?

(abc|[a-z])
sure, this is valid, too. this is a very simple expression: it matches "abc" or any single lowercase character "a", "b", "c",...,"z"
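
A quick check of that alternation, again sketched with Python's `re` as an assumed dialect. Python tries the alternatives left to right; engines that follow the POSIX longest-match rule may pick differently when both alternatives fit.

```python
import re

# Either the three letters "abc" or exactly one lowercase letter.
alt = re.compile(r"(abc|[a-z])")

print(alt.fullmatch("abc") is not None)   # first alternative
print(alt.fullmatch("q") is not None)     # second alternative
print(alt.fullmatch("ab") is not None)    # neither alternative covers two letters
print(alt.fullmatch("Q") is not None)     # uppercase is outside [a-z]
print(alt.match("abcdef").group(1))       # what the group captured at the start
```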

if fred would send me the pb linux version i could compile my regexp lib for linux. but he hasn't replied yet..