IsRegExp command

Posted: Sun Jan 27, 2019 12:27 am
by marcoagpinto
Could the team implement a command to check for RegExp or whatever it is called?

Code: Select all

SFX G e ing [^eioy]e 
SFX G 0 ing [eoy]e 
SFX G ie ying ie 
SFX G 0 bing [^aeio][aeiou]b 
For example:

Code: Select all

IsRegExp(string$, pattern$, #Right_to_Left/#Left_to_Right)

Code: Select all

If IsRegExp("party","[^aeio][aeiou]b",#Right_to_Left)=#True blah blah blah

Thank you!

Re: IsRegExp command

Posted: Sun Jan 27, 2019 1:43 am
by STARGÅTE
I do not understand your request.

PB has a Regular Expression library: PureBasic - RegularExpression

It has procedures like:
CreateRegularExpression() - to create a regular expression
IsRegularExpression() - to check whether it is valid.
and MatchRegularExpression() - to match the expression with a string.
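
As a rough sketch, the three procedures above could be combined into the kind of one-call check asked for in the first post. The name IsRegExp and its two parameters are hypothetical (taken from the request, without the direction flag); only the RegularExpression library calls are real PureBasic commands.

```purebasic
; Hypothetical helper: returns #True if pattern$ matches anywhere in string$.
Procedure.i IsRegExp(string$, pattern$)
  Protected regex.i, result.i = #False
  ; #PB_Any lets PureBasic pick a free object number for us.
  regex = CreateRegularExpression(#PB_Any, pattern$)
  If regex
    If MatchRegularExpression(regex, string$)
      result = #True
    EndIf
    FreeRegularExpression(regex)
  Else
    ; The pattern could not be compiled.
    Debug "Invalid pattern: " + RegularExpressionError()
  EndIf
  ProcedureReturn result
EndProcedure

Debug IsRegExp("party", "[^aeio][aeiou]b")  ; "party" contains no "b", so no match
```

Creating and freeing the expression on every call is simple but not fast; for a large word list it would probably be better to create each pattern once and reuse it.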

Re: IsRegExp command

Posted: Sun Jan 27, 2019 2:08 am
by marcoagpinto
STARGÅTE wrote:I do not understand your request.

PB has a Regular Expression library: PureBasic - RegularExpression

It has procedures like:
CreateRegularExpression() - to create a regular expression
IsRegularExpression() - to check whether it is valid.
and MatchRegularExpression() - to match the expression with a string.
Is it built-in or do I have to install the library (and how to install it)?

Thanks!

Re: IsRegExp command

Posted: Sun Jan 27, 2019 2:57 am
by mk-soft
It is built in.

At present only two libraries are not built in: Engine3d.dll and libmariadb.dll.

Don't forget to publish the third-party licenses from the libraries as well.

Re: IsRegExp command

Posted: Sun Jan 27, 2019 3:03 am
by marcoagpinto
Thank you my friends,

I will add the licence in the user guide of my application Proofing Tool GUI.

Re: IsRegExp command

Posted: Sun Jan 27, 2019 3:11 am
by marcoagpinto
Not to sound annoying, but why does this produce #True?:

Code: Select all

#RegularExpression=1
a$="score"
b$="[^aeio][aeiou]r"

CreateRegularExpression(#RegularExpression,b$)
match=MatchRegularExpression(#RegularExpression,a$)
FreeRegularExpression(#RegularExpression)
Debug "match:"+Str(match)

Re: IsRegExp command

Posted: Sun Jan 27, 2019 7:21 am
by Oma
marcoagpinto wrote:but why does this produce #True?:
because [^aeio] also matches c,
because [aeiou] matches o,
because r matches r,
And where is the string 'cor' contained :?:
:idea: :wink:
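
To see which part of the word the unanchored pattern actually hits, a small sketch with the match-enumeration commands of the same library may help. It only assumes object number 0 is free.

```purebasic
; List every substring of "score" that the unanchored pattern matches.
If CreateRegularExpression(0, "[^aeio][aeiou]r")
  If ExamineRegularExpression(0, "score")
    While NextRegularExpressionMatch(0)
      ; For "score" the only match is the substring "cor".
      Debug "matched: " + RegularExpressionMatchString(0)
    Wend
  EndIf
  FreeRegularExpression(0)
EndIf
```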

Re: IsRegExp command

Posted: Sun Jan 27, 2019 8:36 am
by Bisonte
To learn and test regex, see this site: https://regexr.com/

Re: IsRegExp command

Posted: Sun Jan 27, 2019 3:32 pm
by marcoagpinto
marcoagpinto wrote:Not to sound annoying, but why does this produce #True?:

Code: Select all

#RegularExpression=1
a$="score"
b$="[^aeio][aeiou]r"

CreateRegularExpression(#RegularExpression,b$)
match=MatchRegularExpression(#RegularExpression,a$)
FreeRegularExpression(#RegularExpression)
Debug "match:"+Str(match)
Buaaaaaaa

I simply wanted to implement a faster way to decode prefixes and suffixes in my Hunspell tool "Proofing Tool GUI".

But the results aren't the ones it should give.

In my previous message the regex was planned to work like:

"score"
the rules would be:
1) first chr from the right should match "r";
2) second chr from the right should match any of "[aeiou]";
3) third chr from the right SHOULD NOT be any of "aeio" (hence "[^aeio]", see the "^").

Can it be done with RegExp commands in PureBasic?

Thank you!

Re: IsRegExp command

Posted: Sun Jan 27, 2019 3:43 pm
by Oma
Does b$="[^aeio][aeiou]r$" do what you want?
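
A minimal sketch of the difference the trailing $ makes, comparing both patterns on two words. The constants, the Report procedure and the test word "motor" are made up for illustration.

```purebasic
#Unanchored = 0
#Anchored   = 1

; Report whether each pattern matches the given word.
Procedure Report(word$)
  If MatchRegularExpression(#Unanchored, word$)
    Debug word$ + ": unanchored pattern matches"
  Else
    Debug word$ + ": unanchored pattern does not match"
  EndIf
  If MatchRegularExpression(#Anchored, word$)
    Debug word$ + ": anchored pattern matches"
  Else
    Debug word$ + ": anchored pattern does not match"
  EndIf
EndProcedure

If CreateRegularExpression(#Unanchored, "[^aeio][aeiou]r") And CreateRegularExpression(#Anchored, "[^aeio][aeiou]r$")
  Report("score")  ; contains "cor" but ends in "re": only the unanchored pattern matches
  Report("motor")  ; ends in "tor": both patterns match
  FreeRegularExpression(#Unanchored)
  FreeRegularExpression(#Anchored)
EndIf
```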

Re: IsRegExp command

Posted: Sun Jan 27, 2019 3:46 pm
by #NULL
marcoagpinto wrote:1) first chr from the right should match "r";
use \b (word boundary) or $ (end of string) after r
marcoagpinto wrote:2) second chr from the right should match any of "[aeiou]";
a character class on its own matches exactly one char. if more should be allowed, you can use [...]+ for 1 or more, or [...]{n} for exactly n chars.
marcoagpinto wrote:3) third chr from the right SHOULD NOT be any of "aeio" (hence "[^aeio]", see the "^").
i think this should work as is.

<edit>
word boundary is \b, not \w (which is whitespace)

Re: IsRegExp command

Posted: Sun Jan 27, 2019 4:15 pm
by Marc56us
#NULL wrote:word boundary is \b, not \w (which is whitespace)
\w is not whitespace

\w : "word character" (letters, digits, _ )

\s : whitespace (space, tab, line break)

(PureBasic uses PCRE)

:wink:
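
The three escapes can be tried out with a small sketch like this one. Show is a made-up helper, and the patterns are written in normal PureBasic strings, where the backslash is passed to PCRE unchanged.

```purebasic
; Hypothetical helper: print whether pattern$ matches somewhere in text$.
Procedure Show(pattern$, text$)
  If CreateRegularExpression(0, pattern$)
    If MatchRegularExpression(0, text$)
      Debug pattern$ + " matches [" + text$ + "]"
    Else
      Debug pattern$ + " does not match [" + text$ + "]"
    EndIf
    FreeRegularExpression(0)
  EndIf
EndProcedure

Show("\w", "a")       ; a letter is a word character
Show("\w", " ")       ; a space is not a word character
Show("\s", " ")       ; a space is whitespace
Show("r\b", "color")  ; the "r" is followed by a word boundary (end of the string)
Show("r\b", "score")  ; the "r" is followed by "e", so there is no boundary after it
```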

Re: IsRegExp command

Posted: Sun Jan 27, 2019 7:20 pm
by marcoagpinto
Oma wrote:Does b$="[^aeio][aeiou]r$" do what you want?

Yes, it worked but another issue arose.

I have some prefixes in the GB speller which have a dot at the end of the pattern.

For prefixes, I am inverting the word since I believe the regular expressions should be met by checking the dictionary words from left to right, so:

1) SUFFIXES:

Code: Select all

    ; Try to match regular expression - 27/JAN/2019
    CreateRegularExpression(#RegularExpression,b$+"$")
    match=MatchRegularExpression(#RegularExpression,a$)
    FreeRegularExpression(#RegularExpression)
    ProcedureReturn match
2) PREFIXES:

Code: Select all

    ; Try to match regular expression - 27/JAN/2019
    CreateRegularExpression(#RegularExpression,b$+"$")
    match=MatchRegularExpression(#RegularExpression,ReverseString(a$))
    FreeRegularExpression(#RegularExpression)
    ProcedureReturn match

The prefix code produces incorrect derivatives if the pattern$ has a dot at the end:
PFX F 0 con [^abehilmopru].
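
One possible alternative, sketched under assumptions: instead of reversing the word, the condition could be anchored at the start of the word with "^". Reversing only the word leaves the pattern in its original order, so with a two-character condition each part is tested against the wrong letter. The sketch assumes that a dot in a Hunspell condition stands for any single character, which is also what it means in a regular expression. MatchPrefixCondition and the two test words are made up for illustration.

```purebasic
; Hypothetical helper: test a Hunspell-style prefix condition at the start of a word.
Procedure.i MatchPrefixCondition(word$, condition$)
  Protected regex.i, result.i = #False
  ; "^" anchors the condition at the beginning of the (unreversed) word.
  regex = CreateRegularExpression(#PB_Any, "^" + condition$)
  If regex
    If MatchRegularExpression(regex, word$)
      result = #True
    EndIf
    FreeRegularExpression(regex)
  EndIf
  ProcedureReturn result
EndProcedure

Debug MatchPrefixCondition("test", "[^abehilmopru].")   ; "t" is not in the excluded set: match
Debug MatchPrefixCondition("apple", "[^abehilmopru].")  ; "a" is in the excluded set: no match
```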

I did a DIFF of the two wordlists exported using Tortoise SVN and that is how I found the dot issue.

I have sent an e-mail to some Hunspell related friends asking for their opinion, but can we assume that the dots are a bug in the British .aff file and that I should remove all the dots I can find next to a "]"?

Thank you!

Kind regards,
>Marco A.G.Pinto
----------------

Re: IsRegExp command

Posted: Sun Jan 27, 2019 7:51 pm
by infratec
If you want to use regex... you have to know regex :wink:

A dot is a meta character. You have to escape it like \.
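
A short sketch of the difference between the two forms. Show is a made-up helper and the sample strings are arbitrary. Whether the dot in the .aff condition is meant as a literal dot is a separate question that this does not settle.

```purebasic
; Hypothetical helper: print whether pattern$ matches somewhere in text$.
Procedure Show(pattern$, text$)
  If CreateRegularExpression(0, pattern$)
    If MatchRegularExpression(0, text$)
      Debug pattern$ + " matches [" + text$ + "]"
    Else
      Debug pattern$ + " does not match [" + text$ + "]"
    EndIf
    FreeRegularExpression(0)
  EndIf
EndProcedure

Show("a.c", "abc")   ; unescaped dot: any single character is accepted
Show("a.c", "a.c")   ; a literal dot is also "any character"
Show("a\.c", "abc")  ; escaped dot: only a real "." is accepted, so no match
Show("a\.c", "a.c")  ; escaped dot matches the literal dot
```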

Re: IsRegExp command

Posted: Sun Jan 27, 2019 8:04 pm
by Little John
infratec wrote:If you want to use regex... you have to know regex :wink:
Yes, definitely! It's not possible to seriously handle Regular Expressions by trial and error.