IsRegExp command

#NULL · Post by **#NULL** » Sun Jan 27, 2019 8:23 pm

First he needs to know if the dot is supposed to be there at all. If it is supposed to be there and is a literal dot then it needs to be escaped as \. in a regex (if outside of the character class [...]). If it's not a literal dot but instead the metacharacter then it must not be escaped.
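
A minimal sketch of that difference, assuming PB's PCRE-based regex library. `Matches()` is a hypothetical helper written for this illustration, not something from the thread or from PB itself.

```purebasic
; Hypothetical helper: 1 = match, 0 = no match, -1 = pattern did not compile
Procedure.i Matches(pattern$, text$)
  Protected result.i = -1
  If CreateRegularExpression(0, pattern$)
    result = Bool(MatchRegularExpression(0, text$))
    FreeRegularExpression(0)
  EndIf
  ProcedureReturn result
EndProcedure

Debug Matches("r\.$",  "car.")   ; 1: \. matches the literal dot
Debug Matches("r\.$",  "cart")   ; 0: "t" is not a dot
Debug Matches("r.$",   "cart")   ; 1: the unescaped dot matches any character
Debug Matches("r[.]$", "cart")   ; 0: inside [...] the dot is literal without escaping
```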

Marc56us · Post by **Marc56us** » Mon Jan 28, 2019 7:02 am

I have some prefixes in the GB speller which have a dot at the end of the pattern.

[^aeio][aeiou]r\.?$
or
[^aeio][aeiou]r[.]{0,1}$
or
[^aeio][aeiou]r[.]?$
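
A quick way to check that the three spellings behave the same is sketched below. The test words "star", "star." and "stars" and the helper `Matches()` are made up for the illustration.

```purebasic
; Hypothetical helper: 1 = match, 0 = no match, -1 = pattern did not compile
Procedure.i Matches(pattern$, text$)
  Protected result.i = -1
  If CreateRegularExpression(0, pattern$)
    result = Bool(MatchRegularExpression(0, text$))
    FreeRegularExpression(0)
  EndIf
  ProcedureReturn result
EndProcedure

Dim pattern$(2)
pattern$(0) = "[^aeio][aeiou]r\.?$"
pattern$(1) = "[^aeio][aeiou]r[.]{0,1}$"
pattern$(2) = "[^aeio][aeiou]r[.]?$"

For i = 0 To 2
  Debug pattern$(i)
  Debug Matches(pattern$(i), "star")    ; 1: no trailing dot
  Debug Matches(pattern$(i), "star.")   ; 1: one trailing literal dot
  Debug Matches(pattern$(i), "stars")   ; 0: "s" is not a literal dot
Next
```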

marcoagpinto · Post by **marcoagpinto** » Mon Jan 28, 2019 9:25 am

Marc56us wrote:
I have some prefixes in the GB speller which have a dot at the end of the pattern.
[^aeio][aeiou]r\.?$
or
[^aeio][aeiou]r[.]{0,1}$
or
[^aeio][aeiou]r[.]?$

Mauro told me that the dot means that the word in which to apply the pattern must have at least two characters.

But, today I had an e-mail from him telling me that the dot could be anywhere in the pattern and not just at the end.

EDIT:
He replied again a minute ago saying that, after all, the dot matches any single chr and does not mean that the word needs more than one chr.

EDIT2:
[^aeio][aeiou]r\.?$ seems to have done the job! Thank you guys for all the help!

#NULL · Post by **#NULL** » Mon Jan 28, 2019 10:42 am

marcoagpinto wrote: EDIT:
He replied again a minute ago saying that, after all, the dot matches any single chr and does not mean that the word needs more than one chr.

So you need to include the dot in the regex pattern as is. But [^abehilmopru]. will also match if there is anything else before or after the matching part.

marcoagpinto · Post by **marcoagpinto** » Mon Jan 28, 2019 11:34 am

Guys,

Not to sound annoying, but replacing the dots with:
\.?

Doesn't work for this scenario:

Code: Select all

SFX C Y 2
SFX C 0 one nagy
SFX C 0 two nag.

For the word:

Code: Select all

nagy/C

It should produce:

Code: Select all

nagyone
nagytwo

It seems that the dot isn't ignoring the first chr of "nagy":

Code: Select all

SFX C 0 two nag.

How do I code it to work anywhere in the pattern?

The dot can appear anywhere in the pattern.

I was using:

Code: Select all

    ; Handle dot - Mauro Trevisan + PB Forum *START*
    b$=ReplaceString(b$,".","\.?")
    ; Handle dot - Mauro Trevisan + PB Forum *END*

Thank you!

Marc56us · Post by **Marc56us** » Mon Jan 28, 2019 11:51 am

[.] = a dot
. = any char
In PB the dot matches any char except a newline by default; add the option #PB_RegularExpression_DotAll to CreateRegularExpression() if it must match newlines too.

\.? = 0 or 1 dot
.? = 0 or 1 char

Learn regex...

#NULL · Post by **#NULL** » Mon Jan 28, 2019 12:04 pm

Just leave the dot alone!!

If you escape it as \. then it will search for a literal dot (a period) in the text and not for any character. The question mark makes it optional.
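
As a rough illustration with the rule `SFX C 0 two nag.` and the word "nagy" from the post above (`Matches()` is a hypothetical helper, and the trailing `$` is the suffix anchor already used in this thread):

```purebasic
; Hypothetical helper: 1 = match, 0 = no match, -1 = pattern did not compile
Procedure.i Matches(pattern$, text$)
  Protected result.i = -1
  If CreateRegularExpression(0, pattern$)
    result = Bool(MatchRegularExpression(0, text$))
    FreeRegularExpression(0)
  EndIf
  ProcedureReturn result
EndProcedure

Debug Matches("nag.$",   "nagy")   ; 1: the dot stands for the "y"
Debug Matches("nag\.?$", "nagy")   ; 0: the word would have to end in "nag" or "nag."
Debug Matches("nag\.?$", "nag.")   ; 1: only a real period satisfies \.
```

So the `ReplaceString(b$,".","\.?")` step turns an "any character" condition into an "optional period" condition, which is why "nagytwo" was not produced.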

Marc56us · Post by **Marc56us** » Tue Jan 29, 2019 7:32 am

At viewtopic.php?p=532147#p532147 the dot (if it exists) was at the end only.

I have sent an e-mail to some Hunspell related friends asking for their opinion, but can we assume that the dots are a bug in the British .aff file and that I should remove all the dots I can find next to a "]"?

So \.?$ checks for a dot at the end, or no dot.

[^aeio][aeiou]r\.?$

marcoagpinto · Post by **marcoagpinto** » Tue Jan 29, 2019 11:27 am

Marc56us wrote: At viewtopic.php?p=532147#p532147 the dot (if it exists) was at the end only.
I have sent an e-mail to some Hunspell related friends asking for their opinion, but can we assume that the dots are a bug in the British .aff file and that I should remove all the dots I can find next to a "]"?
So \.?$ checks for a dot at the end, or no dot.

[^aeio][aeiou]r\.?$

@Marc56us

The dots can be anywhere in the rule and it means to ignore a chr at that position.

For SUFFIXES I used:

Code: Select all

    CreateRegularExpression(#RegularExpression,b$+"$",#PB_RegularExpression_DotAll)
    match=MatchRegularExpression(#RegularExpression,a$)
    FreeRegularExpression(#RegularExpression)
    ProcedureReturn match

For PREFIXES I used (notice that I inverted a$ to search from left to right):

Code: Select all

    CreateRegularExpression(#RegularExpression,b$+"$",#PB_RegularExpression_DotAll)
    match=MatchRegularExpression(#RegularExpression,ReverseString(a$))
    FreeRegularExpression(#RegularExpression)
    ProcedureReturn match

I have created a dictionary and an affix file with rules to test it:

Code: Select all

nagy/C
nagy/D

Code: Select all

SFX C Y 4
SFX C 0 one nagy
SFX C 0 two nag.
SFX C 0 three n.g.
SFX C 0 four [^a].g.

PFX D Y 4
PFX D 0 kar1 an
PFX D 0 kar2 y.an
PFX D 0 kar3 [yw].an
PFX D 0 kar4 [^y].an

However, there is at least one prefix rule in the British speller that is causing false positives.

I must make some tests when I am not stressed... my holiday has ended and I will be back to my weekend supermarket job in a few days.

I will try to use the tool Hunspeller developed by Mauro Trevisan to analyse the results of Proofing Tool GUI and his.

I will post here after I test it.

Thank you my friends,

marcoagpinto · Post by **marcoagpinto** » Tue Jan 29, 2019 4:41 pm

@Marc56us and guys,

This is the bug I was referring to:

Code: Select all

    #RegularExpression=1
    a$="ached"
    b$="[^abehilmopru]."
    CreateRegularExpression(#RegularExpression,b$+"$",#PB_RegularExpression_DotAll)
    match=MatchRegularExpression(#RegularExpression,ReverseString(a$))
    FreeRegularExpression(#RegularExpression)
    Debug match

How can it give #True?
a$="ached" which is the primary word (apply prefix)
b$="[^abehilmopru]." which is the regex pattern.

I invert a$ to match the regex scanning the primary word from right to left.

This gives a$="dehca".

So, the first character "a" is the "."
and the second one is "c" which is different from "abehilmopru".

This gives a valid rule?

It is strange... what am I doing wrong?

Marc56us · Post by **Marc56us** » Tue Jan 29, 2019 4:48 pm

[^abehilmopru].

Means: 2 chars anywhere in a string!
- One that is not one of 'abehilmopru'
- One that is anything
Always use limits (\b, ^, $, \A etc.), because otherwise the regex eats everything on the table, and even several times.
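
A small sketch of what the missing limit does with the word "ached" from the post above. `Matches()` is a hypothetical helper, and "cached" is an extra made-up test word.

```purebasic
; Hypothetical helper: 1 = match, 0 = no match, -1 = pattern did not compile
Procedure.i Matches(pattern$, text$)
  Protected result.i = -1
  If CreateRegularExpression(0, pattern$)
    result = Bool(MatchRegularExpression(0, text$))
    FreeRegularExpression(0)
  EndIf
  ProcedureReturn result
EndProcedure

Debug Matches("[^abehilmopru].",  "ached")   ; 1: unanchored, "ch" matches in the middle
Debug Matches("^[^abehilmopru].", "ached")   ; 0: anchored, the first letter "a" is in the excluded set
Debug Matches("^[^abehilmopru].", "cached")  ; 1: "c" is not excluded, "a" is any character
```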

It is often difficult to express what you want with sentences.

Instead, give us some examples:

1. Example of a string that must be found
2. Example of a string that should be excluded.

marcoagpinto · Post by **marcoagpinto** » Tue Jan 29, 2019 5:05 pm

Marc56us wrote:
[^abehilmopru].
Means: 2 chars anywhere in a string!
- One that is not one of 'abehilmopru'
- One that is anything
Always use limits (\b, ^, $, \A etc.), because otherwise the regex eats everything on the table, and even several times.

It is often difficult to express what you want with sentences.

Instead, give us some examples:

1. Example of a string that must be found
2. Example of a string that should be excluded.

@Marc56us

Ahhhh... I used a $ in the pattern: "[^abehilmopru].$"

But now I no longer know if prefixes should be inverted.

After 6 years working on Proofing Tool GUI I am no longer sure if I have been decoding prefixes with 100% accuracy... it is sad and I am very disturbed.

I have sent an e-mail to Mauro asking him for advice since he created Hunspeller ( https://github.com/mtrevisan/Hunspeller ).

I will wait for his reply.

Thank you, my friends!

marcoagpinto · Post by **marcoagpinto** » Tue Jan 29, 2019 5:32 pm

@Marcus

Mauro replied this:

From the description you gave I see an error: you inverted the word so that
you can do the match from right to left, but you miss to invert also the
condition!
a$="dehca"
b$=".[^abehilmopru]"
So, the first character "a" is the " [^abehilmopru]"
and the second one is "c" which is "any letter".

I tried to ReverseString(b$) and it gives an error in:

Code: Select all

match=MatchRegularExpression(#RegularExpression,a$)

How can I solve it?

Thanks!
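
A hedged guess at why the error appears: `ReverseString()` flips the pattern character by character, so the closing bracket ends up before the opening one and the pattern no longer compiles. If `CreateRegularExpression()` fails, the later `MatchRegularExpression()` call has no valid regex to work with. The sketch below only checks that assumption and prints the library's error text.

```purebasic
pattern$ = ReverseString("[^abehilmopru].")
Debug pattern$                               ; .]urpomliheba^[

If CreateRegularExpression(0, pattern$ + "$")
  Debug Bool(MatchRegularExpression(0, "dehca"))
  FreeRegularExpression(0)
Else
  ; The reversed text ends in an unclosed "[", so creation fails
  Debug RegularExpressionError()
EndIf
```

Reversing the condition correctly would mean reversing it token by token (keeping each [...] group intact), not character by character.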

Marc56us · Post by **Marc56us** » Tue Jan 29, 2019 5:45 pm

I don't understand why you're reversing the string. It won't go any faster.

It is useful to understand how a regular expression analysis engine works (it's quite simple in fact)
Look at Jan Goyvaerts' reference help
https://www.regular-expressions.info/engine.html

Please, post some lines:
1. Example of a string that matches
2. Example of a string that should not match.
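
A minimal sketch of matching without any reversal, assuming that a prefix condition has to hold at the start of the word and a suffix condition at the end. `ConditionMatches()` and its `isPrefix` parameter are hypothetical names; the conditions and words are taken from posts in this thread.

```purebasic
; Hypothetical helper: 1 = condition holds, 0 = it does not, -1 = invalid pattern
Procedure.i ConditionMatches(word$, condition$, isPrefix.i)
  Protected pattern$, result.i = -1
  If isPrefix
    pattern$ = "^" + condition$     ; anchor at the start of the word
  Else
    pattern$ = condition$ + "$"     ; anchor at the end of the word
  EndIf
  If CreateRegularExpression(0, pattern$, #PB_RegularExpression_DotAll)
    result = Bool(MatchRegularExpression(0, word$))
    FreeRegularExpression(0)
  EndIf
  ProcedureReturn result
EndProcedure

Debug ConditionMatches("nagy",  "n.g.",            #False)  ; 1
Debug ConditionMatches("nagy",  "[^a].g.",         #False)  ; 1
Debug ConditionMatches("play",  "[aeiou]y",        #False)  ; 1
Debug ConditionMatches("play",  "[^e]",            #True)   ; 1
Debug ConditionMatches("ached", "[^abehilmopru].", #True)   ; 0: "a" is excluded
```

With the `^` anchor the regex engine still reads the word from left to right, so neither the word nor the condition has to be reversed.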

marcoagpinto · Post by **marcoagpinto** » Tue Jan 29, 2019 6:13 pm

Marc56us wrote: I don't understand why you're reversing the string. It won't go any faster.

It is useful to understand how a regular expression analysis engine works (it's quite simple in fact)
Look at Jan Goyvaerts' reference help
https://www.regular-expressions.info/engine.html

Please, post some lines:
1. Example of a string that matches
2. Example of a string that should not match.

Hello!

While decoding suffixes and prefixes, the rules are scanned from right to left on suffixes and left to right on prefixes:
primary word="play"

Applying the suffix flag "D" to it, the program scans every D rule in the affix (.aff) file, and this one matches:
SFX D 0 ed [aeiou]y

the word is "play" and the pattern "[aeiou]y"

the pattern matches in the primary word "play".
1) first letter from right "y"
2) second letter from right is one of "aeiou"

So, it adds a suffix "ed" to "play", producing "played".

With prefixes the rules are scanned in the primary words from left to right:
For example, with the same primary word "play", if I applied the prefix flag "A":
PFX A 0 re [^e]

1) first letter from left in "play" is different from "e"

So, it adds a prefix "re" to "play" producing "replay".

Rules can have several [blah blah]letters[^blah blah]letters

I was doing it with my own code, but then I found out that PB supports regexp, so I replaced my complex code with it.

Only then did I notice that rules can also contain dots, which means my old complex code didn't produce 100% accurate results.

Now I don't know how to scan primary words from left to right or from right to left using the regexp commands in PB. All I know is that I append a $ to the pattern, and it seems to work with suffixes but fails with prefixes.

Is this explanation descriptive?

Thank you, Marcus and guys.
