IsRegExp command

#NULL
Addict
Posts: 1497
Joined: Thu Aug 30, 2007 11:54 pm
Location: right here

Re: IsRegExp command

Post by #NULL »

First, he needs to know whether the dot is supposed to be there at all. If it is, and it is a literal dot, it must be escaped as \. in a regex (when it is outside a character class [...]). If it is not a literal dot but the "any character" metacharacter, it must not be escaped.
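A quick illustration of the two cases, in Python. For simple constructs like these, Python's `re` module and PCRE (the library PureBasic's regular expression commands are built on) treat `\.` and `.` the same way. The sample words are made up for the example.

```python
import re

# Escaped: \. matches only a literal period.
literal = re.compile(r"ar\.$")
# Unescaped: . matches any single character (except a newline by default).
any_char = re.compile(r"ar.$")

print(bool(literal.search("car.")))   # True: "ar" followed by a real period
print(bool(literal.search("cart")))   # False: "t" is not a period
print(bool(any_char.search("cart")))  # True: "t" counts as "any character"
```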
Marc56us
Addict
Posts: 1600
Joined: Sat Feb 08, 2014 3:26 pm

Post by Marc56us »

I have some prefixes in the GB speller which have a dot at the end of the pattern.
[^aeio][aeiou]r\.?$
or
[^aeio][aeiou]r[.]{0,1}$
or
[^aeio][aeiou]r[.]?$
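The three spellings are equivalent: `\.?`, `[.]{0,1}` and `[.]?` all mean "zero or one literal dot". A small Python check against a few made-up sample words:

```python
import re

# Each pattern: a character other than a/e/i/o, a vowel, "r",
# then an optional literal dot, at the end of the string.
patterns = [
    r"[^aeio][aeiou]r\.?$",
    r"[^aeio][aeiou]r[.]{0,1}$",
    r"[^aeio][aeiou]r[.]?$",
]
samples = ["stir", "stir.", "stirs", "blur", "oar"]

for p in patterns:
    # every line ends with ['stir', 'stir.', 'blur']
    print(p, [w for w in samples if re.search(p, w)])
```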
marcoagpinto
Addict
Posts: 1039
Joined: Sun Mar 10, 2013 3:01 pm
Location: Portugal

Post by marcoagpinto »

Mauro told me that the dot means that the word to which the pattern is applied must have at least two characters.

But today I got an e-mail from him telling me that the dot can be anywhere in the pattern, not just at the end.

EDIT:
He replied again a minute ago saying that, after all, the dot means "match any character", not "more than one character".

EDIT2:
[^aeio][aeiou]r\.?$ seems to have done the job! Thank you guys for all the help!

Post by #NULL »

marcoagpinto wrote: EDIT:
He replied again a minute ago saying that after all the dot means it matches any chr and not >1.

So you need to include the dot in the regex pattern as is. But [^abehilmopru]. will also match if there is anything else before or after the matching part.
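To see what "anything else before or after" means in practice, here is a Python check with the word "ached" (taken from later in the thread). Without an anchor, the search tries every starting position:

```python
import re

cond = "[^abehilmopru]."

m = re.search(cond, "ached")  # unanchored: any starting position is allowed
print(m.group(), m.start())   # ch 1  ("c" is not excluded, "h" is "any character")

# Anchored at the start, the first letter "a" is in the excluded set.
print(re.search("^" + cond, "ached"))  # None
```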

Post by marcoagpinto »

Guys,

Not to sound annoying, but replacing the dots with:
\.?

Doesn't work for this scenario:

Code:

SFX C Y 2
SFX C 0 one nagy
SFX C 0 two nag.
For the word:

Code:

nagy/C
It should produce:

Code: Select all

nagyone
nagytwo
It seems that the dot isn't ignoring the first chr of "nagy":

Code:

SFX C 0 two nag.
How do I code it to work anywhere in the pattern?

The dot can appear anywhere in the pattern.

I was using:

Code:

    ; Handle dot - Mauro Trevisan + PB Forum *START*
    b$=ReplaceString(b$,".","\.?")
    ; Handle dot - Mauro Trevisan + PB Forum *END*
Thank you!
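The ReplaceString() step above turns every dot into `\.?`, which means "an optional literal period" rather than "any character". A Python reproduction with the `SFX C 0 two nag.` rule and the word "nagy" (the condition is checked at the end of the word, as for any suffix):

```python
import re

word = "nagy"
cond = "nag."  # condition from "SFX C 0 two nag."

# What ReplaceString(b$, ".", "\.?") produces: an optional *literal* dot.
escaped = cond.replace(".", r"\.?")
print(escaped, bool(re.search(escaped + "$", word)))  # nag\.? False

# Leaving the dot alone keeps its "any character" meaning.
print(cond, bool(re.search(cond + "$", word)))  # nag. True
```

In Hunspell affix conditions a dot stands for an arbitrary character, so the unescaped form is the one that matches "nagy".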

Post by Marc56us »

[.] = a dot
. = any char
But in PB, for . to match newlines as well, CreateRegularExpression() needs the #PB_RegularExpression_DotAll option


\.? = 0 or 1 dot
.? = 0 or 1 char

Learn regex... :wink:
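The four cases side by side in Python, using `re.fullmatch` so that the whole string must match. Python's `re.DOTALL` flag is shown as the counterpart of `#PB_RegularExpression_DotAll`; as far as I know, in both engines it only changes whether `.` also matches a newline, so it makes no difference for single dictionary words.

```python
import re

print(bool(re.fullmatch(r"a[.]b", "a.b")), bool(re.fullmatch(r"a[.]b", "axb")))  # True False
print(bool(re.fullmatch(r"a.b", "axb")))                                          # True
print(bool(re.fullmatch(r"a\.?b", "ab")), bool(re.fullmatch(r"a\.?b", "axb")))    # True False
print(bool(re.fullmatch(r"a.?b", "ab")), bool(re.fullmatch(r"a.?b", "axb")))      # True True

# DOTALL only matters when the text can contain a newline.
print(bool(re.fullmatch(r"a.b", "a\nb")))             # False
print(bool(re.fullmatch(r"a.b", "a\nb", re.DOTALL)))  # True
```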

Post by #NULL »

Just leave the dot alone!! :)
If you escape it as \. then it will search for a literal dot (a period) in the text and not for any character. The question mark makes it optional.

Post by Marc56us »

At viewtopic.php?p=532147#p532147 the dot (if present) was only at the end.
I have sent an e-mail to some Hunspell-related friends asking for their opinion, but can we assume that the dots are a bug in the British .aff file and that I should remove all the dots I can find next to a "]"?
So \.?$ checks for either a dot at the end or no dot.

[^aeio][aeiou]r\.?$

:wink:

Post by marcoagpinto »

@Marc56us

The dots can be anywhere in the rule, and each one means "any character" at that position.

For SUFFIXES I used:

Code:

    CreateRegularExpression(#RegularExpression,b$+"$",#PB_RegularExpression_DotAll)
    match=MatchRegularExpression(#RegularExpression,a$)
    FreeRegularExpression(#RegularExpression)
    ProcedureReturn match

For PREFIXES I used (notice that I inverted a$ to search from left to right):

Code:

    CreateRegularExpression(#RegularExpression,b$+"$",#PB_RegularExpression_DotAll)
    match=MatchRegularExpression(#RegularExpression,ReverseString(a$))
    FreeRegularExpression(#RegularExpression)
    ProcedureReturn match
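A sketch of the same two checks without reversing anything, in Python: a suffix condition is anchored to the end of the word with `$` and a prefix condition to the start with `^`, which is how Hunspell reads conditions (suffix conditions at the end of the word, prefix conditions at the beginning). `condition_matches` is a hypothetical helper, not part of any library.

```python
import re

def condition_matches(kind: str, cond: str, word: str) -> bool:
    """Hypothetical helper: test a Hunspell affix condition against a word.

    SFX conditions are anchored at the end, PFX conditions at the start,
    so the word never has to be reversed.
    """
    if kind == "SFX":
        pattern = cond + "$"
    elif kind == "PFX":
        pattern = "^" + cond
    else:
        raise ValueError("kind must be 'SFX' or 'PFX'")
    return re.search(pattern, word) is not None

print(condition_matches("SFX", "[aeiou]y", "play"))          # True
print(condition_matches("PFX", "[^e]", "play"))              # True
print(condition_matches("PFX", "[^abehilmopru].", "ached"))  # False
```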

I have created a dictionary and an affix file with rules to test it:

Code:

nagy/C
nagy/D

Code:

SFX C Y 4
SFX C 0 one nagy
SFX C 0 two nag.
SFX C 0 three n.g.
SFX C 0 four [^a].g.

PFX D Y 4
PFX D 0 kar1 an
PFX D 0 kar2 y.an
PFX D 0 kar3 [yw].an
PFX D 0 kar4 [^y].an
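For the SFX C lines, a Python sketch that checks each condition at the end of "nagy" (strip is 0 in every line, so the affix is simply appended). The PFX D lines are left out because they depend on how prefix conditions are read, which is still being worked out in this thread.

```python
import re

word = "nagy"
# (affix, condition) pairs copied from the SFX C lines above.
sfx_c = [("one", "nagy"), ("two", "nag."), ("three", "n.g."), ("four", "[^a].g.")]

for affix, cond in sfx_c:
    if re.search(cond + "$", word):  # suffix conditions are checked at the end
        print(word + affix)
# prints nagyone, nagytwo, nagythree and nagyfour, one per line
```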
However, there is at least one prefix rule in the British speller that is causing false positives.

I must make some tests when I am not stressed... my holiday has ended and I will be back to my weekend supermarket job in a few days.

I will try to use Hunspeller, the tool developed by Mauro Trevisan, to compare its results with those of Proofing Tool GUI.

I will post here after I test it.

Thank you my friends,

Post by marcoagpinto »

@Marc56us and guys,

This is the bug I was referring to:

Code:

    #RegularExpression=1
    a$="ached"
    b$="[^abehilmopru]."
    CreateRegularExpression(#RegularExpression,b$+"$",#PB_RegularExpression_DotAll)
    match=MatchRegularExpression(#RegularExpression,ReverseString(a$))
    FreeRegularExpression(#RegularExpression)
    Debug match
How can it give #True?
a$="ached" which is the primary word (apply prefix)
b$="[^abehilmopru]." which is the regex pattern.

I invert a$ to match the regex scanning the primary word from right to left.

This gives a$="dehca".

So, the first character "a" is the "."
and the second one is "c" which is different from "abehilmopru".

So the rule matches?

It is strange... what am I doing wrong?
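A Python reproduction of the snippet above shows where the #True comes from. With `$` on the reversed word "dehca", the pattern looks at its last two characters, "c" then "a", so the condition's first element is compared with "c" (the second letter of "ached") instead of "a":

```python
import re

word = "ached"
cond = "[^abehilmopru]."

# As in the post: reverse the word, keep the condition as written.
print(bool(re.search(cond + "$", word[::-1])))  # True: "dehca" ends in "ca"

# Reading the prefix condition from the start of the original word instead.
print(bool(re.search("^" + cond, word)))  # False: "a" is in the excluded set
```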

Post by Marc56us »

[^abehilmopru].
Means: 2 characters anywhere in a string!
- one that is not one of 'abehilmopru'
- one that is anything
Always use anchors (\b, ^, $, \A, etc.), because otherwise the regex eats everything on the table, and even several times :lol:
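The effect of each anchor on the condition from this thread, as a Python sketch with two example words ("chad" is just an illustrative word that starts with an allowed letter):

```python
import re

cond = "[^abehilmopru]."
patterns = {
    "unanchored": cond,         # may match anywhere, even mid-word
    "start ^": "^" + cond,      # must match at the first character
    "start \\A": r"\A" + cond,  # like ^, but unaffected by MULTILINE mode
    "end $": cond + "$",        # must match the last two characters
}

for word in ("ached", "chad"):
    for name, pattern in patterns.items():
        print(word, name, re.findall(pattern, word))
```

For "ached" only the unanchored pattern finds anything ("ch", in the middle of the word); for "chad" the start-anchored patterns match "ch" and the end-anchored one matches nothing.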

It is often difficult to express what you want with sentences. :?

Instead, give us some examples:

1. Example of a string that must be found
2. Example of a string that should be excluded.

8)

Post by marcoagpinto »

@Marc56us

Ahhhh... I used a $ in the pattern: "[^abehilmopru].$"

But now I no longer know if prefixes should be inverted.

After 6 years working on Proofing Tool GUI I am no longer sure if I have been decoding prefixes with 100% accuracy... it is sad and I am very disturbed.

I have sent an e-mail to Mauro asking him for advice since he created Hunspeller ( https://github.com/mtrevisan/Hunspeller ).

I will wait for his reply.

Thank you, my friends!

Post by marcoagpinto »

@Marcus

Mauro replied this:
From the description you gave I see an error: you inverted the word so that
you can do the match from right to left, but you forgot to invert the
condition as well!
a$="dehca"
b$=".[^abehilmopru]"
So, the first character "a" is the "[^abehilmopru]"
and the second one is "c" which is "any letter".
I tried ReverseString(b$) and it gives an error at:

Code:

match=MatchRegularExpression(#RegularExpression,a$)
How can I solve it?

Thanks!
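Why ReverseString(b$) fails: it reverses every character, brackets included, so `[^abehilmopru].` becomes `.]urpomliheba^[`, which is not a valid regular expression; presumably CreateRegularExpression() then fails, which would explain the error on the MatchRegularExpression() line. Mauro's suggestion needs the condition reversed element by element instead. A Python sketch, where `reverse_condition` is a hypothetical helper and conditions are assumed to contain only `[...]` groups, `.` and plain letters:

```python
import re

def reverse_condition(cond: str) -> str:
    """Hypothetical helper: reverse a Hunspell condition element by element.

    A bracket group such as [^abc] stays intact as one element; every other
    character (a letter or '.') is an element of its own.
    """
    elements = re.findall(r"\[[^\]]*\]|.", cond)
    return "".join(reversed(elements))

cond = "[^abehilmopru]."
print(cond[::-1])               # .]urpomliheba^[
print(reverse_condition(cond))  # .[^abehilmopru]

try:
    re.compile(cond[::-1])
except re.error as err:
    print("invalid pattern:", err)

# Reversed word + reversed condition gives the expected result for "ached".
print(bool(re.search(reverse_condition(cond) + "$", "dehca")))  # False
```

Anchoring the original condition with `^` on the unreversed word gives the same answer without any reversing.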

Post by Marc56us »

I don't understand why you're reversing the string. It won't go any faster.

It is useful to understand how a regular expression analysis engine works (it's quite simple in fact)
Look at Jan Goyvaerts' reference help
https://www.regular-expressions.info/engine.html

Please, post some lines:
1. Example of a string that matches
2. Example of a string that should not match.

:wink:

Post by marcoagpinto »

Hello!

While decoding suffixes and prefixes, the rules are scanned from right to left on suffixes and left to right on prefixes:
primary word="play"

Applying the suffix flag "D" to it, the program scans the affix (.aff) file, checks every D rule, and this one matches:
SFX D 0 ed [aeiou]y

the word is "play" and the pattern "[aeiou]y"

the pattern matches in the primary word "play".
1) first letter from right "y"
2) second letter from right is one of "aeiou"

So, it adds a suffix "ed" to "play", producing "played".
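The suffix walk-through above, as a Python sketch. `apply_suffix` is a hypothetical helper; besides the thread's `SFX D 0 ed [aeiou]y` rule it also tries an illustrative rule with a non-zero strip field (`y` is removed before `ied` is added), since the condition is tested on the primary word before stripping.

```python
import re
from typing import Optional

def apply_suffix(word: str, strip: str, affix: str, cond: str) -> Optional[str]:
    """Hypothetical helper for one rule line: SFX <flag> <strip> <affix> <cond>.

    Returns the derived word, or None when the condition does not match.
    """
    if not re.search(cond + "$", word):  # condition is checked at the end
        return None
    if strip != "0":  # "0" means nothing is stripped
        if not word.endswith(strip):
            return None
        word = word[: len(word) - len(strip)]
    return word + ("" if affix == "0" else affix)

print(apply_suffix("play", "0", "ed", "[aeiou]y"))   # played
print(apply_suffix("try", "y", "ied", "[^aeiou]y"))  # tried (illustrative rule)
print(apply_suffix("try", "0", "ed", "[aeiou]y"))    # None
```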

With prefixes the rules are scanned in the primary words from left to right:
For example, with the same primary word "play", if I applied the prefix flag "A":
PFX A 0 re [^e]

1) first letter from left in "play" is different from "e"

So, it adds a prefix "re" to "play" producing "replay".
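The prefix case is the mirror image: anchor the condition at the start with `^` and read the word left to right, with no reversing. A Python sketch with a hypothetical `apply_prefix` helper:

```python
import re
from typing import Optional

def apply_prefix(word: str, strip: str, affix: str, cond: str) -> Optional[str]:
    """Hypothetical helper for one rule line: PFX <flag> <strip> <affix> <cond>.

    Returns the derived word, or None when the condition does not match.
    """
    if not re.search("^" + cond, word):  # condition is checked at the start
        return None
    if strip != "0":  # "0" means nothing is stripped
        if not word.startswith(strip):
            return None
        word = word[len(strip):]
    return ("" if affix == "0" else affix) + word

print(apply_prefix("play", "0", "re", "[^e]"))              # replay
print(apply_prefix("enter", "0", "re", "[^e]"))             # None
print(apply_prefix("ached", "0", "re", "[^abehilmopru]."))  # None
```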

Rules can have several [blah blah]letters[^blah blah]letters

I was doing it with my own code, but then I found out that PB supports regexp, so I replaced my complex code with it.

Only then did I notice that rules can also contain dots, which means my old complex code didn't produce 100% accurate results.

So I don't know how to scan primary words from left to right or from right to left with the regexp commands in PB. All I know is that adding a $ to the pattern seems to work for suffixes but not for prefixes.
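Putting both directions together, a Python sketch that reads rule lines in the `SFX/PFX <flag> <strip> <affix> <condition>` layout used in this thread and expands one primary word. `parse_rules` and `expand` are hypothetical names; the sketch assumes single-character flags and does not combine a prefix with a suffix (cross-product), so it is only a starting point.

```python
import re

AFF = """\
SFX D Y 1
SFX D 0 ed [aeiou]y
PFX A Y 1
PFX A 0 re [^e]
"""

def parse_rules(aff_text):
    """Collect (kind, flag, strip, affix, cond) from SFX/PFX rule lines.

    Header lines such as "SFX D Y 1" have only 4 fields and are skipped;
    any fields after the condition are ignored.
    """
    rules = []
    for line in aff_text.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[0] in ("SFX", "PFX"):
            rules.append(tuple(fields[:5]))
    return rules

def expand(word, flags, rules):
    """Apply every rule whose flag is in `flags` and whose condition matches."""
    out = []
    for kind, flag, strip, affix, cond in rules:
        if flag not in flags:
            continue
        strip = "" if strip == "0" else strip
        affix = "" if affix == "0" else affix
        if kind == "SFX" and re.search(cond + "$", word) and word.endswith(strip):
            out.append(word[: len(word) - len(strip)] + affix)  # suffix: check the end
        elif kind == "PFX" and re.search("^" + cond, word) and word.startswith(strip):
            out.append(affix + word[len(strip):])  # prefix: check the start
    return out

print(expand("play", "DA", parse_rules(AFF)))  # ['played', 'replay']
```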

Is this explanation descriptive?

Thank you, Marcus and guys.