IsRegExp command
Re: IsRegExp command
First he needs to know if the dot is supposed to be there at all. If it is supposed to be there and is a literal dot, then it needs to be escaped as \. in a regex (when it is outside a character class [...]). If it's not a literal dot but the metacharacter instead, then it must not be escaped.
Re: IsRegExp command
marcoagpinto wrote: I have some prefixes in the GB speller which have a dot at the end of the pattern.
[^aeio][aeiou]r\.?$
or
[^aeio][aeiou]r[.]{0,1}$
or
[^aeio][aeiou]r[.]?$
- marcoagpinto
- Addict
- Posts: 1039
- Joined: Sun Mar 10, 2013 3:01 pm
- Location: Portugal
Re: IsRegExp command
Mauro told me that the dot means that the word to which the pattern is applied must have at least two characters.
But today I got an e-mail from him saying that the dot can be anywhere in the pattern, not just at the end.
EDIT:
He replied again a minute ago saying that, after all, the dot matches any single character; it does not mean "at least two characters".
EDIT2:
[^aeio][aeiou]r\.?$ seems to have done the job! Thank you, guys, for all the help!
Re: IsRegExp command
So you need to include the dot in the regex pattern as is. But [^abehilmopru]. will also match if there is anything else before or after the matching part.
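A minimal sketch of the point above, assuming PB's MatchRegularExpression() reports a match found anywhere in the string unless the pattern is anchored. HasMatch() is a hypothetical helper written for this example, and "ached" is the word that comes up later in this thread.

```purebasic
; Hypothetical helper: returns 1 if pattern$ matches somewhere in text$, else 0
Procedure.i HasMatch(pattern$, text$)
  Protected re.i, result.i
  re = CreateRegularExpression(#PB_Any, pattern$)
  If re
    result = MatchRegularExpression(re, text$)
    FreeRegularExpression(re)
  EndIf
  ProcedureReturn Bool(result <> 0)
EndProcedure

; Unanchored: "ch" inside "ached" matches (c is not in the class, h is any char)
Debug HasMatch("[^abehilmopru].", "ached")   ; 1
; Anchored at the start: "a" is in the excluded set, so no match
Debug HasMatch("^[^abehilmopru].", "ached")  ; 0
; Anchored at the end: "e" of "ed" is in the excluded set, so no match
Debug HasMatch("[^abehilmopru].$", "ached")  ; 0
```

The anchor, not the class itself, decides which letters of the word are tested.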
- marcoagpinto
Re: IsRegExp command
Guys,
Not to sound annoying, but replacing the dots with:
\.?
doesn't work for this scenario:
Code:
SFX C Y 2
SFX C 0 one nagy
SFX C 0 two nag.
For the word:
Code:
nagy/C
it should produce:
Code:
nagyone
nagytwo
It seems that the dot isn't skipping the first character of "nagy" counted from the right (the "y") in:
Code:
SFX C 0 two nag.
The dot can appear anywhere in the pattern. How do I code it so that it works anywhere in the pattern?
I was using:
Code:
; Handle dot - Mauro Trevisan + PB Forum *START*
b$=ReplaceString(b$,".","\.?")
; Handle dot - Mauro Trevisan + PB Forum *END*
Thank you!
Re: IsRegExp command
[.] = a dot
. = any char
In PB, pass #PB_RegularExpression_DotAll to CreateRegularExpression() only if . must also match a newline character
\.? = 0 or 1 dot
.? = 0 or 1 char
Learn regex...
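The four forms above can be checked with a small sketch. HasMatch() is a hypothetical helper, the patterns are anchored at both ends so that only the forms themselves are compared, and "nag", "nag." and "nagy" are just test strings.

```purebasic
; Hypothetical helper: returns 1 if pattern$ matches text$, else 0
Procedure.i HasMatch(pattern$, text$)
  Protected re.i, result.i
  re = CreateRegularExpression(#PB_Any, pattern$)
  If re
    result = MatchRegularExpression(re, text$)
    FreeRegularExpression(re)
  EndIf
  ProcedureReturn Bool(result <> 0)
EndProcedure

Debug HasMatch("^nag[.]$", "nag.")  ; 1  [.] is a literal dot
Debug HasMatch("^nag[.]$", "nagy")  ; 0  ...so "y" is rejected
Debug HasMatch("^nag.$", "nagy")    ; 1  . is any character
Debug HasMatch("^nag\.?$", "nag")   ; 1  \.? allows no dot...
Debug HasMatch("^nag\.?$", "nag.")  ; 1  ...or one literal dot...
Debug HasMatch("^nag\.?$", "nagy")  ; 0  ...but not a "y"
Debug HasMatch("^nag.?$", "nagy")   ; 1  .? allows zero or one character
```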
Re: IsRegExp command
Just leave the dot alone!!
If you escape it as \. then it will search for a literal dot (a period) in the text and not for any character. The question mark makes it optional.
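A sketch of what the ReplaceString(b$,".","\.?") step from the earlier post does to the SFX C 0 two nag. rule, assuming (as in the SUFFIXES code) that the condition is anchored with $. SuffixMatches() is a hypothetical helper.

```purebasic
; Hypothetical helper: tests a suffix condition against the end of word$
Procedure.i SuffixMatches(condition$, word$)
  Protected re.i, result.i
  re = CreateRegularExpression(#PB_Any, condition$ + "$")
  If re
    result = MatchRegularExpression(re, word$)
    FreeRegularExpression(re)
  EndIf
  ProcedureReturn Bool(result <> 0)
EndProcedure

cond$ = "nag."
; Dot left alone: "nag.$" accepts any last character, so "nagy" matches
Debug SuffixMatches(cond$, "nagy")                             ; 1
; Dot replaced: "nag\.?$" needs "nag" or "nag." at the end, so "nagy" fails
Debug SuffixMatches(ReplaceString(cond$, ".", "\.?"), "nagy")  ; 0
```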

Re: IsRegExp command
At viewtopic.php?p=532147#p532147 the dot (if present) was only at the end:
[^aeio][aeiou]r\.?$
So \.?$ checks for either a dot at the end or no dot.
- marcoagpinto
Re: IsRegExp command
@Marc56us
I have sent an e-mail to some Hunspell-related friends asking for their opinion, but can we assume that the dots are a bug in the British .aff file and that I should remove all the dots I can find next to a "]"?
The dots can be anywhere in the rule, and a dot means that the character at that position is ignored (any character is accepted).
For SUFFIXES I used:
Code:
CreateRegularExpression(#RegularExpression,b$+"$",#PB_RegularExpression_DotAll)
match=MatchRegularExpression(#RegularExpression,a$)
FreeRegularExpression(#RegularExpression)
ProcedureReturn match
For PREFIXES I used (notice that I inverted a$ to search from left to right):
Code:
CreateRegularExpression(#RegularExpression,b$+"$",#PB_RegularExpression_DotAll)
match=MatchRegularExpression(#RegularExpression,ReverseString(a$))
FreeRegularExpression(#RegularExpression)
ProcedureReturn match
I have created a dictionary and an affix file with rules to test it:
Code:
nagy/C
nagy/D
Code:
SFX C Y 4
SFX C 0 one nagy
SFX C 0 two nag.
SFX C 0 three n.g.
SFX C 0 four [^a].g.
PFX D Y 4
PFX D 0 kar1 an
PFX D 0 kar2 y.an
PFX D 0 kar3 [yw].an
PFX D 0 kar4 [^y].an
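A sketch of how the test rules above could be checked without reversing the word, assuming the Hunspell convention that a SFX condition is tested against the end of the word and a PFX condition against its beginning (so a PFX condition [^e] means "the first letter is not e"). ConditionMatches() is a hypothetical helper; the last line reproduces the reversed-word approach of the PREFIXES snippet for comparison.

```purebasic
; Hypothetical helper: anchor with ^ for prefixes, $ for suffixes; never reverse
Procedure.i ConditionMatches(word$, condition$, isPrefix.i)
  Protected pattern$, re.i, result.i
  If isPrefix
    pattern$ = "^" + condition$
  Else
    pattern$ = condition$ + "$"
  EndIf
  re = CreateRegularExpression(#PB_Any, pattern$)
  If re
    result = MatchRegularExpression(re, word$)
    FreeRegularExpression(re)
  EndIf
  ProcedureReturn Bool(result <> 0)
EndProcedure

; SFX C conditions, tested against the end of "nagy"
Debug ConditionMatches("nagy", "nagy", #False)     ; 1
Debug ConditionMatches("nagy", "nag.", #False)     ; 1
Debug ConditionMatches("nagy", "n.g.", #False)     ; 1
Debug ConditionMatches("nagy", "[^a].g.", #False)  ; 1
; PFX D conditions, tested against the start of "nagy"
Debug ConditionMatches("nagy", "an", #True)        ; 0
Debug ConditionMatches("nagy", "y.an", #True)      ; 0
Debug ConditionMatches("nagy", "[yw].an", #True)   ; 0
Debug ConditionMatches("nagy", "[^y].an", #True)   ; 0
; Reversed-word approach: "ygan" ends with "an", so it reports a match
Debug ConditionMatches(ReverseString("nagy"), "an", #False)  ; 1
```

Under this reading none of the PFX D conditions hold for "nagy", while the reversed-word check accepts an, y.an and [yw].an because "ygan" ends with those letters.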
I must run some tests when I am less stressed... my holiday has ended and I will be back at my weekend supermarket job in a few days.
I will try to use Hunspeller, the tool developed by Mauro Trevisan, to compare the results of Proofing Tool GUI with those of his tool.
I will post here after I test it.
Thank you, my friends,
- marcoagpinto
Re: IsRegExp command
@Marc56us and guys,
This is the bug I was referring to:
Code:
#RegularExpression=1
a$="ached"
b$="[^abehilmopru]."
CreateRegularExpression(#RegularExpression,b$+"$",#PB_RegularExpression_DotAll)
match=MatchRegularExpression(#RegularExpression,ReverseString(a$))
FreeRegularExpression(#RegularExpression)
Debug match
How can it give #True?
a$="ached" is the primary word (the prefix is to be applied to it).
b$="[^abehilmopru]." is the regex pattern.
I reverse a$ so that the regex scans the primary word from right to left.
This gives a$="dehca".
So the first character, "a", is the "."
and the second one, "c", is not one of "abehilmopru".
Does this give a valid rule?
It is strange... what am I doing wrong?
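A sketch tracing the snippet above. With ReverseString(), the $-anchored pattern is tested against the last two letters of "dehca", i.e. "ca": "c" is checked against the class and "a" only against the dot, so the match succeeds. Testing the unreversed word with ^ checks "a" against the class instead. HasMatch() is a hypothetical helper.

```purebasic
; Hypothetical helper: returns 1 if pattern$ matches text$, else 0
Procedure.i HasMatch(pattern$, text$)
  Protected re.i, result.i
  re = CreateRegularExpression(#PB_Any, pattern$)
  If re
    result = MatchRegularExpression(re, text$)
    FreeRegularExpression(re)
  EndIf
  ProcedureReturn Bool(result <> 0)
EndProcedure

a$ = "ached"
b$ = "[^abehilmopru]."
; Reversed word, $ anchor: tests "ca" (c against the class, a against the dot)
Debug HasMatch(b$ + "$", ReverseString(a$))  ; 1
; Unreversed word, ^ anchor: tests "ac" (a against the class, which excludes it)
Debug HasMatch("^" + b$, a$)                 ; 0
```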
Re: IsRegExp command
[^abehilmopru].
Means: 2 chars anywhere in a string!
- One that is not one of 'abehilmopru'
- One that is anything
Always use anchors (\b, ^, $, \A etc.) because otherwise the regex eats everything on the table, and even several times.

It is often difficult to express what you want with sentences.

Instead, give us some examples:
1. Example of a string that must be found
2. Example of a string that should be excluded.

- marcoagpinto
Re: IsRegExp command
@Marc56us
Ahhhh... I used a $ in the pattern: "[^abehilmopru].$"
But now I no longer know whether prefixes should be reversed.
After 6 years working on Proofing Tool GUI I am no longer sure if I have been decoding prefixes with 100% accuracy... it is sad and I am very disturbed.
I have sent an e-mail to Mauro asking him for advice since he created Hunspeller ( https://github.com/mtrevisan/Hunspeller ).
I will wait for his reply.
Thank you, my friends!
- marcoagpinto
Re: IsRegExp command
@Marcus
Mauro replied this:
From the description you gave I see an error: you inverted the word so that
you can do the match from right to left, but you forgot to invert the
condition as well!
a$="dehca"
b$=".[^abehilmopru]"
So, the first character "a" is the "[^abehilmopru]"
and the second one is "c" which is "any letter".
I tried to ReverseString(b$) and it gives an error in:
Code:
match=MatchRegularExpression(#RegularExpression,a$)
How can I solve it?
Thanks!
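A sketch of one way to reverse the condition without breaking it, assuming conditions contain only single characters, dots and [...] classes with no "]" inside a class. ReverseString(b$) reverses characters, so "[^abehilmopru]." becomes ".]urpomlihteba^[", which ends with an unterminated "[" class; CreateRegularExpression() then fails, which is most likely why the error shows up at the MatchRegularExpression() line. ReverseCondition() is a hypothetical helper that reverses whole tokens instead.

```purebasic
; Hypothetical helper: reverses the ORDER of tokens, keeping each [...] class intact
Procedure.s ReverseCondition(condition$)
  Protected i.i, j.i, token$, result$
  i = 1
  While i <= Len(condition$)
    If Mid(condition$, i, 1) = "["
      ; A bracket class is one token: copy up to the closing "]"
      j = FindString(condition$, "]", i + 1)
      If j = 0 : j = Len(condition$) : EndIf
      token$ = Mid(condition$, i, j - i + 1)
      i = j + 1
    Else
      ; Any other character (letter or dot) is a token on its own
      token$ = Mid(condition$, i, 1)
      i = i + 1
    EndIf
    result$ = token$ + result$  ; prepending reverses the token order
  Wend
  ProcedureReturn result$
EndProcedure

Debug ReverseCondition("[^abehilmopru].")  ; .[^abehilmopru]
Debug ReverseCondition("[yw].an")          ; na.[yw]
```

ReverseCondition(b$) + "$" tested against ReverseString(a$) should give the same answer as "^" + b$ tested against a$, so anchoring the unreversed word with ^ avoids both reversals.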
Re: IsRegExp command
I don't understand why you're reversing the character string? It won't go any faster.
It is useful to understand how a regular expression analysis engine works (it's quite simple in fact)
Look at Jan Goyvaerts' reference help
https://www.regular-expressions.info/engine.html
Please, post some lines:
1. Example of a string that matches
2. Example of a string that should not match.

- marcoagpinto
Re: IsRegExp command
Hello!
While decoding suffixes and prefixes, the rules are scanned from right to left for suffixes and from left to right for prefixes:
primary word="play"
When the suffix flag "D" is applied to it, it scans every D rule in the affix (.aff) file, and this one matches:
SFX D 0 ed [aeiou]y
The word is "play" and the pattern is "[aeiou]y";
the pattern matches the primary word "play":
1) the first letter from the right is "y"
2) the second letter from the right is one of "aeiou"
So, it adds a suffix "ed" to "play", producing "played".
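A sketch of the suffix step described above, assuming the usual .aff field meanings: strip characters ("0" = none), affix ("0" = none) and condition, with the condition tested against the end of the word before anything is stripped. ApplySuffix() is hypothetical and returns an empty string when the rule does not apply; the second call uses a rule with a non-zero strip field, SFX D y ied [^aeiou]y, purely as an illustration.

```purebasic
; Hypothetical helper: applies one SFX rule to word$, or returns "" if it does not apply
Procedure.s ApplySuffix(word$, strip$, affix$, condition$)
  Protected re.i, ok.i
  ; The condition is tested against the END of the unmodified word
  re = CreateRegularExpression(#PB_Any, condition$ + "$")
  If re = 0 : ProcedureReturn "" : EndIf
  ok = MatchRegularExpression(re, word$)
  FreeRegularExpression(re)
  If ok = 0 : ProcedureReturn "" : EndIf
  ; Remove the strip characters from the end ("0" means strip nothing)
  If strip$ <> "0"
    If Right(word$, Len(strip$)) <> strip$ : ProcedureReturn "" : EndIf
    word$ = Left(word$, Len(word$) - Len(strip$))
  EndIf
  If affix$ = "0" : affix$ = "" : EndIf
  ProcedureReturn word$ + affix$
EndProcedure

Debug ApplySuffix("play", "0", "ed", "[aeiou]y")   ; played
Debug ApplySuffix("try", "y", "ied", "[^aeiou]y")  ; tried
Debug ApplySuffix("try", "0", "ed", "[aeiou]y")    ; (empty: "r" is not a vowel)
```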
With prefixes the rules are scanned in the primary words from left to right:
For example, if I applied the prefix flag "A" to the same primary word "play":
PFX A 0 re [^e]
1) the first letter from the left in "play" is not "e"
So, it adds a prefix "re" to "play" producing "replay".
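The prefix step can be sketched the same way: the condition is anchored with ^ and tested against the start of the word as it is, with no reversing. ApplyPrefix() is hypothetical and returns an empty string when the rule does not apply; "edit" is just a test word.

```purebasic
; Hypothetical helper: applies one PFX rule to word$, or returns "" if it does not apply
Procedure.s ApplyPrefix(word$, strip$, affix$, condition$)
  Protected re.i, ok.i
  ; The condition is tested against the START of the word, left to right
  re = CreateRegularExpression(#PB_Any, "^" + condition$)
  If re = 0 : ProcedureReturn "" : EndIf
  ok = MatchRegularExpression(re, word$)
  FreeRegularExpression(re)
  If ok = 0 : ProcedureReturn "" : EndIf
  ; Remove the strip characters from the start ("0" means strip nothing)
  If strip$ <> "0"
    If Left(word$, Len(strip$)) <> strip$ : ProcedureReturn "" : EndIf
    word$ = Right(word$, Len(word$) - Len(strip$))
  EndIf
  If affix$ = "0" : affix$ = "" : EndIf
  ProcedureReturn affix$ + word$
EndProcedure

Debug ApplyPrefix("play", "0", "re", "[^e]")  ; replay
Debug ApplyPrefix("edit", "0", "re", "[^e]")  ; (empty: the word starts with "e")
```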
Rules can combine several classes and letters, like [blah blah]letters[^blah blah]letters.
I was doing it with my own code, but then I found out that PB supports regexps, so I replaced my complex code with them.
Only then did I notice that rules can also contain dots, which means my old complex code didn't produce 100% accurate results.
Now I don't know how to scan primary words from left to right or from right to left using the regexp commands in PB. All I know is that I append a $ to the pattern, and it seems to work for suffixes but fails for prefixes.
Is this explanation descriptive?
Thank you, Marcus and guys.
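A sketch answering the question above, assuming rule lines whose fields are separated by single spaces (real .aff files may use several spaces or tabs): the scan direction becomes the anchor, ^ for PFX and $ for SFX, so neither the word nor the condition needs to be reversed. RuleConditionMatches() is hypothetical and only checks the condition field.

```purebasic
; Hypothetical helper: checks the condition (5th field) of an .aff rule line against word$
Procedure.i RuleConditionMatches(rule$, word$)
  Protected type$, condition$, pattern$, re.i, result.i
  type$ = StringField(rule$, 1, " ")       ; "SFX" or "PFX"
  condition$ = StringField(rule$, 5, " ")  ; e.g. "[aeiou]y" or "[^e]"
  If type$ = "PFX"
    pattern$ = "^" + condition$  ; prefixes: left to right from the start
  Else
    pattern$ = condition$ + "$"  ; suffixes: right to left from the end
  EndIf
  re = CreateRegularExpression(#PB_Any, pattern$)
  If re
    result = MatchRegularExpression(re, word$)
    FreeRegularExpression(re)
  EndIf
  ProcedureReturn Bool(result <> 0)
EndProcedure

Debug RuleConditionMatches("SFX D 0 ed [aeiou]y", "play")  ; 1
Debug RuleConditionMatches("PFX A 0 re [^e]", "play")      ; 1
Debug RuleConditionMatches("PFX A 0 re [^e]", "edit")      ; 0
```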