Stemming (?)
Posted: Thu Feb 29, 2024 4:42 pm
Has anyone implemented this algorithm?
This is easy to do with regular expressions, but I would like to do it without them.
Here is the algorithm for the Russian language.
Finding vowels is easy: [аеиоуыэюя]
Converting a regular expression like ([ая])(в|вши|вшись)$ is more difficult. I need to move the pointer to the end of the string, check from the end whether one of the endings matches, and then check whether the letter just before that ending is one of the two letters а or я.
Here's for English: link1
I adapted the algorithm for AkelPad, where I use it to build an auto-completion list in a help file without resorting to online resources. Now I would like to use it in TextCorrection for replacing abbreviations and calques.
Code: Select all
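Define length, pos, start$, rv$, OrigText$
Define *c.Character, *g.Character
Define *g0, *c0
Define Text$ = "берётся"
Define RVRE$ = "аеиоуыэюя"
; Define PERFECTIVEGROUND_1 = ""

; Treat "ё" as "е", keeping a copy of the original word
If FindString(Text$, "ё", 1, #PB_String_NoCase)
  OrigText$ = Text$
  ReplaceString(Text$, "ё", "е", #PB_String_NoCase | #PB_String_InPlace)
EndIf

*c = @Text$
*c0 = *c
length = Len(Text$)
; *c + (length - 1) * SizeOf(Character)
*g = @RVRE$
*g0 = *g

; Walk through the word and stop at the first character that is a vowel
While *c\c
  *g = *g0
  While *g\c
    If *c\c = *g\c
      ; *c - *c0 is a byte offset, so divide by the character size
      ; to get the 1-based character position of the vowel
      pos = (*c - *c0) / SizeOf(Character) + 1
      ; Debug Chr(*c\c) + Chr(*g\c)
      ; Debug pos
      ; Assumption: RV begins right after the first vowel
      start$ = Left(Text$, pos)
      rv$ = Mid(Text$, pos + 1)
      Break 2
    EndIf
    *g + SizeOf(Character)
  Wend
  *c + SizeOf(Character)
Wend

Debug start$
Debug rv$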
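
For the ([ая])(в|вши|вшись)$ part, here is a rough sketch of how the check from the end of the string might look without regular expressions. The procedure names EndsWith and MatchGerundEnding are made up for this example and are not part of any library. The sketch assumes a Unicode build and only reports the length of the matched ending, leaving what to do with it to the caller.

```purebasic
; Hypothetical helper (not from the original post): #True if the text ends with suffix$.
Procedure.i EndsWith(*text, textLen.i, suffix$)
  Protected sufLen.i = Len(suffix$)
  If sufLen = 0 Or sufLen > textLen
    ProcedureReturn #False
  EndIf
  ; Compare the last sufLen characters of the text with the suffix in memory.
  ProcedureReturn Bool(CompareMemory(*text + (textLen - sufLen) * SizeOf(Character), @suffix$, sufLen * SizeOf(Character)))
EndProcedure

; Hypothetical sketch of ([ая])(в|вши|вшись)$:
; returns the length of the matched ending, or 0 if nothing matches.
Procedure.i MatchGerundEnding(Text$)
  Protected textLen.i = Len(Text$)
  Protected *prev.Character
  Protected i.i, sufLen.i
  Protected Dim ending.s(2)
  ; The three endings from the regular expression.
  ending(0) = "вшись"
  ending(1) = "вши"
  ending(2) = "в"
  For i = 0 To 2
    sufLen = Len(ending(i))
    ; textLen > sufLen guarantees there is a letter in front of the ending.
    If textLen > sufLen And EndsWith(@Text$, textLen, ending(i))
      ; Point at the character immediately before the ending.
      *prev = @Text$ + (textLen - sufLen - 1) * SizeOf(Character)
      If *prev\c = Asc("а") Or *prev\c = Asc("я")
        ProcedureReturn sufLen
      EndIf
    EndIf
  Next
  ProcedureReturn 0
EndProcedure

Debug MatchGerundEnding("прочитавши")
```

If the result is non-zero, something like Left(Text$, Len(Text$) - result) would strip the ending while keeping the а or я in front of it. Whether this should run on the whole word or only on rv$ depends on how the rest of the stemmer is put together.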