RegEx engine module that builds NFA/DFA and uses it for matching

Applications, Games, Tools, User libs and useful stuff coded in PureBasic
User avatar
Sicro
Enthusiast
Enthusiast
Posts: 538
Joined: Wed Jun 25, 2014 5:25 pm
Location: Germany
Contact:

RegEx engine module that builds NFA/DFA and uses it for matching

Post by Sicro »

RegEx engine module for PureBasic
  • GitHub project page: Click
  • Download: Click (see assets)
I've been working on a "regex engine" module for PureBasic for a while. The engine compiles a regular expression into an NFA (Nondeterministic finite automaton), optionally also into a very fast DFA (Deterministic finite automaton), and can run the NFA/DFA against a string.

When matching, the engine always chooses the longest match among several possible matches. No backtracking is required during this process, since all alternatives are checked simultaneously. So for a regular expression "0123x|0123y" and the string "0123y", after the failure at "x", the engine does not start again from the beginning at "0", but is behind "3" and the engine only has to read "y".

It is also possible to pass multiple regular expressions to the engine and set unique ID numbers for them to be able to determine which regular expression matched in case of a match. This functionality makes it easy to create lexers. This is also the real motivation that led to this project. However, the engine is kept flexible in its usage and can easily be used for other purposes as well, which is why I don't consider it exclusively a lexer engine and therefore didn't name it that way.

You can find some code examples, the listing of the supported syntax for the regular expressions, further information and the module itself on GitHub.

Currently, the project is in beta phase, so I would be very grateful for feedback from you.

Features that are currently planned for upcoming versions: Click
These features will not be included in the current beta phase of version 1.0.0, but only in the next minor versions, e.g. 1.1.0, 1.2.0 etc.

Note that this project does not intend to provide a RegEx engine to replace more feature-rich engines like the PCRE natively integrated in PureBasic. So there will be no support for capturing groups, backreferences, word boundaries and anchors.
Last edited by Sicro on Sun Jan 01, 2023 7:47 pm, edited 2 times in total.
Image
Why OpenSource should have a license :: PB-CodeArchiv-Rebirth :: Pleasant-Dark (syntax color scheme) :: RegEx-Engine (compiles RegExes to NFA/DFA)
Manjaro Xfce x64 (Main system) :: Windows 10 Home (VirtualBox) :: Newest PureBasic version
User avatar
Sicro
Enthusiast
Enthusiast
Posts: 538
Joined: Wed Jun 25, 2014 5:25 pm
Location: Germany
Contact:

Re: RegEx engine module that builds NFA/DFA and uses it for matching

Post by Sicro »

1.0.0-beta.2 released

Download: Click

Improved
  • Unicode case unfolding table rewritten, resulting in 44% smaller executables.
  • Character classes now produce much more compact NFAs, resulting in faster NFA matching, faster DFA creation, and less memory consumption.
  • The more compact NFAs also benefit the DFAs, as they are now much smaller as well.
Fixed
  • Escaping opening square brackets within character classes was not possible.
Last edited by Sicro on Sat Dec 31, 2022 3:37 pm, edited 3 times in total.
Image
Why OpenSource should have a license :: PB-CodeArchiv-Rebirth :: Pleasant-Dark (syntax color scheme) :: RegEx-Engine (compiles RegExes to NFA/DFA)
Manjaro Xfce x64 (Main system) :: Windows 10 Home (VirtualBox) :: Newest PureBasic version
User avatar
Kwai chang caine
Always Here
Always Here
Posts: 5342
Joined: Sun Nov 05, 2006 11:42 pm
Location: Lyon - France

Re: RegEx engine module that builds NFA/DFA and uses it for matching

Post by Kwai chang caine »

Hello SICRO

First thanks for sharing this nice code 8)
Unfortunately, like surely numerous persons i don't understand one word of the REGEX :oops:
But the worst it's that interesting me when even :mrgreen:

I profit from your splendid works to ask to you a question, if you allow me :wink:
Your work can he translate REGEX in something understandable by the first fool to come ? (like me for exampler :mrgreen: )
And if not, know you a very very simple freeware, for babies, who can create a REGEX without learn this complex rules ? :oops:

Congratulations again for your work 8)
ImageThe happiness is a road...
Not a destination
User avatar
Sicro
Enthusiast
Enthusiast
Posts: 538
Joined: Wed Jun 25, 2014 5:25 pm
Location: Germany
Contact:

Re: RegEx engine module that builds NFA/DFA and uses it for matching

Post by Sicro »

Kwai chang caine wrote: Thu Oct 27, 2022 4:36 pm Your work can he translate REGEX in something understandable by the first fool to come ? (like me for exampler :mrgreen: )
If you have trouble imagining how a RegEx works, you can use a RegEx visualizer or a tool that explains the RegEx.
Kwai chang caine wrote: Thu Oct 27, 2022 4:36 pm And if not, know you a very very simple freeware, for babies, who can create a REGEX without learn this complex rules ? :oops:
You don't need a tool for that. If you experiment a bit with RegEx, you will quickly understand how it works. Start simple, like "a|b" and experiment further.

If you want to discuss further about RegEx in general, better create a new thread in this forum.
Kwai chang caine wrote: Thu Oct 27, 2022 4:36 pmCongratulations again for your work 8)
Thanks!
Image
Why OpenSource should have a license :: PB-CodeArchiv-Rebirth :: Pleasant-Dark (syntax color scheme) :: RegEx-Engine (compiles RegExes to NFA/DFA)
Manjaro Xfce x64 (Main system) :: Windows 10 Home (VirtualBox) :: Newest PureBasic version
User avatar
Kwai chang caine
Always Here
Always Here
Posts: 5342
Joined: Sun Nov 05, 2006 11:42 pm
Location: Lyon - France

Re: RegEx engine module that builds NFA/DFA and uses it for matching

Post by Kwai chang caine »

Thanks a lot for your answer 8)
ImageThe happiness is a road...
Not a destination
User avatar
Sicro
Enthusiast
Enthusiast
Posts: 538
Joined: Wed Jun 25, 2014 5:25 pm
Location: Germany
Contact:

Re: RegEx engine module that builds NFA/DFA and uses it for matching

Post by Sicro »

1.0.0-beta.2 has been re-released.

The SimpleCaseUnfolding table was a bit buggy. I didn't want to release a new beta version number for that.

Beta 3 is already in work and will probably be the last beta version.
Image
Why OpenSource should have a license :: PB-CodeArchiv-Rebirth :: Pleasant-Dark (syntax color scheme) :: RegEx-Engine (compiles RegExes to NFA/DFA)
Manjaro Xfce x64 (Main system) :: Windows 10 Home (VirtualBox) :: Newest PureBasic version
User avatar
Sicro
Enthusiast
Enthusiast
Posts: 538
Joined: Wed Jun 25, 2014 5:25 pm
Location: Germany
Contact:

Re: RegEx engine module that builds NFA/DFA and uses it for matching

Post by Sicro »

Finale 1.0.0 released

After some testing and improvements, I have now decided to release the final version 1.0.0

Download: Click

New
  • Added ASCII mode, which restricts the predefined character classes to ASCII characters only (encoding is still UCS-2) and restricts case-insensitive mode to uppercase and lowercase letters only, instead of applying case-folding.
  • Developer tools added.
Improved
  • More error handling added.
Fixed
  • Processing of RegExes like (a*)* or (a*)+ triggered an infinite loop.
  • Parsing of RegEx modes was fixed.
  • Memory leaks have been fixed.
Image
Why OpenSource should have a license :: PB-CodeArchiv-Rebirth :: Pleasant-Dark (syntax color scheme) :: RegEx-Engine (compiles RegExes to NFA/DFA)
Manjaro Xfce x64 (Main system) :: Windows 10 Home (VirtualBox) :: Newest PureBasic version
Post Reply