
Position number for ExamineRegularExpression()

Posted: Sat Dec 30, 2023 11:43 pm
by AZJIO

Code:

ExamineRegularExpression(#RegularExpression, String$[, Position])
; Or
SetRegularExpressionPosition(#RegularExpression, String$, Position) ; for ExamineRegularExpression(), ReplaceRegularExpression(), etc
Currently the search for a match always starts from the beginning of the string. The NextRegularExpressionMatch() function remembers the position from which to continue the search. But what if I want to start the search not from the beginning, but from a position found earlier by another function? What if I want to cache a position to make future searches faster?

Re: Position number for ExamineRegularExpression()

Posted: Sun Dec 31, 2023 1:13 am
by highend
There is

Code:

RegularExpressionMatchPosition()
Why not just store these positions during the loop if you want to access them later?

Re: Position number for ExamineRegularExpression()

Posted: Sun Dec 31, 2023 1:54 am
by AZJIO
highend wrote: Sun Dec 31, 2023 1:13 am There is

Code:

RegularExpressionMatchPosition()
Why not just store these positions during the loop if you want to access them later?
I need to use the position, not just know it: I need it to speed up the search. Say I have 1000 files of 1 MB each and a complex regular expression with lookahead and lookbehind. If I need to find some text at the end of a file, I would not have to scan 1000 MB of data, but only 100 MB, or just 1 MB if the lookaround reaches no more than one element back. That is the difference between waiting 1 second and waiting 1000 seconds (about 16.7 minutes); I hope that matters to you.

Re: Position number for ExamineRegularExpression()

Posted: Sun Dec 31, 2023 8:55 am
by highend
AFAIK that's not possible. That's internal data of the regex engine; you can't start or resume a regex search from a different position.

What you can do: once you have the position of the first match, split the string at that position (minus 1) and run the search on the second part of the split.

Re: Position number for ExamineRegularExpression()

Posted: Sun Dec 31, 2023 9:15 am
by Marc56us
AZJIO wrote: Sat Dec 30, 2023 11:43 pm

Code:

ExamineRegularExpression(#RegularExpression, String$[, Position])
Good suggestion.
In the meantime, some ideas (not tested):

1. Isolate the remaining part of the string with a pointer. The search will then be performed on this new string.
- or -
2. If the starting point is known for all files (e.g. search only the end), create the equivalent of the Unix command 'tail', e.g. with FileSeek(). In my opinion this is the best solution, as you don't load the whole file into RAM.
- or -
3. If you want to start the search from a certain point, make the regex ignore the beginning (remember to use the correct encoding), e.g. skip the first 10000 characters:

Code:

(?s)^.{10000}.*?(<regex>)
etc

Re: Position number for ExamineRegularExpression()

Posted: Sun Dec 31, 2023 12:58 pm
by AZJIO
highend wrote: Sun Dec 31, 2023 8:55 am AFAIK that's not possible. That's internal data of the regex engine; you can't start or resume a regex search from a different position.

What you can do: once you have the position of the first match, split the string at that position (minus 1) and run the search on the second part of the split.
Look at the "offset" parameter of AutoIt's StringRegExp function.
In C, a string is just a pointer to a contiguous block of data. You add the offset to the pointer and get a pointer to the remaining part of the string. All that is needed is to pass this new pointer to the regular expression engine. That's it.
Marc56us wrote: Sun Dec 31, 2023 9:15 am In the meantime, some ideas (not tested)
I know these methods.