It is currently Fri Mar 05, 2021 1:28 pm

All times are UTC + 1 hour




[ 7 posts ]
 Post subject: ReplaceRegularExpression() fail with "^" pattern
PostPosted: Thu Oct 22, 2020 9:22 am 
User

Joined: Sat Oct 18, 2014 8:37 am
Posts: 22
I tried to indent some text using a regular expression, but it fails in three ways.

The code:
Code:
If CreateRegularExpression(0, "^",#PB_RegularExpression_MultiLine)
Else
   Debug RegularExpressionError()
   End
EndIf

Text$ = "This multiline"+#LF$+
        "paragraph should"+#LF$+
        #LF$+
        "be indented"+#LF$+
        #LF$+
        #LF$+
        "with the [INDENT] string."+#LF$

Debug ReplaceRegularExpression(0, Text$, "[INDENT]")

Current output:
Quote:
[INDENT]his multiline
[INDENT][INDENT]aragraph should
[INDENT][INDENT][INDENT]e indented
[INDENT][INDENT][INDENT][INDENT]ith the [INDENT] string.



instead of
Quote:
[INDENT]This multiline
[INDENT]paragraph should
[INDENT]
[INDENT]be indented
[INDENT]
[INDENT]
[INDENT]with the [INDENT] string.

which is what regex101.com produces.

Bug 1:
The "^" pattern matches only a zero-width position at the start of the string (and, with #PB_RegularExpression_MultiLine, at the start of each line), but ReplaceRegularExpression() also replaces the next character, as if the pattern were "^.".

Bug 2:
ReplaceRegularExpression() also collapses consecutive line breaks: the empty lines disappear.

Bug 3:
After each line break, the replacement is inserted one more time than on the previous line (once on the first line, four times on the last).

Note 1:
No replacement is made after the last line break, and this is normal: in multiline mode, PCRE's "^" does not match after a newline that ends the string.

More information about the circumflex in the PCRE documentation: http://www.pcre.org/original/doc/html/p ... .html#SEC6
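To compare PureBasic's results with the PCRE rule linked above, a small sketch could compute where "^" is documented to match in multiline mode: at the start of the text and right after every internal #LF$, but not after a #LF$ that ends the text. ExpectedCaretPositions() is a hypothetical helper, not a PureBasic command, and it assumes #LF$ line breaks only.

```purebasic
; Hypothetical helper, not a PureBasic command: returns the 1-based positions
; where PCRE documents "^" to match in multiline mode, assuming #LF$ line breaks.
Procedure.s ExpectedCaretPositions(Text$)
  Protected Result$ = "1"   ; "^" always matches at the start of the text
  Protected i
  ; Stop before the last character: "^" does not match after a final #LF$
  For i = 1 To Len(Text$) - 1
    If Mid(Text$, i, 1) = #LF$
      Result$ = Result$ + "," + Str(i + 1)   ; the match sits right after the #LF$
    EndIf
  Next
  ProcedureReturn Result$
EndProcedure

Text$ = "This multiline"+#LF$+
        "paragraph should"+#LF$+
        #LF$+
        "be indented"+#LF$+
        #LF$+
        #LF$+
        "with the [INDENT] string."+#LF$

Debug ExpectedCaretPositions(Text$)
```

For the text above this should list seven positions, one per [INDENT] in the expected output, which gives something concrete to hold the match positions reported by ExamineRegularExpression() against.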

Tested with PB 5.72 x64 on Windows 8.1 (new code, not tested for regressions).

_________________
Please correct me if my English is bad.


 Post subject: Re: ReplaceRegularExpression() fail with "^" pattern
PostPosted: Thu Oct 22, 2020 9:32 am 
Enthusiast

Joined: Sun Jun 22, 2003 7:43 pm
Posts: 723
Location: Germany, Saarbrücken
The same issue occurs on Linux.

_________________
The English grammar is freeware, you can use it freely. But it's not Open Source, i.e. you cannot change it or publish it in an altered way.


 Post subject: Re: ReplaceRegularExpression() fail with "^" pattern
PostPosted: Thu Oct 22, 2020 1:24 pm 
Addict

Joined: Thu Jun 07, 2007 3:25 pm
Posts: 3972
Location: Berlin, Germany
Naheulf wrote:
I tried to indent some text using a regular expression, but it fails in three ways.

I think there is only one bug in this context:
Code:
CreateRegularExpression(0, "^",#PB_RegularExpression_MultiLine)
should return 0 (to indicate an error) because a circumflex alone IMHO cannot be considered a valid regular expression.

_________________
Please excuse my flawed English. My native language is PureBasic.
Search
RSBasic's backups


 Post subject: Re: ReplaceRegularExpression() fail with "^" pattern
PostPosted: Thu Oct 22, 2020 1:57 pm 
Addict

Joined: Sat Feb 08, 2014 3:26 pm
Posts: 989
Quote:
should return 0 (to indicate an error) because a circumflex alone IMHO cannot be considered a valid regular expression.
No, it's quite valid :wink:
This works well in text editors that support regex (e.g. Notepad++).

What is also strange is that under PB, the first letter of each line is deleted as well (see the example, or try it).

#LF$ alone already causes a problem: a different (less wrong) result is obtained with #CRLF$.
#PB_RegularExpression_AnyNewLine does not change anything.

Edit: This almost works (^.):
Code:
If CreateRegularExpression(0, "^.",#PB_RegularExpression_MultiLine)
Else
   Debug RegularExpressionError()
   End
EndIf

Text$ = "This multiline"+#LF$+
        "paragraph should"+#LF$+
        #LF$+
        "be indented"+#LF$+
        #LF$+
        #LF$+
        "with the [INDENT] string."+#LF$

Debug ReplaceRegularExpression(0, Text$, "[INDENT]")

But the first letter of each line is dropped (the "." consumes it), and [INDENT] is not written on the blank lines (there is no character there for "." to match). However, the number of lines is right:
Code:
[INDENT]his multiline
[INDENT]aragraph should

[INDENT]e indented


[INDENT]ith the [INDENT] string.
Not much is missing anymore... 8)

Not sure it's a bug after all? Maybe it's just how PB implements it?


 Post subject: Re: ReplaceRegularExpression() fail with "^" pattern
PostPosted: Thu Oct 22, 2020 4:33 pm 
Enthusiast

Joined: Sun Jun 22, 2003 7:43 pm
Posts: 723
Location: Germany, Saarbrücken
At least you can fake it like this as a workaround.
Code:
If CreateRegularExpression(0, #LF$, #PB_RegularExpression_MultiLine)
Else
   Debug RegularExpressionError()
   End
EndIf

Text$ = "This multiline"+#LF$+
        "paragraph should"+#LF$+
        #LF$+
        "be indented"+#LF$+
        #LF$+
        #LF$+
        "with the [INDENT] string."+#LF$

Debug "[INDENT]" + ReplaceRegularExpression(0, Text$, #LF$ + "[INDENT]")

But for this case, it would be even simpler to just use ReplaceString(): :wink:
Code:
Text$ = "This multiline"+#LF$+
        "paragraph should"+#LF$+
        #LF$+
        "be indented"+#LF$+
        #LF$+
        #LF$+
        "with the [INDENT] string."+#LF$

Debug "[INDENT]" + ReplaceString(Text$, #LF$, #LF$ + "[INDENT]")

But in the end it is still a bug.
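Both workarounds above also add a prefix after the trailing #LF$, whereas the first post notes that no replacement should follow the last line break. A minimal sketch that wraps the ReplaceString() idea and trims that trailing prefix, assuming #LF$ line breaks; IndentLines() is a hypothetical name, and this sidesteps the regex library rather than fixing ReplaceRegularExpression().

```purebasic
; Hypothetical helper, not a PureBasic command: prefixes every line the way
; "^" with #PB_RegularExpression_MultiLine is expected to, assuming #LF$ line breaks.
Procedure.s IndentLines(Text$, Prefix$)
  ; Prefix the first line, then every line that follows a #LF$
  Protected Result$ = Prefix$ + ReplaceString(Text$, #LF$, #LF$ + Prefix$)
  ; PCRE's "^" does not match after a #LF$ that ends the text,
  ; so drop the prefix that ReplaceString() added there
  If Right(Text$, 1) = #LF$
    Result$ = Left(Result$, Len(Result$) - Len(Prefix$))
  EndIf
  ProcedureReturn Result$
EndProcedure

Text$ = "This multiline"+#LF$+
        "paragraph should"+#LF$+
        #LF$+
        "be indented"+#LF$+
        #LF$+
        #LF$+
        "with the [INDENT] string."+#LF$

Debug IndentLines(Text$, "[INDENT]")
```

Empty lines still receive the prefix, as in the expected output of the first post. For #CRLF$ text, one option would be to convert the line breaks first, e.g. with ReplaceString(Text$, #CRLF$, #LF$).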

_________________
The English grammar is freeware, you can use it freely. But it's not Open Source, i.e. you cannot change it or publish it in an altered way.


 Post subject: Re: ReplaceRegularExpression() fail with "^" pattern
PostPosted: Thu Oct 22, 2020 5:05 pm 
User

Joined: Sat Oct 18, 2014 8:37 am
Posts: 22
Using #LF$ for the line breaks was only the last test I made.
I also tested:
- with #CR$ and #CRLF$ as well
- with and without #PB_RegularExpression_AnyNewLine
- with and without "(*CR)", "(*LF)", "(*CRLF)" or "(*ANYCRLF)" at the start of the pattern (see the PCRE documentation)

Another strange thing happens with ExamineRegularExpression() and NextRegularExpressionMatch(): the same match (position and length) is returned several times.

_________________
Please correct me if my English is bad.


 Post subject: Re: ReplaceRegularExpression() fail with "^" pattern
PostPosted: Thu Oct 22, 2020 9:11 pm 
Addict

Joined: Thu Jun 07, 2007 3:25 pm
Posts: 3972
Location: Berlin, Germany
Marc56us wrote:
Quote:
should return 0 (to indicate an error) because a circumflex alone IMHO cannot be considered a valid regular expression.
No, it's quite valid :wink:
This works well in all text editors that support regex (ie: Notepad++).

Yes, you are right. It's tricky, but I see now.
A circumflex (caret) alone matches <nothing> at the start of each line, i.e. the position directly before the first character of each line.

_________________
Please excuse my flawed English. My native language is PureBasic.
Search
RSBasic's backups

