ISSUES: RegEx options

Post by Tenaja »

The new Issues feature uses Regex to find Issues. The default setting is the case-sensitive whole word TODO anywhere in the comment.
Default value:
\bTODO\b.*
To make it case-insensitive:
\b(?i)TODO\b.*
The \b is a word boundary (so only the whole word matches), and the (?i) makes the match case-insensitive.

To force the TODO to the start of the line, without case sensitivity:
\A(\s)*(?i)TODO\b.*
The \A indicates the start of the string, and the (\s)* means 0 or more whitespace characters. This combination ignores the keyword TODO when it is not the first word of the comment.
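For anyone who wants to try these patterns outside the IDE, here is a minimal Python sketch. Treat it as an approximation only: Python's re module is not the PCRE engine PureBasic uses, and recent Python versions reject an inline (?i) in the middle of a pattern, so it is written here as the scoped group (?i:TODO). The names DEFAULT, CASELESS, LINE_START and find_issue are invented for this example, and it assumes the pattern is applied to the comment text that follows the ';'.

```python
import re

# The three patterns from this post, adapted for Python's re module.
# Assumption: (?i:TODO) stands in for the inline (?i)TODO used in the IDE.
DEFAULT = re.compile(r"\bTODO\b.*")
CASELESS = re.compile(r"\b(?i:TODO)\b.*")
LINE_START = re.compile(r"\A(\s)*(?i:TODO)\b.*")


def find_issue(pattern, comment):
    """Return the matched issue text, or None if the comment is not an issue."""
    match = pattern.search(comment)
    return match.group(0) if match else None


# Hypothetical comment texts (the part after the ';').
for comment in (" TODO: add error handling",
                " fix later, todo: cleanup",
                " see the TODOS list"):
    print(repr(comment),
          find_issue(DEFAULT, comment),
          find_issue(CASELESS, comment),
          find_issue(LINE_START, comment))
```

Note that the line-start variant includes any leading whitespace in the match, because the match begins at \A.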
Last edited by Tenaja on Mon Jun 02, 2014 2:57 am, edited 1 time in total.

Post by Demivec »

Tenaja wrote:To make it case-insensitive:
\b(?i)bTODO\b.*
the \b is a word boundary (so it is a whole word only), and the (?i) makes it case-insensitive.
You got an extra 'b' in there. It should be:

Code: Select all

\b(?i)TODO\b.*

Post by Tenaja »

Thanks; typo fixed.

Post by Demivec »

I thought I would post my collection of 'issues'. Mine are all made with three punctuation marks.

Code: Select all

;something to get around to
To-do
(?<=\Q###\E).*
Priority: Info

;a must fix
Fix
(?<=\Q!!!\E).*
Priority: High

;code to be refactored
Refactor
(?<=\Q***\E).*
Priority: Low

;a section of code that needs to be tested
Test
(?<=\Q>>>\E).*
Priority: Normal

;a recent change from a previous version
New
(?<=\Q+++\E).*
Priority: Info

;used when analyzing other's code
Why?
(?<=\Q???\E).*
Priority: Normal
Comments on the regular expressions: (?<=\Q+++\E) is a look-behind that doesn't become part of the displayed text. Because symbols are being matched, \Q and \E mark the start and end of a quoted phrase to be matched literally. The .* then matches everything after the look-behind phrase to the end of the line.
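The effect of the look-behind can be checked with a small Python sketch. It is not how the IDE works internally: re.escape stands in for PCRE's \Q...\E, the priority is an IDE setting rather than part of the pattern, and ISSUES, build_pattern and classify are hypothetical names used only here. It assumes the pattern is applied to the comment text after the ';'.

```python
import re

# (issue name, marker, priority) as listed in this post.
ISSUES = [
    ("To-do", "###", "Info"),
    ("Fix", "!!!", "High"),
    ("Refactor", "***", "Low"),
    ("Test", ">>>", "Normal"),
    ("New", "+++", "Info"),
    ("Why?", "???", "Normal"),
]


def build_pattern(marker):
    # Assumption: re.escape(marker) plays the role of \Q...\E in PCRE.
    return re.compile("(?<=" + re.escape(marker) + ").*")


def classify(comment):
    """Return (name, priority, displayed text) for the first marker found."""
    for name, marker, priority in ISSUES:
        match = build_pattern(marker).search(comment)
        if match:
            return name, priority, match.group(0)
    return None


print(classify("!!!crashes on empty input"))
print(classify("***split this procedure"))
print(classify("plain comment"))
```

With these simple patterns a decorative run such as ######### also counts as a To-do, since any position preceded by three marks qualifies.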

For those that are curious, the punctuation was what I was already using but it can now be translated into the description text and listed with the 'Issues' tool.


I would also like to be able to create a regular expression that uses a look-behind with a case-insensitive word (using word boundaries), but I haven't been able to get it to work yet.

Post by Demivec »

I have another update to the 'issues' expressions I use that match three punctuation marks.

This code will match three #'s that are not immediately followed or preceded by another '#'. It will also match at the beginning of the line (meaning, immediately after the ';' in a comment) and will not include the three #'s as part of the matching expression that is displayed in the 'issues' list. It should be easily modified for other punctuation by simply replacing each '#' with the desired punctuation mark.

Code: Select all

(?<=((?<=[^#])|(?<=\A))\Q###\E)[^#].*
I use the phrase '###' here to represent Todo items. I didn't want a match on lines that just included a string of punctuation that was being used for emphasis. The expression given would match the first two lines but not the third.

Code: Select all

;###Must write documentation
count + adjustment ;update count ###Add additional calculations here
;######### start processing events #########
I'll break the expression down for explanation into 5 lettered parts:
  • A. (?<=
    The start of the outer look-behind.

    B. ((?<=[^#])|(?<=\A))
    One of these two nested look-behinds needs to match before the quoted (\Q \E) marks. One requires a preceding character that isn't a '#' and the other requires the beginning of the line.

    C. \Q###\E
    This is the quoted phrase, delimited by \Q and \E, that contains the marks that need to match.

    D. )
    The end of the outer look-behind.

    E. [^#].*
    This part matches the first character that is not a '#' through to the end of the line, provided the complete look-behind matches.
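The breakdown above can be tried against the three sample lines with a short Python sketch. It is an adaptation, not the IDE's own behaviour: Python's re module has no \Q...\E, so the marks are written as a plain ### (the '#' is not special there unless verbose mode is used), and the capturing group in part B becomes a non-capturing one. TODO_PATTERN and todo_text are made-up names, and the input is assumed to be the comment text after the ';'.

```python
import re

# Python adaptation of (?<=((?<=[^#])|(?<=\A))\Q###\E)[^#].*
# Assumption: a literal ### replaces \Q###\E, which Python's re lacks.
TODO_PATTERN = re.compile(r"(?<=(?:(?<=[^#])|(?<=\A))###)[^#].*")


def todo_text(comment):
    """Return the text shown for a ### issue, or None when there is no match."""
    match = TODO_PATTERN.search(comment)
    return match.group(0) if match else None


# Comment parts of the three sample lines from this post.
samples = [
    "###Must write documentation",
    "update count ###Add additional calculations here",
    "######### start processing events #########",
]
for comment in samples:
    print(repr(comment), "->", todo_text(comment))
```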

Post by freak »

FYI, I added two optional named groups (named "display" and "mark") that can be included in the regex to control which part of the match is highlighted and which part is displayed in the tool. You then do not need to use the lookahead anymore.

Example: The default TODO pattern, but only display the part after the TODO in the tool:

Code: Select all

\bTODO\b(?<display>.*)
Example: Match everything in {}, but only highlight the inner part (display the full match in the tool)

Code: Select all

{(?<mark>.*)}
You can also use both named groups in a single regex for full control. If no groups with these names exist in the regex, the full match will be used like it was before.

This will be available with beta 3.
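To illustrate how the two groups divide up a match, here is a rough Python sketch of the behaviour described in this post. It is not the IDE's implementation: Python spells named groups (?P<name>...) rather than (?<name>...), the braces are escaped to be safe, and issue_parts is a hypothetical helper. Falling back to the full match when a group is absent follows the description above; doing the same when a group exists but did not take part in the match is an assumption.

```python
import re


def issue_parts(pattern, text):
    """Return the displayed and highlighted parts of a match as a dict."""
    match = pattern.search(text)
    if match is None:
        return None

    def part(name):
        # Use the named group when the pattern defines it and it matched,
        # otherwise fall back to the full match.
        if name in pattern.groupindex and match.group(name) is not None:
            return match.group(name)
        return match.group(0)

    return {"display": part("display"), "mark": part("mark")}


# The two examples from this post, in Python's named-group syntax.
todo = re.compile(r"\bTODO\b(?P<display>.*)")
braces = re.compile(r"\{(?P<mark>.*)\}")

print(issue_parts(todo, " TODO write tests"))
print(issue_parts(braces, "see {open question} here"))
```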

Post by Demivec »

freak wrote:FYI, I added two optional named groups (named "display" and "mark") that can be included in the regex to control which part of the match is highlighted and which part is displayed in the tool. You then do not need to use the lookahead anymore.
Thank you very very much. :)

Post by Demivec »

Here's my attempt to use the named groups "display" and "mark" along with the previously described issues using only punctuation marks, each consisting of three identical characters followed by a different character (i.e. ;###Add documentation).

Here's the basic sample for a Todo that uses '#':

Code: Select all

(?<=[^#]|\A)\Q###\E(?=(?<display>[^#].*))(?=(?<mark>[^#].*))
The explanation of the expression is broken down into lettered parts:
  • A. (?<=[^#]|\A)
    This is a look-behind that says the phrase is either preceded by the start of the line or a non-matching character (not a '#').

    B. \Q###\E
    This is the phrase we are looking for, '###'.

    C. (?=(?<display>[^#].*))
    This is a look-ahead that says the phrase is followed by a non-matching character (not a '#') and matches until the end of the line. It saves what it matches and displays it in the issue's description.

    D. (?=(?<mark>[^#].*))
    This is identical to C above with the exception that it uses the match to mark the text in the source code highlighting.
Because I needed the displayed and the marked sections to be identical, I could only do so by using a look-ahead for each of them. Perhaps there is another way. If only one of them is needed (i.e. the default is OK for the other) then it is not required to be a look-ahead.


The expression above would match the first two lines but not the third. Both matches would display and mark the text after the '###' until the end of line.

Code: Select all

;###Must write documentation
count + adjustment ;update count ###Add additional calculations here
;######### start processing events #########
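The reason both groups can hold the same text is that a look-ahead consumes nothing, so the second group starts from the same position as the first. A Python sketch of that idea follows, with the usual caveats: Python's re module needs (?P<name>...) for named groups, has no \Q...\E, and only accepts fixed-width look-behinds, so part A is rewritten as (?:(?<=[^#])|\A). MARKED and parts are hypothetical names, and the input is assumed to be the comment text after the ';'.

```python
import re

# Python adaptation of
# (?<=[^#]|\A)\Q###\E(?=(?<display>[^#].*))(?=(?<mark>[^#].*))
# Assumption: the look-behind is split so each branch has a fixed width.
MARKED = re.compile(
    r"(?:(?<=[^#])|\A)###"
    r"(?=(?P<display>[^#].*))"
    r"(?=(?P<mark>[^#].*))"
)


def parts(comment):
    """Return (full match, display text, mark text), or None if no match."""
    match = MARKED.search(comment)
    if match is None:
        return None
    return match.group(0), match.group("display"), match.group("mark")


for comment in ("###Must write documentation",
                "update count ###Add additional calculations here",
                "######### start processing events #########"):
    print(repr(comment), "->", parts(comment))
```

The full match is only the ### itself, while both named groups carry the text that follows it.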
For normal 'words' instead of punctuation I would follow freak's description and mark and display only the text after the caseless match of the word 'TODO' with:

Code: Select all

(?i)\bTODO\b(?=(?<display>.*))(?=(?<mark>.*))