I have some files that have to be sorted out; their names are something like:
Text.doc
Test 2015.doc
Abc Part 1.doc
Xyz 2015 Part 3.doc
Zzz.doc
They should be renamed to
0000 Text.doc
2015 Test.doc
0000 Abc (1).doc
2015 Xyz (3).doc
0000 Zzz.doc
So I need to split each file name into three groups:
I: Text, Test, Abc, Xyz and Zzz
II: 2015 or nil
III: 1, 3 or nil
I started with a regular expression that captures the text before the year, the year itself, and the text after the year:
^(.*?)(2[0-1]\d\d)(.*?)$
This works fine, but not all file names contain a year, so I changed the expression to
^(.*?)(2[0-1]\d\d|)(.*?)$
Later I wanted to extend the expression to something like ^(.*?)(2015|)(.*?)(( Part )(\d)|)(\.doc)$
But with these modified expressions, it is not the first group but the last one that seems to get all of the content. How can I make certain groups (a little bit) greedier?
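One way around this, sketched under the assumption that a single space always precedes the year and the word "Part": instead of an empty alternative such as `(2[0-1]\d\d|)`, put each optional piece in a non-capturing group together with its separator, `(?: (2[01]\d\d))?`, and keep `\.doc$` at the end. The lazy first group then has to keep growing until the optional pieces and the extension can match, so it stops right before " 2015" or " Part". The `files$()` and `renamed$()` arrays are only for illustration.

```purebasic
; Sketch: separator-anchored optional groups instead of empty alternatives.
; Assumes a single space before the year and before "Part".
EnableExplicit

Define i, name$, year$, part$, newName$
Dim files$(4)
Dim renamed$(4)
files$(0) = "Text.doc"
files$(1) = "Test 2015.doc"
files$(2) = "Abc Part 1.doc"
files$(3) = "Xyz 2015 Part 3.doc"
files$(4) = "Zzz.doc"

; Group 1: lazy name, group 2: optional year, group 3: optional part number
If CreateRegularExpression(0, "^(.*?)(?: (2[01]\d\d))?(?: Part (\d+))?\.doc$")
  For i = 0 To 4
    If ExamineRegularExpression(0, files$(i))
      If NextRegularExpressionMatch(0)
        name$ = RegularExpressionGroup(0, 1)
        year$ = RegularExpressionGroup(0, 2)   ; empty when the year group did not take part
        part$ = RegularExpressionGroup(0, 3)
        If year$ = "" : year$ = "0000" : EndIf
        newName$ = year$ + " " + name$
        If part$ <> "" : newName$ = newName$ + " (" + part$ + ")" : EndIf
        renamed$(i) = newName$ + ".doc"
        Debug files$(i) + " => " + renamed$(i)
      EndIf
    EndIf
  Next
  FreeRegularExpression(0)
Else
  Debug RegularExpressionError()
EndIf
```

The same idea extends to more optional pieces: each one carries its own leading separator, so it can only match where that separator actually appears.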
Regular expression question
- Michael Vogel
- Addict

- Posts: 2823
- Joined: Thu Feb 09, 2006 11:27 pm
- Contact:
Re: Regular expression question
Okay, what about an easier one?
The code below works fine for checking whether the last character is 'e' (set check.s=positive), but when I try to find all texts that do not end with an 'e' (check=negative), I get a match for every line. What am I doing wrong?
Code: Select all
; Two checks: ends with 'e' / does not end with 'e'
positive.s="(.*?)([e])"
negative.s="(.*?)([^e])"
check.s=negative; <<<<

; Ten random test strings: "Halla", "Halle", "Halli", "Hallo" or "Hallu"
Dim test.s(10)
For i=1 To 10
  test(i)="Hall"+Mid("aeiou",Random(4)+1,1)
Next i

RegularCaseMode=#Null
If CreateRegularExpression(0,"^"+check+"$",RegularCaseMode)
  For i=1 To 10
    ; prints "-" for no match, "ok" for a match
    If ExamineRegularExpression(0,test(i)+#LF$)
      Debug test(i)+" = "+StringField("-.ok",NextRegularExpressionMatch(0)+1,".")
    EndIf
  Next i
EndIf
With some (crazy) modifications, I get correct results...
Code: Select all
check.s="(.*?)([^e])"
; NOTE: 'trick' (a string) is not defined in this snippet; its value was not posted
Dim test.s(10) : For i=1 To 10 : test(i)="Hall"+Mid("aeioue",Random(5)+1,1) : Next i
If CreateRegularExpression(0,"^"+check+trick+"$",#Null)
  For i=1 To 10 : If ExamineRegularExpression(0,test(i)+trick+#LF$)
    Debug test(i)+" = "+StringField("-.ok",NextRegularExpressionMatch(0)+1,".")
  EndIf : Next i
EndIf
Re: Regular expression question
In your example, the "[^e]" matches the #LF$ character.
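To illustrate: with PCRE's default settings, `$` also matches just before a final line feed, so once #LF$ is appended, `[^e]` can consume the #LF$ itself and every test string matches. A minimal sketch comparing the original class with `[^e\n]`, which excludes the line feed (one possible fix if the #LF$ has to stay; simply not appending the #LF$ is another):

```purebasic
; Compares "[^e]" and "[^e\n]" on strings that carry a trailing #LF$
EnableExplicit

Define oldRx = CreateRegularExpression(#PB_Any, "^(.*?)([^e])$")
Define newRx = CreateRegularExpression(#PB_Any, "^(.*?)([^e\n])$")
Define oldHalle, oldHallo, newHalle, newHallo

If oldRx And newRx
  ; [^e] may match the #LF$ itself, after which $ matches at the very end
  oldHalle = Bool(MatchRegularExpression(oldRx, "Halle" + #LF$))
  oldHallo = Bool(MatchRegularExpression(oldRx, "Hallo" + #LF$))
  ; [^e\n] cannot match the #LF$, so $ has to match just before it
  newHalle = Bool(MatchRegularExpression(newRx, "Halle" + #LF$))
  newHallo = Bool(MatchRegularExpression(newRx, "Hallo" + #LF$))
  Debug "old: Halle=" + Str(oldHalle) + "  Hallo=" + Str(oldHallo)   ; 1 and 1
  Debug "new: Halle=" + Str(newHalle) + "  Hallo=" + Str(newHallo)   ; 0 and 1
  FreeRegularExpression(oldRx)
  FreeRegularExpression(newRx)
EndIf
```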
PB 6.01 ― Win 10, 21H2 ― Ryzen 9 3900X, 32 GB ― NVIDIA GeForce RTX 3080 ― Vivaldi 6.0 ― www.unionbytes.de
Lizard - Script language for symbolic calculations and more ― Typeface - Sprite-based font include/module
Re: Regular expression question
I would not include the extension in the regex, since you are already selecting only files that have DOC as their extension,
and I would use two separate regexes: one to process the year and a second one to process "Part [1-9]".
If you really want to use regexes, then there is nothing like Didelphodon's RexMan for testing them with your PureBasic code:
http://www.purebasic.fr/english/viewtop ... 39#p284139
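A minimal sketch of the two-expression approach suggested above, assuming the extension is removed first with GetFilePart() (the #PB_FileSystem_NoExtension mode needs a PureBasic version that supports it) and that "Part" is followed by a single digit, as in "Part [1-9]". FirstGroup() and NewName() are hypothetical helper names, not PureBasic commands.

```purebasic
; Two separate expressions: one for "Part n", one for the year
EnableExplicit

Enumeration
  #YearRx
  #PartRx
EndEnumeration

; Hypothetical helper: group 1 of the first match, or "" if nothing matched
Procedure.s FirstGroup(regex, text$)
  If ExamineRegularExpression(regex, text$)
    If NextRegularExpressionMatch(regex)
      ProcedureReturn RegularExpressionGroup(regex, 1)
    EndIf
  EndIf
  ProcedureReturn ""
EndProcedure

; Hypothetical helper: "Xyz 2015 Part 3.doc" -> "2015 Xyz (3).doc"
Procedure.s NewName(file$)
  Protected base$, year$, part$, result$
  base$ = GetFilePart(file$, #PB_FileSystem_NoExtension)  ; the regexes never see ".doc"
  part$ = FirstGroup(#PartRx, base$)
  base$ = ReplaceRegularExpression(#PartRx, base$, "")    ; strip " Part n"
  year$ = FirstGroup(#YearRx, base$)
  base$ = Trim(ReplaceRegularExpression(#YearRx, base$, ""))  ; strip " 20xx"
  If year$ = "" : year$ = "0000" : EndIf
  result$ = year$ + " " + base$
  If part$ <> "" : result$ = result$ + " (" + part$ + ")" : EndIf
  ProcedureReturn result$ + "." + GetExtensionPart(file$)
EndProcedure

Define r1$, r2$, r3$
If CreateRegularExpression(#YearRx, "\s*\b(2[01]\d\d)\b") And CreateRegularExpression(#PartRx, "\s*\bPart ([1-9])\s*$")
  r1$ = NewName("Text.doc")
  r2$ = NewName("Abc Part 1.doc")
  r3$ = NewName("Xyz 2015 Part 3.doc")
  Debug r1$ : Debug r2$ : Debug r3$
EndIf
```

Handling "Part" before the year keeps the year expression simple, and Trim() cleans up the spaces left behind by the removals.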
Not feeling like reinventing the wheel?
There is a nice program I use all the time to rearrange the titles of MP3s & JPGs so that I can control the playback order, add numbers, etc.
It would work great with your files, and it has an undo feature.
Bulk Rename Utility. It is free for home use:
http://www.bulkrenameutility.co.uk/Download.php
or, if it is for business, I also like
https://www.advancedrenamer.com/download
Both of these programs have portable versions and a proven track record.
Again, download RexMan because it will make your life easier when it comes to Reg Ex.
Norm.
Just realized, Smilies don't get lost in translation
google Translate;Makes my jokes fall flat- Fait mes blagues tombent à plat- Machte meine Witze verpuffen- Eh cumpari ci vo sunari
- Michael Vogel
- Addict

- Posts: 2823
- Joined: Thu Feb 09, 2006 11:27 pm
- Contact:
Re: Regular expression question
STARGÅTE wrote:
In your example, the "[^e]" matches the #LF$ character.
Interesting, thanks. I added the #LF$ to all strings because otherwise '^.*$' does not match empty strings. I thought the '$' at the end would match the #LF$, but now it seems that I shouldn't add the #LF$ when the string is not empty.
New attempt (which seems to work fine in all cases): I replaced '$' and #LF$ with a different character (backspace):
Code: Select all
RandomSeed(0)
; Three checks: does not end with 'e', ends with 'e', anything
positive.s="(.*?)([e])"
negative.s="(.*?)([^e])"
all.s=".*"
do=1 ; 1 = negative, 2 = positive, 3 = all
check.s=StringField(negative+"~"+positive+"~"+all,do,"~")

; Earlier variants of the end marker (pattern side / string side)
;XLF.s="$"
;CLF.s=#LF$
;CLF.s=""
; Backspace as the end marker in both the pattern and the test strings
XLF.s=#BS$; : - )
CLF.s=#BS$; : - )

Dim test.s(10)
test(0)=""
For i=1 To 10
  test(i)="Hall"+Mid("aeioue",Random(5)+1,1)
Next i

Debug "Checking '"+check+"':"
If CreateRegularExpression(0,"^"+check+XLF,#Null)
  For i=0 To 10
    If ExamineRegularExpression(0,test(i)+CLF)
      Debug test(i)+" = "+StringField("-.ok",NextRegularExpressionMatch(0)+1,".")
    EndIf
  Next i
EndIf
Re: Regular expression question
Anything up until this word or this character...
Code: Select all
.*?(?=Part|\.)
Code: Select all
DataSection
  Data.s "Text.doc"
  Data.s "Test 2015.doc"
  Data.s "Abc Part 1.doc"
  Data.s "Xyz 2015 Part 3.doc"
  Data.s "Xyz 2015 toto Part 4.doc"
  Data.s "Xyz 2015 titi.doc"
  Data.s "Zzz.doc"
  Data.s ""   ; end marker
EndDataSection

; name: first word, year: optional 20xx, info: optional text up to "Part" or ".", part: optional number
If CreateRegularExpression(0, "^(?<name>[^\s.]+)\s*(?<year>20\d\d)?\s*(?<info>.*?(?=Part|\.))?\s*(Part (?<part>\d+))?\.doc$")
  Repeat
    Read.s filename$
    conversion$=LSet(filename$, 30)
    If ExamineRegularExpression(0, filename$)
      While NextRegularExpressionMatch(0)
        year$=RegularExpressionNamedGroup(0, "year")
        part$=RegularExpressionNamedGroup(0, "part")
        info$=RegularExpressionNamedGroup(0, "info")
        If year$="" : year$="0000" : EndIf
        If part$<>"" : part$=" (" + part$ + ")" : EndIf
        If info$<>"" : info$=" [" + info$ + "]" : EndIf
        ; shorthand: appends the new name to conversion$
        conversion$ + #TAB$ + " => " + year$ + " " + RegularExpressionNamedGroup(0, "name") + info$ + part$
      Wend
    EndIf
    Debug conversion$
  Until filename$=""
Else
  Debug "ERR"
EndIf
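The lookahead on its own: `.*?` grows one character at a time and stops at the first position where `(?=Part|\.)` sees either "Part" or a dot, without consuming it. A small sketch, where UpToPartOrDot() is a hypothetical helper name:

```purebasic
EnableExplicit

; Hypothetical helper: returns the text before the first "Part" or "."
Procedure.s UpToPartOrDot(text$)
  Protected result$
  If CreateRegularExpression(0, ".*?(?=Part|\.)")
    If ExamineRegularExpression(0, text$)
      If NextRegularExpressionMatch(0)
        result$ = RegularExpressionMatchString(0)  ; the lookahead itself is not part of the match
      EndIf
    EndIf
    FreeRegularExpression(0)
  EndIf
  ProcedureReturn result$
EndProcedure

Define a$ = UpToPartOrDot("Xyz 2015 toto Part 4.doc")
Define b$ = UpToPartOrDot("Zzz.doc")
Debug "[" + a$ + "]"   ; [Xyz 2015 toto ]
Debug "[" + b$ + "]"   ; [Zzz]
```

Note the trailing space in the first result: the lookahead stops right before "Part", not before the space in front of it.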
Last edited by eddy on Sat Apr 02, 2016 10:28 am, edited 3 times in total.
win10 x64 5.72 | IDE | PB plugin | Tools | Sprite | JSON | visual tool
- Michael Vogel
- Addict

- Posts: 2823
- Joined: Thu Feb 09, 2006 11:27 pm
- Contact:
Re: Regular expression question
eddy wrote:
anything up until this word or this character...
Code: Select all
.*?(?=Part|\.)
I'm im... impressed!
I tried to adapt your brilliant expression ^(?<name>[^\s.]+)\s*(?<year>20\d\d)?\s*(?<info>.*?(?=Part|\.))?\s*(Part (?<part>\d+))?\.doc$ to also catch the following files (where multiple words or even digits can appear before and after the year), but it seems to be impossible to do this in a single expression...
Text.doc
Text Abc.doc
Test 2015.doc
Test Def 2015.doc
Abc Part 1.doc
Abc Def Part 1.doc
Xyz 2015 Part 3.doc
Xxx Yyy 2015 Zzz Part 3.doc
Aaa 1 Bbb 2 Ccc 2015 Ddd 3 Eee 4 Part 3.doc
Zzz.doc
They should be renamed to
0000 Text.doc
0000 Text Abc.doc
2015 Test.doc
2015 Test Def.doc
0000 Abc - 1.doc
0000 Abc Def - 1.doc
2015 Xyz - 3.doc
2015 Xxx Yyy - 3.doc
2015 1 Bbb 2 Ccc - 3.doc
0000 Zzz.doc
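For what it's worth, a single expression might still be enough, sketched under two assumptions: the year is always a separate word starting with 20, and any words between the year and "Part" are dropped, as in the expected list. The name is matched lazily with `.+?` (so it may contain spaces and digits), and a lazy `.*?` inside the optional year group swallows everything up to " Part" or ".doc". With this pattern, every word in front of the year is kept as the name. The arrays are only for illustration.

```purebasic
; One expression for multi-word names; words between the year and "Part" are dropped
EnableExplicit

Define i, year$, part$
Dim files$(5)
Dim renamed$(5)
files$(0) = "Text Abc.doc"
files$(1) = "Test Def 2015.doc"
files$(2) = "Abc Def Part 1.doc"
files$(3) = "Xxx Yyy 2015 Zzz Part 3.doc"
files$(4) = "Aaa 1 Bbb 2 Ccc 2015 Ddd 3 Eee 4 Part 3.doc"
files$(5) = "Zzz.doc"

; name: lazy, may contain spaces and digits
; year: optional; the lazy .*? after it swallows words up to " Part" or ".doc"
If CreateRegularExpression(0, "^(?<name>.+?)(?:\s+(?<year>20\d\d)\b.*?)?(?:\s+Part\s+(?<part>\d+))?\.doc$")
  For i = 0 To 5
    If ExamineRegularExpression(0, files$(i))
      If NextRegularExpressionMatch(0)
        year$ = RegularExpressionNamedGroup(0, "year")
        part$ = RegularExpressionNamedGroup(0, "part")
        If year$ = "" : year$ = "0000" : EndIf
        renamed$(i) = year$ + " " + RegularExpressionNamedGroup(0, "name")
        If part$ <> "" : renamed$(i) = renamed$(i) + " - " + part$ : EndIf
        renamed$(i) = renamed$(i) + ".doc"
        Debug files$(i) + " => " + renamed$(i)
      EndIf
    EndIf
  Next
  FreeRegularExpression(0)
EndIf
```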
