Regular expression question
Posted: Mon Mar 28, 2016 6:54 pm
by Michael Vogel
I have some files that need to be sorted; their names look something like this:
Text.doc
Test 2015.doc
Abc Part 1.doc
Xyz 2015 Part 3.doc
Zzz.doc
They should be renamed to
0000 Text.doc
2015 Test.doc
0000 Abc (1).doc
2015 Xyz (3).doc
0000 Zzz.doc
So I need to split the file names into different groups:
I: Text, Test, Abc, Xyz and Zzz
II: 2015 or nil
III: 1, 3 or nil
I started with a regular expression that captures groups for the text before the year, the year itself, and the text after the year:
^(.*?)(2[0-1]\d\d)(.*?)$
This works fine, but not all file names contain a year, so I changed the expression to
^(.*?)(2[0-1]\d\d|)(.*?)$
Later I would have extended the expression to something like ^(.*?)(2015|)(.*?)(( Part )(\d)|)(\.doc)$
But with these modified expressions, it is not the first group but the last one that seems to get all of the content. How can I make certain groups (a little bit) greedier?
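A note on why the last group swallows everything, sketched in Python rather than PureBasic (an assumption: both are backtracking, PCRE-style engines that treat these constructs the same way). The lazy first group starts out empty, the year alternative fails at position 0 and falls back to its empty branch, and the trailing (.*?) is then free to stretch to the end of the string. One possible fix is to drop the catch-all group and make the year and the part optional with ?, so that the literal \.doc forces the first group to grow until the optional pieces can match. The names loose, tight and rename are made up for this illustration.

```python
import re

# The expression from the post: the empty alternative always succeeds,
# so the trailing catch-all group ends up with the whole file name.
loose = re.compile(r"^(.*?)(2[0-1]\d\d|)(.*?)$")

# Hypothetical alternative: no catch-all, optional year and part,
# and a literal ".doc" that anchors the end of the match.
tight = re.compile(r"^(.*?)(?:\s*(2[0-1]\d\d))?(?:\s*Part (\d))?\.doc$")


def rename(filename):
    """Build the new name, or return the input unchanged if it does not match."""
    m = tight.match(filename)
    if m is None:
        return filename
    name, year, part = m.groups()
    new_name = f"{year or '0000'} {name}"
    if part:
        new_name += f" ({part})"
    return new_name + ".doc"


if __name__ == "__main__":
    print(loose.match("Test 2015.doc").groups())
    for f in ["Text.doc", "Test 2015.doc", "Abc Part 1.doc",
              "Xyz 2015 Part 3.doc", "Zzz.doc"]:
        print(f, "=>", rename(f))
```

This only covers the five sample names from the post (one word before the year, nothing between year and "Part").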
Re: Regular expression question
Posted: Tue Mar 29, 2016 5:58 pm
by Michael Vogel
Okay, what about an easier one?
The code below works fine when checking whether the last character is 'e' (set check.s=positive). But when I try to find all the text lines that do not end with an 'e' (check.s=negative), I get a match for every line. What am I doing wrong?
Code:
positive.s="(.*?)([e])"
negative.s="(.*?)([^e])"
check.s=negative; <<<<
Dim test.s(10)
For i=1 To 10
  test(i)="Hall"+Mid("aeiou",Random(4)+1,1)
Next i
RegularCaseMode=#Null
If CreateRegularExpression(0,"^"+check+"$",RegularCaseMode)
  For i=1 To 10
    If ExamineRegularExpression(0,test(i)+#LF$)
      Debug test(i)+" = "+StringField("-.ok",NextRegularExpressionMatch(0)+1,".")
    EndIf
  Next i
EndIf
With some (crazy) modifications, I get correct results...
check.s="(.*?)([^e])"
Dim test.s(10) : For i=1 To 10 : test(i)="Hall"+Mid("aeioue",Random(5)+1,1) : Next i
If CreateRegularExpression(0,"^"+check+trick+"$",#Null)
For i=1 To 10 : If ExamineRegularExpression(0,test(i)+trick+#LF$)
Debug test(i)+" = "+StringField("-.ok",NextRegularExpressionMatch(0)+1,".")
EndIf : Next i
EndIf
Re: Regular expression question
Posted: Tue Mar 29, 2016 9:59 pm
by STARGÅTE
In your example, the "[^e]" matches the #LF$ character.
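To make this visible, here is a small sketch in Python (not PureBasic; the assumption is that PureBasic's PCRE-based engine handles a negated class and '$' the same way). A negated class such as [^e] accepts a line feed, and '$' matches both at the very end and just before a final line feed, so with LF appended the negative pattern can always succeed. The helper last_group is a name invented for this example.

```python
import re

negative = re.compile(r"^(.*?)([^e])$")
positive = re.compile(r"^(.*?)([e])$")


def last_group(pattern, text):
    """Return what the second group captured, or None if there is no match."""
    m = pattern.match(text)
    return m.group(2) if m else None


if __name__ == "__main__":
    # With the line feed appended, [^e] captures the line feed itself.
    print(repr(last_group(negative, "Halle\n")))
    # Without it, a word ending in 'e' no longer matches.
    print(repr(last_group(negative, "Halle")))
    # '$' also matches just before a final line feed.
    print(repr(last_group(positive, "Halle\n")))
```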
Re: Regular expression question
Posted: Tue Mar 29, 2016 11:49 pm
by normeus
I would not include the extension in the Reg Ex, since you are already selecting only files that have DOC as an extension,
and I would use 2 separate Reg Ex: one to process the year and a 2nd one to process "Part [1-9]".
If you really want to use Reg Ex then there is nothing like Didelphodon's RexMan to test your PureBasic code:
http://www.purebasic.fr/english/viewtop ... 39#p284139
Not feeling like reinventing the wheel?
There is a nice program I use all the time to rearrange titles of MP3s & JPGs so that I can control the sequence of play by adding numbers etc.
It would work great with your files and it has an undo feature.
Bulk Rename Utility. It is free for home use:
http://www.bulkrenameutility.co.uk/Download.php
Or, if it is for business, I also like
https://www.advancedrenamer.com/download
Both of these programs have portable versions and a proven track record.
Again, download RexMan because it will make your life easier when it comes to Reg Ex.
Norm.
Just realized, Smilies don't get lost in translation

Re: Regular expression question
Posted: Wed Mar 30, 2016 6:12 pm
by Michael Vogel
STARGÅTE wrote:In your example, the "[^e]" matches the #LF$ character.
Interesting, thanks. I added the #LF$ to all strings because otherwise '^.*$' does not match empty strings. I thought the '$' at the end would match the #LF$, but now it seems that I shouldn't add the #LF$ when the string is not empty.
New attempt (which seems to work fine in all cases): I replaced '$' and #LF$ with a different character (Backspace):
Code:
RandomSeed(0)
positive.s="(.*?)([e])"
negative.s="(.*?)([^e])"
all.s=".*"
do=1
check.s=StringField(negative+"~"+positive+"~"+all,do,"~")
;XLF.s="$"
;CLF.s=#LF$
;CLF.s=""
XLF.s=#BS$; : - )
CLF.s=#BS$; : - )
Dim test.s(10)
test(0)=""
For i=1 To 10
  test(i)="Hall"+Mid("aeioue",Random(5)+1,1)
Next i
Debug "Checking '"+check+"':"
If CreateRegularExpression(0,"^"+check+XLF,#Null)
  For i=0 To 10
    If ExamineRegularExpression(0,test(i)+CLF)
      Debug test(i)+" = "+StringField("-.ok",NextRegularExpressionMatch(0)+1,".")
    EndIf
  Next i
EndIf
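A possible alternative to the sentinel character, sketched in Python under the assumption that PureBasic's PCRE-based engine supports the same negative lookbehind: (?<!e) asserts that the character just before the end is not an 'e', without consuming anything. In Python this also accepts the empty string and is not fooled by an appended line feed; whether ExamineRegularExpression behaves identically for empty input is not verified here. The names not_ending_in_e and ends_without_e are invented for the example.

```python
import re

# ".*" takes the line, "(?<!e)" checks the previous character, "$" anchors the end.
not_ending_in_e = re.compile(r"^.*(?<!e)$")


def ends_without_e(text):
    """True if the text (ignoring one trailing line feed) does not end in 'e'."""
    return not_ending_in_e.match(text) is not None


if __name__ == "__main__":
    for sample in ["", "Halla", "Halle", "Halli", "Halle\n", "Halla\n"]:
        print(repr(sample), ends_without_e(sample))
```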
Re: Regular expression question
Posted: Wed Mar 30, 2016 11:13 pm
by eddy
anything up until this word or this character...
Code:
DataSection
Data.s "Text.doc"
Data.s "Test 2015.doc"
Data.s "Abc Part 1.doc"
Data.s "Xyz 2015 Part 3.doc"
Data.s "Xyz 2015 toto Part 4.doc"
Data.s "Xyz 2015 titi.doc"
Data.s "Zzz.doc"
Data.s ""
EndDataSection
If CreateRegularExpression(0, "^(?<name>[^\s.]+)\s*(?<year>20\d\d)?\s*(?<info>.*?(?=Part|\.))?\s*(Part (?<part>\d+))?\.doc$")
  Repeat
    Read.s filename$
    convertion$=LSet(filename$, 30)
    If ExamineRegularExpression(0, filename$)
      While NextRegularExpressionMatch(0)
        year$=RegularExpressionNamedGroup(0, "year")
        part$=RegularExpressionNamedGroup(0, "part")
        info$=RegularExpressionNamedGroup(0, "info")
        If year$="" : year$="0000" : EndIf
        If part$<>"" : part$=" (" + part$ + ")" : EndIf
        If info$<>"" : info$=" [" + info$ + "]" : EndIf
        convertion$ + #TAB$ + " => " + year$ + " " + RegularExpressionNamedGroup(0, "name") + info$ + part$
      Wend
    EndIf
    Debug convertion$
  Until filename$=""
Else
  Debug "ERR"
EndIf
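The "anything up until this word or this character" part is the lazy .*? followed by the lookahead (?=Part|\.). A minimal sketch of just that idiom, in Python rather than PureBasic (assuming both engines treat lazy quantifiers and lookaheads alike); the names until and prefix are made up here. Note that the captured text keeps any trailing space that sits before "Part".

```python
import re

# Take as little as possible, stopping where "Part" or a dot comes next.
until = re.compile(r".*?(?=Part|\.)")


def prefix(text):
    """Return the text before the first 'Part' or '.', or None if neither occurs."""
    m = until.match(text)
    return m.group() if m else None


if __name__ == "__main__":
    for sample in ["toto Part 4.doc", "titi.doc", "Part 1.doc", "abc"]:
        print(repr(sample), "->", repr(prefix(sample)))
```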
Re: Regular expression question
Posted: Thu Mar 31, 2016 6:53 am
by Michael Vogel
eddy wrote:anything up until this word or this character...
I'm im... impressed!
I tried to adapt your brilliant expression ^(?<name>[^\s.]+)\s*(?<year>20\d\d)?\s*(?<info>.*?(?=Part|\.))?\s*(Part (?<part>\d+))?\.doc$ to also catch the following files (where multiple words or even digits can appear before and after the year), but it seems impossible to do this in a single statement...
Text.doc
Text Abc.doc
Test 2015.doc
Test Def 2015.doc
Abc Part 1.doc
Abc Def Part 1.doc
Xyz 2015 Part 3.doc
Xxx Yyy 2015 Zzz Part 3.doc
Aaa 1 Bbb 2 Ccc 2015 Ddd 3 Eee 4 Part 3.doc
Zzz.doc
They should be renamed to
0000 Text.doc
0000 Text Abc.doc
2015 Test.doc
2015 Test Def.doc
0000 Abc - 1.doc
0000 Abc Def - 1.doc
2015 Xyz - 3.doc
2015 Xxx Yyy - 3.doc
2015 1 Bbb 2 Ccc - 3.doc
0000 Zzz.doc
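Following normeus's suggestion of two separate expressions, here is a rough sketch of a step-by-step version in Python (not PureBasic; the regex constructs used are plain enough that a PCRE-based engine should accept them, but that is an assumption). It first cuts off a trailing " Part N", then looks for the first year, and keeps only the text before that year as the name. YEAR, PART and rename are names invented for this example. For the "Aaa 1 Bbb 2 Ccc 2015 ..." file it keeps "Aaa" in the name, which is a guess at the intended result.

```python
import re

YEAR = re.compile(r"\b(20\d\d)\b")   # first four-digit year starting with 20
PART = re.compile(r"\s+Part (\d+)$")  # " Part N" at the very end of the stem


def rename(filename, ext=".doc"):
    """Return 'YEAR Name - PART.doc', using 0000 when no year is present."""
    stem = filename[:-len(ext)] if filename.endswith(ext) else filename

    # Step 1: remove and remember the trailing part number, if any.
    part_match = PART.search(stem)
    part = part_match.group(1) if part_match else None
    if part_match:
        stem = stem[:part_match.start()]

    # Step 2: find the year; everything from the year onwards is dropped.
    year_match = YEAR.search(stem)
    year = year_match.group(1) if year_match else "0000"
    if year_match:
        stem = stem[:year_match.start()]

    suffix = f" - {part}" if part else ""
    return f"{year} {stem.strip()}{suffix}{ext}"


if __name__ == "__main__":
    for f in ["Text Abc.doc", "Test Def 2015.doc", "Abc Def Part 1.doc",
              "Xxx Yyy 2015 Zzz Part 3.doc"]:
        print(f, "=>", rename(f))
```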