Regular expression question

Posted: Mon Mar 28, 2016 6:54 pm
by Michael Vogel
I have some files that need to be sorted; their names look something like...
Text.doc
Test 2015.doc
Abc Part 1.doc
Xyz 2015 Part 3.doc
Zzz.doc


They should be renamed to
0000 Text.doc
2015 Test.doc
0000 Abc (1).doc
2015 Xyz (3).doc
0000 Zzz.doc


So I need to split the file names into different groups:
I: Text, Test, Abc, Xyz and Zzz
II: 2015 or nil
III: 1, 3 or nil

I started with a regular expression that makes groups for the text before the year, the year itself, and the text after the year:
^(.*?)(2[0-1]\d\d)(.*?)$

This works fine, but not all file names contain a year, so I changed the expression to
^(.*?)(2[0-1]\d\d|)(.*?)$

Later I would have extended the expression to something like ^(.*?)(2015|)(.*?)(( Part )(\d)|)(\.doc)$

But with these modified expressions, the last group rather than the first seems to capture all of the content. How can I make certain groups (a little bit) greedier?
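
One possible reading of this behaviour, as a hedged sketch rather than something stated in the post: the lazy first group starts out empty, the empty alternative lets the second group succeed immediately, and so the last group has to consume everything to reach `$`. Giving each optional part its own delimiter and a `?` quantifier, and ending the pattern on something concrete, seems to avoid this. The second pattern and the variable names below are assumptions for illustration only.

```purebasic
name$ = "Xyz 2015 Part 3.doc"
first$ = ""
second$ = ""

; Pattern from the post: the empty alternative lets group 2 match nothing,
; so group 3 ends up with the whole name.
If CreateRegularExpression(0, "^(.*?)(2[0-1]\d\d|)(.*?)$")
  If ExamineRegularExpression(0, name$)
    If NextRegularExpressionMatch(0)
      first$ = "[" + RegularExpressionGroup(0, 1) + "] [" + RegularExpressionGroup(0, 2) + "] [" + RegularExpressionGroup(0, 3) + "]"
    EndIf
  EndIf
EndIf
Debug first$

; Assumed alternative: optional non-capturing wrappers carry the delimiters
; (" " and " Part "), and "\.doc$" forces the lazy first group to grow.
If CreateRegularExpression(1, "^(.*?)(?: (2[0-1]\d\d))?(?: Part (\d))?\.doc$")
  If ExamineRegularExpression(1, name$)
    If NextRegularExpressionMatch(1)
      second$ = "[" + RegularExpressionGroup(1, 1) + "] [" + RegularExpressionGroup(1, 2) + "] [" + RegularExpressionGroup(1, 3) + "]"
    EndIf
  EndIf
EndIf
Debug second$
```

With the first pattern the bracketed output should show two empty groups followed by the full file name; with the second, the name, year and part number should land in separate groups.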

Re: Regular expression question

Posted: Tue Mar 29, 2016 5:58 pm
by Michael Vogel
Okay, what about an easier one?

The code below works fine when checking whether the last character equals 'e' (set check.s=positive), but when I try to find all the text that does not end with an 'e' (check.s=negative), I get a match for every text line. What am I doing wrong?

Code: Select all

positive.s="(.*?)([e])"
negative.s="(.*?)([^e])"

check.s=negative;	<<<< 

Dim test.s(10)
For i=1 To 10
	test(i)="Hall"+Mid("aeiou",Random(4)+1,1)
Next i

RegularCaseMode=#Null

If CreateRegularExpression(0,"^"+check+"$",RegularCaseMode)

	For i=1 To 10
		If ExamineRegularExpression(0,test(i)+#LF$)
			Debug test(i)+" = "+StringField("-.ok",NextRegularExpressionMatch(0)+1,".")
		EndIf
	Next i

EndIf
With some (crazy) modifications, I get correct results...
check.s="(.*?)([^e])"
Dim test.s(10) : For i=1 To 10 : test(i)="Hall"+Mid("aeioue",Random(5)+1,1) : Next i
If CreateRegularExpression(0,"^"+check+trick+"$",#Null)
For i=1 To 10 : If ExamineRegularExpression(0,test(i)+trick+#LF$)
Debug test(i)+" = "+StringField("-.ok",NextRegularExpressionMatch(0)+1,".")
EndIf : Next i
EndIf

Re: Regular expression question

Posted: Tue Mar 29, 2016 9:59 pm
by STARGÅTE
In your example, the "[^e]" matches the #LF$ character.
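
A small sketch to make this visible, assuming the same pattern as in the question: it inspects the character captured by the second group when #LF$ is appended, then repeats the check without the appended #LF$. The variable names are illustrative.

```purebasic
; Sketch: what does "[^e]" capture when #LF$ is appended to "Halle"?
lfCode = -1
withE = -1
withA = -1

If CreateRegularExpression(0, "^(.*?)([^e])$")
  If ExamineRegularExpression(0, "Halle" + #LF$)
    If NextRegularExpressionMatch(0)
      ; Character code of whatever the second group captured
      lfCode = Asc(RegularExpressionGroup(0, 2))
    EndIf
  EndIf
  Debug lfCode   ; 10 is the character code of #LF$

  ; Without the appended #LF$, only names not ending in 'e' should match
  withE = Bool(MatchRegularExpression(0, "Halle"))
  withA = Bool(MatchRegularExpression(0, "Halla"))
  Debug withE
  Debug withA
EndIf
```

If the explanation holds, the first value is the code of the line feed, and only "Halla" matches in the second half.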

Re: Regular expression question

Posted: Tue Mar 29, 2016 11:49 pm
by normeus
I would not include the extension in the Reg Ex, since you are already selecting only files that have DOC as an extension,
and I would use two separate Reg Ex: one to process the year and a second one to process "Part [1-9]".
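
A minimal sketch of that two-pass idea, under stated assumptions: the constants `#ReYear` and `#RePart`, the procedure name `NewName` and both patterns are made up for illustration, and every file name is assumed to end in ".doc".

```purebasic
; Hypothetical names; not taken from the thread.
Enumeration
  #ReYear
  #RePart
EndEnumeration

Procedure.s NewName(file$)
  Protected year$ = "0000"
  Protected part$ = ""
  Protected base$ = Left(file$, Len(file$) - 4)   ; drop ".doc" (assumed present)

  ; Pass 1: take " Part N" off the end of the base name
  If ExamineRegularExpression(#RePart, base$)
    If NextRegularExpressionMatch(#RePart)
      part$ = " (" + RegularExpressionGroup(#RePart, 1) + ")"
      base$ = ReplaceRegularExpression(#RePart, base$, "")
    EndIf
  EndIf

  ; Pass 2: take the year out of what is left
  If ExamineRegularExpression(#ReYear, base$)
    If NextRegularExpressionMatch(#ReYear)
      year$ = RegularExpressionGroup(#ReYear, 1)
      base$ = ReplaceRegularExpression(#ReYear, base$, "")
    EndIf
  EndIf

  ProcedureReturn year$ + " " + base$ + part$ + ".doc"
EndProcedure

If CreateRegularExpression(#ReYear, " (2[01]\d\d)\b") And CreateRegularExpression(#RePart, " Part ([1-9])$")
  Debug NewName("Text.doc")
  Debug NewName("Test 2015.doc")
  Debug NewName("Abc Part 1.doc")
  Debug NewName("Xyz 2015 Part 3.doc")
EndIf
```

Each pass only has to deal with one optional element, so neither pattern needs lazy groups or empty alternatives.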

If you really want to use Reg Ex then there is nothing like Didelphodon's RexMan to test your PureBasic code:

http://www.purebasic.fr/english/viewtop ... 39#p284139

Not feeling like reinventing the wheel? :wink:
There is a nice program I use all the time to rearrange titles of MP3s & JPGs so that I can control the sequence of play, adding numbers etc.
It would work great with your files and it has an undo feature.

Bulk Rename Utility. It is free for home use.
http://www.bulkrenameutility.co.uk/Download.php

or if it is for business I also like
https://www.advancedrenamer.com/download

Both of these programs have portable versions and a proven track record.

Again, download RexMan because it will make your life easier when it comes to Reg Ex. 8)

Norm.
Just realized, Smilies don't get lost in translation :D

Re: Regular expression question

Posted: Wed Mar 30, 2016 6:12 pm
by Michael Vogel
STARGÅTE wrote: In your example, the "[^e]" matches the #LF$ character.
Interesting, thanks. I added the #LF$ to all strings because otherwise '^.*$' does not match empty strings. I thought the '$' at the end would match the #LF$, but now it seems that I shouldn't add the #LF$ when the string is not empty.

New attempt (which seems to work fine in all cases): I replaced '$' and #LF$ with a different character (backspace):

Code: Select all

RandomSeed(0)

positive.s="(.*?)([e])"
negative.s="(.*?)([^e])"
all.s=".*"

do=1

check.s=StringField(negative+"~"+positive+"~"+all,do,"~")
;XLF.s="$"
;CLF.s=#LF$
;CLF.s=""
XLF.s=#BS$;          : - )
CLF.s=#BS$;          : - )

Dim test.s(10)
test(0)=""
For i=1 To 10
	test(i)="Hall"+Mid("aeioue",Random(5)+1,1)
Next i

Debug "Checking '"+check+"':"
If CreateRegularExpression(0,"^"+check+XLF,#Null)
	For i=0 To 10
		If ExamineRegularExpression(0,test(i)+CLF)
			Debug test(i)+" = "+StringField("-.ok",NextRegularExpressionMatch(0)+1,".")
		EndIf
	Next i
EndIf

Re: Regular expression question

Posted: Wed Mar 30, 2016 11:13 pm
by eddy
anything up until this word or this character...

Code: Select all

.*?(?=Part|\.)

Code: Select all


DataSection
   Data.s "Text.doc"
   Data.s "Test 2015.doc"
   Data.s "Abc Part 1.doc"
   Data.s "Xyz 2015 Part 3.doc"
   Data.s "Xyz 2015 toto Part 4.doc"
   Data.s "Xyz 2015 titi.doc"
   Data.s "Zzz.doc"
   Data.s ""
EndDataSection


If CreateRegularExpression(0, "^(?<name>[^\s.]+)\s*(?<year>20\d\d)?\s*(?<info>.*?(?=Part|\.))?\s*(Part (?<part>\d+))?\.doc$")
   Repeat
      Read.s filename$
      convertion$=LSet(filename$, 30)
      If ExamineRegularExpression(0, filename$)
         While NextRegularExpressionMatch(0)
            year$=RegularExpressionNamedGroup(0, "year")
            part$=RegularExpressionNamedGroup(0, "part")
            info$=RegularExpressionNamedGroup(0, "info")
            If year$="" : year$="0000" : EndIf
            If part$<>"" : part$=" (" + part$ + ")" : EndIf
            If info$<>"" : info$=" [" + info$ + "]" : EndIf
            convertion$ + #TAB$ + " => " + year$ + " " + RegularExpressionNamedGroup(0, "name") + info$ + part$
         Wend         
      EndIf
      Debug convertion$
   Until filename$=""
Else
   Debug "ERR"
EndIf

Re: Regular expression question

Posted: Thu Mar 31, 2016 6:53 am
by Michael Vogel
eddy wrote:anything up until this word or this character...

Code: Select all

.*?(?=Part|\.)
I'm im... impressed!

I tried to adapt your brilliant expression ^(?<name>[^\s.]+)\s*(?<year>20\d\d)?\s*(?<info>.*?(?=Part|\.))?\s*(Part (?<part>\d+))?\.doc$ to also catch the following files (where multiple words or even digits can appear before and after the year), but it seems impossible to do this in a single statement...

Text.doc
Text Abc.doc
Test 2015.doc
Test Def 2015.doc
Abc Part 1.doc
Abc Def Part 1.doc
Xyz 2015 Part 3.doc
Xxx Yyy 2015 Zzz Part 3.doc
Aaa 1 Bbb 2 Ccc 2015 Ddd 3 Eee 4 Part 3.doc
Zzz.doc

They should be renamed to:

0000 Text.doc
0000 Text Abc.doc
2015 Test.doc
2015 Test Def.doc
0000 Abc - 1.doc
0000 Abc Def - 1.doc
2015 Xyz - 3.doc
2015 Xxx Yyy - 3.doc
2015 Aaa 1 Bbb 2 Ccc - 3.doc
0000 Zzz.doc
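
A single pattern may be workable after all. The sketch below is one untested-in-the-thread possibility, assuming that everything before the year belongs to the name and that anything between the year and "Part" is dropped, as the target list suggests. The pattern, the constant `#Pattern$` and the procedure `Renamed` are assumptions for illustration.

```purebasic
; Sketch only; pattern and names are not from the thread.
; name : lazy, so it stops at the first " 20xx", " Part N" or ".doc"
; after a year, extra text is skipped lazily up to " Part N" or ".doc"
#Pattern$ = "^(?<name>.+?)(?: (?<year>20\d\d)(?: .+?)??)?(?: Part (?<part>\d+))?\.doc$"

Procedure.s Renamed(file$)
  Protected year$, part$, result$
  result$ = file$                     ; returned unchanged if nothing matches
  If ExamineRegularExpression(0, file$)
    If NextRegularExpressionMatch(0)
      year$ = RegularExpressionNamedGroup(0, "year")
      part$ = RegularExpressionNamedGroup(0, "part")
      If year$ = "" : year$ = "0000" : EndIf
      If part$ <> "" : part$ = " - " + part$ : EndIf
      result$ = year$ + " " + RegularExpressionNamedGroup(0, "name") + part$ + ".doc"
    EndIf
  EndIf
  ProcedureReturn result$
EndProcedure

If CreateRegularExpression(0, #Pattern$)
  Debug Renamed("Text Abc.doc")
  Debug Renamed("Test Def 2015.doc")
  Debug Renamed("Abc Def Part 1.doc")
  Debug Renamed("Xxx Yyy 2015 Zzz Part 3.doc")
  Debug Renamed("Aaa 1 Bbb 2 Ccc 2015 Ddd 3 Eee 4 Part 3.doc")
Else
  Debug RegularExpressionError()
EndIf
```

The extra text after the year is only allowed inside the optional year group; otherwise it would swallow the second word of names such as "Text Abc". A name that itself contains a number like 2015 would still be split at that number.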