Regular expression question

Michael Vogel
Addict
Posts: 2823
Joined: Thu Feb 09, 2006 11:27 pm


Post by Michael Vogel

I have some files that need to be sorted out; their names look something like this:
Text.doc
Test 2015.doc
Abc Part 1.doc
Xyz 2015 Part 3.doc
Zzz.doc


They should be renamed to:
0000 Text.doc
2015 Test.doc
0000 Abc (1).doc
2015 Xyz (3).doc
0000 Zzz.doc


So I need to split the file names into different groups:
I: Text, Test, Abc, Xyz and Zzz
II: 2015 or nil
III: 1, 3 or nil

I started by writing a regular expression that captures the text before the year, the year itself, and the text after the year:
^(.*?)(2[0-1]\d\d)(.*?)$

This works fine, but not all file names contain a year, so I changed the expression to:
^(.*?)(2[0-1]\d\d|)(.*?)$

Later I would have extended the expression to something like ^(.*?)(2015|)(.*?)(( Part )(\d)|)(\.doc)$

But with these modified expressions, it is not the first group but the last one that gets all of the content. How can I make certain groups (a little bit) greedier?
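Why the last group wins: in ^(.*?)(2[0-1]\d\d|)(.*?)$ the lazy first group is tried with zero characters first, the empty alternative lets the year group succeed with nothing, and the last (.*?) is then simply stretched until $ matches, so the engine never has a reason to grow the first group. One common way around this (a sketch, not the only possible pattern) is to make the year and the part optional groups and put the mandatory extension right after them: the lazy name can then only stop where the rest of the file name is exactly "[year][Part n].doc". The example uses Python's re, whose syntax for these constructs is the same as PCRE's; the helper new_name is hypothetical and assumes every file ends in .doc and years fall in 2000-2199, as in the original 2[0-1]\d\d.

```python
import re

# Assumptions: every name ends in ".doc", years are 2000-2199.
# Lazy name, then an OPTIONAL year, an OPTIONAL "Part n", then ".doc".
# fullmatch() anchors both ends, so the lazy name can only stop where the
# remainder is exactly "[year][Part n].doc".
PATTERN = re.compile(r"(.*?)(?:\s*(2[01]\d\d))?(?:\s*Part\s*(\d+))?\.doc")


def new_name(filename):
    """Hypothetical helper: 'Xyz 2015 Part 3.doc' -> '2015 Xyz (3).doc'."""
    m = PATTERN.fullmatch(filename)
    if m is None:
        return None  # the name does not end in ".doc"
    name, year, part = m.groups()  # year / part are None when absent
    result = f"{year or '0000'} {name}"
    if part:
        result += f" ({part})"
    return result + ".doc"


for f in ["Text.doc", "Test 2015.doc", "Abc Part 1.doc",
          "Xyz 2015 Part 3.doc", "Zzz.doc"]:
    print(f, "->", new_name(f))
```

Because the optional groups are tried before the name is allowed to grow, the year and the part are captured as soon as the name reaches them.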
Michael Vogel
Addict
Posts: 2823
Joined: Thu Feb 09, 2006 11:27 pm


Post by Michael Vogel

Okay, what about an easier one?

The code below works fine for checking whether the last character is an 'e' (check.s=positive), but when I try to find all texts that do not end with an 'e' (check.s=negative), every text line matches. What am I doing wrong?

Code:

positive.s="(.*?)([e])"
negative.s="(.*?)([^e])"

check.s=negative;	<<<< 

Dim test.s(10)
For i=1 To 10
	test(i)="Hall"+Mid("aeiou",Random(4)+1,1)
Next i

RegularCaseMode=#Null

If CreateRegularExpression(0,"^"+check+"$",RegularCaseMode)

	For i=1 To 10
		If ExamineRegularExpression(0,test(i)+#LF$)
			Debug test(i)+" = "+StringField("-.ok",NextRegularExpressionMatch(0)+1,".")
		EndIf
	Next i

EndIf
With some (crazy) modifications, I get correct results...
check.s="(.*?)([^e])"
Dim test.s(10) : For i=1 To 10 : test(i)="Hall"+Mid("aeioue",Random(5)+1,1) : Next i
If CreateRegularExpression(0,"^"+check+trick+"$",#Null)
For i=1 To 10 : If ExamineRegularExpression(0,test(i)+trick+#LF$)
Debug test(i)+" = "+StringField("-.ok",NextRegularExpressionMatch(0)+1,".")
EndIf : Next i
EndIf
STARGÅTE
Addict
Posts: 2267
Joined: Thu Jan 10, 2008 1:30 pm
Location: Germany, Glienicke


Post by STARGÅTE

In your example, the "[^e]" matches the #LF$ character.
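A small Python sketch of this point (Python's re treats ".", "[^e]" and "$" the same way PCRE does by default: "." skips a line feed, "$" also matches just before a final line feed). The negated class [^e] happily consumes the appended line feed, so "Halle\n" is reported as not ending in 'e'. Adding \n to the class is one way to fix it; the helper name ends_without_e is only for illustration.

```python
import re

# Michael's "negative" pattern: any text, then one character that is not 'e'.
negative = re.compile(r"^(.*?)([^e])$")
# Same pattern, but the class also refuses the line feed.
negative_fixed = re.compile(r"^(.*?)([^e\n])$")


def ends_without_e(pattern, text):
    """Illustrative helper: True if the pattern matches the text."""
    return pattern.search(text) is not None


for word in ["Halla", "Halle"]:
    subject = word + "\n"  # like test(i)+#LF$ in the PureBasic code
    print(repr(subject),
          ends_without_e(negative, subject),        # [^e] may eat the "\n"
          ends_without_e(negative_fixed, subject))  # [^e\n] cannot
```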
PB 6.01 ― Win 10, 21H2 ― Ryzen 9 3900X, 32 GB ― NVIDIA GeForce RTX 3080 ― Vivaldi 6.0 ― www.unionbytes.de
Lizard - Script language for symbolic calculations and more | Typeface - Sprite-based font include/module
normeus
Enthusiast
Posts: 485
Joined: Fri Apr 20, 2012 8:09 pm


Post by normeus

I would not include the extension in the regex, since you are already selecting only files with a DOC extension, and I would use two separate regexes: one to process the year and a second one to process "Part [1-9]".
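A minimal sketch of the two-regex approach, written in Python for brevity: the extension is split off first, one expression finds the year, a second one finds "Part [1-9]", and whatever is left over is the name. The function name rename and the 0000 fallback are illustrative choices taken from the examples in the first post; years are assumed to be 2000-2199.

```python
import os
import re

YEAR = re.compile(r"\b(2[01]\d\d)\b")     # first regex: the year
PART = re.compile(r"\bPart\s+([1-9])\b")  # second regex: "Part [1-9]"


def rename(filename):
    """Hypothetical helper applying the two regexes one after another."""
    stem, ext = os.path.splitext(filename)  # "Xyz 2015 Part 3", ".doc"
    year = YEAR.search(stem)
    part = PART.search(stem)
    # Cut out whatever the two regexes found; the rest is the name.
    rest = PART.sub("", YEAR.sub("", stem))
    name = " ".join(rest.split())           # collapse leftover blanks
    result = (year.group(1) if year else "0000") + " " + name
    if part:
        result += f" ({part.group(1)})"
    return result + ext


for f in ["Text.doc", "Test 2015.doc", "Abc Part 1.doc",
          "Xyz 2015 Part 3.doc", "Zzz.doc"]:
    print(f, "->", rename(f))
```

Each expression stays trivial, and neither has to know where the other part sits in the name.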

If you really want to use regexes, there is nothing like Didelphodon's RexMan for testing them with your PureBasic code:

http://www.purebasic.fr/english/viewtop ... 39#p284139

Not feeling like reinventing the wheel? :wink:
There is a nice program I use all the time to rearrange the titles of MP3s and JPGs, so that I can control the play sequence by adding numbers, etc.
It would work great with your files and it has an undo feature.

Bulk Rename Utility. It is free for home use:
http://www.bulkrenameutility.co.uk/Download.php

Or, if it is for business use, I also like:
https://www.advancedrenamer.com/download

Both of these programs have portable versions and a proven track record.

Again, download RexMan because it will make your life easier when it comes to Reg Ex. 8)

Norm.
Just realized, Smilies don't get lost in translation :D
google Translate;Makes my jokes fall flat- Fait mes blagues tombent à plat- Machte meine Witze verpuffen- Eh cumpari ci vo sunari
Michael Vogel
Addict
Posts: 2823
Joined: Thu Feb 09, 2006 11:27 pm


Post by Michael Vogel

STARGÅTE wrote: In your example, the "[^e]" matches the #LF$ character.
Interesting, thanks. I added the #LF$ to all strings because otherwise '^.*$' does not match empty strings. I thought the '$' at the end would match the #LF$, but now it seems I shouldn't add the #LF$ when the string is not empty.

New attempt (which seems to work fine in all cases): I replaced '$' and #LF$ with a different character (backspace):

Code:

RandomSeed(0)

positive.s="(.*?)([e])"
negative.s="(.*?)([^e])"
all.s=".*"

do=1

check.s=StringField(negative+"~"+positive+"~"+all,do,"~")
;XLF.s="$"
;CLF.s=#LF$
;CLF.s=""
XLF.s=#BS$;          : - )
CLF.s=#BS$;          : - )

Dim test.s(10)
test(0)=""
For i=1 To 10
	test(i)="Hall"+Mid("aeioue",Random(5)+1,1)
Next i

Debug "Checking '"+check+"':"
If CreateRegularExpression(0,"^"+check+XLF,#Null)
	For i=0 To 10
		If ExamineRegularExpression(0,test(i)+CLF)
			Debug test(i)+" = "+StringField("-.ok",NextRegularExpressionMatch(0)+1,".")
		EndIf
	Next i
EndIf
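Another way to avoid both the appended #LF$ and the backspace sentinel is a negative lookbehind anchored at the very end of the subject: (?<!e) succeeds when the preceding character is not 'e', and also when there is no preceding character at all, so an empty string counts as "does not end with e". Sketch in Python, where \Z means the absolute end of the string (PCRE writes this \z). PureBasic's regular expression library is based on PCRE, which supports the same lookbehind, but how the pattern behaves inside PureBasic is an assumption here, not something this sketch checks; classify is an illustrative helper.

```python
import re

# Last character must not be 'e'. A negative lookbehind also succeeds when
# there is no character before the position, so "" matches too.
# In Python, \Z is the absolute end of the string (PCRE: \z).
not_e_end = re.compile(r"(?<!e)\Z")
e_end = re.compile(r"e\Z")


def classify(text):
    """Illustrative helper: which of the two checks matches the text."""
    return ("positive" if e_end.search(text) else "-",
            "negative" if not_e_end.search(text) else "-")


for t in ["", "Halla", "Halle", "Hallo"]:
    print(repr(t), classify(t))
```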
eddy
Addict
Posts: 1479
Joined: Mon May 26, 2003 3:07 pm
Location: Nantes


Post by eddy

anything up until this word or this character...

Code:

.*?(?=Part|\.)
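What the lookahead does, shown with Python's re (same syntax as PCRE here): (?=Part|\.) only checks that "Part" or a literal dot follows, without consuming it, so the lazy .*? stops right in front of it. The last line shows why the backslash matters: with an unescaped "." the lookahead is satisfied by any character, so the lazy part stops immediately with an empty match.

```python
import re

# Lazily take characters until "Part" or a literal dot FOLLOWS.
# The lookahead (?=...) only checks; it does not consume anything.
upto = re.compile(r".*?(?=Part|\.)")

for name in ["Text.doc", "Abc Part 1.doc", "Xyz 2015 Part 3.doc"]:
    print(repr(upto.match(name).group()))

# Unescaped dot: "any character follows" is true at once -> empty match.
print(repr(re.match(r".*?(?=Part|.)", "Text.doc").group()))
```

Note that the match before "Part" keeps the trailing blank ("Abc "), which is why eddy's full pattern surrounds the groups with \s*.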

Code:


DataSection
   Data.s "Text.doc"
   Data.s "Test 2015.doc"
   Data.s "Abc Part 1.doc"
   Data.s "Xyz 2015 Part 3.doc"
   Data.s "Xyz 2015 toto Part 4.doc"
   Data.s "Xyz 2015 titi.doc"
   Data.s "Zzz.doc"
   Data.s ""
EndDataSection


If CreateRegularExpression(0, "^(?<name>[^\s.]+)\s*(?<year>20\d\d)?\s*(?<info>.*?(?=Part|\.))?\s*(Part (?<part>\d+))?\.doc$")
   Repeat
      Read.s filename$
      convertion$=LSet(filename$, 30)
      If ExamineRegularExpression(0, filename$)
         While NextRegularExpressionMatch(0)
            year$=RegularExpressionNamedGroup(0, "year")
            part$=RegularExpressionNamedGroup(0, "part")
            info$=RegularExpressionNamedGroup(0, "info")
            If year$="" : year$="0000" : EndIf
            If part$<>"" : part$=" (" + part$ + ")" : EndIf
            If info$<>"" : info$=" [" + info$ + "]" : EndIf
            convertion$ + #TAB$ + " => " + year$ + " " + RegularExpressionNamedGroup(0, "name") + info$ + part$
         Wend         
      EndIf
      Debug convertion$
   Until filename$=""
Else
   Debug "ERR"
EndIf
Last edited by eddy on Sat Apr 02, 2016 10:28 am, edited 3 times in total.
win10 x64 5.72 | IDE | PB plugin | Tools | Sprite | JSON | visual tool
Michael Vogel
Addict
Posts: 2823
Joined: Thu Feb 09, 2006 11:27 pm


Post by Michael Vogel

eddy wrote: anything up until this word or this character...

Code:

.*?(?=Part|\.)
I'm im... impressed!

I tried to adapt your brilliant expression ^(?<name>[^\s.]+)\s*(?<year>20\d\d)?\s*(?<info>.*?(?=Part|\.))?\s*(Part (?<part>\d+))?\.doc$ so that it also catches the following files (where several words, or even digits, can appear before and after the year; the second list shows the desired new names), but that seems impossible to do in a single statement...

Text.doc
Text Abc.doc
Test 2015.doc
Test Def 2015.doc
Abc Part 1.doc
Abc Def Part 1.doc
Xyz 2015 Part 3.doc
Xxx Yyy 2015 Zzz Part 3.doc
Aaa 1 Bbb 2 Ccc 2015 Ddd 3 Eee 4 Part 3.doc
Zzz.doc

0000 Text.doc
0000 Text Abc.doc
2015 Test.doc
2015 Test Def.doc
0000 Abc - 1.doc
0000 Abc Def - 1.doc
2015 Xyz - 3.doc
2015 Xxx Yyy - 3.doc
2015 1 Bbb 2 Ccc - 3.doc
0000 Zzz.doc
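A single expression can get close to this list if the text between the year and "Part" is simply dropped, as the target names do (e.g. "Zzz" in "Xxx Yyy 2015 Zzz Part 3.doc"). Sketch in Python, where named groups are spelled (?P<name>...) instead of the (?<name>...) used in eddy's pattern; the helper new_name is hypothetical, it assumes years 2000-2199, and it keeps everything before the first year, so "Aaa 1 Bbb 2 Ccc 2015 Ddd 3 Eee 4 Part 3.doc" becomes "2015 Aaa 1 Bbb 2 Ccc - 3.doc" (including the leading "Aaa").

```python
import re

# name (lazy)
# optional: blank(s), year, then anything up to "Part"/".doc" (dropped)
# optional: blank(s), "Part", number
# mandatory ".doc"; fullmatch() anchors both ends.
PATTERN = re.compile(
    r"(?P<name>.*?)"
    r"(?:\s+(?P<year>2[01]\d\d)\b.*?)?"
    r"(?:\s+Part\s+(?P<part>\d+))?"
    r"\.doc"
)


def new_name(filename):
    """Hypothetical helper: 'Xxx Yyy 2015 Zzz Part 3.doc' -> '2015 Xxx Yyy - 3.doc'."""
    m = PATTERN.fullmatch(filename)
    if m is None:
        return None  # the name does not end in ".doc"
    result = (m["year"] or "0000") + " " + m["name"]
    if m["part"]:
        result += " - " + m["part"]
    return result + ".doc"


for f in ["Text.doc", "Text Abc.doc", "Test 2015.doc", "Test Def 2015.doc",
          "Abc Part 1.doc", "Abc Def Part 1.doc", "Xyz 2015 Part 3.doc",
          "Xxx Yyy 2015 Zzz Part 3.doc",
          "Aaa 1 Bbb 2 Ccc 2015 Ddd 3 Eee 4 Part 3.doc", "Zzz.doc"]:
    print(f, "->", new_name(f))
```

The trick is the same as before: every piece after the lazy name is optional except ".doc", so the name grows only until the first spot where a year (or "Part", or the extension) can take over.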