RegExp help needed

Posted: Mon Aug 07, 2017 9:27 am
by Fangbeast
I've read the regexp tutorials, but they do my head in and I've only gotten so far, so I thought I'd yell for help.

I have a large(ish) collection of books stored on my hard drive and I want to import them into Calibre using its regexp engine so I don't have to retype everything.

Given a filename formatting like this:

Abbey, Lynn - [Thieves World New Series 01] - Turning Points (v1.0) (html).rar

Using this expression:

(?P<author>.+) - (?P<series>.+) - (?P<title>.+)

I correctly get Author, Series and title into the program in the right fields.

However, I still need to remove the square brackets ([]) from the series name, extract the series index from the series data into the correct Calibre field, and strip all the crap past the title.
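To make the problem concrete, here is a minimal Python sketch of what the simple pattern captures. It assumes Python's `re` module behaves like Calibre's engine for this pattern (Calibre is written in Python, and the `(?P<name>...)` syntax is Python's). The raw filename is matched as-is; whether Calibre strips the extension first is not checked here.

```python
import re

# Sample filename from the post above.
filename = "Abbey, Lynn - [Thieves World New Series 01] - Turning Points (v1.0) (html).rar"

# The simple three-field pattern from the post.
simple = re.compile(r"(?P<author>.+) - (?P<series>.+) - (?P<title>.+)")

m = simple.search(filename)
fields = m.groupdict() if m else {}

# repr() makes brackets and trailing junk easy to spot.
for name, value in fields.items():
    print(f"{name:7}: {value!r}")
```

The series keeps its brackets and index, and the title keeps everything after it, which is exactly what still needs fixing.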

They gave this as an example:

(?P<author>[^_-]+) -?\s*(?P<series>[^_0-9-]*)(?P<series_index>[0-9]*)\s*-\s*(?P<title>[^_].+) ?

Should I cut my wrists right now? (hopeful look)
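For what it's worth, the quoted example does not fit this filename layout. A short sketch, again assuming Python's `re` stands in for Calibre's engine: nothing in the pattern accounts for the closing `]` between the index and the dash, so the engine falls back to an empty series and pushes everything into the title.

```python
import re

filename = "Abbey, Lynn - [Thieves World New Series 01] - Turning Points (v1.0) (html).rar"

# The example pattern quoted in the post, unchanged.
calibre_example = re.compile(
    r"(?P<author>[^_-]+) -?\s*(?P<series>[^_0-9-]*)(?P<series_index>[0-9]*)"
    r"\s*-\s*(?P<title>[^_].+) ?"
)

m = calibre_example.search(filename)
result = m.groupdict() if m else {}

for name, value in result.items():
    print(f"{name:12}: {value!r}")
```

So the example is a starting point for unbracketed names such as `Author - Series 01 - Title`, not something to paste in for bracketed ones.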

Re: RegExp help needed

Posted: Mon Aug 07, 2017 10:05 am
by Marc56us
An approach (not complete)

(?P<author>[^_-]+)\s+-\s+\[(?P<series>[A-Za-z ]+)(?P<series_index>[0-9]*)\]\s+-\s+(?P<title>[A-Za-z ]+)

EnableExplicit

; Sample filename and the pattern with named groups
Define Text.s  = "Abbey, Lynn - [Thieves World New Series 01] - Turning Points (v1.0) (html).rar"
Define Regex.s = "(?P<author>[^_-]+)\s+-\s+\[(?P<series>[A-Za-z ]+)(?P<series_index>[0-9]*)\]\s+-\s+(?P<title>[A-Za-z ]+)"

; Stop if the pattern does not compile
If Not CreateRegularExpression(0, Regex)
     Debug "Regex KO: " + RegularExpressionError()
     End
EndIf

; Print the named groups of every match found in Text
If ExamineRegularExpression(0, Text)
     While NextRegularExpressionMatch(0)
          Debug "Author  : " + RegularExpressionNamedGroup(0, "author")
          Debug "Series  : " + RegularExpressionNamedGroup(0, "series")
          Debug "Index   : " + RegularExpressionNamedGroup(0, "series_index")
          Debug "Title   : " + RegularExpressionNamedGroup(0, "title")
     Wend
EndIf

FreeRegularExpression(0)

End

Output:

Author  : Abbey, Lynn
Series  : Thieves World New Series 
Index   : 01
Title   : Turning Points 
Hope this helps
:wink:
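Since the pattern is headed for Calibre, the same check can be sketched in Python, assuming Python's `re` matches what Calibre uses. It also shows that the series and title groups capture a trailing space (the character classes include the space), so trimming the values afterwards is a reasonable precaution. The `clean` dictionary is just an illustration, not anything Calibre does itself.

```python
import re

filename = "Abbey, Lynn - [Thieves World New Series 01] - Turning Points (v1.0) (html).rar"

# Marc56us's pattern, unchanged.
pattern = re.compile(
    r"(?P<author>[^_-]+)\s+-\s+\[(?P<series>[A-Za-z ]+)(?P<series_index>[0-9]*)\]"
    r"\s+-\s+(?P<title>[A-Za-z ]+)"
)

m = pattern.search(filename)
raw = m.groupdict() if m else {}

# Trim the stray spaces the character classes let through.
clean = {name: value.strip() for name, value in raw.items()}

for name in raw:
    print(f"{name:12}: raw={raw[name]!r} clean={clean[name]!r}")
```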

Re: RegExp help needed

Posted: Mon Aug 07, 2017 11:37 am
by Zebuddi123
Hi Fangbeast, I would just change the + quantifiers in Marc56us's regex to * (zero or more) to tolerate any mistakes in the spacing of the " - " separators, i.e.:

(?P<author>[^_-]+)\s*-\s*\[(?P<series>[A-Za-z ]+)(?P<series_index>[0-9]*)\]\s*-\s*(?P<title>[A-Za-z ]+)

zebuddi. :)
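A small Python sketch of the difference between the two variants, assuming Python's `re` as a stand-in for Calibre's engine. `tight_name` is a made-up variant of the sample filename with the spaces around the dashes removed. One side effect worth knowing: with `\s*`, the greedy author group is free to swallow the space before the dash, so the author comes back with a trailing space unless it is trimmed.

```python
import re

spaced_name = "Abbey, Lynn - [Thieves World New Series 01] - Turning Points (v1.0) (html).rar"
# Hypothetical badly spaced variant, for illustration only.
tight_name = "Abbey, Lynn-[Thieves World New Series 01]-Turning Points (v1.0) (html).rar"

# Marc56us's version: at least one whitespace character around each dash.
strict = re.compile(
    r"(?P<author>[^_-]+)\s+-\s+\[(?P<series>[A-Za-z ]+)(?P<series_index>[0-9]*)\]"
    r"\s+-\s+(?P<title>[A-Za-z ]+)"
)
# Zebuddi123's version: whitespace around each dash is optional.
loose = re.compile(
    r"(?P<author>[^_-]+)\s*-\s*\[(?P<series>[A-Za-z ]+)(?P<series_index>[0-9]*)\]"
    r"\s*-\s*(?P<title>[A-Za-z ]+)"
)

strict_on_tight = strict.search(tight_name)   # no match: the spaces are missing
loose_on_tight = loose.search(tight_name)     # matches
loose_on_spaced = loose.search(spaced_name)   # matches, author keeps a trailing space

print("strict on tight :", strict_on_tight)
print("loose on tight  :", loose_on_tight.groupdict())
print("loose on spaced :", repr(loose_on_spaced.group("author")))
```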

Re: RegExp help needed

Posted: Mon Aug 07, 2017 1:21 pm
by Fangbeast
Marc56us wrote:Hope this helps
It helps a hell of a lot. I may still have to fry my brains and crumb them, as they don't understand what you've done, but it helps.

Saves me months and months of retyping all details into my Calibre book library when your regexp can read the details from the properly formatted filenames.

I organise all my bills and receipts the same way. Very organised household :) :)

Re: RegExp help needed

Posted: Mon Aug 07, 2017 1:22 pm
by Fangbeast
Zebuddi123 wrote:Hi Fangbeast, I would just change the + quantifiers in Marc56us's regex to * (zero or more) to tolerate any mistakes in the spacing of the " - " separators, i.e.:

(?P<author>[^_-]+)\s*-\s*\[(?P<series>[A-Za-z ]+)(?P<series_index>[0-9]*)\]\s*-\s*(?P<title>[A-Za-z ]+)

zebuddi. :)
I don't understand it but I'll try it. That's how I code anyway (hehehehe)
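One limitation of both patterns above: the title group `[A-Za-z ]+` stops at the first digit, apostrophe or other punctuation. The sketch below is one possible way around that, not the thread's solution. It assumes the junk after the title always starts with `(`, that the series index is the last run of digits inside the brackets, and that Python's `re` behaves like Calibre's engine. `FULLER`, `parse` and the second filename are hypothetical names made up for the example.

```python
import re

# Hypothetical stricter pattern: lazy groups stop as early as possible,
# so no surrounding whitespace ends up inside the captured values.
FULLER = re.compile(
    r"(?P<author>[^_-]+?)\s*-\s*"                              # author, up to the first dash
    r"\[(?P<series>[^\]]+?)\s*(?P<series_index>[0-9]+)\]"     # [series name + index]
    r"\s*-\s*"
    r"(?P<title>[^(]+?)\s*(?:\(|$)"                           # title, up to "(" or end of name
)

def parse(name):
    """Return the four fields as a dict, or None if the name does not fit."""
    m = FULLER.match(name)
    if m is None:
        return None
    fields = m.groupdict()
    fields["series_index"] = int(fields["series_index"])  # "01" becomes 1
    return fields

sample = "Abbey, Lynn - [Thieves World New Series 01] - Turning Points (v1.0) (html).rar"
# Made-up filename with digits and apostrophes in the title.
tricky = "Doe, Jane - [Example Saga 2] - It's 5 O'Clock (v2.0).rar"

print(parse(sample))
print(parse(tricky))
```

Names without a parenthesised suffix fall through to the end-of-string branch, so the title would then include the file extension. Hyphenated author names still fail because of `[^_-]`, which is kept from the original patterns.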