RegExp help needed

Fangbeast
PureBasic Protozoa
Posts: 4747
Joined: Fri Apr 25, 2003 3:08 pm
Location: Not Sydney!!! (Bad water, no goats)

Post by Fangbeast »

I've read the regexp tutorials, but they do my head in and I've only gotten so far, so I thought I'd yell for help.

I have a large(ish) collection of books stored on my hard drive and I want to import them into Calibre using its RegExp engine so I don't have to retype everything.

Given a filename formatted like this:

Abbey, Lynn - [Thieves World New Series 01] - Turning Points (v1.0) (html).rar

Using this expression:

(?P<author>.+) - (?P<series>.+) - (?P<title>.+)

I correctly get Author, Series and title into the program in the right fields.

However, I need to remove the square brackets ([]) from the series name, extract the series index from the series data into the correct Calibre field, and strip all the crap past the title.
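For anyone who wants to see the problem rather than read about it: the (?P<name>...) group syntax is Python's, so the simple expression can be tried in a few lines of Python. This is only an illustrative sketch; the variable names are made up here and it is not Calibre's own import code.

```python
import re

filename = "Abbey, Lynn - [Thieves World New Series 01] - Turning Points (v1.0) (html).rar"

# The simple three-field expression from the post above.
simple = re.compile(r"(?P<author>.+) - (?P<series>.+) - (?P<title>.+)")

fields = simple.match(filename).groupdict()
for name, value in fields.items():
    print(name, "=", repr(value))
```

The series group still carries its brackets and index, and the title group still carries the "(v1.0) (html).rar" tail, which is exactly what needs fixing.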

They gave this as an example:

(?P<author>[^_-]+) -?\s*(?P<series>[^_0-9-]*)(?P<series_index>[0-9]*)\s*-\s*(?P<title>[^_].+) ?
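Before despairing, it may help to see what that example does with the filename above. A rough check in Python follows (assuming a plain search over the filename; how Calibre applies the pattern internally is not shown in this thread, and the variable names are invented for the sketch).

```python
import re

filename = "Abbey, Lynn - [Thieves World New Series 01] - Turning Points (v1.0) (html).rar"

# The example expression quoted above, split over two lines for readability.
calibre_example = re.compile(
    r"(?P<author>[^_-]+) -?\s*(?P<series>[^_0-9-]*)"
    r"(?P<series_index>[0-9]*)\s*-\s*(?P<title>[^_].+) ?"
)

example_fields = calibre_example.search(filename).groupdict()
for name, value in example_fields.items():
    print(name, "=", repr(value))
```

Nothing in the example can consume the "]" that follows the index, so the engine backtracks: series and series_index come back empty and the whole bracketed part ends up in the title. The example is written for unbracketed names, so the brackets need explicit \[ and \] in the pattern.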

Should I cut my wrists right now? (hopeful look)
Amateur Radio, D-STAR/VK3HAF
Marc56us
Addict
Posts: 1477
Joined: Sat Feb 08, 2014 3:26 pm

Re: RegExp help needed

Post by Marc56us »

An approach (not complete)

(?P<author>[^_-]+)\s+-\s+\[(?P<series>[A-Za-z ]+)(?P<series_index>[0-9]*)\]\s+-\s+(?P<title>[A-Za-z ]+)

EnableExplicit

; Sample filename and the pattern with named groups
Define Text.s  = "Abbey, Lynn - [Thieves World New Series 01] - Turning Points (v1.0) (html).rar"
Define Regex.s = "(?P<author>[^_-]+)\s+-\s+\[(?P<series>[A-Za-z ]+)(?P<series_index>[0-9]*)\]\s+-\s+(?P<title>[A-Za-z ]+)"

; Stop if the pattern does not compile
If Not CreateRegularExpression(0, Regex)
     Debug "Regex KO"
     End
EndIf

ExamineRegularExpression(0, Text)

; Print the named groups of every match
While NextRegularExpressionMatch(0)
     Debug "Author  : " + RegularExpressionNamedGroup(0, "author")
     Debug "Series  : " + RegularExpressionNamedGroup(0, "series")
     Debug "Index   : " + RegularExpressionNamedGroup(0, "series_index")
     Debug "Title   : " + RegularExpressionNamedGroup(0, "title")
Wend

FreeRegularExpression(0)

End

Output:

Author  : Abbey, Lynn
Series  : Thieves World New Series 
Index   : 01
Title   : Turning Points 
Hope this helps
:wink:
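The output above still has a trailing space after the series and the title, because the [A-Za-z ] classes also swallow the space before the index and before "(v1.0)". One possible tweak, not from the thread and only a sketch: make the series group lazy with a \s* after it, and force the title to end on a letter. Shown in Python since the group syntax is the same; the name "trimmed" is invented here.

```python
import re

filename = "Abbey, Lynn - [Thieves World New Series 01] - Turning Points (v1.0) (html).rar"

# Variant of the pattern above: lazy series plus \s*, title must end on a letter.
trimmed = re.compile(
    r"(?P<author>[^_-]+)\s+-\s+"
    r"\[(?P<series>[A-Za-z ]+?)\s*(?P<series_index>[0-9]*)\]"
    r"\s+-\s+(?P<title>[A-Za-z ]*[A-Za-z])"
)

trimmed_fields = trimmed.search(filename).groupdict()
for name, value in trimmed_fields.items():
    print(name, "=", repr(value))
```

Like the original, this only accepts letters and spaces in the series and title, so a title containing digits, apostrophes or other punctuation would be cut short at the first such character.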
Zebuddi123
Enthusiast
Posts: 794
Joined: Wed Feb 01, 2012 3:30 pm
Location: Nottinghamshire UK

Re: RegExp help needed

Post by Zebuddi123 »

Hi Fangbeast, I would just change the + quantifier in Marc56us's regex to * (zero or more) to tolerate any mistakes in the spacing around the " - " separators, i.e.:

(?P<author>[^_-]+)\s*-\s*\[(?P<series>[A-Za-z ]+)(?P<series_index>[0-9]*)\]\s*-\s*(?P<title>[A-Za-z ]+)
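A quick way to compare the two quantifiers is to run both patterns against a filename with the spaces around the separators missing. The sketch below uses Python (same group syntax); the names "strict", "tolerant", "tight" and "normal" are made up for the illustration.

```python
import re

# Marc56us's version (\s+) and the relaxed version (\s*).
strict = re.compile(
    r"(?P<author>[^_-]+)\s+-\s+\[(?P<series>[A-Za-z ]+)"
    r"(?P<series_index>[0-9]*)\]\s+-\s+(?P<title>[A-Za-z ]+)"
)
tolerant = re.compile(
    r"(?P<author>[^_-]+)\s*-\s*\[(?P<series>[A-Za-z ]+)"
    r"(?P<series_index>[0-9]*)\]\s*-\s*(?P<title>[A-Za-z ]+)"
)

tight = "Abbey, Lynn-[Thieves World New Series 01]-Turning Points (v1.0) (html).rar"
normal = "Abbey, Lynn - [Thieves World New Series 01] - Turning Points (v1.0) (html).rar"

print(strict.search(tight))                           # no match without the spaces
print(repr(tolerant.search(tight).group("author")))   # matches despite the spacing
print(repr(tolerant.search(normal).group("author")))  # note the trailing space
```

One side effect to watch for: because [^_-] also matches spaces, the \s* version leaves the space before the hyphen inside the author group on normally spaced filenames, so the author comes back as "Abbey, Lynn " with a trailing space.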

zebuddi. :)
malleo, caput, bang. Ego, comprehendunt in tempore
Fangbeast
PureBasic Protozoa
Posts: 4747
Joined: Fri Apr 25, 2003 3:08 pm
Location: Not Sydney!!! (Bad water, no goats)

Re: RegExp help needed

Post by Fangbeast »

Marc56us wrote:Hope this helps
It helps a hell of a lot. I may still have to fry my brain and crumb it, as it doesn't understand what you've done, but it helps.

It saves me months and months of retyping all the details into my Calibre book library when your regexp can read them from the properly formatted filenames.

I organise all my bills and receipts the same way. Very organised household :) :)
Amateur Radio, D-STAR/VK3HAF
Fangbeast
PureBasic Protozoa
Posts: 4747
Joined: Fri Apr 25, 2003 3:08 pm
Location: Not Sydney!!! (Bad water, no goats)

Re: RegExp help needed

Post by Fangbeast »

Zebuddi123 wrote:Hi Fangbeast, I would just change the + quantifier in Marc56us's regex to * (zero or more) to tolerate any mistakes in the spacing around the " - " separators, i.e.:

(?P<author>[^_-]+)\s*-\s*\[(?P<series>[A-Za-z ]+)(?P<series_index>[0-9]*)\]\s*-\s*(?P<title>[A-Za-z ]+)

zebuddi. :)
I don't understand it but I'll try it. That's how I code anyway (hehehehe)
Amateur Radio, D-STAR/VK3HAF