I've read the regexp tutorials but they do my head in and I've only gotten so far so I thought I'd yell for help.
I have a large(ish) collection of books stored on my hard drive and I want to import them into Calibre using their RegExp engine so I don't have to retype everything.
Given a filename formatting like this:
Abbey, Lynn - [Thieves World New Series 01] - Turning Points (v1.0) (html).rar
Using this expression:
(?P<author>.+) - (?P<series>.+) - (?P<title>.+)
I correctly get Author, Series and title into the program in the right fields.
However, I need to remove the square brackets ([]) from the series name, extract the series index from the series data into the correct Calibre field, and strip all the crap past the title.
They gave this as an example:
(?P<author>[^_-]+) -?\s*(?P<series>[^_0-9-]*)(?P<series_index>[0-9]*)\s*-\s*(?P<title>[^_].+) ?
Should I cut my wrists right now? (hopeful look)
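To make the problem concrete, here is a small Python sketch that runs both expressions from the post against the sample file name. It assumes Calibre evaluates the pattern the way Python's `re` module does (the `(?P<name>...)` syntax comes from there), and the variable names are only for illustration.

```python
import re

filename = "Abbey, Lynn - [Thieves World New Series 01] - Turning Points (v1.0) (html).rar"

# The poster's own expression: three greedy fields split on " - ".
simple = re.compile(r"(?P<author>.+) - (?P<series>.+) - (?P<title>.+)")

# The example expression quoted in the post, unchanged.
calibre_example = re.compile(
    r"(?P<author>[^_-]+) -?\s*(?P<series>[^_0-9-]*)(?P<series_index>[0-9]*)"
    r"\s*-\s*(?P<title>[^_].+) ?"
)

simple_fields = simple.search(filename).groupdict()
example_fields = calibre_example.search(filename).groupdict()

# repr() makes brackets, empty strings and stray spaces visible.
for label, fields in (("simple", simple_fields), ("example", example_fields)):
    print(label)
    for key, value in fields.items():
        print(f"  {key}: {value!r}")
```

On this file name the simple expression keeps the brackets in the series and the "(v1.0) (html).rar" tail in the title. The quoted example fares worse: it knows nothing about the brackets, so it leaves the series empty and pushes everything after the author into the title. Neither can be used as is.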
RegExp help needed
- Fangbeast
- PureBasic Protozoa
- Posts: 4747
- Joined: Fri Apr 25, 2003 3:08 pm
- Location: Not Sydney!!! (Bad water, no goats)
Amateur Radio, D-STAR/VK3HAF
Re: RegExp help needed
An approach (not complete)
Hope this helps
Code: Select all
(?P<author>[^_-]+)\s+-\s+\[(?P<series>[A-Za-z ]+)(?P<series_index>[0-9]*)\]\s+-\s+(?P<title>[A-Za-z ]+)
Code: Select all
EnableExplicit
; Sample file name and the pattern under test
Define Text.s = "Abbey, Lynn - [Thieves World New Series 01] - Turning Points (v1.0) (html).rar"
Define Regex.s = "(?P<author>[^_-]+)\s+-\s+\[(?P<series>[A-Za-z ]+)(?P<series_index>[0-9]*)\]\s+-\s+(?P<title>[A-Za-z ]+)"
; Stop if the pattern does not compile
If Not CreateRegularExpression(0, Regex)
  Debug "Regex KO"
  End
EndIf
; Print the named groups of every match found in Text
ExamineRegularExpression(0, Text)
While NextRegularExpressionMatch(0)
  Debug "Author : " + RegularExpressionNamedGroup(0, "author")
  Debug "Series : " + RegularExpressionNamedGroup(0, "series")
  Debug "Index : " + RegularExpressionNamedGroup(0, "series_index")
  Debug "Title : " + RegularExpressionNamedGroup(0, "title")
Wend
FreeRegularExpression(0)
End
Code: Select all
Author : Abbey, Lynn
Series : Thieves World New Series
Index : 01
Title : Turning Points
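Since the pattern is destined for Calibre rather than PureBasic, a rough Python cross-check of the same expression may be useful. This is only a sketch assuming Python's `re` semantics. It prints each field with `repr()` so that whitespace, which the Debug output above cannot show, becomes visible.

```python
import re

text = "Abbey, Lynn - [Thieves World New Series 01] - Turning Points (v1.0) (html).rar"

# Same expression as in the PureBasic test, split over two lines for readability.
pattern = re.compile(
    r"(?P<author>[^_-]+)\s+-\s+\[(?P<series>[A-Za-z ]+)"
    r"(?P<series_index>[0-9]*)\]\s+-\s+(?P<title>[A-Za-z ]+)"
)

match = pattern.search(text)
fields = match.groupdict() if match else {}

# repr() shows trailing spaces that a plain print would hide.
for name in ("author", "series", "series_index", "title"):
    print(f"{name:13}: {fields.get(name)!r}")
```

The series and title groups come back with a trailing space, because the `[A-Za-z ]+` class also accepts the space just before the index and just before "(v1.0)". Whether Calibre trims captured fields is not something this sketch checks. Titles containing digits or punctuation would also be cut short at the first such character, which may be part of why the approach is described as not complete.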
- Zebuddi123
- Enthusiast
- Posts: 794
- Joined: Wed Feb 01, 2012 3:30 pm
- Location: Nottinghamshire UK
- Contact:
Re: RegExp help needed
Hi Fangbeast, I would just replace the + quantifier in Marc56us's regex with * (zero or more) to tolerate any mistakes in the spacing of the " - " separators, i.e.:
(?P<author>[^_-]+)\s*-\s*\[(?P<series>[A-Za-z ]+)(?P<series_index>[0-9]*)\]\s*-\s*(?P<title>[A-Za-z ]+)
zebuddi.
malleo, caput, bang. Ego, comprehendunt in tempore
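One side effect of the `\s*` form is worth seeing in practice: once the whitespace is optional, the greedy author group is free to keep the space before the dash. The Python sketch below (again assuming Python `re` semantics) compares the relaxed expression with a hypothetical tightened variant that uses lazy quantifiers. The `tightened` pattern and the second file name are illustrations, not something proposed in the thread.

```python
import re

text = "Abbey, Lynn - [Thieves World New Series 01] - Turning Points (v1.0) (html).rar"
# Hypothetical file name with no spaces around the dashes.
cramped = "Abbey, Lynn-[Thieves World 02]-Some Title.rar"

# The relaxed expression from the post above (\s* instead of \s+).
relaxed = re.compile(
    r"(?P<author>[^_-]+)\s*-\s*\[(?P<series>[A-Za-z ]+)"
    r"(?P<series_index>[0-9]*)\]\s*-\s*(?P<title>[A-Za-z ]+)"
)

# Illustrative variant: lazy author and series groups leave the surrounding
# whitespace to \s*, and the title is forced to end on a letter.
tightened = re.compile(
    r"(?P<author>[^_-]+?)\s*-\s*\[(?P<series>[A-Za-z ]+?)\s*"
    r"(?P<series_index>[0-9]*)\]\s*-\s*(?P<title>[A-Za-z ]*[A-Za-z])"
)

relaxed_fields = relaxed.search(text).groupdict()
tightened_fields = tightened.search(text).groupdict()
cramped_fields = tightened.search(cramped).groupdict()

for label, fields in (("relaxed", relaxed_fields),
                      ("tightened", tightened_fields),
                      ("tightened, cramped name", cramped_fields)):
    print(label, fields)
```

The tightened variant still only accepts letters and spaces in the series and title, so names with digits, apostrophes or commas in those fields would need a wider character class.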
- Fangbeast
- PureBasic Protozoa
- Posts: 4747
- Joined: Fri Apr 25, 2003 3:08 pm
- Location: Not Sydney!!! (Bad water, no goats)
Re: RegExp help needed
Marc56us wrote: Hope this helps
It helps a hell of a lot. I may still have to fry my brain and crumb it, as it doesn't understand what you've done, but it helps.
Saves me months and months of retyping all details into my Calibre book library when your regexp can read the details from the properly formatted filenames.
I organise all my bills and receipts the same way. Very organised household :) :)
- Fangbeast
- PureBasic Protozoa
- Posts: 4747
- Joined: Fri Apr 25, 2003 3:08 pm
- Location: Not Sydney!!! (Bad water, no goats)
Re: RegExp help needed
Zebuddi123 wrote: Hi Fangbeast, I would just replace the + quantifier in Marc56us's regex with * (zero or more) to tolerate any mistakes in the spacing of the " - " separators, i.e.:
(?P<author>[^_-]+)\s*-\s*\[(?P<series>[A-Za-z ]+)(?P<series_index>[0-9]*)\]\s*-\s*(?P<title>[A-Za-z ]+)
zebuddi.
I don't understand it but I'll try it. That's how I code anyway (hehehehe)