Hashing Optical Media for Unique IDs?
Posted: Sat Jun 08, 2013 11:28 pm
Hi all.
I am looking for a simple, efficient, and accurate method to create a unique ID for a DVD or Blu-ray optical disc. However, I am not sure where to begin, as this is not something I have attempted before. I also have a few requirements for how I'd like it to work:
1: Must be able to tell the difference between a DVD and a Blu-ray disc. This cannot be based on the hardware/drive, because of more complex cases like combo DVD/Blu-ray reading/burning drives.
2: Needs to uniquely ID different movie titles (Batman vs Star Wars), but also treat two physical copies of the SAME title as the same entry (if I put in my Batman Blu-ray and someone else puts in their Batman Blu-ray, they should match, because they are the same movie).
3: Possibly need a way to detect that a submitted file was not generated from an actual disc, to help prevent duplicate entries and also erroneous entries by troublemakers (you NEED to own the disc to submit).
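For #1, one drive-independent idea is to look at the directory layout of the mounted disc: DVD-Video discs carry a VIDEO_TS folder containing VIDEO_TS.IFO, and Blu-ray movie discs carry a BDMV folder containing index.bdmv. The snippet below is only a rough sketch of that check. The function names are made up for illustration, and it assumes the disc is already mounted and readable as a normal folder.

```python
from pathlib import Path


def _find_child(parent, name):
    """Case-insensitive lookup of a direct child; returns a Path or None."""
    if not parent.is_dir():
        return None
    for child in parent.iterdir():
        if child.name.lower() == name.lower():
            return child
    return None


def detect_disc_type(mount_point):
    """Return 'bluray', 'dvd', or 'unknown' based on the folder layout.

    Assumption: mount_point is the root of a mounted disc (or a full copy
    of one). Nothing here talks to the drive itself.
    """
    root = Path(mount_point)

    # Blu-ray movie discs: BDMV/index.bdmv
    bdmv = _find_child(root, "BDMV")
    if bdmv is not None and _find_child(bdmv, "index.bdmv") is not None:
        return "bluray"

    # DVD-Video discs: VIDEO_TS/VIDEO_TS.IFO
    video_ts = _find_child(root, "VIDEO_TS")
    if video_ts is not None and _find_child(video_ts, "VIDEO_TS.IFO") is not None:
        return "dvd"

    return "unknown"
```

The lookup is case-insensitive because some systems show the disc's file names in lower case. A disc that has both folders would be reported as Blu-ray here, which is an arbitrary choice.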
#3 I can probably work out on my own, but I am looking for help/guidance on #1 and #2. Is there any part of a commercially stamped DVD or Blu-ray disc that will guarantee I can correctly match two copies of the same movie, but still differentiate them from a later release such as a "Special Edition" or "Director's Cut"?
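One candidate for #2, offered purely as an untested assumption: hash the small navigation files rather than the whole disc. Copies stamped from the same master should carry byte-identical IFO files (DVD) or .mpls playlists (Blu-ray), while a re-authored release would normally differ. Whether that holds across regions and re-pressings is exactly the part I can't confirm. A minimal sketch, with a made-up function name:

```python
import hashlib
from pathlib import Path


def structure_fingerprint(mount_point, suffixes=(".ifo", ".mpls")):
    """MD5 over the names and contents of the disc's navigation files.

    Assumption: mount_point is the root of a mounted disc. Returns a hex
    string, or None if no matching files were found.
    """
    root = Path(mount_point)

    # Sort by upper-cased relative path so the order does not depend on
    # how the operating system lists or cases the file names.
    files = sorted(
        (p for p in root.rglob("*") if p.is_file() and p.suffix.lower() in suffixes),
        key=lambda p: p.relative_to(root).as_posix().upper(),
    )
    if not files:
        return None

    digest = hashlib.md5()
    for path in files:
        rel = path.relative_to(root).as_posix().upper()
        digest.update(rel.encode("utf-8"))
        digest.update(b"\0")  # separator between file name and file data
        with open(path, "rb") as fh:
            for chunk in iter(lambda: fh.read(65536), b""):
                digest.update(chunk)
    return digest.hexdigest()
```

These files are small, so this reads far less data than hashing the video streams would.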
As for identifying them in the DB and keeping entries unique, I was thinking of a simple MD5 of whatever part of the disc turns out to be unique (the disc title?) combined with an MD5 of the name the user entered for the title.
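To make the idea concrete, here is a minimal sketch of that two-part key. It assumes the "unique part of the disc" is simply the volume label, which is only a placeholder until something better is found, and the function names are invented for illustration.

```python
import hashlib


def normalize_title(title):
    """Lower-case the title and collapse whitespace so trivial typing
    differences do not change the hash."""
    return " ".join(title.casefold().split())


def make_disc_id(disc_label, user_title):
    """Combine an MD5 of the disc label with an MD5 of the user's title.

    Assumption: disc_label is the volume label read from the disc; any
    other disc-derived string could be passed in instead.
    """
    label_md5 = hashlib.md5(disc_label.encode("utf-8")).hexdigest()
    title_md5 = hashlib.md5(normalize_title(user_title).encode("utf-8")).hexdigest()
    return f"{label_md5}-{title_md5}"


if __name__ == "__main__":
    print(make_disc_id("BATMAN_BEGINS", "Batman Begins"))
```

Note that the second half depends on what the user typed, so two different spellings of the same title still produce different IDs even after normalization.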
If anyone can offer ideas or guidance, it would be greatly appreciated.
What prompted me to think about starting this project
A while ago I set myself up an HTPC, and as I have an on/off relationship with video encoding and the Doom9 forums, one thing that has struck me is that there is no reliable source for DVD and Blu-ray chapters on the Internet. Containers such as MKV (mp4 too?) support chapters containing both the traditional time stops and text names, yet this is a hugely neglected field.
There is one project out there, called "Chapter Extractor", which can both pull chapter timings from a disc and reference Internet databases for timing/name information. However, it is done in "free time" like most projects these days, and is IMHO woefully lacking, although I do use it when possible. Specifically, it lacks a robust system for identifying accurate records and a simple way to upload or update existing records. It also has opaque rules for what it considers a valid vs invalid name (for the record itself).
For instance, I painstakingly took the time to copy the chapter names from my Princess Bride: 25th Anniversary Edition Blu-ray. I saved my local chapters file and the program uploaded it. Then I realized I had a typo in one of the chapter names. So I popped my disc back in and opened it, the program found and matched the entry for my disc in the database, and I corrected the typo and re-saved my file. Saving triggers an upload, yet it appears I can't even update/correct record entries that came from MY account/API key!!
So I'm thinking about making my own database. It'll be local to start with, and if I get far enough to implement a better, more functional user experience, then I'll start thinking about maintaining an online database, or something like that.