Hi, I am developing intelligent file format identification software, and I am looking for any code that reads file structures, to make the identification smarter. It works on several levels:
1. Identify the file type by a known signature (like MZ for an executable, BM for a bitmap, etc.).
2. For known formats such as executables and bitmaps, load their structures and check that they are correct.
3. For text files, search for defined strings and identify their encoding and content (for example XHTML 1.0 by the text DOCTYPE HTML PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd").
4. For archives, read the list of files stored in the archive (so far I have code for zip and rar, and I am searching for, or programming, more types).
5. ....
If you have some useful code, please post it here.
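As a rough illustration of level 4 for ZIP archives only, here is a minimal sketch in Python (chosen for brevity, not because the software is written in it). The function name list_zip_members is hypothetical. It relies on the standard zipfile module, which looks for the end-of-central-directory record instead of trusting the leading "PK" bytes.

```python
import io
import zipfile
from typing import List, Optional


def list_zip_members(data: bytes) -> Optional[List[str]]:
    """Return the names stored in a ZIP archive, or None if data is not a ZIP."""
    stream = io.BytesIO(data)
    # is_zipfile searches for the end-of-central-directory record,
    # so a file that merely starts with "PK" is rejected.
    if not zipfile.is_zipfile(stream):
        return None
    stream.seek(0)
    with zipfile.ZipFile(stream) as archive:
        return archive.namelist()
```

RAR and other archive types would need their own readers; nothing comparable ships in the Python standard library.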
File Format Identification
Re: File Format Identification
File format information: http://www.wotsit.org/
Checking the structure of a file for errors is a huge project in itself for some file formats, like Portable Executable (.exe).
Re: File Format Identification
Hi,
I know wotsit.org. My software is not intended for checking files for errors. I load the structures in order to check the content of files. If you use only signature checking, the detection can be fooled. If you detect an executable file by the two characters MZ, then a text file that starts with MZ can be identified as an executable (it's only an example). But if you load the IMAGE_DOS_HEADER structure and check its values, then you can confirm the file as an executable.
Peter
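A minimal sketch of that idea in Python, under stated assumptions: the name looks_like_pe is hypothetical, and the check goes only as far as following e_lfanew (the field at offset 0x3C of the 64-byte IMAGE_DOS_HEADER) to the "PE\0\0" signature. Plain DOS executables have no PE header and would be rejected by it, so it confirms Win32 executables only.

```python
import struct

DOS_HEADER_SIZE = 64     # size of IMAGE_DOS_HEADER in bytes
E_LFANEW_OFFSET = 0x3C   # offset of the e_lfanew field inside that header


def looks_like_pe(data: bytes) -> bool:
    """Check the MZ signature, then follow e_lfanew to the PE signature."""
    if len(data) < DOS_HEADER_SIZE or data[:2] != b"MZ":
        return False
    # e_lfanew is a signed 32-bit little-endian file offset.
    (e_lfanew,) = struct.unpack_from("<i", data, E_LFANEW_OFFSET)
    # Assumption: the PE header starts after the DOS header. Hand-crafted
    # "tiny" executables can overlap the two and would be rejected here.
    if e_lfanew < DOS_HEADER_SIZE or e_lfanew + 4 > len(data):
        return False
    return data[e_lfanew:e_lfanew + 4] == b"PE\0\0"
```

A text file that happens to start with MZ fails the second step, because the bytes at offset 0x3C almost never form an offset that lands on "PE\0\0".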
Re: File Format Identification
peterb wrote: I know wotsit.org. My software is not intended for checking files for errors. I load the structures in order to check the content of files. If you use only signature checking, the detection can be fooled. If you detect an executable file by the two characters MZ, then a text file that starts with MZ can be identified as an executable (it's only an example). But if you load the IMAGE_DOS_HEADER structure and check its values, then you can confirm the file as an executable.
Ah OK, I misunderstood you.
Checking the values might be problematic. You need a deep understanding of the file format and its content if you check for possible values. I see a big potential for false negatives there.
As for identifying a text file: that's just impossible. A plain text file isn't a file format. It's just a way of viewing a file. Every file can be viewed as a text file.
Re: File Format Identification
Thorium wrote: I see a big potential for false negatives there.
And false positives, too.

Re: File Format Identification
SVG, HTML, XML, ... are text files with a defined format, and you can determine which format it is.
I didn't write about PLAIN text.
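To make that concrete, here is a small sketch in Python. It is only an illustration: the marker table, the 4096-byte probe size, and the name identify_text_format are all assumptions, and a real detector would need many more markers. It guesses the encoding from a byte order mark (falling back to UTF-8), then searches the start of the file for known strings.

```python
import codecs

# Longer BOMs come first so UTF-32 LE is not mistaken for UTF-16 LE.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

# Hypothetical marker table (lowercase substring, label); most specific first.
MARKERS = [
    ("-//w3c//dtd xhtml 1.0", "XHTML 1.0"),
    ("<!doctype html", "HTML"),
    ("<html", "HTML"),
    ("<svg", "SVG"),
    ("<?xml", "XML"),
]


def identify_text_format(data: bytes, probe_size: int = 4096):
    """Return (encoding, label); label is None when no marker is found."""
    head = data[:probe_size]
    encoding = "utf-8"  # assumed default when there is no BOM
    for bom, name in BOMS:
        if head.startswith(bom):
            encoding = name
            break
    text = head.decode(encoding, errors="replace").lower()
    for marker, label in MARKERS:
        if marker in text:
            return encoding, label
    return encoding, None
```

The order of the table matters: an XHTML file usually begins with an XML declaration, so the generic "<?xml" test has to come last.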

Re: File Format Identification
Hi peterb,
Here are some examples of determining some graphics file formats (in Blitz, easily portable to PB of course)...
http://www.blitzbasic.com/codearcs/code ... ?code=2477
(Search for "File format tests" on that page to find the right part -- GotBMP, GotGIF, GotJPEG, etc.)
Here's some code that reads the Gimp's .xcf file format:
http://www.blitzbasic.com/codearcs/code ... ?code=2585
(Search for "Main function code" to find the read section.)
Finally, here's some Win32 executable reading code, though it sounds like you have this covered...
http://www.blitzbasic.com/codearcs/code ... ?code=2628
(I had to do some hacking with the structures as Blitz unfortunately doesn't have the option to use plain structs like PB, but it'll be easy to see how to use it from PB. As you'll see in the comments, I was helped by code from 'thefool' on this forum originally!)
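For readers who just want the general shape of such tests, here is a short Python sketch. It is not a port of the linked Blitz code; the function name identify_image and the extra BMP header checks are my own assumptions. The PNG, GIF and JPEG signatures are distinctive on their own, while the two-byte "BM" is backed up with a look at the rest of the 14-byte BITMAPFILEHEADER.

```python
import struct


def identify_image(data: bytes):
    """Return 'PNG', 'GIF', 'JPEG', 'BMP' or None based on the file header."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "GIF"
    if data.startswith(b"\xff\xd8\xff"):
        return "JPEG"
    if data[:2] == b"BM" and len(data) >= 14:
        # BITMAPFILEHEADER: bfType, bfSize, bfReserved1, bfReserved2, bfOffBits
        reserved1, reserved2, off_bits = struct.unpack_from("<HHI", data, 6)
        # Assumption: reserved fields are zero and the pixel data offset
        # lies inside the file. Sloppy writers could fail this check.
        if reserved1 == 0 and reserved2 == 0 and 14 <= off_bits <= len(data):
            return "BMP"
    return None
```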