Topic: Reading a given type of file / trial and error?


Post by Zach »

Hi,

I have another question, in what I am sure will continue to be a long line of absolutely n00b questions... But I'm not a Pro, so forgive me!

I am a little curious about File formats. There are a LOT of files out there these days, and a lot of seemingly different "formats" of files (*.EXT).
I have always wondered how people "come up with" a format to store their data, and why there are so many file extensions.

Of course there are the easy cases where someone has just renamed a text file or some such.. But it seems to me like we only really have a handful of ways we can actually write data to a file? (Text / Binary / and?????).

I have always been interested in the idea of opening up files and peeking at what is inside... usually it's just a bunch of ugly looking symbols or "null" symbols, etc. Or I get lucky and come across a renamed text file..

But what makes the true "file formats" like say a JPEG file, or a Photoshop .PSD file, or a 3D Model from a modelling app, or a Microsoft Word document; what makes them so different from each other?? How do people "reverse engineer" file readers for a format they really want to crack open so they can look at the data?

This interest mainly comes out of all the older utilities from the early days I would play with, which were specifically made to open game resource files, or open individual files and show you the image inside it, or play the sound, or whatever.

Do you need some really learned skills to build your own file reader? From my perspective it would probably be hard enough for me even if I were trying to build a reader for an established file type with an Open Standard that is published.. I can't imagine how people can do it for proprietary or non-published formats; or how the process even starts in either case.

I really like NES and other console music for instance, and people have utilities that will aid in extracting the song data from the actual ROMS; but even doing that with the tools made for you, they say a background in ASM (or 6502 ASM in particular for the NES) and stuff like that is strongly recommended, or even required.


I just find this whole subject very interesting.. I have so many ideas for tutorials that experienced PB coders could be writing, it's not funny (and yes, building your own file reader/parser is one of them).

Re: Topic: Reading a given type of file / trial and error?

Post by IdeasVacuum »

First port of call is the published format description/definition, and many are freely available on the internet. If the format is not published, then it is likely that the authors of the file type do not want you to read their files without using their software, so you could be waltzing into trouble (big company = big trouble).

However, if the format definition is not in the public domain, the thing to do first is ask the authors! I needed to read/write a proprietary format just recently; it was not published (and isn't going to be), but the owners have given me the details anyway (with reasonable caveats).

Working out the format of a file is easier if you have the means to write to the format. For example, if it's a vector graphics file format, you could save a file that consisted only of a single entity, say a line of specific length, and then you could read that file in a text/hex editor and work out how your line is described. Tedious, but not that difficult once you have established some basics.
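
The save-one-thing-and-compare idea can be sketched in a few lines. This is only an illustration, written in Python rather than PureBasic. The helper name `diff_bytes` and the tiny "VEC1" format (a 4-byte tag followed by a 16-bit little-endian line length) are invented for the example and do not describe any real file format.

```python
import struct


def diff_bytes(a: bytes, b: bytes):
    """Return (offset, byte_in_a, byte_in_b) for every position where the
    two byte strings differ. A byte missing from the shorter input is None."""
    diffs = []
    for offset in range(max(len(a), len(b))):
        x = a[offset] if offset < len(a) else None
        y = b[offset] if offset < len(b) else None
        if x != y:
            diffs.append((offset, x, y))
    return diffs


# Two hypothetical saves of the same drawing: one line, length 100 vs 200.
sample_100 = b"VEC1" + struct.pack("<H", 100)
sample_200 = b"VEC1" + struct.pack("<H", 200)

for offset, old, new in diff_bytes(sample_100, sample_200):
    print(f"offset {offset}: {old} -> {new}")
```

Here the only byte that changes is at offset 4, which suggests that is where the length is stored. With real files you would read both saves in binary mode (for example `open(path, "rb").read()`) and pass the results to the same function, then repeat with more samples to confirm each guess.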

Re: Topic: Reading a given type of file / trial and error?

Post by MachineCode »

IdeasVacuum wrote: If the format is not published, then it is likely that the authors of the file type do not want you to read their files without using their software, so you could be waltzing into trouble (big company = big trouble).
Nope. Anyone can legally study a file format and then make use of it once they work it out. This is assuming they don't reverse engineer the software used to create the file, but rather just study the file itself through trial and error.

Re: Topic: Reading a given type of file / trial and error?

Post by IdeasVacuum »

I think you are right in most cases, MachineCode, but the laws do vary between countries, and an important aspect seems to be how much work/technology has been put into trying to prevent reverse engineering. In some circumstances it could be fair that a company should try to defend their files, especially if the format amounts to an advantage over competitors, or it is specifically designed to protect their customers' sensitive data. However, even if it is not fair, my point about big companies holds true: they can ruin smaller companies financially with legal action that the smaller fry, especially one-man-band developers, simply cannot afford to combat.

Some interesting links on the subject of reverse engineering file formats:

http://en.wikibooks.org/wiki/Reverse_En ... le_Formats
http://en.wikipedia.org/wiki/File_format

A list that includes links to published formats:
http://en.wikipedia.org/wiki/List_of_file_formats

Create a list of web sources for a specific format:
http://file-extension.net/seeker/

Re: Topic: Reading a given type of file / trial and error?

Post by Zach »

Not that I don't appreciate a good debate on ethics and protectionism in file formats, but I was trying to approach the topic from a more mechanical aspect versus philosophical...

"Not SHOULD I, but HOW do I?" (where do you start, good reading, whatever, etc...)

Re: Topic: Reading a given type of file / trial and error?

Post by IdeasVacuum »

Well, I have delivered some links for the how-to, Zach. It's always important to qualify anything that might infringe laws. I'm sure the majority of PB developers really value this forum and PB itself. If the forum gives the impression that it supports illegal activity, then it could be shut down. I think that would be a disaster.

Re: Topic: Reading a given type of file / trial and error?

Post by Danilo »

You can find information on many existing file formats at http://wotsit.org/

Re: Topic: Reading a given type of file / trial and error?

Post by Trond »

Zach wrote: But it seems to me like we only really have a handful of ways we can actually write data to a file? (Text / Binary / and?????).
Text is just a subset of binary. As far as the OS is concerned, a file is just a sequence of bytes, and a byte is the smallest unit you can read or write. A plain text file is one where every byte has a value between 32 and 255, or is 9 (tab), 10 (line feed) or 13 (carriage return).
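
That rule of thumb is easy to try out. The following is a rough sketch in Python (not PureBasic), and the function name `looks_like_plain_text` is made up for the example. It takes 32, the space character, as the lowest printable value and allows tab, line feed and carriage return as the only control bytes. It assumes a single-byte encoding, so UTF-16 text, for instance, would be reported as binary.

```python
# Control bytes that are normal in plain text: tab, line feed, carriage return.
TEXT_CONTROL_BYTES = {9, 10, 13}


def looks_like_plain_text(data: bytes) -> bool:
    """Heuristic only: True if every byte is 32..255 or one of 9, 10, 13.
    An empty input has no offending bytes, so it also returns True."""
    return all(b >= 32 or b in TEXT_CONTROL_BYTES for b in data)


print(looks_like_plain_text(b"Hello, world!\r\n"))    # ordinary text
print(looks_like_plain_text(b"\x89PNG\r\n\x1a\n"))    # the 8-byte PNG signature
```

The first call prints True. The second prints False because the PNG signature contains the byte 26 (hex 1A), which is neither printable nor one of the three allowed control bytes. To test a real file, read it in binary mode and pass the bytes in.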