Detect file type by content?

Little John
Addict
Posts: 4781
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Detect file type by content?

Post by Little John »

Hi,

I have “inherited” an external hard disk with several files that have no extension in their name. I assume that these are mainly files which were created with Word, Excel or PowerPoint, as well as raster graphics. Do these files have a header or something that allows them to be clearly identified programmatically?
StarBootics
Addict
Posts: 1006
Joined: Sun Jul 07, 2013 11:35 am
Location: Canada

Re: Detect file type by content?

Post by StarBootics »

Maybe they have a magic number at the beginning that you can try to read.

Besides that, I don't know what else could be used to identify the files.
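
For illustration, here is a minimal Python sketch of the magic-number idea: read the first few bytes of a file and compare them against a small table of well-known signatures. The table and the names `SIGNATURES`, `sniff_bytes` and `sniff_file` are assumptions made up for this example; the list is far from complete, and a matching signature only suggests a type, it does not prove it.

```python
# Well-known signatures found at offset 0. Illustrative only, not exhaustive.
# Longer, more specific signatures come first; "BM" is only two bytes long,
# so a match on it is a weak hint at best.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (e.g. legacy doc/xls/ppt)"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP container (e.g. docx/xlsx/pptx)"),
    (b"II*\x00", "TIFF image (little-endian)"),
    (b"MM\x00*", "TIFF image (big-endian)"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"BM", "BMP image (weak match)"),
]


def sniff_bytes(head: bytes) -> str:
    """Return a label for the first signature that matches the given bytes."""
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return "unknown"


def sniff_file(path) -> str:
    """Read the first 16 bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return sniff_bytes(handle.read(16))
```

Calling `sniff_file` on each extensionless file gives a first rough sorting; anything reported as a ZIP container or OLE2 file needs a closer look to tell Word, Excel and PowerPoint apart.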

Best regards
StarBootics
The Stone Age did not end due to a shortage of stones !
jacdelad
Addict
Posts: 2001
Joined: Wed Feb 03, 2021 12:46 pm
Location: Riesa

Re: Detect file type by content?

Post by jacdelad »

TrID: https://mark0.net/soft-trid-e.html
...if you don't necessarily want to program something yourself.
Good morning, that's a nice tnetennba!

PureBasic 6.21/Windows 11 x64/Ryzen 7900X/32GB RAM/3TB SSD
Synology DS1821+/DX517, 130.9TB+50.8TB+2TB SSD
ChrisR
Addict
Posts: 1466
Joined: Sun Jan 08, 2017 10:27 pm
Location: France

Re: Detect file type by content?

Post by ChrisR »

TrID - File Identifier looks like a very interesting tool :)

Re: Detect file type by content?

Post by jacdelad »

I've used it several times. It's a bit fiddly, but since we're programmers and its output should be machine-readable, it should be easy to build some kind of GUI around it.
AZJIO
Addict
Posts: 2166
Joined: Sun May 14, 2017 1:48 am

Re: Detect file type by content?

Post by AZJIO »

Little John wrote: Sat Sep 21, 2024 7:33 am Word, Excel or PowerPoint
These are zip archives. I previously wrote a program to replace text in Word files: I unpacked the file using the zip module, made the replacement, packed it back, and it worked; the files still opened.
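
A rough Python sketch of how the zip structure could be used for identification, assuming the files are in the newer docx/xlsx/pptx formats (the legacy doc/xls/ppt formats are not zip archives). Such packages contain a `[Content_Types].xml` entry plus a top-level `word/`, `xl/` or `ppt/` folder. The names `OOXML_PARTS` and `classify_zip` are made up for this example.

```python
import zipfile

# Top-level folder inside the archive -> application that wrote it.
OOXML_PARTS = {
    "word/": "Word (docx)",
    "xl/": "Excel (xlsx)",
    "ppt/": "PowerPoint (pptx)",
}


def classify_zip(source) -> str:
    """Classify a path or binary file object by looking at its zip entries."""
    if not zipfile.is_zipfile(source):
        return "not a zip archive"
    with zipfile.ZipFile(source) as archive:
        names = archive.namelist()
    # Every Office Open XML package carries this entry at the top level.
    if "[Content_Types].xml" not in names:
        return "plain zip archive"
    for prefix, label in OOXML_PARTS.items():
        if any(name.startswith(prefix) for name in names):
            return label
    return "package of unknown kind"
```

This only inspects entry names, so it is cheap, but a deliberately crafted archive could fool it.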

Re: Detect file type by content?

Post by Little John »

jacdelad wrote: Sat Sep 21, 2024 9:37 am TrID: https://mark0.net/soft-trid-e.html
...if you don't necessarily want to program something yourself.
That looks very promising. I'll see how far I get with it next week in the office. Many thanks for the tip!
Many thanks also to StarBootics for the link to the magic numbers!
NicTheQuick
Addict
Posts: 1514
Joined: Sun Jun 22, 2003 7:43 pm
Location: Germany, Saarbrücken

Re: Detect file type by content?

Post by NicTheQuick »

Under Linux you can simply use the "file" tool to recognize any file. Linux does it that way anyway.
Or you can also try "binwalk", which can also find other files within a binary data blob. This is especially interesting when you want to reverse engineer file formats or firmware files and stuff like that.
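
To give an idea of what "finding files within a blob" means, here is a toy Python sketch that searches a byte string for a few known signatures at any offset. This is not how binwalk is implemented; it is only a simplified illustration of the idea, and `EMBEDDED_SIGNATURES` and `scan_blob` are names invented for this example. Short signatures will produce false positives on real data.

```python
# Signatures to look for anywhere in the data. Illustrative, not exhaustive.
EMBEDDED_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"\xff\xd8\xff": "JPEG image",
    b"PK\x03\x04": "ZIP local file header",
    b"%PDF-": "PDF document",
}


def scan_blob(blob: bytes):
    """Return sorted (offset, label) pairs for every signature occurrence."""
    hits = []
    for magic, label in EMBEDDED_SIGNATURES.items():
        start = blob.find(magic)
        while start != -1:
            hits.append((start, label))
            # Continue searching just past the current hit.
            start = blob.find(magic, start + 1)
    return sorted(hits)
```

Each reported offset is only a candidate; whether a valid file really starts there has to be checked by parsing from that position.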
The english grammar is freeware, you can use it freely - But it's not Open Source, i.e. you can not change it or publish it in altered way.

Re: Detect file type by content?

Post by Little John »

NicTheQuick wrote: Sat Sep 21, 2024 8:25 pm Under Linux you can simply use the "file" tool to recognize any file. Linux does it that way anyway.
Or you can also try "binwalk", which can also find other files within a binary data blob. This is especially interesting when you want to reverse engineer file formats or firmware files and stuff like that.
This is valuable information, thank you. It seems that I am too focused on Windows. :x
DarkDragon
Addict
Posts: 2345
Joined: Mon Jun 02, 2003 9:16 am
Location: Germany

Re: Detect file type by content?

Post by DarkDragon »

Little John wrote: Sat Sep 21, 2024 10:43 pm
NicTheQuick wrote: Sat Sep 21, 2024 8:25 pm Under Linux you can simply use the "file" tool to recognize any file. Linux does it that way anyway.
Or you can also try "binwalk", which can also find other files within a binary data blob. This is especially interesting when you want to reverse engineer file formats or firmware files and stuff like that.
This is valuable information, thank you. It seems that I am too focused on Windows. :x
I use file on Windows, too, if I have to. Usually within Cygwin, but nowadays you could try WSL/WSL2. There's also MinGW: https://sourceforge.net/projects/mingw/ ... le-5.04-1/
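
If "file" is on the PATH (for example inside Cygwin or WSL), it can also be driven from a script. Below is a small Python sketch, assuming the script runs in the same environment as the tool; `identify_with_file` is a name made up for this example. The `--brief` and `--mime-type` options make the output easy to parse.

```python
import shutil
import subprocess


def identify_with_file(path):
    """Return the MIME type reported by `file`, or None if that is not possible."""
    # Bail out quietly when the tool is not installed in this environment.
    if shutil.which("file") is None:
        return None
    result = subprocess.run(
        ["file", "--brief", "--mime-type", "--", str(path)],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()
```

Looping over a directory and calling this function per file yields a list of MIME types that can then be mapped to extensions.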
bye,
Daniel

Re: Detect file type by content?

Post by Little John »

DarkDragon wrote: Sun Sep 22, 2024 7:44 pm I use file on Windows, too, if I have to. Usually within Cygwin, but nowadays you could try WSL/WSL2. There's also MinGW: https://sourceforge.net/projects/mingw/ ... le-5.04-1/
I wanted to try WSL anyway, but had almost forgotten about it. Thanks for reminding me! :-)
Quin
Addict
Posts: 1132
Joined: Thu Mar 31, 2022 7:03 pm
Location: Colorado, United States

Re: Detect file type by content?

Post by Quin »

Little John wrote: Mon Sep 23, 2024 6:26 pm
DarkDragon wrote: Sun Sep 22, 2024 7:44 pm I use file on Windows, too, if I have to. Usually within Cygwin, but nowadays you could try WSL/WSL2. There's also MinGW: https://sourceforge.net/projects/mingw/ ... le-5.04-1/
I wanted to try WSL anyway, but had almost forgotten about it. Thanks for reminding me! :-)
Believe it or not, all my browsing of the PB forums is done within WSL, using edbrowse.
I have to disable JavaScript, or the ads make it hang for about 30 seconds on every page load, but that's expected for edbrowse.