Detect file type by content?
-
- Addict
- Posts: 4781
- Joined: Thu Jun 07, 2007 3:25 pm
- Location: Berlin, Germany
Detect file type by content?
Hi,
I have “inherited” an external hard disk with a number of files whose names have no extension. I assume they are mostly files created with Word, Excel or PowerPoint, plus some raster graphics. Do these files have a header or some other feature that allows them to be reliably identified programmatically?
- StarBootics
- Addict
- Posts: 1006
- Joined: Sun Jul 07, 2013 11:35 am
- Location: Canada
Re: Detect file type by content?
Maybe they have a magic number at the beginning that you can read.
Besides that, I don't know what else could be used to identify the files.
Best regards
StarBootics
The Stone Age did not end due to a shortage of stones !
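Reading the magic number only takes a few lines. A minimal Python sketch, assuming a small hand-picked signature table (the names `SIGNATURES`, `sniff` and `sniff_file` are illustrative, and real tools know far more formats). It also shows the catch for Office files: all Office 2007+ files (.docx, .xlsx, .pptx) start with the generic ZIP signature, and all older .doc/.xls/.ppt files share the OLE2 compound-file signature, so the header alone narrows a file down to a family but cannot tell Word from Excel.

```python
import sys
from pathlib import Path

# Hypothetical signature table: (leading bytes, description).
# Only a handful of common formats; real tools such as TrID or "file" know thousands.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"II*\x00", "TIFF image (little-endian)"),
    (b"MM\x00*", "TIFF image (big-endian)"),
    (b"%PDF-", "PDF document"),
    # Office 2007+ files are ZIP packages, so they only match the ZIP signature.
    (b"PK\x03\x04", "ZIP archive (possibly .docx/.xlsx/.pptx)"),
    # Legacy Office files use the OLE2 compound file format.
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (possibly .doc/.xls/.ppt)"),
    # "BM" is only two bytes, so it is checked last and can give false positives.
    (b"BM", "BMP image"),
]


def sniff(data: bytes) -> str:
    """Return the description of the first signature that prefixes data."""
    for magic, description in SIGNATURES:
        if data.startswith(magic):
            return description
    return "unknown"


def sniff_file(path: Path) -> str:
    # 16 bytes is enough for every signature in the table above.
    with open(path, "rb") as f:
        return sniff(f.read(16))


if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(f"{name}: {sniff_file(Path(name))}")
```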
Re: Detect file type by content?
TrID: https://mark0.net/soft-trid-e.html
...if you don't necessarily want to program something yourself.
Good morning, that's a nice tnetennba!
PureBasic 6.21/Windows 11 x64/Ryzen 7900X/32GB RAM/3TB SSD
Synology DS1821+/DX517, 130.9TB+50.8TB+2TB SSD
Re: Detect file type by content?
TrID - File Identifier looks like a very interesting tool.

Re: Detect file type by content?
I've used it several times. It's a bit fiddly, but since we're programmers and its output is machine-readable, it should be easy to build some kind of GUI on top of it.
Good morning, that's a nice tnetennba!
PureBasic 6.21/Windows 11 x64/Ryzen 7900X/32GB RAM/3TB SSD
Synology DS1821+/DX517, 130.9TB+50.8TB+2TB SSD
Re: Detect file type by content?
A Word file (.docx) is a zip archive. I once wrote a program to replace text in Word files: I unpacked the file with the zip module, made the replacement, and then packed it back up. It worked, and the files opened fine.
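Because .docx, .xlsx and .pptx files all start with the same ZIP signature, the next step is to look inside the archive. A minimal Python sketch using the standard `zipfile` module, assuming the part names that Office normally writes (`word/document.xml`, `xl/workbook.xml`, `ppt/presentation.xml`); a strictly correct reader would follow `_rels/.rels` to locate the main part instead. The function name `ooxml_kind` is hypothetical.

```python
import io
import zipfile
from typing import Optional

# Conventional main-part names written by Microsoft Office (an assumption:
# the OOXML packaging rules allow other names, resolved via _rels/.rels).
OOXML_MARKERS = {
    "word/document.xml": "docx",
    "xl/workbook.xml": "xlsx",
    "ppt/presentation.xml": "pptx",
}


def ooxml_kind(source) -> Optional[str]:
    """Return 'docx', 'xlsx' or 'pptx' for an OOXML package, else None.

    `source` may be a path or a seekable binary file-like object."""
    if not zipfile.is_zipfile(source):
        return None
    with zipfile.ZipFile(source) as zf:
        names = set(zf.namelist())
    # Every OOXML package has a [Content_Types].xml at its root.
    if "[Content_Types].xml" not in names:
        return None
    for marker, kind in OOXML_MARKERS.items():
        if marker in names:
            return kind
    return None


if __name__ == "__main__":
    # Build a tiny fake .docx-like package in memory to exercise the check.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("word/document.xml", "<w:document/>")
    buf.seek(0)
    print(ooxml_kind(buf))  # docx
```

Legacy .doc/.xls/.ppt files are not covered here; telling those apart means reading the OLE2 directory, which is more work.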
-
- Addict
- Posts: 4781
- Joined: Thu Jun 07, 2007 3:25 pm
- Location: Berlin, Germany
Re: Detect file type by content?
> jacdelad wrote: Sat Sep 21, 2024 9:37 am
> TrID: https://mark0.net/soft-trid-e.html
> ...if you don't necessarily want to program something yourself.
That looks very promising. I'll see how far I get with it next week in the office. Many thanks for the tip!
Many thanks also to StarBootics for the link to the magic numbers!
- NicTheQuick
- Addict
- Posts: 1514
- Joined: Sun Jun 22, 2003 7:43 pm
- Location: Germany, Saarbrücken
Re: Detect file type by content?
Under Linux you can simply use the "file" tool to identify a file by its content. That's how Linux does it anyway.
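From a program, the simplest way to use "file" is to call it and read its output. A minimal Python sketch, assuming a `file` executable on the PATH; `--brief` and `--mime-type` are standard `file` options, and the function name `mime_type` is only for illustration.

```python
import shutil
import subprocess
import sys
from typing import Optional


def mime_type(path: str) -> Optional[str]:
    """Return the MIME type reported by the external `file` tool.

    Returns None if no `file` executable is found on the PATH."""
    if shutil.which("file") is None:
        return None
    # --brief drops the "path: " prefix; --mime-type prints only e.g. "image/png".
    result = subprocess.run(
        ["file", "--brief", "--mime-type", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(f"{name}: {mime_type(name)}")
```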
You can also try "binwalk", which can even find other files embedded in a binary data blob. This is especially useful when you want to reverse engineer file formats, firmware files and the like.
The english grammar is freeware, you can use it freely - But it's not Open Source, i.e. you can not change it or publish it in altered way.
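The idea behind binwalk, looking for known signatures at every offset rather than only at the start, can be sketched in a few lines. A naive Python sketch with a hypothetical four-entry table (`EMBEDDED_SIGNATURES` and `scan` are illustrative names); binwalk itself uses a much larger signature set and validates what it finds, so expect false positives from this version, especially for short signatures such as the 3-byte JPEG one.

```python
import sys
from typing import List, Tuple

# Hypothetical mini-table of signatures to search for anywhere in the blob.
EMBEDDED_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"\xff\xd8\xff": "JPEG image",
    b"PK\x03\x04": "ZIP local file header",
    b"%PDF-": "PDF document",
}


def scan(blob: bytes) -> List[Tuple[int, str]]:
    """Return (offset, description) for every signature occurrence, by offset."""
    hits = []
    for magic, description in EMBEDDED_SIGNATURES.items():
        start = blob.find(magic)
        while start != -1:
            hits.append((start, description))
            start = blob.find(magic, start + 1)
    hits.sort()
    return hits


if __name__ == "__main__":
    for name in sys.argv[1:]:
        with open(name, "rb") as f:
            for offset, description in scan(f.read()):
                print(f"{name}: 0x{offset:08x} {description}")
```

Note that a ZIP archive with several entries produces one "ZIP local file header" hit per entry.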
-
- Addict
- Posts: 4781
- Joined: Thu Jun 07, 2007 3:25 pm
- Location: Berlin, Germany
Re: Detect file type by content?
> NicTheQuick wrote: Sat Sep 21, 2024 8:25 pm
> Under Linux you can simply use the "file" tool to identify a file by its content. That's how Linux does it anyway.
> You can also try "binwalk", which can even find other files embedded in a binary data blob. This is especially useful when you want to reverse engineer file formats, firmware files and the like.
This is valuable information, thank you. It seems that I am too focused on Windows.

-
- Addict
- Posts: 2345
- Joined: Mon Jun 02, 2003 9:16 am
- Location: Germany
Re: Detect file type by content?
> Little John wrote: Sat Sep 21, 2024 10:43 pm
>> NicTheQuick wrote: Sat Sep 21, 2024 8:25 pm
>> Under Linux you can simply use the "file" tool to identify a file by its content. That's how Linux does it anyway.
>> You can also try "binwalk", which can even find other files embedded in a binary data blob. This is especially useful when you want to reverse engineer file formats, firmware files and the like.
> This is valuable information, thank you. It seems that I am too focused on Windows.
I use "file" on Windows, too, if I have to, usually within Cygwin. Nowadays you could also try WSL/WSL2. There's also MinGW: https://sourceforge.net/projects/mingw/ ... le-5.04-1/
bye,
Daniel
-
- Addict
- Posts: 4781
- Joined: Thu Jun 07, 2007 3:25 pm
- Location: Berlin, Germany
Re: Detect file type by content?
> DarkDragon wrote: Sun Sep 22, 2024 7:44 pm
> I use "file" on Windows, too, if I have to, usually within Cygwin. Nowadays you could also try WSL/WSL2. There's also MinGW: https://sourceforge.net/projects/mingw/ ... le-5.04-1/
I wanted to try WSL anyway, but had almost forgotten about it. Thanks for reminding me!

Re: Detect file type by content?
> Little John wrote: Mon Sep 23, 2024 6:26 pm
>> DarkDragon wrote: Sun Sep 22, 2024 7:44 pm
>> I use "file" on Windows, too, if I have to, usually within Cygwin. Nowadays you could also try WSL/WSL2. There's also MinGW: https://sourceforge.net/projects/mingw/ ... le-5.04-1/
> I wanted to try WSL anyway, but had almost forgotten about it. Thanks for reminding me!
Believe it or not, all my browsing of the PB forums is done within WSL, using edbrowse.
I have to disable JavaScript, or the ads make it hang for about 30 seconds on every page load, but that's expected for edbrowse.