Detect file type by content?
Posted: Sat Sep 21, 2024 7:33 am
by Little John
Hi,
I have “inherited” an external hard disk with several files that have no extension in their name. I assume that these are mainly files which were created with Word, Excel or PowerPoint, as well as raster graphics. Do these files have a header or something that allows them to be clearly identified programmatically?
Re: Detect file type by content?
Posted: Sat Sep 21, 2024 8:52 am
by StarBootics
Maybe they have a magic number at the beginning that you can try to read.
Besides that, I don't know what else could be used to identify the files.
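A minimal sketch of the magic-number idea in Python, assuming a small hand-picked signature table. The names `SIGNATURES`, `guess_type_from_bytes` and `guess_type` are made up for illustration, and the table is far from complete:

```python
# A few well-known leading byte signatures. Illustrative only, not exhaustive.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"II*\x00", "TIFF image (little-endian)"),
    (b"MM\x00*", "TIFF image (big-endian)"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP container (e.g. .docx/.xlsx/.pptx)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (e.g. legacy .doc/.xls/.ppt)"),
    # Only two bytes long, so false positives are possible; checked last.
    (b"BM", "BMP image"),
]


def guess_type_from_bytes(head: bytes) -> str:
    """Return a label for the first matching signature, or 'unknown'."""
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return "unknown"


def guess_type(path) -> str:
    """Read the first bytes of a file and guess its type from them."""
    with open(path, "rb") as f:
        return guess_type_from_bytes(f.read(16))
```

A signature match only narrows things down. A ZIP or OLE2 header, for example, says "container", not which Office application wrote the file.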
Best regards
StarBootics
Re: Detect file type by content?
Posted: Sat Sep 21, 2024 9:37 am
by jacdelad
TrID:
https://mark0.net/soft-trid-e.html
...if you don't necessarily want to program something yourself.
Re: Detect file type by content?
Posted: Sat Sep 21, 2024 1:00 pm
by ChrisR
TrID - File Identifier looks like a very interesting tool

Re: Detect file type by content?
Posted: Sat Sep 21, 2024 1:21 pm
by jacdelad
I've used it several times. It's a bit fiddly, but since we're programmers and its output should be machine-readable, it should be easy to build some kind of GUI around it.
Re: Detect file type by content?
Posted: Sat Sep 21, 2024 3:55 pm
by AZJIO
Little John wrote: Sat Sep 21, 2024 7:33 am
Word, Excel or PowerPoint
These are zip archives, at least in the newer formats (.docx, .xlsx, .pptx). I previously wrote a program to replace text in Word files: I unpacked the file using the zip module, made the replacement, and packed it back. It worked, and the files opened.
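A rough sketch of how the zip approach could tell the three Office types apart, written in Python rather than PB. It assumes the usual layout of the newer Office formats, where the main content sits in a top-level `word/`, `xl/` or `ppt/` folder next to `[Content_Types].xml`. The function name and the labels are made up for illustration, and legacy .doc/.xls/.ppt files are not zip archives, so they come back as `None` here:

```python
import zipfile

# Top-level folder of the main content -> likely Office type.
# Assumption: the typical layout written by Microsoft Office.
OOXML_FOLDERS = {
    "word/": "Word (.docx)",
    "xl/": "Excel (.xlsx)",
    "ppt/": "PowerPoint (.pptx)",
}


def guess_office_type(path_or_file):
    """Return a label for zip-based Office files, 'zip (other)' for
    other archives, or None if the input is not a zip archive at all."""
    if not zipfile.is_zipfile(path_or_file):
        return None
    with zipfile.ZipFile(path_or_file) as zf:
        names = zf.namelist()
    # Office packages carry this part at the archive root.
    if "[Content_Types].xml" not in names:
        return "zip (other)"
    for folder, label in OOXML_FOLDERS.items():
        if any(name.startswith(folder) for name in names):
            return label
    return "Office package (unknown kind)"
```

Looking only at the archive's name list avoids unpacking anything, which should be enough for sorting a disk full of unnamed files.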
Re: Detect file type by content?
Posted: Sat Sep 21, 2024 4:43 pm
by Little John
That looks very promising. I'll see how far I get with it next week in the office. Many thanks for the tip!
Many thanks also to StarBootics for the link to the magic numbers!
Re: Detect file type by content?
Posted: Sat Sep 21, 2024 8:25 pm
by NicTheQuick
Under Linux you can simply use the "file" tool to recognize any file. Linux does it that way anyway.
Or you can also try "binwalk", which can also find other files within a binary data blob. This is especially interesting when you want to reverse engineer file formats or firmware files and stuff like that.
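If you want to call "file" from your own program, a thin wrapper could look like the Python sketch below. It assumes a "file" binary on the PATH that supports the `--brief` and `--mime-type` options. The helper name `file_mime_type` is made up, and the function simply returns `None` when the tool is missing:

```python
import shutil
import subprocess


def file_mime_type(path):
    """Ask the external 'file' tool for a MIME type.

    Returns None if 'file' is not installed or the call fails.
    """
    exe = shutil.which("file")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--brief", "--mime-type", str(path)],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    # Note: 'file' may still exit with 0 and print an error message
    # (e.g. for unreadable files), so treat the text with some care.
    return result.stdout.strip() or None
```

Looping this over a directory and grouping the results by MIME type would give a quick overview of what is on the disk.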
Re: Detect file type by content?
Posted: Sat Sep 21, 2024 10:43 pm
by Little John
NicTheQuick wrote: Sat Sep 21, 2024 8:25 pm
Under Linux you can simply use the "file" tool to recognize any file. Linux does it that way anyway.
Or you can also try "binwalk", which can also find other files within a binary data blob. This is especially interesting when you want to reverse engineer file formats or firmware files and stuff like that.
This is valuable information, thank you. It seems that I am too focused on Windows.

Re: Detect file type by content?
Posted: Sun Sep 22, 2024 7:44 pm
by DarkDragon
Little John wrote: Sat Sep 21, 2024 10:43 pm
NicTheQuick wrote: Sat Sep 21, 2024 8:25 pm
Under Linux you can simply use the "file" tool to recognize any file. Linux does it that way anyway.
Or you can also try "binwalk", which can also find other files within a binary data blob. This is especially interesting when you want to reverse engineer file formats or firmware files and stuff like that.
This is valuable information, thank you. It seems that I am too focused on Windows.
I use "file" on Windows, too, if I have to. Usually within Cygwin, but nowadays you could try WSL/WSL2. There's also MinGW:
https://sourceforge.net/projects/mingw/ ... le-5.04-1/
Re: Detect file type by content?
Posted: Mon Sep 23, 2024 6:26 pm
by Little John
I wanted to try WSL anyway, but had almost forgotten about it. Thanks for reminding me!

Re: Detect file type by content?
Posted: Mon Sep 23, 2024 7:02 pm
by Quin
Little John wrote: Mon Sep 23, 2024 6:26 pm
I wanted to try WSL anyway, but had almost forgotten about it. Thanks for reminding me!
Believe it or not, all my browsing of the PB forums is done within WSL, using edbrowse.
I have to disable JavaScript or the ads make it hang for like 30 seconds on every page load, but that's expected for edbrowse.