Decode e.g. docx via an iFilter?

forumuser
User
Posts: 98
Joined: Wed Apr 18, 2018 8:24 am

Decode e.g. docx via an iFilter?

Post by forumuser »

Hi,

Has anyone ever tried to get the textual content of an arbitrary document format (e.g. *.docx, *.pptx)
via an installed iFilter under MS Windows? MS Office automatically installs the necessary iFilters
for the document types it supports.

query.dll contains the functions LoadIFilter / LoadIFilterEx, but judging from the parameter descriptions:
; "pwcsPath" - Pointer to the full path of an object for which an IFilter Interface pointer is to be returned
; "pUnkOuter" - Pointer to the controlling IUnknown Interface of the aggregate in which this storage object exists
; "ppIUnk" - Pointer to an output variable that receives the IFilter Interface pointer
it seems the whole thing can only be handled via COM (or in PB's case: COMate Plus)?
ivega718
User
Posts: 15
Joined: Mon Feb 25, 2013 9:29 pm

Re: Decode e.g. docx via an iFilter?

Post by ivega718 »

You can read DOCX, XLSX and PPTX files directly.
Just rename the file to a .zip extension and unzip it, then open the text files it contains. Inside you will also find image files and other resources. The content is stored as XML, which can be read with many of the tools on this forum.
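
For the DOCX case, the unzip-and-read-XML approach could look roughly like the sketch below. It is written in Python rather than PureBasic, purely as an illustration. It assumes the main text lives in word/document.xml inside the package and that reading the w:t text runs of each w:p paragraph is enough. The helper name docx_text is made up for this example. Headers, footnotes and text boxes are not handled, and XLSX / PPTX use different part names and would need their own code.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(path):
    """Return the body text of a .docx, one line per paragraph.

    Hypothetical helper for illustration; accepts a file name or a
    file-like object, since zipfile.ZipFile handles both.
    """
    # A .docx is already a ZIP package, so it can be opened without renaming.
    with zipfile.ZipFile(path) as package:
        root = ET.fromstring(package.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Join the text of all <w:t> runs found below this <w:p> paragraph.
        runs = (node.text or "" for node in para.iter(W_NS + "t"))
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

Calling docx_text("report.docx") (the file name is just a placeholder) would then return the paragraphs as a single string.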
forumuser
User
Posts: 98
Joined: Wed Apr 18, 2018 8:24 am

Re: Decode e.g. docx via an iFilter?

Post by forumuser »

Thanks, but this is not what I'm looking for.

iFilters exist for a lot of document formats, e.g. *.msg and *.pdf, and even for the metadata of images.

They allow easy and fast access to the text inside those document types without the user having
to extract archives, scan files for content (e.g. look at the files of an extracted .msg type...) or read
and parse XML structures.