
Decode e.g. docx via an iFilter?

Posted: Mon May 14, 2018 4:46 pm
by forumuser
Hi,

has anyone ever tried to get the (textual) content of an arbitrary format (e.g. *.docx, *.pptx, etc.)
via an (installed) iFilter under MS Windows? MS Office automatically installs the necessary iFilters
for the document types it supports.

query.dll contains the functions LoadIFilter / LoadIFilterEx, but judging from the parameter descriptions:
; "pwcsPath" - Pointer to the full path of an object for which an IFilter Interface pointer is to be returned
; "pUnkOuter" - Pointer to the controlling IUnknown Interface of the aggregate in which this storage object exists
; "ppIUnk" - Pointer to an output variable that receives the IFilter Interface pointer
it seems the whole thing can only be handled via COM (or in PB's case: COMate Plus)?

Re: Decode e.g. docx via an iFilter?

Posted: Mon May 14, 2018 11:34 pm
by ivega718
You can read DOCX, XLSX and PPTX files.
Just rename the file to a .zip extension and unzip it. Then open the text files inside; you will also find image files and other parts there. The content is stored as XML, which can be read with many of the tools on this forum.
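
A rough sketch of the unzip approach, in Python rather than PureBasic, just to illustrate the idea. It assumes the main text of a .docx lives in the archive member word/document.xml and that the visible text sits in w:t elements inside w:p paragraphs. The names docx_text and make_sample_docx are made up for this example, and the sample archive built here is only a stand-in for a real document, not a complete .docx.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_text(source):
    """Return the paragraph text of a .docx given a path or a file-like object."""
    # A .docx is a ZIP archive, so it can be opened directly without renaming it.
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # Join all text runs that belong to this paragraph.
        runs = [node.text or "" for node in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


def make_sample_docx(lines):
    """Build a tiny in-memory archive holding only word/document.xml (demo only)."""
    body = "".join(f"<w:p><w:r><w:t>{line}</w:t></w:r></w:p>" for line in lines)
    xml = f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(docx_text(make_sample_docx(["Hello", "World"])))
```

For a real file you would call docx_text with its path instead of the in-memory sample. Note that this only covers the main document body; headers, footnotes and the other Office formats keep their text in different parts of the archive.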

Re: Decode e.g. docx via an iFilter?

Posted: Tue May 15, 2018 6:33 am
by forumuser
Thanks but this is not what I'm looking for.

iFilters exist for a lot of document formats, e.g. *.msg, *.pdf, and even for the metadata of images.

They allow easy and fast access to the text inside those document types without the need for the user
to extract archives, scan files for content (e.g. look at the files of an extracted .msg type...) or read
and parse XML structures.