Extract XML data from MS .docx file
- Always Here
- Posts: 6426
- Joined: Fri Oct 23, 2009 2:33 am
- Location: Wales, UK
Does anybody know how to extract the XML data from an MS .docx file programmatically?
IdeasVacuum
If it sounds simple, you have not grasped the complexity.
Re: Extract XML data from MS .docx file
A .docx file is nothing more than a zipped folder, so you can use PB's packer functions to open it and extract the content (just like opening a ZIP file).
More about the content here: http://forensicswiki.org/wiki/Word_Document_(DOCX)
Re: Extract XML data from MS .docx file
Yeah, I had assumed it was, similar to an XLSX file.
I have in the recent past processed XLSX files with code like this:
Code:
;Open the packed file
If OpenPack(#Pack, sgCopyXLSX, #PB_PackerPlugin_Zip)
  If ExaminePack(#Pack)
    While NextPackEntry(#Pack)
      ;Look for the worksheet by name and extract it
      If FindString(PackEntryName(#Pack), "sheet1.xml", 1, #PB_String_NoCase)
        sFileXML.s = PackEntryName(#Pack)
        UncompressPackFile(#Pack, sFileXML, PackEntryName(#Pack))
        Break
      EndIf
    Wend
  EndIf
  ClosePack(#Pack)
EndIf
In that case, I knew the names of the packed files required. I don't know the names of the files stored in the DOCX files to be processed, so I tried to start by listing all the packed files, but no joy at the moment; no filenames are returned:
Code:
#Pack = 0
sDOCX.s = "C:\MY TEMP\Sample.docx"
If OpenPack(#Pack, sDOCX, #PB_PackerPlugin_Zip)
  If ExaminePack(#Pack)
    While NextPackEntry(#Pack)
      Debug PackEntryName(#Pack)
    Wend
  EndIf
  ClosePack(#Pack)
EndIf
The very reliable Bandizip does not recognize them as Zip files either, so perhaps they have been scrambled.
Edit: Nope, SoftMaker Office (TextMaker) opens the files with ease, so clearly they can be found.
IdeasVacuum
If it sounds simple, you have not grasped the complexity.
Re: Extract XML data from MS .docx file
Open Bandizip from "All Programs" and choose your file from there.
You'll see the folders:
Code:
_rels
docProps
word
Norm.
google Translate;Makes my jokes fall flat- Fait mes blagues tombent à plat- Machte meine Witze verpuffen- Eh cumpari ci vo sunari
Re: Extract XML data from MS .docx file
Hi normeus - Bandizip reputation intact
Bandizip also works if the file extension is changed to .zip, but that makes no difference to PB's Pack function.
7zip can do it effortlessly, so I could use the command line version via RunProgram. Though it would be nice to be able to use PB's Pack function.
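For reference, the 7-Zip route might look roughly like the sketch below. It is untested against the files in question, and several things are assumptions: the install path of 7z.exe, the output folder name, and the reuse of the sample path from the earlier post. The switches used (x, -o, -y) are standard 7-Zip command line options.

```purebasic
; Sketch only: unpack a DOCX with the 7-Zip command line tool.
; Assumptions: 7z.exe is installed at this path; the output folder is illustrative.
s7z.s   = "C:\Program Files\7-Zip\7z.exe"             ; assumed install location
sDOCX.s = "C:\MY TEMP\Sample.docx"                    ; sample path from the earlier post
sOut.s  = GetTemporaryDirectory() + "docx_unpacked"   ; hypothetical output folder

; x = extract with full paths, -o<dir> = output directory, -y = answer yes to all prompts
sParams.s = "x " + Chr(34) + sDOCX + Chr(34) + " -o" + Chr(34) + sOut + Chr(34) + " -y"

; Wait for 7-Zip to finish and keep its console window hidden
If RunProgram(s7z, sParams, "", #PB_Program_Wait | #PB_Program_Hide)
  Debug "7-Zip has finished, look in: " + sOut
Else
  Debug "Could not start 7z.exe"
EndIf
```

This only shows that 7-Zip was launched and has ended; checking whether the extraction actually succeeded would need the exit code or a look at the output folder.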
IdeasVacuum
If it sounds simple, you have not grasped the complexity.
Re: Extract XML data from MS .docx file
IdeasVacuum: Your code (or at least what you've presented here) doesn't call UseZipPacker(). You need to call that before trying to examine a DOCX or XLSX file's contents.
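A minimal sketch of the listing code with that call added is shown below. The path is the sample path from the earlier post; the output filename and the part name "word/document.xml" are assumptions (it is the usual name of the main document part in a DOCX, but nobody in this thread has confirmed it for these files).

```purebasic
; Sketch: list every entry in a DOCX and extract the main document part.
UseZipPacker()   ; register the ZIP plugin before any OpenPack() call

#Pack = 0
sDOCX.s = "C:\MY TEMP\Sample.docx"
sOut.s  = GetTemporaryDirectory() + "document.xml"   ; hypothetical output file

If OpenPack(#Pack, sDOCX, #PB_PackerPlugin_Zip)
  If ExaminePack(#Pack)
    While NextPackEntry(#Pack)
      Debug PackEntryName(#Pack)
      ; Assumed part name; adjust it once the real entry names are known
      If FindString(PackEntryName(#Pack), "word/document.xml", 1, #PB_String_NoCase)
        ; With no third parameter, the entry currently being examined is extracted
        If UncompressPackFile(#Pack, sOut) = -1
          Debug "Extraction failed"
        EndIf
      EndIf
    Wend
  EndIf
  ClosePack(#Pack)
Else
  Debug "OpenPack failed: " + sDOCX
EndIf
```

The Debug output should then show entries under the _rels, docProps and word folders that Bandizip displays.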