[Solved] Read a String inside a PDF file

Fig
Enthusiast
Posts: 352
Joined: Thu Apr 30, 2009 5:23 pm
Location: Côtes d'Azur, France

Post by Fig »

I have thousands of PDF files with exotic names.
Inside each file there is a string I would like to read in order to rename the file.

I obviously can't do it manually.
I read that PDF is a compressed format... Is there any way to read a string from a PDF file in PureBasic? (A user library, maybe?)
Last edited by Fig on Sun Apr 26, 2020 6:19 pm, edited 1 time in total.
There are 2 methods to program bugless.
But only the third works fine.

Win10, Pb x64 5.71 LTS
BarryG
Addict
Posts: 4121
Joined: Thu Apr 18, 2019 8:17 am

Re: Read a String inside a PDF file

Post by BarryG »

If you can link to one of those PDFs, I would like to try to read it (with PureBasic) and see if I can help.

Re: Read a String inside a PDF file

Post by Fig »

Unfortunately, they are at work. I will bring one on Monday.
They are metrological reports for instruments.

Thank you.
GeoTrail
Addict
Posts: 2794
Joined: Fri Feb 13, 2004 12:45 am
Location: Bergen, Norway

Re: Read a String inside a PDF file

Post by GeoTrail »

Did you find a solution for reading PDF files?
I Stepped On A Cornflake!!! Now I'm A Cereal Killer!
loulou2522
Enthusiast
Posts: 542
Joined: Tue Oct 14, 2014 12:09 pm

Re: Read a String inside a PDF file

Post by loulou2522 »

VeryPDF offers a PDF to TXT Converter program:
http://www.verypdf.com/app/pdf-to-txt-c ... index.html
It costs only $38 and lets you quickly convert a PDF into a text file from the command line; after that, you only have to search for your word in the text file.
I hope this answers your question.
You can test the program with a free version, which can be downloaded from the site.
Here are all the command-line options explained.
Command line usage:

Code: Select all

PDF2TXT <input PDF file> [output TXT file] [-logfile] [-open] [-space] [-html] [-format] [-silent] [-blankline] [-summary] [-zoom <num>] [-?] [-h]

<input PDF file>      : Open an existing PDF file to convert.
[output TXT file]     : Write to TEXT file, the default is same filename of input PDF file.
[-first <page number>]: Specify the first page number.
[-last  <page number>]: Specify the last page number.
[-logfile]            : Write log to "C:\pdf2txt.log" file.
[-open]               : Auto open the text file after it be created.
[-space]              : Auto insert spaces into text file.
[-html]               : Output to a HTML file, not a text file.
[-format]             : Keep the page layout in the generated TXT file.
[-silent]             : Disable error and warning messages.
[-blankline]          : Auto delete blank line in the generated TXT file.
[-summary]            : Get PDF document summary.
[-zoom <num>]         : Set zoom ratio, the range is from 50 to 200.
[-unicode]            : Create UTF-8 encoding text file.
[-?]                  : Help.
[-h]                  : Help.

For example:

C:\>PDF2TXT C:\input.pdf
C:\>PDF2TXT C:\input.pdf -unicode
C:\>PDF2TXT C:\input.pdf -first 10 -last 12
C:\>PDF2TXT C:\input.pdf C:\output.txt
C:\>PDF2TXT C:\input.pdf -open -silent -logfile  -zoom 150
C:\>PDF2TXT C:\input.pdf C:\output.txt -open -silent
C:\>PDF2TXT C:\*.pdf
C:\>PDF2TXT C:\*.pdf C:\*.txt
C:\>PDF2TXT C:\test\*.pdf C:\test\*.txt
I use it with PureBasic.
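A minimal sketch of how the converter could be driven from PureBasic, using only the arguments listed above (input file, output file, `-silent`). The file paths, the search string, and the assumption that `PDF2TXT` can be found through the PATH are placeholders, not something the post confirms.

```purebasic
; Sketch: convert one PDF with the PDF2TXT command line tool, then search the text.
; Assumptions: PDF2TXT is reachable via PATH; all paths and the search string are placeholders.
EnableExplicit

Define pdf$    = "C:\input.pdf"    ; hypothetical input file
Define txt$    = "C:\output.txt"   ; hypothetical output file
Define needle$ = "Serial"          ; hypothetical string to look for
Define params$, line$, started, file

; Quote both paths so that file names containing spaces survive
params$ = #DQUOTE$ + pdf$ + #DQUOTE$ + " " + #DQUOTE$ + txt$ + #DQUOTE$ + " -silent"

; Start the converter hidden and wait until it has finished
started = RunProgram("PDF2TXT", params$, "", #PB_Program_Wait | #PB_Program_Hide)

If started
  file = ReadFile(#PB_Any, txt$)
  If file
    ; Print every line of the converted text that contains the search string
    While Eof(file) = 0
      line$ = ReadString(file)
      If FindString(line$, needle$)
        Debug line$
      EndIf
    Wend
    CloseFile(file)
  Else
    Debug "No text file was produced: " + txt$
  EndIf
Else
  Debug "Could not start PDF2TXT"
EndIf
```

The same pattern should work for any other command-line converter; only the program name and the parameter string change.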
IdeasVacuum
Always Here
Posts: 6426
Joined: Fri Oct 23, 2009 2:33 am
Location: Wales, UK

Re: Read a String inside a PDF file

Post by IdeasVacuum »

Try Google's PDFium dll

Code: Select all

Prototype protoInitLibrary()
Prototype protoLoadDocument(documentpath.p-utf8, password.p-utf8)
Prototype protoGetPageCount(document)
Prototype protoLoadPage(document, page_index)
Prototype protoGetMediaBox(page, *pLeft, *pBtm, *pRight, *pTop)
Prototype protoLoadTextPage(page)
Prototype protoCloseTextPage(textpage)
Prototype protoCountChars(textpage)
Prototype protoGetText(textpage, istart, iCharCnt, *result)
Prototype protoGetRotation(page)
Prototype protoGetPageHeight(page)
Prototype protoGetPageWidth(page)
Prototype protoGetMetaText(document, tag.p-utf8, buffer, buflen)
Prototype protoRenderPage(hDC, page, start_x, start_y, size_x, size_y, rotate, flags)
Prototype protoCloseDocument(document)

#PdfLib = 0 ; library ID (assumed value; the constant was not defined in the posted snippet)
OpenLibrary(#PdfLib, "pdfium.dll")

Global       InitLibrary.protoInitLibrary = GetFunction(#PdfLib,"_FPDF_InitLibrary@0")
Global     LoadDocument.protoLoadDocument = GetFunction(#PdfLib,"_FPDF_LoadDocument@8")
Global     GetPageCount.protoGetPageCount = GetFunction(#PdfLib,"_FPDF_GetPageCount@4")
Global             LoadPage.protoLoadPage = GetFunction(#PdfLib,"_FPDF_LoadPage@8")
Global     LoadTextPage.protoLoadTextPage = GetFunction(#PdfLib,"_FPDFText_LoadPage@4")
Global   CloseTextPage.protoCloseTextPage = GetFunction(#PdfLib,"_FPDFText_ClosePage@4")
Global         CountChars.protoCountChars = GetFunction(#PdfLib,"_FPDFText_CountChars@4")
Global       GetMetaText.protoGetMetaText = GetFunction(#PdfLib,"_FPDF_GetMetaText@16")
Global               GetText.protoGetText = GetFunction(#PdfLib,"_FPDFText_GetText@16")
Global       GetMediaBox.protoGetMediaBox = GetFunction(#PdfLib,"_FPDFPage_GetMediaBox@20")
Global       GetRotation.protoGetRotation = GetFunction(#PdfLib,"_FPDFPage_GetRotation@4")
Global   GetPageHeight.protoGetPageHeight = GetFunction(#PdfLib,"_FPDF_GetPageHeight@4")
Global     GetPageWidth.protoGetPageWidth = GetFunction(#PdfLib,"_FPDF_GetPageWidth@4")
Global         RenderPage.protoRenderPage = GetFunction(#PdfLib,"_FPDF_RenderPage@32")
Global   CloseDocument.protoCloseDocument = GetFunction(#PdfLib,"_FPDF_CloseDocument@4")
IdeasVacuum
If it sounds simple, you have not grasped the complexity.

Re: Read a String inside a PDF file

Post by loulou2522 »

Hi IdeasVacuum,
Very interesting, I will try it.
Where can I download the DLL for Windows?
Thanks in advance.

Re: Read a String inside a PDF file

Post by Fig »

IdeasVacuum, it's what I am looking for...
Do you have a snippet showing how to initialise and use it?
I get "[18:02:33] [ERROR] Invalid memory access. (write error at address 0)" when I initialise it...
Maybe the DLL is not the right one...

(I got the DLL from the SDK package zip: https://pdfium.patagames.com/downloads/ )

Code: Select all

Prototype protoInitLibrary()
Prototype protoLoadDocument(documentpath.p-utf8, password.p-utf8)
Prototype protoGetPageCount(document)
Prototype protoLoadPage(document, page_index)
Prototype protoGetMediaBox(page, *pLeft, *pBtm, *pRight, *pTop)
Prototype protoLoadTextPage(textpage)
Prototype protoCloseTextPage(textpage)
Prototype protoCountChars(textpage)
Prototype protoGetText(textpage, istart, iCharCnt, *result)
Prototype protoGetRotation(page)
Prototype protoGetPageHeight(page)
Prototype protoGetPageWidth(page)
Prototype protoGetMetaText(document, tag.p-utf8, buffer, buflen)
Prototype protoRenderPage(hDC, page, start_x, start_y, size_x, size_y, rotate, flags)
Prototype protoCloseDocument(document)
PdfLib=OpenLibrary(PdfLib, "pdfium.dll")

Global     InitLibrary.protoInitLibrary = GetFunction(PdfLib,"_FPDF_InitLibrary@0")
Global     LoadDocument.protoLoadDocument = GetFunction(PdfLib,"_FPDF_LoadDocument@8")
Global     GetPageCount.protoGetPageCount = GetFunction(PdfLib,"_FPDF_GetPageCount@4")
Global             LoadPage.protoLoadPage = GetFunction(PdfLib,"_FPDF_LoadPage@8")
Global     LoadTextPage.protoLoadTextPage = GetFunction(PdfLib,"_FPDFText_LoadPage@4")
Global   CloseTextPage.protoCloseTextPage = GetFunction(PdfLib,"_FPDFText_ClosePage@4")
Global         CountChars.protoCountChars = GetFunction(PdfLib,"_FPDFText_CountChars@4")
Global       GetMetaText.protoGetMetaText = GetFunction(PdfLib,"_FPDF_GetMetaText@16")
Global               GetText.protoGetText = GetFunction(PdfLib,"_FPDFText_GetText@16")
Global       GetMediaBox.protoGetMediaBox = GetFunction(PdfLib,"_FPDFPage_GetMediaBox@20")
Global       GetRotation.protoGetRotation = GetFunction(PdfLib,"_FPDFPage_GetRotation@4")
Global   GetPageHeight.protoGetPageHeight = GetFunction(PdfLib,"_FPDF_GetPageHeight@4")
Global     GetPageWidth.protoGetPageWidth = GetFunction(PdfLib,"_FPDF_GetPageWidth@4")
Global         RenderPage.protoRenderPage = GetFunction(PdfLib,"_FPDF_RenderPage@32")
Global   CloseDocument.protoCloseDocument = GetFunction(PdfLib,"_FPDF_CloseDocument@4")
InitLibrary()
LoadDocument("C:\Users\utilisateur\Desktop\pdf-test\CO-WHCA.pdf","")
Last edited by Fig on Sun Apr 26, 2020 5:07 pm, edited 1 time in total.
RASHAD
PureBasic Expert
Posts: 4944
Joined: Sun Apr 12, 2009 6:27 am

Re: Read a String inside a PDF file

Post by RASHAD »

If your PDF files are not just images,
try GhostPDF
or MuPDF.
Egypt my love

Re: Read a String inside a PDF file

Post by loulou2522 »

IdeasVacuum, I have the same problem as Fig.
How can we install the DLL?

Re: Read a String inside a PDF file

Post by Fig »

RASHAD wrote: If your PDF files are not just images,
try GhostPDF
or MuPDF.
I managed to convert the PDFs to TXT files with MuPDF from the command line.
It will do the trick, but it's too bad PDFium doesn't work. :cry:

Thank you. :D
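Once every PDF has a text file next to it, the renaming step from the first post could look roughly like the sketch below. Everything specific in it is an assumption: the folder, the marker text `Report No: ` that precedes the wanted string, and the convention that `name.pdf` was converted to `name.txt` in the same folder.

```purebasic
; Sketch: rename each PDF after a string found in its converted text file.
; Assumptions: folder$, the marker text and the name.pdf -> name.txt convention are hypothetical.
EnableExplicit

; Return the text that follows marker$ on the first line containing it, or "" if not found
Procedure.s FindTag(txtFile$, marker$)
  Protected result$, line$, pos
  Protected file = ReadFile(#PB_Any, txtFile$)
  If file
    While Eof(file) = 0 And result$ = ""
      line$ = ReadString(file)
      pos = FindString(line$, marker$)
      If pos
        result$ = Trim(Mid(line$, pos + Len(marker$)))
      EndIf
    Wend
    CloseFile(file)
  EndIf
  ProcedureReturn result$
EndProcedure

Define folder$ = "C:\pdf-test\"   ; hypothetical folder holding the PDFs and their TXT files
Define tag$, txt$, dir
NewList pdf$()

; Collect the names first, so the directory is not modified while it is being examined
dir = ExamineDirectory(#PB_Any, folder$, "*.pdf")
If dir
  While NextDirectoryEntry(dir)
    If DirectoryEntryType(dir) = #PB_DirectoryEntry_File
      AddElement(pdf$())
      pdf$() = DirectoryEntryName(dir)
    EndIf
  Wend
  FinishDirectory(dir)
EndIf

ForEach pdf$()
  txt$ = folder$ + GetFilePart(pdf$(), #PB_FileSystem_NoExtension) + ".txt"
  tag$ = FindTag(txt$, "Report No: ")
  ; Rename only if a tag was found and no file with the new name exists yet
  If tag$ <> "" And FileSize(folder$ + tag$ + ".pdf") = -1
    RenameFile(folder$ + pdf$(), folder$ + tag$ + ".pdf")
  EndIf
Next
```

The sketch does not clean the tag; if the string can contain characters that are not allowed in file names (such as `/` or `:`), they would have to be replaced before calling `RenameFile()`.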
infratec
Always Here
Posts: 7575
Joined: Sun Sep 07, 2008 12:45 pm
Location: Germany

Re: Read a String inside a PDF file

Post by infratec »

Code: Select all

Global     InitLibrary.protoInitLibrary = GetFunction(PdfLib,"_FPDF_InitLibrary@0")
results in #Null

So this is the wrong dll.
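Calling a prototype variable that holds #Null is what produces the "write error at address 0", so it helps to check both `OpenLibrary()` and `GetFunction()` before the first call. The sketch below does that. It also tries the undecorated export name, on the assumption (not confirmed in this thread) that names of the form `_Name@N` are the 32-bit stdcall decoration and that a 64-bit pdfium.dll exports plain names such as `FPDF_InitLibrary`.

```purebasic
; Sketch: resolve FPDF_InitLibrary defensively instead of calling a null pointer.
; Assumption: a 64-bit DLL exports "FPDF_InitLibrary", a 32-bit one "_FPDF_InitLibrary@0".
EnableExplicit

Prototype protoInitLibrary()

Define lib, *fn
Define InitLibrary.protoInitLibrary

lib = OpenLibrary(#PB_Any, "pdfium.dll")
If lib = 0
  Debug "pdfium.dll could not be loaded (file missing, or 32/64-bit mismatch?)"
  End
EndIf

; Try the undecorated name first, then the decorated name used earlier in this thread
*fn = GetFunction(lib, "FPDF_InitLibrary")
If *fn = 0
  *fn = GetFunction(lib, "_FPDF_InitLibrary@0")
EndIf

If *fn
  InitLibrary = *fn
  InitLibrary()
  Debug "FPDF_InitLibrary resolved and called"
Else
  Debug "FPDF_InitLibrary is not exported by this DLL"
EndIf

CloseLibrary(lib)
```

Note that `OpenLibrary()` only returns the library number to use with `GetFunction()` when it is called with `#PB_Any`; with a fixed ID, that ID must be passed to `GetFunction()`, not the return value.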

Re: Read a String inside a PDF file

Post by loulou2522 »

Where can we find the right DLL?
chi
Addict
Posts: 1087
Joined: Sat May 05, 2007 5:31 pm
Location: Austria

Re: Read a String inside a PDF file

Post by chi »

Fig wrote: (I got the DLL from the SDK package zip: https://pdfium.patagames.com/downloads/ )
That's the Pdfium.Net SDK. Try the native builds instead:

https://github.com/bblanchon/pdfium-binaries

Code: Select all

; Link against the import library that ships with the DLL
Import "pdfium.dll.lib"
  FPDF_InitLibrary()
  FPDF_LoadDocument(documentpath.p-utf8, password.p-utf8)
  FPDF_GetPageCount(document)
  FPDF_CloseDocument(document)
EndImport

FPDF_InitLibrary()
doc = FPDF_LoadDocument("D:\Program Files\PureBasic5.72(x64)\Compilers\FASM.PDF","")
If doc ; FPDF_LoadDocument returns 0 if the file cannot be opened
  Debug FPDF_GetPageCount(doc)
  FPDF_CloseDocument(doc)
EndIf
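Building on the same `Import` approach, a sketch of the actual goal of the thread (reading text out of a PDF) could use PDFium's text API: `FPDFText_LoadPage`, `FPDFText_CountChars` and `FPDFText_GetText`, which fills a caller-supplied buffer with UTF-16 text. The file path and the search string are placeholders, and the `.l` type suffixes are my assumption for mapping the C `int` parameters and return values.

```purebasic
; Sketch: extract the text of each page with PDFium and search it for a string.
; Assumptions: pdfium.dll and pdfium.dll.lib match the compiler (x64); file$ and needle$ are placeholders.
EnableExplicit

Import "pdfium.dll.lib"
  FPDF_InitLibrary()
  FPDF_DestroyLibrary()
  FPDF_LoadDocument(documentpath.p-utf8, password.p-utf8)
  FPDF_CloseDocument(document)
  FPDF_GetPageCount.l(document)
  FPDF_LoadPage(document, page_index.l)
  FPDF_ClosePage(page)
  FPDFText_LoadPage(page)
  FPDFText_ClosePage(text_page)
  FPDFText_CountChars.l(text_page)
  FPDFText_GetText.l(text_page, start_index.l, count.l, *result)
EndImport

; Return the text of one page (0-based index), or "" if the page has no extractable text
Procedure.s PdfPageText(doc, pageIndex)
  Protected text$, page, textPage, count, *buf
  page = FPDF_LoadPage(doc, pageIndex)
  If page
    textPage = FPDFText_LoadPage(page)
    If textPage
      count = FPDFText_CountChars(textPage)
      If count > 0
        ; UTF-16 output: 2 bytes per character plus a terminating null
        *buf = AllocateMemory((count + 1) * 2)
        If *buf
          FPDFText_GetText(textPage, 0, count, *buf)
          text$ = PeekS(*buf, -1, #PB_Unicode)
          FreeMemory(*buf)
        EndIf
      EndIf
      FPDFText_ClosePage(textPage)
    EndIf
    FPDF_ClosePage(page)
  EndIf
  ProcedureReturn text$
EndProcedure

Define file$   = "C:\pdf-test\report.pdf"   ; hypothetical file
Define needle$ = "Certificate"              ; hypothetical string to look for
Define doc, pages, i, text$

FPDF_InitLibrary()
doc = FPDF_LoadDocument(file$, "")
If doc
  pages = FPDF_GetPageCount(doc)
  For i = 0 To pages - 1
    text$ = PdfPageText(doc, i)
    If FindString(text$, needle$)
      Debug "Found on page " + Str(i + 1)
    EndIf
  Next
  FPDF_CloseDocument(doc)
Else
  Debug "Could not open " + file$
EndIf
FPDF_DestroyLibrary()
```

As RASHAD pointed out, this only helps if the PDF contains real text; for scanned pages, `FPDFText_CountChars` would report no characters and the procedure returns an empty string.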
Et cetera is my worst enemy

Re: Read a String inside a PDF file

Post by Fig »

It works! Thank you, chi!
Last edited by Fig on Sun Apr 26, 2020 6:32 pm, edited 1 time in total.