[Solved] Read a String inside a PDF file
I have thousands of PDF files with exotic names.
Inside each file there is a string I would like to read so that I can rename the file.
Obviously, I can't do this manually.
I've read that PDF is a compressed format... Is there any way to read a string from a PDF file in PureBasic? (A user library, maybe?)
Last edited by Fig on Sun Apr 26, 2020 6:19 pm, edited 1 time in total.
There are 2 methods to program bugless.
But only the third works fine.
Win10, Pb x64 5.71 LTS
Re: Read a String inside a PDF file
If you can link to one of those PDFs, I would like to try to read it (with PureBasic) and see if I can help.
Re: Read a String inside a PDF file
Unfortunately, they are at work. I will bring one on Monday.
They are metrological reports for instruments.
Thank you.
Re: Read a String inside a PDF file
Did you find a solution for reading PDF files?
I Stepped On A Cornflake!!! Now I'm A Cereal Killer!
Re: Read a String inside a PDF file
VeryPDF PDF to TXT Converter is a program that converts PDF files to text. I use it with PureBasic.
http://www.verypdf.com/app/pdf-to-txt-c ... index.html
It costs only $38 and lets you quickly convert a PDF into a text file from the command line; after that you only have to search the text file for your word.
I hope this answers your question.
You can test the program with a free version downloadable from the site.
Here is the full command-line usage:
Code: Select all
PDF2TXT <input PDF file> [output TXT file] [-first <page number>] [-last <page number>] [-logfile] [-open] [-space] [-html] [-format] [-silent] [-blankline] [-summary] [-zoom <num>] [-unicode] [-?] [-h]
<input PDF file> : Open an existing PDF file to convert.
[output TXT file] : Write to a TEXT file; the default is the same file name as the input PDF file.
[-first <page number>]: Specify the first page number.
[-last <page number>]: Specify the last page number.
[-logfile] : Write log to "C:\pdf2txt.log" file.
[-open] : Auto open the text file after it is created.
[-space] : Auto insert spaces into text file.
[-html] : Output to a HTML file, not a text file.
[-format] : Keep the page layout in the generated TXT file.
[-silent] : Disable error and warning messages.
[-blankline] : Automatically delete blank lines in the generated TXT file.
[-summary] : Get PDF document summary.
[-zoom <num>] : Set zoom ratio, the range is from 50 to 200.
[-unicode] : Create UTF-8 encoding text file.
[-?] : Help.
[-h] : Help.
For example:
C:\>PDF2TXT C:\input.pdf
C:\>PDF2TXT C:\input.pdf -unicode
C:\>PDF2TXT C:\input.pdf -first 10 -last 12
C:\>PDF2TXT C:\input.pdf C:\output.txt
C:\>PDF2TXT C:\input.pdf -open -silent -logfile -zoom 150
C:\>PDF2TXT C:\input.pdf C:\output.txt -open -silent
C:\>PDF2TXT C:\*.pdf
C:\>PDF2TXT C:\*.pdf C:\*.txt
C:\>PDF2TXT C:\test\*.pdf C:\test\*.txt
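Because PDF2TXT is driven from the command line, one way to call it from PureBasic is RunProgram() followed by reading the generated text file. Below is a minimal sketch, assuming PDF2TXT.exe is on the PATH and using only the -silent and -unicode switches from the usage above (the latter is documented to produce a UTF-8 file). The procedures Quote() and PdfToText() and the paths are hypothetical.

```purebasic
; Hypothetical helper: wraps a path in double quotes so paths with spaces survive.
Procedure.s Quote(s.s)
  ProcedureReturn Chr(34) + s + Chr(34)
EndProcedure

; Hypothetical helper: converts one PDF with PDF2TXT (assumed to be on the PATH)
; and returns the whole text file as one string, or "" on failure.
Procedure.s PdfToText(pdf.s, txt.s)
  Protected params.s = Quote(pdf) + " " + Quote(txt) + " -silent -unicode"
  Protected file, text.s
  DeleteFile(txt) ; avoid reading a stale result from an earlier run
  If RunProgram("PDF2TXT", params, "", #PB_Program_Wait | #PB_Program_Hide) = 0
    ProcedureReturn "" ; the converter could not be started
  EndIf
  file = ReadFile(#PB_Any, txt)
  If file
    ReadStringFormat(file) ; skip a BOM if the converter wrote one
    text = ReadString(file, #PB_UTF8 | #PB_File_IgnoreEOL)
    CloseFile(file)
  EndIf
  ProcedureReturn text
EndProcedure

; Usage (hypothetical paths):
; Debug PdfToText("C:\reports\input.pdf", "C:\reports\input.txt")
```

The returned string can then be searched with FindString() or a regular expression to get the value used for renaming.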
Re: Read a String inside a PDF file
Try Google's PDFium DLL:
Code: Select all
; The "_Name@N" strings below are 32-bit (x86 stdcall) decorated export names;
; a 64-bit pdfium.dll exports the same functions undecorated, e.g. "FPDF_InitLibrary".
#PdfLib = 0
Prototype protoInitLibrary()
Prototype protoLoadDocument(documentpath.p-utf8, password.p-utf8)
Prototype protoGetPageCount(document)
Prototype protoLoadPage(document, page_index)
Prototype protoGetMediaBox(page, *pLeft, *pBtm, *pRight, *pTop) ; receives four floats (.f)
Prototype protoLoadTextPage(page)
Prototype protoCloseTextPage(textpage)
Prototype protoCountChars(textpage)
Prototype protoGetText(textpage, istart, iCharCnt, *result) ; *result: (iCharCnt + 1) * 2 bytes, UTF-16
Prototype protoGetRotation(page)
Prototype.d protoGetPageHeight(page) ; returns a double, in points
Prototype.d protoGetPageWidth(page)  ; returns a double, in points
Prototype protoGetMetaText(document, tag.p-utf8, buffer, buflen)
Prototype protoRenderPage(hDC, page, start_x, start_y, size_x, size_y, rotate, flags)
Prototype protoCloseDocument(document)
If OpenLibrary(#PdfLib, "pdfium.dll") = 0
  MessageRequester("PDFium", "pdfium.dll could not be loaded")
  End
EndIf
Global InitLibrary.protoInitLibrary = GetFunction(#PdfLib,"_FPDF_InitLibrary@0")
Global LoadDocument.protoLoadDocument = GetFunction(#PdfLib,"_FPDF_LoadDocument@8")
Global GetPageCount.protoGetPageCount = GetFunction(#PdfLib,"_FPDF_GetPageCount@4")
Global LoadPage.protoLoadPage = GetFunction(#PdfLib,"_FPDF_LoadPage@8")
Global LoadTextPage.protoLoadTextPage = GetFunction(#PdfLib,"_FPDFText_LoadPage@4")
Global CloseTextPage.protoCloseTextPage = GetFunction(#PdfLib,"_FPDFText_ClosePage@4")
Global CountChars.protoCountChars = GetFunction(#PdfLib,"_FPDFText_CountChars@4")
Global GetMetaText.protoGetMetaText = GetFunction(#PdfLib,"_FPDF_GetMetaText@16")
Global GetText.protoGetText = GetFunction(#PdfLib,"_FPDFText_GetText@16")
Global GetMediaBox.protoGetMediaBox = GetFunction(#PdfLib,"_FPDFPage_GetMediaBox@20")
Global GetRotation.protoGetRotation = GetFunction(#PdfLib,"_FPDFPage_GetRotation@4")
Global GetPageHeight.protoGetPageHeight = GetFunction(#PdfLib,"_FPDF_GetPageHeight@4")
Global GetPageWidth.protoGetPageWidth = GetFunction(#PdfLib,"_FPDF_GetPageWidth@4")
Global RenderPage.protoRenderPage = GetFunction(#PdfLib,"_FPDF_RenderPage@32")
Global CloseDocument.protoCloseDocument = GetFunction(#PdfLib,"_FPDF_CloseDocument@4")
IdeasVacuum
If it sounds simple, you have not grasped the complexity.
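The prototypes above only declare the functions. As one example of calling them, the sketch below reads a document's Title entry with FPDF_GetMetaText(), which is called twice: first without a buffer to get the size in bytes (including the UTF-16 terminator), then with a buffer of that size. It assumes a 64-bit pdfium.dll with undecorated export names, such as the builds at https://github.com/bblanchon/pdfium-binaries; the file path is hypothetical.

```purebasic
Prototype protoInitLibrary()
Prototype protoLoadDocument(documentpath.p-utf8, password.p-utf8)
Prototype.l protoGetMetaText(document, tag.p-utf8, *buffer, buflen)
Prototype protoCloseDocument(document)

; Assumption: a 64-bit pdfium.dll, which exports undecorated names.
lib = OpenLibrary(#PB_Any, "pdfium.dll")
If lib = 0
  Debug "pdfium.dll could not be loaded"
  End
EndIf

InitLibrary.protoInitLibrary = GetFunction(lib, "FPDF_InitLibrary")
LoadDocument.protoLoadDocument = GetFunction(lib, "FPDF_LoadDocument")
GetMetaText.protoGetMetaText = GetFunction(lib, "FPDF_GetMetaText")
CloseDocument.protoCloseDocument = GetFunction(lib, "FPDF_CloseDocument")
If InitLibrary = 0 Or LoadDocument = 0 Or GetMetaText = 0 Or CloseDocument = 0
  Debug "A function was not found: check the DLL's bitness and export names"
  CloseLibrary(lib)
  End
EndIf

InitLibrary()
doc = LoadDocument("C:\reports\example.pdf", "") ; hypothetical path
If doc
  ; First call without a buffer: returns the size in bytes, UTF-16 terminator included.
  size = GetMetaText(doc, "Title", 0, 0)
  If size > 2 ; 2 bytes means an empty or missing entry
    *buf = AllocateMemory(size)
    If *buf
      GetMetaText(doc, "Title", *buf, size)
      Debug "Title: " + PeekS(*buf, -1, #PB_Unicode)
      FreeMemory(*buf)
    EndIf
  EndIf
  CloseDocument(doc)
EndIf
CloseLibrary(lib)
```

The other standard tags (Author, Subject, Keywords, Creator, Producer, CreationDate, ModDate) are read the same way.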
Re: Read a String inside a PDF file
Hi IdeasVacuum,
Very interesting, I will try it.
Where can I download the DLL for Windows?
Thanks in advance.
Re: Read a String inside a PDF file
IdeasVacuum, it's what I am looking for...
Do you have a snippet showing how to initialise and use it?
I get "[18:02:33] [ERROR] Invalid memory access. (write error at address 0)" when I initialise it...
Maybe it's not the right DLL...
(I got the DLL from the SDK package zip: https://pdfium.patagames.com/downloads/ )
Code: Select all
Prototype protoInitLibrary()
Prototype protoLoadDocument(documentpath.p-utf8, password.p-utf8)
Prototype protoGetPageCount(document)
Prototype protoLoadPage(document, page_index)
Prototype protoGetMediaBox(page, *pLeft, *pBtm, *pRight, *pTop) ; receives four floats (.f)
Prototype protoLoadTextPage(page)
Prototype protoCloseTextPage(textpage)
Prototype protoCountChars(textpage)
Prototype protoGetText(textpage, istart, iCharCnt, *result) ; *result: (iCharCnt + 1) * 2 bytes, UTF-16
Prototype protoGetRotation(page)
Prototype.d protoGetPageHeight(page) ; returns a double, in points
Prototype.d protoGetPageWidth(page)  ; returns a double, in points
Prototype protoGetMetaText(document, tag.p-utf8, buffer, buflen)
Prototype protoRenderPage(hDC, page, start_x, start_y, size_x, size_y, rotate, flags)
Prototype protoCloseDocument(document)
; Let PureBasic allocate the library number: with a fixed number, OpenLibrary()
; returns a nonzero success value, not a number usable with GetFunction().
PdfLib = OpenLibrary(#PB_Any, "pdfium.dll")
If PdfLib = 0
  Debug "pdfium.dll could not be loaded"
  End
EndIf
; "_Name@N" are 32-bit (x86 stdcall) decorated export names; a 64-bit DLL
; exports the same functions undecorated, e.g. "FPDF_InitLibrary".
Global InitLibrary.protoInitLibrary = GetFunction(PdfLib,"_FPDF_InitLibrary@0")
Global LoadDocument.protoLoadDocument = GetFunction(PdfLib,"_FPDF_LoadDocument@8")
Global GetPageCount.protoGetPageCount = GetFunction(PdfLib,"_FPDF_GetPageCount@4")
Global LoadPage.protoLoadPage = GetFunction(PdfLib,"_FPDF_LoadPage@8")
Global LoadTextPage.protoLoadTextPage = GetFunction(PdfLib,"_FPDFText_LoadPage@4")
Global CloseTextPage.protoCloseTextPage = GetFunction(PdfLib,"_FPDFText_ClosePage@4")
Global CountChars.protoCountChars = GetFunction(PdfLib,"_FPDFText_CountChars@4")
Global GetMetaText.protoGetMetaText = GetFunction(PdfLib,"_FPDF_GetMetaText@16")
Global GetText.protoGetText = GetFunction(PdfLib,"_FPDFText_GetText@16")
Global GetMediaBox.protoGetMediaBox = GetFunction(PdfLib,"_FPDFPage_GetMediaBox@20")
Global GetRotation.protoGetRotation = GetFunction(PdfLib,"_FPDFPage_GetRotation@4")
Global GetPageHeight.protoGetPageHeight = GetFunction(PdfLib,"_FPDF_GetPageHeight@4")
Global GetPageWidth.protoGetPageWidth = GetFunction(PdfLib,"_FPDF_GetPageWidth@4")
Global RenderPage.protoRenderPage = GetFunction(PdfLib,"_FPDF_RenderPage@32")
Global CloseDocument.protoCloseDocument = GetFunction(PdfLib,"_FPDF_CloseDocument@4")
; GetFunction() returns 0 when a name is not exported, and calling a null
; function pointer ends in an invalid memory access.
If InitLibrary = 0
  Debug "FPDF_InitLibrary is not exported under this name"
  End
EndIf
InitLibrary()
doc = LoadDocument("C:\Users\utilisateur\Desktop\pdf-test\CO-WHCA.pdf","")
If doc = 0
  Debug "The PDF could not be opened"
EndIf
Last edited by Fig on Sun Apr 26, 2020 5:07 pm, edited 1 time in total.
Re: Read a String inside a PDF file
If your PDF files are not just images, try GhostPDF or MuPDF.
Egypt my love
Re: Read a String inside a PDF file
IdeasVacuum, I have the same problem as Fig.
How do we install the DLL?
Re: Read a String inside a PDF file
RASHAD wrote:
If your PDF files are not just images, try GhostPDF or MuPDF.
I managed to convert the PDF files into TXT files with MuPDF from the command line.
It will do the trick, but it's too bad PDFium doesn't work.

Thank you.
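The approach that worked here (convert each PDF to text with MuPDF, find the string, rename the file) could be scripted end to end in PureBasic roughly as follows. This is a sketch under assumptions: mutool.exe (MuPDF's command-line tool) is on the PATH and "mutool draw -F txt -o out.txt in.pdf" writes the extracted text as UTF-8. The folder, the "Report no." pattern and the helper names FindField() and SafeFileName() are hypothetical and must be adapted to the real reports.

```purebasic
; Hypothetical helper: returns capture group 1 of the first match of pattern in text, or "".
Procedure.s FindField(text.s, pattern.s)
  Protected regex = CreateRegularExpression(#PB_Any, pattern)
  Protected result.s
  If regex
    If ExamineRegularExpression(regex, text)
      If NextRegularExpressionMatch(regex)
        result = Trim(RegularExpressionGroup(regex, 1))
      EndIf
    EndIf
    FreeRegularExpression(regex)
  EndIf
  ProcedureReturn result
EndProcedure

; Hypothetical helper: replaces characters that Windows does not allow in file names.
Procedure.s SafeFileName(name.s)
  Protected regex = CreateRegularExpression(#PB_Any, "[\\/:*?<>|" + Chr(34) + "]")
  If regex
    name = ReplaceRegularExpression(regex, name, "_")
    FreeRegularExpression(regex)
  EndIf
  ProcedureReturn name
EndProcedure

; Hypothetical settings: adapt the folder and the field pattern to the real reports.
Define dir.s = "C:\reports\"
Define pattern.s = "Report no\.\s*:?\s*(\S+)"
Define tmp.s = GetTemporaryDirectory() + "pdf_text.txt"
Define text.s, field.s

; Collect the names first so that renaming does not disturb the directory scan.
NewList pdfs.s()
scan = ExamineDirectory(#PB_Any, dir, "*.pdf")
If scan
  While NextDirectoryEntry(scan)
    If DirectoryEntryType(scan) = #PB_DirectoryEntry_File
      AddElement(pdfs())
      pdfs() = dir + DirectoryEntryName(scan)
    EndIf
  Wend
  FinishDirectory(scan)
EndIf

ForEach pdfs()
  DeleteFile(tmp)
  ; Assumption: mutool.exe is on the PATH; "-F txt" selects plain-text output.
  RunProgram("mutool", "draw -F txt -o " + Chr(34) + tmp + Chr(34) + " " + Chr(34) + pdfs() + Chr(34), "", #PB_Program_Wait | #PB_Program_Hide)
  text = ""
  file = ReadFile(#PB_Any, tmp)
  If file
    ReadStringFormat(file) ; skip a BOM if present
    text = ReadString(file, #PB_UTF8 | #PB_File_IgnoreEOL)
    CloseFile(file)
  EndIf
  field = SafeFileName(FindField(text, pattern))
  ; Rename only when a value was found and the target name is still free.
  If field <> "" And FileSize(dir + field + ".pdf") = -1
    If RenameFile(pdfs(), dir + field + ".pdf") = 0
      Debug "Could not rename " + pdfs()
    EndIf
  EndIf
Next
```

Files whose text does not match the pattern, or whose target name already exists, are left untouched, so the script can be re-run safely after adjusting the pattern.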

Re: Read a String inside a PDF file
Code: Select all
Global InitLibrary.protoInitLibrary = GetFunction(PdfLib,"_FPDF_InitLibrary@0")
So this is the wrong DLL.
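A quick way to check whether a given pdfium.dll really exports the names passed to GetFunction() is to list its exports with PureBasic's ExamineLibraryFunctions(). The sketch below prints every export containing "FPDF" and flags names in the 32-bit stdcall form "_Name@N", which is the form used in the line quoted above; a 64-bit DLL would be expected to show plain names such as FPDF_InitLibrary. The DLL file name and the IsStdcallDecorated() helper are assumptions for illustration.

```purebasic
; Hypothetical helper: #True for names like "_FPDF_InitLibrary@0" (x86 stdcall decoration).
Procedure IsStdcallDecorated(name.s)
  Protected regex = CreateRegularExpression(#PB_Any, "^_\w+@\d+$")
  Protected result = #False
  If regex
    result = Bool(MatchRegularExpression(regex, name) <> 0)
    FreeRegularExpression(regex)
  EndIf
  ProcedureReturn result
EndProcedure

Define name.s
lib = OpenLibrary(#PB_Any, "pdfium.dll")
If lib
  If ExamineLibraryFunctions(lib)
    While NextLibraryFunction()
      name = LibraryFunctionName()
      If FindString(name, "FPDF")
        If IsStdcallDecorated(name)
          Debug name + "   (x86 stdcall-decorated)"
        Else
          Debug name
        EndIf
      EndIf
    Wend
  EndIf
  CloseLibrary(lib)
Else
  Debug "pdfium.dll could not be loaded (missing, or built for the other bitness)"
EndIf
```

If no FPDF_* names are listed at all, the DLL is not a plain PDFium build and GetFunction() will keep returning 0 whatever spelling is used.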
Re: Read a String inside a PDF file
Where can we find the right DLL?
Re: Read a String inside a PDF file
Fig wrote:
(I got the DLL from the SDK package zip: https://pdfium.patagames.com/downloads/ )
That's the Pdfium.Net SDK. Prebuilt PDFium binaries are available here:
https://github.com/bblanchon/pdfium-binaries
Code: Select all
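; Import library shipped with the pdfium-binaries package; pdfium.dll itself
; must also be loadable at run time (Windows DLL search path).
Import "pdfium.dll.lib"
  FPDF_InitLibrary()
  FPDF_DestroyLibrary()
  FPDF_LoadDocument(documentpath.p-utf8, password.p-utf8) ; returns 0 on failure
  FPDF_GetPageCount(document)
  FPDF_CloseDocument(document)
EndImport
FPDF_InitLibrary()
doc = FPDF_LoadDocument("D:\Program Files\PureBasic5.72(x64)\Compilers\FASM.PDF","")
If doc
  Debug FPDF_GetPageCount(doc)
  FPDF_CloseDocument(doc)
EndIf
FPDF_DestroyLibrary()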
Et cetera is my worst enemy
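Building on the Import above, the page text (which is where the string needed for renaming would be found) can be read with FPDF_LoadPage, FPDFText_LoadPage, FPDFText_CountChars and FPDFText_GetText. A sketch, assuming the same pdfium-binaries build and import library; FPDFText_GetText writes UTF-16 text into a caller-allocated buffer of (count + 1) * 2 bytes. The procedure name PdfPageText() and the file path are hypothetical.

```purebasic
Import "pdfium.dll.lib"
  FPDF_InitLibrary()
  FPDF_DestroyLibrary()
  FPDF_LoadDocument(documentpath.p-utf8, password.p-utf8)
  FPDF_CloseDocument(document)
  FPDF_GetPageCount.l(document)
  FPDF_LoadPage(document, page_index)
  FPDF_ClosePage(page)
  FPDFText_LoadPage(page)
  FPDFText_ClosePage(text_page)
  FPDFText_CountChars.l(text_page)
  FPDFText_GetText.l(text_page, start_index, count, *result)
EndImport

; Hypothetical helper: returns the text of one page (0-based index), or "".
Procedure.s PdfPageText(doc, index)
  Protected page, textPage, count, *buf, text.s
  page = FPDF_LoadPage(doc, index)
  If page
    textPage = FPDFText_LoadPage(page)
    If textPage
      count = FPDFText_CountChars(textPage)
      If count > 0
        ; UTF-16 buffer: count characters plus the terminating null, 2 bytes each.
        *buf = AllocateMemory((count + 1) * 2)
        If *buf
          FPDFText_GetText(textPage, 0, count, *buf)
          text = PeekS(*buf, -1, #PB_Unicode)
          FreeMemory(*buf)
        EndIf
      EndIf
      FPDFText_ClosePage(textPage)
    EndIf
    FPDF_ClosePage(page)
  EndIf
  ProcedureReturn text
EndProcedure

FPDF_InitLibrary()
doc = FPDF_LoadDocument("C:\reports\example.pdf", "") ; hypothetical path
If doc
  For i = 0 To FPDF_GetPageCount(doc) - 1
    Debug PdfPageText(doc, i)
  Next
  FPDF_CloseDocument(doc)
EndIf
FPDF_DestroyLibrary()
```

Each text page and page handle is closed before the next one is loaded, which keeps memory use flat when looping over thousands of files.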
Re: Read a String inside a PDF file
It works! Thank you, chi!
Last edited by Fig on Sun Apr 26, 2020 6:32 pm, edited 1 time in total.