pdftotext.exe with RunProgram() fails

Axolotl
Addict
Posts: 802
Joined: Wed Dec 31, 2008 3:36 pm


Post by Axolotl »

Hi there

I had a rather odd idea for the PDF files I have downloaded from various places:
rename each file using information found inside the file.
For that I found the Xpdf tools from xpdfreader (especially pdftotext.exe, see http://www.xpdfreader.com/pdftotext-man.html ).
So far so good.
On the console this tool works like a charm:
commandline > pdftotext.exe -f 1 -l 1 test.pdf -

With RunProgram(), however, the only thing I get back is the error "pdftotext version 4.02".
Nothing more.
To create an output text file is not the way I would like to go yet (only if no other solution is found:)
Maybe one of you has some further hints?

Thanks in advance.
Just because it worked doesn't mean it works.
PureBasic 6.04 (x86) and <latest stable version and current alpha/beta> (x64) on Windows 11 Home. Now started with Linux (VM: Ubuntu 22.04).
Marc56us
Addict
Posts: 1600
Joined: Sat Feb 08, 2014 3:26 pm

Re: pdftotext.exe with RunProgram() fails

Post by Marc56us »

This works for me

Code:

; pdftotext.exe and test.pdf are in c:\tmp

SetCurrentDirectory("c:\tmp")

RunProgram("pdftotext.exe", "-f 1 -l 1 test.pdf", "")
Axolotl wrote: "To create an output text file is not the way I would like to go yet (only if no other solution is found:)"
You can send the text to the console (stdout), so no output file is needed. From the pdftotext manual:
"Pdftotext converts Portable Document Format (PDF) files to plain text.
Pdftotext reads the PDF file, PDF-file, and writes a text file, text-file. If text-file is not specified, pdftotext converts file.pdf to file.txt. If text-file is '-', the text is sent to stdout."
Sample that reads the output into a variable:

Code:

; pdftotext.exe and test.pdf are in c:\tmp

SetCurrentDirectory("c:\tmp")

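; "Compiler" is just the handle returned by RunProgram(); the trailing "-" sends the text to stdout
Compiler = RunProgram("pdftotext.exe", "-f 1 -l 1 test.pdf -", "", #PB_Program_Open | #PB_Program_Read)
If Compiler
    ; Collect the output line by line while the program is running
    While ProgramRunning(Compiler)
        If AvailableProgramOutput(Compiler)
            Output$ + ReadProgramString(Compiler) + Chr(13)
        EndIf
    Wend
    Output$ + Chr(13) + Chr(13)

    Debug Output$

    CloseProgram(Compiler)
EndIf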
Enjoy
:wink:
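
For the original goal (only the first page is needed, to pick a new file name from it), the capture loop above could be wrapped in a procedure. This is only a sketch, not something posted in this thread: the name FirstPageText() and the paths in the last line are assumptions. It passes the working directory as the third RunProgram() argument instead of calling SetCurrentDirectory(), always quotes the file name, and hides the console window.

```purebasic
EnableExplicit

; Hypothetical wrapper (not from the thread): returns the text of page 1,
; or "" if pdftotext could not be started.
Procedure.s FirstPageText(Exe$, PdfFile$)
  Protected hProg, Text$ = ""
  Protected Args$ = "-f 1 -l 1 " + #DQUOTE$ + PdfFile$ + #DQUOTE$ + " -"

  ; Third argument = working directory, so SetCurrentDirectory() is not needed
  hProg = RunProgram(Exe$, Args$, GetPathPart(PdfFile$), #PB_Program_Open | #PB_Program_Read | #PB_Program_Hide)
  If hProg
    While ProgramRunning(hProg)
      If AvailableProgramOutput(hProg)
        Text$ + ReadProgramString(hProg) + #LF$
      Else
        Delay(5) ; do not spin at full speed while waiting for output
      EndIf
    Wend
    CloseProgram(hProg)
  EndIf

  ProcedureReturn Text$
EndProcedure

; Placeholder paths (assumptions), adjust them to your own setup
Debug FirstPageText("c:\tmp\pdftotext.exe", "c:\tmp\test.pdf")
```

If the executable cannot be started, RunProgram() returns zero and the procedure returns an empty string, so the caller can treat "" as "nothing extracted".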

Re: pdftotext.exe with RunProgram() fails

Post by Axolotl »

Hi Marc56us,
thanks for your feedback.
Then the problem is probably 40 cm in front of the monitor and between the ears.

Re: pdftotext.exe with RunProgram() fails

Post by Marc56us »

Glad to have been helpful :wink: and thanks for the feedback.
Thanks also for reminding me of this great little program, I'll probably need it someday.

PS: Remember to download the latest version, 4.03; it has new options (4.02 is 2 years old).

4.03 (2021-jan-28)
------------------
Implemented selection extension via shift-click, and word/line
selection via double/triple click.
Added default bindings for ctrl-mousewheel-up/down to zoom in/out.
Added the "-nofonts" option to pdftohtml.
Added the "simple2" mode to pdftotext.
Added the "-rot" flag to xpdf, pdftoppm, and pdftopng.
Added the "-listencodings" flag to pdftotext.
Added the 'copyLinkTarget' command.
Added the 'selectionColor' xpdfrc setting.
Added the 'initialSidebarWidth' xpdfrc setting.
Added support for @"..." strings in xpdfrc files. This includes using

[...]

Re: pdftotext.exe with RunProgram() fails (solved)

Post by Axolotl »

As expected: my fault. Here is a small tool that converts either the whole PDF or just its first page to text.
@Marc56us: now I can update to the latest version. (I also noticed it during my research.)

Code:

EnableExplicit 
DebugLevel 9 

#ProductName$      = "Test-PDF2Text" 
#ProductVersion$   = "0." + #PB_Editor_BuildCount + "." + #PB_Editor_CompileCount 
#ProductCopyright$ = "Copyright (c) 2021 by AHa " 

;' Window Title or Caption 
#MainCaption$    = #ProductName$ + " " + #ProductVersion$ 
#Caption$        = #ProductName$ + " " 


;' Program Flags 
#PB_Program_Flags = #PB_Program_Open|#PB_Program_Read|#PB_Program_Error|#PB_Program_Ascii|#PB_Program_Hide 

#ExecProg$ = "C:\Tools\Xpdf\bin64\pdftotext.exe"  ;' @TODO please adjust here to your own needs
#DefaultFile$ = "C:\Tools\Xpdf\bin64\test.pdf"    ;' @TODO please adjust here to your own needs


Enumeration EWindow 1 ;' starting with 1 because zero is reserved for Errors, Defaults, etc. 
  #WINDOW_Main  
EndEnumeration 

Enumeration EGadget 1 ;' I do not use zero here as well 
  #BUTTON_AllPages 
  #BUTTON_FirstPage  
  #STRING_File 
  #LIST_Text 
EndEnumeration 

Procedure RunPDFToText(File$, xAllPages) 
  Protected result, pid, stdout$, stderr$, tmp$  

  If FindString(File$, " ") > 0 
    File$ = #DQUOTE$ + File$ + #DQUOTE$                                        :Debug "INFO : take care of spaces in Filename "
  EndIf 

  tmp$ = "" 
  If xAllPages = 0 
    tmp$ = "-f 1 -l 1 " 
  EndIf 
  
  pid = RunProgram(#ExecProg$, tmp$ + File$ + " -", "", #PB_Program_Flags) 
  If IsProgram(pid)                                                            :Debug "INFO : running " 
    While ProgramRunning(pid) 
      If AvailableProgramOutput(pid) 
        stdout$ = ReadProgramString(pid) 
        If stdout$ 
          AddGadgetItem(#LIST_Text, -1, stdout$)                              ;:Debug ""+LSet(stdout$, 112)+" ; = "+RSet(Str(Len(stdout$)), 3)+" Chars " 
        EndIf 
      EndIf 
      ;' ReadProgramError() does not block, so poll stderr on every pass, not only when stdout has data 
      stderr$ = ReadProgramError(pid, #PB_Ascii)  : If stderr$ :Debug "ERROR: " + stderr$ : EndIf 
    Wend ;' ProgramRunning() 
    result = ProgramExitCode(pid)                                              :Debug "INFO : ExitCode = " + result  
    CloseProgram(pid)                                                          :Debug "INFO : done" 
  EndIf 
  ProcedureReturn result 
EndProcedure ;() 


;=== Main Window ======================================================================================================

Procedure OnEventSizeMainWindow() 
  Protected ww, wh 

  ww = WindowWidth(#WINDOW_Main) : wh = WindowHeight(#WINDOW_Main) 
  ResizeGadget(#STRING_File, #PB_Ignore, #PB_Ignore, ww-172, #PB_Ignore) 

  ResizeGadget(#BUTTON_AllPages, ww-164, #PB_Ignore, #PB_Ignore, #PB_Ignore) 
  ResizeGadget(#BUTTON_FirstPage, ww-84, #PB_Ignore, #PB_Ignore, #PB_Ignore) 

  ResizeGadget(#LIST_Text, #PB_Ignore, #PB_Ignore, ww-8, wh-36) 
EndProcedure 

Procedure OpenMainWindow() 
  Protected hwnd, ww, wh  

  ww = 800 : wh = 600 
  hwnd = OpenWindow(#WINDOW_Main, 0, 0, ww, wh , #MainCaption$, #PB_Window_SystemMenu|#PB_Window_ScreenCentered|#PB_Window_SizeGadget) 
  If hwnd <> 0 
    StringGadget(#STRING_File, 4, 5, ww-172, 20, #DefaultFile$)  
    SendMessage_(GadgetID(#STRING_File), #EM_SETCUEBANNER, #Null, @"filename of the pdf file (copy the filename here)") 
    SHAutoComplete_(GadgetID(#STRING_File), $50000021) ;' #SHACF_AUTOAPPEND_FORCE_ON | #SHACF_AUTOSUGGEST_FORCE_ON | #SHACF_FILESYSTEM | #SHACF_FILESYS_DIRS 

    ButtonGadget(#BUTTON_AllPages, ww-164, 4, 76, 24, "All Pages") 
    ButtonGadget(#BUTTON_FirstPage, ww-84, 4, 76, 24, "First Page") 

    ListViewGadget(#LIST_Text, 4, 32, ww-8, wh-36) 
    BindEvent(#PB_Event_SizeWindow, @OnEventSizeMainWindow(), #WINDOW_Main) 
    EnableGadgetDrop(#STRING_File, #PB_Drop_Files, #PB_Drag_Copy) ;' make it droppable for filenames 
  EndIf 

  ProcedureReturn hwnd 
EndProcedure 

Procedure main() 
  Protected tt$ 

  CoInitialize_(#Null) ;' for autocomplete in stringgadgets  
  If OpenMainWindow() 
    Repeat 
      Select WaitWindowEvent() 
        Case #PB_Event_CloseWindow 
          Break 

        Case #PB_Event_Gadget 
          Select EventGadget() 
            Case #BUTTON_FirstPage 
              tt$ = GetGadgetText(#STRING_File) 
              ClearGadgetItems(#LIST_Text) 
              RunPDFToText(tt$, 0) 

            Case #BUTTON_AllPages  
              tt$ = GetGadgetText(#STRING_File) 
              ClearGadgetItems(#LIST_Text) 
              RunPDFToText(tt$, 1) 
          EndSelect 

        Case #PB_Event_GadgetDrop 
          Select EventGadget()
            Case #STRING_File  
              tt$ = EventDropFiles() 
              tt$ = StringField(tt$, 1, #LF$) 
              SetGadgetText(#STRING_File, tt$) 
              ClearGadgetItems(#LIST_Text) 
              RunPDFToText(tt$, 0) 
          EndSelect 
      EndSelect ; WaitWindowEvent() 
    ForEver 
  EndIf ; OpenWindow()  

  CoUninitialize_()  ;' autocomplete 
  ProcedureReturn 0 
EndProcedure 

End main() 
;BoF 
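
To get from the extracted text to the renaming idea from the first post, one possible next step is to turn the first non-empty line into a valid file name. The two helpers below are hypothetical (neither SafeFileName() nor FirstNonEmptyLine() appears in this thread), and taking the first line as the title is only an assumption about what the PDFs look like.

```purebasic
EnableExplicit

; Hypothetical helper: replace the characters Windows does not allow in
; file names ( \ / : * ? " < > | and control characters ) with "_".
Procedure.s SafeFileName(Text$, MaxLen = 80)
  Protected i, c$, Result$ = ""

  Text$ = Trim(Text$)
  For i = 1 To Len(Text$)
    c$ = Mid(Text$, i, 1)
    If FindString("\/:*?<>|" + #DQUOTE$, c$) Or Asc(c$) < 32
      c$ = "_"
    EndIf
    Result$ + c$
  Next

  ; Cut to MaxLen and trim again so the name does not end with a space
  ProcedureReturn Trim(Left(Result$, MaxLen))
EndProcedure

; Hypothetical helper: first non-empty line of a text whose lines are
; separated by LF (a stray CR is removed as well).
Procedure.s FirstNonEmptyLine(Text$)
  Protected i, Line$

  For i = 1 To CountString(Text$, #LF$) + 1
    Line$ = Trim(RemoveString(StringField(Text$, i, #LF$), #CR$))
    If Line$ <> ""
      ProcedureReturn Line$
    EndIf
  Next

  ProcedureReturn ""
EndProcedure

; Made-up sample text, shows: Invoice 2021_03_ ACME
Debug SafeFileName(FirstNonEmptyLine(#LF$ + "  Invoice 2021/03: ACME  " + #LF$ + "page text"))
```

The result could then be passed to RenameFile() together with the original path and the ".pdf" extension. Checking for an empty result and for an already existing target file is left out of this sketch.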