pdftotext.exe with RunProgram() fails

Axolotl
Enthusiast
Posts: 435
Joined: Wed Dec 31, 2008 3:36 pm

Post by Axolotl »

Hi there

I had a rather odd idea for the PDF files I have downloaded from various places: rename each file based on information inside it.
For that I found the Xpdf tools from xpdfreader (especially pdftotext.exe, see http://www.xpdfreader.com/pdftotext-man.html).
So far so good.
On the console this tool works like a charm:
commandline > pdftotext.exe -f 1 -l 1 test.pdf -

But with RunProgram() the only output is the error: pdftotext version 4.02.
Nothing more.
Writing an output text file is not the way I would like to go yet (only if no other solution is found :)
Does anyone have further hints?

Thanks in advance.
Mostly running PureBasic <latest stable version and current alpha/beta> (x64) on Windows 11 Home
Marc56us
Addict
Posts: 1477
Joined: Sat Feb 08, 2014 3:26 pm

Re: pdftotext.exe with RunProgram() fails

Post by Marc56us »

This works for me

Code: Select all

; pdftotext.exe and test.pdf are in c:\tmp

SetCurrentDirectory("c:\tmp")

RunProgram("pdftotext.exe", "-f 1 -l 1 test.pdf", "")
Axolotl wrote: To create an output text file is not the way I would like to go yet (only if no other solution is found:)

You can send the text to the console (stdout), so no file is needed. From the pdftotext manual:

Pdftotext converts Portable Document Format (PDF) files to plain text.
Pdftotext reads the PDF file, PDF-file, and writes a text file, text-file. If text-file is not specified, pdftotext converts file.pdf to file.txt. If text-file is '-', the text is sent to stdout.

Sample that reads the output into a variable:

Code: Select all

; pdftotext.exe and test.pdf are in c:\tmp

SetCurrentDirectory("c:\tmp")

; "-" as the output file name makes pdftotext write the text to stdout
Pdftotext = RunProgram("pdftotext.exe", "-f 1 -l 1 test.pdf -", "", #PB_Program_Open | #PB_Program_Read)
If Pdftotext
    ; Collect the output line by line while pdftotext is running
    While ProgramRunning(Pdftotext)
        If AvailableProgramOutput(Pdftotext)
            Output$ + ReadProgramString(Pdftotext) + Chr(13)
        EndIf
    Wend
    Output$ + Chr(13) + Chr(13)

    Debug Output$
    
    CloseProgram(Pdftotext) 
EndIf
Enjoy
:wink:

Re: pdftotext.exe with RunProgram() fails

Post by Axolotl »

Hi Marc56us,
thanks for your feedback.
Then the problem is probably 40 cm in front of the monitor and between the ears.

Re: pdftotext.exe with RunProgram() fails

Post by Marc56us »

Glad to have been helpful :wink: and thanks for the feedback.
Thanks also for reminding me of this great little program, I'll probably need it someday.

PS. Remember to download the latest version, 4.03; it has new options (4.02 is 2 years old).

4.03 (2021-jan-28)
------------------
Implemented selection extension via shift-click, and word/line
selection via double/triple click.
Added default bindings for ctrl-mousewheel-up/down to zoom in/out.
Added the "-nofonts" option to pdftohtml.
Added the "simple2" mode to pdftotext.
Added the "-rot" flag to xpdf, pdftoppm, and pdftopng.
Added the "-listencodings" flag to pdftotext.
Added the 'copyLinkTarget' command.
Added the 'selectionColor' xpdfrc setting.
Added the 'initialSidebarWidth' xpdfrc setting.
Added support for @"..." strings in xpdfrc files. This includes using

[...]

Re: pdftotext.exe with RunProgram() fails (solved)

Post by Axolotl »

As expected: my fault. Here is a small tool that extracts the text of a PDF, either all pages or just the first page.
@Marc56us: now I can update to the latest version. (I also saw it during my research.)

Code: Select all

EnableExplicit 
DebugLevel 9 

#ProductName$      = "Test-PDF2Text" 
#ProductVersion$   = "0." + #PB_Editor_BuildCount + "." + #PB_Editor_CompileCount 
#ProductCopyright$ = "Copyright (c) 2021 by AHa " 

;' Window Title or Caption 
#MainCaption$    = #ProductName$ + " " + #ProductVersion$ 
#Caption$        = #ProductName$ + " " 


;' Program Flags 
#PB_Program_Flags = #PB_Program_Open|#PB_Program_Read|#PB_Program_Error|#PB_Program_Ascii|#PB_Program_Hide 

#ExecProg$ = "C:\Tools\Xpdf\bin64\pdftotext.exe"  ;' @TODO please adjust here to your own needs
#DefaultFile$ = "C:\Tools\Xpdf\bin64\test.pdf"    ;' @TODO please adjust here to your own needs


Enumeration EWindow 1 ;' starting with 1 because zero is reserved for Errors, Defaults, etc. 
  #WINDOW_Main  
EndEnumeration 

Enumeration EGadget 1 ;' I do not use zero here as well 
  #BUTTON_AllPages 
  #BUTTON_FirstPage  
  #STRING_File 
  #LIST_Text 
EndEnumeration 

Procedure RunPDFToText(File$, xAllPages) 
  Protected result, pid, stdout$, stderr$, tmp$  

  If FindString(File$, " ") > 0 
    File$ = #DQUOTE$ + File$ + #DQUOTE$                                        :Debug "INFO : take care of spaces in Filename "
  EndIf 

  tmp$ = "" 
  If xAllPages = 0 
    tmp$ = "-f 1 -l 1 " 
  EndIf 
  
  pid = RunProgram(#ExecProg$, tmp$ + File$ + " -", "", #PB_Program_Flags) 
  If IsProgram(pid)                                                            :Debug "INFO : running " 
    While ProgramRunning(pid) 
      If AvailableProgramOutput(pid) 
        stdout$ = ReadProgramString(pid) 
        If stdout$ 
          AddGadgetItem(#LIST_Text, -1, stdout$)                              ;:Debug ""+LSet(stdout$, 112)+" ; = "+RSet(Str(Len(stdout$)), 3)+" Chars " 
        EndIf 
      EndIf 
      ;' poll stderr on every pass, not only when stdout has data 
      stderr$ = ReadProgramError(pid, #PB_Ascii)  : If stderr$ :Debug "ERROR: " + stderr$ : EndIf 
    Wend ;' ProgramRunning() 
    result = ProgramExitCode(pid)                                              :Debug "INFO : ExitCode = " + result  
    CloseProgram(pid)                                                          :Debug "INFO : done" 
  EndIf 
  ProcedureReturn result 
EndProcedure ;() 


;=== Main Window ======================================================================================================

Procedure OnEventSizeMainWindow() 
  Protected ww, wh 

  ww = WindowWidth(#WINDOW_Main) : wh = WindowHeight(#WINDOW_Main) 
  ResizeGadget(#STRING_File, #PB_Ignore, #PB_Ignore, ww-172, #PB_Ignore) 

  ResizeGadget(#BUTTON_AllPages, ww-164, #PB_Ignore, #PB_Ignore, #PB_Ignore) 
  ResizeGadget(#BUTTON_FirstPage, ww-84, #PB_Ignore, #PB_Ignore, #PB_Ignore) 

  ResizeGadget(#LIST_Text, #PB_Ignore, #PB_Ignore, ww-8, wh-36) 
EndProcedure 

Procedure OpenMainWindow() 
  Protected hwnd, ww, wh  

  ww = 800 : wh = 600 
  hwnd = OpenWindow(#WINDOW_Main, 0, 0, ww, wh , #MainCaption$, #PB_Window_SystemMenu|#PB_Window_ScreenCentered|#PB_Window_SizeGadget) 
  If hwnd <> 0 
    StringGadget(#STRING_File, 4, 5, ww-172, 20, #DefaultFile$)  
    SendMessage_(GadgetID(#STRING_File), #EM_SETCUEBANNER, #Null, @"filename of the pdf file (copy the filename here)") 
    SHAutoComplete_(GadgetID(#STRING_File), $50000021) ;' #SHACF_AUTOAPPEND_FORCE_ON | #SHACF_AUTOSUGGEST_FORCE_ON | #SHACF_FILESYSTEM | #SHACF_FILESYS_DIRS 

    ButtonGadget(#BUTTON_AllPages, ww-164, 4, 76, 24, "All Pages") 
    ButtonGadget(#BUTTON_FirstPage, ww-84, 4, 76, 24, "First Page") 

    ListViewGadget(#LIST_Text, 4, 32, ww-8, wh-36) 
    BindEvent(#PB_Event_SizeWindow, @OnEventSizeMainWindow(), #WINDOW_Main) 
    EnableGadgetDrop(#STRING_File, #PB_Drop_Files, #PB_Drag_Copy) ;' make it droppable for filenames 
  EndIf 

  ProcedureReturn hwnd 
EndProcedure 

Procedure main() 
  Protected tt$ 

  CoInitialize_(#Null) ;' for autocomplete in stringgadgets  
  If OpenMainWindow() 
    Repeat 
      Select WaitWindowEvent() 
        Case #PB_Event_CloseWindow 
          Break 

        Case #PB_Event_Gadget 
          Select EventGadget() 
            Case #BUTTON_FirstPage 
              tt$ = GetGadgetText(#STRING_File) 
              ClearGadgetItems(#LIST_Text) 
              RunPDFToText(tt$, 0) 

            Case #BUTTON_AllPages  
              tt$ = GetGadgetText(#STRING_File) 
              ClearGadgetItems(#LIST_Text) 
              RunPDFToText(tt$, 1) 
          EndSelect 

        Case #PB_Event_GadgetDrop 
          Select EventGadget()
            Case #STRING_File  
              tt$ = EventDropFiles() 
              tt$ = StringField(tt$, 1, #LF$) 
              SetGadgetText(#STRING_File, tt$) 
              ClearGadgetItems(#LIST_Text) 
              RunPDFToText(tt$, 0) 
          EndSelect 
      EndSelect ; WaitWindowEvent() 
    ForEver 
  EndIf ; OpenWindow()  

  CoUninitialize_()  ;' autocomplete 
  ProcedureReturn 0 
EndProcedure 

End main() 
;BoF 
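
Coming back to the original idea of renaming the files by their content: the following is only a minimal sketch of how the extracted text could be turned into a file name. MakeSafeFileName() and FirstNonEmptyLine() are hypothetical helpers (not part of the tool above), the sample text is made up, and taking the first non-empty line as the title is just an assumption about which "information inside the file" is wanted.

```purebasic
EnableExplicit

; Hypothetical helper: turn a line of extracted text into a file name
; that Windows accepts, by replacing the forbidden characters.
Procedure.s MakeSafeFileName(Text$, MaxLen = 80)
  Protected i, result$ = Trim(Text$)
  Protected invalid$ = "\/:*?<>|" + #DQUOTE$

  For i = 1 To Len(invalid$)
    result$ = ReplaceString(result$, Mid(invalid$, i, 1), "_")
  Next
  result$ = ReplaceString(result$, #TAB$, " ")

  ; Limit the length and drop spaces left over at the cut
  ProcedureReturn Trim(Left(result$, MaxLen))
EndProcedure

; Hypothetical helper: first non-empty line of the text returned by pdftotext.
Procedure.s FirstNonEmptyLine(Text$)
  Protected i, line$

  Text$ = ReplaceString(Text$, #CR$, #LF$) ; treat CR, LF and CRLF alike
  For i = 1 To CountString(Text$, #LF$) + 1
    line$ = Trim(StringField(Text$, i, #LF$))
    If line$ <> ""
      ProcedureReturn line$
    EndIf
  Next
  ProcedureReturn ""
EndProcedure

; Made-up sample text standing in for the pdftotext output
Define text$ = #LF$ + "  Annual Report: 2020/2021  " + #LF$ + "second line"
Define name$ = MakeSafeFileName(FirstNonEmptyLine(text$))
Debug name$ ; Annual Report_ 2020_2021
```

The result could then be combined with the original folder and a ".pdf" extension and passed to RenameFile(). Other restrictions (reserved names, trailing dots, existing files with the same name) are not handled here.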