
Replace text in .pdf file?

Posted: Fri Dec 20, 2024 5:08 am
by camille
Hello!

Is it possible with PB to search and replace text in a .pdf file and save the modified .pdf afterwards?

The PDF has a graphic at the top left; everything else is editable text, but of course that text is formatted, e.g. "this text part is blue", "this text has a different font size than the other lines", etc.

I would like to prefill this document with placeholder words to search for, like:

Code: Select all

recipient_adress_name
recipient_adress_street
recipient_adress_city
...
and read the corresponding values, e.g. from an .ini file:

Code: Select all

[Recipient]
recipient_adress_name=Walter Stanfield
recipient_adress_street=Sesamestreet 1-8
recipient_adress_city=Dallas
and afterwards search for each of these words and replace it with the corresponding value, without breaking the design of the PDF...

Thank you very much!

Re: Replace text in .pdf file?

Posted: Fri Dec 20, 2024 11:50 am
by Axolotl
Hello back!
When it comes to software, I would never say never, or that something can't be done.
It's not easy, and (as far as I know) there is no PB library or built-in PB function for it.
A suggestion based on my own application experience:
Take a file in an editable format (e.g. .rtf), make the desired changes, and then generate a PDF file from it.
I use placeholders such as $ProgramName$ or similar in the text.

Anyway, one PDF library I know of is MuPDF. There is no interface to PB right now (I think).

Re: Replace text in .pdf file?

Posted: Fri Dec 20, 2024 12:18 pm
by Axolotl
A little search for PDF Libraries (C or C++ based) resulted in the following list:

Projects on github
* PDF-Writer
* PoDoFo
* VersyPDF
* libharu-pdf

To be honest, I never tried any of them, but maybe there is already something from the pros.

Re: Replace text in .pdf file?

Posted: Fri Dec 20, 2024 12:51 pm
by highend
Thank you, Axolotl!

I've tried to modify it via Python and PyMuPDF and in general, it works...

The problem: it uses a different font than the one used on the line where the replacement happens (even though the font is in the .pdf file and is additionally installed on the computer where I ran the Python script)...

I guess I can't use .rtf ;(
E.g. I have a graphic (a logo) in the top left corner and I need to change text in a block on the right side of it.

When I open the .rtf with Wordpad and add the logo, I can't have a multiline text block on the right hand side of it.

I guess everything could be automated perfectly (by first searching and replacing text in a human-readable text file and then creating a .pdf out of it), e.g. by using LaTeX, but that would require installing a gigabyte of software and enough time to dive into LaTeX...

What I really do not want is to install an office suite (MS Office, LibreOffice or anything like that).
Too much bloatware...

:mrgreen:

Re: Replace text in .pdf file?

Posted: Fri Dec 20, 2024 1:37 pm
by Axolotl
Understandable.

I guess you can manage it the .rtf way by using a table like this:
The table has two columns and one row, and the separator lines are invisible.
Insert the picture (logo) in the first column and add the text in the second.

Code: Select all

{\rtf1\ansi\ansicpg1252\deff0\nouicompat\deflang1031{\fonttbl{\f0\froman\fcharset0 Times New Roman;}}
{\*\generator Riched20 10.0.22621}\viewkind4\uc1\trowd\trgaph108\trpaddl108\trpaddr108\trpaddfl3\trpaddfr3
\cellx4820\cellx9640 
\pard\intbl\nowidctlpar\hyphpar0\charscalex100\kerning1\f0\fs20 <Pict>\cell Name\par
City\par
something\cell\row 
\pard\nowidctlpar\hyphpar0\par
}
My way of doing this:
  1. Start my word processor, enter the stuff I want to show, and save it as Doc1.rtf
  2. Open Doc1.rtf in Wordpad, check the result, and save it as Doc1-1.rtf
  3. Open Doc1-1.rtf in Wordpad and do all the stuff I want with placeholders and so on
The second step is important because my word processor is very, very talkative (meaning: the rtf contains far too many details that are not needed).
Bear in mind that EditorGadget() is not able to deal with tables. But you can make the changes with any text editor as well.
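
The placeholder step described above could be sketched in PB roughly like this. It is only a minimal sketch under assumptions: the file names are hypothetical, the template is assumed to contain placeholders written as $key$ (e.g. $recipient_adress_name$), and the values are assumed to sit in a [Recipient] group of an .ini file. Only \, { and } are escaped; non-ASCII characters would need additional RTF escaping that is not handled here.

```purebasic
EnableExplicit

; Hypothetical file names, chosen only for this sketch.
#TemplateFile$ = "invoice_template.rtf"
#ValuesFile$   = "recipient.ini"
#OutputFile$   = "invoice_filled.rtf"

; Escape the characters that have a special meaning in RTF.
Procedure.s EscapeRtf(text.s)
  text = ReplaceString(text, "\", "\\")
  text = ReplaceString(text, "{", "\{")
  text = ReplaceString(text, "}", "\}")
  ProcedureReturn text
EndProcedure

Define rtf.s, file.i

; Read the whole template into one string.
file = ReadFile(#PB_Any, #TemplateFile$)
If file = 0
  Debug "Cannot open " + #TemplateFile$
  End 1
EndIf
rtf = ReadString(file, #PB_Ascii | #PB_File_IgnoreEOL)
CloseFile(file)

; Replace every $key$ placeholder with the value stored under that key.
If OpenPreferences(#ValuesFile$) = 0
  Debug "Cannot open " + #ValuesFile$
  End 1
EndIf
PreferenceGroup("Recipient")
If ExaminePreferenceKeys()
  While NextPreferenceKey()
    rtf = ReplaceString(rtf, "$" + PreferenceKeyName() + "$", EscapeRtf(PreferenceKeyValue()))
  Wend
EndIf
ClosePreferences()

; Write the filled document under a new name so the template stays untouched.
file = CreateFile(#PB_Any, #OutputFile$)
If file
  WriteString(file, rtf, #PB_Ascii)
  CloseFile(file)
EndIf
```

One caveat: a word processor may split a placeholder with formatting control words when it saves the file, so it is worth checking in a plain text editor that each $key$ is still one contiguous piece of text.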

Re: Replace text in .pdf file?

Posted: Fri Dec 20, 2024 10:38 pm
by morosh
Maybe this isn't exactly what you are looking for, but you can add text to a pdf programmatically:
using cpdf https://community.coherentpdf.com/, which is free for personal use, you can add comments programmatically.
With the help of Infratec, I tried the following and it works:

Code: Select all

CompilerIf #PB_Compiler_IsMainFile
  EnableExplicit
CompilerEndIf

Structure cpdf_position 
  cpdf_anchor.i;    /* Position anchor */
  cpdf_coord1.d;    /* Parameter one */
  cpdf_coord2.d;    /* Parameter two */
EndStructure
Global pos.cpdf_position
  #kk=28.346457

Enumeration cpdf_anchor 
  #cpdf_posCentre;      /* Absolute centre */
  #cpdf_posLeft  ;       /* Absolute left */
  #cpdf_posRight ;       /* Absolute right */
  #cpdf_top      ;            /* Top centre of the page */
  #cpdf_topLeft  ;        /* The top left of the page */
  #cpdf_topRight ;       /* The top right of the page */
  #cpdf_left     ;           /* The left hand side of the page, halfway down */
  #cpdf_bottomLeft;     /* The bottom left of the page */
  #cpdf_bottom    ;         /* The bottom middle of the page */
  #cpdf_bottomRight;    /* The bottom right of the page */
  #cpdf_right      ;          /* The right hand side of the page, halfway down */
  #cpdf_diagonal   ;       /* Diagonal, bottom left To top right */
  #cpdf_reverseDiagonal; /* Diagonal, top left To bottom right */
EndEnumeration

Enumeration cpdf_font 
  #cpdf_timesRoman   ;           /* Times Roman */
  #cpdf_timesBold    ;            /* Times Bold */
  #cpdf_timesItalic  ;          /* Times Italic */
  #cpdf_timesBoldItalic  ;      /* Times Bold Italic */
  #cpdf_helvetica        ;            /* Helvetica */
  #cpdf_helveticaBold    ;        /* Helvetica Bold */
  #cpdf_helveticaOblique ;     /* Helvetica Oblique */
  #cpdf_helveticaBoldOblique ; /* Helvetica Bold Oblique */
  #cpdf_courier              ;              /* Courier */
  #cpdf_courierBold          ;          /* Courier Bold */
  #cpdf_courierOblique       ;       /* Courier Oblique */
  #cpdf_courierBoldOblique   ;    /* Courier Bold Oblique */
EndEnumeration

Enumeration cpdf_justification
  #cpdf_leftJustify    ;   /* Left justify */
  #cpdf_CentreJustify  ; /* Centre justify */
  #cpdf_RightJustify   ;   /* Right justify */
EndEnumeration

PrototypeC.i prototype_cpdf_version()
PrototypeC prototype_cpdf_startup(*argv)
PrototypeC.i prototype_cpdf_fromFile(filename.p-utf8, userpw.p-utf8)
PrototypeC prototype_cpdf_clearError()
PrototypeC.i prototype_cpdf_mergeSimple(*pdfs, length.l)
PrototypeC prototype_cpdf_toFile(pdf.l, filename.p-utf8, linearize.l, make_id.l)
PrototypeC prototype_cpdf_pages(pdf.l)
PrototypeC.l prototype_cpdf_range(n1.l, n2.l)
PrototypeC prototype_cpdf_addText(flag_add.l, pdf.l, page.l, txt.p-utf8, *position.cpdf_position, linespacing.d,
                                  stbates.l, font.i, size.d, red.d, green.d, blue.d, flagunder.l, flagcrop.l, flagoutline.l, opacity.d,
                                  justification.i, flag_midline.l, flag_topline.l, fname.p-utf8, linewidth.d, embed.l)
PrototypeC prototype_cpdf_addTextSimple(pdf.l, page.l, txt.p-utf8, *position.cpdf_position, font.i, size.d)

Global cpdf_library.i
Global cpdf_version_.prototype_cpdf_version
Global cpdf_startup.prototype_cpdf_startup
Global cpdf_fromFile.prototype_cpdf_fromFile
Global cpdf_clearError.prototype_cpdf_clearError
Global cpdf_mergeSimple.prototype_cpdf_mergeSimple
Global cpdf_toFile.prototype_cpdf_toFile
Global cpdf_pages.prototype_cpdf_pages
Global cpdf_range.prototype_cpdf_range
Global cpdf_addText.prototype_cpdf_addText
Global cpdf_addTextSimple.prototype_cpdf_addTextSimple

Global.i cpdf_lastError_
Global.i cpdf_lastErrorString_

Macro cpdf_version()
  PeekS(cpdf_version_(), -1, #PB_UTF8)
EndMacro

Macro cpdf_lastError
  PeekI(cpdf_lastError_)
EndMacro

Macro cpdf_lastErrorString
  PeekS(cpdf_lastErrorString_, -1, #PB_UTF8)
EndMacro


Procedure cpdf_addTextSimple2(pdf.l, page.l, txt.s, posx.f, posy.f, font.i, size.d)
  pos\cpdf_coord1=#kk*posx            ; /* Parameter one */
  pos\cpdf_coord2=#kk*(29.7-posy)          ; /* Parameter two */  
  cpdf_addTextSimple(pdf, page, txt, @pos, font.i, size.d)
EndProcedure


Procedure.i OpenCPDF()
  
  If Not IsLibrary(cpdf_library)
    CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
      cpdf_library = OpenLibrary(#PB_Any, "E:\install\dvd2\PDF utilities\sdk\cpdf\win32\libcpdf.dll")
    CompilerElseIf #PB_Compiler_Processor = #PB_Processor_x64
      cpdf_library = OpenLibrary(#PB_Any, "E:\install\dvd2\PDF utilities\sdk\cpdf\win64\libcpdf.dll")
    CompilerEndIf
    
    If cpdf_library
      cpdf_version_         = GetFunction(cpdf_library, "cpdf_version")
      cpdf_startup          = GetFunction(cpdf_library, "cpdf_startup")
      cpdf_fromFile         = GetFunction(cpdf_library, "cpdf_fromFile")
      cpdf_clearError       = GetFunction(cpdf_library, "cpdf_clearError")
      cpdf_mergeSimple      = GetFunction(cpdf_library, "cpdf_mergeSimple")
      cpdf_toFile           = GetFunction(cpdf_library, "cpdf_toFile")
      cpdf_pages            = GetFunction(cpdf_library, "cpdf_pages")
      cpdf_range            = GetFunction(cpdf_library, "cpdf_range")
      cpdf_addText          = GetFunction(cpdf_library, "cpdf_addText")
      cpdf_addTextSimple    = GetFunction(cpdf_library, "cpdf_addTextSimple")
      
      cpdf_lastError_       = GetFunction(cpdf_library, "cpdf_lastError")
      cpdf_lastErrorString_ = GetFunction(cpdf_library, "cpdf_lastErrorString")
    Else 
      MessageRequester("Error","no libcpdf.dll present")
    EndIf
  EndIf
  
  ProcedureReturn cpdf_library
  
EndProcedure

Procedure CloseCPDF()
  If IsLibrary(cpdf_library)
    CloseLibrary(cpdf_library)
    cpdf_library = #Null
  EndIf
EndProcedure

CompilerIf #PB_Compiler_IsMainFile
  
  Define *argv
  Define.i orig_pdf, output, i
  Dim pdfs.l(2)
  Define range.l
  ;1pt = 1/72 inch = 0.0352777 cm    1cm=28.346457  
  pos\cpdf_anchor=#cpdf_posLeft     ;    /* Position anchor */
  pos\cpdf_coord1=#kk*6            ; /* Parameter one */
  pos\cpdf_coord2=#kk*10          ; /* Parameter two */
  
  If OpenCPDF()
    
    ; Initialise cpdf
    cpdf_startup(@*argv)
    
    ;Debug cpdf_version()
    
    ; Load the input file toto.pdf
    orig_pdf = cpdf_fromFile("toto.pdf", "")
    
    ; Check the error state
    If cpdf_lastError = 1
      Debug cpdf_lastErrorString
      End 1
    EndIf
    
    ; Clear the error state
    cpdf_clearError()
    
    ; The Array of PDFs To merge
    pdfs(0) = orig_pdf
    pdfs(1) = orig_pdf
    pdfs(2) = orig_pdf
    
    ; Merge them
    output = cpdf_mergeSimple(pdfs(), 1)
    
    ; Check the error state
    If cpdf_lastError = 1
      Debug cpdf_lastErrorString
      End 1
    EndIf
    
    cpdf_clearError()
    
    range=cpdf_range(1, 1)
    cpdf_addText(#False, output, range, "hahaha!!!", @pos, 1, 0, #cpdf_timesBold, 14, 1, 0.0, 0.0, #False, #False,
                      #False, 1, #cpdf_CentreJustify, #False, #False, "xxx", 5, #False)
    For i=1 To 5
      cpdf_addTextSimple2(output, range, "Hello!!!"+Str(i), 5, i, #cpdf_timesBold, 14)
    Next  
    ; Write output
    cpdf_toFile(output, "output.pdf", #False, #False)
    Debug cpdf_pages(output)
    ; Check the error state
    If cpdf_lastError = 1
      Debug cpdf_lastErrorString
      End 1
    EndIf
    
    CloseCPDF()
  EndIf
CompilerEndIf
RunProgram("output.pdf")
This may be an alternative.
HTH

Re: Replace text in .pdf file?

Posted: Sat Dec 21, 2024 12:08 pm
by camille
Great, really, thanks a lot!

rtf is really a mess to read, but at least I can do text replacements with minimal effort and it doesn't require huge apps / installations.

I didn't want to have a hard time converting my pdf into rtf so I looked around and stumbled upon "AbleWord" (http://www.ableword.net/).

It can read pdf and save it as rtf. The layout was reproduced 100%; I just had to do some font adjustments. Cool thing, and afaik it's free to use...

The bad thing: Wordpad can of course open that rtf file, but it messes up the content really badly, so I now need to rely on AbleWord to make changes.
Maybe it's because of the text boxes that it used to replicate the layout...

Re: Replace text in .pdf file?

Posted: Sat Dec 21, 2024 12:09 pm
by camille
@morosh

Thank you!

cpdf does not seem to be capable of finding and replacing text, but it could be used to add text to a pdf file that is blank apart from the elements that are already present and fixed.

I'm not that experienced with API access. It seems it's not possible to use a font other than the three default ones (Times New Roman, Helvetica and Courier) when adding text?

Re: Replace text in .pdf file?

Posted: Sat Dec 21, 2024 3:37 pm
by Axolotl
@camille
It is great if you find your way.
Thanks for the hint. I have never heard about AbleWord.

Re: Replace text in .pdf file?

Posted: Wed Jan 08, 2025 6:09 am
by tua
Your exact use case is not entirely clear from your post...

Do you control the PDF? Do you merely want to fill in, say some name & address into a fillable PDF that someone else (often government agencies) provides? Then do some processing when filled, such as saving, emailing etc.?

If so, then there's nothing easier than using Adobe's FDF mechanism (unless you want to spring for hundreds of $$$ for some sophisticated library).
FDF just requires writing a simple text file with FDF extension.
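
To make the "simple text file" concrete, here is a minimal sketch in PB that writes such an FDF file. The assumptions are mine: the file names are hypothetical, the field names must match the form field names actually defined in the PDF, and the values are plain ASCII (other characters would need a proper PDF string encoding, which this sketch does not do).

```purebasic
EnableExplicit

; Escape the characters that are special inside a PDF literal string.
Procedure.s EscapePdfString(text.s)
  text = ReplaceString(text, "\", "\\")
  text = ReplaceString(text, "(", "\(")
  text = ReplaceString(text, ")", "\)")
  ProcedureReturn text
EndProcedure

; Build one entry of the /Fields array: /T is the field name, /V its value.
Procedure.s FdfField(name.s, value.s)
  ProcedureReturn "<< /T (" + EscapePdfString(name) + ") /V (" + EscapePdfString(value) + ") >>" + #LF$
EndProcedure

Define fdf.s, file.i

fdf = "%FDF-1.2" + #LF$
fdf + "1 0 obj" + #LF$
; /F names the PDF form this data belongs to (hypothetical file name).
fdf + "<< /FDF << /F (invoice_form.pdf) /Fields [" + #LF$
fdf + FdfField("recipient_adress_name", "Walter Stanfield")
fdf + FdfField("recipient_adress_street", "Sesamestreet 1-8")
fdf + FdfField("recipient_adress_city", "Dallas")
fdf + "] >> >>" + #LF$
fdf + "endobj" + #LF$
fdf + "trailer" + #LF$
fdf + "<< /Root 1 0 R >>" + #LF$
fdf + "%%EOF" + #LF$

; Hypothetical output name; a PDF viewer or tool then applies it to the form.
file = CreateFile(#PB_Any, "invoice_data.fdf")
If file
  WriteString(file, fdf, #PB_Ascii)
  CloseFile(file)
EndIf
```

This only works if the PDF really contains form fields; it does not search and replace ordinary page text.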

Re: Replace text in .pdf file?

Posted: Sat Jul 05, 2025 4:01 pm
by camille
Hi!

Sorry, I've missed that there was a new posting here :oops:

It's my own pdf that I want to create invoices with.
I've created it / them with PDF-XChange Editor in the first place.

I've stumbled upon https://github.com/pdfcpu/pdfcpu
but it doesn't create a 100% pixel-perfect repro of the original form when you use non-standard fonts (in my case e.g. "PB Sans").
Neither did any other CLI / Python tool (that doesn't cost a fortune)...

I've used LaTeX but found it quite cumbersome for this task and I don't like that a standard installation of e.g. TeXLive wants to use a whopping few gigabytes^^

But I then found: https://typst.app/home
It's a <40 MB standalone CLI binary which is able to produce 100% pixel-perfect layouts, allows scripting inside the .typ document (crazy^^), has external package support for e.g. creating SEPA QR-Codes in the document, etc.

It wasn't overly complicated to recreate my invoice template file and I can now create perfect (for my use case) .pdf's with it in the blink of an eye :mrgreen:
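
For anyone who wants to drive this from PB, a rough sketch of calling the Typst CLI could look like the following. It rests on assumptions: typst is on the PATH, the installed version supports the --input key=value option (read inside the document via sys.inputs), and invoice.typ / invoice.pdf are hypothetical file names. It is not my exact setup, just an illustration.

```purebasic
EnableExplicit

Define args.s, typst.i

; Build the command line: typst compile --input key=value ... input output
; Each key=value pair is quoted because the values contain spaces.
args = "compile"
args + " --input " + #DQUOTE$ + "recipient_adress_name=Walter Stanfield" + #DQUOTE$
args + " --input " + #DQUOTE$ + "recipient_adress_street=Sesamestreet 1-8" + #DQUOTE$
args + " --input " + #DQUOTE$ + "recipient_adress_city=Dallas" + #DQUOTE$
args + " invoice.typ invoice.pdf"

; Start typst hidden and wait until it has finished.
typst = RunProgram("typst", args, "", #PB_Program_Open | #PB_Program_Hide)
If typst
  WaitProgram(typst)
  Debug "typst exit code: " + Str(ProgramExitCode(typst))
  CloseProgram(typst)
Else
  Debug "Could not start typst (is it on the PATH?)"
EndIf
```

Inside the .typ file, a value passed this way would then be available as e.g. sys.inputs.at("recipient_adress_name").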

Regards,
Camille

Re: Replace text in .pdf file?

Posted: Sat Jul 05, 2025 5:05 pm
by infratec
Have you ever looked at scribus?
I don't know if it can do what you want, but it is 100% free.

www.scribus.net

https://wiki.scribus.net/canvas/Get_Sta ... _Scribus:1

Re: Replace text in .pdf file?

Posted: Sat Jul 05, 2025 10:15 pm
by highend
@infratec It seems she's not looking for (another) desktop app to create / fill a form, but for a tool that allows automating the whole process via a command line interface (and still creates 100% reproducible output even with non-standard fonts)?

Re: Replace text in .pdf file?

Posted: Sun Jul 06, 2025 10:44 am
by infratec
You can fully control scribus by script from cli.

Re: Replace text in .pdf file?

Posted: Sun Jul 06, 2025 12:30 pm
by camille
Hi again!

At infratec: I'll take a look at Scribus, thank you!
At highend: It's true, I'd like to have a cli solution only but I guess I should at least try every single suggestion :P