Convert txt file to Acrobat pdf

doctorized
Addict
Posts: 882
Joined: Fri Mar 27, 2009 9:41 am
Location: Athens, Greece

Convert txt file to Acrobat pdf

Post by doctorized »

Here is another piece of code I have written. It converts txt files to Adobe PDF.

Attention: Some languages, such as Greek and Chinese, are not supported.

Code:

; Author: Wicker Man (doctorized in PB)
; Date: May 22 2009
; OS: Windows
; Demo: No
; More codes at: www.geocities.com/kc2000labs/pb/pb.htm


;The following code creates pdf files version 1.2 (Acrobat 3.0).

Global Position.l, pageNo.l, lineNo.l
Global Dim location.l(5000)
Global Dim pageObj.l(5000)
Global lines.l, obj.l, Tpages.l, encoding.l, resources.l, pages.l, pointSize.q
Global vertSpace.d, info.l, root.l, npagex.d, npagey.l, linelen.l, cache.s
Global FileTXT.s, FilePDF.s
Global AppName.s, Author.s, Creator.s, Keywords.s, Subject.s, Title.s, BaseFont.s, rotate.l, pageWidth.d, pageHeight.d

Declare.l StartPage()
Declare.s endpage(streamstart.l)

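Procedure writepdf(stre.s, flush.l=0)
  ; Every line is followed by exactly one end-of-line byte, so count it too;
  ; otherwise the xref offsets and stream lengths drift by one byte per line.
  Position + Len(stre) + 1
  ; LF rather than a lone CR: the PDF format does not allow a lone CR after "stream".
  cache + stre + Chr(10)
  If Len(cache) > 32000 Or flush > 0
    If OpenFile(0, FilePDF)
      FileSeek(0, Lof(0)) ; append to what has been written so far
      ; WriteString() instead of WriteStringN(): no extra, uncounted newline.
      ; #PB_Ascii writes one byte per character, which matches Len().
      WriteString(0, cache, #PB_Ascii)
      CloseFile(0)
    EndIf
    cache = ""
  EndIf
EndProcedure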
  
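Procedure WriteStart()
  writepdf ("%PDF-1.2")
  ; Four bytes above 127 mark the file as binary. They are written with Chr()
  ; because a literal depends on the editor's code page (under a Greek code
  ; page the same bytes display as "%βγΟΣ").
  writepdf ("%" + Chr(226) + Chr(227) + Chr(207) + Chr(211))
EndProcedure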

Procedure WriteHead()
CreationDate.s = "D:" + FormatDate( "%YYYY%MM%DD%HH%II%SS",Date())
obj + 1
location(obj) = Position
info = obj

writepdf (Str(obj) + " 0 obj")
writepdf ("<<")
writepdf ("/Author (" + Author + ")")
writepdf ("/CreationDate (" + CreationDate + ")")
writepdf ("/Creator (" + Creator + ")")
writepdf ("/Producer (" + AppName + ")")
writepdf ("/Title (" + Title + ")")
writepdf ("/Subject (" + Subject + ")")
writepdf ("/Keywords (" + Keywords + ")")
writepdf (">>")
writepdf ("endobj")

obj + 1
root = obj
obj + 1
Tpages = obj
encoding = obj + 2
resources = obj + 3

obj + 1
location(obj) = Position
writepdf (Str(obj) + " 0 obj")
writepdf ("<<")
writepdf ("/Type /Font")
writepdf ("/Subtype /Type1")
writepdf ("/Name /F1")
writepdf ("/Encoding " + Str(encoding) + " 0 R")
writepdf ("/BaseFont /" + BaseFont)
writepdf (">>")
writepdf ("endobj")

obj + 1
location(obj) = Position
writepdf (Str(obj) + " 0 obj")
writepdf ("<<")
writepdf ("/Type /Encoding")
writepdf ("/BaseEncoding /WinAnsiEncoding")
writepdf (">>")
writepdf ("endobj")

obj + 1
location(obj) = Position
writepdf (Str(obj) + " 0 obj")
writepdf ("<<")
writepdf ("  /Font << /F1 " + Str(obj - 2) + " 0 R >>")
writepdf ("  /ProcSet [ /PDF /Text ]")
writepdf (">>")
writepdf ("endobj")
EndProcedure
  
Procedure WritePages()
line.s: tmpline.s: beginstream.l
If ReadFile(1, FileTXT)
	beginstream = StartPage()
	lineNo = -1
	While Not Eof(1)
		line = ReadString(1)
      lineNo + 1
        
        ;page Break
        If lineNo >= lines Or FindString(line, Chr(12),1) > 0
          writepdf("1 0 0 1 " + Str(npagex) + " " + Str(npagey) + " Tm")
          writepdf("(" + Str(pageNo) + ") Tj")
          writepdf("/F1 " + Str(pointSize) + " Tf")
          endpage (beginstream)
          beginstream = StartPage()
        EndIf
        
        line = ReplaceString(ReplaceString(ReplaceString(line, "\", "\\"), "(", "\("), ")", "\)")
        
        If Len(line) > linelen
          
          ;word wrap
          While Len(line) > linelen
            tmpline = Left(line, linelen)
            For i = Len(tmpline) To (Len(tmpline) / 2) Step -1
              If FindString("*+^%$#,. ;<=>[])}!" + Chr(34), Mid(tmpline, i, 1),1)
                tmpline = Left(tmpline, i)
                Break
              EndIf
            Next
            
            line = Mid(line, Len(tmpline) + 1)
            writepdf("T* (" + tmpline + Chr(13) + Chr(10) + ") Tj")
            lineNo = lineNo + 1
            
            ;page Break
            If lineNo >= lines Or FindString(line, Chr(12),1) > 0
              writepdf("1 0 0 1 " + Str(npagex) + " " + Str(npagey) + " Tm")
              writepdf("(" + Str(pageNo) + ") Tj")
              writepdf("/F1 " + Str(pointSize) + " Tf")
              endpage (beginstream)
              beginstream = StartPage()
            EndIf
          Wend
          lineNo + 1
          writepdf("T* (" + line + Chr(13) + Chr(10) + ") Tj")
        Else
          writepdf("T* (" + line + Chr(13) + Chr(10) + ") Tj")
        EndIf
	Wend
	CloseFile(1)
EndIf
writepdf("1 0 0 1 " + Str(npagex) + " " + Str(npagey) + " Tm")
writepdf("(" + Str(pageNo) + ") Tj")
writepdf("/F1 " + Str(pointSize) + " Tf")
endpage (beginstream)
EndProcedure

Procedure.l StartPage()
  strmpos.l
  obj + 1
  location(obj) = Position
  pageNo + 1
  pageObj(pageNo) = obj
  
  writepdf(Str(obj) + " 0 obj")
  writepdf("<<")
  writepdf("/Type /Page")
  writepdf("/Parent " + Str(Tpages) + " 0 R")
  writepdf("/Resources " + Str(resources) + " 0 R")
  obj + 1
  writepdf("/Contents " + Str(obj) + " 0 R")
  writepdf("/Rotate " + Str(rotate))
  writepdf(">>")
  writepdf("endobj")
  
  location(obj) = Position
  writepdf(Str(obj) + " 0 obj")
  writepdf("<<")
  writepdf("/Length " + Str(obj + 1) + " 0 R")
  writepdf(">>")
  writepdf("stream")
  strmpos = Position
  writepdf("BT")
  writepdf("/F1 " + Str(pointSize) + " Tf")
  writepdf("1 0 0 1 50 " + Str(pageHeight - 40) + " Tm")
  writepdf(ReplaceString(StrD(vertSpace), ",", ".") + " TL")
  
  ProcedureReturn strmpos
EndProcedure

Procedure.s endpage(streamstart.l)
streamEnd.l
writepdf("ET")
streamEnd = Position
writepdf("endstream")
writepdf("endobj")
obj + 1
location(obj) = Position
writepdf(Str(obj) + " 0 obj")
writepdf(Str(streamEnd - streamstart))
writepdf ("endobj")
lineNo = 0
EndProcedure

Procedure endpdf()
ty.s: xreF.l
location(root) = Position
writepdf(Str(root) + " 0 obj")
writepdf("<<")
writepdf("/Type /Catalog")
writepdf("/Pages " + Str(Tpages) + " 0 R")
writepdf(">>")
writepdf("endobj")
location(Tpages) = Position
writepdf(Str(Tpages) + " 0 obj")
writepdf("<<")
writepdf("/Type /Pages")
writepdf("/Count " + Str(pageNo))
writepdf("/MediaBox [ 0 0 " + Str(pageWidth) + " " + Str(pageHeight) + " ]")
ty = ("/Kids [ ")
For i = 1 To pageNo
ty + Str(pageObj(i)) + " 0 R "
Next
ty + "]"
writepdf(ty)
writepdf(">>")
writepdf("endobj")
xreF = Position
writepdf("xref") ; the cross-reference section has to start with this keyword
writepdf("0 " + Str(obj + 1))
writepdf("0000000000 65535 f ")
For i = 1 To obj
writepdf(RSet(Str(location(i)), 10, "0") + " 00000 n ")
Next
writepdf("trailer")
writepdf("<<")
writepdf("/Size " + Str(obj + 1))
writepdf("/Root " + Str(root) + " 0 R")
writepdf("/Info " + Str(info) + " 0 R")
writepdf(">>")
writepdf("startxref")
writepdf(Str(xreF))
writepdf("%%EOF", 1)
EndProcedure

Procedure ConvertToPDF(sFileTXT.s, sFilePDF.s, sAppName.s="", sAuthor.s="", sCreator.s="", sKeywords.s="", sSubject.s="", sTitle.s="", sBaseFont.s="Courier", lpointSize.l = 12, lrotate.l=0, dpageWidth.d = 8.5, lpageHeight.l = 11)
If ReadFile(0,sFileTXT) = 0
	MessageRequester("Error","File " + Chr(34) + sFileTXT + Chr(34) + " not found.",#MB_ICONERROR)
	ProcedureReturn
EndIf
CloseFile(0)

;initialize
FileTXT = sFileTXT
FilePDF= sFilePDF
AppName = sAppName
Author = sAuthor
Creator = sCreator
Keywords = sKeywords
Subject = sSubject
Title = sTitle
BaseFont = sBaseFont
pointsize = lpointsize
rotate = lrotate
pageHeight = lpageHeight * 72
pageWidth = dpageWidth * 72
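obj=0
pageNo = 0 ; reset so that a second conversion starts again at page 1
Position = 0
cache = ""
If CreateFile(0, FilePDF) ; start from an empty file, because writepdf() only appends
	CloseFile(0)
EndIf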
vertSpace = pointsize * 1.2 ; Vertical spacing
lines = (pageHeight - 72) / vertSpace ; no of lines on one page
If FindString(LCase(BaseFont), "courier",1) ; for Courier font
	linelen = 1.5 * pageWidth / pointSize
ElseIf FindString(LCase(BaseFont), "arial",1) ; for Arial font
	linelen = 2 * pageWidth / pointSize
ElseIf FindString(LCase(BaseFont), "times-roman",1) ; for Times New Roman font
	linelen = 2.2 * pageWidth / pointSize
Else
	linelen = 2.2 * pageWidth / pointSize ; any other font
EndIf

npagex = pageWidth / 2
npagey = 25

WriteStart()
WriteHead()
WritePages()
endpdf()
EndProcedure

If OpenWindow(0, 300, 300, 350, 80,"PDF Converter", #PB_Window_SystemMenu | #PB_Window_MinimizeGadget | #PB_Window_ScreenCentered)

 If CreateGadgetList(WindowID(0)) ; required by the PureBasic version this was written for; newer versions no longer use this call
 StringGadget(0,10,10,330,20,GetPathPart(ProgramFilename()) + "testTXT.txt"); use your own txt file.
 StringGadget(1,10,30,330,20,GetPathPart(ProgramFilename()) + "testPDF.pdf"); set the path you want to save the file.
 ButtonGadget(2,10,50,60,25,"Convert")
 EndIf
 
 Repeat
    EventID = WaitWindowEvent()

    If EventID = #PB_Event_CloseWindow  ; If the user has pressed on the close button
      Quit = 1
    ElseIf EventID = #PB_Event_Gadget
    	If EventGadget() = 2
    		ConvertToPDF(GetGadgetText(0),GetGadgetText(1))
    	EndIf
    EndIf
  Until Quit = 1
 
EndIf

End 
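A note on the cross-reference table that endpdf() writes: every entry has to be exactly 20 bytes long, which is why the offset is padded to 10 digits and why the line ends with a space before a one-byte line ending. The following is only a small sketch of that format, not part of the converter. XrefEntry is a hypothetical helper name, and the LF line ending is an assumption made for the example.

```purebasic
; Hypothetical helper: builds one "in use" cross-reference entry
; for an object that starts at byte offset `offset`.
Procedure.s XrefEntry(offset.l)
  ; 10-digit offset + space + 5-digit generation + space + "n" + space + LF
  ProcedureReturn RSet(Str(offset), 10, "0") + " 00000 n " + Chr(10)
EndProcedure

Debug XrefEntry(15)      ; shows 0000000015 00000 n
Debug Len(XrefEntry(15)) ; 20
```

Because the entries have a fixed width, a reader can jump straight to entry number N without parsing the ones before it. This only works if the offsets stored in location() are the real byte positions in the file.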
Kiffi
Addict
Posts: 1504
Joined: Tue Mar 02, 2004 1:20 pm
Location: Amphibios 9

Re: Convert txt file to Acrobat pdf

Post by Kiffi »

doctorized wrote:Here is another piece of code I have written. It converts txt files to Adobe PDF.
cool! Thanks for sharing!

Greetings ... Kiffi
Hygge

Re: Convert txt file to Acrobat pdf

Post by doctorized »

Kiffi wrote:
doctorized wrote:Here is another piece of code I have written. It converts txt files to Adobe PDF.
cool! Thanks for sharing!

Greetings ... Kiffi
I am working on code that creates PDF version 1.3, with image import, rotated text and more. I only need to find a way to import fonts and the project will be complete. Stay tuned!
srod
PureBasic Expert
Posts: 10589
Joined: Wed Oct 29, 2003 4:35 pm
Location: Beyond the pale...

Post by srod »

Interesting.

Will you be supporting unicode fonts?
I may look like a mule, but I'm not a complete ass.

Post by doctorized »

srod wrote:Interesting.

Will you be supporting unicode fonts?

I want to fully support all languages, and I hope that all types of fonts will be supported. I am working on it.
PB
PureBasic Expert
Posts: 7581
Joined: Fri Apr 25, 2003 5:24 pm

Post by PB »

Pretty good so far! Is there a way to change the font size? I need it smaller
for my app. I don't know anything about PDFs though, so not sure.

Also, does this infringe on any of Adobe's patents or anything? I wouldn't
want to get sued if the source code is (c) to Adobe or something. Is the
code 100% your own work?
I compile using 5.31 (x86) on Win 7 Ultimate (64-bit).
"PureBasic won't be object oriented, period" - Fred.

Post by doctorized »

PB wrote:Pretty good so far! Is there a way to change the font size? I need it smaller
for my app.
The ConvertToPDF procedure takes many optional parameters, not only the txt and pdf file names. If you write: ConvertToPDF(GetGadgetText(0),GetGadgetText(1),"","","","","","","Courier",6) you can change the font size. The font name has to be passed as well, because an empty string in that position would replace the default "Courier". The default size is 12. Use here any number you want.
PB wrote: Is the code 100% your own work?
I found the code some years ago, written in Visual Basic 6, with no information about the author. What I mean is that the code does not come from Adobe or any other company with any kind of copyright on it. If anyone knows who wrote it, let us know the name.
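To give an idea of what a smaller font size changes, here is a minimal sketch that repeats the layout arithmetic from ConvertToPDF for the Courier case on the default 8.5 x 11 inch page. ShowLayout is a hypothetical helper made up for this example; it is not part of the converter and only prints the derived values.

```purebasic
; Hypothetical helper: mirrors the layout arithmetic in ConvertToPDF()
; for a Courier font on an 8.5 x 11 inch page (72 points per inch).
Procedure ShowLayout(pointSize.l)
  Protected pageWidth.d, pageHeight.d, vertSpace.d
  Protected linesPerPage.l, lineLen.l

  pageWidth = 8.5 * 72
  pageHeight = 11 * 72
  vertSpace = pointSize * 1.2                    ; vertical spacing between lines
  linesPerPage = (pageHeight - 72) / vertSpace   ; number of lines on one page
  lineLen = 1.5 * pageWidth / pointSize          ; wrap width in characters (Courier)

  Debug "size " + Str(pointSize) + ": " + Str(linesPerPage) + " lines per page, wrap at " + Str(lineLen) + " characters"
EndProcedure

ShowLayout(12) ; the default size
ShowLayout(6)  ; half the size: about twice as many lines and characters
```

Halving the point size roughly doubles both the number of lines per page and the wrap width, so the same text file needs about a quarter of the pages.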

Post by PB »

> The default size is 12. Use here any number you want.

Oops, I didn't see that at first due to the long line (another reason why a
line continuation char would be nice for the IDE). Thanks for telling me! :)
Seymour Clufley
Addict
Posts: 1265
Joined: Wed Feb 28, 2007 9:13 am
Location: London

Post by Seymour Clufley »

This is great work, Doctorized. Thanks!

Post by PB »

Any more progress on this? :)

Post by doctorized »

PB wrote:Any more progress on this? :)
I am stuck on Unicode in general, not only on the fonts.

Is there a way to translate "StrConv(ImgColor, vbUnicode)" from VB? ImgColor is a byte array.
WilliamL
Addict
Posts: 1252
Joined: Mon Aug 04, 2008 10:56 pm
Location: Seattle, USA

Post by WilliamL »

I am using a Mac and trying to see if your program will run. I ran into a problem with "%βγΟΣ", which I could write as "%"+Chr(?)+Chr(?)+Chr(?)+Chr(?). I think it might run on a Mac.

What I really need is a pdf to text code. Any ideas?
MacBook Pro-M1 (2021), Sequoia 15.4, PB 6.20

Post by doctorized »

WilliamL wrote:I am using a Mac and trying to see if your program will run. I ran into a problem with "%βγΟΣ", which I could write as "%"+Chr(?)+Chr(?)+Chr(?)+Chr(?). I think it might run on a Mac.
Use: "%" + Chr(226) + Chr(227) + Chr(207) + Chr(211)
WilliamL wrote:What I really need is a pdf to text code. Any ideas?
Which PDF version do you need to convert to text? If we are talking about the simple format of version 1.2, the same as my code creates, it is very easy to get the text out of it. All you need to do in that case is open the PDF file with Notepad to see what is inside, so that you can write code to extract the text. If the file is not that simple, for example if it contains embedded fonts or images, things are more complicated.
talisman
Enthusiast
Posts: 231
Joined: Sat May 23, 2009 9:33 am

Post by talisman »

As a side note many e-books and the like are scans of the actual works, in which case you need a text recognition application... that means... complication!

Post by WilliamL »

I'm going to try the CHR() commands when I get a chance and get back to you. [later] Yes, it appears to work (on a Mac)! I used OpenFileRequester() and it was easier to get the file name.
doctorized wrote:open the pdf file with notepad to see what is inside to be able to write a code to get the text
I did that and it worked, but it seemed clumsy. I just thought there might be some logic to how the text is distributed, which would make it easier to find where the text starts. I wondered if there is some character sequence that always precedes the text. The main thing is that my effort worked for my needs.
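For files made by the converter in this thread there is such a sequence: WritePages() writes every text line as "T* (" followed by the escaped text, a line break and ") Tj". A minimal sketch of the reverse step follows. It assumes the PDF is read line by line (for example with ReadString()), so that the closing ") Tj" ends up on the following line. ExtractText is a hypothetical helper name, and this will not work for PDFs from other programs, which may compress or encode their text differently.

```purebasic
; Hypothetical helper: returns the text of one content line written by
; WritePages(), or "" if the line does not start with "T* (".
Procedure.s ExtractText(pdfLine.s)
  Protected result.s, c.s, i.l

  If Left(pdfLine, 4) <> "T* ("
    ProcedureReturn ""
  EndIf

  i = 5 ; first character after "T* ("
  While i <= Len(pdfLine)
    c = Mid(pdfLine, i, 1)
    If c = "\" And i < Len(pdfLine)
      i + 1                  ; drop the backslash and keep the escaped character
      c = Mid(pdfLine, i, 1)
    EndIf
    result + c
    i + 1
  Wend

  ProcedureReturn result
EndProcedure

Debug ExtractText("T* (Hello \(world\)") ; Hello (world)
Debug ExtractText("/F1 12 Tf")           ; empty, because this is not a text line
```

The page numbers are written as "(n) Tj" without the leading "T*", so they are skipped by this check. An empty line in the original text file also comes back as "", so a real extractor would need a separate "found" flag to tell the two cases apart.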