Make a lib (from C-Code) and get Esgrid!

milan1612
Addict
Posts: 894
Joined: Thu Apr 05, 2007 12:15 am
Location: Nuremberg, Germany

Post by milan1612 »

Xombie wrote: milan1612 - will you release your converted source code?

I didn't change much; the main problem was the compilation and linking process.
But if you'd really like to see it: http://rapidshare.com/files/171898788/pdf.cpp.html

And Marco, if srod doesn't want the money, feel free to donate it to Fantaisie Software...
Windows 7 & PureBasic 4.4
Rook Zimbabwe
Addict
Posts: 4322
Joined: Tue Jan 02, 2007 8:16 pm
Location: Cypress TX

Post by Rook Zimbabwe »

Xombie... another possibility: could the page be handled as an image?
Binarily speaking... it takes 10 to Tango!!!

http://www.bluemesapc.com/
gnozal
PureBasic Expert
Posts: 4229
Joined: Sat Apr 26, 2003 8:27 am
Location: Strasbourg / France

Post by gnozal »

I just tried milan1612's release.
It doesn't seem to work correctly with non-English text, e.g. text using characters like é or à...
For free libraries and tools, visit my web site (also home of jaPBe V3 and PureFORM).
milan1612

Post by milan1612 »

gnozal wrote: I just tried milan1612's release.
It doesn't seem to work correctly with non-English text, e.g. text using characters like é or à...

It's because the original C++ code is ASCII-only, no Unicode :?
srod
PureBasic Expert
Posts: 10589
Joined: Wed Oct 29, 2003 4:35 pm
Location: Beyond the pale...

Post by srod »

Well, PDF is not Unicode by default; it essentially uses a 7-bit ASCII encoding. My limited understanding of the PDF format is that to support Unicode you essentially have to create tables of character codes (to map character codes to font glyphs, etc.). From what I saw of the C source, the library doesn't seem equipped at all to deal with Unicode character sets. Whether the library uses a wide-character string representation for its variables or not is therefore, I think, immaterial.
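To make the ASCII point concrete, here is a minimal C sketch; the helper `is_7bit_ascii` is hypothetical and not taken from either library. It only reports whether every byte of a string lies in the 7-bit range (0x00 to 0x7F). In UTF-8, a character such as é is encoded as the two bytes 0xC3 0xA9, both above 0x7F, so a code path that assumes 7-bit text cannot represent it without some extra character-code mapping.

```c
#include <stdbool.h>

/* Hypothetical helper for illustration: returns true if every byte of the
   NUL-terminated string s lies in the 7-bit ASCII range (0x00-0x7F). */
bool is_7bit_ascii(const char *s)
{
    /* Read bytes as unsigned so values above 0x7F are not negative. */
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; ++p) {
        if (*p > 0x7F)
            return false;
    }
    return true;
}
```

For example, `is_7bit_ascii("SCHNITT")` returns true, while `is_7bit_ascii("caf\xC3\xA9")` ("café" in UTF-8) returns false.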
I may look like a mule, but I'm not a complete ass.
milan1612

Post by milan1612 »

Found another way to extract text from a PDF: Xpdf.
It is a package of open-source utilities that includes, among others, a pdftotext command-line
utility, which I must say works much better than the C++ library.
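For anyone who just wants to call the command-line tool from C instead of linking a library, here is a minimal sketch. It assumes the Xpdf `pdftotext` binary is on the PATH and relies on its documented behavior: `-enc UTF-8` selects UTF-8 output (relevant to the é/à problem mentioned earlier in the thread), and `-` as the output file name sends the text to stdout. The function names are hypothetical, `popen`/`pclose` are POSIX (the MSVC runtime spells them `_popen`/`_pclose`), and the path quoting is naive: a path containing a double quote is not handled.

```c
#define _POSIX_C_SOURCE 200809L /* expose popen/pclose on POSIX systems */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds  pdftotext -enc UTF-8 "<pdf_path>" -
   Returns 0 on success, -1 if the command does not fit in buf. */
int build_pdftotext_cmd(char *buf, size_t size, const char *pdf_path)
{
    int n = snprintf(buf, size, "pdftotext -enc UTF-8 \"%s\" -", pdf_path);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

/* Hypothetical helper: runs pdftotext and copies up to out_size - 1 bytes
   of its stdout into out (always NUL-terminated). Returns the number of
   bytes stored, or -1 if the command could not be run or failed. */
long extract_pdf_text(const char *pdf_path, char *out, size_t out_size)
{
    char cmd[1024];
    char scratch[4096];

    if (out_size == 0 || build_pdftotext_cmd(cmd, sizeof cmd, pdf_path) != 0)
        return -1;

    FILE *pipe = popen(cmd, "r");
    if (pipe == NULL)
        return -1;

    size_t total = fread(out, 1, out_size - 1, pipe);
    out[total] = '\0';

    /* Drain any output that did not fit so pdftotext can exit normally. */
    while (fread(scratch, 1, sizeof scratch, pipe) > 0)
        ;

    return pclose(pipe) == 0 ? (long)total : -1;
}
```

A call such as `extract_pdf_text("Marco.pdf", buf, sizeof buf)` would return the number of bytes of text captured, or -1 if pdftotext is missing or reports an error.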
Marco2007
Enthusiast
Posts: 648
Joined: Tue Jun 12, 2007 10:30 am
Location: not there...

Post by Marco2007 »

@Milan: Could you do it with the better one, please? :D

...as I wrote, there are problems with this one:
There's always a "240" in the txt file:
SCHNITT-240
SPALT
LINSEN-240
BRENNWEITE
DUESEN-240
DURCHMESSER
MAX. LASER-240
LEISTUNG
EINSTELL-240
TEILENUMMER:240
TEILE-ID:240

Any ideas why?

...and the txt file is empty with this PDF (created with PurePDF):
http://www.free-space.at/elke/Marco.pdf

It's your call: Fantaisie Software will get the donation, and we all get something we can use. That's good!

Any chance of getting the better PDF extractor? Right now it's a little problematic...

Thank you!
PureBasic for Windows
milan1612

Post by milan1612 »

@Marco
I had another look at the tool mentioned above today. I managed to compile it
from source, but before I even try to make a library out of it, I need to know
whether the utility is good enough for you. Would you mind trying it to see if it works for you?
Marco2007

Post by Marco2007 »

Of course! I'll PM you...
Xombie
Addict
Posts: 898
Joined: Thu Jul 01, 2004 2:51 am
Location: Tacoma, WA

Post by Xombie »

I'm watching this thread with an eagle eye. It would be very useful to my work-work project to extract text from PDF files.

Let me know if y'all need any additional testing or help on compiling and such.
milan1612
Post by milan1612 »

Xombie wrote: I'm watching this thread with an eagle eye. It would be very useful to my work-work project to extract text from PDF files.

Let me know if y'all need any additional testing or help on compiling and such.

The library conversion of the tool mentioned above is finished; Marco and I
are currently testing various PDFs. It's working much better than my first
conversion. If you PM me your e-mail address, I can send you the library...
The more testers, the better the result :wink:
Marco2007

Post by Marco2007 »

It's brilliant. I tested it with PDFs created with PurePDF and with various other PDFs. Milan's work is really great!! ...no problems.

@Fantaisie Software: ...it could take until Monday (I have to reload my prepaid Visa Electron card for PayPal).

Thanks a lot to Milan! :D
milan1612

Post by milan1612 »

No problem, Marco, it was fun to refresh my C++ knowledge. Please don't forget that
most of the work on this library was done not by me but by the original authors of Xpdf.
For everyone else, here is the link: http://rapidshare.com/files/172190796/pdf2text.zip
MachineCode
Addict
Posts: 1482
Joined: Tue Feb 22, 2011 1:16 pm


Post by MachineCode »

milan1612 wrote: For everyone else, here is the link: http://rapidshare.com/files/172190796/pdf2text.zip

Does anyone have this pdf2text.zip file? This link is dead (404).
Microsoft Visual Basic only lasted 7 short years: 1991 to 1998.
PureBasic: Born in 1998 and still going strong to this very day!
Little John
Addict
Posts: 4777
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany


Post by Little John »

MachineCode wrote: Does anyone have this pdf2text.zip file?

Yes. :)

And many thanks to Milan!

Regards, Little John