Make a lib (from C-Code) and get Esgrid!

Marco2007 · Post by **Marco2007** » Tue Dec 09, 2008 7:32 pm

Hello to everyone,

I need some help from the C users here. I'm very busy at work right now, so I can't try it myself ... and my C is a little too weak anyway.

I need a lib and a DLL for extracting text from PDF files,
exposing a Procedure (pdf.s, outputtxt.s) or something like that.

Here's the code: http://www.codeproject.com/KB/cpp/ExtractPDFText.aspx

The first person to do that for me will get EsGRID (I will buy a new licence for him/her -> srod will send the key).

thanx
Marco

Here's the source: http://www.free-space.at/elke/ExtractPDFText_src.zip

SFSxOI · Post by **SFSxOI** » Tue Dec 09, 2008 8:10 pm

http://www.rentacoder.com/RentACoder/Do ... fault.aspx

Just kidding

Anyway, I was just looking at doing something like this... maybe, if I could do it quickly enough. I have around 3000 .pdf documents that need the text extracted and archived. I think what the boss wants to end up doing is to have Adobe do it in some way. If you come up with something, please let the rest of us know.

You know why they call it Adobe Acrobat? Because you have to be an acrobat to use it. Ughhhh...I hate .pdf to begin with.

Marco2007 · Post by **Marco2007** » Tue Dec 09, 2008 8:15 pm

The exe on that site works really well.
If someone could make a lib (of course it must work), it should be available to everyone.

srod · Post by **srod** » Tue Dec 09, 2008 8:59 pm

I'd do it, but I already own a copy of EsGRID!

Marco2007 · Post by **Marco2007** » Tue Dec 09, 2008 9:06 pm

@srod: I would like it if you'd do it! Whatcha want?

srod · Post by **srod** » Tue Dec 09, 2008 10:04 pm

Sorry mate - haven't the time right now.

From what I know about the PDF format, though, I really don't think it would be difficult to code such a routine from scratch. Something I'd be interested in looking at when I get time.
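
A minimal sketch of what such a from-scratch routine might look like, in Python for brevity. It assumes every content stream is Flate (zlib) compressed and that text is stored as plain literal strings inside BT ... ET blocks; fonts with custom encodings, hex strings, nested parentheses and other stream filters are not handled. The name `extract_text` is made up for illustration.

```python
import re
import zlib

# Raw bytes between the "stream" and "endstream" keywords of a PDF object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# Text objects in a content stream are bracketed by the BT and ET operators.
TEXT_BLOCK_RE = re.compile(rb"BT(.*?)ET", re.DOTALL)
# A literal string such as (Hello); nested parentheses are not handled.
LITERAL_RE = re.compile(rb"\(((?:\\.|[^\\()])*)\)")


def extract_text(pdf_bytes: bytes) -> str:
    """Return the literal strings found in Flate-compressed content streams."""
    lines = []
    for stream in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj() tolerates the end-of-line bytes before "endstream".
            data = zlib.decompressobj().decompress(stream.group(1))
        except zlib.error:
            continue  # assumption: anything that is not zlib data is skipped
        for block in TEXT_BLOCK_RE.findall(data):
            # Undo the \( \) and \\ escapes; other escapes are left as-is.
            parts = [re.sub(rb"\\([()\\])", rb"\1", s)
                     for s in LITERAL_RE.findall(block)]
            if parts:
                lines.append(b"".join(parts).decode("latin-1"))
    return "\n".join(lines)
```

Calling `extract_text(open("some.pdf", "rb").read())` on a file that uses a different compression scheme simply returns an empty string, because no stream decompresses.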

Marco2007 · Post by **Marco2007** » Tue Dec 09, 2008 10:08 pm

Anyone else?

Marco2007 · Post by **Marco2007** » Tue Dec 09, 2008 10:23 pm

OK! I've found a workaround, because the code from CodeProject doesn't work quite the way I want with my PDFs.

My solution: RunProgram the PDF -> select all (Ctrl+A) -> copy and paste it into a text file. Not the best solution, but it works.

milan1612 · Post by **milan1612** » Tue Dec 09, 2008 10:37 pm

http://rapidshare.com/files/171884768/pdftext.zip.html

There you are, I tested it briefly and didn't find any bugs. Let me know if you find one.
As I already have an EsGrid license I want you to donate the money to Srod,
he truly deserves it!

Xombie · Post by **Xombie** » Tue Dec 09, 2008 10:53 pm

Caught this thread by a happy accident. @milan1612 - I tested your code on two different PDF files and it only wrote a 0 byte text file. Do you have a small PDF file that worked on your system for me to test on mine?

milan1612 · Post by **milan1612** » Tue Dec 09, 2008 10:57 pm

Here is the Call of Duty 4 manual:
http://rapidshare.com/files/171890493/manual.pdf.html
Works quite well here...

srod · Post by **srod** » Tue Dec 09, 2008 11:09 pm

milan1612 wrote: http://rapidshare.com/files/171884768/pdftext.zip.html

There you are, I tested it briefly and didn't find any bugs. Let me know if you find one.
As I already have an EsGrid license I want you to donate the money to Srod,
he truly deserves it!

Marco, please, if I may: whilst it's a very kind offer and much appreciated, would you mind donating to PureBasic instead? I think that Fred and co. are more deserving than I.

Xombie · Post by **Xombie** » Tue Dec 09, 2008 11:10 pm

Can you try the file here: http://www.esri.com/library/whitepapers ... pefile.pdf

So far I've found only one PDF file out of 10 on my system that works.

srod · Post by **srod** » Tue Dec 09, 2008 11:14 pm

Yes, that particular PDF file must be using one of the alternative compression schemes for object streams rather than the one supported by this C library.
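
One way to check that guess is to list which stream filters a given file declares. The sketch below, in Python, simply scans the raw bytes for /Filter entries. It assumes the object dictionaries themselves are stored uncompressed (not true for objects packed into PDF 1.5 object streams), and `count_filters` is a hypothetical helper name. If a file reports anything other than FlateDecode for its content streams, a Flate-only extractor would be expected to produce little or no text.

```python
import re
from collections import Counter

# A /Filter entry is either a single name or an array of names.
FILTER_RE = re.compile(rb"/Filter\s*(\[[^\]]*\]|/[A-Za-z0-9]+)")
NAME_RE = re.compile(rb"/([A-Za-z0-9]+)")


def count_filters(pdf_bytes: bytes) -> Counter:
    """Count how often each filter name appears in /Filter entries."""
    counts = Counter()
    for entry in FILTER_RE.finditer(pdf_bytes):
        for name in NAME_RE.findall(entry.group(1)):
            counts[name.decode("ascii")] += 1
    return counts
```

Running it over the files that produce a 0-byte output and comparing against one that works should show whether filters such as LZWDecode or ASCII85Decode are involved.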

Xombie · Post by **Xombie** » Tue Dec 09, 2008 11:15 pm

Or some protection in place?
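
On the protection question: an encrypted PDF normally carries an /Encrypt entry in its trailer (or cross-reference stream dictionary), so a rough first check is to look for that key. A minimal Python sketch; `looks_encrypted` is a hypothetical helper name, and a raw byte scan like this can be fooled by the same bytes appearing inside unrelated data.

```python
import re

# Match the /Encrypt key itself, but not longer names such as /EncryptMetadata.
ENCRYPT_RE = re.compile(rb"/Encrypt\b")


def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Heuristic: True if the file appears to declare an encryption dictionary."""
    return ENCRYPT_RE.search(pdf_bytes) is not None
```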

milan1612 - will you release your converted source code?