All times are UTC + 1 hour




Post new topic Reply to topic  [ 33 posts ]  Go to page 1, 2, 3  Next
 Post subject: Make a lib (from C-Code) and get Esgrid!
Posted: Tue Dec 09, 2008 7:32 pm
Joined: Tue Jun 12, 2007 10:30 am
Posts: 615
Location: not there...
Hello everyone,

I need some help from the C users here. I'm very busy at work right now, so I can't try it myself... and I'm not good enough at it anyway :wink:

I need a lib and a DLL for extracting text from PDF files,
with something like Procedure (pdf.s, outputtxt.s).

Here's the code: http://www.codeproject.com/KB/cpp/ExtractPDFText.aspx

The first person who does this for me will get EsGrid (I will buy a new licence for them, and Srod will send them the key).

Thanks,
Marco

Here's the source: http://www.free-space.at/elke/ExtractPDFText_src.zip

_________________
4.51 final:-) - Windows 7 :-) and sometimes still XP :-)


Posted: Tue Dec 09, 2008 8:10 pm
Joined: Sat Dec 31, 2005 5:24 pm
Posts: 2970
Location: Where ya would never look.....
http://www.rentacoder.com/RentACoder/Do ... fault.aspx

Just kidding :)

Anyway, I was just looking at doing something like this... maybe, if I could do it quickly enough. I have around 3000 .pdf documents that need their text extracted and archived. I think the boss wants to end up having Adobe do it in some way. If you come up with something, please let the rest of us know.

You know why they call it Adobe Acrobat? Because you have to be an acrobat to use it. Ughhhh...I hate .pdf to begin with.


Posted: Tue Dec 09, 2008 8:15 pm
Joined: Tue Jun 12, 2007 10:30 am
Posts: 615
Location: not there...
The exe on that site works really well.
If someone makes a lib (which of course must work), it should be available to everyone.

_________________
4.51 final:-) - Windows 7 :-) and sometimes still XP :-)


Posted: Tue Dec 09, 2008 8:59 pm
Joined: Wed Oct 29, 2003 4:35 pm
Posts: 9870
Location: Beyond the pale...
I'd do it, but I already own a copy of EsGRID! :wink:

_________________
I may look like a mule, but I'm not a complete ass.

eScript
Arctic Reports
nxSoftware


Posted: Tue Dec 09, 2008 9:06 pm
Joined: Tue Jun 12, 2007 10:30 am
Posts: 615
Location: not there...
@Srod: I'd like it if you did it! What do you want for it?

_________________
4.51 final:-) - Windows 7 :-) and sometimes still XP :-)


Posted: Tue Dec 09, 2008 10:04 pm
Joined: Wed Oct 29, 2003 4:35 pm
Posts: 9870
Location: Beyond the pale...
Sorry mate - haven't the time right now. :)

From what I know about the pdf format though I really don't think it would be difficult to code such a routine from scratch. Something I'd be interested in looking at when I get time.
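
For illustration only, here is a rough idea of how such a from-scratch routine might start, as a minimal Python sketch rather than a finished extractor. It assumes the page content streams are Flate (zlib) compressed and that text is shown as literal strings with the `Tj` and `TJ` operators; hex strings, custom font encodings, encryption and other stream filters are simply ignored. The name `extract_text` is made up for this example.

```python
import re
import zlib

# Raw bytes between the "stream" and "endstream" keywords of each object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# Text-showing operators: "(string) Tj" or "[(str) -20 (str)] TJ".
SHOW_RE = re.compile(
    rb"\((?:\\.|[^\\()])*\)\s*Tj|\[(?:\\.|[^\\\]])*\]\s*TJ", re.DOTALL
)
# One literal string "(...)"; nested unescaped parentheses are not handled.
LITERAL_RE = re.compile(rb"\(((?:\\.|[^\\()])*)\)", re.DOTALL)
# Only the escapes \( \) and \\ are resolved; octal escapes are left as-is.
ESCAPE_RE = re.compile(rb"\\([()\\])")


def extract_text(pdf_bytes: bytes) -> str:
    """Return text found in Flate-compressed streams (hypothetical helper)."""
    pieces = []
    for stream in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates the trailing newline before "endstream".
            content = zlib.decompressobj().decompress(stream.group(1))
        except zlib.error:
            continue  # not zlib data: another filter, or an uncompressed stream
        for op in SHOW_RE.finditer(content):
            for literal in LITERAL_RE.finditer(op.group(0)):
                raw = ESCAPE_RE.sub(rb"\1", literal.group(1))
                pieces.append(raw.decode("latin-1"))
            pieces.append("\n")  # one output line per Tj/TJ operator
    return "".join(pieces)
```

Any stream that is not zlib data is skipped silently, so a file that uses a different filter would produce empty output with this sketch.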

_________________
I may look like a mule, but I'm not a complete ass.

eScript
Arctic Reports
nxSoftware


Posted: Tue Dec 09, 2008 10:08 pm
Joined: Tue Jun 12, 2007 10:30 am
Posts: 615
Location: not there...
:(

Anyone else?

_________________
4.51 final:-) - Windows 7 :-) and sometimes still XP :-)


Posted: Tue Dec 09, 2008 10:23 pm
Joined: Tue Jun 12, 2007 10:30 am
Posts: 615
Location: not there...
OK! I've found a workaround, because the code from CodeProject doesn't work quite the way I want with my PDFs.

My solution: open the PDF with RunProgram -> select all (Ctrl+A) -> copy and paste the text into a text file. Not the best solution, but it works.

_________________
4.51 final:-) - Windows 7 :-) and sometimes still XP :-)


Posted: Tue Dec 09, 2008 10:37 pm
Joined: Thu Apr 05, 2007 12:15 am
Posts: 899
Location: Nuremberg, Germany
http://rapidshare.com/files/171884768/pdftext.zip.html

There you are. I tested it briefly and didn't find any bugs. Let me know if you find one.
As I already have an EsGrid license I want you to donate the money to Srod,
he truly deserves it!

_________________
Windows 7 & PureBasic 4.4


Posted: Tue Dec 09, 2008 10:53 pm
Joined: Thu Jul 01, 2004 2:51 am
Posts: 905
Location: Tacoma, WA
Caught this thread by a happy accident. @milan1612 - I tested your code on two different PDF files and it only wrote a 0 byte text file. Do you have a small PDF file that worked on your system for me to test on mine?


Posted: Tue Dec 09, 2008 10:57 pm
Joined: Thu Apr 05, 2007 12:15 am
Posts: 899
Location: Nuremberg, Germany
Here is the Call of Duty 4 manual:
http://rapidshare.com/files/171890493/manual.pdf.html
Works quite well here...

_________________
Windows 7 & PureBasic 4.4


Posted: Tue Dec 09, 2008 11:09 pm
Joined: Wed Oct 29, 2003 4:35 pm
Posts: 9870
Location: Beyond the pale...
milan1612 wrote:
http://rapidshare.com/files/171884768/pdftext.zip.html

There you are. I tested it briefly and didn't find any bugs. Let me know if you find one.
As I already have an EsGrid license I want you to donate the money to Srod,
he truly deserves it!


Marco, please, if I may: whilst it's a very kind offer and much appreciated, would you mind donating to PureBasic instead? I think that Fred and co are more deserving than I. :)

_________________
I may look like a mule, but I'm not a complete ass.

eScript
Arctic Reports
nxSoftware


Posted: Tue Dec 09, 2008 11:10 pm
Joined: Thu Jul 01, 2004 2:51 am
Posts: 905
Location: Tacoma, WA
Can you try the file here: http://www.esri.com/library/whitepapers ... pefile.pdf

So far, only one of the 10 PDF files I've tried on my system works.


Posted: Tue Dec 09, 2008 11:14 pm
Joined: Wed Oct 29, 2003 4:35 pm
Posts: 9870
Location: Beyond the pale...
Yes, that particular PDF file must be using a compression scheme for its object streams other than the one supported by this C library.
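
One way to check this for a given file is to list the stream filters it declares. A minimal Python sketch, assuming the `/Filter` entries appear uncompressed in the file (entries hidden inside compressed object streams will be missed); `count_filters` is a hypothetical helper name, not part of the C library discussed here.

```python
import re
from collections import Counter

# "/Filter /Name" or "/Filter [/Name1 /Name2]" inside a stream dictionary.
FILTER_RE = re.compile(rb"/Filter\s*(\[[^\]]*\]|/[A-Za-z0-9]+)")
NAME_RE = re.compile(rb"/([A-Za-z0-9]+)")


def count_filters(pdf_bytes: bytes) -> Counter:
    """Count declared stream filters, e.g. FlateDecode or LZWDecode."""
    counts = Counter()
    for entry in FILTER_RE.finditer(pdf_bytes):
        # A single name or an array of names; count each name once per entry.
        for name in NAME_RE.findall(entry.group(1)):
            counts[name.decode("ascii")] += 1
    return counts
```

If the result shows only `FlateDecode`, a zlib-based extractor has a chance of reading the streams; names such as `LZWDecode` or `ASCII85Decode` mean another decoder is needed first.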

_________________
I may look like a mule, but I'm not a complete ass.

eScript
Arctic Reports
nxSoftware


Posted: Tue Dec 09, 2008 11:15 pm
Joined: Thu Jul 01, 2004 2:51 am
Posts: 905
Location: Tacoma, WA
Or some protection in place?

milan1612 - will you release your converted source code?

