help with preparing an image for Tesseract OCR

Post by loulou2522 »

I don't know how to improve the quality of an image so that Tesseract OCR works well on it. How should I prepare it?

Post by DarkDragon »

loulou2522 wrote: Wed Aug 28, 2024 4:16 pm I don't know how to improve the quality of an image so that Tesseract OCR works well on it.
Try a different size: Tesseract prefers high-resolution scans. If your input is a screenshot, it is usually too low-resolution.
bye,
Daniel
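
To illustrate the "try a different size" advice, here is a minimal sketch that enlarges a low-resolution image before passing it to Tesseract. The file names and the scale factor are assumptions for illustration only, and whether upscaling actually helps depends on the input.

Code:

; Sketch only: "screenshot.png", "screenshot_big.png" and #ScaleFactor are hypothetical
UsePNGImageDecoder()
UsePNGImageEncoder()

#ScaleFactor = 3 ; assumption: tune this for your input

If LoadImage(0, "screenshot.png")
  ; #PB_Image_Smooth interpolates while resizing
  ResizeImage(0, ImageWidth(0) * #ScaleFactor, ImageHeight(0) * #ScaleFactor, #PB_Image_Smooth)
  If SaveImage(0, "screenshot_big.png", #PB_ImagePlugin_PNG)
    Debug "Saved enlarged image"
  Else
    Debug "Could not save the enlarged image"
  EndIf
  FreeImage(0)
Else
  Debug "Could not load the image"
EndIf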

Post by loulou2522 »

Actually, no. My input comes from a PDF, which I first convert with pdftopng. I then submit the resulting image to Tesseract, like this:
First phase

Code:

; Render the selected pages as 500 dpi grayscale PNGs named bil-NNNNNN.png
RunProgram("cmd.exe", "/C "+Chr(34)+"pdftopng -f "+firstpage+" -l "+lastpage+" -gray -r 500 "+file+" bil"+Chr(34), "", #PB_Program_Wait|#PB_Program_Hide)
Second phase

Code:

; Tesseract config variables are passed with -c NAME=VALUE
RunProgram("cmd.exe", "/C "+Chr(34)+"tesseract.exe BIL-000003.png essai -l fra --psm 12 -c preserve_interword_spaces=1 --dpi 500 pdf"+Chr(34), "", #PB_Program_Wait|#PB_Program_Hide)
Third phase

Code:

; Extract the text layer of essai.pdf as UTF-8, keeping the table layout
RunProgram("cmd.exe", "/C "+Chr(34)+"pdftotext -f 1 -l 1 -marginl 60 -enc UTF-8 -nopgbrk -table essai.pdf bilanactifscan.txt"+Chr(34), "", #PB_Program_Wait|#PB_Program_Hide)
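
The three phases can also be chained so that each step only runs if the previous one produced its output file. This is a rough sketch, not a tested solution: RunCommand(), pdf$ and page are hypothetical names introduced here, "bilan.pdf" is a placeholder input, and the Tesseract option is written as -c preserve_interword_spaces=1 on the assumption that this is what the original command intended.

Code:

; Hypothetical helper: run a command line through cmd.exe and wait for it to finish
Procedure.i RunCommand(CommandLine$)
  ProcedureReturn RunProgram("cmd.exe", "/C " + Chr(34) + CommandLine$ + Chr(34), "", #PB_Program_Wait | #PB_Program_Hide)
EndProcedure

Define pdf$ = "bilan.pdf" ; hypothetical input file
Define page = 3           ; hypothetical page number
; pdftopng names its output <root>-<6-digit page number>.png
Define png$ = "bil-" + RSet(Str(page), 6, "0") + ".png"

; Phase 1: render the page to a 500 dpi grayscale PNG
RunCommand("pdftopng -f " + Str(page) + " -l " + Str(page) + " -gray -r 500 " + Chr(34) + pdf$ + Chr(34) + " bil")

If FileSize(png$) > 0
  ; Phase 2: OCR the PNG into a searchable PDF (essai.pdf)
  RunCommand("tesseract " + png$ + " essai -l fra --psm 12 -c preserve_interword_spaces=1 --dpi 500 pdf")
  If FileSize("essai.pdf") > 0
    ; Phase 3: extract the recognized text
    RunCommand("pdftotext -f 1 -l 1 -marginl 60 -enc UTF-8 -nopgbrk -table essai.pdf bilanactifscan.txt")
  Else
    Debug "Tesseract did not produce essai.pdf"
  EndIf
Else
  Debug "pdftopng did not produce " + png$
EndIf

Checking FileSize() after each step makes it easier to see which phase fails before looking at OCR quality itself.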