help with preparing an image for Tesseract OCR

Posted: Wed Aug 28, 2024 4:16 pm
by loulou2522
I don't know how to improve the quality of an image so that Tesseract OCR gives good results. How should I prepare it?

Re: help with preparing an image for Tesseract OCR

Posted: Wed Aug 28, 2024 4:22 pm
by DarkDragon
loulou2522 wrote: Wed Aug 28, 2024 4:16 pm I don't know how to improve the quality of an image so that Tesseract OCR gives good results. How should I prepare it?
Try a different size; Tesseract prefers high-resolution scans. If your input is screenshots, they are usually too low-res.

Re: help with preparing an image for Tesseract OCR

Posted: Wed Aug 28, 2024 8:07 pm
by loulou2522
Actually, no: my input comes from a PDF. I first convert it to PNG with pdftopng, and then I submit the resulting image to Tesseract, like this:
First phase

Code:

RunProgram("cmd.exe", "/C "+Chr(34)+"pdftopng -f "+firstpage+" -l "+lastpage+" -gray -r 500 "+file+" bil"+Chr(34), "", #PB_Program_Wait|#PB_Program_Hide)
Second phase

Code:

; Tesseract config variables are set with "-c NAME=VALUE"
RunProgram("cmd.exe", "/C "+Chr(34)+"tesseract.exe BIL-000003.png essai -l fra --psm 12 -c preserve_interword_spaces=1 --dpi 500 pdf"+Chr(34), "", #PB_Program_Wait|#PB_Program_Hide)
Third phase

Code:

RunProgram("cmd.exe", "/C "+Chr(34)+"pdftotext -f 1 -l 1 -marginl 60 -enc UTF-8 -nopgbrk -table essai.pdf bilanactifscan.txt"+Chr(34), "", #PB_Program_Wait|#PB_Program_Hide)
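
For reference, the three phases above could be chained in one procedure that stops as soon as a tool fails, which makes it easier to see which step is causing trouble. This is only a rough sketch, not a tested solution: the helper names Quote(), PageImageName(), RunTool() and OcrPage() are made up for the illustration, it assumes the three executables are on the PATH and return 0 on success, and it passes the Tesseract variable with "-c", which is how Tesseract expects config variables on the command line.

```purebasic
; Sketch only: helper names are hypothetical, tools are assumed to be on the PATH.

Procedure.s Quote(Text.s)
  ; Wrap a path in double quotes so that spaces survive the command line
  ProcedureReturn Chr(34) + Text + Chr(34)
EndProcedure

Procedure.s PageImageName(Root.s, Page.i)
  ; pdftopng names its output root-NNNNNN.png (page number padded to 6 digits)
  ProcedureReturn Root + "-" + RSet(Str(Page), 6, "0") + ".png"
EndProcedure

Procedure.i RunTool(Exe.s, Args.s)
  ; Start the tool hidden, wait for it, and return its exit code
  ; (-1 means the program could not be started at all)
  Protected ExitCode.i = -1
  Protected Program.i = RunProgram(Exe, Args, "", #PB_Program_Open | #PB_Program_Hide)
  If Program
    WaitProgram(Program)
    ExitCode = ProgramExitCode(Program)
    CloseProgram(Program)
  EndIf
  ProcedureReturn ExitCode
EndProcedure

Procedure.i OcrPage(PdfFile.s, Page.i)
  ; Phase 1: render one PDF page to a 500 dpi grayscale PNG
  If RunTool("pdftopng.exe", "-f " + Str(Page) + " -l " + Str(Page) + " -gray -r 500 " + Quote(PdfFile) + " bil") <> 0
    ProcedureReturn #False
  EndIf

  ; Phase 2: OCR the PNG into a searchable PDF (essai.pdf)
  If RunTool("tesseract.exe", Quote(PageImageName("bil", Page)) + " essai -l fra --psm 12 -c preserve_interword_spaces=1 --dpi 500 pdf") <> 0
    ProcedureReturn #False
  EndIf

  ; Phase 3: extract the text layer of essai.pdf, keeping the table layout
  If RunTool("pdftotext.exe", "-f 1 -l 1 -marginl 60 -enc UTF-8 -nopgbrk -table essai.pdf bilanactifscan.txt") <> 0
    ProcedureReturn #False
  EndIf

  ProcedureReturn #True
EndProcedure
```

It would be called as OcrPage("bilan.pdf", 3), where "bilan.pdf" is just a placeholder file name. Calling the tools directly instead of going through cmd.exe is a design choice of the sketch: it lets the exit code of each tool be read, so a failing phase shows up immediately instead of silently producing an empty text file.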