Improving image quality in preparation for Tesseract OCR

Post by loulou2522

I generate an image from a PDF with Ghostscript, using the following command:

Code:

gswin64c.exe -q -dNOPAUSE -r400x400 -sDEVICE=tiffgray -dFirstPage=1 -dLastPage=4 -sOutputFile=a.tif 2021.pdf -dContrast=4 -dtextGamma -dBlackText -c quit
Then I pass the image to Tesseract:

Code:

tesseract.exe a.tif essai -l fra --dpi 400 pdf
Finally, I extract the text from the resulting PDF with pdftotext:

Code:

Pdftotext -f 1 -l 1 -marginl 40 -marginr 10 -margint 140 -enc UTF-8 -table essai.pdf bilanactifscan.txt
Are there one or more PureBasic commands that would let me improve the quality of the image for a better OCR result?
As a secondary question, can the commands above be improved?


Thanks to anyone who can help.

P.S. I'm a beginner at image manipulation.
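
For anyone who wants to reproduce the three steps, here is a minimal sketch in Python (not PureBasic) that only assembles the three command lines and runs them in order. The function names build_pipeline and run_pipeline are hypothetical, and the file names are the ones used in this post. It is a rough outline under some assumptions: the switches are placed before the input file, since Ghostscript applies switches to the files that follow them; -dBATCH stands in for the trailing "-c quit"; and -dContrast, -dtextGamma and -dBlackText are left out rather than guessed at.

```python
import subprocess


def build_pipeline(pdf, first=1, last=4, dpi=400, lang="fra",
                   tif="a.tif", base="essai", txt="bilanactifscan.txt"):
    """Return the three command lines as argument lists, in run order."""
    # Step 1: render the requested PDF pages to a grayscale TIFF.
    # Assumption: switches go before the input file; -dBATCH replaces "-c quit".
    render = [
        "gswin64c.exe", "-q", "-dNOPAUSE", "-dBATCH",
        f"-r{dpi}", "-sDEVICE=tiffgray",
        f"-dFirstPage={first}", f"-dLastPage={last}",
        f"-sOutputFile={tif}", pdf,
    ]
    # Step 2: OCR the TIFF; Tesseract writes <base>.pdf.
    ocr = ["tesseract.exe", tif, base, "-l", lang, "--dpi", str(dpi), "pdf"]
    # Step 3: extract the text of page 1, with the margins from the post.
    extract = [
        "pdftotext", "-f", "1", "-l", "1",
        "-marginl", "40", "-marginr", "10", "-margint", "140",
        "-enc", "UTF-8", "-table", f"{base}.pdf", txt,
    ]
    return [render, ocr, extract]


def run_pipeline(pdf, **options):
    """Run the three steps in order and stop at the first failing tool."""
    for command in build_pipeline(pdf, **options):
        subprocess.run(command, check=True)
```

Calling run_pipeline("2021.pdf") would run all three tools, provided they are on the PATH. Keeping the arguments as lists avoids quoting problems with file names that contain spaces.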