For reference, here are the versions of the various libraries I am testing with:
Anils-MacBook-Air:tesseract-test anilmurty$ tesseract -v
tesseract 3.02.02
leptonica-1.71
libgif 4.2.3 : libjpeg 9a : libpng 1.6.18 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.1.0
Anils-MacBook-Air:tesseract-test anilmurty$
To test the OCR capabilities, I searched Google for a few sample files to read. My ultimate goal is to be able to read receipts and invoices, but I figured I'd start with something more basic:
TEST #1: A PNG file with lots of special characters, but without the crazy formatting you would find on a bill or an invoice
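OUTPUT: Pretty impressive. It mostly tripped on accented characters: "über" came out as "fiber", and a few other accented words in the Spanish and Portuguese sentences were misread too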
Anils-MacBook-Air:tesseract-test anilmurty$ tesseract /Users/anilmurty/Desktop/ocr-test-image-1.png test-png-1
Tesseract Open Source OCR Engine v3.02.02 with Leptonica
Anils-MacBook-Air:tesseract-test anilmurty$ cat test-png-1.txt
The (quick) [brown] {fox} jumps!
Over the $43,456.78
& duck/goose, as 12.5% of E-mail
from aspammer@website.com is spam.
Der ,,schnelle” braune Fuchs springt
fiber den faulen Hund. Le renard brun
«rapide» saute par-dessus le chien
paresseux. La volpe marrone rapida
salta sopra il cane pigro. El zorro
marrén répido salta sobre el perro
perezoso. A raposa marrom rzipida
salta sobre o cfio preguicoso.
Anils-MacBook-Air:tesseract-test anilmurty$
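To put a rough number on "pretty impressive", one option is to compare a line of OCR output against what the image actually says. The snippet below is only a sketch using Python's standard `difflib`: the `expected` string is my assumption of the ground truth for one line (inferred from the "über" misread), and the helper names `similarity` and `misread_words` are made up for illustration.

```python
import difflib
from typing import List, Tuple


def similarity(expected: str, actual: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for two strings."""
    return difflib.SequenceMatcher(None, expected, actual).ratio()


def misread_words(expected: str, actual: str) -> List[Tuple[str, str]]:
    """Pair up differing words position by position.

    Assumes both strings contain the same number of
    whitespace-separated words, which holds for the line used here.
    """
    return [(e, a) for e, a in zip(expected.split(), actual.split()) if e != a]


# Assumed ground truth for one line of the test image (not taken from the image itself).
expected = "über den faulen Hund. Le renard brun"
# The corresponding line as it appears in test-png-1.txt.
actual = "fiber den faulen Hund. Le renard brun"

print(misread_words(expected, actual))
print(round(similarity(expected, actual), 3))
```

A word-by-word comparison like this only works when the OCR output keeps the same word boundaries as the original. For messier results, the character-level ratio is the more forgiving measure.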
TEST #2: A PNG version of my blog's logo:
OUTPUT: Totally messed up the tagline!
Anils-MacBook-Air:tesseract-test anilmurty$ tesseract /Users/anilmurty/Desktop/Geeking-Out.png Geeking-out
Tesseract Open Source OCR Engine v3.02.02 with Leptonica
Anils-MacBook-Air:tesseract-test anilmurty$ cat Geeking-out.txt
Geeking Out
’caz Fm sun a geek at man .)
Anils-MacBook-Air:tesseract-test anilmurty$
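Both tests follow the same two steps: run `tesseract <image> <output base>`, then read the `.txt` file it writes. A minimal sketch of scripting that with Python's `subprocess` module is below. It assumes the `tesseract` binary is on the PATH and that the output file is UTF-8; the function names `build_command`, `output_path`, and `run_ocr` are hypothetical helpers, not part of Tesseract.

```python
import subprocess
from pathlib import Path
from typing import List


def build_command(image_path: str, output_base: str) -> List[str]:
    """Build the same command line used in the tests: tesseract <image> <output base>."""
    return ["tesseract", image_path, output_base]


def output_path(output_base: str) -> Path:
    """tesseract appends .txt to the output base name, as seen in the tests."""
    return Path(output_base + ".txt")


def run_ocr(image_path: str, output_base: str) -> str:
    """Run tesseract on one image and return the recognized text.

    Raises subprocess.CalledProcessError if tesseract exits with an error.
    Assumes the output file is UTF-8 encoded.
    """
    subprocess.run(build_command(image_path, output_base), check=True)
    return output_path(output_base).read_text(encoding="utf-8")
```

For example, `run_ocr("/Users/anilmurty/Desktop/Geeking-Out.png", "Geeking-out")` should produce the same text as the `cat Geeking-out.txt` step, provided the image exists at that path.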