Monday, November 2, 2015

Installing Tesseract using MacPorts

Follow these steps to install Tesseract using MacPorts:

1. Install the Tesseract dependencies: autoconf, automake, libtool, the image libraries (jpeg, tiff and libpng) and leptonica.
2. Install Tesseract. (I installed only the English language support, via the tesseract-eng port, which pulls in tesseract itself as a dependency.)
3. Set the TESSDATA_PREFIX environment variable to the parent directory of the "tessdata" folder, which holds the eng.traineddata file (in the session below it is /opt/local/share). If you are not sure where the file was installed, use "find" to locate it and set the variable accordingly.
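Step 3 can be scripted. The following is only a sketch: the function name find_tessdata_prefix is made up for illustration, and the default search root /opt/local is an assumption based on the path used later in this post.

```shell
# Print the directory to use as TESSDATA_PREFIX: the parent of the
# "tessdata" folder that contains eng.traineddata.
# find_tessdata_prefix is a hypothetical helper, not part of Tesseract or MacPorts.
find_tessdata_prefix() {
    search_root="${1:-/opt/local}"   # assumed default, matches the path used in this post
    # Look for eng.traineddata inside a directory named "tessdata".
    traineddata=$(find "$search_root" -type f -name 'eng.traineddata' \
        -path '*/tessdata/*' 2>/dev/null | head -n 1)
    # Fail if nothing was found.
    [ -n "$traineddata" ] || return 1
    # Strip "/tessdata/eng.traineddata" to get the parent of tessdata.
    dirname "$(dirname "$traineddata")"
}
```

With that defined, something like export TESSDATA_PREFIX="$(find_tessdata_prefix /opt/local)" should set the variable without typing the path by hand.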



Anils-MacBook-Air:~ anilmurty$ sudo port install autoconf
--->  Computing dependencies for autoconf
--->  Cleaning autoconf
--->  Scanning binaries for linking errors
--->  No broken files found.
Anils-MacBook-Air:~ anilmurty$ sudo port install automake
--->  Cleaning automake
--->  Scanning binaries for linking errors
--->  No broken files found.
Anils-MacBook-Air:~ anilmurty$ sudo port install libtool
--->  Cleaning libtool
--->  Scanning binaries for linking errors
--->  No broken files found.
Anils-MacBook-Air:~ anilmurty$ sudo port install jpeg tiff libpng
--->  Cleaning jpeg
--->  Computing dependencies for tiff
--->  Cleaning tiff
--->  Computing dependencies for libpng
--->  Cleaning libpng
--->  Scanning binaries for linking errors
--->  No broken files found.
Anils-MacBook-Air:~ anilmurty$ sudo port install leptonica
--->  Computing dependencies for leptonica
--->  Cleaning leptonica
--->  Scanning binaries for linking errors
--->  No broken files found.
Anils-MacBook-Air:~ anilmurty$ sudo port selfupdate
--->  Updating MacPorts base sources using rsync
MacPorts base version 2.3.4 installed,
MacPorts base version 2.3.4 downloaded.
--->  Updating the ports tree
--->  MacPorts base is already the latest version

The ports tree has been updated. To upgrade your installed ports, you should run
  port upgrade outdated
Anils-MacBook-Air:~ anilmurty$ port upgrade outdated
Nothing to upgrade.
Anils-MacBook-Air:~ anilmurty$ 


Anils-MacBook-Air:~ anilmurty$ sudo port install tesseract-eng
--->  Computing dependencies for tesseract-eng
--->  Dependencies to be installed: tesseract
--->  Fetching archive for tesseract
--->  Attempting to fetch tesseract-3.02.02_2.darwin_14.x86_64.tbz2 from http://packages.macports.org/tesseract
--->  Attempting to fetch tesseract-3.02.02_2.darwin_14.x86_64.tbz2.rmd160 from http://packages.macports.org/tesseract
--->  Installing tesseract @3.02.02_2
--->  Activating tesseract @3.02.02_2
--->  Cleaning tesseract
--->  Fetching archive for tesseract-eng
--->  Attempting to fetch tesseract-eng-3.02_1.darwin_14.noarch.tbz2 from http://packages.macports.org/tesseract-eng
--->  Attempting to fetch tesseract-eng-3.02_1.darwin_14.noarch.tbz2.rmd160 from http://packages.macports.org/tesseract-eng
--->  Installing tesseract-eng @3.02_1
--->  Activating tesseract-eng @3.02_1
--->  Cleaning tesseract-eng
--->  Updating database of binaries
--->  Scanning binaries for linking errors
--->  No broken files found.
Anils-MacBook-Air:~ anilmurty$ 

Anils-MacBook-Air:/ anilmurty$ export TESSDATA_PREFIX="/opt/local/share"
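The export above only lasts for the current terminal session. One way to keep it is to append the same line to a shell startup file. This is a sketch, assuming bash and ~/.bash_profile as the startup file; persist_tessdata_prefix is a made-up helper name.

```shell
# Append an "export TESSDATA_PREFIX=..." line to a shell profile,
# unless that exact line is already there.
# persist_tessdata_prefix is a hypothetical helper for illustration.
persist_tessdata_prefix() {
    prefix="$1"
    profile="${2:-$HOME/.bash_profile}"   # assumed startup file for bash
    line="export TESSDATA_PREFIX=\"$prefix\""
    # -x matches the whole line, -F treats the pattern as a fixed string.
    grep -qxF "$line" "$profile" 2>/dev/null || printf '%s\n' "$line" >> "$profile"
}
```

Calling persist_tessdata_prefix /opt/local/share once and then opening a new terminal should make the variable available in every session.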



Quick Test

Anils-MacBook-Air:/ anilmurty$ tesseract
Usage:tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]

pagesegmode values are:
0 = Orientation and script detection (OSD) only.
1 = Automatic page segmentation with OSD.
2 = Automatic page segmentation, but no OSD, or OCR
3 = Fully automatic page segmentation, but no OSD. (Default)
4 = Assume a single column of text of variable sizes.
5 = Assume a single uniform block of vertically aligned text.
6 = Assume a single uniform block of text.
7 = Treat the image as a single text line.
8 = Treat the image as a single word.
9 = Treat the image as a single word in a circle.
10 = Treat the image as a single character.
-l lang and/or -psm pagesegmode must occur before anyconfigfile.

Single options:
  -v --version: version info
  --list-langs: list available languages for tesseract engine
Anils-MacBook-Air:/ anilmurty$
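
Running tesseract with no arguments only confirms that the binary is on the PATH. To check that OCR actually works, run it on an image, following the "tesseract imagename outputbase [-l lang]" form from the usage text. A small wrapper sketch follows; ocr_image and the default output name ocr_output are made up for illustration.

```shell
# Run OCR on one image using the invocation from the usage text:
#   tesseract imagename outputbase [-l lang]
# Tesseract writes the recognized text to "<outputbase>.txt".
# ocr_image is a hypothetical wrapper, not part of Tesseract.
ocr_image() {
    image="$1"
    outputbase="${2:-ocr_output}"   # hypothetical default output name
    lang="${3:-eng}"                # English is the only language installed above
    # Bail out early if tesseract is not installed or not on the PATH.
    command -v tesseract >/dev/null 2>&1 || {
        echo "tesseract not found on PATH" >&2
        return 127
    }
    # Bail out if the input image does not exist.
    [ -f "$image" ] || {
        echo "no such image: $image" >&2
        return 1
    }
    # Run the OCR, then print the resulting text file.
    tesseract "$image" "$outputbase" -l "$lang" && cat "$outputbase.txt"
}
```

For example, ocr_image sample.png result (where sample.png is any image containing text) should create result.txt and print its contents. If TESSDATA_PREFIX is wrong, this is the step where Tesseract will complain that it cannot load the language data.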