Extract Text From Image Using pytesseract in Python

In this tutorial, we will write a Python script that extracts text from an image using pytesseract.

Python Code to Extract Text From Image Using pytesseract

Pytesseract, also known as Python-tesseract, is a Python Optical Character Recognition (OCR) tool. It is a wrapper around the Tesseract OCR engine and can read and recognise text in images such as photos and licence plates. Here we use it, together with the Tesseract software, to read the words from a given image.

The process involves three basic steps:

  • Load an image that contains text, either from the local disk or after downloading it with a browser.
  • Binarize the image (convert it to pure black and white). The script below does not do this explicitly.
  • Pass the image to the OCR engine, which extracts the text.
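
The binarization step can be sketched with Pillow alone. This is a minimal illustration, not part of the original script: the helper name binarize and the fixed threshold of 128 are assumptions chosen for the example, and a real image may need a different threshold.

```python
from PIL import Image


def binarize(img, threshold=128):
    """Return a black-and-white copy of img (hypothetical helper).

    The threshold of 128 is an assumed default, not a recommended value.
    """
    gray = img.convert("L")  # 8-bit grayscale
    # Pixels brighter than the threshold become white (255), the rest black (0)
    return gray.point(lambda p: 255 if p > threshold else 0)


# Demo on a small synthetic image: light-gray background with a dark square
demo = Image.new("RGB", (4, 4), (200, 200, 200))
demo.paste((30, 30, 30), (1, 1, 3, 3))

bw = binarize(demo)
print(bw.getpixel((0, 0)), bw.getpixel((1, 1)))  # background, then square
```

The returned image can be passed to image_to_string() in place of the original image.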

Install pytesseract and Pillow using the commands below. pytesseract also needs the Tesseract OCR program itself, which is installed separately.

pip install pytesseract

pip install pillow
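
The pip commands install only the Python packages, so it can help to check that the Tesseract program itself is present before running any OCR. The sketch below uses only the standard library. The helper name find_tesseract is hypothetical, and the fallback path is simply the Windows location used by the script in this tutorial, so adjust it to your own installation.

```python
import os
import shutil

# Windows location used by the script in this tutorial; adjust as needed
WINDOWS_DEFAULT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract():
    """Return a path to the tesseract executable, or None if not found."""
    found = shutil.which("tesseract")  # search the PATH first
    if found:
        return found
    if os.path.isfile(WINDOWS_DEFAULT):  # fall back to the assumed location
        return WINDOWS_DEFAULT
    return None


tesseract_path = find_tesseract()
if tesseract_path is None:
    print("Tesseract not found; install it or add it to PATH.")
else:
    print("Using Tesseract at:", tesseract_path)
```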

app.py: Save the code below as app.py. It extracts the text from an image using pytesseract.

from PIL import Image
import pytesseract

# Location of the Tesseract executable (adjust to your installation)
path_to_tesseract = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
image_path = r"csv\sample_text.png"

# Tell pytesseract where the Tesseract executable lives
pytesseract.pytesseract.tesseract_cmd = path_to_tesseract

# Open the image and store it in an Image object
img = Image.open(image_path)

# Run OCR on the image; the recognised text is returned as a string
text = pytesseract.image_to_string(img)

# Print the extracted text without trailing whitespace
print(text.rstrip())

To run the script from the command line, use python3 app.py (or python app.py on Windows).
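
The string returned by image_to_string() can carry trailing whitespace, blank lines, or a form-feed character, depending on the image and the Tesseract version, which is why the script trims it before printing. A slightly more thorough clean-up might look like the sketch below. The helper name clean_ocr_text and the sample string are made up for illustration and are not real OCR output.

```python
def clean_ocr_text(raw):
    """Strip surrounding whitespace from each line and drop empty lines."""
    lines = (line.strip() for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


# Made-up string standing in for an image_to_string() result
sample = "Hello World\n\n  Second line \n\x0c"
print(clean_ocr_text(sample))
```

In app.py, this would replace the final print call with print(clean_ocr_text(text)).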
