What Is OCR?

Date First Published: 16th August 2023

Topic: Computer Systems

Subtopic: Computer Software

Article Type: Computer Terms & Definitions

Difficulty: Medium

Difficulty Level: 5/10

CONTENTS

How Does OCR Work?
Advantages and Disadvantages Of OCR


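Stands for Optical Character Recognition (sometimes also called Optical Character Reading). OCR is a technology that recognises and extracts text, such as letters and numbers, from a digital image, for example a scanned page or a photo of a pre-printed form. It is most commonly used to convert hard-copy documents into digital text that can be edited and searched on a computer. OCR can recognise both printed and handwritten text, although handwriting is usually harder to read accurately.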

An example of using OCR is digitising a printed book or magazine. Each page is scanned to produce a digital image, and the images are then loaded into an OCR program, which identifies the text and converts the document into an editable electronic version (soft copy), such as a text file. Other uses of OCR include turning printed documents into files that can be edited in a word processor, automating data entry, sorting letters for mail delivery by reading the addresses, reading number plates from camera images, and extracting the words in a digital image so that they can be translated into another language.

How Does OCR Work?

OCR works in the following steps:

A scanner (or camera) captures an image of the document.
The OCR software converts the image into a two-colour (black and white) version.
This bitmap is analysed for light and dark areas. The dark areas are treated as the characters that need to be recognised and the light areas as the background.
The dark areas are processed further to separate them into individual letters and digits.
Each character is identified using pattern recognition or feature detection. With pattern recognition, the OCR software is given examples of characters in different fonts and formats and compares each character in the scanned document against these stored examples. With feature detection, the OCR software applies rules about the features of each letter or number, such as its lines, curves and loops, to recognise the characters in the hard-copy document or digital image.
Finally, the identified letters and numbers are converted into character codes, such as ASCII, so that the text can be stored and edited on a computer.
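
The first steps of this process, turning the scan into a two-colour bitmap and then separating the dark areas into individual characters, can be sketched in plain Python. This is a simplified illustration rather than how any particular OCR product works: it assumes a greyscale image stored as a grid of brightness values (0 = black, 255 = white), a fixed threshold of 128, and a single line of characters that do not touch. The function names `binarise` and `split_characters` are hypothetical.

```python
from typing import List, Optional

Grid = List[List[int]]  # rows of pixel values


def binarise(grey: Grid, threshold: int = 128) -> Grid:
    """Convert a greyscale image (0 = black, 255 = white) to two colours.

    Pixels darker than the threshold become 1 (dark: part of a character);
    all other pixels become 0 (light: background).
    """
    return [[1 if value < threshold else 0 for value in row] for row in grey]


def split_characters(binary: Grid) -> List[Grid]:
    """Split one line of text into separate characters at empty columns.

    Assumes characters do not touch, so any column with no dark pixels
    marks a gap between two characters.
    """
    width = len(binary[0])
    characters: List[Grid] = []
    start: Optional[int] = None
    for x in range(width + 1):  # one extra step closes a character at the edge
        has_ink = x < width and any(row[x] for row in binary)
        if has_ink and start is None:
            start = x  # first column of a new character
        elif not has_ink and start is not None:
            characters.append([row[start:x] for row in binary])
            start = None
    return characters


# A tiny 3 x 7 greyscale "scan": two dark shapes separated by light columns.
page = [
    [20, 30, 250, 240, 10, 10, 255],
    [25, 35, 245, 250, 15, 200, 255],
    [30, 20, 255, 235, 10, 10, 250],
]
characters = split_characters(binarise(page))
print(len(characters))  # 2
```

Real OCR software often chooses the threshold adaptively and has to cope with touching or broken characters, both of which this sketch ignores.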
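
The pattern-recognition approach (often called matrix or template matching) can be sketched as comparing each isolated character with stored examples and choosing the closest one. In this illustration the database is a hypothetical set of 3 x 3 patterns for just three letters, and similarity is simply the number of matching pixels. Python's built-in `ord` then gives the character code, which for these letters is the same as the ASCII code.

```python
from typing import Dict, List

Grid = List[List[int]]  # 1 = dark pixel (ink), 0 = light pixel (background)

# Hypothetical pattern database: one 3 x 3 example per character.
# Real OCR software stores examples for many characters, fonts and sizes.
TEMPLATES: Dict[str, Grid] = {
    "I": [[0, 1, 0],
          [0, 1, 0],
          [0, 1, 0]],
    "L": [[1, 0, 0],
          [1, 0, 0],
          [1, 1, 1]],
    "T": [[1, 1, 1],
          [0, 1, 0],
          [0, 1, 0]],
}


def similarity(a: Grid, b: Grid) -> int:
    """Count the pixels that are identical in two grids of the same size."""
    return sum(pa == pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def match_character(glyph: Grid) -> str:
    """Return the stored character whose pattern agrees with the most pixels."""
    return max(TEMPLATES, key=lambda ch: similarity(glyph, TEMPLATES[ch]))


# A scanned "T" whose faint stem bottom was lost when converting to black and white.
noisy_t = [[1, 1, 1],
           [0, 1, 0],
           [0, 0, 0]]
letter = match_character(noisy_t)
print(letter, ord(letter))  # T 84
```

Because the noisy "T" still agrees with the stored "T" on 8 of its 9 pixels, it is recognised despite the missing pixel.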
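
Feature detection can be sketched in a similar way, but instead of comparing whole images it checks each character for a few features and applies hand-written rules to them. The features used here (a solid top bar, a solid bottom bar, a full-height left or centre stroke) and the rules for three letters are illustrative assumptions; the result `'?'` stands for a character the rules cannot identify.

```python
from typing import Dict, List

Grid = List[List[int]]  # 1 = dark pixel (ink), 0 = light pixel (background)


def extract_features(glyph: Grid) -> Dict[str, bool]:
    """Describe a binary character image by a few simple stroke features."""
    rows, cols = len(glyph), len(glyph[0])
    return {
        "top_bar": all(glyph[0]),       # the whole top row is dark
        "bottom_bar": all(glyph[-1]),   # the whole bottom row is dark
        "left_stroke": all(glyph[r][0] for r in range(rows)),
        "centre_stroke": all(glyph[r][cols // 2] for r in range(rows)),
    }


def classify(glyph: Grid) -> str:
    """Apply hand-written rules to the features; '?' means no rule matched."""
    f = extract_features(glyph)
    if f["top_bar"] and f["centre_stroke"] and not f["bottom_bar"]:
        return "T"
    if f["left_stroke"] and f["bottom_bar"] and not f["top_bar"]:
        return "L"
    if f["centre_stroke"] and not f["top_bar"] and not f["bottom_bar"]:
        return "I"
    return "?"


# The same rules work for characters of different sizes.
t_5x5 = [[1, 1, 1, 1, 1],
         [0, 0, 1, 0, 0],
         [0, 0, 1, 0, 0],
         [0, 0, 1, 0, 0],
         [0, 0, 1, 0, 0]]
l_4x4 = [[1, 0, 0, 0],
         [1, 0, 0, 0],
         [1, 0, 0, 0],
         [1, 1, 1, 1]]
print(classify(t_5x5), classify(l_4x4))  # T L
```

Because the rules describe strokes rather than exact pixels, they recognise the 5 x 5 "T" and the 4 x 4 "L" even though the two images are different sizes, which a direct comparison against fixed 3 x 3 patterns could not do.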

OCR works differently from Optical Mark Recognition (OMR). OMR only detects marks on paper, such as shaded boxes on an answer sheet, and cannot identify text, whereas OCR recognises the actual letters and words in a document, such as a pre-printed form.
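
To illustrate the contrast, here is a minimal sketch of how an OMR reader might check a multiple-choice answer sheet. It assumes the sheet has already been converted to a black-and-white grid (1 = dark) and that the position of each answer box is known in advance; the box layout, the `read_marks` name and the 50% fill threshold are hypothetical. The output is only "marked" or "not marked" for each box, and no characters are recognised.

```python
from typing import Dict, List, Tuple

Grid = List[List[int]]           # 1 = dark pixel, 0 = light pixel
Box = Tuple[int, int, int, int]  # (top row, left column, height, width)


def read_marks(sheet: Grid, boxes: Dict[str, Box], min_fill: float = 0.5) -> Dict[str, bool]:
    """Report, for each named box, whether enough of it is shaded to count as a mark."""
    result: Dict[str, bool] = {}
    for name, (top, left, height, width) in boxes.items():
        cells = [sheet[r][c]
                 for r in range(top, top + height)
                 for c in range(left, left + width)]
        result[name] = sum(cells) / len(cells) >= min_fill  # fraction of dark pixels
    return result


# A 2 x 4 sheet with two 2 x 2 answer boxes: A is shaded, B has only a stray dot.
sheet = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
]
boxes = {"A": (0, 0, 2, 2), "B": (0, 2, 2, 2)}
print(read_marks(sheet, boxes))  # {'A': True, 'B': False}
```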

Advantages and Disadvantages Of OCR

The advantages of OCR are:

It is quicker. OCR can read large amounts of text much faster than a person could type it up manually.
It is more cost-effective. Instead of paying someone to manually enter large amounts of text, OCR can automate the data entry process.
It can improve accuracy. For clear, printed text, OCR avoids human errors such as typos made during manual data entry.

The disadvantages of OCR are:

If the handwriting is difficult to read, the OCR software may misread the text, introducing mistakes into the data.
The system is limited to the characters in its database of features and patterns. A character that is not included in the database may not be recognised.
It may not preserve the formatting of the original document. The font size, style, spacing and indentation can be lost, so extra time may be needed to reformat the document.
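
The database limitation can be illustrated with a hypothetical recogniser whose database holds patterns for only "I" and "T". It scores each stored pattern by the fraction of pixels that agree with the scanned character and, as an assumption for this sketch, rejects any match scoring below 0.8 instead of guessing. Without that threshold, an "L", which is not in the database, is simply misread as the closest stored character.

```python
from typing import Dict, List, Optional

Grid = List[List[int]]  # 1 = dark pixel (ink), 0 = light pixel (background)

# Hypothetical database that only knows two characters.
DATABASE: Dict[str, Grid] = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
}


def recognise(glyph: Grid, min_score: float = 0.8) -> Optional[str]:
    """Return the best-matching stored character, or None if no match is good enough.

    The score is the fraction of pixels that agree with a stored pattern.
    """
    best_char: Optional[str] = None
    best_score = 0.0
    for char, pattern in DATABASE.items():
        pairs = [(a, b) for row_g, row_p in zip(glyph, pattern) for a, b in zip(row_g, row_p)]
        score = sum(a == b for a, b in pairs) / len(pairs)
        if score > best_score:
            best_char, best_score = char, score
    return best_char if best_score >= min_score else None


t_glyph = [[1, 1, 1], [0, 1, 0], [0, 1, 0]]
l_glyph = [[1, 0, 0], [1, 0, 0], [1, 1, 1]]  # "L" is not in the database
print(recognise(t_glyph))                 # T
print(recognise(l_glyph))                 # None (rejected)
print(recognise(l_glyph, min_score=0.0))  # I (misread)
```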