OCR - Optical character recognition
Optical character recognition (optical character reader, OCR) is the mechanical or electronic conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene-photo (for example the text on signs and billboards in a landscape photo) or from subtitle text superimposed on an image (for example from a television broadcast). The basic process of OCR involves examining the text of a document and translating the characters into code that can be used for data processing. OCR is sometimes also referred to as text recognition.
It is widely used as a form of data entry from printed paper records, whether passport documents, invoices, bank statements, computerised receipts, business cards, mail, printouts of static data, or any other suitable documentation. OCR programs work best on text that has already been typed, for example when only a printout survives and the original file has been lost, or when scanning sheets typed on a typewriter. Good OCR software may also be able to convert handwritten text, although the error rate on this sort of conversion tends to be much higher.
The term OCR software is somewhat misleading, as most modern programs do not recognize characters by optical means but by digitally processing a scanned image, which is more accurately called digital character recognition. Some years ago the two fields effectively merged, and the combined field kept the more attractive term optical character recognition. Character recognition software has advanced a great deal in recent years, and modern programs are substantially better than their predecessors at identifying text.
OCR systems combine hardware and software to convert physical documents into machine-readable text. Hardware, such as an optical scanner or a specialized circuit board, is used to copy or read the text, while the recognition software typically handles the more advanced processing. OCR software can also take advantage of artificial intelligence (AI) to implement more advanced methods of intelligent character recognition (ICR), such as identifying languages or styles of handwriting.
The process of OCR is most commonly used to turn hard-copy legal or historic documents into PDFs. Once a document is in this soft-copy form, users can edit, format and search it as if it had been created with a word processor.
The first step of OCR is using a scanner to process the physical form of a document. Once all pages are copied, optical character recognition software converts the document into a two-color, or black and white, version. The scanned-in image or bitmap is analyzed for light and dark areas, where the dark areas are identified as characters that need to be recognized and light areas are identified as background.
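The light/dark split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the grayscale grid, the fixed threshold of 128 and the names binarize and render are hypothetical, and real OCR software usually picks the threshold adaptively rather than hard-coding it.

```python
# Hypothetical 4x6 grayscale "scan": 0 is black ink, 255 is white paper.
GRAY = [
    [250, 245,  30,  25, 240, 250],
    [248,  20, 235, 240,  35, 246],
    [251,  28,  22,  31,  27, 249],
    [247,  33, 238, 242,  29, 250],
]


def binarize(gray, threshold=128):
    """Return a grid of 1 (dark, candidate character) and 0 (light, background)."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]


def render(bitmap):
    """Draw a binary bitmap as text, '#' for dark pixels and '.' for light ones."""
    return "\n".join("".join("#" if px else "." for px in row) for row in bitmap)


if __name__ == "__main__":
    print(render(binarize(GRAY)))
```

Pixels darker than the threshold become candidates for recognition and everything else is treated as background, which is the two-color version the recognition step works on.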
The dark areas are then processed further to find alphabetic letters or numeric digits. OCR products vary in their techniques, but they typically target one character, word or block of text at a time. Characters are then identified using one of two algorithms:
Pattern recognition - OCR programs are fed examples of text in various fonts and formats, which are then used to compare and recognize characters in the scanned document.
Feature detection - OCR programs apply rules regarding the features of a specific letter or number to recognize characters in the scanned document. Features could include the number of angled lines, crossed lines or curves in a character for comparison. For example, the capital letter “A” may be stored as two diagonal lines that meet with a horizontal line across the middle.
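As a rough illustration of the pattern recognition approach only (feature detection is not shown), the sketch below compares a scanned glyph against stored examples and picks the closest one. The 5x5 templates, the pixel-mismatch score and the names mismatches and recognize are assumptions made for this example; commercial OCR engines use far larger sets of examples and more robust comparisons.

```python
# Hypothetical 5x5 templates; a real system stores many fonts and sizes.
TEMPLATES = {
    "A": [".###.",
          "#...#",
          "#####",
          "#...#",
          "#...#"],
    "L": ["#....",
          "#....",
          "#....",
          "#....",
          "#####"],
    "T": ["#####",
          "..#..",
          "..#..",
          "..#..",
          "..#.."],
}


def mismatches(glyph, template):
    """Count the pixels at which two equally sized bitmaps differ."""
    return sum(
        g != t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize(glyph, templates=TEMPLATES):
    """Return the label of the template with the fewest differing pixels."""
    return min(templates, key=lambda label: mismatches(glyph, templates[label]))


# A noisy scan of "A": one pixel is missing from the crossbar.
NOISY_A = [".###.",
           "#...#",
           "##.##",
           "#...#",
           "#...#"]

if __name__ == "__main__":
    print(recognize(NOISY_A))
```

Because the score is a simple count of differing pixels, a glyph with a little scanning noise still lands on the nearest stored example, while a font the program has never seen may not.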
When a character is identified, it is converted into an ASCII code that computer systems can use for further processing. Users should correct basic errors, proofread and make sure complex layouts were handled properly before saving the document for future use.
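The conversion to ASCII codes can be shown with Python's built-in ord function and bytes type. The recognized characters below are a made-up recognizer output, and the helper names are hypothetical.

```python
# Hypothetical output of the recognition step, one character at a time.
RECOGNIZED = ["O", "C", "R", " ", "1", "0", "1"]


def to_ascii_codes(characters):
    """Map each recognized character to its ASCII code."""
    return [ord(ch) for ch in characters]


def from_ascii_codes(codes):
    """Rebuild the text from ASCII codes; raises if a code is outside ASCII."""
    return bytes(codes).decode("ascii")


if __name__ == "__main__":
    codes = to_ascii_codes(RECOGNIZED)
    print(codes)
    print(from_ascii_codes(codes))
```

Once the text exists as character codes rather than pixels, it can be searched, edited and stored like any other document.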
F.A.Q. about OCR - Optical character recognition
OCR (Optical Character Recognition) use cases
OCR has a variety of applications, including:
- Scanning printed documents into versions that can be edited with word processors, like Microsoft Word or Google Docs.
- Indexing print material for search engines.
- Automating data entry, extraction and processing.
- Converting documents into text that can be read aloud to visually impaired or blind users.
- Archiving historic information, such as newspapers, magazines or phonebooks, into searchable formats.
- Electronically depositing checks without the need for a bank teller.
- Placing important, signed legal documents into an electronic database.
- Recognizing text, such as license plates, with a camera or software.
- Sorting letters for mail delivery.
- Translating words within an image into a specified language.
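To make the data entry and extraction use case more concrete, here is a minimal sketch that pulls fields out of text an OCR program has already produced. The sample invoice text, the field labels and the regular expressions are all invented for illustration; real documents need patterns matched to their own layouts, plus handling for recognition errors.

```python
import re

# Hypothetical OCR output for a scanned invoice.
OCR_TEXT = """\
ACME Supplies
Invoice No: INV-2041
Date: 2023-05-17
Total: 1,249.50
"""

# One pattern per field; each captures the value that follows the label.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice No:\s*(\S+)"),
    "date": re.compile(r"Date:\s*(\d{4}-\d{2}-\d{2})"),
    "total": re.compile(r"Total:\s*([\d,]+\.\d{2})"),
}


def extract_fields(text, patterns=FIELD_PATTERNS):
    """Return a dict of field values, with None for any field not found."""
    fields = {}
    for name, pattern in patterns.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


if __name__ == "__main__":
    print(extract_fields(OCR_TEXT))
```

Fields that come back as None are a natural signal to route the document to a person for manual review.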
Choosing the Right Character Recognition Solution: OCR and ICR
Although recent technological advances have improved data recognition, many offices continue to rely on manual entry and sorting processes. OCR (Optical Character Recognition) and ICR (Intelligent Character Recognition or Intelligent Document Recognition software) are essential components of Advanced Data Capture that digitally capture images and text. Both technologies provide companies with enhanced images and critical business information. So, what are the differences between OCR and ICR, and how do these differences pertain to your documents?
Optical Character Recognition refers to the process where documents are captured and digitally converted into searchable text that you can edit. OCR is primarily used to read machine-generated documents with typed or printed text. The technology reads the brightness and font of the text in these documents and recognizes the characters with high accuracy. OCR is a good fit for companies that need information from paper documents and PDF files, and it works well for converting longer documents. For example, if you need to pull quotes from a document for another project, OCR will allow you to easily search, retrieve and extract text.
Intelligent Character Recognition takes OCR a step further with more intensive character recognition. While OCR mainly covers machine-printed characters, the best intelligent character recognition software can recognize handwritten text and convert it into searchable files. Like OCR, ICR scans, reviews and converts the text. ICR is a suitable solution for offices that handle many checks, timesheets and other handwritten documents.