How to use Text Recognition (OCR) to enhance the text quality

This article explains how to run text recognition manually when the text in a document is not recognized automatically after import.

What is Text Recognition (OCR)

Optical Character Recognition (OCR) is a technology that reads documents and transforms any visible text into a format that the computer can read. This helps users extract text from images or scanned documents.

Introduction

DataSnipper uses intelligent Optical Character Recognition (OCR) to run text recognition on documents. This enables you to search and extract text from PDFs, scans, and images. After you import a document into DataSnipper, all text, images, and PDFs are analyzed automatically. However, in some cases automatic text recognition might not work because of the complexity of the document, for example when it contains handwriting.

Choose your DataSnipper version to learn more about Text Recognition:

👉 DataSnipper version 4.0 and earlier

👉 DataSnipper version 4.1 and later

or
👉 Visit FAQs on Text Recognition (OCR)

DataSnipper v4.0 - Text Recognition

Use Text Recognition

Run Text Recognition Manually

Let's start

  1. To re-run text recognition, click the "Recognize Text" button.
  2. DataSnipper will indicate that no text was found on the first attempt.
    Recognize Text pop-up
  3. Run text recognition again to recognize the text in the document.
    Manually recognize

DataSnipper v4.0 - Facts about Text Recognition 

  • Text Recognition will enhance the quality of the PDF text by default when pages do not contain text
  • Handwritten Text Recognition supports English, Chinese Simplified, French, German, Italian, Portuguese, and Spanish
  • Text Recognition supports all languages in Latin, Cyrillic, Chinese, Japanese, and Korean scripts
  • The quality and language support of text recognition are continuously improving. You can find the most up-to-date status per language on the computer vision language support page.

DataSnipper v4.1 - Text Recognition

Use Text Recognition

Run Text Recognition Manually

Let's start

  1. To re-run text recognition, click the "Recognize Text" button.
  2. DataSnipper will indicate that no text was found on the first attempt.
    Recognize Text pop-up
  3. Run text recognition again to recognize the text in the document.
    Manually recognize

DataSnipper v4.1 - Facts about Text Recognition

  • Text Recognition will enhance the quality of the PDF text by default
  • Text Recognition supports all languages in Latin, Cyrillic, Chinese, Japanese, and Korean scripts
  • Handwritten Text Recognition supports English, Chinese Simplified, French, German, Italian, Portuguese, and Spanish
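If you keep track of document languages in your own tooling, the handwriting list above can be captured as a simple lookup. The snippet below is only an illustrative sketch in Python: the names HANDWRITING_LANGUAGES and handwriting_supported are hypothetical and not part of DataSnipper, and the list reflects this article only, so check the computer vision language support page for the current status.

```python
# Languages this article lists for Handwritten Text Recognition.
# Hypothetical helper for illustration; not a DataSnipper API.
HANDWRITING_LANGUAGES = {
    "English",
    "Chinese Simplified",
    "French",
    "German",
    "Italian",
    "Portuguese",
    "Spanish",
}

# Pre-compute a case-insensitive view of the list once.
_NORMALIZED = {name.casefold() for name in HANDWRITING_LANGUAGES}


def handwriting_supported(language: str) -> bool:
    """Return True if the language appears in the handwriting list above."""
    return language.strip().casefold() in _NORMALIZED


print(handwriting_supported("German"))    # True
print(handwriting_supported("Japanese"))  # False
```

Japanese is covered by regular Text Recognition (it is one of the listed scripts) but does not appear in the handwriting list, which is why the second call returns False.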

FAQs

  1. Does DataSnipper recognize handwritten text?
    DataSnipper can recognize text in handwritten form as long as it is legible. You can find the most up-to-date status per language on the computer vision language support page.
  2. When is OCR applied by DataSnipper?
    OCR is applied on the following user actions:
        i. Upon importing documents that do not have a text layer (e.g., images and scanned documents), DataSnipper shows a notification asking whether you would like to apply Text Recognition.
        ii. Upon selecting the Text Recognition button in the DataSnipper tab and applying it to the selected documents.
  3. How can I run Text Recognition on a document that already contains text?
    You can use the 'Recognize Text' button and then choose the option 'Recognize'.
    Please note that for versions before v4.1 you need to select the 'Recognize and overwrite' option.
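
The rules from FAQs 2 and 3 can be summarized as a small decision sketch. This is only an illustration in Python of the behavior described above, not DataSnipper's implementation: the Document class, the has_text_layer flag, and both function names are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Document:
    """Hypothetical stand-in for an imported file; not a DataSnipper type."""
    name: str
    has_text_layer: bool


def should_prompt_on_import(doc: Document) -> bool:
    """FAQ 2(i): a document without a text layer triggers the Text Recognition prompt."""
    return not doc.has_text_layer


def manual_option(version: Tuple[int, int]) -> str:
    """FAQ 3: which option to choose when the document already contains text."""
    # Versions before v4.1 need 'Recognize and overwrite'; v4.1 and later use 'Recognize'.
    return "Recognize" if version >= (4, 1) else "Recognize and overwrite"


scan = Document("invoice_scan.png", has_text_layer=False)
report = Document("report.pdf", has_text_layer=True)

print(should_prompt_on_import(scan))    # True
print(should_prompt_on_import(report))  # False
print(manual_option((4, 0)))            # Recognize and overwrite
print(manual_option((4, 1)))            # Recognize
```

The version is modeled as a (major, minor) tuple so that the comparison against v4.1 stays correct for later releases such as v4.10.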