Learn how OCR technology converts text from images and documents for computer processing, and when to use it in DataSnipper.
What is Text Recognition (OCR)
Optical Character Recognition (OCR) is a technology that reads documents and converts any visible text into a format that a computer can process. Running OCR helps DataSnipper read text from images and scanned documents as accurately as possible.
When to run Text Recognition (OCR)
After importing documents, you will see one of five tags beside every file in the Document Organizer. Here are the tags and the action each one requires:
- Contains Text - DataSnipper has recognized a text layer in the document, so there is no need to run OCR. Only run OCR if you cannot accurately extract information.
- OCR - DataSnipper has already run OCR on the document and recognized all contained text. No further action is needed.
- No Text - Text has not been recognized within the document. You should run OCR for better results.
- Corrupt - The system has detected that the document is corrupted. To fix this, reimport a non-corrupted version or delete the file.
- Missing file - This occurs during co-authoring. For example, user 1 imports a local document, and user 2 then interacts with it or snips from it without having access to the file.
Check these tags after every import to confirm that each document has been processed correctly and to get the best text recognition results.
Decision tree for OCR
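The tag descriptions above can be summarized as a small lookup. The sketch below is illustrative only: `RECOMMENDED_ACTION`, `should_run_ocr`, and the `extraction_accurate` parameter are hypothetical names chosen for this example and are not part of DataSnipper or any DataSnipper API.

```python
# Hypothetical helper that mirrors the tag list in this article.
# None of these names come from DataSnipper itself.
RECOMMENDED_ACTION = {
    "Contains Text": "No OCR needed; run OCR only if you cannot accurately extract information.",
    "OCR": "No further action needed; OCR has already been run.",
    "No Text": "Run OCR for better results.",
    "Corrupt": "Reimport a non-corrupted version or delete the file.",
    # The article describes when this tag appears but does not list an action.
    "Missing file": "Occurs during co-authoring when a user lacks access to the file.",
}


def should_run_ocr(tag: str, extraction_accurate: bool = True) -> bool:
    """Return True when the tag descriptions suggest running OCR.

    `extraction_accurate` is an assumed flag: set it to False if you
    cannot accurately extract information from a "Contains Text" file.
    """
    if tag not in RECOMMENDED_ACTION:
        raise ValueError(f"Unknown tag: {tag!r}")
    if tag == "No Text":
        return True
    if tag == "Contains Text":
        return not extraction_accurate
    # "OCR", "Corrupt", and "Missing file" are not resolved by running OCR.
    return False
```

For example, `should_run_ocr("Contains Text", extraction_accurate=False)` returns `True`, matching the guidance to run OCR only when information cannot be extracted accurately from a document that already has a text layer.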