Overview of Optical Character Recognition (OCR)

13 Jul 2026

Optical character recognition (OCR) is a technology used to convert scanned paper documents in the form of PDF files or images into searchable and editable data.

The .NET OCR processor library performs OCR on scanned PDF documents and images using Google’s Tesseract Optical Character Recognition engine.

The OCR processor includes a built-in image preprocessor that prepares images for recognition. This step produces cleaner input and reduces OCR errors. The preprocessor supports the following enhancements:

Convert to Grayscale – Simplifies image data by removing color information, making text easier to detect.
Deskew – Corrects tilted or rotated text for proper alignment.
Denoise – Removes speckles and artifacts that can interfere with character recognition.
Apply Contrast Adjustment – Enhances text visibility against the background.
Apply Binarize – Converts images to black-and-white for sharper text edges, using advanced thresholding methods.
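
To make the grayscale and binarize steps above concrete, here is a minimal, library-independent sketch in Python. It is not the OCR processor's implementation: the source does not name which thresholding method the preprocessor uses, so Otsu's method is assumed here purely for illustration, and the function names (`to_grayscale`, `otsu_threshold`, `binarize`) are hypothetical.

```python
def to_grayscale(rgb_rows):
    """Convert rows of (r, g, b) pixels to 0-255 luminance values.

    Uses the common BT.601 luma weights (0.299, 0.587, 0.114) in integer
    arithmetic; this weighting is an assumption, not the library's formula.
    """
    return [
        [(299 * r + 587 * g + 114 * b + 500) // 1000 for r, g, b in row]
        for row in rgb_rows
    ]


def otsu_threshold(gray_rows):
    """Return the threshold that maximizes between-class variance (Otsu)."""
    hist = [0] * 256
    for row in gray_rows:
        for value in row:
            hist[value] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_threshold, best_variance = 0, -1.0
    weight_bg = 0  # number of pixels at or below the candidate threshold
    sum_bg = 0     # sum of their intensities
    for level in range(256):
        weight_bg += hist[level]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += level * hist[level]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


def binarize(gray_rows, threshold):
    """Map pixels above the threshold to white (255) and the rest to black (0)."""
    return [[255 if value > threshold else 0 for value in row] for row in gray_rows]


# Tiny 2x2 "scan": dark text pixels on a light background.
image = [
    [(230, 230, 230), (20, 20, 20)],
    [(20, 20, 20), (230, 230, 230)],
]
gray = to_grayscale(image)
black_and_white = binarize(gray, otsu_threshold(gray))
```

A global threshold like this works well on evenly lit scans; pages with shadows or uneven lighting generally need an adaptive (per-region) threshold instead.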

The .NET OCR processor library works on a range of platforms: Azure App Services, Azure Functions, AWS Textract, Docker, WinForms, WPF, Blazor, ASP.NET MVC, and ASP.NET Core on Windows, macOS, and Linux.

NOTE

Starting with v20.1.0.x, if you reference Syncfusion® OCR processor assemblies from the trial setup or the NuGet feed, you also have to include a license key in your projects. Please refer to this link to learn more about registering the Syncfusion® license key in your application to use its components.

Key features

Create a searchable PDF from scanned PDF.
Zonal text extraction from the scanned PDF.
Preserve Unicode characters.
Extract text from the image.
Create a searchable PDF from large scanned PDF documents.
Create a searchable PDF from rotated scanned PDF.
Get OCRed text and its bounds from a scanned PDF document.
Native call.
Customizing the temp folder.
Performing OCR with different Page Segmentation Mode.
Performing OCR with different OCR Engine Mode.
White List.
Black List.
Image into searchable PDF or PDF/A.
Improved accessibility.
Post-processing.
Compatible with .NET Framework 4.5 and above.
Compatible with .NET Core 2.0 and above.
