Perform OCR in Console Application

17 Jun 2026, 7 minutes to read

The Syncfusion® .NET OCR library extracts text from scanned PDFs and images in a console application with the help of Google’s Tesseract Optical Character Recognition engine.

Steps to perform OCR on the entire PDF document in a console application

Prerequisites (Visual Studio):

  • Install .NET SDK: Ensure that you have the .NET SDK installed on your system. You can download it from the .NET Downloads page.
  • Install Visual Studio: Download and install Visual Studio from the official website.

Step 1: Create a new .NET Core console application project.

Step 2: In the configuration windows, name your project and select Next.

Step 3: Install the Syncfusion.PDF.OCR.Net.Core NuGet package from NuGet.org as a reference to your console application.

NOTE

  1. Beginning with version 21.1.x, the TesseractBinaries and Tesseract language data folder paths are part of the default configuration, so you no longer need to provide these paths explicitly.
  2. Starting with v16.2.0.x, if you reference Syncfusion® assemblies from the trial setup or from the NuGet feed, you must also add the “Syncfusion.Licensing” assembly reference and include a license key in your project. Refer to the Syncfusion® licensing documentation to learn how to register the license key in your application.
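
The license registration mentioned in the note is usually a single call made at application startup. The following is a minimal sketch, assuming the Syncfusion.Licensing assembly is referenced as the note describes; the key string is a placeholder, not a real key.

```csharp
// Register the license key once, before any Syncfusion component is created.
// "YOUR LICENSE KEY" is a placeholder; replace it with the key issued for your account.
Syncfusion.Licensing.SyncfusionLicenseProvider.RegisterLicense("YOUR LICENSE KEY");
```

In a console application, this line would typically sit at the top of Program.cs, ahead of the OCR code.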

Step 4: Include the following namespaces in Program.cs.

using System.IO;
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Parsing;

Step 5: Include the following code sample in Program.cs. It uses the PerformOCR method of the OCRProcessor class.

//Initialize the OCR processor.
using (OCRProcessor processor = new OCRProcessor())
{
    //Load an existing PDF document.
    PdfLoadedDocument document = new PdfLoadedDocument(Path.GetFullPath(@"Data/Input.pdf"));
    //Set OCR language.
    processor.Settings.Language = Languages.English;
    //Perform OCR on the input document using the default tessdata (language packs).
    processor.PerformOCR(document);
    //Save the PDF document.
    document.Save(Path.GetFullPath(@"Output/Output.pdf"));
    //Close the document.
    document.Close(true);
}
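
The sample above resolves Data/Input.pdf and Output/Output.pdf relative to the current working directory, which can differ depending on how the application is launched. The following is a minimal sketch of a pre-flight check that could run before the OCR code. It is not part of the Syncfusion API, and PreparePaths is a hypothetical helper name introduced here for illustration. It assumes that saving to a folder that does not exist would fail, so it creates the folder up front.

```csharp
using System;
using System.IO;

// Hypothetical helper (not a Syncfusion API): creates the output folder if it
// is missing and reports whether the input file exists.
static bool PreparePaths(string inputPath, string outputPath)
{
    // Resolve the output folder the same way the sample resolves its paths.
    string outputDir = Path.GetDirectoryName(Path.GetFullPath(outputPath)) ?? "";
    if (outputDir.Length > 0)
    {
        // No-op when the folder already exists.
        Directory.CreateDirectory(outputDir);
    }
    return File.Exists(Path.GetFullPath(inputPath));
}

bool ready = PreparePaths(@"Data/Input.pdf", @"Output/Output.pdf");
Console.WriteLine(ready
    ? "Input found; ready to run OCR."
    : "Input not found at " + Path.GetFullPath(@"Data/Input.pdf"));
```

Printing the resolved input path makes it easier to see which folder the application is actually reading from when the file is reported as missing.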

Step 6: Build the project.

Click the Build button in the toolbar or press Ctrl+Shift+B to build the project.

Step 7: Run the project.

Click the Run button (green arrow) in the toolbar or press F5 to run the app.

Prerequisites (Visual Studio Code):

  • Install .NET SDK: Ensure that you have the .NET SDK installed on your system. You can download it from the .NET Downloads page.
  • Install Visual Studio Code: Download and install Visual Studio Code from the official website.
  • Install C# Extension for VS Code: Open Visual Studio Code, go to the Extensions view (Ctrl+Shift+X), and search for ‘C#’. Install the official C# extension provided by Microsoft.

Step 1: Open the terminal (Ctrl+`) and run the following command to create a new Console Application project.

dotnet new console -n ConsoleApplication

Step 2: In the command above, replace ConsoleApplication with your desired project name.

Step 3: Navigate to the project directory using the following command:

cd ConsoleApplication

Step 4: Use the following command in the terminal to add the Syncfusion.PDF.OCR.Net.Core package to your project.

dotnet add package Syncfusion.PDF.OCR.Net.Core

NOTE

  1. Beginning with version 21.1.x, the TesseractBinaries and Tesseract language data folder paths are part of the default configuration, so you no longer need to provide these paths explicitly.
  2. Starting with v16.2.0.x, if you reference Syncfusion® assemblies from the trial setup or from the NuGet feed, you must also add the “Syncfusion.Licensing” assembly reference and include a license key in your project. Refer to the Syncfusion® licensing documentation to learn how to register the license key in your application.

Step 5: Include the following namespaces in Program.cs.

using System.IO;
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Parsing;

Step 6: Include the following code sample in Program.cs. It uses the PerformOCR method of the OCRProcessor class.

//Initialize the OCR processor.
using (OCRProcessor processor = new OCRProcessor())
{
    //Load an existing PDF document.
    PdfLoadedDocument document = new PdfLoadedDocument(Path.GetFullPath(@"Data/Input.pdf"));
    //Set OCR language.
    processor.Settings.Language = Languages.English;
    //Perform OCR on the input document using the default tessdata (language packs).
    processor.PerformOCR(document);
    //Save the PDF document.
    document.Save(Path.GetFullPath(@"Output/Output.pdf"));
    //Close the document.
    document.Close(true);
}

Step 7: Build the project.

Run the following command in the terminal to build the project.

dotnet build

Step 8: Run the project.

Run the following command in the terminal to run the project.

dotnet run

Prerequisites (JetBrains Rider):

  • Install JetBrains Rider.
  • Install the .NET 8 SDK or later.

Step 1: Open JetBrains Rider and create a new .NET Core console application project.

  • Launch JetBrains Rider.
  • Click New Solution on the welcome screen.
  • In the New Solution dialog, select Project Type as Console.
  • Enter a project name and specify the location.
  • Select the target framework (e.g., .NET 8.0, .NET 9.0 or .NET 10.0).
  • Click Create.

Step 2: Install the NuGet package from NuGet.org.

  • Click the NuGet icon in the Rider toolbar and type Syncfusion.PDF.OCR.Net.Core in the search bar.
  • Ensure that “NuGet.org” is selected as the package source.
  • Select the latest Syncfusion.PDF.OCR.Net.Core NuGet package from the list.
  • Click the + (Add) button to add the package.

NOTE

  1. Beginning with version 21.1.x, the TesseractBinaries and Tesseract language data folder paths are part of the default configuration, so you no longer need to provide these paths explicitly.
  2. Starting with v16.2.0.x, if you reference Syncfusion® assemblies from the trial setup or from the NuGet feed, you must also add the “Syncfusion.Licensing” assembly reference and include a license key in your project. Refer to the Syncfusion® licensing documentation to learn how to register the license key in your application.

Step 3: Include the following namespaces in Program.cs.

using System.IO;
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Parsing;

Step 4: Include the following code sample in Program.cs. It uses the PerformOCR method of the OCRProcessor class.

//Initialize the OCR processor.
using (OCRProcessor processor = new OCRProcessor())
{
    //Load an existing PDF document.
    PdfLoadedDocument document = new PdfLoadedDocument(Path.GetFullPath(@"Data/Input.pdf"));
    //Set OCR language.
    processor.Settings.Language = Languages.English;
    //Perform OCR on the input document using the default tessdata (language packs).
    processor.PerformOCR(document);
    //Save the PDF document.
    document.Save(Path.GetFullPath(@"Output/Output.pdf"));
    //Close the document.
    document.Close(true);
}

Step 5: Build the project.

Click the Build button in the toolbar or press Ctrl+Shift+B to build the project.

Step 6: Run the project.

Click the Run button (green arrow) in the toolbar or press F5 to run the app.

By executing the program, you will get the output PDF document, saved to Output/Output.pdf.

A complete working sample can be downloaded from GitHub.

Click here to explore the rich set of Syncfusion® PDF library features.