Perform OCR in ASP.NET Core

9 Jul 202611 minutes to read

The Syncfusion^® .NET OCR library is used to extract text from the scanned PDFs and images in the ASP.NET Core application with the help of Google’s Tesseract Optical Character Recognition engine.

Steps to perform OCR on entire PDF document in ASP.NET Core application

Prerequisites:

Install .NET SDK: Ensure that you have the .NET SDK installed on your system. You can download it from the .NET Downloads page.
Install Visual Studio: Download and install Visual Studio from the official website.

Step 1: Create a new C# ASP.NET Core Web Application project. Create ASP.NET Core Web application

Step 2: In configuration windows, name your project and click Next. ASP.NET Core project configuration1

ASP.NET Core project configuration2

Step 3: Install the Syncfusion.PDF.OCR.Net.Core NuGet package as a reference to your .NET Standard applications from NuGet.org.
PDF OCR ASP.NET Core NuGet package

NOTE

Beginning from version 21.1.x, the default configuration includes the addition of the TesseractBinaries and Tesseract language data folder paths, eliminating the requirement to explicitly provide these paths.

Starting with v16.2.0.x, if you reference Syncfusion^® assemblies from trial setup or from the NuGet feed, you also have to add “Syncfusion.Licensing” assembly reference and include a license key in your projects. Please refer to this link to know about registering Syncfusion^® license key in your application to use our components.

Step 4: A default controller with the name HomeController.cs gets added to the creation of the ASP.NET Core MVC project. Include the following namespaces in that HomeController.cs file.

C#
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Parsing;

Step 5: Add a new button in index.cshtml as follows.

CSHTML
@{Html.BeginForm("PerformOCR", "Home", FormMethod.Post);
   {
      <div>
         <input type="submit" value="Perform OCR" style="width:150px;height:27px" />
      </div>
   }
   Html.EndForm();
}

Step 6: Add a new action method named PerformOCR in the HomeController.cs and use the following code sample to perform OCR on the entire PDF document using PerformOCR method of the OCRProcessor class.

C#
public IActionResult PerformOCR()
{
   //Initialize the OCR processor.
   using (OCRProcessor processor = new OCRProcessor())
   {
      FileStream fileStream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read);
      //Load a PDF document.
      PdfLoadedDocument lDoc = new PdfLoadedDocument(fileStream);
      //Set OCR language to process.
      processor.Settings.Language = Languages.English;
      //Process OCR by providing the PDF document.
      processor.PerformOCR(lDoc);
      //Create memory stream.
      MemoryStream stream = new MemoryStream();
      //Save the document to memory stream.
      lDoc.Save(stream);
      lDoc.Close();
      //Set the position as '0'.
      stream.Position = 0;
      //Download the PDF document in the browser.
      FileStreamResult fileStreamResult = new FileStreamResult(stream, "application/pdf");
      fileStreamResult.FileDownloadName = "Sample.pdf";
      return fileStreamResult;
   }
}

Step 7: Build the project.

Click the Build button in the toolbar or press Ctrl+Shift+B to build the project.

Step 8: Run the project.

Click the Run button (green arrow) in the toolbar or press F5 to run the app.

Prerequisites:

Install .NET SDK: Ensure that you have the .NET SDK installed on your system. You can download it from the .NET Downloads page.
Install Visual Studio Code: Download and install Visual Studio Code from the official website.
Install C# Extension for VS Code: Open Visual Studio Code, go to the Extensions view (Ctrl+Shift+X), and search for ‘C#’. Install the official C# extension provided by Microsoft.

Step 1: Open the terminal (Ctrl+` ) and run the following command to create a new C# ASP.NET Core Web Application project.

dotnet new mvc -n CreatePdfASPNETCoreAPP

Step 2: Replace **CreatePdfASPNETCoreAPP with your desired project name.

Step 3: Navigate to the project directory using the following command

cd CreatePdfASPNETCoreAPP

Step 4: Use the following command in the terminal to add the Syncfusion.PDF.OCR.Net.Core package to your project.

dotnet add package Syncfusion.PDF.OCR.NET

NOTE

Beginning from version 21.1.x, the default configuration includes the addition of the TesseractBinaries and Tesseract language data folder paths, eliminating the requirement to explicitly provide these paths.

Starting with v16.2.0.x, if you reference Syncfusion^® assemblies from trial setup or from the NuGet feed, you also have to add “Syncfusion.Licensing” assembly reference and include a license key in your projects. Please refer to this link to know about registering Syncfusion^® license key in your application to use our components.

Step 5: A default controller with the name HomeController.cs gets added to the creation of the ASP.NET Core MVC project. Include the following namespaces in that HomeController.cs file.

C#
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Parsing;

Step 6: Add a new button in index.cshtml as follows.

CSHTML
@{Html.BeginForm("PerformOCR", "Home", FormMethod.Post);
   {
      <div>
         <input type="submit" value="Perform OCR" style="width:150px;height:27px" />
      </div>
   }
   Html.EndForm();
}

Step 7: Add a new action method named PerformOCR in the HomeController.cs and use the following code sample to perform OCR on the entire PDF document using PerformOCR method of the OCRProcessor class.

C#
public IActionResult PerformOCR()
{
   //Initialize the OCR processor.
   using (OCRProcessor processor = new OCRProcessor())
   {
      FileStream fileStream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read);
      //Load a PDF document.
      PdfLoadedDocument lDoc = new PdfLoadedDocument(fileStream);
      //Set OCR language to process.
      processor.Settings.Language = Languages.English;
      //Process OCR by providing the PDF document.
      processor.PerformOCR(lDoc);
      //Create memory stream.
      MemoryStream stream = new MemoryStream();
      //Save the document to memory stream.
      lDoc.Save(stream);
      lDoc.Close();
      //Set the position as '0'.
      stream.Position = 0;
      //Download the PDF document in the browser.
      FileStreamResult fileStreamResult = new FileStreamResult(stream, "application/pdf");
      fileStreamResult.FileDownloadName = "Sample.pdf";
      return fileStreamResult;
   }
}

Step 8: Build the project.

Run the following command in terminal to build the project.

dotnet build

Step 9: Run the project.

Run the following command in terminal to build the project.

dotnet run

Prerequisites:

JetBrains Rider.
Install .NET 8 SDK or later.

Step 1. Open JetBrains Rider and create a new ASP.NET Core Web application project.

Launch JetBrains Rider.
Click new solution on the welcome screen.

Launch JetBrains Rider

In the new Solution dialog, select Project Type as Web.
Select the target framework (e.g., .NET 8.0, .NET 9.0) and template as Web App(Model-View-Controller).
Enter a project name and specify the location.
Click create.

Creating a new .NET Core console application in JetBrains Rider

Step 2: Install the NuGet package from NuGet.org.

Click the NuGet icon in the Rider toolbar and type Syncfusion.PDF.OCR.Net.Core in the search bar.
Ensure that “nuget.org” is selected as the package source.
Select the latest Syncfusion.PDF.OCR.NET NuGet package from the list.
Click the + (Add) button to add the package.

Select the Syncfusion.PDF.OCR.NET package

Click the Install button to complete the installation.

Install the package

NOTE

Beginning from version 21.1.x, the default configuration includes the addition of the TesseractBinaries and Tesseract language data folder paths, eliminating the requirement to explicitly provide these paths.

Starting with v16.2.0.x, if you reference Syncfusion^® assemblies from trial setup or from the NuGet feed, you also have to add “Syncfusion.Licensing” assembly reference and include a license key in your projects. Please refer to this link to know about registering Syncfusion^® license key in your application to use our components.

Step 4: A default controller with the name HomeController.cs gets added to the creation of the ASP.NET Core MVC project. Include the following namespaces in that HomeController.cs file.

C#
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Parsing;

Step 5: Add a new button in index.cshtml as follows.

CSHTML
@{Html.BeginForm("PerformOCR", "Home", FormMethod.Post);
   {
      <div>
         <input type="submit" value="Perform OCR" style="width:150px;height:27px" />
      </div>
   }
   Html.EndForm();
}

C#
public IActionResult PerformOCR()
{
   //Initialize the OCR processor.
   using (OCRProcessor processor = new OCRProcessor())
   {
      FileStream fileStream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read);
      //Load a PDF document.
      PdfLoadedDocument lDoc = new PdfLoadedDocument(fileStream);
      //Set OCR language to process.
      processor.Settings.Language = Languages.English;
      //Process OCR by providing the PDF document.
      processor.PerformOCR(lDoc);
      //Create memory stream.
      MemoryStream stream = new MemoryStream();
      //Save the document to memory stream.
      lDoc.Save(stream);
      lDoc.Close();
      //Set the position as '0'.
      stream.Position = 0;
      //Download the PDF document in the browser.
      FileStreamResult fileStreamResult = new FileStreamResult(stream, "application/pdf");
      fileStreamResult.FileDownloadName = "Sample.pdf";
      return fileStreamResult;
   }
}

Step 7: Build the project.

Click the Build button in the toolbar or press Ctrl+Shift+B to build the project.

Step 8: Run the project.

Click the Run button (green arrow) in the toolbar or press F5 to run the app.

By executing the program, you will get a PDF document as follows.
OCR ASP.NET_Core Output

A complete working sample can be downloaded from the Github.

Click here to explore the rich set of Syncfusion^® PDF library features.

Search docs

Ask Syncfusion AI Assistant

Search docs

Ask Syncfusion AI Assistant

Perform OCR in ASP.NET Core

Steps to perform OCR on entire PDF document in ASP.NET Core application