Perform OCR in Blazor

7 Aug 2023, 7 minutes to read

The Syncfusion .NET OCR library extracts text from scanned PDFs and images in a Blazor application with the help of Google’s Tesseract Optical Character Recognition engine.

Steps to perform OCR on the entire PDF document in the Blazor application

Step 1: Create a new C# Blazor Server application project. Select Blazor App from the template and click Next.
Blazor server app creation

Step 2: In the project configuration window, name your project and click Create.
Blazor server project configuration

Step 3: Install the Syncfusion.PDF.OCR.NET NuGet package as a reference to your Blazor Server application from NuGet.org.
Blazor server NuGet package installation

NOTE

  1. Starting with version 21.1.x, the default configuration includes the TesseractBinaries and Tesseract language data folder paths, so you no longer need to provide these paths explicitly.
  2. Starting with v16.2.0.x, if you reference Syncfusion assemblies from the trial setup or from the NuGet feed, you must also add a reference to the “Syncfusion.Licensing” assembly and include a license key in your project. Refer to the Syncfusion licensing documentation to learn how to register a license key in your application.
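
As a minimal sketch of the license registration mentioned in the note, the snippet below wraps the `SyncfusionLicenseProvider.RegisterLicense` call from the Syncfusion.Licensing assembly. The `LicenseSetup` class and its `Register` method are hypothetical names used only for illustration, and the key shown in the usage note is a placeholder, not a real key.

```csharp
using Syncfusion.Licensing;

// Hypothetical helper (not part of the Syncfusion API) that groups license setup in one place.
public static class LicenseSetup
{
    // Call once at application startup, before any Syncfusion component is used.
    public static void Register(string licenseKey)
    {
        SyncfusionLicenseProvider.RegisterLicense(licenseKey);
    }
}
```

One place to call it, assuming the Startup.cs layout used in this walkthrough, is the `Startup` constructor: `LicenseSetup.Register("YOUR LICENSE KEY");`.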

Step 4: Create a new class file named ExportService under the Data folder and include the following namespaces in the file.

  • C#
  • using Syncfusion.OCRProcessor;
    using Syncfusion.Pdf.Parsing;
    using System.IO;

Step 5: Use the following code sample to perform OCR on the entire PDF document using the PerformOCR method of the OCRProcessor class in the ExportService file.

  • C#
  • public MemoryStream CreatePdf()
    {
        //Initialize the OCR processor with the Tesseract binaries path.
        using (OCRProcessor processor = new OCRProcessor("Tesseractbinaries/Windows"))
        //Open the input PDF; the using statement disposes the file stream when done.
        using (FileStream fileStream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
        {
            //Load a PDF document.
            PdfLoadedDocument lDoc = new PdfLoadedDocument(fileStream);
            //Set OCR language to process.
            processor.Settings.Language = Languages.English;
            //Process OCR by providing the PDF document and the language data folder.
            processor.PerformOCR(lDoc, "tessdata/");
            //Create memory stream.
            MemoryStream stream = new MemoryStream();
            //Save the document to memory stream.
            lDoc.Save(stream);
            //Close the document and release its resources.
            lDoc.Close(true);
            //Rewind the stream so callers can read it from the beginning.
            stream.Position = 0;
            return stream;
        }
    }
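
The note under Step 3 says that from version 21.1.x the Tesseract binaries and language data paths are part of the default configuration. Under that assumption, the method could be sketched without explicit paths, as below. The class name `DefaultPathExportService` is hypothetical, and the parameterless `OCRProcessor` constructor and single-argument `PerformOCR` overload are assumed to be available in the installed package version; check this against the version you reference.

```csharp
using System.IO;
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Parsing;

// Hypothetical variant of ExportService that relies on the default Tesseract paths.
public class DefaultPathExportService
{
    public MemoryStream CreatePdf()
    {
        // Assumption: parameterless constructor, available in version 21.1.x and later.
        using (OCRProcessor processor = new OCRProcessor())
        using (FileStream fileStream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
        {
            PdfLoadedDocument lDoc = new PdfLoadedDocument(fileStream);
            processor.Settings.Language = Languages.English;
            // Assumption: overload that takes no language data path.
            processor.PerformOCR(lDoc);
            MemoryStream stream = new MemoryStream();
            lDoc.Save(stream);
            lDoc.Close(true);
            // Rewind so callers can read the stream from the start.
            stream.Position = 0;
            return stream;
        }
    }
}
```

If you are on an older package version, keep the explicit paths shown in Step 5.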

Step 6: Register your service in the ConfigureServices method available in the Startup.cs class as follows.

  • C#
  • public void ConfigureServices(IServiceCollection services)
    {
        services.AddRazorPages();
        services.AddServerSideBlazor();
        services.AddSingleton<WeatherForecastService>();
        services.AddSingleton<ExportService>();
    }

Step 7: Inject ExportService into FetchData.razor using the following code.

  • C#
  • @inject ExportService exportService
    @inject Microsoft.JSInterop.IJSRuntime JS
    @using  System.IO;

Step 8: Create a button in FetchData.razor using the following code.

  • C#
  • <button class="btn btn-primary" @onclick="@PerformOCR">Perform OCR</button>

Step 9: Add the PerformOCR method to the FetchData.razor page to call the export service.

  • C#
  • @functions
    {
       protected async Task PerformOCR()
       {
           //Use the injected exportService; creating a new instance here would bypass dependency injection.
           using (MemoryStream pdfStream = exportService.CreatePdf())
           {
               await JS.SaveAs("Output.pdf", pdfStream.ToArray());
           }
       }
    }

Step 10: Create a class file named FileUtil and add the following code to invoke the JavaScript function that downloads the file in the browser.

  • C#
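  • using System;
    using System.Threading.Tasks;
    using Microsoft.JSInterop;

    public static class FileUtil
    {
        //Extension method that calls the saveAsFile JavaScript function with the file encoded as Base64.
        public static ValueTask<object> SaveAs(this IJSRuntime js, string filename, byte[] data)
            => js.InvokeAsync<object>(
                "saveAsFile",
                filename,
                Convert.ToBase64String(data));
    }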
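
To make the hand-off to JavaScript concrete, the sketch below shows what the Base64 encoding step produces and how it combines with the `data:application/octet-stream;base64,` prefix that the `saveAsFile` function in Step 11 builds. The `Base64Demo` class and `ToDataUri` method are hypothetical and exist only for illustration; the walkthrough itself sends just the Base64 string and lets the script add the prefix.

```csharp
using System;
using System.Text;

// Hypothetical helper mirroring the encoding performed in FileUtil.SaveAs.
public static class Base64Demo
{
    // Encodes the bytes as Base64 and prepends the data URI prefix used by saveAsFile.
    public static string ToDataUri(byte[] data)
        => "data:application/octet-stream;base64," + Convert.ToBase64String(data);
}
```

For example, the four ASCII bytes of `%PDF` (the start of every PDF file) encode to `JVBERg==`, so `Base64Demo.ToDataUri(Encoding.ASCII.GetBytes("%PDF"))` yields `data:application/octet-stream;base64,JVBERg==`.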

Step 11: Add the following JavaScript function to the _Host.cshtml file available under the Pages folder.

  • HTML
  • <script type="text/javascript">
        function saveAsFile(filename, bytesBase64) {
            if (navigator.msSaveBlob) {
                //Download document in browsers that support msSaveBlob (Internet Explorer and legacy Edge)
                var data = window.atob(bytesBase64);
                var bytes = new Uint8Array(data.length);
                for (var i = 0; i < data.length; i++) {
                    bytes[i] = data.charCodeAt(i);
                }
                var blob = new Blob([bytes.buffer], { type: "application/octet-stream" });
                navigator.msSaveBlob(blob, filename);
            }
            else {
                var link = document.createElement('a');
                link.download = filename;
                link.href = "data:application/octet-stream;base64," + bytesBase64;
                document.body.appendChild(link); // Needed for Firefox
                link.click();
                document.body.removeChild(link);
            }
        }
    </script>

You will get the following output in the browser by executing the program.
Blazor browser window

Click the button to get a PDF document with the following output.
Blazor output PDF document

A complete working sample can be downloaded from GitHub.

Click here to explore the rich set of Syncfusion PDF library features.