OCR Processor Troubleshooting

22 Jan 2025, 10 minutes to read

Tesseract has not been initialized exception

Exception	Tesseract has not been initialized exception.
Reason	This exception may occur when the Tesseract binaries or the tessdata files are not available at the provided path.
Solution 1	Set the correct paths to the Tesseract binaries folder and to the tessdata folder, including all of its files and inner folders. The tessdata folder name is case-sensitive and should not be changed.

C# [Cross-platform]
//TesseractBinaries - path of the folder containing the Tesseract binaries.
OCRProcessor processor = new OCRProcessor(@"TesseractBinaries/");
//TessData - path of the folder containing the language pack.
processor.PerformOCR(lDoc, @"TessData/");

Solution 2	Ensure that your data file version is 3.02, since the OCR processor is built with Tesseract version 3.02.
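Before changing any code, it can help to confirm that the folder passed as the tessdata path really contains language data. The following is a minimal shell sketch, not part of the Syncfusion tooling: `check_tessdata` is a hypothetical helper name, and the only assumption is that Tesseract language files use the `.traineddata` extension.

```shell
# Hypothetical helper: check_tessdata DIR
# Succeeds when DIR exists and holds at least one *.traineddata file.
check_tessdata() {
    dir=${1%/}
    if [ ! -d "$dir" ]; then
        echo "missing folder: $dir"
        return 1
    fi
    for f in "$dir"/*.traineddata; do
        # When the glob matches nothing, $f is the literal pattern and -e fails.
        if [ -e "$f" ]; then
            echo "language data found in $dir"
            return 0
        fi
    done
    echo "no .traineddata files in $dir"
    return 1
}

# Example (the path is an assumption; use the one passed to PerformOCR):
#   check_tessdata "TessData/"
```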

Exception has been thrown by the target of an invocation

Exception	Exception has been thrown by the target of an invocation.
Reason	The Tesseract binaries are not in the required folder structure.
Solution	The tessdata and Tesseract binaries folders are added to the bin folder of the application automatically. Ensure that the assemblies are in the following structure:
1. bin\Debug\net7.0\runtimes\win-x64\native\leptonica-1.80.0.dll, libSyncfusionTesseract.dll
2. bin\Debug\net7.0\runtimes\win-x86\native\leptonica-1.80.0.dll, libSyncfusionTesseract.dll
Reason 1	The Tesseract binaries and tessdata files used by the OCR processor are missing or mismatched.
Reason 2	The VC++ 2015 redistributable files are missing on the machine where the OCR processing takes place.
Solution	Install the VC++ 2015 redistributable files on that machine. Select both files offered for download and install them. Download link: Visual C++ 2015 Redistributable file.

Can’t be opened because the developer’s identity cannot be confirmed

Exception	Can't be opened because the developer's identity cannot be confirmed.
Reason	This error may occur during the initial loading of the OCR processor in Mac environments.
Solution	To resolve this issue, refer to this link for more details.

The OCR processor doesn’t process languages other than English

Issue	The OCR processor doesn't process languages other than English.
Reason	This issue may occur when the input image contains text in other languages and the tessdata (language data) files for those languages are unavailable.
Solution	Essential® PDF supports every language that the Tesseract engine supports in the OCR processor. The dictionary packs for the languages can be downloaded from the following online location: https://code.google.com/p/tesseract-ocr/downloads/list It is also mandatory to set the corresponding language code in the OCRProcessor.Settings.Language property. For example, to perform optical character recognition in German, set the property as follows: processor.Settings.Language = "deu";
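A quick way to rule out a missing dictionary pack is to look for the matching language file before running OCR. This is a hedged sketch: `has_language` is a hypothetical helper, and it assumes the usual Tesseract naming of `<code>.traineddata` (for example `deu.traineddata` for German).

```shell
# Hypothetical helper: has_language TESSDATA_DIR LANG_CODE
# Succeeds when TESSDATA_DIR contains LANG_CODE.traineddata.
has_language() {
    [ -f "${1%/}/$2.traineddata" ]
}

# Example (the folder path is an assumption; adjust it to your project):
#   has_language "bin/Debug/net6.0/runtimes/tessdata" deu || echo "German data missing"
```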

Text is not recognized properly when performing OCR on a PDF document with low-quality images

Issue	Text is not recognized properly when performing OCR on a PDF document with low-quality images.
Reason	Low-quality images in the input PDF document may cause this issue.
Solution	Using the best tessdata can improve the OCR results. For more information, refer to the following link: https://github.com/tesseract-ocr/tessdata_best Note: For better performance (speed), use the fast tessdata from the following link: https://github.com/tesseract-ocr/tessdata_fast
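To fetch a single language file from either repository, a small helper can build the download URL. This is a sketch under stated assumptions: `tessdata_url` is a hypothetical name, and it assumes the files sit on the `main` branch of each repository as `<lang>.traineddata`; check the repository page if the download fails.

```shell
# Hypothetical helper: tessdata_url VARIANT LANG_CODE
# VARIANT is "best" or "fast"; LANG_CODE is e.g. "eng" or "deu".
# Assumption: files live on the "main" branch as <lang>.traineddata.
tessdata_url() {
    case "$1" in
        best|fast) ;;
        *) echo "variant must be 'best' or 'fast'" >&2; return 1 ;;
    esac
    printf 'https://github.com/tesseract-ocr/tessdata_%s/raw/main/%s.traineddata\n' "$1" "$2"
}

# Example (not run automatically; the target folder is an assumption):
#   curl -fL -o runtimes/tessdata/eng.traineddata "$(tessdata_url best eng)"
```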

OCR not working on Mac: Exception has been thrown by the target of an invocation

Issue	"Syncfusion.Pdf.PdfException: Exception has been thrown by the target of an invocation" on a Mac machine.

Reason	The problem occurs due to a mismatch in the dependency package versions on your Mac machine.

Solution

To resolve this problem, install Tesseract 5 on your Mac machine and use it with the OCR processor. The following steps install Tesseract 5 and integrate it into an OCR processing workflow.

1. Execute the following command to install Tesseract 5.

Shell
brew install tesseract

If "brew" is not installed on your machine, install it using the following command.

Shell
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

2. Once Tesseract 5 is installed, configure the path to the latest binaries by copying the location of the Tesseract folder and setting it as the Tesseract binaries path when creating the OCR processor. Refer to the example code below:

C#
//Initialize the OCR processor by providing the path of the Tesseract binaries.
using (OCRProcessor processor = new OCRProcessor("/opt/homebrew/Cellar/tesseract/5.3.2/lib"))
{
    //Perform OCR here.
}

3. Set the TessDataPath to the tessdata folder in the bin folder. Refer to the example code below:

C# [Cross-platform]
using System.IO;
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Parsing;

//Initialize the OCR processor by providing the path of the Tesseract binaries.
using (OCRProcessor processor = new OCRProcessor("/opt/homebrew/Cellar/tesseract/5.3.2/lib"))
using (FileStream fileStream = new FileStream("../../../Input.pdf", FileMode.Open, FileAccess.Read))
{
    //Load a PDF document.
    PdfLoadedDocument lDoc = new PdfLoadedDocument(fileStream);
    //Set OCR language to process.
    processor.Settings.Language = Languages.English;
    //Set the tessdata folder path.
    processor.TessDataPath = "runtimes/tessdata";
    //Process OCR by providing the PDF document.
    processor.PerformOCR(lDoc);
    //Create file stream.
    using (FileStream outputFileStream = new FileStream("Output.pdf", FileMode.Create, FileAccess.ReadWrite))
    {
        //Save the PDF document to file stream.
        lDoc.Save(outputFileStream);
    }
    //Close the document.
    lDoc.Close(true);
}
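The binaries path used in the steps above contains a specific Homebrew version number (5.3.2), which changes whenever Tesseract is upgraded. As a hedged alternative, the path can be derived from `brew --prefix tesseract`, which prints a version-independent location for the formula. `tesseract_lib_dir` is a hypothetical helper, and the sketch assumes the libraries are in the `lib` subfolder, as in the path shown above.

```shell
# Hypothetical helper: tesseract_lib_dir PREFIX
# Prints the lib folder below a Homebrew formula prefix.
tesseract_lib_dir() {
    printf '%s/lib\n' "${1%/}"
}

# Query Homebrew only when it is installed; otherwise print nothing.
if command -v brew >/dev/null 2>&1; then
    tesseract_lib_dir "$(brew --prefix tesseract)"
fi
```

The printed folder can then be passed to the OCRProcessor constructor in place of the hard-coded Cellar path.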

Method PerformOCR() causes problems and ignores the tesseract files under WSL.

Issue	The PerformOCR() method causes problems and ignores the Tesseract files under WSL.

Reason	The Tesseract binaries are missing in WSL.

Solution

To resolve this problem, install Leptonica and Tesseract on your machine and use them with the OCR processor. Follow these steps:

1. Install Leptonica.

Shell
sudo apt-get install libleptonica-dev

2. Install Tesseract.

Shell
sudo apt-get install tesseract-ocr-eng

3. Copy the binaries (liblept.so and libtesseract.so) to the folder in the project location that the missing-files exception reports, using the target file names shown below.

Shell
cp /usr/lib/x86_64-linux-gnu/liblept.so /home/syncfusion/linuxdockersample/linuxdockersample/bin/Debug/net7.0/liblept1753.so

Shell
cp /usr/lib/x86_64-linux-gnu/libtesseract.so.4 /home/syncfusion/linuxdockersample/linuxdockersample/bin/Debug/net7.0/libSyncfusionTesseract.so
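The copy in step 3 fails with a terse message if the source library is absent, for example when a package installs a different version suffix. The sketch below wraps `cp` with an explicit check; `copy_native_lib` is a hypothetical helper, and the paths in the usage comment are the ones from the step above.

```shell
# Hypothetical helper: copy_native_lib SOURCE DEST
# Copies SOURCE to DEST, reporting clearly when SOURCE does not exist.
copy_native_lib() {
    src=$1
    dest=$2
    if [ ! -e "$src" ]; then
        echo "source library not found: $src" >&2
        return 1
    fi
    cp "$src" "$dest"
}

# Example (not run automatically):
#   out=/home/syncfusion/linuxdockersample/linuxdockersample/bin/Debug/net7.0
#   copy_native_lib /usr/lib/x86_64-linux-gnu/liblept.so "$out/liblept1753.so"
#   copy_native_lib /usr/lib/x86_64-linux-gnu/libtesseract.so.4 "$out/libSyncfusionTesseract.so"
```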

OCR not working on Linux: Exception has been thrown by the target of an invocation

Issue	"Syncfusion.Pdf.PdfException: Exception has been thrown by the target of an invocation" on a Linux machine.

Reason	The problem occurs because prerequisite dependencies are missing on your Linux machine.

Solution

To resolve this problem, install all required dependencies on your Linux machine. Follow these steps to install the missing dependencies.

Step 1: Execute the following commands in a terminal window to check whether the dependencies are installed properly.

Shell
ldd liblept1753.so
ldd libSyncfusionTesseract.so

Step 2: Install the Leptonica and JPEG packages.

Shell
sudo apt-get install libleptonica-dev libjpeg62

Step 3: Create a libtiff.so.5 link that points to libtiff.so.6.

Shell
ln -s /usr/lib/x86_64-linux-gnu/libtiff.so.6 /usr/lib/x86_64-linux-gnu/libtiff.so.5

Step 4: Create a libdl.so link that points to libdl.so.2.

Shell
ln -s /lib/x86_64-linux-gnu/libdl.so.2 /usr/lib/x86_64-linux-gnu/libdl.so
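The `ldd` check prints one line per dependency and marks unresolved ones with `=> not found`. A minimal sketch for filtering that output follows; `missing_deps` is a hypothetical helper name, and it reads `ldd` output from standard input.

```shell
# Hypothetical helper: reads `ldd` output on stdin and prints the
# names of libraries that the loader could not resolve.
missing_deps() {
    awk '/=> not found/ { print $1 }'
}

# Example (not run automatically; run it from the folder with the binaries):
#   ldd libSyncfusionTesseract.so | missing_deps
```

An empty result means every dependency was resolved; any name printed points to a package or link that still has to be added.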

OCR not working on Docker net 8.0: Exception has been thrown by target of an invocation.

Exception	OCR not working on Docker with .NET 8.0: Exception has been thrown by the target of an invocation.

Reason	The issue occurs because prerequisite dependency packages are missing from the Docker container for .NET 8.0.

Solution

The issue can be resolved by installing the dependencies that Tesseract requires through the Dockerfile. Refer to the following commands.

Dockerfile
FROM mcr.microsoft.com/dotnet/aspnet:8.0 AS base
RUN apt-get update && \
    apt-get install -yq --no-install-recommends \
    libgdiplus libc6-dev libleptonica-dev libjpeg62
RUN ln -s /usr/lib/x86_64-linux-gnu/libtiff.so.6 /usr/lib/x86_64-linux-gnu/libtiff.so.5
RUN ln -s /lib/x86_64-linux-gnu/libdl.so.2 /usr/lib/x86_64-linux-gnu/libdl.so
USER app
WORKDIR /app
EXPOSE 8080
EXPOSE 8081
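A plain `ln -s` exits with an error when the link name already exists, and it silently creates a dangling link when the target is absent. If either case is possible in your image, a guarded variant may be safer. The following is only a sketch: `link_if_missing` is a hypothetical helper that would have to be placed in a script you copy into the image yourself.

```shell
# Hypothetical helper: link_if_missing TARGET LINK
# Creates LINK -> TARGET only when TARGET exists and LINK is not present yet.
link_if_missing() {
    target=$1
    link=$2
    if [ ! -e "$target" ]; then
        echo "link target not found: $target" >&2
        return 1
    fi
    # Leave an existing file or symlink untouched.
    if [ -e "$link" ] || [ -L "$link" ]; then
        return 0
    fi
    ln -s "$target" "$link"
}

# Example (not run automatically; paths are the ones used above):
#   link_if_missing /usr/lib/x86_64-linux-gnu/libtiff.so.6 /usr/lib/x86_64-linux-gnu/libtiff.so.5
#   link_if_missing /lib/x86_64-linux-gnu/libdl.so.2 /usr/lib/x86_64-linux-gnu/libdl.so
```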

Default path reference for Syncfusion® OCR packages

When the Syncfusion® OCR NuGet packages are installed, the tessdata and Tesseract binaries are copied into the runtimes folder. The default binaries path references are included in the package itself, so there is no need to set the paths manually.

If you face any issues with the default reference path in your project, set the Tesseract and tessdata paths manually using the TessDataPath and TesseractPath in the OCRProcessor class. You can find the binaries at the following paths in your project location.

Tessdata path	The tessdata default path reference is common to all platforms. You can find the tessdata folder at the following path in your project: bin\Debug\net6.0\runtimes\tessdata
Tesseract path	The Tesseract binaries differ based on the OS platform and bit version. You can find them at the following paths in your project. Windows: bin\Debug\net6.0\runtimes\win-x86\native (or) bin\Debug\net6.0\runtimes\win-x64\native Linux: bin\Debug\net6.0\runtimes\linux\native Mac: bin\Debug\net6.0\runtimes\osx\native
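On Linux and Mac, the folder to look in can be picked from the kernel name reported by `uname -s`. This is a small sketch of that mapping: `native_subdir` is a hypothetical helper, and the folder names are the ones listed in the table above (Windows is left out because it has separate x86 and x64 folders).

```shell
# Hypothetical helper: native_subdir KERNEL_NAME
# Maps a `uname -s` value to the runtimes subfolder listed above.
native_subdir() {
    case "$1" in
        Linux)  echo "runtimes/linux/native" ;;
        Darwin) echo "runtimes/osx/native" ;;
        *)      echo "no mapping for: $1" >&2; return 1 ;;
    esac
}

# Example (not run automatically; the bin path is an assumption):
#   ls "bin/Debug/net6.0/$(native_subdir "$(uname -s)")"
```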

System.NullReferenceException in Azure linux VM

Exception	System.NullReferenceException in Azure Linux VM
Reason	The problem occurs when images are extracted from a PDF in a Linux environment without the SkiaSharp dependency.
Solution	Install the following SkiaSharp NuGet package for the Linux environment to resolve the System.NullReferenceException raised while extracting images. NuGet: https://www.nuget.org/packages/SkiaSharp.NativeAssets.Linux.NoDependencies/2.88.6

OCR not working on Azure App Service Linux Docker Container: Exception has been thrown by the target of an invocation

Exception	"Syncfusion.Pdf.PdfException: Exception has been thrown by the target of an invocation" while deploying ASP.NET Core applications in an Azure App Service Linux Docker container.
Reason	When the ASP.NET Core application is published to the Azure App Service Linux Docker container, only the .so, .dylib, and .dll files are copied from the runtimes folder to the publish folder. Files in other formats are not copied to the publish folder.
Solution	To resolve this problem, the tessdata folder path must be explicitly set relative to the project directory under runtimes/tessdata. The publish folder can be located in your project directory at this path: obj\Docker\publish.
