Perform OCR in WPF (Windows Presentation Foundation)
11 Jan 2023, 2 minutes to read
The Syncfusion .NET OCR library is used to extract text from scanned PDFs and images in a WPF application with the help of Google’s Tesseract Optical Character Recognition engine.
Steps to perform OCR on an entire PDF document in WPF
Step 1: Create a new WPF application project.
In the project configuration window, name your project and select Create.
Step 2: Install the Syncfusion.Pdf.OCR.Wpf NuGet package as a reference to your WPF application from nuget.org.
Step 3: Tesseract assemblies are not added as a reference. They must be kept on the local machine, and the location of the assemblies is passed as a parameter to the OCR processor.
OCRProcessor processor = new OCRProcessor(@"TesseractBinaries/");
Step 4: Place the Tesseract language data (e.g., eng.traineddata) on the local system and provide its path to the OCR processor. OCR language data for other languages is available at the following link.
OCRProcessor processor = new OCRProcessor(@"TesseractBinaries/");
processor.PerformOCR(loadedDocument, @"Tessdata/");
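Because both folders are located by path at run time, a missing folder or language file may only show up when OCR is attempted. The following is a minimal sketch, not part of the Syncfusion API, of how the two paths could be checked up front. The class OcrPaths and its methods ResolveFolder and HasLanguageData are hypothetical names introduced for illustration, and the sketch assumes the folders are deployed relative to the application's base directory.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration only; not part of the Syncfusion library.
public static class OcrPaths
{
    // Resolves a folder against the application's base directory
    // (an absolute path is returned as-is) and fails early if it is missing.
    public static string ResolveFolder(string folder)
    {
        string fullPath = Path.GetFullPath(
            Path.Combine(AppDomain.CurrentDomain.BaseDirectory, folder));

        if (!Directory.Exists(fullPath))
        {
            throw new DirectoryNotFoundException($"Folder not found: {fullPath}");
        }

        return fullPath;
    }

    // Returns true if "<languageCode>.traineddata" exists in the given folder,
    // for example "eng.traineddata" for the language code "eng".
    public static bool HasLanguageData(string tessdataFolder, string languageCode)
    {
        return File.Exists(Path.Combine(tessdataFolder, languageCode + ".traineddata"));
    }
}
```

With such a helper, the resolved folders could be passed to the OCRProcessor constructor and to PerformOCR in place of the relative paths shown above. This assumes both accept absolute paths, which the text above does not state explicitly.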
Step 5: Add a new button in MainWindow.xaml to perform OCR as follows.
<Grid>
<Button Content="Perform OCR" HorizontalAlignment="Left" Margin="279,178,0,0" VerticalAlignment="Top" Height="68" Width="203" Click="Button_Click"/>
</Grid>
Step 6: Include the following namespaces in the MainWindow.xaml.cs file.
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Parsing;
Step 7: Add the following code to the Button_Click event handler to perform OCR on the entire PDF document using the PerformOCR method of the OCRProcessor class.
//Initialize the OCR processor by providing the path of the Tesseract binaries.
using (OCRProcessor processor = new OCRProcessor(@"TesseractBinaries/"))
{
    //Load an existing PDF document.
    PdfLoadedDocument loadedDocument = new PdfLoadedDocument("Input.pdf");
    //Set the Tesseract version.
    processor.Settings.TesseractVersion = TesseractVersion.Version4_0;
    //Set the OCR language to process.
    processor.Settings.Language = Languages.English;
    //Process OCR by providing the PDF document and the Tesseract language data path.
    processor.PerformOCR(loadedDocument, @"Tessdata/");
    //Save the OCR-processed PDF document to disk.
    loadedDocument.Save("OCR.pdf");
    //Close the document and release its resources.
    loadedDocument.Close(true);
}
By executing the program, you will get a PDF document as follows.
A complete working sample can be downloaded from GitHub.