Extract Text from PDF Files in Windows Forms PDF Viewer
19 Dec 2024 | 8 minutes to read
WinForms PDF Viewer allows you to extract the text from a particular page or from the entire PDF file using the ExtractText methods of PdfDocumentView.
NOTE
PDF Viewer uses PDFium as the default rendering engine to extract text from PDF files. For more details, refer to the documentation on PDF rendering engines.
Extract text from a particular page
You can extract the text from a page using the ExtractText method of the PdfDocumentView class. The method takes a zero-based page index, so the following code sample passes 0 to extract the text from the first page.
using Syncfusion.Pdf;
using Syncfusion.Windows.Forms.PdfViewer;
using System.Windows.Forms;

namespace TextExtractionDemo
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            //Initialize the `PdfDocumentView` control.
            PdfDocumentView pdfDocumentView = new PdfDocumentView();
            //Load the PDF file.
            pdfDocumentView.Load(@"Sample.pdf");
            //Extract text from the first page.
            TextLines textLines = new TextLines();
            string extractedText = pdfDocumentView.ExtractText(0, out textLines);
        }
    }
}
NOTE
This method extracts text in the order in which it is written in the document stream, which may differ from the order in which it is displayed in a PDF reader application.
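Because extraction follows the document stream, one way to approximate the visual reading order is to sort the extracted lines by their bounds. The following is a minimal sketch, not part of the PDF Viewer API: `ReadingOrder.Sort`, its `bandHeight` parameter, and the `(Text, Bounds)` tuple are hypothetical names introduced for illustration. It assumes that bounds use a top-left origin with Y increasing downward, that the page has a single column, and that C# 7.0 or later is available for tuple syntax.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Linq;

public static class ReadingOrder
{
    // Hypothetical helper (not a Syncfusion API): orders lines top-to-bottom,
    // then left-to-right. Lines whose top edges fall into the same horizontal
    // band of height `bandHeight` are treated as belonging to the same row.
    // Assumes a top-left origin (Y grows downward) and a single-column page.
    public static List<string> Sort(
        IEnumerable<(string Text, RectangleF Bounds)> lines,
        float bandHeight = 2f)
    {
        return lines
            .OrderBy(l => Math.Floor(l.Bounds.Top / bandHeight))
            .ThenBy(l => l.Bounds.Left)
            .Select(l => l.Text)
            .ToList();
    }
}
```

With the viewer, the input could be built from the `Text` and `Bounds` of each extracted `TextLine`, assuming the `TextLines` collection can be enumerated. Fixed bands are a simplification: two lines that sit just either side of a band boundary are sorted as separate rows, so multi-column or rotated layouts need a more careful grouping.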
Extract text from an entire file
To extract text from an entire file, call ExtractText for each page and concatenate the results, as shown in the following code sample.
using Syncfusion.Pdf;
using Syncfusion.Windows.Forms.PdfViewer;
using System.Windows.Forms;

namespace TextExtractionDemo
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            //Initialize the `PdfDocumentView` control.
            PdfDocumentView pdfDocumentView = new PdfDocumentView();
            //Load the PDF file.
            pdfDocumentView.Load(@"Sample.pdf");
            //Extract text from the file.
            TextLines textLines = new TextLines();
            string extractedText = string.Empty;
            for (int i = 0; i < pdfDocumentView.PageCount; i++)
            {
                extractedText += pdfDocumentView.ExtractText(i, out textLines);
            }
        }
    }
}
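For documents with many pages, repeated string concatenation in the loop above allocates a new string on every iteration. A possible alternative is to collect the page text in a `StringBuilder`. The sketch below is an illustration only: `DocumentText.ExtractAll` is a hypothetical helper, and the per-page extraction is passed in as a delegate so that the helper does not depend on the viewer. Unlike the sample above, it also appends a line break after each page.

```csharp
using System;
using System.Text;

public static class DocumentText
{
    // Hypothetical helper (not a Syncfusion API): calls `extractPage` for every
    // zero-based page index and joins the results, one page per line break.
    public static string ExtractAll(int pageCount, Func<int, string> extractPage)
    {
        var builder = new StringBuilder();
        for (int i = 0; i < pageCount; i++)
        {
            builder.AppendLine(extractPage(i));
        }
        return builder.ToString();
    }
}

// Possible usage with the viewer, relying only on the calls shown in the sample above:
//
//   string allText = DocumentText.ExtractAll(
//       pdfDocumentView.PageCount,
//       i => { TextLines lines; return pdfDocumentView.ExtractText(i, out lines); });
```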
Extract text with bounds
Extract lines
You can get the text line by line, along with the bounds of each line, through the TextLines out parameter of the ExtractText method. The following code sample reads the text and bounds of the first line on the first page.
using Syncfusion.Pdf;
using Syncfusion.Windows.Forms.PdfViewer;
using System.Drawing;
using System.Windows.Forms;

namespace TextExtractionDemo
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            //Initialize the `PdfDocumentView` control.
            PdfDocumentView pdfDocumentView = new PdfDocumentView();
            //Load the PDF file.
            pdfDocumentView.Load(@"Sample.pdf");
            //Initialize the `TextLines` collection.
            TextLines textLines = new TextLines();
            //Pass `TextLines` as an out parameter to the `ExtractText` method.
            pdfDocumentView.ExtractText(0, out textLines);
            //Get a specific line from the collection by its index.
            TextLine line = textLines[0];
            //Get the text in the line.
            string text = line.Text;
            //Get the bounds of the line.
            RectangleF lineBounds = line.Bounds;
        }
    }
}
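Line bounds make it possible to pick out only the text that falls inside a region of the page, for example a header area. The following is a rough sketch under stated assumptions: `RegionText.Extract` and the `(Text, Bounds)` tuple are hypothetical names, not part of the PDF Viewer API, and the region is assumed to be expressed in the same coordinate space as the line bounds.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Linq;

public static class RegionText
{
    // Hypothetical helper (not a Syncfusion API): returns the text of every line
    // whose bounds overlap `region`, one line of text per output line.
    public static string Extract(
        IEnumerable<(string Text, RectangleF Bounds)> lines,
        RectangleF region)
    {
        var hits = lines
            .Where(l => region.IntersectsWith(l.Bounds))
            .Select(l => l.Text);
        return string.Join(Environment.NewLine, hits);
    }
}
```

The input could be built from the `Text` and `Bounds` of each `TextLine`, assuming the `TextLines` collection can be enumerated. `IntersectsWith` keeps lines that only partly overlap the region; a stricter filter could use `region.Contains(l.Bounds)` instead.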
Extract words
You can get the words in a line, along with the bounds of each word, through the WordCollection property of a TextLine returned by the ExtractText method. The following code sample reads the text and bounds of the first word in the first line.
using Syncfusion.Pdf;
using Syncfusion.Windows.Forms.PdfViewer;
using System.Collections.Generic;
using System.Drawing;
using System.Windows.Forms;

namespace TextExtractionDemo
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            //Initialize the `PdfDocumentView` control.
            PdfDocumentView pdfDocumentView = new PdfDocumentView();
            //Load the PDF file.
            pdfDocumentView.Load(@"Sample.pdf");
            //Initialize the `TextLines` collection.
            TextLines textLines = new TextLines();
            //Pass `TextLines` as an out parameter to the `ExtractText` method.
            pdfDocumentView.ExtractText(0, out textLines);
            //Get a specific line from the collection by its index.
            TextLine line = textLines[0];
            //Get the word collection of the line.
            List<TextWord> wordCollection = line.WordCollection;
            //Get the first word.
            TextWord word = wordCollection[0];
            //Get the text of the word.
            string text = word.Text;
            //Get the bounds of the word.
            RectangleF bounds = word.Bounds;
        }
    }
}
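Word bounds are useful when you need to locate a term on the page, for example to draw a highlight over each occurrence. The sketch below shows one way to collect the bounds of every word that matches a search term. `WordSearch.FindBounds` and the `(Text, Bounds)` tuple are hypothetical names introduced for illustration and are not part of the PDF Viewer API; the comparison is case-insensitive and matches whole words only.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Linq;

public static class WordSearch
{
    // Hypothetical helper (not a Syncfusion API): returns the bounds of every
    // word whose text equals `term`, ignoring case.
    public static List<RectangleF> FindBounds(
        IEnumerable<(string Text, RectangleF Bounds)> words,
        string term)
    {
        return words
            .Where(w => string.Equals(w.Text, term, StringComparison.OrdinalIgnoreCase))
            .Select(w => w.Bounds)
            .ToList();
    }
}

// Possible usage with a line from the sample above (requires System.Linq):
//
//   var matches = WordSearch.FindBounds(
//       line.WordCollection.Select(w => (w.Text, w.Bounds)), "invoice");
```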