Working with Image Extraction

22 Jan 2024

Essential PDF supports extracting images from a particular page or from an entire PDF document. To extract the images from a page, use the ExtractImages method of the PdfPageBase class.

Refer to the following code snippet to extract the images from a PDF page.

using System.IO;
using Syncfusion.Pdf;
using Syncfusion.Pdf.Parsing;

//Load an existing PDF
FileStream docStream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read);
PdfLoadedDocument loadedDocument = new PdfLoadedDocument(docStream);
//Load the first page
PdfPageBase pageBase = loadedDocument.Pages[0];

//Extract images from first page
Stream[] extractedImages = pageBase.ExtractImages();
//Close the document
loadedDocument.Close(true);
//Release the input file stream
docStream.Dispose();

using System.Drawing;
using Syncfusion.Pdf;
using Syncfusion.Pdf.Parsing;

//Load an existing PDF
PdfLoadedDocument loadedDocument = new PdfLoadedDocument(fileName);
//Load the first page
PdfPageBase pageBase = loadedDocument.Pages[0];

//Extract images from first page
Image[] extractedImages = pageBase.ExtractImages();
//Close the document
loadedDocument.Close(true);

Imports System.Drawing
Imports Syncfusion.Pdf
Imports Syncfusion.Pdf.Parsing

'Load an existing PDF
Dim loadedDocument As New PdfLoadedDocument(fileName)
'Load the first page
Dim pageBase As PdfPageBase = loadedDocument.Pages(0)

'Extract images from first page
Dim extractedImages As Image() = pageBase.ExtractImages()
'Close the document
loadedDocument.Close(True)

You can download a complete working sample from GitHub.

NOTE

To extract images from a PDF page in .NET Core, include the Syncfusion.Pdf.Imaging.Portable assembly reference in your .NET Core project.
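In the cross-platform version, ExtractImages returns one Stream per image. The following is a minimal sketch of writing those streams to disk. The ImageFileWriter class and its SaveAll method are hypothetical helpers, not part of Essential PDF, and the ".png" extension in the usage comment is only an assumption about the image format; the helper itself does not inspect the encoded bytes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of Essential PDF: writes each extracted image stream to its own file.
// Assumed usage with the cross-platform snippet above, before the document is closed:
//   List<string> files = ImageFileWriter.SaveAll(pageBase.ExtractImages(), "Output", "Page1_Image", ".png");
public static class ImageFileWriter
{
    // Saves images[i] as "<prefix><i><extension>" in the given directory and returns the file paths.
    // The caller picks the extension because this helper does not detect the encoded image format.
    public static List<string> SaveAll(Stream[] images, string directory, string prefix, string extension)
    {
        if (images == null) throw new ArgumentNullException(nameof(images));
        Directory.CreateDirectory(directory);

        var paths = new List<string>();
        for (int i = 0; i < images.Length; i++)
        {
            string path = Path.Combine(directory, prefix + i + extension);
            // Rewind seekable streams so the whole image is copied, not just the unread remainder.
            if (images[i].CanSeek)
                images[i].Position = 0;
            using (FileStream output = File.Create(path))
            {
                images[i].CopyTo(output);
            }
            paths.Add(path);
        }
        return paths;
    }
}
```

Numbering files by array position keeps the output order aligned with the order in which ExtractImages returned the images.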

Image information

To get image properties such as the bounds and image index of each image on a page, use the ImagesInfo property of the PdfPageBase class.

Refer to the following code snippet to get the image information from a PDF page.

using System.IO;
using Syncfusion.Pdf;
using Syncfusion.Pdf.Exporting;
using Syncfusion.Pdf.Parsing;

//Load an existing PDF
FileStream docStream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read);
PdfLoadedDocument loadedDocument = new PdfLoadedDocument(docStream);
//Load the first page
PdfPageBase pageBase = loadedDocument.Pages[0];

//Get the information of all images on the first page
PdfImageInfo[] imagesInfo = pageBase.ImagesInfo;
//Close the document
loadedDocument.Close(true);
//Release the input file stream
docStream.Dispose();

using Syncfusion.Pdf;
using Syncfusion.Pdf.Exporting;
using Syncfusion.Pdf.Parsing;

//Load an existing PDF
PdfLoadedDocument loadedDocument = new PdfLoadedDocument(fileName);
//Load the first page
PdfPageBase pageBase = loadedDocument.Pages[0];

//Get the information of all images on the first page
PdfImageInfo[] imagesInfo = pageBase.ImagesInfo;
//Close the document
loadedDocument.Close(true);

Imports Syncfusion.Pdf
Imports Syncfusion.Pdf.Exporting
Imports Syncfusion.Pdf.Parsing

'Load an existing PDF
Dim loadedDocument As New PdfLoadedDocument(fileName)
'Load the first page
Dim pageBase As PdfPageBase = loadedDocument.Pages(0)

'Get the information of all images on the first page
Dim imagesInfo As PdfImageInfo() = pageBase.ImagesInfo
'Close the document
loadedDocument.Close(True)

You can download a complete working sample from GitHub.
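Because ImagesInfo reports where each image sits on the page, it can be used to skip small decorative images before further processing. The sketch below assumes PdfImageInfo exposes an int Index and a RectangleF Bounds property, matching the "image index" and "bounds" described above; the ImageInfoFilter class and the area threshold are hypothetical. To stay independent of the library, the helper works on plain (index, bounds) pairs, measured in whatever units the bounds use (assumed to be PDF points).

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;

// Hypothetical helper, not part of Essential PDF: keeps images whose on-page area meets a minimum.
// Assumed usage (the Index and Bounds property names are assumptions):
//   var pairs = new List<(int, RectangleF)>();
//   foreach (PdfImageInfo info in pageBase.ImagesInfo) pairs.Add((info.Index, info.Bounds));
//   List<int> large = ImageInfoFilter.LargeImageIndices(pairs, 10000f);
public static class ImageInfoFilter
{
    // Returns the indices of images whose bounds cover at least minArea, largest area first.
    public static List<int> LargeImageIndices(IEnumerable<(int Index, RectangleF Bounds)> images, float minArea)
    {
        if (images == null) throw new ArgumentNullException(nameof(images));

        var selected = new List<(int Index, float Area)>();
        foreach (var (index, bounds) in images)
        {
            float area = bounds.Width * bounds.Height;
            if (area >= minArea)
                selected.Add((index, area));
        }

        // Sort by descending area so the most prominent images come first.
        selected.Sort((a, b) => b.Area.CompareTo(a.Area));

        var result = new List<int>();
        foreach (var item in selected)
            result.Add(item.Index);
        return result;
    }
}
```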

Extract images from PDF documents with better memory consumption and performance

The following code example shows how to extract images from an entire PDF document, or from a range of pages, using the PdfDocumentExtractor class.

//Get stream from an existing PDF document.
FileStream inputStream = new FileStream(@"Input.pdf", FileMode.Open, FileAccess.Read);
//Initialize the PDF document extractor.
PdfDocumentExtractor documentExtractor = new PdfDocumentExtractor();
//Load the PDF document.
documentExtractor.Load(inputStream);
//Get the page count.
int pageCount = documentExtractor.PageCount;
// Extract images from the PDF document.
Stream[] images = documentExtractor.ExtractImages();
// Extract images by page range.
Stream[] streams = documentExtractor.ExtractImages(2, 6);
// Release all resources used by the PDF image extractor.
documentExtractor.Dispose();
// Release the input file stream.
inputStream.Dispose();
'Get stream from an existing PDF document.
Dim inputStream As FileStream = New FileStream("Input.pdf", FileMode.Open, FileAccess.Read)
'Initialize the PDF document extractor.
Dim documentExtractor As PdfDocumentExtractor = New PdfDocumentExtractor()
'Load the PDF document.
documentExtractor.Load(inputStream)
'Get the page count.
Dim pageCount As Integer = documentExtractor.PageCount
'Extract images from the PDF document.
Dim images As Stream() = documentExtractor.ExtractImages()
'Extract images by page range.
Dim streams As Stream() = documentExtractor.ExtractImages(2, 6)
'Release all resources used by the PDF image extractor.
documentExtractor.Dispose()
'Release the input file stream.
inputStream.Dispose()

You can download a complete working sample from GitHub.

NOTE

To extract images from a PDF document in .NET Core, include the Syncfusion.Pdf.Imaging.Portable assembly reference in your .NET Core project.
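For large documents, calling ExtractImages with a page range in a loop keeps only one batch of image streams in memory at a time. The sketch below computes the ranges; PageBatcher is a hypothetical helper, and it assumes that ExtractImages(start, end) takes zero-based, inclusive page indices, which should be confirmed against the PdfDocumentExtractor API reference before use.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Essential PDF: splits a page count into consecutive page ranges.
// Assumption: PdfDocumentExtractor.ExtractImages(start, end) uses zero-based, inclusive page indices.
// Assumed usage with the extractor from the example above:
//   foreach (var (start, end) in PageBatcher.Ranges(documentExtractor.PageCount, 10))
//   {
//       Stream[] batch = documentExtractor.ExtractImages(start, end);
//       // Save or process the batch, then release it before requesting the next one.
//       foreach (Stream image in batch) image.Dispose();
//   }
public static class PageBatcher
{
    // Returns (Start, End) pairs covering pages 0..pageCount-1, each spanning at most batchSize pages.
    public static IEnumerable<(int Start, int End)> Ranges(int pageCount, int batchSize)
    {
        // Validate eagerly so bad arguments fail at the call site, not on first enumeration.
        if (pageCount < 0) throw new ArgumentOutOfRangeException(nameof(pageCount));
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        return RangesIterator(pageCount, batchSize);
    }

    private static IEnumerable<(int Start, int End)> RangesIterator(int pageCount, int batchSize)
    {
        for (int start = 0; start < pageCount; start += batchSize)
        {
            // The last range is shortened so it never runs past the final page.
            yield return (start, Math.Min(start + batchSize, pageCount) - 1);
        }
    }
}
```

A smaller batch size lowers peak memory at the cost of more ExtractImages calls; a larger one does the opposite.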