Working with Image Extraction

Essential® PDF can extract images from a particular page or from an entire PDF document. To extract the images from a page, use the ExtractImages method of the PdfPageBase class.

Refer to the following code snippet to extract the images from a PDF page.

C# [Cross-platform]
//Load an existing PDF
FileStream docStream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read);
PdfLoadedDocument loadedDocument = new PdfLoadedDocument(docStream);
//Load the first page
PdfPageBase pageBase = loadedDocument.Pages[0];

//Extract images from first page
Stream[] extractedImages = pageBase.ExtractImages();
//Close the document
loadedDocument.Close(true);

C# [Windows-specific]
//Load an existing PDF
PdfLoadedDocument loadedDocument = new PdfLoadedDocument(fileName);
//Load the first page
PdfPageBase pageBase = loadedDocument.Pages[0];

//Extract images from first page
Image[] extractedImages = pageBase.ExtractImages();
//Close the document
loadedDocument.Close(true);

VB.NET [Windows-specific]
'Load an existing PDF
Dim loadedDocument As New PdfLoadedDocument(fileName)
'Load the first page
Dim pageBase As PdfPageBase = loadedDocument.Pages(0)

'Extract images from first page
Dim extractedImages As Image() = pageBase.ExtractImages()
'Close the document
loadedDocument.Close(True)

You can download a complete working sample from GitHub.

NOTE

To extract images from a PDF page in a .NET Core application, add the Syncfusion.Pdf.Imaging.Net.Core package to your project.
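
Once the image streams have been extracted, a common next step is to write them to disk. The following is a minimal sketch, assuming each stream returned by ExtractImages holds encoded image bytes such as PNG or JPEG. The ImageStreamSaver class and its method names are hypothetical helpers, not part of the library. The file extension is guessed from the leading signature bytes, with .bin as a fallback.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration; not part of Essential PDF.
public static class ImageStreamSaver
{
    // Guess a file extension from the leading signature bytes.
    public static string GuessExtension(byte[] header)
    {
        // PNG files start with 89 50 4E 47.
        if (header.Length >= 4 && header[0] == 0x89 && header[1] == 0x50 &&
            header[2] == 0x4E && header[3] == 0x47)
            return ".png";
        // JPEG files start with FF D8 FF.
        if (header.Length >= 3 && header[0] == 0xFF && header[1] == 0xD8 &&
            header[2] == 0xFF)
            return ".jpg";
        // Unknown format: keep the raw bytes under a neutral extension.
        return ".bin";
    }

    // Write every stream to outputDirectory and return the created file paths.
    public static string[] SaveAll(Stream[] images, string outputDirectory)
    {
        Directory.CreateDirectory(outputDirectory);
        var paths = new string[images.Length];
        for (int i = 0; i < images.Length; i++)
        {
            using (var buffer = new MemoryStream())
            {
                // Rewind seekable streams before copying.
                if (images[i].CanSeek) images[i].Position = 0;
                images[i].CopyTo(buffer);
                byte[] bytes = buffer.ToArray();
                paths[i] = Path.Combine(outputDirectory, "image_" + i + GuessExtension(bytes));
                File.WriteAllBytes(paths[i], bytes);
            }
        }
        return paths;
    }
}
```

A call such as ImageStreamSaver.SaveAll(extractedImages, "Output") would fit just before the document is closed. Whether the streams remain readable after Close is not stated here, so that ordering is the cautious choice.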

Image information

To extract the image properties such as bounds, image index, and more from a page, you can use the ImagesInfo property in the PdfPageBase class.

Refer to the following code snippet to extract the image info from a PDF page.

C# [Cross-platform]
//Load an existing PDF
FileStream docStream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read);
PdfLoadedDocument loadedDocument = new PdfLoadedDocument(docStream);
//Load the first page
PdfPageBase pageBase = loadedDocument.Pages[0];

//Get the information for all images on the first page
PdfImageInfo[] imagesInfo = pageBase.ImagesInfo;
//Close the document
loadedDocument.Close(true);

C# [Windows-specific]
//Load an existing PDF
PdfLoadedDocument loadedDocument = new PdfLoadedDocument(fileName);
//Load the first page
PdfPageBase pageBase = loadedDocument.Pages[0];

//Get the information for all images on the first page
PdfImageInfo[] imagesInfo = pageBase.ImagesInfo;
//Close the document
loadedDocument.Close(true);

VB.NET [Windows-specific]
'Load an existing PDF
Dim loadedDocument As New PdfLoadedDocument(fileName)
'Load the first page
Dim pageBase As PdfPageBase = loadedDocument.Pages(0)

'Get the information for all images on the first page
Dim imagesInfo As PdfImageInfo() = pageBase.ImagesInfo
'Close the document
loadedDocument.Close(True)

You can download a complete working sample from GitHub.
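
The reported bounds can be used to filter the results, for example to skip small decorative images. The following is a minimal sketch that works on plain (index, width, height) values so that it stays independent of the library. The ImageSizeFilter class is hypothetical, and the projection from PdfImageInfo shown in the trailing comment assumes the properties are named Index and Bounds, which should be checked against the API reference.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of Essential PDF.
public static class ImageSizeFilter
{
    // Return the indexes of images that are at least minWidth by minHeight.
    public static List<int> IndexesAtLeast(
        IEnumerable<(int Index, float Width, float Height)> images,
        float minWidth,
        float minHeight)
    {
        return images
            .Where(i => i.Width >= minWidth && i.Height >= minHeight)
            .Select(i => i.Index)
            .ToList();
    }
}

// Assumed projection from the imagesInfo array (property names are an assumption):
// var sizes = imagesInfo.Select(info => (info.Index, info.Bounds.Width, info.Bounds.Height));
// List<int> large = ImageSizeFilter.IndexesAtLeast(sizes, 100f, 100f);
```

Passing only the width and height keeps the helper usable regardless of which rectangle type the bounds are reported in.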

Extract images from PDF documents with better memory consumption and performance

The following code example illustrates how to extract images from an entire PDF document using the PdfDocumentExtractor class.

C#
//Get stream from an existing PDF document.
FileStream inputStream = new FileStream(@"Input.pdf", FileMode.Open, FileAccess.Read);
//Initialize the PDF document extractor.
PdfDocumentExtractor documentExtractor = new PdfDocumentExtractor();
//Load the PDF document.
documentExtractor.Load(inputStream);
//Get the page count.
int pageCount = documentExtractor.PageCount;
// Extract images from the PDF document.
Stream[] images = documentExtractor.ExtractImages();
// Extract images by page range.
Stream[] streams = documentExtractor.ExtractImages(2, 6);
// Release all resources used by the PDF image extractor.
documentExtractor.Dispose();

VB.NET
'Get stream from an existing PDF document.
Dim inputStream As FileStream = New FileStream("Input.pdf", FileMode.Open, FileAccess.Read)
'Initialize the PDF document extractor.
Dim documentExtractor As PdfDocumentExtractor = New PdfDocumentExtractor()
'Load the PDF document.
documentExtractor.Load(inputStream)
'Get the page count.
Dim pageCount As Integer = documentExtractor.PageCount
'Extract images from the PDF document.
Dim images As Stream() = documentExtractor.ExtractImages()
'Extract images by page range.
Dim streams As Stream() = documentExtractor.ExtractImages(2, 6)
'Release all resources used by the PDF image extractor.
documentExtractor.Dispose()

You can download a complete working sample from GitHub.

NOTE

To extract images from a PDF document in a .NET Core application, add the Syncfusion.Pdf.Imaging.Net.Core package to your project.
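
For large documents, extracting a few pages at a time keeps the number of image streams held in memory bounded. The following is a minimal sketch of splitting a page count into ranges. PageBatcher is a hypothetical helper, and it assumes that the two arguments of ExtractImages(start, end) are zero-based, inclusive page indexes; that assumption should be confirmed against the API reference before use.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of Essential PDF.
public static class PageBatcher
{
    // Yield inclusive, zero-based (Start, End) ranges that cover pageCount pages.
    public static IEnumerable<(int Start, int End)> Ranges(int pageCount, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        for (int start = 0; start < pageCount; start += batchSize)
        {
            // The last range may be shorter than batchSize.
            int end = Math.Min(start + batchSize, pageCount) - 1;
            yield return (start, end);
        }
    }
}

// Assumed usage with the extractor from the sample above:
// foreach (var (start, end) in PageBatcher.Ranges(documentExtractor.PageCount, 5))
// {
//     Stream[] batch = documentExtractor.ExtractImages(start, end);
//     // Save the streams, then dispose them before the next batch.
// }
```

If the page indexes turn out to be one-based, adding 1 to both values at the call site is enough; the batching logic itself does not change.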

Troubleshooting and FAQs

Missing SkiaSharp Native Assets on Ubuntu ARM64

Issue: Image extraction fails on Ubuntu 22.04.5 LTS servers running on ARM64 architecture due to missing SkiaSharp native dependencies.

Reason: SkiaSharp requires platform-specific native binaries for graphics operations:
1. The default SkiaSharp package doesn't include ARM64 Linux binaries.
2. Ubuntu ARM64 environments lack these native assets by default.
3. SkiaSharp fails to initialize without these dependencies.

Solution: Add the appropriate native assets package based on your environment:
1. For standard Linux environments (Ubuntu, Alpine, CentOS, Debian, Fedora, RHEL, Azure App Service, Google App Engine):

dotnet add package SkiaSharp.NativeAssets.Linux --version 3.116.1

2. For cloud-native deployments (AWS Lambda, AWS Elastic Beanstalk):

dotnet add package SkiaSharp.NativeAssets.Linux.NoDependencies --version 3.116.1

For Ubuntu 22.04.5 LTS on ARM64, use `SkiaSharp.NativeAssets.Linux`.
Check your `.csproj` file for the following entry: ``
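
When diagnosing this issue, it can help to log the operating system and architecture that the process actually runs on. The following is a minimal sketch using the .NET RuntimeInformation class; the NativeAssetCheck class and its messages are hypothetical and only restate the condition described above.

```csharp
using System.Runtime.InteropServices;

// Hypothetical diagnostic helper for illustration.
public static class NativeAssetCheck
{
    // Pure decision helper so the rule is easy to test.
    public static bool IsLinuxArm64(bool isLinux, Architecture architecture)
    {
        return isLinux && architecture == Architecture.Arm64;
    }

    // Build a one-line description of the current process for a startup log.
    public static string Describe()
    {
        bool isLinux = RuntimeInformation.IsOSPlatform(OSPlatform.Linux);
        Architecture arch = RuntimeInformation.ProcessArchitecture;
        string note = IsLinuxArm64(isLinux, arch)
            ? "Linux ARM64 detected: check that a SkiaSharp Linux native assets package is referenced."
            : "Not Linux ARM64.";
        return RuntimeInformation.OSDescription + " (" + arch + "): " + note;
    }
}
```

Writing the result of NativeAssetCheck.Describe() to the application log at startup shows whether a deployment falls under the case described in this entry.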