Image Extraction in TypeScript PDF library
18 Dec 20254 minutes to read
The PDF provides support to extract images from PDF documents and retrieve their properties such as bounds, index and raw byte data.
NOTE
For redaction features, you need to install the
@syncfusion/ej2-pdf-data-extractpackage as an add-on. Please verify the platform’s actual root directory where theopenjpegfile is extracted. Depending on the platform, the root path may vary. Check which root folder is being used by reviewing the path referenced in the Getting Started page.
Extract images from a PDF
This code demonstrates how to extract embedded images and their metadata from a PDF using PdfDataExtractor. It loads a PDF, retrieves all PdfEmbeddedImage entries across pages, accesses the first image’s raw byte data, and reads its properties format, page index, occurrence index, bounds, interpolation status, and masking flags before saving/disposing resources by destroying the document.
import { PdfDocument, Size } from '@syncfusion/ej2-pdf';
import { PdfDataExtractor, PdfEmbeddedImage} from '@syncfusion/ej2-pdf-data-extract';
// Load an existing PDF document
let document: PdfDocument = new PdfDocument(data);
// Initialize a new instance of the `PdfDataExtractor` class
let extractor: PdfDataExtractor = new PdfDataExtractor(document);
// Extract collection of `PdfEmbeddedImage` from the PDF document
let imageInfoCollection: PdfEmbeddedImage[] = extractor.extractImages({
startPageIndex: 0,
endPageIndex: document.pageCount - 1
});
// Access the first extracted image and its raw data
let imageInfo: PdfEmbeddedImage = imageInfoCollection[0];
// Get the raw byte data of the extracted image
let imageData: Uint8Array = imageInfo.data;
// Gets the image format.
let type = imageInfo.type;
// Gets the page index of the image
let pageIndex = imageInfo.pageIndex;
// Gets the index of the image occurrence within the PDF page
let index = imageInfo.index;
// Gets the bounds of the image
let bounds = imageInfo.bounds;
// Gets the XObject resource name for this image
let name: string = imageInfo.resourceName;
// Gets the boolean flag indicating whether the image is interpolated or not
let isImageInterpolated = imageInfo.isImageInterpolated;
// Gets the boolean flag indicating whether the image is masked or not
let isImageMasked = imageInfo.isImageMasked;
// Gets the boolean flag indicating whether the image is soft masked or not
let IsSoftMasked: boolean = imageInfo.isSoftMasked;
// Gets the image physical dimension
let physicalDimension: Size = imageInfo.physicalDimension;
// Destroy the PDF document
document.destroy();// Load an existing PDF document
var document = new ej.pdf.PdfDocument(data);
// Initialize a new instance of the `PdfDataExtractor` class
var extractor= new ej.pdfdataextract.PdfDataExtractor(document);
// Extract collection of `PdfEmbeddedImage` from the PDF document
var imageInfoCollection = extractor.extractImages({
startPageIndex: 0,
endPageIndex: document.pageCount - 1
});
// Access the first extracted image and its raw data
var imageInfo = imageInfoCollection[0];
// Get the raw byte data of the extracted image
var imageData = imageInfo.data;
// Gets the image format.
var type = imageInfo.type;
// Gets the page index of the image.
var pageIndex = imageInfo.pageIndex;
// Gets the index of the image occurrence within the PDF page
var index = imageInfo.index;
// Gets the bounds of the image.
var bounds = imageInfo.bounds;
// Gets the XObject resource name for this image
var name = imageInfo.resourceName;
// Gets the boolean flag indicating whether the image is interpolated or not
var isImageInterpolated = imageInfo.isImageInterpolated;
//// Gets the boolean flag indicating whether the image is masked or not
var isImageMasked = imageInfo.isImageMasked;
// Gets the boolean flag indicating whether the image is soft masked or not
var IsSoftMasked = imageInfo.isSoftMasked;
// Gets the image physical dimension
var physicalDimension = imageInfo.physicalDimension;
// Destroy the PDF document
document.destroy();