Extract text method in the PDF Viewer
18 Mar 2026 | 4 minutes to read
Overview
The extractText method retrieves text content and, optionally, positional data for elements on one or more pages. It returns a Promise that resolves to an object containing extracted textData (detailed items with bounds) and pageText (concatenated plain text).
Parameters
- startIndex: Starting page index (0-based).
- endIndex or options: Either the ending page index for a range extraction, or an options object specifying extraction criteria for a single page.
- options (optional): Extraction options such as TextOnly or TextAndBounds to control whether bounds are included.
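The two call shapes described above differ in the type of the second argument. The sketch below illustrates that distinction only; it is not the viewer's internal logic. `resolveExtractArgs`, `ExtractRequest`, and the `Option` string union are hypothetical names introduced here (the real API uses the `ExtractTextOption` enum), and treating `endIndex` as inclusive is an assumption based on the expected result listed at the end of this page.

```typescript
// Hypothetical stand-in for the viewer's ExtractTextOption enum.
type Option = "TextOnly" | "TextAndBounds";

// Hypothetical normalized form of an extractText call.
interface ExtractRequest {
  startIndex: number;          // first page index (0-based)
  endIndex: number;            // last page index (0-based, assumed inclusive)
  option: Option | undefined;  // undefined when the caller passed no option
}

// Hypothetical helper: maps both call shapes onto one request object.
function resolveExtractArgs(
  startIndex: number,
  endIndexOrOption?: number | Option,
  option?: Option
): ExtractRequest {
  if (typeof endIndexOrOption === "number") {
    // Range form: (startIndex, endIndex, option?)
    return { startIndex, endIndex: endIndexOrOption, option };
  }
  // Single-page form: (startIndex, option?)
  return { startIndex, endIndex: startIndex, option: endIndexOrOption };
}

console.log(resolveExtractArgs(1, "TextOnly"));    // single page at index 1
console.log(resolveExtractArgs(0, 2, "TextOnly")); // page indices 0 through 2
```

Normalizing the arguments this way makes it easy to see that a single-page call is just a range whose start and end are the same index.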
Returned object
- textData: Array of objects describing extracted text items, including bounds and page-level text.
- pageText: Concatenated plain text for the specified page(s).
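As a rough illustration of consuming the returned object, the sketch below summarizes a result that has `textData` and `pageText`. Only those two top-level names come from this page. The `ExtractResult`, `TextItem`, and `Bounds` interfaces, the field names inside them, and the `summarize` helper are assumptions made for the example, and the sample data is a mock rather than real viewer output.

```typescript
// Hypothetical shapes: the field names inside TextItem and Bounds are assumptions.
interface Bounds { x: number; y: number; width: number; height: number; }
interface TextItem { text: string; bounds?: Bounds; }
interface ExtractResult { textData: TextItem[]; pageText: string; }

// Counts whitespace-separated words in pageText and the items that carry bounds.
function summarize(result: ExtractResult): { words: number; itemsWithBounds: number } {
  const trimmed = result.pageText.trim();
  const words = trimmed === "" ? 0 : trimmed.split(/\s+/).length;
  const itemsWithBounds = result.textData.filter(
    (item) => item.bounds !== undefined
  ).length;
  return { words, itemsWithBounds };
}

// Mock data standing in for the value a resolved extractText Promise might carry.
const sample: ExtractResult = {
  pageText: "Hello PDF world",
  textData: [
    { text: "Hello", bounds: { x: 10, y: 20, width: 30, height: 12 } },
    { text: "PDF world" },
  ],
};

console.log(summarize(sample)); // { words: 3, itemsWithBounds: 1 }
```

With a TextOnly extraction one would expect few or no items to carry bounds, so a check like `itemsWithBounds` can serve as a quick sanity test of which option was used.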
Complete example
The following example calls the extractText method from two button click handlers, one for a single page and one for a page range:
import {
  PdfViewerComponent, Toolbar, Magnification, Navigation, LinkAnnotation, BookmarkView,
  ThumbnailView, Print, TextSelection, Annotation, TextSearch, FormFields, FormDesigner,
  PageOrganizer, Inject, ExtractTextOption
} from '@syncfusion/ej2-react-pdfviewer';
import { useRef, RefObject } from 'react';

export default function App() {
  // Reference to the viewer instance, used to call extractText.
  const viewerRef: RefObject<PdfViewerComponent | null> = useRef<PdfViewerComponent>(null);

  // Single page: extract plain text from page index 1.
  const extractText = async () => {
    console.log(await viewerRef.current?.extractText(1, ExtractTextOption.TextOnly));
  }

  // Page range: extract plain text from page indices 0 through 2.
  const extractsText = async () => {
    console.log(await viewerRef.current?.extractText(0, 2, ExtractTextOption.TextOnly));
  }

  return (
    <div style={{ height: '100vh' }}>
      <button onClick={extractText}>Single page</button>
      <button onClick={extractsText}>Multiple pages</button>
      <PdfViewerComponent
        id="PdfViewer"
        ref={viewerRef}
        documentPath="https://cdn.syncfusion.com/content/pdf/pdf-succinctly.pdf"
        resourceUrl="https://cdn.syncfusion.com/ej2/32.2.3/dist/ej2-pdfviewer-lib"
        style={{ height: '100%' }}
      >
        <Inject
          services={[
            Toolbar, Magnification, Navigation, Annotation, LinkAnnotation, BookmarkView,
            ThumbnailView, Print, TextSelection, TextSearch, FormFields, FormDesigner, PageOrganizer
          ]}
        />
      </PdfViewerComponent>
    </div>
  );
}

Expected result:
- Clicking "Single page" extracts text from page index 1 (startIndex = 1, which is the second page because indices are 0-based) using TextOnly.
- Clicking "Multiple pages" extracts text from page indices 0–2 (startIndex = 0, endIndex = 2) using TextOnly.