Extract Text in Vue PDF Viewer component

Extract text method in the PDF Viewer

The extractText method retrieves text content and, optionally, positional data for elements on one or more pages. It returns a Promise that resolves to an object containing extracted textData (detailed items with bounds) and pageText (concatenated plain text).

Parameters overview:

startIndex — Starting page index (0-based).
endIndex or options — Either the ending page index for a range extraction, or the extraction option (for example 'TextOnly') when extracting a single page.
options (optional) — Extraction options such as TextOnly or TextAndBounds to control whether bounds are included.
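
The two call shapes described above can be told apart by the type of the second argument. The following is a minimal sketch of that idea only. `normalizeExtractArgs`, `ExtractRequest`, and `ExtractOption` are hypothetical names introduced for illustration and are not part of the PDF Viewer API, and the string form of 'TextAndBounds' is assumed by analogy with the 'TextOnly' string used in the sample.

```typescript
// Hypothetical helper, not part of the Syncfusion API: it only illustrates
// how the single-page and range call shapes can be distinguished.
type ExtractOption = 'TextOnly' | 'TextAndBounds';

interface ExtractRequest {
  startIndex: number;
  endIndex: number;
  option: ExtractOption | undefined;
}

function normalizeExtractArgs(
  startIndex: number,
  endIndexOrOption?: number | ExtractOption,
  option?: ExtractOption
): ExtractRequest {
  if (typeof endIndexOrOption === 'number') {
    // Range form: (startIndex, endIndex, option?)
    return { startIndex, endIndex: endIndexOrOption, option };
  }
  // Single-page form: (startIndex, option?), so the range is one page long.
  return { startIndex, endIndex: startIndex, option: endIndexOrOption };
}

// Usage: both calls below describe valid requests.
console.log(normalizeExtractArgs(1, 'TextOnly'));
console.log(normalizeExtractArgs(0, 2, 'TextOnly'));
```

Treating a single-page request as a range whose start and end are equal keeps any later processing code on one path.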

Returned object shape (example):

textData — Array of objects describing extracted text items, including bounds and page-level text.
pageText — Concatenated plain text for the specified page(s).
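
As a rough illustration of working with this shape, the sketch below types only the two top-level properties named above and derives a small summary from them. The fields of the individual `textData` items are not specified in this section, so they are left as `unknown`. `ExtractTextResult` and `summarizeExtraction` are hypothetical names, not library types.

```typescript
// Assumed shape based on the two properties named in this section.
// The item fields inside textData are deliberately left untyped.
interface ExtractTextResult {
  textData: unknown[];
  pageText: string;
}

interface ExtractionSummary {
  itemCount: number;
  characterCount: number;
  isEmpty: boolean;
}

// Hypothetical helper: tolerates missing properties so it can be used
// defensively on whatever the Promise resolves to.
function summarizeExtraction(result: Partial<ExtractTextResult>): ExtractionSummary {
  const textData = Array.isArray(result.textData) ? result.textData : [];
  const pageText = typeof result.pageText === 'string' ? result.pageText : '';
  return {
    itemCount: textData.length,
    characterCount: pageText.length,
    isEmpty: pageText.trim().length === 0
  };
}

// Usage with a hand-written object standing in for a resolved value.
console.log(summarizeExtraction({ textData: [{}, {}], pageText: 'Hello' }));
```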

Usage of extractText in Syncfusion PDF Viewer Control

Here is an example that demonstrates how to use the extractText method:

import { Component, OnInit } from '@angular/core';
import {
  LinkAnnotationService,
  BookmarkViewService,
  MagnificationService,
  ThumbnailViewService,
  ToolbarService,
  NavigationService,
  AnnotationService,
  TextSearchService,
  TextSelectionService,
  FormFieldsService,
  FormDesignerService,
  PrintService
} from '@syncfusion/ej2-angular-pdfviewer';

@Component({
  selector: 'app-root',
  template: `
    <div class="content-wrapper">
      <button #btn1 (click)="extrctText()">Extract text (single page)</button>
      <button #btn2 (click)="extrctsText()">Extract text (page range)</button>
      <ejs-pdfviewer 
        id="pdfViewer"
        [resourceUrl]="resourceUrl"
        [documentPath]="document"
        style="height: 640px; display: block;">
      </ejs-pdfviewer>
    </div>
  `,
  providers: [
    LinkAnnotationService,
    BookmarkViewService,
    MagnificationService,
    ThumbnailViewService,
    ToolbarService,
    NavigationService,
    AnnotationService,
    TextSearchService,
    TextSelectionService,
    FormFieldsService,
    FormDesignerService,
    PrintService
  ]
})
export class AppComponent implements OnInit {
  public document: string = 'https://cdn.syncfusion.com/content/pdf/pdf-succinctly.pdf';
  public resourceUrl: string = 'https://cdn.syncfusion.com/ej2/29.1.33/dist/ej2-pdfviewer-lib';

  ngOnInit(): void { }
  // Extract text from a single page: page index 1 (the second page, since indices are 0-based).
  extrctText(): void {
    // Get the PDF Viewer instance attached to the element with id "pdfViewer".
    const viewer = (document.getElementById('pdfViewer') as any).ej2_instances[0];
    viewer.extractText(1, 'TextOnly').then((val: any) => {
      console.log('Extracted Text from Page 1:');
      console.log(val);
    });
  }

  // Extract text from a range of pages: page indices 0 to 2.
  extrctsText(): void {
    const viewer = (document.getElementById('pdfViewer') as any).ej2_instances[0];
    viewer.extractText(0, 2, 'TextOnly').then((val: any) => {
      console.log('Extracted Text from Pages 0 to 2:');
      console.log(val);
    });
  }
}

Explanation:

Single Page Extraction: The first extractText call extracts text from page index 1 (startIndex = 1), which is the second page because indices are 0-based. It uses the 'TextOnly' option for plain text extraction.

Multiple Pages Extraction: The second extractText call extracts text from page indices 0 through 2 (startIndex = 0, endIndex = 2), using the 'TextOnly' option for plain text extraction.
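
The same range call can also be written with async/await and a fallback for failures. This is a minimal sketch under stated assumptions: `TextExtractor` is a hypothetical structural type that covers only the call shape used in the sample, not the library's own type definition, and `extractPlainText` is a helper name introduced here for illustration.

```typescript
// Minimal structural type covering only the call used in the sample.
// It is an assumption for illustration, not the library's type definition.
interface TextExtractor {
  extractText(
    startIndex: number,
    endIndexOrOption: number | string,
    option?: string
  ): Promise<{ pageText?: string }>;
}

// Hypothetical helper: resolves to the plain text for a page range, or to an
// empty string when the call rejects or pageText is missing.
async function extractPlainText(
  viewer: TextExtractor,
  startIndex: number,
  endIndex: number
): Promise<string> {
  try {
    const result = await viewer.extractText(startIndex, endIndex, 'TextOnly');
    return result.pageText ?? '';
  } catch (error) {
    console.error('Text extraction failed:', error);
    return '';
  }
}
```

In the component above, the object obtained from ej2_instances[0] would be passed as the viewer argument, for example extractPlainText(viewer, 0, 2).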

View sample in GitHub
