Document Processing

    DataExtractor Class

    Provides functionality to detect and extract structural components such as tables, form fields, text segments, paragraphs, and embedded images.

    Inheritance
    System.Object
    DataExtractor
    Namespace: Syncfusion.SmartDataExtractor
    Assembly: Syncfusion.SmartDataExtractor.Base.dll
    Syntax
    public class DataExtractor : Object

    Constructors

    DataExtractor()

    Declaration
    public DataExtractor()

    Properties

    ConfidenceThreshold

    Minimum confidence (0.0–1.0) that a detected element must reach to be included in the results. Defaults to 0.6.

    Declaration
    public double ConfidenceThreshold { get; set; }
    Property Value
    Type
    System.Double

    EnableFormDetection

    When true, form detection and recognition are performed. Defaults to true.

    Declaration
    public bool EnableFormDetection { get; set; }
    Property Value
    Type
    System.Boolean

    EnableTableDetection

    When true, table detection and extraction are performed. Defaults to true.

    Declaration
    public bool EnableTableDetection { get; set; }
    Property Value
    Type
    System.Boolean

    FormRecognizeOptions

    Options that control form recognition behavior.

    Declaration
    public FormRecognizeOptions FormRecognizeOptions { get; set; }
    Property Value
    Type
    FormRecognizeOptions

    PageRange

    Specific pages to process (1-based). If null, all pages are processed.

    Declaration
    public int[,] PageRange { get; set; }
    Property Value
    Type
    System.Int32[,]
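
The reference gives the type of PageRange as a two-dimensional array but does not say how its dimensions are interpreted. The following is a minimal sketch, assuming each row is an inclusive { first page, last page } pair; that layout is an assumption and is not confirmed by this page. The class and method names are hypothetical.

```csharp
using Syncfusion.SmartDataExtractor;

public static class PageRangeExample
{
    // Hypothetical helper: restricts extraction to pages 1-3 and page 7.
    public static DataExtractor CreateForSelectedPages()
    {
        var extractor = new DataExtractor();

        // ASSUMPTION: each row is an inclusive { firstPage, lastPage } pair (1-based).
        // The reference only states the element type and that numbering is 1-based.
        extractor.PageRange = new int[,]
        {
            { 1, 3 },
            { 7, 7 }
        };

        return extractor;
    }
}
```

Setting PageRange back to null restores the documented behavior of processing every page.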

    TableExtractionOptions

    Options that control table detection and extraction behavior.

    Declaration
    public TableExtractionOptions TableExtractionOptions { get; set; }
    Property Value
    Type
    TableExtractionOptions
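
The properties above can be combined in an object initializer. The sketch below configures an extractor that looks only for tables and applies a stricter threshold than the documented default. The threshold value 0.8 and the names ExtractorFactory and CreateTableOnlyExtractor are illustrative choices, not part of the library.

```csharp
using Syncfusion.SmartDataExtractor;

public static class ExtractorFactory
{
    // Hypothetical helper: builds an extractor that detects tables only.
    public static DataExtractor CreateTableOnlyExtractor()
    {
        return new DataExtractor
        {
            // Stricter than the documented default of 0.6; must stay within 0.0-1.0.
            ConfidenceThreshold = 0.8,
            EnableTableDetection = true,
            // Skip form detection/recognition entirely.
            EnableFormDetection = false
        };
    }
}
```

FormRecognizeOptions and TableExtractionOptions are left at their defaults here because this page does not list their members.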

    Methods

    ExtractDataAsJson(Stream)

    Extracts structured data (tables/forms) and returns JSON.

    Declaration
    public string ExtractDataAsJson(Stream stream)
    Parameters
    Type Name Description
    System.IO.Stream stream
    Returns
    Type
    System.String
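
A minimal sketch of calling ExtractDataAsJson on a file. It assumes the synchronous method accepts the same input as its asynchronous counterpart (a readable stream containing an image or PDF), which this page does not state explicitly. JsonExtractionExample and ExtractJson are hypothetical names.

```csharp
using System.IO;
using System.Text.Json;
using Syncfusion.SmartDataExtractor;

public static class JsonExtractionExample
{
    // Hypothetical helper: returns the extracted JSON for the file at inputPath.
    public static string ExtractJson(string inputPath)
    {
        // Open the file first so a bad path fails before any extraction work starts.
        using FileStream input = File.OpenRead(inputPath);

        var extractor = new DataExtractor();
        string json = extractor.ExtractDataAsJson(input);

        // The JSON schema is not documented on this page, so this only checks
        // that the result parses; JsonDocument.Parse throws JsonException otherwise.
        using JsonDocument parsed = JsonDocument.Parse(json);

        return json;
    }
}
```

Because the shape of the JSON is not described here, the sketch avoids reading specific property names from the result.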

    ExtractDataAsJsonAsync(Stream, CancellationToken)

    Asynchronously extracts structured data (tables/forms) and returns JSON.

    Declaration
    public Task<string> ExtractDataAsJsonAsync(Stream inputStream, CancellationToken cancellationToken = default)
    Parameters
    Type Name Description
    System.IO.Stream inputStream

    A readable stream containing an image or PDF.

    System.Threading.CancellationToken cancellationToken

    Token to observe while waiting for the task to complete.

    Returns
    Type Description
    System.Threading.Tasks.Task<System.String>

    A task whose result is a JSON string representing extracted content.
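
The cancellation token can be used to bound how long an extraction is allowed to run. The sketch below wires a timeout through CancellationTokenSource. It assumes the library signals cancellation by throwing OperationCanceledException, the usual .NET convention, which this page does not confirm. AsyncExtractionExample and ExtractJsonWithTimeoutAsync are hypothetical names.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using Syncfusion.SmartDataExtractor;

public static class AsyncExtractionExample
{
    // Hypothetical helper: returns the extracted JSON, or null if the timeout elapsed.
    public static async Task<string> ExtractJsonWithTimeoutAsync(string inputPath, TimeSpan timeout)
    {
        using FileStream input = File.OpenRead(inputPath);

        // The token is cancelled automatically once the timeout elapses.
        using var cts = new CancellationTokenSource(timeout);

        var extractor = new DataExtractor();
        try
        {
            return await extractor.ExtractDataAsJsonAsync(input, cts.Token);
        }
        catch (OperationCanceledException)
        {
            // ASSUMPTION: cancellation surfaces as OperationCanceledException.
            return null;
        }
    }
}
```

A caller might use `await AsyncExtractionExample.ExtractJsonWithTimeoutAsync("invoice.pdf", TimeSpan.FromSeconds(30))`, where the file name is a placeholder.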

    ExtractDataAsMarkdown(Stream)

    Extracts structured data and returns it as a Markdown string. This runs the same OCR/detection pipeline and produces best-effort Markdown.

    Declaration
    public string ExtractDataAsMarkdown(Stream inputStream)
    Parameters
    Type Name Description
    System.IO.Stream inputStream
    Returns
    Type
    System.String

    ExtractDataAsMarkdownAsync(Stream, CancellationToken)

    Asynchronously extracts structured data and returns a Markdown string.

    Declaration
    public Task<string> ExtractDataAsMarkdownAsync(Stream inputStream, CancellationToken cancellationToken = default)
    Parameters
    Type Name Description
    System.IO.Stream inputStream

    A readable stream containing an image or PDF.

    System.Threading.CancellationToken cancellationToken

    Token to observe while waiting for the task to complete.

    Returns
    Type Description
    System.Threading.Tasks.Task<System.String>

    A task whose result is a Markdown string representing extracted content.

    ExtractDataAsPdfDocument(Stream)

    Declaration
    public PdfLoadedDocument ExtractDataAsPdfDocument(Stream inputStream)
    Parameters
    Type Name Description
    System.IO.Stream inputStream
    Returns
    Type
    PdfLoadedDocument

    ExtractDataAsPdfDocumentAsync(Stream, CancellationToken)

    Declaration
    public Task<PdfLoadedDocument> ExtractDataAsPdfDocumentAsync(Stream inputStream, CancellationToken cancellationToken = default)
    Parameters
    Type Name Description
    System.IO.Stream inputStream
    System.Threading.CancellationToken cancellationToken
    Returns
    Type
    System.Threading.Tasks.Task<PdfLoadedDocument>

    ExtractDataAsPdfStream(Stream)

    Declaration
    public Stream ExtractDataAsPdfStream(Stream inputStream)
    Parameters
    Type Name Description
    System.IO.Stream inputStream
    Returns
    Type
    System.IO.Stream
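
A minimal sketch of persisting the stream returned by ExtractDataAsPdfStream. This page does not describe the contents of the returned PDF, whether the stream is positioned at its start, or who owns it. The sketch assumes the caller may dispose it and rewinds it when possible. PdfStreamExample and SaveExtractedPdf are hypothetical names.

```csharp
using System.IO;
using Syncfusion.SmartDataExtractor;

public static class PdfStreamExample
{
    // Hypothetical helper: runs extraction on inputPath and writes the resulting PDF to outputPath.
    public static void SaveExtractedPdf(string inputPath, string outputPath)
    {
        using FileStream input = File.OpenRead(inputPath);

        var extractor = new DataExtractor();

        // ASSUMPTION: the caller owns the returned stream and is responsible for disposing it.
        using Stream pdf = extractor.ExtractDataAsPdfStream(input);

        // ASSUMPTION: the stream position is undocumented, so rewind when the stream allows it.
        if (pdf.CanSeek)
        {
            pdf.Position = 0;
        }

        using FileStream output = File.Create(outputPath);
        pdf.CopyTo(output);
    }
}
```

Opening the input before creating the output means a missing input file fails without leaving an empty output file behind.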

    ExtractDataAsPdfStreamAsync(Stream, CancellationToken)

    Declaration
    public Task<Stream> ExtractDataAsPdfStreamAsync(Stream inputStream, CancellationToken cancellationToken = default)
    Parameters
    Type Name Description
    System.IO.Stream inputStream
    System.Threading.CancellationToken cancellationToken
    Returns
    Type
    System.Threading.Tasks.Task<System.IO.Stream>
    Copyright © 2001 - 2026 Syncfusion Inc. All Rights Reserved