Provides functionality to detect and extract structural components such as tables, form fields, text segments, paragraphs, and embedded images.
Inheritance
System.Object
DataExtractor
Assembly: Syncfusion.SmartDataExtractor.Base.dll
public class DataExtractor : Object
Constructors
Declaration
Properties
Minimum confidence (0.0–1.0) required for detected elements to be included in results. Default: 0.6
Declaration
public double ConfidenceThreshold { get; set; }
Property Value
When true, form detection/recognition will be performed.
Defaults to true.
Declaration
public bool EnableFormDetection { get; set; }
Property Value
When true, table detection/extraction will be performed.
Defaults to true.
Declaration
public bool EnableTableDetection { get; set; }
Property Value
Options that control form recognition behavior.
Declaration
public FormRecognizeOptions FormRecognizeOptions { get; set; }
Property Value
Specific pages to process (1-based). If null, all pages are processed.
Declaration
public int[, ] PageRange { get; set; }
Property Value
Options that control table detection and extraction behavior.
Declaration
public TableExtractionOptions TableExtractionOptions { get; set; }
Property Value
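The properties above can be combined when configuring an extractor before a call. The following is a minimal sketch, not a definitive usage pattern. It assumes the class lives in a `Syncfusion.SmartDataExtractor` namespace (only the assembly name is documented here) and that a public parameterless constructor exists (the constructor declaration is not shown on this page). The threshold value of 0.75 and the command-line input path are illustrative.

```csharp
using System;
using System.IO;
using Syncfusion.SmartDataExtractor; // assumed namespace, inferred from the assembly name

class Program
{
    static void Main(string[] args)
    {
        // Assumption: DataExtractor exposes a public parameterless constructor.
        var extractor = new DataExtractor
        {
            // Keep only elements detected with at least 75% confidence (documented default is 0.6).
            ConfidenceThreshold = 0.75,
            // Both flags default to true; form detection is switched off here for illustration.
            EnableTableDetection = true,
            EnableFormDetection = false
        };

        // args[0] is a path to an input file supplied by the caller.
        using (FileStream input = File.OpenRead(args[0]))
        {
            string json = extractor.ExtractDataAsJson(input);
            Console.WriteLine(json);
        }
    }
}
```

Leaving `PageRange` unset (null) processes every page, as described above.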
Methods
Declaration
public string ExtractDataAsJson(Stream stream)
Parameters
| Type | Name | Description |
| --- | --- | --- |
| System.IO.Stream | stream | |
Returns
Asynchronously extracts structured data (tables/forms) and returns JSON.
Declaration
public Task<string> ExtractDataAsJsonAsync(Stream inputStream, CancellationToken cancellationToken = default(CancellationToken))
Parameters
| Type | Name | Description |
| --- | --- | --- |
| System.IO.Stream | inputStream | A readable stream containing an image or PDF. |
| System.Threading.CancellationToken | cancellationToken | Token to observe while waiting for the task to complete. |
Returns
| Type | Description |
| --- | --- |
| System.Threading.Tasks.Task<System.String> | A task whose result is a JSON string representing extracted content. |
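The asynchronous call and its cancellation token might be used as in the sketch below. It rests on several assumptions: the `Syncfusion.SmartDataExtractor` namespace and the parameterless constructor are not documented on this page, the 30-second timeout is an arbitrary illustrative value, and the method is assumed to follow the usual .NET convention of surfacing cancellation as an `OperationCanceledException`.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using Syncfusion.SmartDataExtractor; // assumed namespace, inferred from the assembly name

class Program
{
    static async Task Main(string[] args)
    {
        // Assumption: DataExtractor exposes a public parameterless constructor.
        var extractor = new DataExtractor();

        // The token source requests cancellation after 30 seconds (illustrative timeout).
        using (var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30)))
        using (FileStream input = File.OpenRead(args[0]))
        {
            try
            {
                string json = await extractor.ExtractDataAsJsonAsync(input, cts.Token);
                Console.WriteLine(json);
            }
            catch (OperationCanceledException)
            {
                // Assumed behavior: cancellation is reported through the standard exception type.
                Console.Error.WriteLine("Extraction was cancelled before it completed.");
            }
        }
    }
}
```

Passing a token tied to a timeout or to a user action lets the caller abandon a long-running extraction without blocking a thread.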
Extracts structured data and returns it as a Markdown string.
This runs the same OCR/detection pipeline and produces best-effort Markdown output.
Declaration
public string ExtractDataAsMarkdown(Stream inputStream)
Parameters
| Type | Name | Description |
| --- | --- | --- |
| System.IO.Stream | inputStream | |
Returns
Asynchronously extracts structured data and returns a Markdown string.
Declaration
public Task<string> ExtractDataAsMarkdownAsync(Stream inputStream, CancellationToken cancellationToken = default(CancellationToken))
Parameters
| Type | Name | Description |
| --- | --- | --- |
| System.IO.Stream | inputStream | A readable stream containing an image or PDF. |
| System.Threading.CancellationToken | cancellationToken | Token to observe while waiting for the task to complete. |
Returns
| Type | Description |
| --- | --- |
| System.Threading.Tasks.Task<System.String> | A task whose result is a Markdown string representing extracted content. |
Declaration
public PdfLoadedDocument ExtractDataAsPdfDocument(Stream inputStream)
Parameters
| Type | Name | Description |
| --- | --- | --- |
| System.IO.Stream | inputStream | |
Returns
Declaration
public Task<PdfLoadedDocument> ExtractDataAsPdfDocumentAsync(Stream inputStream, CancellationToken cancellationToken = default(CancellationToken))
Parameters
| Type | Name | Description |
| --- | --- | --- |
| System.IO.Stream | inputStream | |
| System.Threading.CancellationToken | cancellationToken | |
Returns
Declaration
public Stream ExtractDataAsPdfStream(Stream inputStream)
Parameters
| Type | Name | Description |
| --- | --- | --- |
| System.IO.Stream | inputStream | |
Returns
Declaration
public Task<Stream> ExtractDataAsPdfStreamAsync(Stream inputStream, CancellationToken cancellationToken = default(CancellationToken))
Parameters
| Type | Name | Description |
| --- | --- | --- |
| System.IO.Stream | inputStream | |
| System.Threading.CancellationToken | cancellationToken | |
Returns
| Type |
| --- |
| System.Threading.Tasks.Task<System.IO.Stream> |