Welcome to Syncfusion Data Extraction Library
25 May 2026
Syncfusion® Smart Data Extractor is a high‑performance, deterministic C# library for extracting complete document structures from PDFs and images.
List of Data Extraction Libraries
- Smart Data Extractor - analyzes visual layout cues such as lines, boxes, labels, and alignment to identify and extract elements such as tables, text blocks, images, headers, footers, and form fields. Each element is returned with per‑field confidence scores, so results can be reviewed, exported, or integrated immediately.
- Smart Table Extractor - detects table regions, header rows, columns, and merged cells (cell spans). Provides per‑cell confidence scores and delivers structured exports ready for downstream processing.
- Smart Form Recognizer - analyzes layout cues such as lines, boxes, and circles to detect form regions. It extracts common controls, including text fields, checkboxes, radio buttons, and signature fields, and produces clean JSON output with confidence scores. When form fields are identified, the library can also generate a fillable PDF for immediate use.
- Optical character recognition (OCR) - a high‑performance .NET library for accurate text recognition from scanned documents, images, and PDF files. It processes raster images and document pages to recognize printed text, analyze page layouts, and extract textual content programmatically.
- Conversion - extracts data from PDFs or images and produces output in developer‑friendly formats such as JSON and Markdown (MD), enabling seamless integration into applications.