Working with Table Extraction
25 May 2026, 16 minutes to read
The Syncfusion® Smart Table Extractor is a .NET library used to extract structured table data from PDF and image files.
To get started quickly with extracting table data from PDF and image files in ASP.NET Core using the Smart Table Extractor library, refer to the accompanying video tutorial.
Extract Table Data as JSON from PDF or Image
To extract structured table data from a PDF document using the ExtractTableAsJson method of the TableExtractor class, refer to the following code example:
using System.IO;
using System.Text;
using Syncfusion.SmartTableExtractor;

//Open the input PDF file as a stream.
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
    //Initialize the Smart Table Extractor.
    TableExtractor extractor = new TableExtractor();
    //Extract table data from the PDF document as a JSON string.
    string data = extractor.ExtractTableAsJson(stream);
    //Save the extracted JSON data into an output file.
    File.WriteAllText("Output.json", data, Encoding.UTF8);
}
NOTE
To extract table data from an image instead of a PDF, open the image file (for example, Input.jpg or Input.png) as the input stream. The rest of the code remains unchanged.
You can download a complete working sample from GitHub.
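The JSON string returned by ExtractTableAsJson can be re-indented before it is saved, which makes the output easier to inspect by hand. The following is a minimal sketch that uses only System.Text.Json and assumes nothing about the extractor's schema. The `PrettyPrint` helper and the sample string are hypothetical stand-ins, not part of the Smart Table Extractor API.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: re-serializes any valid JSON text with indentation.
// No particular schema is assumed.
static string PrettyPrint(string json)
{
    using JsonDocument document = JsonDocument.Parse(json);
    return JsonSerializer.Serialize(
        document.RootElement,
        new JsonSerializerOptions { WriteIndented = true });
}

// Stand-in for the string returned by ExtractTableAsJson; the real schema may differ.
string sample = "{\"tables\":[{\"rows\":[[\"A\",\"B\"],[\"1\",\"2\"]]}]}";
Console.WriteLine(PrettyPrint(sample));
```

In the extraction sample, the same helper could be applied to `data` before the call to File.WriteAllText.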
Extract Table Data as Markdown from PDF or Image
To extract structured table data from a PDF document using the ExtractTableAsMarkdown method of the TableExtractor class, refer to the following code example:
using System.IO;
using System.Text;
using Syncfusion.SmartTableExtractor;

//Open the input PDF file as a stream.
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
    //Initialize the Smart Table Extractor.
    TableExtractor extractor = new TableExtractor();
    //Extract table data from the PDF document as a Markdown string.
    string data = extractor.ExtractTableAsMarkdown(stream);
    //Save the extracted Markdown data into an output file.
    File.WriteAllText("Output.md", data, Encoding.UTF8);
}
You can download a complete working sample from GitHub.
NOTE
To extract table data from an image instead of a PDF, open the image file (for example, Input.jpg or Input.png) as the input stream. The rest of the code remains unchanged.
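Once the Markdown string is available, it may be useful to read the table back into rows and cells. The sketch below assumes the output uses the common pipe-delimited table syntax with a `|---|` separator row, which this page does not confirm. The `ParseMarkdownTable` helper is hypothetical, and it does not handle escaped pipes inside cells.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: splits pipe-delimited Markdown table lines into cells.
static List<string[]> ParseMarkdownTable(string markdown)
{
    var rows = new List<string[]>();
    foreach (string rawLine in markdown.Split('\n'))
    {
        string line = rawLine.Trim();
        if (!line.StartsWith("|")) continue;

        // Drop exactly one leading pipe and, if present, one trailing pipe.
        string inner = line.Substring(1);
        if (inner.EndsWith("|")) inner = inner.Substring(0, inner.Length - 1);

        string[] cells = inner.Split('|').Select(cell => cell.Trim()).ToArray();

        // Skip the header separator row, for example |---|:---:|.
        bool isSeparator = cells.All(
            cell => cell.Length > 0 && cell.All(ch => ch == '-' || ch == ':'));
        if (isSeparator) continue;

        rows.Add(cells);
    }
    return rows;
}

// Stand-in for the string returned by ExtractTableAsMarkdown.
string sample = "| Name | Qty |\n|---|---|\n| Apple | 3 |";
foreach (string[] row in ParseMarkdownTable(sample))
{
    Console.WriteLine(string.Join(", ", row));
}
```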
Extract Table Data within a Specific Page Range
Extract as JSON
To extract structured table data from a specific range of pages in a PDF document using the ExtractTableAsJson method of the TableExtractor class, refer to the following code example:
using System.IO;
using System.Text;
using Syncfusion.SmartTableExtractor;

//Open the input PDF file as a stream.
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
    //Initialize the Smart Table Extractor.
    TableExtractor extractor = new TableExtractor();
    //Configure table extraction options to specify the page range for detection (pages 2 to 4).
    TableExtractionOptions options = new TableExtractionOptions();
    options.PageRange = new int[,] { { 2, 4 } };
    //Assign the configured options to the extractor.
    extractor.TableExtractionOptions = options;
    //Extract table data from the specified page range as a JSON string.
    string data = extractor.ExtractTableAsJson(stream);
    //Save the extracted JSON data into an output file.
    File.WriteAllText("Output.json", data, Encoding.UTF8);
}
You can download a complete working sample from GitHub.
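PageRange is a two-dimensional integer array in which each row holds a start page and an end page. A small check before assigning the array can catch reversed or out-of-range values early. The sketch below assumes page numbers are 1-based and inclusive, as the "pages 1 to 3" comment elsewhere on this page suggests. Whether the extractor accepts more than one row is not stated here, and the `IsValidPageRange` helper is hypothetical.

```csharp
using System;

// Hypothetical helper: checks that every row is a (start, end) pair
// with 1 <= start <= end. Assumes 1-based, inclusive page numbers.
static bool IsValidPageRange(int[,] ranges)
{
    if (ranges.GetLength(0) == 0 || ranges.GetLength(1) != 2) return false;

    for (int row = 0; row < ranges.GetLength(0); row++)
    {
        int start = ranges[row, 0];
        int end = ranges[row, 1];
        if (start < 1 || end < start) return false;
    }
    return true;
}

int[,] pages = { { 2, 4 } };
Console.WriteLine(IsValidPageRange(pages)); // True
```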
Extract as Markdown
To extract structured table data from a specific range of pages in a PDF document using the ExtractTableAsMarkdown method of the TableExtractor class, refer to the following code example:
using System.IO;
using System.Text;
using Syncfusion.SmartTableExtractor;

//Open the input PDF file as a stream.
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
    //Initialize the Smart Table Extractor.
    TableExtractor extractor = new TableExtractor();
    //Set the page range for extraction (pages 1 to 3).
    TableExtractionOptions options = new TableExtractionOptions();
    options.PageRange = new int[,] { { 1, 3 } };
    //Assign the configured options to the extractor.
    extractor.TableExtractionOptions = options;
    //Extract table data from the specified page range as a Markdown string.
    string data = extractor.ExtractTableAsMarkdown(stream);
    //Save the extracted Markdown data into an output file.
    File.WriteAllText("Output.md", data, Encoding.UTF8);
}
Extract Table Data Asynchronously from PDF or Image
To extract table data asynchronously with cancellation support using the ExtractTableAsJsonAsync method of the TableExtractor class, refer to the following code example:
using System;
using System.IO;
using System.Text;
using System.Threading;
using Syncfusion.SmartTableExtractor;

//Open the input PDF file as a stream.
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
    //Initialize the Smart Table Extractor.
    TableExtractor tableExtractor = new TableExtractor();
    //Create a cancellation token source that cancels the operation after 30 seconds.
    using CancellationTokenSource cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
    //Call the asynchronous extraction API to extract table data as a JSON string.
    string data = await tableExtractor.ExtractTableAsJsonAsync(stream, cts.Token);
    //Save the extracted JSON data into an output file.
    File.WriteAllText("Output.json", data, Encoding.UTF8);
}
You can download a complete working sample from GitHub.
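When the 30-second timeout elapses, the awaited call has to surface the cancellation somehow. The sketch below assumes ExtractTableAsJsonAsync follows the usual .NET convention of throwing OperationCanceledException when its token is cancelled, which this page does not confirm. The `RunWithTimeoutAsync` helper is hypothetical, and a delayed task stands in for the extractor call so that the example runs without the library.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: runs an operation with a timeout and returns null
// if the operation is cancelled before it completes.
static async Task<string?> RunWithTimeoutAsync(
    Func<CancellationToken, Task<string>> operation, TimeSpan timeout)
{
    using var cts = new CancellationTokenSource(timeout);
    try
    {
        return await operation(cts.Token);
    }
    catch (OperationCanceledException)
    {
        // Assumed convention: cancellation is reported through this exception type.
        return null;
    }
}

// Stand-in for: token => tableExtractor.ExtractTableAsJsonAsync(stream, token)
string? result = await RunWithTimeoutAsync(
    async token =>
    {
        await Task.Delay(TimeSpan.FromSeconds(5), token);
        return "{}";
    },
    TimeSpan.FromMilliseconds(100));

Console.WriteLine(result ?? "Extraction timed out.");
```

Returning null keeps the timeout case separate from a successful but empty result; rethrowing or logging may suit other applications better.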
Table Extraction Options
Disable Border-less Table Detection
To disable detection of tables without visible borders in a PDF document or image using the ExtractTableAsJson method of the TableExtractor class, refer to the following code example:
using System.IO;
using System.Text;
using Syncfusion.SmartTableExtractor;

//Open the input PDF file as a stream.
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
    //Initialize the Smart Table Extractor.
    TableExtractor extractor = new TableExtractor();
    //Configure the table extraction options to disable detection of border-less tables.
    TableExtractionOptions options = new TableExtractionOptions();
    //DetectBorderlessTables is true by default.
    options.DetectBorderlessTables = false;
    //Assign the configured options to the extractor.
    extractor.TableExtractionOptions = options;
    //Extract table data from the PDF document as a JSON string.
    string data = extractor.ExtractTableAsJson(stream);
    //Save the extracted JSON data into an output file.
    File.WriteAllText("Output.json", data, Encoding.UTF8);
}
You can download a complete working sample from GitHub.
Apply Confidence Threshold for Table Data Extraction
To apply confidence thresholding when extracting table data from a PDF document using the ExtractTableAsJson method of the TableExtractor class, refer to the following code example:
using System.IO;
using System.Text;
using Syncfusion.SmartTableExtractor;

//Open the input PDF file as a stream.
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
    //Initialize the Smart Table Extractor.
    TableExtractor extractor = new TableExtractor();
    //Configure table extraction options to set the confidence threshold for detection.
    TableExtractionOptions options = new TableExtractionOptions();
    options.ConfidenceThreshold = 0.6;
    //Assign the configured options to the extractor.
    extractor.TableExtractionOptions = options;
    //Extract table data from the PDF document as a JSON string.
    string data = extractor.ExtractTableAsJson(stream);
    //Save the extracted JSON data into an output file.
    File.WriteAllText("Output.json", data, Encoding.UTF8);
}
You can download a complete working sample from GitHub.
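The options covered in this section all live on the same TableExtractionOptions object, so they can presumably be set together. The fragment below is a sketch that uses only the properties shown on this page (PageRange, DetectBorderlessTables, and ConfidenceThreshold) and assumes that combining them in one object initializer is supported. It requires the Smart Table Extractor package and an Input.pdf file to run.

```csharp
using System.IO;
using System.Text;
using Syncfusion.SmartTableExtractor;

//Open the input PDF file as a stream.
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
    //Initialize the Smart Table Extractor.
    TableExtractor extractor = new TableExtractor();
    //Assumption: the three options can be combined on a single options object.
    extractor.TableExtractionOptions = new TableExtractionOptions
    {
        PageRange = new int[,] { { 2, 4 } },
        DetectBorderlessTables = false,
        ConfidenceThreshold = 0.6
    };
    //Extract table data from the PDF document as a JSON string.
    string data = extractor.ExtractTableAsJson(stream);
    //Save the extracted JSON data into an output file.
    File.WriteAllText("Output.json", data, Encoding.UTF8);
}
```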
PDF to Markdown Preservation Mapping
This section illustrates how table elements in PDF documents are converted and preserved in Markdown format, ensuring that document structure and formatting remain consistent during the PDF‑to‑Markdown conversion process.
| PDF Elements | Preservation in Markdown |
|---|---|
| Table | Table |
| Text Inline Styles | Bold and Italic |