Working with Table Extraction

25 May 2026 | 16 minutes to read

The Syncfusion® Smart Table Extractor is a .NET library used to extract structured table data from PDF and image files.

To quickly get started with extracting table data from PDF and image files in ASP.NET Core using the Smart Table Extractor library, refer to this video tutorial:

Extract Table Data as JSON from PDF or Image

To extract structured table data from a PDF document as JSON, use the ExtractTableAsJson method of the TableExtractor class, as shown in the following code example:

C# [Cross-platform]

using System.IO;
using System.Text;
using Syncfusion.SmartTableExtractor;

//Open the input PDF file as a stream.
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
    //Initialize the Smart Table Extractor.
    TableExtractor extractor = new TableExtractor();
    //Extract table data from the PDF document as a JSON string.
    string data = extractor.ExtractTableAsJson(stream);
    //Save the extracted JSON data into an output file.
    File.WriteAllText("Output.json", data, Encoding.UTF8);
}

C# [Windows-specific]

using System.IO;
using System.Text;
using Syncfusion.SmartTableExtractor;

//Open the input PDF file as a stream.
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
    //Initialize the Smart Table Extractor.
    TableExtractor extractor = new TableExtractor();
    //Extract table data from the PDF document as a JSON string.
    string data = extractor.ExtractTableAsJson(stream);
    //Save the extracted JSON data into an output file.
    File.WriteAllText("Output.json", data, Encoding.UTF8);
}

NOTE

To extract table data from an image instead of a PDF, open the image file (for example, Input.jpg or Input.png) as the input stream. The rest of the code remains unchanged.

You can download a complete working sample from GitHub.
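Once Output.json has been written, it can be useful to check what the extractor returned before wiring it into other code. The following is a minimal sketch that summarizes the top-level JSON value using System.Text.Json. The JSON schema is not described on this page, so the code deliberately makes no assumption about property names. JsonInspector and Describe are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Linq;
using System.Text.Json;

static class JsonInspector
{
    // Returns a one-line summary of the top-level JSON value.
    // No schema is assumed: only the kind of the root value is inspected.
    public static string Describe(string json)
    {
        using (JsonDocument document = JsonDocument.Parse(json))
        {
            JsonElement root = document.RootElement;
            switch (root.ValueKind)
            {
                case JsonValueKind.Array:
                    return "Array with " + root.GetArrayLength() + " item(s)";
                case JsonValueKind.Object:
                    // List the property names found at the top level.
                    return "Object with properties: "
                        + string.Join(", ", root.EnumerateObject().Select(p => p.Name));
                default:
                    return "Value of kind " + root.ValueKind;
            }
        }
    }
}
```

A typical call would be Console.WriteLine(JsonInspector.Describe(File.ReadAllText("Output.json"))); which prints the top-level shape so that a typed model can be written against the real output rather than a guessed one.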

Extract Table Data as Markdown from PDF or Image

To extract structured table data from a PDF document as Markdown, use the ExtractTableAsMarkdown method of the TableExtractor class, as shown in the following code example:

C# [Cross-platform]

using System.IO;
using System.Text;
using Syncfusion.SmartTableExtractor;

//Open the input PDF file as a stream.
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
    //Initialize the Smart Table Extractor.
    TableExtractor extractor = new TableExtractor();
    //Extract table data from the PDF document as a Markdown string.
    string data = extractor.ExtractTableAsMarkdown(stream);
    //Save the extracted Markdown data into an output file.
    File.WriteAllText("Output.md", data, Encoding.UTF8);
}

C# [Windows-specific]

using System.IO;
using System.Text;
using Syncfusion.SmartTableExtractor;

//Open the input PDF file as a stream.
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
    //Initialize the Smart Table Extractor.
    TableExtractor extractor = new TableExtractor();
    //Extract table data from the PDF document as a Markdown string.
    string data = extractor.ExtractTableAsMarkdown(stream);
    //Save the extracted Markdown data into an output file.
    File.WriteAllText("Output.md", data, Encoding.UTF8);
}

You can download a complete working sample from GitHub.

NOTE

To extract table data from an image instead of a PDF, open the image file (for example, Input.jpg or Input.png) as the input stream. The rest of the code remains unchanged.
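Because the same call accepts both PDF and image streams, a folder of mixed inputs can be processed in one loop. The sketch below assumes only what this page shows: the TableExtractor constructor and ExtractTableAsMarkdown(stream). BatchExtraction, ExtractFolderToMarkdown, and the folder parameters are hypothetical names, and the extension list is limited to the file types mentioned above.

```csharp
using System;
using System.IO;
using System.Text;
using Syncfusion.SmartTableExtractor;

static class BatchExtraction
{
    // File types taken from the examples on this page; other formats are untested here.
    static readonly string[] SupportedExtensions = { ".pdf", ".jpg", ".png" };

    public static void ExtractFolderToMarkdown(string inputFolder, string outputFolder)
    {
        Directory.CreateDirectory(outputFolder);
        foreach (string path in Directory.EnumerateFiles(inputFolder))
        {
            string extension = Path.GetExtension(path).ToLowerInvariant();
            if (Array.IndexOf(SupportedExtensions, extension) < 0)
            {
                continue; // Skip files that are neither PDF nor image inputs.
            }

            //Open the input file (PDF or image) as a stream.
            using (FileStream stream = new FileStream(path, FileMode.Open, FileAccess.Read))
            {
                //A new extractor per file avoids assuming that instances are reusable.
                TableExtractor extractor = new TableExtractor();
                string markdown = extractor.ExtractTableAsMarkdown(stream);
                //Keep the original extension in the name so Input.pdf and Input.png do not collide.
                string outputPath = Path.Combine(outputFolder, Path.GetFileName(path) + ".md");
                File.WriteAllText(outputPath, markdown, Encoding.UTF8);
            }
        }
    }
}
```

For example, BatchExtraction.ExtractFolderToMarkdown("Inputs", "Outputs") would write Outputs/Input.pdf.md and Outputs/Input.png.md for the two corresponding input files.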

Extract Table Data within a Specific Page Range

Extract as JSON

To extract structured table data from a specific range of pages in a PDF document using the ExtractTableAsJson method of the TableExtractor class, refer to the following code example:

C# [Cross-platform]

using System.IO;
using System.Text;
using Syncfusion.SmartTableExtractor;

//Open the input PDF file as a stream.
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
    //Initialize the Smart Table Extractor.
    TableExtractor extractor = new TableExtractor();
    //Set the page range for extraction (pages 2 to 4).
    TableExtractionOptions options = new TableExtractionOptions();
    options.PageRange = new int[,] { { 2, 4 } };
    //Assign the configured options to the extractor.
    extractor.TableExtractionOptions = options;
    //Extract table data from the specified page range as a JSON string.
    string data = extractor.ExtractTableAsJson(stream);
    //Save the extracted JSON data into an output file.
    File.WriteAllText("Output.json", data, Encoding.UTF8);
}

C# [Windows-specific]

using System.IO;
using System.Text;
using Syncfusion.SmartTableExtractor;

//Open the input PDF file as a stream.
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
    //Initialize the Smart Table Extractor.
    TableExtractor extractor = new TableExtractor();
    //Set the page range for extraction (pages 2 to 4).
    TableExtractionOptions options = new TableExtractionOptions();
    options.PageRange = new int[,] { { 2, 4 } };
    //Assign the configured options to the extractor.
    extractor.TableExtractionOptions = options;
    //Extract table data from the specified page range as a JSON string.
    string data = extractor.ExtractTableAsJson(stream);
    //Save the extracted JSON data into an output file.
    File.WriteAllText("Output.json", data, Encoding.UTF8);
}

You can download a complete working sample from GitHub.
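PageRange is a two-dimensional int array in which each row holds a start page and an end page. The helper below is a small sketch for building that array from readable tuples with basic validation. It rests on two assumptions that this page does not confirm: that page numbers start at 1, and that the extractor accepts more than one row. PageRanges and Build are hypothetical names, not part of the library.

```csharp
using System;

static class PageRanges
{
    // Builds an int[,] with one { start, end } row per range.
    // Assumption: pages are numbered from 1 and each range is inclusive.
    public static int[,] Build(params (int Start, int End)[] ranges)
    {
        if (ranges == null || ranges.Length == 0)
        {
            throw new ArgumentException("At least one page range is required.", nameof(ranges));
        }

        int[,] result = new int[ranges.Length, 2];
        for (int i = 0; i < ranges.Length; i++)
        {
            if (ranges[i].Start < 1 || ranges[i].End < ranges[i].Start)
            {
                throw new ArgumentOutOfRangeException(
                    nameof(ranges), "Each range needs Start >= 1 and End >= Start.");
            }
            result[i, 0] = ranges[i].Start;
            result[i, 1] = ranges[i].End;
        }
        return result;
    }
}
```

With this helper, options.PageRange = PageRanges.Build((2, 4)); produces the same array as new int[,] { { 2, 4 } }, and invalid input such as (4, 2) fails before extraction starts.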

Extract as Markdown

To extract structured table data from a specific range of pages in a PDF document using the ExtractTableAsMarkdown method of the TableExtractor class, refer to the following code example:

C# [Cross-platform]

using System.IO;
using System.Text;
using Syncfusion.SmartTableExtractor;

//Open the input PDF file as a stream.
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
    //Initialize the Smart Table Extractor.
    TableExtractor extractor = new TableExtractor();
    //Set the page range for extraction (pages 1 to 3).
    TableExtractionOptions options = new TableExtractionOptions();
    options.PageRange = new int[,] { { 1, 3 } };
    //Assign the configured options to the extractor.
    extractor.TableExtractionOptions = options;
    //Extract table data from the specified page range as a Markdown string.
    string data = extractor.ExtractTableAsMarkdown(stream);
    //Save the extracted Markdown data into an output file.
    File.WriteAllText("Output.md", data, Encoding.UTF8);
}

C# [Windows-specific]

using System.IO;
using System.Text;
using Syncfusion.SmartTableExtractor;

//Open the input PDF file as a stream.
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
    //Initialize the Smart Table Extractor.
    TableExtractor extractor = new TableExtractor();
    //Set the page range for extraction (pages 1 to 3).
    TableExtractionOptions options = new TableExtractionOptions();
    options.PageRange = new int[,] { { 1, 3 } };
    //Assign the configured options to the extractor.
    extractor.TableExtractionOptions = options;
    //Extract table data from the specified page range as a Markdown string.
    string data = extractor.ExtractTableAsMarkdown(stream);
    //Save the extracted Markdown data into an output file.
    File.WriteAllText("Output.md", data, Encoding.UTF8);
}

Extract Table Data Asynchronously from PDF or Image

To extract table data asynchronously with cancellation support using the ExtractTableAsJsonAsync method of the TableExtractor class, refer to the following code example:

C# [Cross-platform]

using System;
using System.IO;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Syncfusion.SmartTableExtractor;

//Open the input PDF file as a stream.
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
//Create a cancellation token source that cancels the operation after 30 seconds.
using (CancellationTokenSource cts = new CancellationTokenSource(TimeSpan.FromSeconds(30)))
{
    //Initialize the Smart Table Extractor.
    TableExtractor tableExtractor = new TableExtractor();
    //Call the asynchronous extraction API to extract table data as a JSON string.
    string data = await tableExtractor.ExtractTableAsJsonAsync(stream, cts.Token);
    //Save the extracted JSON data into an output file.
    File.WriteAllText("Output.json", data, Encoding.UTF8);
}

C# [Windows-specific]

using System;
using System.IO;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Syncfusion.SmartTableExtractor;

//Open the input PDF file as a stream.
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
//Create a cancellation token source that cancels the operation after 30 seconds.
using (CancellationTokenSource cts = new CancellationTokenSource(TimeSpan.FromSeconds(30)))
{
    //Initialize the Smart Table Extractor.
    TableExtractor tableExtractor = new TableExtractor();
    //Call the asynchronous extraction API to extract table data as a JSON string.
    string data = await tableExtractor.ExtractTableAsJsonAsync(stream, cts.Token);
    //Save the extracted JSON data into an output file.
    File.WriteAllText("Output.json", data, Encoding.UTF8);
}

You can download a complete working sample from GitHub.
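When the 30-second timeout elapses, the awaited call has to end somehow, and callers usually want to tell a timeout apart from a successful run. The sketch below wraps the call shown above in a try/catch. It assumes that cancellation surfaces as an OperationCanceledException, which is the usual .NET convention but is not confirmed on this page. AsyncExtraction and TryExtractAsync are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Syncfusion.SmartTableExtractor;

static class AsyncExtraction
{
    // Returns true when the JSON file was written, false when the timeout cancelled the work.
    public static async Task<bool> TryExtractAsync(string inputPath, string outputPath, TimeSpan timeout)
    {
        using (FileStream stream = new FileStream(inputPath, FileMode.Open, FileAccess.Read))
        using (CancellationTokenSource cts = new CancellationTokenSource(timeout))
        {
            TableExtractor extractor = new TableExtractor();
            try
            {
                string data = await extractor.ExtractTableAsJsonAsync(stream, cts.Token);
                File.WriteAllText(outputPath, data, Encoding.UTF8);
                return true;
            }
            catch (OperationCanceledException)
            {
                // Assumption: the library reports cancellation with this exception type
                // (TaskCanceledException derives from it and is caught here as well).
                return false;
            }
        }
    }
}
```

A caller could then write bool ok = await AsyncExtraction.TryExtractAsync("Input.pdf", "Output.json", TimeSpan.FromSeconds(30)); and retry with a longer timeout or a narrower page range when ok is false.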

Table Extraction Options

Disable Border-less Table Detection

To disable detection of tables without visible borders in a PDF document or image, set the DetectBorderlessTables property of the TableExtractionOptions class to false before calling the ExtractTableAsJson method of the TableExtractor class, as shown in the following code example:

C# [Cross-platform]

using System.IO;
using System.Text;
using Syncfusion.SmartTableExtractor;

//Open the input PDF file as a stream.
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
    //Initialize the Smart Table Extractor.
    TableExtractor extractor = new TableExtractor();
    //Configure the table extraction options to disable border-less table detection.
    TableExtractionOptions options = new TableExtractionOptions();
    //DetectBorderlessTables is true by default.
    options.DetectBorderlessTables = false;
    //Assign the configured options to the extractor.
    extractor.TableExtractionOptions = options;
    //Extract table data from the PDF document as a JSON string.
    string data = extractor.ExtractTableAsJson(stream);
    //Save the extracted JSON data into an output file.
    File.WriteAllText("Output.json", data, Encoding.UTF8);
}

C# [Windows-specific]

using System.IO;
using System.Text;
using Syncfusion.SmartTableExtractor;

//Open the input PDF file as a stream.
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
    //Initialize the Smart Table Extractor.
    TableExtractor extractor = new TableExtractor();
    //Configure the table extraction options to disable border-less table detection.
    TableExtractionOptions options = new TableExtractionOptions();
    //DetectBorderlessTables is true by default.
    options.DetectBorderlessTables = false;
    //Assign the configured options to the extractor.
    extractor.TableExtractionOptions = options;
    //Extract table data from the PDF document as a JSON string.
    string data = extractor.ExtractTableAsJson(stream);
    //Save the extracted JSON data into an output file.
    File.WriteAllText("Output.json", data, Encoding.UTF8);
}

You can download a complete working sample from GitHub.

Apply Confidence Threshold for Table Data Extraction

To apply confidence thresholding when extracting table data from a PDF document using the ExtractTableAsJson method of the TableExtractor class, refer to the following code example:

C# [Cross-platform]

using System.IO;
using System.Text;
using Syncfusion.SmartTableExtractor;

//Open the input PDF file as a stream.
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
    //Initialize the Smart Table Extractor.
    TableExtractor extractor = new TableExtractor();
    //Configure table extraction options to set the confidence threshold for detection.
    TableExtractionOptions options = new TableExtractionOptions();
    options.ConfidenceThreshold = 0.6;
    //Assign the configured options to the extractor.
    extractor.TableExtractionOptions = options;
    //Extract table data from the PDF document as a JSON string.
    string data = extractor.ExtractTableAsJson(stream);
    //Save the extracted JSON data into an output file.
    File.WriteAllText("Output.json", data, Encoding.UTF8);
}

C# [Windows-specific]

using System.IO;
using System.Text;
using Syncfusion.SmartTableExtractor;

//Open the input PDF file as a stream.
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
    //Initialize the Smart Table Extractor.
    TableExtractor extractor = new TableExtractor();
    //Configure table extraction options to set the confidence threshold for detection.
    TableExtractionOptions options = new TableExtractionOptions();
    options.ConfidenceThreshold = 0.6;
    //Assign the configured options to the extractor.
    extractor.TableExtractionOptions = options;
    //Extract table data from the PDF document as a JSON string.
    string data = extractor.ExtractTableAsJson(stream);
    //Save the extracted JSON data into an output file.
    File.WriteAllText("Output.json", data, Encoding.UTF8);
}

You can download a complete working sample from GitHub.
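The options shown in this section and in the page range section are all properties of the same TableExtractionOptions object, so they can presumably be set together. The following sketch combines them in one object initializer. It uses only the properties that appear on this page (PageRange, DetectBorderlessTables, ConfidenceThreshold); how the library resolves them in combination is an assumption, and the chosen values are illustrative only.

```csharp
using System.IO;
using System.Text;
using Syncfusion.SmartTableExtractor;

//Open the input PDF file as a stream.
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
    //Initialize the Smart Table Extractor.
    TableExtractor extractor = new TableExtractor();
    //Configure all three options in one place (illustrative values).
    extractor.TableExtractionOptions = new TableExtractionOptions
    {
        //Limit detection to pages 2 to 4.
        PageRange = new int[,] { { 2, 4 } },
        //Ignore tables without visible borders.
        DetectBorderlessTables = false,
        //Same threshold value as the example above.
        ConfidenceThreshold = 0.6
    };
    //Extract table data from the PDF document as a JSON string.
    string data = extractor.ExtractTableAsJson(stream);
    //Save the extracted JSON data into an output file.
    File.WriteAllText("Output.json", data, Encoding.UTF8);
}
```

Keeping the configuration in a single initializer makes it easier to compare runs when tuning the threshold or toggling border-less detection.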

PDF to Markdown Preservation Mapping

This section illustrates how table elements in PDF documents are converted and preserved in Markdown format, ensuring that document structure and formatting remain consistent during the PDF‑to‑Markdown conversion process.

PDF Elements	Preservation in Markdown
Table	Table
Text Inline Styles	Bold and Italic
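Since tables are preserved as Markdown tables, downstream code often needs to split a table row back into cells. The sketch below assumes the output uses the common pipe-delimited table syntax, which this page does not state explicitly. MarkdownTables and SplitRow are hypothetical names, and escaped pipes inside cells are not handled.

```csharp
using System;
using System.Linq;

static class MarkdownTables
{
    // Splits one pipe-delimited table row into trimmed cell strings.
    // Inline styles such as **bold** and *italic* are left untouched in the cell text.
    public static string[] SplitRow(string line)
    {
        string trimmed = line.Trim();
        // Drop the optional leading and trailing pipe before splitting.
        if (trimmed.StartsWith("|"))
        {
            trimmed = trimmed.Substring(1);
        }
        if (trimmed.EndsWith("|"))
        {
            trimmed = trimmed.Substring(0, trimmed.Length - 1);
        }
        return trimmed.Split('|').Select(cell => cell.Trim()).ToArray();
    }
}
```

For a row such as | **Name** | *Qty* |, SplitRow returns two cells, "**Name**" and "*Qty*", with the inline style markers still in place.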
