PDF Table Extractor to CSV/JSON

Extract tables from PDFs with OpenDataLoader and export them as structured JSON, flat CSV, or HTML tables

Use OpenDataLoader table extraction to identify semantic table blocks in a PDF and export them as JSON, CSV, or HTML. This is useful for reports, statements, research PDFs, and other data-heavy documents whose tables need to be reused elsewhere.

Example Results

1 example

Extract report tables for spreadsheet analysis

Pull semantic tables from a PDF and export them in a data-friendly format for analysts or QA teams.

Output file: pdf-table-extractor-to-csv-json-example1.json
Input parameters:
{ "pdfFile": "/public/samples/pdf/financial-report-example1.pdf", "exportFormat": "json", "tableMethod": "cluster", "pages": "", "useStructTree": false }

Maximum file size: 10 MB. Supported formats: application/pdf.

Key Facts

Category: Data & Tables
Input Types: file, select, text, checkbox
Output Type: file
Sample Coverage: 4
API Ready: Yes

Overview

Extract tabular data from PDF documents and convert it into structured JSON, flat CSV, or HTML using OpenDataLoader. The tool identifies semantic table blocks within your PDF and preserves their row and column structure, so you can reuse data from financial reports, research papers, and statements without manual data entry.

When to Use

  • Extracting financial data from annual reports into spreadsheets for analysis.
  • Converting research paper data tables into machine-readable JSON for database ingestion.
  • Pulling tabular line items from digital invoices or statements into flat CSV files.

How It Works

  • Upload your PDF file containing the tables you want to extract.
  • Select your preferred export format (JSON, CSV, or HTML) and specify page ranges if needed.
  • Choose the table detection method (Default or Cluster) and optionally enable the PDF structure tree.
  • Download the extracted tables in your chosen format for immediate use.

Use Cases

  • Financial analysts extracting balance sheets and income statements from corporate PDF reports into CSV for Excel modeling.
  • Data scientists converting statistical tables from academic PDFs into structured JSON for programmatic analysis.
  • Operations teams pulling tabular line items from digital purchase orders into HTML for web-based previews.

Examples

1. Extracting financial report tables to CSV

Financial Analyst
Background: Needs to analyze quarterly earnings data locked inside a 50-page corporate PDF report.
Problem: Manually copying and pasting tables from the PDF to Excel breaks the formatting and merges columns.
How to Use: Upload the PDF, set the Export Format to CSV, and specify the exact pages containing the financial tables.
Example Config: Export Format: CSV, Pages: 12-15, Table Detection Method: Cluster
Outcome: A flat CSV file containing the extracted table data, ready to be imported directly into spreadsheet software without formatting errors.

2. Converting research data to JSON

Data Engineer
Background: Building a pipeline to ingest tabular data from hundreds of academic research PDFs.
Problem: Needs programmatic access to table contents, including bounding boxes and page numbers, which standard text extraction misses.
How to Use: Upload the research PDF, select JSON as the export format, and enable Use Struct Tree for better accuracy.
Example Config: Export Format: JSON, Use Struct Tree: true
Outcome: A structured JSON file detailing every table, row, column, and cell value, along with spatial bounding box coordinates.

Try with Samples

json, csv, html

FAQ

What export formats are supported?

You can export extracted tables as structured JSON, flat CSV, or HTML tables.

Can I extract tables from specific pages only?

Yes, you can specify a page range (e.g., 1,3,5-7) to limit extraction to specific parts of the document.
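
If you want to validate a page specification before submitting it, a small helper can expand it into explicit page numbers. The sketch below is illustrative only: the function name `parse_pages` is hypothetical, and treating an empty string as "all pages" is an assumption based on the sample parameters, not documented behavior of the tool.

```python
def parse_pages(spec: str) -> list[int]:
    """Expand a page spec such as "1,3,5-7" into sorted, unique page numbers.

    An empty spec yields an empty list, which this sketch interprets as
    "no restriction" (an assumption, not documented tool behavior).
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty segments such as a trailing comma
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start < 1 or end < start:
                raise ValueError(f"invalid page range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page number: {part!r}")
            pages.add(page)
    return sorted(pages)


print(parse_pages("1,3,5-7"))  # [1, 3, 5, 6, 7]
```

Rejecting malformed ranges locally avoids a round trip to the service for input that cannot be valid.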

What is the difference between JSON and CSV output?

JSON retains metadata such as page numbers, bounding boxes, and grid structure. CSV flattens the data into a simple format with table, page, row, column, and value fields.
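
To make the difference concrete, here is a rough sketch of how nested table data could be flattened into one record per cell. The JSON shape used here (`tables`, `page`, `rows`) is a hypothetical stand-in, since the tool's actual JSON schema is not shown on this page, and the sample values are made up for illustration.

```python
import csv
import io


def flatten_tables(document: dict) -> str:
    """Flatten nested tables into CSV text with one record per cell.

    Assumes a hypothetical shape: {"tables": [{"page": int, "rows": [[str, ...]]}]}.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["table", "page", "row", "column", "value"])
    for table_index, table in enumerate(document["tables"], start=1):
        for row_index, row in enumerate(table["rows"], start=1):
            for column_index, value in enumerate(row, start=1):
                writer.writerow(
                    [table_index, table["page"], row_index, column_index, value]
                )
    return buffer.getvalue()


# Made-up sample data, not real tool output.
sample = {"tables": [{"page": 12, "rows": [["Quarter", "Revenue"], ["Q1", "1,200"]]}]}
print(flatten_tables(sample))
```

The flat layout loses bounding boxes and other metadata, which is why JSON is the better choice when spatial information matters.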

What does the Use Struct Tree option do?

It leverages the internal structural tags of the PDF (if available) to improve the accuracy of table boundary detection.

What are the table detection methods?

You can choose between Default and Cluster methods. The Cluster method groups text elements based on spatial proximity to identify table grids.
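
As a loose illustration of what grouping by spatial proximity means, the sketch below clusters text baselines into rows whenever neighboring values lie within a small gap. This is a generic one-dimensional technique with a hypothetical function name and made-up coordinates; it is not OpenDataLoader's actual algorithm, which is not described on this page.

```python
def cluster_by_gap(values: list[float], max_gap: float) -> list[list[float]]:
    """Group sorted values so that neighbors within max_gap share a cluster."""
    clusters: list[list[float]] = []
    for value in sorted(values):
        if clusters and value - clusters[-1][-1] <= max_gap:
            clusters[-1].append(value)  # close enough to the previous value
        else:
            clusters.append([value])  # gap too large: start a new cluster
    return clusters


# Made-up y-coordinates of text baselines on a page.
baselines = [100.0, 120.2, 100.5, 140.1, 120.0]
rows = cluster_by_gap(baselines, max_gap=2.0)
print(rows)  # [[100.0, 100.5], [120.0, 120.2], [140.1]]
```

Applying the same idea to x-coordinates would suggest column boundaries, and together the two groupings approximate a table grid.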

API Documentation

Request Endpoint

POST /en/api/tools/pdf-table-extractor-to-csv-json

Request Parameters

Parameter Name   Type                     Required   Description
pdfFile          file (Upload required)   Yes        -
exportFormat     select                   No         -
tableMethod      select                   No         -
pages            text                     No         -
useStructTree    checkbox                 No         -

File parameters must be uploaded first: send the file to POST /upload/pdf-table-extractor-to-csv-json to obtain a filePath, then pass that filePath as the value of the corresponding file field (here, pdfFile).
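
The second step, calling the tool with an already uploaded file, might be prepared as in the sketch below. Several things here are assumptions rather than documented facts: the host `https://elysiatools.com` is borrowed from the MCP URL on this page, the JSON request body and `Content-Type` header are guesses at the encoding, and `"cluster"` is simply the value used in the sample parameters. The upload step is omitted because its request format is not documented here.

```python
import json
import urllib.request

BASE_URL = "https://elysiatools.com"  # assumption: host taken from the MCP URL
TOOL_ID = "pdf-table-extractor-to-csv-json"


def build_tool_request(
    file_path: str,
    export_format: str = "json",
    table_method: str = "cluster",
    pages: str = "",
    use_struct_tree: bool = False,
) -> urllib.request.Request:
    """Prepare (but do not send) the POST request for the extraction endpoint."""
    payload = {
        "pdfFile": file_path,  # the filePath returned by the upload step
        "exportFormat": export_format,
        "tableMethod": table_method,
        "pages": pages,
        "useStructTree": use_struct_tree,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/en/api/tools/{TOOL_ID}",
        data=json.dumps(payload).encode("utf-8"),  # assumption: JSON body
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder path standing in for whatever the upload step returns.
request = build_tool_request("<filePath from upload>", export_format="csv", pages="12-15")
# Sending is left to the caller, e.g. urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the parameter mapping easy to inspect and test without network access.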

Response Format

{
  "filePath": "/public/processing/randomid.ext",
  "fileName": "output.ext",
  "contentType": "application/octet-stream",
  "size": 1024,
  "metadata": {
    "key": "value"
  },
  "error": "Error message (optional)",
  "message": "Notification message (optional)"
}
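
A client could map this response onto a small typed structure, as in the following sketch. The class and function names are hypothetical, and treating a non-empty `error` field as a failure is an assumption about the API's semantics rather than something this page states.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToolResponse:
    file_path: Optional[str] = None
    file_name: Optional[str] = None
    content_type: Optional[str] = None
    size: Optional[int] = None
    metadata: dict = field(default_factory=dict)
    error: Optional[str] = None
    message: Optional[str] = None


def parse_response(body: str) -> ToolResponse:
    """Parse the JSON response body; raise if it carries an error message."""
    data = json.loads(body)
    response = ToolResponse(
        file_path=data.get("filePath"),
        file_name=data.get("fileName"),
        content_type=data.get("contentType"),
        size=data.get("size"),
        metadata=data.get("metadata") or {},
        error=data.get("error"),
        message=data.get("message"),
    )
    if response.error:  # assumption: a non-empty error means the call failed
        raise RuntimeError(response.error)
    return response


example_body = json.dumps(
    {
        "filePath": "/public/processing/randomid.ext",
        "fileName": "output.ext",
        "contentType": "application/octet-stream",
        "size": 1024,
        "metadata": {"key": "value"},
    }
)
parsed = parse_response(example_body)
print(parsed.file_name, parsed.size)  # output.ext 1024
```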

AI MCP Documentation

Add this tool to your MCP server configuration:

{
  "mcpServers": {
    "elysiatools-pdf-table-extractor-to-csv-json": {
      "name": "pdf-table-extractor-to-csv-json",
      "description": "Extract tables from PDFs with OpenDataLoader and export them as structured JSON, flat CSV, or HTML tables",
      "baseUrl": "https://elysiatools.com/mcp/sse?toolId=pdf-table-extractor-to-csv-json",
      "command": "",
      "args": [],
      "env": {},
      "isActive": true,
      "type": "sse"
    }
  }
}

You can chain multiple tools (up to 20) by listing their IDs in toolId, separated by commas, e.g.: `https://elysiatools.com/mcp/sse?toolId=png-to-webp,jpg-to-webp,gif-to-webp`.
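
If you generate MCP configurations programmatically, a helper along these lines could assemble the chained URL and enforce the 20-tool limit up front. The function name `build_mcp_url` is hypothetical; the URL pattern and the limit come from the text above.

```python
MCP_SSE_URL = "https://elysiatools.com/mcp/sse"
MAX_TOOLS = 20  # limit stated in the documentation above


def build_mcp_url(tool_ids: list[str]) -> str:
    """Join tool IDs with commas into a single MCP SSE URL."""
    if not tool_ids:
        raise ValueError("at least one tool ID is required")
    if len(tool_ids) > MAX_TOOLS:
        raise ValueError(f"at most {MAX_TOOLS} tools can be chained")
    # Commas are kept literal to match the documented URL pattern.
    return f"{MCP_SSE_URL}?toolId={','.join(tool_ids)}"


print(build_mcp_url(["pdf-table-extractor-to-csv-json"]))
```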

File parameters accept either a URL pointing to the file or Base64-encoded file content.

If you encounter any issues, please contact us at [email protected]