PDF to JSON Structure Explorer

Extract structured OpenDataLoader JSON from a PDF and browse headings, paragraphs, tables, lists, pages, and bounding boxes in an explorer view

Run OpenDataLoader JSON extraction and render an explorer view of the semantic nodes in the PDF. This is useful for debugging heading hierarchy, checking tables, verifying page metadata, and understanding what the parser actually saw.

Example Results

1 example

Explore the semantic structure of a PDF brand guide

Inspect headings, paragraphs, tables, and bounding boxes without reading raw JSON manually.

Explorer report showing 20 semantic nodes from brand-guidelines-pdf-example1.pdf with page metadata, node counts, and JSON preview.
Input parameters:
{
  "pdfFile": "/public/samples/pdf/brand-guidelines-pdf-example1.pdf",
  "useStructTree": true,
  "sanitizeSensitiveData": false,
  "pages": "",
  "nodeFilter": "all",
  "searchTerm": ""
}


Maximum file size: 10MB. Supported formats: application/pdf.

Key Facts

Category
Developer & Web
Input Types
file, checkbox, text, select
Output Type
html
Sample Coverage
4
API Ready
Yes

Overview

The PDF to JSON Structure Explorer allows developers and data engineers to extract OpenDataLoader JSON from PDF documents and visualize the semantic structure in an interactive HTML view. By rendering headings, paragraphs, tables, lists, and bounding boxes, this tool makes it easy to debug parser outputs, verify page metadata, and inspect the exact hierarchy of extracted document elements without manually reading raw JSON.

When to Use

  • When you need to debug the heading hierarchy and semantic parsing of a complex PDF document.
  • When verifying if tables and lists are correctly identified and extracted by the OpenDataLoader parser.
  • When inspecting bounding box coordinates and page metadata for specific text nodes within a PDF.

How It Works

  • Upload a PDF file to initiate the OpenDataLoader JSON extraction process.
  • Optionally specify page ranges, toggle the structural tree usage, or apply a node filter to isolate headings, tables, or lists.
  • Enter a search term to quickly locate specific content or enable sensitive data sanitization if required.
  • View the generated HTML explorer report to interactively browse the extracted semantic nodes, page metadata, and JSON previews.
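The steps above map directly onto the tool's input parameters. As a sketch, a small helper might assemble them before submission; the field names come from this page's documented inputs, while the exact set of node-filter values beyond "all", "heading", and "table" (which appear elsewhere on this page) is an assumption:

```python
def build_explorer_params(pdf_file_path, use_struct_tree=True,
                          sanitize_sensitive_data=False, pages="",
                          node_filter="all", search_term=""):
    """Assemble request parameters for the structure explorer.

    Field names mirror the tool's documented inputs. The "list"
    filter value is an assumption based on the FAQ; "all",
    "heading", and "table" appear in this page's examples.
    """
    valid_filters = {"all", "heading", "table", "list"}
    if node_filter not in valid_filters:
        raise ValueError(f"unknown nodeFilter: {node_filter!r}")
    return {
        "pdfFile": pdf_file_path,
        "useStructTree": use_struct_tree,
        "sanitizeSensitiveData": sanitize_sensitive_data,
        "pages": pages,
        "nodeFilter": node_filter,
        "searchTerm": search_term,
    }

# e.g. isolate tables on a page range, as in the financial-report example
params = build_explorer_params("/uploads/report.pdf",
                               pages="12-15", node_filter="table")
print(params["nodeFilter"])
```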

Use Cases

  • Debugging PDF parsing pipelines to ensure accurate extraction of nested headings and paragraphs.
  • Auditing financial reports or research papers to confirm that tabular data is correctly recognized as table nodes.
  • Reviewing document bounding boxes to map extracted text back to its exact visual location on the original PDF page.

Examples

1. Exploring a Brand Guidelines PDF

Data Engineer
Background
A data engineer is building an ingestion pipeline for corporate brand guidelines and needs to ensure the parser correctly identifies section headers.
Problem
Reading raw JSON output to verify heading hierarchies is tedious and error-prone.
How to Use
Upload the brand guidelines PDF, leave 'Use Struct Tree' enabled, and set the Node Filter to 'Headings only'.
Example Config
{
  "useStructTree": true,
  "nodeFilter": "heading"
}
Outcome
An HTML explorer view is generated, displaying only the heading nodes, allowing quick verification of the document's structural hierarchy.

2. Verifying Table Extraction in Financial Reports

Financial Analyst
Background
An analyst needs to extract quarterly earnings tables from a 50-page PDF report.
Problem
It is unclear if the parser is correctly identifying the complex financial tables on specific pages.
How to Use
Upload the PDF, specify the exact pages containing the tables (e.g., '12-15'), and set the Node Filter to 'Tables only'.
Example Config
{
  "pages": "12-15",
  "nodeFilter": "table"
}
Outcome
The explorer view isolates and displays only the table nodes from pages 12 to 15, confirming accurate tabular data extraction.


FAQ

What formats are supported for upload?

Only PDF files (application/pdf) are accepted for extraction.

Can I filter the output to only show tables?

Yes, you can use the Node Filter option to display only tables, headings, or lists.

What is the 'Use Struct Tree' option?

It tells the parser to utilize the PDF's internal structural tags, if available, to improve the accuracy of semantic extraction.

Can I extract data from specific pages only?

Yes, you can input a page range like '1,3,5-7' in the Pages field to limit the extraction to those specific pages.
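To illustrate the page-range syntax, here is a hypothetical helper that expands a spec like '1,3,5-7' into a page list. This is only a sketch of how such a field might be interpreted; the tool's actual parser may differ (e.g. in how it handles duplicates or whitespace):

```python
def expand_pages(spec):
    """Expand a page spec like '1,3,5-7' into [1, 3, 5, 6, 7].

    Illustrative only: mirrors the syntax shown in the FAQ, not the
    tool's internal implementation.
    """
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate trailing commas
        if "-" in part:
            start, end = part.split("-", 1)
            pages.extend(range(int(start), int(end) + 1))
        else:
            pages.append(int(part))
    return pages

print(expand_pages("1,3,5-7"))  # [1, 3, 5, 6, 7]
```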

Does this tool output raw JSON?

The tool generates an interactive HTML explorer view that visualizes the semantic nodes, which includes previews of the underlying JSON data.

API Documentation

Request Endpoint

POST /en/api/tools/pdf-to-json-structure-explorer

Request Parameters

| Parameter Name | Type | Required | Description |
| --- | --- | --- | --- |
| pdfFile | file (upload required) | Yes | - |
| useStructTree | checkbox | No | - |
| sanitizeSensitiveData | checkbox | No | - |
| pages | text | No | - |
| nodeFilter | select | No | - |
| searchTerm | text | No | - |

File parameters must be uploaded first via POST /upload/pdf-to-json-structure-explorer to obtain a filePath, which is then passed in the corresponding file field.
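The two-step flow can be sketched as follows. This is a sketch under assumptions, not a definitive client: the host is taken from the MCP baseUrl on this page, and both the multipart field name ("file") and the upload response shape beyond "filePath" are guesses that may need adjusting:

```python
BASE = "https://elysiatools.com"  # host assumed from the MCP baseUrl
TOOL_URL = BASE + "/en/api/tools/pdf-to-json-structure-explorer"
UPLOAD_URL = BASE + "/upload/pdf-to-json-structure-explorer"

def explore_pdf(local_pdf_path, **extra_params):
    """Upload a PDF, then run the structure explorer on it.

    Assumptions: the upload endpoint accepts a multipart field named
    "file" and responds with {"filePath": ...}. Only "filePath" is
    confirmed by the docs; adjust the rest as needed.
    """
    import requests  # deferred so the URL constants work without the dependency

    with open(local_pdf_path, "rb") as fh:
        up = requests.post(UPLOAD_URL, files={"file": fh})
    up.raise_for_status()
    file_path = up.json()["filePath"]

    body = {"pdfFile": file_path, "useStructTree": True,
            "nodeFilter": "all", **extra_params}
    resp = requests.post(TOOL_URL, json=body)
    resp.raise_for_status()
    return resp.json()["result"]  # the HTML explorer report
```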

Response Format

{
  "result": "Processed HTML content",
  "error": "Error message (optional)",
  "message": "Notification message (optional)",
  "metadata": { "key": "value" }
}
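Given that shape, a caller might unwrap the response like this. A minimal sketch: it only distinguishes the "result" and "error" fields, while a production client might also surface the optional "message" and "metadata":

```python
def extract_result(payload):
    """Pull the HTML report out of a response body, raising on error.

    Mirrors the documented response shape: "result" holds the
    processed HTML; "error" is present only on failure.
    """
    if payload.get("error"):
        raise RuntimeError(payload["error"])
    return payload["result"]

print(extract_result({"result": "<section>20 semantic nodes</section>"}))
```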

AI MCP Documentation

Add this tool to your MCP server configuration:

{
  "mcpServers": {
    "elysiatools-pdf-to-json-structure-explorer": {
      "name": "pdf-to-json-structure-explorer",
      "description": "Extract structured OpenDataLoader JSON from a PDF and browse headings, paragraphs, tables, lists, pages, and bounding boxes in an explorer view",
      "baseUrl": "https://elysiatools.com/mcp/sse?toolId=pdf-to-json-structure-explorer",
      "command": "",
      "args": [],
      "env": {},
      "isActive": true,
      "type": "sse"
    }
  }
}

You can chain multiple tools, e.g.: `https://elysiatools.com/mcp/sse?toolId=png-to-webp,jpg-to-webp,gif-to-webp`, max 20 tools.
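As an illustration of the chaining rule, a hypothetical helper could build the baseUrl and enforce the documented 20-tool cap (the URL pattern and limit come from this page; the helper itself is not part of the tool):

```python
def mcp_sse_url(tool_ids):
    """Build a chained MCP SSE baseUrl from tool IDs.

    The ?toolId= pattern and the 20-tool maximum are documented;
    everything else here is illustrative.
    """
    if not 1 <= len(tool_ids) <= 20:
        raise ValueError("between 1 and 20 tool IDs are supported")
    return "https://elysiatools.com/mcp/sse?toolId=" + ",".join(tool_ids)

print(mcp_sse_url(["png-to-webp", "jpg-to-webp", "gif-to-webp"]))
# https://elysiatools.com/mcp/sse?toolId=png-to-webp,jpg-to-webp,gif-to-webp
```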

Supports URL file links or Base64 encoding for file parameters.

If you encounter any issues, please contact us at [email protected]