PDF Image & Caption Extractor

Extract images from PDFs, match nearby captions, and generate an HTML index package using OpenDataLoader

Use OpenDataLoader image export and semantic JSON output to build a report of PDF images, nearby captions, and page-level metadata. This is useful for textbooks, reports, presentations, and design documents where figures need to be reviewed or reused.

Example Results

1 example

Extract visual assets and nearby captions from a PDF

Generate an HTML image index that shows extracted figures with their best matching captions.

Output file: pdf-image-caption-extractor-example1.html
Input parameters:
{ "pdfFile": "/public/samples/pdf/pdf-image-caption-extractor-source-example1.pdf", "imageFormat": "png", "pages": "", "useStructTree": true }

Maximum file size: 10 MB
Supported formats: application/pdf

Key Facts

Category: Images, Audio & Video
Input Types: file, select, text, checkbox
Output Type: html
Sample Coverage: 4
API Ready: Yes

Overview

The PDF Image & Caption Extractor automates the retrieval of visual assets from PDF documents while preserving their semantic context. By analyzing the document's internal structure, it pairs each extracted image with its corresponding caption and generates a comprehensive HTML index for easy review and asset management.

When to Use

  • When harvesting figures and diagrams from academic papers or textbooks for research databases.
  • When performing a visual audit of corporate reports to ensure all graphics are correctly labeled and documented.
  • When migrating content from legacy PDF manuals to digital asset management systems or web-based CMS.

How It Works

  • Upload a PDF file and optionally specify a page range or preferred image format like PNG or JPEG.
  • The tool parses the document's internal structure tree to identify embedded image objects and surrounding text blocks.
  • A semantic matching algorithm associates each image with the most relevant nearby text identified as a caption.
  • The system packages the extracted images and their metadata into a downloadable HTML index for offline browsing and reuse.
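
The caption-matching step can be pictured with a small geometric heuristic. This page does not document the tool's actual matching algorithm, so the sketch below is only an illustration under stated assumptions: the `Box` and `TextBlock` types, the top-left coordinate origin, and the `max_gap` threshold are all hypothetical. It picks the closest text block on the same page that overlaps the image horizontally.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Box:
    """Axis-aligned bounding box (hypothetical; origin assumed at top-left)."""
    page: int
    left: float
    top: float
    right: float
    bottom: float


@dataclass
class TextBlock:
    box: Box
    text: str


def horizontal_overlap(a: Box, b: Box) -> float:
    """Width of the horizontal span shared by two boxes (0 if none)."""
    return max(0.0, min(a.right, b.right) - max(a.left, b.left))


def vertical_gap(a: Box, b: Box) -> float:
    """Vertical distance between two boxes (0 if they overlap vertically)."""
    if b.top >= a.bottom:
        return b.top - a.bottom
    if a.top >= b.bottom:
        return a.top - b.bottom
    return 0.0


def best_caption(image: Box, blocks: List[TextBlock],
                 max_gap: float = 60.0) -> Optional[TextBlock]:
    """Return the nearest same-page block that overlaps the image horizontally."""
    candidates = [
        block for block in blocks
        if block.box.page == image.page
        and horizontal_overlap(image, block.box) > 0
        and vertical_gap(image, block.box) <= max_gap
    ]
    # min() with default=None handles the "no caption found" case.
    return min(candidates, key=lambda b: vertical_gap(image, b.box), default=None)
```

A real matcher would likely also weigh cues such as a "Figure 1:" prefix or structure-tree tags; this sketch uses distance alone.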

Use Cases

Academic Research: Extracting figures and table descriptions from scientific journals for literature reviews.
Technical Documentation: Collecting screenshots and instructional captions from software manuals for training materials.
Marketing Audits: Reviewing visual branding and associated copy across multiple PDF brochures and catalogs.

Examples

1. Academic Paper Figure Extraction

Research Assistant
Background: A research assistant needs to compile all charts and data visualizations from a 50-page scientific study for a presentation.
Problem: Manually cropping images and copying captions from a dense PDF is time-consuming and prone to error.
How to Use: Upload the study PDF, select PNG format, and ensure 'Use Struct Tree' is enabled to capture precise captions.
Outcome: A structured HTML report showing every chart alongside its original figure caption and page number.

2. Product Catalog Asset Audit

Content Manager
Background: A content manager is updating a website using images found in a high-resolution PDF product catalog.
Problem: Identifying which product description belongs to which image across hundreds of pages is difficult to track manually.
How to Use: Upload the catalog PDF and specify the page range for the specific product line being updated.
Outcome: A visual HTML index containing high-quality JPEG images paired with their corresponding product descriptions.

FAQ

What image formats are supported for extraction?

You can export extracted images in either PNG or JPEG format.

Can I extract images from specific pages only?

Yes. Use the Pages field to list individual page numbers or ranges, such as '1, 3, 5-7'.
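
To illustrate how a page specification like '1, 3, 5-7' expands, here is a minimal sketch of a parser. The function name `parse_pages` is hypothetical, and treating an empty string as "all pages" is an assumption based on the sample input, where `pages` is empty; the tool's own parsing rules are not documented on this page.

```python
from typing import List


def parse_pages(spec: str) -> List[int]:
    """Expand a spec such as '1, 3, 5-7' into a sorted list of page numbers.

    An empty spec returns an empty list, which this sketch reads as
    "no restriction" (assumption).
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range such as '5-7' is inclusive on both ends.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start < 1 or end < start:
                raise ValueError(f"invalid page range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page number: {part!r}")
            pages.add(page)
    return sorted(pages)
```

Checking the field locally before submitting can catch typos such as a reversed range ('7-5') early.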

What does the 'Use Struct Tree' option do?

It uses the PDF's internal logical structure (the structure tree) to match captions to images more accurately.

What is the final output of this tool?

The tool generates an HTML file that serves as a visual index of all extracted images and their matched captions.

Does it work with scanned PDFs?

It is designed for digital PDFs with text layers; scanned documents without OCR will not yield text captions.

API Documentation

Request Endpoint

POST /en/api/tools/pdf-image-caption-extractor

Request Parameters

Parameter Name | Type | Required | Description
pdfFile | file (upload required) | Yes | -
imageFormat | select | No | -
pages | text | No | -
useStructTree | checkbox | No | -

File-type parameters must be uploaded first via POST /upload/pdf-image-caption-extractor to obtain a filePath; then pass that filePath in the corresponding file field.
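
As a rough sketch of the second step (calling the tool once a filePath is available), the Python below builds the POST request from the parameters in the table. Several details are assumptions rather than documented facts: the host is taken from the MCP URL on this page, the body is assumed to be JSON, and the helper names are hypothetical. The upload step is left out because its request format is not described here.

```python
import json
import urllib.request

BASE_URL = "https://elysiatools.com"  # assumption: same host as the MCP URL
TOOL_ID = "pdf-image-caption-extractor"


def build_request(file_path, image_format="png", pages="", use_struct_tree=True):
    """Build the POST request; field names follow the parameter table."""
    payload = {
        "pdfFile": file_path,          # filePath returned by the upload step
        "imageFormat": image_format,
        "pages": pages,                # e.g. "1, 3, 5-7"; empty as in the sample
        "useStructTree": use_struct_tree,
    }
    return urllib.request.Request(
        f"{BASE_URL}/en/api/tools/{TOOL_ID}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption: JSON body
        method="POST",
    )


def run_tool(file_path, **options):
    """Send the request and decode the JSON response (makes a network call)."""
    with urllib.request.urlopen(build_request(file_path, **options)) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it possible to inspect the payload without touching the network.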

Response Format

{
  "result": "Processed HTML content",
  "error": "Error message (optional)",
  "message": "Notification message (optional)",
  "metadata": { "key": "value" }
}
HTML: HTML
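
A client might handle this response along the following lines. This is a minimal sketch: the `ToolError` class and both function names are hypothetical, and treating any non-empty `error` field as a failure is an assumption about the API's behavior.

```python
from pathlib import Path


class ToolError(RuntimeError):
    """Raised when the response carries an error message (hypothetical type)."""


def extract_html(response: dict) -> str:
    """Return the HTML in 'result', or raise if the response reports an error."""
    error = response.get("error")
    if error:
        raise ToolError(error)
    result = response.get("result")
    if not isinstance(result, str):
        raise ToolError("response has no 'result' string")
    return result


def save_index(response: dict, destination: str) -> Path:
    """Write the HTML index to disk and return its path."""
    path = Path(destination)
    path.write_text(extract_html(response), encoding="utf-8")
    return path
```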

AI MCP Documentation

Add this tool to your MCP server configuration:

{
  "mcpServers": {
    "elysiatools-pdf-image-caption-extractor": {
      "name": "pdf-image-caption-extractor",
      "description": "Extract images from PDFs, match nearby captions, and generate an HTML index package using OpenDataLoader",
      "baseUrl": "https://elysiatools.com/mcp/sse?toolId=pdf-image-caption-extractor",
      "command": "",
      "args": [],
      "env": {},
      "isActive": true,
      "type": "sse"
    }
  }
}

You can chain multiple tools by listing their IDs, separated by commas, in the `toolId` parameter, e.g. `https://elysiatools.com/mcp/sse?toolId=png-to-webp,jpg-to-webp,gif-to-webp` (maximum 20 tools).
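
For configurations generated by script, the chained URL can be assembled as sketched below. The endpoint and the 20-tool limit come from this page; the function name `build_mcp_url` is hypothetical.

```python
MCP_ENDPOINT = "https://elysiatools.com/mcp/sse"
MAX_TOOLS = 20  # limit stated on this page


def build_mcp_url(tool_ids):
    """Join tool IDs into the comma-separated toolId query parameter."""
    ids = [tool_id.strip() for tool_id in tool_ids if tool_id.strip()]
    if not ids:
        raise ValueError("at least one tool ID is required")
    if len(ids) > MAX_TOOLS:
        raise ValueError(f"at most {MAX_TOOLS} tools can be chained, got {len(ids)}")
    # Commas are kept literal, matching the example URL format.
    return f"{MCP_ENDPOINT}?toolId={','.join(ids)}"
```

For example, `build_mcp_url(["pdf-image-caption-extractor"])` yields the `baseUrl` used in the configuration shown earlier.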

Supports URL file links or Base64 encoding for file parameters.

If you encounter any issues, please contact us at [email protected]