PDF Page Range Extractor

Extract only selected PDF pages with OpenDataLoader and export the subset as Markdown, JSON, or text

Use OpenDataLoader to convert only the pages you care about from a long PDF. This is useful for appendix review, chapter extraction, report snippets, legal review packets, and partial AI ingestion workflows.

Example Results

1 example

Extract just the first two report pages

Select a focused page range from a longer PDF and export it as a reusable Markdown excerpt.

Output file: pdf-page-range-extractor-example1.md
Input parameters:
{ "pdfFile": "/public/samples/pdf/financial-report-example1.pdf", "exportFormat": "markdown", "pages": "1-2", "useStructTree": true, "keepLineBreaks": true, "includePageSeparators": true }

Maximum file size: 10 MB. Supported format: application/pdf.

Key Facts

Category: Developer & Web
Input Types: file, select, text, checkbox
Output Type: file
Sample Coverage: 4
API Ready: Yes

Overview

The PDF Page Range Extractor allows you to isolate specific pages from lengthy PDF documents and export them into clean Markdown, JSON, or plain text formats. Powered by OpenDataLoader, this tool is ideal for extracting targeted chapters, appendices, or specific data snippets without processing the entire file, making it perfect for streamlined AI ingestion and focused document review.

When to Use

  • When you need to extract a specific chapter or appendix from a massive PDF report.
  • When preparing targeted document snippets for AI context windows to save token costs.
  • When converting selected pages of legal or financial documents into structured Markdown or JSON.

How It Works

  • Upload your target PDF file into the tool.
  • Specify the exact pages you want to extract using a comma-separated list or range (e.g., 1,3,5-7).
  • Select your preferred export format (Markdown, JSON, or Text) and toggle structural options like keeping line breaks or page separators.
  • Run the extraction to download a new file containing only the parsed content from your specified pages.

Use Cases

  • Extracting financial tables from specific pages of an annual report for data analysis.
  • Pulling a single contract clause or addendum from a lengthy legal packet.
  • Isolating a specific research paper methodology section to feed into an LLM.

Examples

1. Extracting Executive Summary for AI Analysis

Data Analyst
Background
An analyst has a 150-page annual financial report but only needs the executive summary to feed into a language model.
Problem
Processing the entire 150-page PDF consumes too many tokens and introduces irrelevant data.
How to Use
Upload the financial report PDF, set the export format to Markdown, and input '1-2' in the Pages field.
Example Config
Export Format: markdown, Pages: 1-2, Include Page Separators: true
Outcome
A clean Markdown file containing only the first two pages, ready for AI ingestion.

2. Pulling Specific Clauses from a Legal Contract

Paralegal
Background
A paralegal needs to review the termination clauses located on pages 14 and 18 of a master service agreement.
Problem
Manually copying and pasting text from scattered PDF pages often breaks formatting and loses structural integrity.
How to Use
Upload the contract, select JSON as the export format, and enter '14,18' in the Pages field while enabling the structural tree option.
Example Config
Export Format: json, Pages: 14,18, Use Struct Tree: true
Outcome
A structured JSON file containing only the text from pages 14 and 18, preserving the logical reading order.

FAQ

What formats can I export the extracted pages to?

You can export the extracted PDF pages as Markdown, JSON, or plain text.

How do I format the page range input?

Use commas to separate individual pages and hyphens for ranges. For example, '1,3,5-7' will extract pages 1, 3, 5, 6, and 7.
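
As an illustration of that syntax, here is a minimal local sketch that expands a page spec into explicit page numbers. The function name `parse_pages` is hypothetical, and how the tool itself treats duplicates, whitespace, or reversed ranges is not documented here, so the validation choices below are assumptions.

```python
def parse_pages(spec: str) -> list[int]:
    """Expand a page spec such as '1,3,5-7' into a sorted list of page numbers."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # assumption: tolerate stray commas and surrounding spaces
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start < 1 or end < start:
                raise ValueError(f"invalid range: {part!r}")
            pages.update(range(start, end + 1))  # ranges are inclusive
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page: {part!r}")
            pages.add(page)
    return sorted(pages)


print(parse_pages("1,3,5-7"))  # [1, 3, 5, 6, 7]
```

Validating the spec locally before submitting it can catch typos such as '7-5' without spending a request.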

What does the 'Use Struct Tree' option do?

It utilizes the PDF's internal structural tags to better preserve the logical reading order and document hierarchy during extraction.

Can I keep the original line breaks from the PDF?

Yes, you can enable the 'Keep Line Breaks' option to maintain the original text wrapping of the document.

Will the output indicate where a new page starts?

Yes, if you enable 'Include Page Separators', the exported file will contain markers indicating the boundaries between the extracted pages.

API Documentation

Request Endpoint

POST /en/api/tools/pdf-page-range-extractor

Request Parameters

Parameter Name         Type                    Required  Description
pdfFile                file (upload required)  Yes       -
exportFormat           select                  No        -
pages                  text                    Yes       -
useStructTree          checkbox                No        -
keepLineBreaks         checkbox                No        -
includePageSeparators  checkbox                No        -

File-type parameters must be uploaded first via POST /upload/pdf-page-range-extractor to obtain a filePath; then pass that filePath in the corresponding file field.
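
The following sketch shows one way to assemble the request parameters once the upload step has returned a filePath. The parameter names come from the table above; the helper name `build_extract_request`, the sample filePath value, and the choice to serialize the body as JSON are assumptions, since the wire format is not specified here.

```python
import json
from typing import Optional


def build_extract_request(
    file_path: str,
    pages: str,
    export_format: Optional[str] = None,
    use_struct_tree: Optional[bool] = None,
    keep_line_breaks: Optional[bool] = None,
    include_page_separators: Optional[bool] = None,
) -> dict:
    """Build the parameter set for POST /en/api/tools/pdf-page-range-extractor."""
    if not file_path:
        raise ValueError("pdfFile is required")
    if not pages.strip():
        raise ValueError("pages is required")
    body = {"pdfFile": file_path, "pages": pages}
    optional = {
        "exportFormat": export_format,
        "useStructTree": use_struct_tree,
        "keepLineBreaks": keep_line_breaks,
        "includePageSeparators": include_page_separators,
    }
    # Send only the optional parameters the caller set, so no defaults are guessed.
    body.update({k: v for k, v in optional.items() if v is not None})
    return body


# Hypothetical filePath, standing in for the value returned by the upload step.
body = build_extract_request(
    "/public/processing/randomid.pdf",
    "14,18",
    export_format="json",
    use_struct_tree=True,
)
print(json.dumps(body, indent=2))  # assumption: the endpoint accepts a JSON body
```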

Response Format

{
  "filePath": "/public/processing/randomid.ext",
  "fileName": "output.ext",
  "contentType": "application/octet-stream",
  "size": 1024,
  "metadata": {
    "key": "value"
  },
  "error": "Error message (optional)",
  "message": "Notification message (optional)"
}
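
A response of this shape could be handled along the following lines. This is a sketch, not the service's client library: the `ExtractResult` class and `parse_response` function are hypothetical names, and it assumes that a non-empty "error" field signals failure and that "metadata" and "message" may be absent.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ExtractResult:
    file_path: str
    file_name: str
    content_type: str
    size: int
    metadata: dict = field(default_factory=dict)
    message: Optional[str] = None


def parse_response(raw: str) -> ExtractResult:
    """Turn the JSON response text into an ExtractResult, raising on errors."""
    data = json.loads(raw)
    if data.get("error"):  # assumption: present and non-empty only on failure
        raise RuntimeError(data["error"])
    return ExtractResult(
        file_path=data["filePath"],
        file_name=data["fileName"],
        content_type=data["contentType"],
        size=data["size"],
        metadata=data.get("metadata", {}),
        message=data.get("message"),
    )


sample = json.dumps({
    "filePath": "/public/processing/randomid.ext",
    "fileName": "output.ext",
    "contentType": "application/octet-stream",
    "size": 1024,
    "metadata": {"key": "value"},
})
result = parse_response(sample)
print(result.file_name, result.size)
```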

AI MCP Documentation

Add this tool to your MCP server configuration:

{
  "mcpServers": {
    "elysiatools-pdf-page-range-extractor": {
      "name": "pdf-page-range-extractor",
      "description": "Extract only selected PDF pages with OpenDataLoader and export the subset as Markdown, JSON, or text",
      "baseUrl": "https://elysiatools.com/mcp/sse?toolId=pdf-page-range-extractor",
      "command": "",
      "args": [],
      "env": {},
      "isActive": true,
      "type": "sse"
    }
  }
}

To chain multiple tools, list their IDs separated by commas in the toolId parameter (maximum 20 tools), e.g.: `https://elysiatools.com/mcp/sse?toolId=png-to-webp,jpg-to-webp,gif-to-webp`.
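
A small helper can build such a chained URL and enforce the 20-tool limit before the configuration is saved. The function name `build_mcp_url` is hypothetical; the base URL and comma-separated toolId format follow the example above.

```python
MCP_BASE = "https://elysiatools.com/mcp/sse"
MAX_TOOLS = 20  # limit stated for chained tools


def build_mcp_url(tool_ids: list[str]) -> str:
    """Join tool IDs into a single MCP SSE URL, e.g. ...?toolId=a,b,c."""
    if not tool_ids:
        raise ValueError("at least one tool ID is required")
    if len(tool_ids) > MAX_TOOLS:
        raise ValueError(f"at most {MAX_TOOLS} tools can be chained")
    # Commas are left unencoded to match the documented example URL.
    return f"{MCP_BASE}?toolId={','.join(tool_ids)}"


print(build_mcp_url(["pdf-page-range-extractor"]))
print(build_mcp_url(["png-to-webp", "jpg-to-webp", "gif-to-webp"]))
```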

Supports URL file links or Base64 encoding for file parameters.

If you encounter any issues, please contact us at [email protected]