
PDF OCR Text Layer

Add a searchable, copyable OCR text layer to a scanned PDF using Tesseract.

Run OCR on scanned PDFs and output a searchable PDF with a text layer.

How it works:

  • Rasterize each PDF page to an image (pdftoppm or Ghostscript)
  • Run Tesseract on each page to produce a searchable single-page PDF
  • Merge all OCR'd pages into one searchable PDF
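The three steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the tool's actual implementation: it assumes pdftoppm (Poppler) for rasterization, the tesseract CLI with its `pdf` output config for the per-page OCR, and pdfunite (also Poppler) for merging, since the page does not name a merge tool. The function names are hypothetical. Ghostscript could replace pdftoppm in step 1.

```python
import subprocess
import tempfile
from pathlib import Path


def rasterize_cmd(pdf: str, prefix: str, dpi: int = 300) -> list[str]:
    # pdftoppm writes one PNG per page: <prefix>-1.png, <prefix>-2.png, ...
    # (page numbers are zero-padded when the PDF has 10 or more pages).
    return ["pdftoppm", "-r", str(dpi), "-png", pdf, prefix]


def tesseract_cmd(image: str, out_base: str, lang: str = "eng",
                  oem: int = 1, psm: int = 3, dpi: int = 300) -> list[str]:
    # The trailing "pdf" config makes Tesseract write <out_base>.pdf:
    # the page image with an invisible, selectable text layer on top.
    return ["tesseract", image, out_base, "-l", lang, "--oem", str(oem),
            "--psm", str(psm), "--dpi", str(dpi), "pdf"]


def merge_cmd(page_pdfs: list[str], output: str) -> list[str]:
    # pdfunite concatenates the input PDFs in argument order (assumed merge tool).
    return ["pdfunite", *page_pdfs, output]


def page_number(path: Path) -> int:
    # "page-07.png" -> 7; sorting numerically keeps page 10 after page 9.
    return int(path.stem.rsplit("-", 1)[1])


def ocr_pdf(pdf: str, output: str, lang: str = "eng", dpi: int = 300,
            oem: int = 1, psm: int = 3) -> None:
    with tempfile.TemporaryDirectory() as tmp:
        # Step 1: rasterize every page to a PNG inside the temp directory.
        subprocess.run(rasterize_cmd(pdf, str(Path(tmp) / "page"), dpi), check=True)
        page_pdfs = []
        for image in sorted(Path(tmp).glob("page-*.png"), key=page_number):
            # Step 2: OCR each page image into its own searchable PDF.
            base = str(image.with_suffix(""))
            subprocess.run(tesseract_cmd(str(image), base, lang, oem, psm, dpi),
                           check=True)
            page_pdfs.append(base + ".pdf")
        # Step 3: merge the per-page PDFs into the final output file.
        subprocess.run(merge_cmd(page_pdfs, output), check=True)
```

With those tools installed, `ocr_pdf("scan.pdf", "scan-ocr.pdf", dpi=200, psm=6)` would mirror the settings of the second example below. Building each command as a list in its own function keeps the flags easy to inspect and avoids shell quoting issues.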

Example Results

2 examples

Standard OCR Layer for Scanned Report

Adds a searchable text layer using English OCR (eng) at 300 DPI, with the LSTM engine (oem=1) and Tesseract's default fully automatic page segmentation (psm=3).

Output: pdf-ocr-text-layer-example1.pdf
Input parameters:
{ "sourceFile": "/Users/quyue/www/elysia-tools/public/samples/pdf/pdf-2026-02-19-source-4pages.pdf", "language": "eng", "dpi": 300, "oem": 1, "psm": 3 }

Fast OCR with Lower DPI

Uses 200 DPI and psm=6 (treat each page as a single uniform block of text) for faster OCR and a smaller output file.

Output: pdf-ocr-text-layer-example2.pdf
Input parameters:
{ "sourceFile": "/Users/quyue/www/elysia-tools/public/samples/pdf/pdf-2026-02-19-source-4pages.pdf", "language": "eng", "dpi": 200, "oem": 1, "psm": 6 }
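To see why lowering the DPI speeds things up, the arithmetic below estimates the raster size of one page at each setting. It assumes an A4 page (210 x 297 mm); the sample file's actual page size is not stated, and a real rasterizer may round differently by a pixel. The helper `page_pixels` is hypothetical.

```python
# 1 inch = 25.4 mm; pixels along an edge = inches * DPI.
MM_PER_INCH = 25.4


def page_pixels(width_mm: float, height_mm: float, dpi: int) -> tuple[int, int]:
    """Approximate raster size (width, height) of one page at the given DPI."""
    return (round(width_mm / MM_PER_INCH * dpi),
            round(height_mm / MM_PER_INCH * dpi))


for dpi in (300, 200):
    w, h = page_pixels(210, 297, dpi)  # assumed A4 page
    print(f"{dpi} DPI: {w} x {h} px = {w * h / 1e6:.1f} MP")
# prints:
# 300 DPI: 2480 x 3508 px = 8.7 MP
# 200 DPI: 1654 x 2339 px = 3.9 MP
```

Under this assumption, each page at 200 DPI has roughly 44% as many pixels as at 300 DPI, which plausibly accounts for the faster processing and smaller output; the trade-off is less detail for small or faint text.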


Maximum file size: 500 MB. Supported format: application/pdf.

Key Facts

Category: PDF Tools
Input Types: file, text, number
Output Type: file
Sample Coverage: 4
API Ready: Yes


When to Use

  • Use it when a scanned or image-only PDF needs text that can be searched, selected, and copied.
  • Helpful for PDF workflows that need repeatable settings (language, DPI, engine and segmentation modes) and fast results.
  • A good fit when you want to test with real files before running the same workflow through the API.

How It Works

  • Provide the source PDF file and, optionally, the OCR languages, input DPI, OCR engine mode (oem), and page segmentation mode (psm).
  • The tool runs OCR on every page and returns the searchable PDF as a file.
  • Start with representative sample PDFs to check edge cases and output quality.

Use Cases

Make scanned PDFs searchable during debugging or QA.
Validate the output before moving to the API or an automation flow.
Test the workflow with representative sample files and edge cases.


FAQ

What does PDF OCR Text Layer do?

PDF OCR Text Layer runs Tesseract OCR on a scanned PDF and returns a PDF with a searchable, copyable text layer, without requiring a separate local script or app.

When should I use this tool?

Use it when you need a quick OCR pass on a scanned PDF, want to verify the output before automating, or need a browser-based utility for PDF tasks.

Can I try this tool with sample data?

Yes. This page can recommend related sample files so you can test the workflow immediately.

What inputs does PDF OCR Text Layer accept?

PDF OCR Text Layer accepts a source PDF file (its only file upload field) plus the optional parameters language, dpi, oem (OCR engine mode), and psm (page segmentation mode).

Is there an API for PDF OCR Text Layer?

Yes. The tool exposes an API endpoint, so you can move from manual testing to scripted usage.

API Documentation

Request Endpoint

POST /en/api/tools/pdf-ocr-text-layer

Request Parameters

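| Parameter | Type | Required | Description |
|---|---|---|---|
| sourceFile | file (upload required) | Yes | Source PDF file (the filePath returned by the upload endpoint) |
| language | text | No | OCR language(s) for Tesseract, e.g. eng |
| dpi | number | No | Input DPI used to rasterize each page |
| oem | number | No | Tesseract OCR engine mode |
| psm | number | No | Tesseract page segmentation mode |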
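Before calling the API, it can help to check parameters against the ranges Tesseract itself accepts (oem 0 to 3, psm 0 to 13). This is a sketch that assumes the service passes language, oem, and psm through to Tesseract unchanged, which the page does not state, and that multiple languages use Tesseract's `+` syntax such as `eng+deu`. The function `validate_params` is hypothetical.

```python
# Ranges accepted by the Tesseract 4/5 command line.
TESSERACT_OEM = range(0, 4)   # 0 legacy, 1 LSTM only, 2 legacy + LSTM, 3 default
TESSERACT_PSM = range(0, 14)  # 0 OSD only ... 3 fully automatic ... 13 raw line


def validate_params(params: dict) -> list[str]:
    """Return a list of problems with a request body; empty means it looks valid."""
    errors = []
    if "sourceFile" not in params:
        errors.append("sourceFile is required")
    lang = params.get("language")
    # Assumed syntax: one or more non-empty codes joined by "+".
    if lang is not None and (not isinstance(lang, str) or not all(lang.split("+"))):
        errors.append("language must be codes joined by '+', e.g. 'eng+deu'")
    dpi = params.get("dpi")
    if dpi is not None and (not isinstance(dpi, int) or dpi <= 0):
        errors.append("dpi must be a positive integer")
    for key, valid in (("oem", TESSERACT_OEM), ("psm", TESSERACT_PSM)):
        value = params.get(key)
        if value is not None and value not in valid:
            errors.append(f"{key} must be in {valid.start}..{valid.stop - 1}")
    return errors
```

Both example parameter sets on this page pass this check; a value such as `psm: 14` would be reported before any request is sent.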

File parameters must be uploaded first via POST /upload/pdf-ocr-text-layer. The upload returns a filePath; pass that value in the corresponding file field (sourceFile).
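The two-step flow (upload, then call the tool with the returned filePath) can be sketched with the Python standard library. Several details are assumptions not stated on this page: the base URL `https://elysiatools.com` (taken from the MCP configuration), the multipart form field name `file`, and a JSON request body for the tool call. The helper names are hypothetical, and the defaults in `build_payload` are the example 1 values, not documented server defaults.

```python
import json
import urllib.request
import uuid
from pathlib import Path

# Assumption: the API is served from the same host as the MCP endpoint.
BASE_URL = "https://elysiatools.com"


def encode_multipart(field: str, filename: str, data: bytes,
                     content_type: str) -> tuple[bytes, str]:
    """Build a single-file multipart/form-data body and its Content-Type header."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def post(url: str, body: bytes, content_type: str) -> dict:
    """POST a body and decode the JSON response."""
    req = urllib.request.Request(url, data=body, method="POST",
                                 headers={"Content-Type": content_type})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def upload_pdf(path: str) -> str:
    # Step 1: upload the file; the form field name "file" is a guess.
    body, ctype = encode_multipart("file", Path(path).name,
                                   Path(path).read_bytes(), "application/pdf")
    return post(f"{BASE_URL}/upload/pdf-ocr-text-layer", body, ctype)["filePath"]


def build_payload(file_path: str, language: str = "eng", dpi: int = 300,
                  oem: int = 1, psm: int = 3) -> dict:
    # Parameter names follow the request parameter table.
    return {"sourceFile": file_path, "language": language,
            "dpi": dpi, "oem": oem, "psm": psm}


def run_ocr(file_path: str, **options) -> dict:
    # Step 2: send the uploaded filePath to the tool endpoint (JSON body assumed).
    payload = json.dumps(build_payload(file_path, **options)).encode()
    return post(f"{BASE_URL}/en/api/tools/pdf-ocr-text-layer", payload,
                "application/json")
```

Usage would then be `run_ocr(upload_pdf("scan.pdf"), dpi=200, psm=6)`, matching the parameters of the "Fast OCR with Lower DPI" example.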

Response Format

{
  "filePath": "/public/processing/randomid.ext",
  "fileName": "output.ext",
  "contentType": "application/octet-stream",
  "size": 1024,
  "metadata": {
    "key": "value"
  },
  "error": "Error message (optional)",
  "message": "Notification message (optional)"
}
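A client can check the documented response fields before using the result. This is a minimal sketch: the format only marks `error` and `message` as optional, so treating a non-empty `error` as failure is an assumption, and the helper name `output_path` is hypothetical.

```python
import json


def output_path(response: dict) -> str:
    """Return filePath from a tool response, raising if an error is reported."""
    # Assumption: a non-empty "error" field means the call failed.
    if response.get("error"):
        raise RuntimeError(f"OCR failed: {response['error']}")
    if response.get("message"):
        print(f"Server message: {response['message']}")
    return response["filePath"]


# The sample values from the response format shown above.
sample = json.loads('{"filePath": "/public/processing/randomid.ext", '
                    '"fileName": "output.ext", '
                    '"contentType": "application/octet-stream", "size": 1024}')
print(output_path(sample))  # prints /public/processing/randomid.ext
```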

AI MCP Documentation

Add this tool to your MCP server configuration:

{
  "mcpServers": {
    "elysiatools-pdf-ocr-text-layer": {
      "name": "pdf-ocr-text-layer",
      "description": "Add searchable/copyable OCR text layer to scanned PDF using Tesseract",
      "baseUrl": "https://elysiatools.com/mcp/sse?toolId=pdf-ocr-text-layer",
      "command": "",
      "args": [],
      "env": {},
      "isActive": true,
      "type": "sse"
    }
  }
}

You can chain multiple tools (up to 20) by listing their IDs separated by commas, e.g. `https://elysiatools.com/mcp/sse?toolId=png-to-webp,jpg-to-webp,gif-to-webp`.
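Building a chained URL programmatically is a matter of joining the tool IDs with commas. This sketch enforces the 20-tool limit on the client side; how the server itself handles longer lists is not stated, and the function name `chained_mcp_url` is hypothetical.

```python
MAX_CHAINED_TOOLS = 20  # limit stated for chained MCP tool lists
MCP_SSE_URL = "https://elysiatools.com/mcp/sse"


def chained_mcp_url(tool_ids: list[str]) -> str:
    """Join tool IDs with commas into one MCP SSE URL."""
    if not tool_ids:
        raise ValueError("at least one toolId is required")
    if len(tool_ids) > MAX_CHAINED_TOOLS:
        raise ValueError(f"at most {MAX_CHAINED_TOOLS} tools, got {len(tool_ids)}")
    return f"{MCP_SSE_URL}?toolId={','.join(tool_ids)}"


print(chained_mcp_url(["png-to-webp", "jpg-to-webp", "gif-to-webp"]))
# prints https://elysiatools.com/mcp/sse?toolId=png-to-webp,jpg-to-webp,gif-to-webp
```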

File parameters can be supplied as URL links or as Base64-encoded content.
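Encoding a local PDF as Base64 takes one standard-library call. Whether the MCP server expects a bare Base64 string or a `data:` URL is not documented here, so this sketch supports both; the function name `encode_file_param` is hypothetical.

```python
import base64


def encode_file_param(data: bytes, as_data_url: bool = False) -> str:
    """Base64-encode file bytes, optionally wrapped as a data: URL (format assumed)."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:application/pdf;base64,{encoded}" if as_data_url else encoded


print(encode_file_param(b"%PDF"))  # prints JVBERg==
```

For a real file, pass its bytes, e.g. `encode_file_param(Path("scan.pdf").read_bytes())` with `from pathlib import Path`.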

If you encounter any issues, please contact us at [email protected]