PDF OCR Text Layer

Add a searchable, copyable OCR text layer to scanned PDFs using Tesseract

Run OCR on a scanned PDF and output a searchable PDF with a text layer.

How it works:

  • Rasterize each PDF page to an image (pdftoppm or Ghostscript)
  • Run Tesseract per page to generate searchable page PDFs
  • Merge all OCR pages into one searchable PDF

Example Results

Standard OCR Layer for Scanned Report

Adds a searchable text layer using English OCR at 300 DPI with the default segmentation mode (psm 3)

Output file: pdf-ocr-text-layer-example1.pdf
Input parameters:
{ "sourceFile": "/Users/quyue/www/elysia-tools/public/samples/pdf/pdf-2026-02-19-source-4pages.pdf", "language": "eng", "dpi": 300, "oem": 1, "psm": 3 }

Fast OCR with Lower DPI

Uses 200 DPI and psm=6 for faster OCR processing and a smaller output file

Output file: pdf-ocr-text-layer-example2.pdf
Input parameters:
{ "sourceFile": "/Users/quyue/www/elysia-tools/public/samples/pdf/pdf-2026-02-19-source-4pages.pdf", "language": "eng", "dpi": 200, "oem": 1, "psm": 6 }

Maximum file size: 500MB
Supported formats: application/pdf

Key Facts

Category: Documents & PDF
Input Types: file, text, number
Output Type: file
Sample Coverage: 4
API Ready: Yes

Overview

The PDF OCR Text Layer tool turns scanned, image-based PDFs into searchable, copyable documents. It runs the Tesseract engine on each page to recognize text, then produces a new PDF in which a hidden, selectable text layer sits over each page image.

When to Use

  • When you need to search for specific keywords within a scanned document or image-based PDF.
  • When you want to copy and paste text from a document that was originally saved as a flat image.
  • When you need to archive physical paperwork digitally while maintaining the ability to index and retrieve information.

How It Works

  • The tool rasterizes each page of your uploaded PDF into high-resolution images.
  • Tesseract OCR analyzes the images to identify characters and text layout based on your selected language.
  • The tool generates a new PDF for each page, placing the recognized text as an invisible layer over the page image so that the text can be searched and selected.
  • All processed pages are merged into a single, cohesive PDF document ready for download.
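
The steps above can be sketched locally in Python. This is a minimal illustration under stated assumptions, not the tool's actual implementation: it assumes `pdftoppm`, `tesseract`, and `pdfunite` are installed, and the function names and temporary file naming are hypothetical. The page names pdftoppm or Ghostscript for rasterizing but does not say which utility performs the merge, so `pdfunite` is an assumed choice.

```python
import subprocess
import tempfile
from pathlib import Path


def rasterize_cmd(pdf: str, prefix: str, dpi: int = 300) -> list[str]:
    """Command that renders every page of `pdf` to PNG files named <prefix>-<n>.png."""
    return ["pdftoppm", "-r", str(dpi), "-png", pdf, prefix]


def ocr_cmd(image: str, out_base: str, language: str = "eng",
            dpi: int = 300, oem: int = 1, psm: int = 3) -> list[str]:
    """Command that OCRs one page image into <out_base>.pdf with a text layer."""
    return ["tesseract", image, out_base, "-l", language,
            "--dpi", str(dpi), "--oem", str(oem), "--psm", str(psm), "pdf"]


def merge_cmd(page_pdfs: list[str], output: str) -> list[str]:
    """Command that concatenates the per-page PDFs (pdfunite is an assumed choice)."""
    return ["pdfunite", *page_pdfs, output]


def page_number(path: Path) -> int:
    """Read the numeric suffix pdftoppm appends, so pages sort numerically."""
    return int(path.stem.rsplit("-", 1)[1])


def add_text_layer(pdf: str, output: str, **ocr_options) -> None:
    """Run the three steps end to end; needs the external tools on PATH."""
    with tempfile.TemporaryDirectory() as tmp:
        prefix = str(Path(tmp) / "page")
        subprocess.run(rasterize_cmd(pdf, prefix, ocr_options.get("dpi", 300)), check=True)
        images = sorted(Path(tmp).glob("page-*.png"), key=page_number)
        page_pdfs = []
        for image in images:
            out_base = str(image.with_suffix(""))  # tesseract appends ".pdf" itself
            subprocess.run(ocr_cmd(str(image), out_base, **ocr_options), check=True)
            page_pdfs.append(out_base + ".pdf")
        subprocess.run(merge_cmd(page_pdfs, output), check=True)
```

A call such as `add_text_layer("scan.pdf", "searchable.pdf", language="eng", dpi=300, oem=1, psm=3)` would mirror the first example configuration. Sorting by the numeric suffix keeps page 10 from landing before page 2 in the merged file.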

Use Cases

  • Digitizing historical archives or paper records for keyword-based indexing.
  • Extracting data from scanned invoices or receipts for easier record-keeping.
  • Converting non-selectable academic papers or reports into accessible, searchable documents.

Examples

1. Standard OCR for Scanned Reports

Administrative Assistant
Background: Received a 50-page scanned report that is currently just a collection of images.
Problem: Cannot search for specific project names or copy text from the report.
How to Use: Upload the PDF, set DPI to 300, and use the default OCR settings.
Example Config: language: eng, dpi: 300, oem: 1, psm: 3
Outcome: A searchable PDF where all text can be highlighted, copied, and indexed by search tools.

2. Fast Processing for Large Documents

Researcher
Background: Needs to process a large volume of scanned documents quickly for preliminary review.
Problem: High-resolution processing is too slow for the current workflow.
How to Use: Upload the PDF and adjust the DPI and segmentation mode for faster output.
Example Config: language: eng, dpi: 200, oem: 1, psm: 6
Outcome: A searchable PDF generated in less time with a smaller file size, suitable for quick text extraction.
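
The two example configurations differ only in dpi and psm. In Tesseract's command-line interface, psm 3 is fully automatic page segmentation without orientation and script detection (the default), psm 6 assumes a single uniform block of text, and oem 1 selects the LSTM neural network engine. The sketch below captures both configurations and checks values against Tesseract's documented ranges (psm 0 to 13, oem 0 to 3). The `OcrConfig` class is hypothetical, and whether this tool enforces the same limits is an assumption.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class OcrConfig:
    """Hypothetical container for the tool's four optional parameters."""
    language: str = "eng"
    dpi: int = 300
    oem: int = 1
    psm: int = 3

    def __post_init__(self) -> None:
        # Ranges follow the Tesseract CLI; the tool's own limits are not documented.
        if not self.language:
            raise ValueError("language must not be empty")
        if self.dpi <= 0:
            raise ValueError("dpi must be positive")
        if not 0 <= self.oem <= 3:
            raise ValueError("oem must be between 0 and 3")
        if not 0 <= self.psm <= 13:
            raise ValueError("psm must be between 0 and 13")


STANDARD = OcrConfig()             # example 1: 300 DPI, psm 3
FAST = OcrConfig(dpi=200, psm=6)   # example 2: 200 DPI, psm 6

print(asdict(FAST))
```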

FAQ

What file formats are supported?

This tool specifically supports PDF files.

Can I process documents in languages other than English?

Yes, you can specify language codes like 'eng' or 'eng+chi_sim' in the OCR Languages field.

What is the recommended DPI for best results?

A DPI of 300 is generally recommended for high-accuracy OCR results.
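
To see what the DPI setting means in pixels, the short calculation below converts a page size in inches to image dimensions. It assumes a US Letter page (8.5 x 11 inches) purely for illustration; `page_pixels` is a hypothetical helper.

```python
def page_pixels(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page rasterized at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


letter_300 = page_pixels(8.5, 11, 300)   # (2550, 3300)
letter_200 = page_pixels(8.5, 11, 200)   # (1700, 2200)

# Pixel count scales with the square of the DPI ratio: (200 / 300) ** 2 ≈ 0.44
ratio = (letter_200[0] * letter_200[1]) / (letter_300[0] * letter_300[1])
print(letter_300, letter_200, round(ratio, 2))
```

At 200 DPI each page has under half the pixels of a 300 DPI page, which is consistent with the faster processing and smaller output described in the second example.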

Does this tool change the visual appearance of my PDF?

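No. The recognized text is added as a transparent layer over each page image, so the visual layout remains unchanged. Because pages are rasterized at the selected DPI, a lower DPI may produce a softer page image.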

Is there a limit to the file size?

The tool supports files up to 500MB.

API Documentation

Request Endpoint

POST /en/api/tools/pdf-ocr-text-layer

Request Parameters

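Parameter Name  Type                    Required  Description
sourceFile      file (upload required)  Yes       Scanned PDF to process (filePath from the upload step)
language        text                    No        OCR language code(s), e.g. eng or eng+chi_sim
dpi             number                  No        Rasterization resolution, e.g. 300
oem             number                  No        Tesseract OCR engine mode, e.g. 1
psm             number                  No        Tesseract page segmentation mode, e.g. 3 or 6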

File parameters must be uploaded first: POST the file to /upload/pdf-ocr-text-layer to obtain a filePath, then pass that filePath in the corresponding file field (here, sourceFile).
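
As a rough sketch, the processing request could be assembled in Python as follows. Several details are assumptions rather than documented behavior: the host is taken from the MCP baseUrl further down this page, the body is assumed to be JSON, and `build_ocr_request` is a hypothetical helper. The upload step is left out because the form field name it expects is not documented here.

```python
import json
import urllib.request

BASE_URL = "https://elysiatools.com"  # assumption: host taken from the MCP baseUrl


def build_ocr_request(file_path: str, language: str = "eng", dpi: int = 300,
                      oem: int = 1, psm: int = 3) -> urllib.request.Request:
    """Build (but do not send) the POST request for the OCR endpoint."""
    payload = {"sourceFile": file_path, "language": language,
               "dpi": dpi, "oem": oem, "psm": psm}
    return urllib.request.Request(
        BASE_URL + "/en/api/tools/pdf-ocr-text-layer",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption: JSON body
        method="POST",
    )


# The file path comes from the earlier POST /upload/pdf-ocr-text-layer call;
# the value below is a placeholder, not a real path.
request = build_ocr_request("<filePath from upload>", dpi=200, psm=6)
# To send it: urllib.request.urlopen(request)
```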

Response Format

{
  "filePath": "/public/processing/randomid.ext",
  "fileName": "output.ext",
  "contentType": "application/octet-stream",
  "size": 1024,
  "metadata": {
    "key": "value"
  },
  "error": "Error message (optional)",
  "message": "Notification message (optional)"
}
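
A response in this shape could be handled with a small parser like the one below. It is a sketch only: `ToolResult` and `parse_result` are hypothetical names, and treating a non-empty "error" field as a failure is an assumption based on the field being marked optional.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ToolResult:
    file_path: str
    file_name: str
    content_type: str
    size: int
    metadata: dict = field(default_factory=dict)
    message: Optional[str] = None


def parse_result(body: dict[str, Any]) -> ToolResult:
    """Convert the decoded JSON response into a ToolResult, raising on errors."""
    if body.get("error"):  # assumption: a non-empty error means the job failed
        raise RuntimeError(body["error"])
    return ToolResult(
        file_path=body["filePath"],
        file_name=body["fileName"],
        content_type=body["contentType"],
        size=int(body["size"]),
        metadata=body.get("metadata") or {},
        message=body.get("message"),
    )
```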

AI MCP Documentation

Add this tool to your MCP server configuration:

{
  "mcpServers": {
    "elysiatools-pdf-ocr-text-layer": {
      "name": "pdf-ocr-text-layer",
      "description": "Add searchable/copyable OCR text layer to scanned PDF using Tesseract",
      "baseUrl": "https://elysiatools.com/mcp/sse?toolId=pdf-ocr-text-layer",
      "command": "",
      "args": [],
      "env": {},
      "isActive": true,
      "type": "sse"
    }
  }
}

You can chain multiple tools, e.g.: `https://elysiatools.com/mcp/sse?toolId=png-to-webp,jpg-to-webp,gif-to-webp`, max 20 tools.
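
A chained URL of that form can be built with a short helper. This is a sketch; `mcp_sse_url` is a hypothetical name, and the only rules it encodes are the comma-separated `toolId` list and the 20-tool limit stated above.

```python
MCP_SSE_BASE = "https://elysiatools.com/mcp/sse"
MAX_TOOLS = 20  # limit stated on this page


def mcp_sse_url(tool_ids: list[str]) -> str:
    """Join tool IDs into a single MCP SSE URL, enforcing the 20-tool limit."""
    if not 1 <= len(tool_ids) <= MAX_TOOLS:
        raise ValueError(f"expected between 1 and {MAX_TOOLS} tool IDs")
    return f"{MCP_SSE_BASE}?toolId={','.join(tool_ids)}"


print(mcp_sse_url(["pdf-ocr-text-layer"]))
```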

File parameters accept either a URL pointing to the file or Base64-encoded file content.

If you encounter any issues, please contact us at [email protected]