Key Facts
- Category: Documents & PDF
- Input Types: file, select, number, text
- Output Type: file
- Sample Coverage: 4
- API Ready: Yes
Overview
PDF Denoise is a browser-based utility that cleans up scanned PDF documents by removing visual noise such as salt-and-pepper speckles, random grain, and faint background haze. It applies image-processing filters (median filtering and Otsu binarization) to the pixels of scanned image pages, while leaving vector text pages untouched so they remain searchable and keep their original fonts.
When to Use
- When scanned PDF documents contain distracting salt-and-pepper noise, grain, or dark speckles that hinder readability.
- When faded or low-contrast scans need to be converted into high-contrast, crisp black-and-white text.
- When cleaning up scanned documents that contain a mix of noisy image pages and clean, searchable vector text pages.
How It Works
- The tool parses the uploaded PDF and classifies each page as either an image page (a scan) or a vector text page.
- Image pages are rasterized, and the selected denoising algorithm (Auto, Median, or Otsu Binarization) is applied directly to the pixel buffer.
- Vector text pages are kept unchanged to preserve searchability and font quality, unless forced rasterization is enabled.
- The processed image pages and the preserved text pages are reassembled into a single clean output PDF.
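The page-routing step above can be sketched as follows. This is a minimal illustration, not the tool's code: `PageInfo`, `plan_pages`, and the `has_text_layer` flag are hypothetical names, and a real implementation would first have to inspect each PDF page to decide whether it carries a text layer.

```python
from dataclasses import dataclass


@dataclass
class PageInfo:
    """Hypothetical per-page summary; the tool's real page model is not documented."""
    number: int           # 1-based page number
    has_text_layer: bool  # True for vector text (searchable) pages


def plan_pages(pages, rasterize_text=False, selected=None):
    """Return a {page_number: action} map, where action is 'denoise' or 'preserve'.

    `selected` is a set of page numbers to process, or None for all pages.
    Pages outside the selection are passed through unchanged.
    """
    plan = {}
    for page in pages:
        if selected is not None and page.number not in selected:
            plan[page.number] = "preserve"
        elif page.has_text_layer and not rasterize_text:
            # Vector text pages are kept verbatim so they stay searchable.
            plan[page.number] = "preserve"
        else:
            # Image-only pages (or every page when rasterization is forced)
            # are rasterized and denoised.
            plan[page.number] = "denoise"
    return plan


pages = [PageInfo(1, False), PageInfo(2, True), PageInfo(3, False)]
print(plan_pages(pages))  # {1: 'denoise', 2: 'preserve', 3: 'denoise'}
```

Passing `rasterize_text=True` routes page 2 to `denoise` as well, which mirrors the trade-off described in the FAQ: the page gets cleaned, but its text layer is lost.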
Use Cases
Examples
1. Removing Speckle Noise from a Scanned Report
Archivist
- Background
- An archivist has a scanned PDF report filled with distracting salt-and-pepper noise and small black dots across the pages.
- Problem
- The speckles make the document look unprofessional and hard to read.
- How to Use
- Upload the PDF, select the 'Auto' denoise mode, set the strength to 2, and run the process.
- Example Config
- { "mode": "auto", "strength": 2, "rasterizeText": "false" }
- Outcome
- The output PDF has clean, speckle-free pages with smooth backgrounds while preserving the original layout.
2. Binarizing a Faint Scan for High Contrast
Legal Assistant
- Background
- A legal assistant receives a scanned contract that is faint, hazy, and has a grey background, making it difficult to read.
- Problem
- The text lacks contrast and needs to be converted to crisp black-and-white.
- How to Use
- Upload the PDF, select the 'Binarize' mode, and specify the page range to process.
- Example Config
- { "mode": "binarize", "pageRange": "1-3" }
- Outcome
- The grey background haze is turned pure white and the text is rendered solid black, significantly improving readability.
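The example configs mix numbers and strings (`rasterizeText` is given as the string "false"), which suggests that select inputs are passed as text. A hypothetical normalizer could turn such a config into typed settings as sketched below; the `median` mode key, the defaults (mode `auto`, strength 1), and the snake_case output names are assumptions, not documented behavior.

```python
import json

# Assumed mode keys; only "auto" and "binarize" appear in the example configs.
VALID_MODES = {"auto", "median", "binarize"}


def normalize_config(raw):
    """Parse a JSON config string into typed settings (field handling is assumed)."""
    cfg = json.loads(raw)
    mode = cfg.get("mode", "auto")
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    # Strength is a number of median passes; the FAQ gives a range of 1 to 3.
    strength = int(cfg.get("strength", 1))
    if not 1 <= strength <= 3:
        raise ValueError("strength must be between 1 and 3")
    # Select inputs arrive as strings, so "true"/"false" are mapped to booleans.
    rasterize = str(cfg.get("rasterizeText", "false")).lower() == "true"
    return {
        "mode": mode,
        "strength": strength,
        "rasterize_text": rasterize,
        "page_range": cfg.get("pageRange", ""),
    }


print(normalize_config('{ "mode": "binarize", "pageRange": "1-3" }'))
```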
FAQ
Will this tool make my searchable PDF text unsearchable?
No. By default, vector text pages are preserved verbatim so they stay searchable; only image-only pages are rasterized and denoised.
What is the difference between the Auto and Binarize modes?
Auto mode removes noise with a median filter plus despeckling and keeps the page's grey tones, while Binarize uses Otsu thresholding to turn the background pure white and the text solid black.
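For reference, here is a minimal sketch of Otsu's method on a flat list of 8-bit grayscale values. It illustrates the technique the Binarize mode is described as using; the function names and the choice to map pixels at or below the threshold to black are assumptions, not the tool's documented behavior.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for a flat list of 8-bit grayscale values (0-255)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    sum_bg = 0      # sum of intensities at or below the candidate threshold
    weight_bg = 0   # number of pixels at or below the candidate threshold
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance; Otsu picks the threshold that maximizes it.
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels):
    """Map dark pixels to black (0) and everything else to white (255)."""
    t = otsu_threshold(pixels)
    return [0 if p <= t else 255 for p in pixels]


# Dark text (40) on a hazy grey background (200).
print(binarize([40, 40, 200, 200, 200]))  # [0, 0, 255, 255, 255]
```

Because the threshold is computed from the page's own histogram, a grey haze is pushed to pure white without a hand-tuned cutoff, which is the behavior the FAQ attributes to Binarize.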
How do I clean a scanned PDF that already has an OCR text layer?
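Pages with an OCR layer are treated as text pages and left untouched by default. Enable the 'Rasterize Text Pages' option to force the tool to process and denoise the underlying noisy images, though this removes the text layer and the cleaned pages are no longer searchable.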
Can I denoise only specific pages of my PDF?
Yes, you can specify a page range (for example, '1-3, 5') to target only the pages that require cleanup.
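One way a page-range string such as '1-3, 5' could be expanded is sketched below. `parse_page_range` is a hypothetical helper, and rejecting reversed or out-of-range entries (rather than clipping them to the document length) is an assumption about how the tool handles bad input.

```python
def parse_page_range(spec, page_count):
    """Expand a spec such as '1-3, 5' into a sorted list of 1-based page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas like "1-3,,5"
        if "-" in part:
            start_s, end_s = part.split("-", 1)
            start, end = int(start_s), int(end_s)
        else:
            start = end = int(part)
        if start < 1 or end > page_count or start > end:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)


print(parse_page_range("1-3, 5", 10))  # [1, 2, 3, 5]
```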
What does the strength setting do?
It controls the number of median filter passes (from 1 to 3) in Auto and Median modes; higher values remove more noise but may soften the image.
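A strength-controlled median filter can be sketched in pure Python as below, assuming a 3x3 window and a grayscale image stored as a list of rows. The window size, the clipped-border handling, and the names `median3x3` and `denoise` are assumptions; the extra despeckling step that Auto mode is said to apply is not shown.

```python
def median3x3(img):
    """One pass of a 3x3 median filter over a 2D list of grayscale values.

    At the borders the window is clipped to the pixels that exist.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # write to a copy so reads see the old values
    for y in range(h):
        for x in range(w):
            window = sorted(
                img[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
            )
            out[y][x] = window[len(window) // 2]
    return out


def denoise(img, strength=1):
    """Apply `strength` median passes (the FAQ gives a range of 1 to 3)."""
    if not 1 <= strength <= 3:
        raise ValueError("strength must be 1, 2, or 3")
    for _ in range(strength):
        img = median3x3(img)
    return img


# A white 5x5 patch with one black speck in the middle.
page = [[255] * 5 for _ in range(5)]
page[2][2] = 0
print(denoise(page, strength=1)[2][2])  # 255
```

Each extra pass removes more isolated specks, but a median filter also erodes very thin strokes such as hairlines, which is why higher strengths can soften the image.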