How to Redact a PDF Properly — And Why the Common Way Is Unsafe

Every few months we get an email from someone who just learned, the hard way, that the "redactions" in a PDF they shared were not actually redactions. A journalist copy-pastes a document and the black bars turn into highlighted names. A lawyer opens a filed brief in a text editor and the supposedly blacked-out passages appear in plain text. A government office releases documents in response to a public-records request and an intern with Ctrl+A reveals everything.

It keeps happening because most people — including most people who work with sensitive PDFs every day — do redaction wrong. The method they use looks right on screen and is completely broken underneath. We want to explain what actually happens inside a PDF when you "redact" something, why the common approach leaves your data exposed, and what to do instead.

The common mistake: drawing a black box

The default intuition is reasonable. You have a PDF. You want to hide a line. You open Preview, Acrobat, or any annotation tool, drop a black rectangle over the text, save, and send. Visually, the text is gone.

Underneath, nothing has changed. A PDF is a structured document — text, images, fonts, and graphical objects are each stored as separate entries in the file's internal object tree. When you draw a rectangle over text, you have added a new graphical object on top of the existing text layer. The text below is still there, fully intact, fully indexed.
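To make the layering concrete, here is a minimal sketch built on a hand-written, simplified page content stream. The stream is an assumption for illustration: real annotation tools often store the box as a separate annotation object, and real streams are usually compressed. The point it shows is only that the text-showing operator is still in the page, with the filled rectangle painted after it.

```python
import re

# Hand-written, simplified content stream (hypothetical example data).
# "Tj" shows a text string; "re" defines a rectangle; "f" fills it.
CONTENT_STREAM = b"""
BT /F1 12 Tf 72 720 Td (Witness: Jane Doe) Tj ET
0 0 0 rg
70 716 120 14 re f
"""

def text_strings(stream: bytes) -> list[str]:
    """Return every literal string passed to the Tj (show text) operator."""
    # Simplification: ignores escaped parentheses, hex strings, and TJ arrays.
    return [m.decode("latin-1") for m in re.findall(rb"\((.*?)\)\s*Tj", stream)]

def filled_rectangles(stream: bytes) -> list[tuple]:
    """Return (x, y, width, height) for each rectangle that is immediately filled."""
    pattern = rb"([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+re\s+f\b"
    return [tuple(float(n.decode()) for n in m) for m in re.findall(pattern, stream)]

print(text_strings(CONTENT_STREAM))       # ['Witness: Jane Doe']
print(filled_rectangles(CONTENT_STREAM))  # [(70.0, 716.0, 120.0, 14.0)]
```

Both results come from the same page: the black rectangle covers the name on screen, but the string is still sitting in the stream.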

Try this yourself: take a PDF you "redacted" with a black box, open it in any viewer, and press Ctrl+A (Cmd+A on macOS) to select everything on the page. The selection highlights will reveal the text hiding under every box. Copy and paste it into a text editor and the supposedly redacted content comes right back. You can also open the PDF file itself in a plain-text editor: page content is usually stored in compressed streams, so much of it will look like binary noise, but any stream that was saved uncompressed is readable as-is, and a few lines of script can decompress the rest.
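The "read the internal text streams" check can be sketched with the standard library alone. This is a rough illustration, not a real PDF parser: it assumes streams are either uncompressed or Flate-compressed, and it only recognises literal strings shown with the Tj operator. The demo bytes are synthetic stand-ins for a real file.

```python
import re
import zlib

# Raw bytes between each "stream" ... "endstream" keyword pair.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# Simplification: only literal strings shown with Tj. Hex strings, TJ arrays,
# and fonts with custom encodings are not handled by this sketch.
SHOW_TEXT_RE = re.compile(rb"\(((?:\\.|[^\\()])*)\)\s*Tj")

def inflate_if_compressed(raw: bytes) -> bytes:
    """Return Flate-decompressed data, or the raw bytes if they are not zlib data."""
    try:
        return zlib.decompressobj().decompress(raw)
    except zlib.error:
        return raw

def recoverable_text(pdf_bytes: bytes) -> list[str]:
    """Collect every Tj string found in any stream of the file."""
    found = []
    for match in STREAM_RE.finditer(pdf_bytes):
        data = inflate_if_compressed(match.group(1))
        found.extend(s.decode("latin-1") for s in SHOW_TEXT_RE.findall(data))
    return found

# Synthetic stand-in for a real file (hypothetical data): one compressed
# stream holding a text string followed by a black rectangle drawn over it.
page = b"BT /F1 12 Tf 72 720 Td (Account 4471-0093) Tj ET\n0 0 0 rg 70 716 130 14 re f\n"
demo_pdf = (b"%PDF-1.7\n1 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
            + zlib.compress(page) + b"\nendstream\nendobj\n")
print(recoverable_text(demo_pdf))  # ['Account 4471-0093']
```

For a real document you would pass the file's bytes instead of demo_pdf. An empty result from this sketch does not prove a file is clean, since it skips several ways text can be encoded.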

This is not a theoretical problem. It has been the cause of several high-profile leaks over the past decade — court filings where party names were recoverable, intelligence documents where source names were exposed through the same copy-paste trick, corporate disclosures where financial figures sat perfectly readable underneath their black bars.

Why overlay redaction is structurally unsafe

A PDF's rendering pipeline is designed to composite objects: it treats your black rectangle and the text beneath it as two separate things that happen to occupy the same screen position. This is not a bug. Layered formats work this way on purpose, because most of the time you want an annotation or a drawn shape to stay editable and removable without touching the content beneath it.

The consequence is that any tool that inspects the structure of the PDF, rather than just rendering it, will see both layers:

Copy-paste pulls from the text stream, not the rendered image.
Search indexes the text stream, not the visible page.
Screen readers announce the text stream, not what sighted users see.
PDF parsers (from Python's pypdf to Acrobat Pro's own tools) can list every object on a page, including the hidden one.
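The list above can be illustrated with pypdf, which the article already mentions. This is a minimal sketch, assuming pypdf is installed; the file name in the comment and the sample texts are hypothetical. The search helper works on the extracted text layer, which is what a search index or copy-paste would see, regardless of what is drawn on top.

```python
def extract_page_texts(path: str) -> list[str]:
    """Return the text layer of each page, as a structural parser sees it."""
    from pypdf import PdfReader  # third-party; assumed installed
    reader = PdfReader(path)
    return [page.extract_text() or "" for page in reader.pages]

def pages_matching(page_texts: list[str], query: str) -> list[int]:
    """Return 1-based page numbers whose text layer contains the query."""
    needle = query.casefold()
    return [n for n, text in enumerate(page_texts, start=1)
            if needle in text.casefold()]

# Stand-in for extract_page_texts("overlay_redacted.pdf"), a hypothetical file.
sample_texts = ["Cover page", "Witness: Jane Doe testified on 3 May.", ""]
print(pages_matching(sample_texts, "jane doe"))  # [2]
```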

If a document only contains text you are comfortable with being visible, this layered model is a feature. The moment you use it to "redact" sensitive data, it becomes a leak waiting to happen.

The safe approach: rasterization

Real redaction destroys the underlying data. The most reliable way to do that is to rasterize the page (convert it from a structured PDF page into a flat image) after the black bars have been drawn. Once the page is an image, there is no separate text layer and no hidden text object, so there is no way to "read underneath" the redaction: there is nothing underneath it. The redaction is baked into pixels.
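As a rough sketch of the draw-then-rasterize idea, the function below uses PyMuPDF (imported as fitz), which is an assumption on our part and not necessarily what any particular tool uses. The function name, the boxes_by_page parameter, and the 300 DPI default are all illustrative choices. The small helper shows the arithmetic behind the output size: PDF pages are measured in points, at 72 points per inch.

```python
def points_to_pixels(points: float, dpi: int = 300) -> int:
    """PDF user space is 72 points per inch, so pixels = points * dpi / 72."""
    return round(points * dpi / 72)

def rasterize_with_boxes(src_path, dst_path, boxes_by_page, dpi=300):
    """Draw black boxes, render each page to pixels, rebuild an image-only PDF.

    boxes_by_page (hypothetical parameter) maps a 0-based page index to a list
    of (x0, y0, x1, y1) rectangles in points, using PyMuPDF's coordinates.
    """
    import fitz  # PyMuPDF, third-party; assumed installed
    src = fitz.open(src_path)
    out = fitz.open()  # new, empty document
    for index, page in enumerate(src):
        for box in boxes_by_page.get(index, []):
            page.draw_rect(fitz.Rect(*box), color=(0, 0, 0), fill=(0, 0, 0))
        pixmap = page.get_pixmap(dpi=dpi)  # text and boxes become pixels here
        flat = out.new_page(width=page.rect.width, height=page.rect.height)
        flat.insert_image(flat.rect, pixmap=pixmap)
    out.save(dst_path)

# A US Letter page (612 x 792 points) rendered at 300 DPI:
print(points_to_pixels(612), points_to_pixels(792))  # 2550 3300
```

Because the output document is built from images only, nothing from the source page's text objects is carried across. Edge cases such as rotated pages are not handled in this sketch.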

A second, more surgical approach is to delete the underlying text objects directly, then flatten the rest of the page. This preserves text selection in the unredacted portions but is considerably harder to get right — if the redaction tool misses an object (say, a text fragment that was stored in two separate runs), some of the sensitive data leaks through the gap. Rasterization has no such failure mode.
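The split-run failure described above is easy to reproduce in a simplified model. Here a page's text is represented as a plain list of runs (an assumption for illustration; real pages hold these runs inside content-stream operators), and a naive redactor deletes only the runs that contain the secret as a whole.

```python
def naive_surgical_redact(runs: list[str], secret: str) -> list[str]:
    """Drop any text run that contains the secret; keep all others unchanged."""
    return [run for run in runs if secret not in run]

# Hypothetical page where a name was stored across two separate runs.
runs = ["Witness: Ja", "ne Doe", " testified."]
remaining = naive_surgical_redact(runs, "Jane Doe")

# No single run contains "Jane Doe", so nothing is removed and the name
# reappears as soon as the runs are joined, as a viewer or parser would do.
print("".join(remaining))  # Witness: Jane Doe testified.
```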

Rasterization has one downside worth flagging honestly: the redacted page is no longer searchable. The text in the non-redacted parts of the page is also converted to an image, so Ctrl+F on that page will stop working. For most redaction use cases, where a filtered document is being released externally, this is not just acceptable, it is desirable. You do not want the released file to be machine-searchable in the same way the original was, because a surviving text layer gives automated extraction tools something to work with.

How to redact safely in under a minute

Our Redact PDF tool uses the rasterization approach by default. The workflow:

Open Redact PDF on PDF Genie.
Drop your file into the upload area. The whole operation runs in your browser — the file never leaves your device.
Draw black rectangles over the text, names, or regions you want to hide.
Click "Apply Redactions."
Download the output.

Behind the scenes, each redacted page is rasterized at print-quality resolution, the original text objects are dropped, and the result is repackaged as a flat PDF. The output looks identical to a careful black-box redaction, but Ctrl+A, copy-paste, and any structural parser return nothing underneath the bars, because there is nothing to return.

Try this yourself on any document you have redacted with the naive overlay method: open it in a PDF viewer, select the region with the black box, copy, and paste into a plain-text editor. If the text comes back, the redaction is fake. With our Redact PDF tool, that same paste produces an empty region — because the rasterized page literally no longer contains the original text stream. The document looks the same. Its innards are fundamentally different.

Two redaction details people forget

Even with the right tool, two practical mistakes account for most remaining leaks:

Metadata. PDFs contain hidden metadata: author names, editing software, revision timestamps, and sometimes an earlier document title that survived a later rename. Redacting the body while leaving the metadata in place is a common leak vector. Strip it with our Edit Metadata tool before sharing anything sensitive.
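If you want to see what metadata a file carries before stripping it, a short pypdf sketch is enough. This assumes pypdf is installed and covers only the document information dictionary; XMP metadata, if present, is stored separately and is not covered here. The sample values are hypothetical.

```python
def read_metadata(path: str) -> dict:
    """Return the common document-information fields of a PDF."""
    from pypdf import PdfReader  # third-party; assumed installed
    info = PdfReader(path).metadata  # None when the file has no info dictionary
    if info is None:
        return {}
    return {
        "author": info.author,
        "title": info.title,
        "subject": info.subject,
        "creator": info.creator,
        "producer": info.producer,
    }

def populated_fields(metadata: dict) -> list[str]:
    """Return the names of fields that actually hold a value."""
    return sorted(name for name, value in metadata.items() if value)

# Stand-in for read_metadata("release.pdf"), a hypothetical file.
sample = {"author": "J. Smith", "title": None,
          "producer": "ExampleWriter 1.0", "subject": ""}
print(populated_fields(sample))  # ['author', 'producer']
```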

Attached files and embedded objects. A PDF can contain embedded Excel sheets, fonts that leak author names, or attached originals. These survive naive redaction entirely. Flatten the document through a pipeline like ours (redact → rasterize → re-export) to strip them.
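Attachments can be listed the same way. The sketch below relies on the attachments property of pypdf's PdfReader, which to our knowledge is available in recent pypdf releases; treat that as an assumption and check your installed version. The report helper and the sample data are illustrative.

```python
def attachment_sizes(path: str) -> dict:
    """Map each embedded file name to its total size in bytes."""
    from pypdf import PdfReader  # third-party; assumed installed
    reader = PdfReader(path)
    return {name: sum(len(blob) for blob in blobs)
            for name, blobs in reader.attachments.items()}

def attachment_report(sizes: dict) -> list[str]:
    """Format one line per attachment, largest first."""
    ordered = sorted(sizes.items(), key=lambda item: item[1], reverse=True)
    return [f"{name} ({size} bytes)" for name, size in ordered]

# Stand-in for attachment_sizes("release.pdf"), a hypothetical file.
sample_sizes = {"notes.txt": 120, "budget.xlsx": 18432}
print(attachment_report(sample_sizes))
# ['budget.xlsx (18432 bytes)', 'notes.txt (120 bytes)']
```

An empty report is what you want to see on anything leaving the building.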

When to double-check

If the document you are redacting will be filed in court, released under a public-records request, or published where adversaries will actively probe it, treat even the safe approach as necessary-but-not-sufficient. Verify the output yourself:

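Open the redacted file and try Ctrl+A on each page. On a rasterized page nothing should be selectable at all; if any text highlights under or around a redacted region, the redaction did not take.
Open it in a PDF inspector (Acrobat Pro's Preflight panel works; so do command-line tools such as pdftotext, which dumps the text layer, and pdfinfo, which prints the document metadata). The redacted text should not appear in the text dump.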
Check file metadata and attachments. Strip both if you did not intend to share them.
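The text-dump check can be scripted. The sketch below assumes Poppler's pdftotext is installed and on your PATH; the helper names and the sample strings are hypothetical. Whitespace is removed before matching because extracted text often breaks a name across lines, and a check that over-matches is the safer failure here.

```python
import re
import subprocess

def normalize(text: str) -> str:
    """Casefold and remove all whitespace so line breaks cannot hide a match."""
    return re.sub(r"\s+", "", text).casefold()

def leaked_terms(extracted_text: str, sensitive_terms: list[str]) -> list[str]:
    """Return the sensitive terms that still appear in the extracted text."""
    haystack = normalize(extracted_text)
    return [term for term in sensitive_terms if normalize(term) in haystack]

def pdftotext_dump(path: str) -> str:
    """Run pdftotext (assumed to be on PATH) and return the text it extracts."""
    result = subprocess.run(["pdftotext", path, "-"],
                            capture_output=True, text=True, check=True)
    return result.stdout

# Stand-in for pdftotext_dump("release.pdf"), a hypothetical file.
sample_dump = "Witness: Jane\nDoe testified.\nAccount 4471-0093"
print(leaked_terms(sample_dump, ["Jane Doe", "John Roe"]))  # ['Jane Doe']
```

Keep the list of sensitive terms out of the released file's folder; it is itself a summary of everything you meant to hide.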

This is five minutes of verification for every document you send. It is cheap insurance against a leak that could take years to live down.

Redact your PDF safely

Try Redact PDF — free →

