Redacting Documents Before Legal AI: What Works and What's Changing

by Ali Rind. Last updated: May 19, 2026




For most of 2025, the guidance was simple: redact client documents before uploading them to any AI tool. Consumer chatbots could train on user inputs, their terms of service permitted disclosure, and uploading privileged content put both confidentiality and the privilege itself at risk.

That guidance is still correct, but it is no longer complete. Three things shifted in late 2025 and early 2026. United States v. Heppner moved the privilege analysis. Enterprise legal AI platforms matured into something genuinely different from consumer chatbots. And in-firm AI architectures, where the model runs inside the firm's environment, became a real procurement option.

So "redact before AI" is no longer a single workflow. It is a decision tree, and the right path through it depends on which AI architecture your firm chose.

What the Heppner ruling actually changed for legal AI users

On February 10, 2026, Judge Jed Rakoff of the Southern District of New York ruled from the bench in United States v. Heppner that AI-generated documents created by a criminal defendant on a public version of Anthropic's Claude were not protected by attorney-client privilege or the work product doctrine. The defendant had been told he was the target of a grand jury investigation. He used the consumer AI tool to analyze his own legal exposure, without his attorneys directing him to do so.

The case is a single district court ruling, and district court decisions are not binding precedent on other courts. Its reasoning is what matters. Judge Rakoff found three things missing for privilege to apply: the AI was not an attorney, the platform's terms permitted data collection and model training, and the materials were not created at counsel's direction. He noted the outcome could have differed if counsel had directed the AI use, if the tool had contractual confidentiality and zero-retention policies, or if the AI had functioned as a Kovel agent assisting counsel.

Pair that with ABA Formal Opinion 512, which requires lawyers using AI to maintain technological competence under Rule 1.1 and protect client confidentiality under Rule 1.6, and the 2026 picture becomes clearer. Privilege now depends on three things at once: what tool was used, whether the lawyer directed and supervised the use, and what record exists of what was actually disclosed.

That last piece is where pre-upload redaction earns its place in the workflow, regardless of which AI you chose.

The three architectures of legal AI in 2026

Legal AI is no longer one category. Three architectures coexist, and each one changes where redaction fits.

The first is consumer AI: free or low-cost chatbot products like the public versions of ChatGPT, Claude, or Gemini. These are inappropriate for any client data, regardless of whether you redact first. The platform terms typically permit training on inputs, and Heppner rejected privilege for exactly this kind of unsupervised use.

The second is enterprise SaaS legal AI: Harvey, Spellbook, CoCounsel, Lexis+ AI, and similar platforms purpose-built for law firms. Their security posture is meaningfully different from consumer AI, typically including SOC 2 Type II certification, contractually enforced zero data retention with their model providers, business associate agreements (BAAs) for HIPAA-adjacent work, and audit logging. Data still leaves the firm and travels to a vendor's cloud, but the safeguards are substantial.

The third is in-firm AI: platforms that run on the firm's own infrastructure, either on-premises or in a private cloud the firm controls. Data never leaves the firm's environment. The AI session, the documents, and the outputs all stay inside the firm's security perimeter, so the third-party disclosure question driving the Heppner analysis largely disappears.

These three are not ranked by quality. They serve different firms with different practice areas and IT capacity. The question is which one fits your firm, and once you know that, the redaction workflow follows.

Where redaction fits in each architecture

In the consumer AI path, redaction does not save you. The platform terms are the controlling problem, and pre-upload redaction does not change those terms. The honest guidance is to keep client work off consumer AI entirely. This is not a redaction problem. It is a tool selection problem.

In the enterprise SaaS path, redaction sits upstream of the AI tool. The document enters your case management system. Before any client identifier reaches Harvey, Spellbook, or CoCounsel, automated redaction strips PHI, PII, privileged communications, and material flagged as work product. The de-identified file is what gets uploaded. The AI does its work on the redacted version. Identifiers get reintroduced inside the firm's secure environment when the deliverable is ready for the client. Every redaction is logged with the category, rule, timestamp, and approving user. This is the workflow most firms in this category are converging on, and it gives the firm the strongest record on the Heppner factors if its AI use is ever challenged.
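The pseudonymize, upload, and re-identify loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how VIDIZMO Redactor or any vendor implements it: the two regex rules, the token format, and the `pseudonymize` and `reidentify` helpers are hypothetical, and `approver` is simply recorded rather than enforced through a human review step. The property that matters is that the token-to-identifier mapping never leaves the firm; only the de-identified text would be uploaded.

```python
import re
from datetime import datetime, timezone

# Hypothetical detection rules. A production redactor needs far broader
# coverage (names, addresses, medical record numbers, privileged content).
RULES = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def pseudonymize(text, approver):
    """Swap identifiers for placeholder tokens before upload.

    Returns the de-identified text, a token -> original mapping that must
    stay inside the firm, and one audit entry per redaction.
    """
    mapping = {}
    audit = []
    for category, pattern in RULES.items():
        def substitute(match, category=category, pattern=pattern):
            # Tokens are numbered sequentially across all categories.
            token = f"[{category}_{len(mapping) + 1}]"
            mapping[token] = match.group(0)
            audit.append({
                "category": category,
                "rule": pattern.pattern,
                "token": token,
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "approved_by": approver,
            })
            return token

        text = pattern.sub(substitute, text)
    return text, mapping, audit


def reidentify(text, mapping):
    """Put the original identifiers back into AI output, inside the firm."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text
```

For example, `pseudonymize("Client SSN 123-45-6789; contact jane@example.com.", "reviewer1")` returns the text `Client SSN [SSN_1]; contact [EMAIL_2].` plus the mapping and two audit entries. Only that text would go to the AI tool, and `reidentify` runs on the AI's output once it is back inside the firm.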

In the in-firm AI path, redaction's role shifts. The privilege exposure to third-party vendors is no longer the primary concern, because the data is not leaving the firm. Redaction becomes a tool for internal data governance. Junior associates, contract reviewers, summer interns, and law students working under a supervising attorney should not see more than they need to see. Redaction enforces that at the document level. It also matters when outputs leave the firm: a brief, a discovery production, a court filing, or a client deliverable derived from the in-firm AI's work still needs the same redaction sanity check before it goes out.
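Document-level internal redaction amounts to rendering a different view of the same file for each role. The sketch below assumes an upstream detector has already tagged sensitive spans; the role names, the `ROLE_CLEARANCE` table, and the `view_for` helper are hypothetical and would in practice come from the firm's own access policy.

```python
# Hypothetical clearance table: which sensitivity categories each role may
# see unredacted. A real firm would derive this from its access policy.
ROLE_CLEARANCE = {
    "supervising_attorney": {"PII", "PHI", "PRIVILEGED"},
    "associate": {"PII"},
    "contract_reviewer": set(),
    "intern": set(),
}


def view_for(role, text, spans):
    """Render `text` as `role` should see it.

    `spans` holds non-overlapping (start, end, category) tuples from an
    upstream detector. Any span the role is not cleared for is replaced by
    its category label; unknown roles are cleared for nothing.
    """
    allowed = ROLE_CLEARANCE.get(role, set())
    parts = []
    cursor = 0
    for start, end, category in sorted(spans):
        parts.append(text[cursor:start])
        parts.append(text[start:end] if category in allowed else f"[{category}]")
        cursor = end
    parts.append(text[cursor:])
    return "".join(parts)
```

With the text `Patient Ann Lee, diagnosis: asthma.`, the name tagged PII, and the diagnosis tagged PHI, an associate would see `Patient Ann Lee, diagnosis: [PHI].` and an intern would see `Patient [PII], diagnosis: [PHI].` Defaulting unknown roles to an empty clearance set keeps the failure mode on the side of over-redaction.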

The architecture changes the reason for redacting. The redaction layer still earns its place in all three.

How to decide which architecture fits your firm

There is no universal right answer here, but there are patterns. Solo attorneys, small firms, and clinics with limited IT capacity are usually best served by enterprise SaaS legal AI paired with pre-upload redaction. The procurement is straightforward, the security is real, and the redaction layer handles the residual privilege concern. The HIPAA-compliant document redaction guide for small law firms walks through what the setup looks like without dedicated IT staff.

Mid-size and large firms in regulated practice areas, including healthcare, government, defense, and matters touching CJIS data, increasingly look at in-firm AI. The deciding factor is usually data residency, classified material, or BAAs that get complicated at scale. In-firm AI removes those constraints by removing the third party from the equation.

Law school clinics, legal aid organizations, and government legal teams sit between the two, with their path usually determined by procurement and infrastructure rather than risk appetite alone. For high-volume legal work, the operational mechanics are in the bulk document redaction for law firms guide.

The honest test: if your practice regularly involves PHI, classified information, or matters where a third-party data disclosure could itself be a problem, in-firm AI is worth a serious look. For everyone else, enterprise SaaS with proper redaction is a defensible choice.
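The rules of thumb in this section can be written down as a small decision function, which makes the inputs to the choice explicit. This is a rough sketch: the `PracticeProfile` fields and `suggested_architecture` are hypothetical names, a result of "in-firm AI" means "worth a serious look" rather than a final answer, and the hybrid setup many firms end up with is not modeled.

```python
from dataclasses import dataclass


@dataclass
class PracticeProfile:
    # Hypothetical inputs mirroring the patterns described above.
    regularly_handles_phi: bool
    handles_classified_material: bool
    third_party_disclosure_is_a_problem: bool
    can_run_self_hosted_infrastructure: bool


def suggested_architecture(p: PracticeProfile) -> str:
    """Rough encoding of the rules of thumb. Consumer AI never appears:
    it is ruled out for client data regardless of redaction."""
    sensitive = (
        p.regularly_handles_phi
        or p.handles_classified_material
        or p.third_party_disclosure_is_a_problem
    )
    if sensitive and p.can_run_self_hosted_infrastructure:
        return "in-firm AI"
    # Firms without the infrastructure, and firms without sensitive
    # matters, land on enterprise SaaS with pre-upload redaction.
    return "enterprise SaaS + pre-upload redaction"
```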

What every legal AI workflow needs, regardless of architecture

Four elements show up in every defensible legal AI workflow no matter which architecture the firm chose. The first is a complete audit trail: every redaction logged with category, rule, confidence threshold, timestamp, and approving reviewer. This is the record that holds up if a state bar, opposing counsel, or internal compliance review asks how client data was protected.
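An audit trail is easiest to defend when each redaction is a structured, append-only record rather than free text. The sketch below assumes a JSON Lines file; the `RedactionEvent` fields and the `append_event` helper are hypothetical, chosen to mirror the elements listed above, and a real deployment would add tamper resistance (write-once storage or hash chaining) that this sketch leaves out.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class RedactionEvent:
    """One audit-trail entry per redaction. Field names are hypothetical."""
    document_id: str
    category: str                # e.g. "PHI", "PII", "PRIVILEGED"
    rule: str                    # identifier of the detection rule that fired
    confidence: float            # detector score for this particular hit
    confidence_threshold: float  # minimum score the rule was configured with
    reviewer: str                # person who approved the redaction
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def append_event(log_path: str, event: RedactionEvent) -> None:
    """Append one JSON object per line. The log is only ever opened in
    append mode, so earlier entries are never rewritten by this code."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(asdict(event), sort_keys=True) + "\n")
```

Recording both the hit's score and the configured threshold lets a reviewer later show why a given passage was or was not redacted automatically.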

The second is preservation of originals. The unredacted file is not destroyed. It is stored separately under stricter access controls, so the firm can always reconstruct what was redacted and why.

The third is coverage across formats, because real matters do not arrive as clean Word documents. They include scanned PDFs, audio, video exhibits, images, spreadsheets, and email chains. A redaction layer that only handles text documents leaves identifiers in everything else, and those files still reach the AI tool.
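Two of these elements lend themselves to simple checks. The sketch below links each redacted file to its preserved original by SHA-256 content hash, and flags any file whose format the redaction layer is not configured for, so it can be held back rather than uploaded unredacted. The `SUPPORTED` extension set and the helper names are hypothetical, and where the originals live and who can read them is left to the firm's storage and access controls.

```python
import hashlib
from pathlib import Path

# Hypothetical set of formats the firm's redaction layer is configured for.
SUPPORTED = {".pdf", ".docx", ".xlsx", ".eml", ".png", ".jpg", ".mp3", ".mp4"}


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large video exhibits do not load into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def manifest_entry(original: Path, redacted: Path) -> dict:
    """Link a redacted file to its preserved original by content hash, so the
    firm can later show exactly which original a given output came from."""
    return {
        "original": str(original),
        "original_sha256": sha256_of(original),
        "redacted": str(redacted),
        "redacted_sha256": sha256_of(redacted),
    }


def unsupported(files):
    """Return the files the redaction layer cannot process; these should be
    held back from the AI tool rather than uploaded unredacted."""
    return [f for f in files if Path(f).suffix.lower() not in SUPPORTED]
```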

The fourth is defensible redaction codes tied to specific legal exemptions: attorney-client privilege, work product, PHI under HIPAA, or FRCP 5.2 categories. Codes are what make redactions defensible when questioned. The legal redaction software overview covers how these map to ABA Model Rule 1.6 and Federal Rules of Civil Procedure compliance.
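Codes hold up best when each one maps to exactly one legal basis and is stamped into the redacted output. The sketch below uses hypothetical code labels and helper names. The SSN helper reflects Fed. R. Civ. P. 5.2(a)(1), which permits a court filing to include only the last four digits of a Social Security number.

```python
from enum import Enum


class RedactionCode(Enum):
    # Code label -> legal basis it asserts. Labels are hypothetical; the
    # bases are the categories named in the text.
    AC = "Attorney-client privilege"
    WP = "Attorney work product"
    PHI = "Protected health information (HIPAA)"
    FRCP_5_2 = "Fed. R. Civ. P. 5.2(a) privacy protection"


def redaction_label(code: RedactionCode) -> str:
    """Text stamped in place of the redacted content."""
    return f"[REDACTED: {code.name} ({code.value})]"


def frcp_5_2_ssn(ssn: str) -> str:
    """Mask all but the last four digits, as Rule 5.2(a)(1) allows."""
    digits = [c for c in ssn if c.isdigit()]
    if len(digits) != 9:
        raise ValueError("expected a 9-digit Social Security number")
    return "XXX-XX-" + "".join(digits[-4:])
```

For instance, `redaction_label(RedactionCode.WP)` yields `[REDACTED: WP (Attorney work product)]`, and `frcp_5_2_ssn("123-45-6789")` yields `XXX-XX-6789`.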

VIDIZMO Redactor handles all four across 255-plus file formats, with automated PHI and PII detection, audit logging, original preservation, and configurable redaction codes for legal use. For firms staying with enterprise SaaS legal AI, this is the upstream layer that closes the privilege gap.

Try VIDIZMO Redactor to see what the upstream redaction layer looks like on a real document from your practice.

A note on the in-firm AI path

If the third-party AI question is the blocker to your firm's AI adoption, the architectural answer is to keep the AI inside the firm. In-firm AI platforms run the model in your environment, process client data without external disclosure, and remove the contractual-trust dependency that enterprise SaaS still relies on. VIDIZMO Intelligence Hub is one option in this category, with built-in PII and PHI detection, multi-modal coverage across documents, audio, and video, and on-premises or private-cloud deployment.

For most firms, the right answer is a hybrid: enterprise SaaS for general work plus in-firm AI for the matters that genuinely cannot tolerate third-party exposure. Redaction sits in both workflows, doing different jobs.