How to Redact Medical Records Before Uploading to an AI Tool

by Ali Rind, Last updated: April 2, 2026

Personal injury attorneys are adopting AI tools faster than their compliance practices can keep up. AI assistants that summarize medical records, build chronologies, and draft demand letters are genuinely useful. The compliance problem is real too. Uploading raw medical records to an AI tool whose vendor has not signed a Business Associate Agreement, without first removing protected health information (PHI), exposes the firm to HIPAA liability.

This guide walks through exactly how to redact PHI from medical records before uploading to any AI platform, including which identifiers to remove, why manual redaction falls short, and how to build a repeatable automated workflow using VIDIZMO Redactor.

For a broader overview of document redaction across case types, see our complete guide to document redaction software for legal teams.

Why Attorneys Are Uploading Medical Records to AI Tools

A typical personal injury case involves hundreds, sometimes thousands, of pages of medical records: treatment notes, radiology reports, prescription histories, surgical records, and billing documentation. Reviewing all of it manually is slow and expensive.

AI tools compress that work substantially. An LLM assistant can produce a chronological summary of a multi-provider medical history in minutes, flag inconsistencies between treating physicians, and help draft damages narratives. For small firms without large associate staff, that efficiency matters.

The problem is straightforward: those medical records are full of PHI, and most AI tools are not configured to handle it.

The Compliance Risk: What PHI Exposure Looks Like in Practice

When a PI attorney uploads an unredacted medical record to a cloud-based AI tool, at least three things may happen that create legal exposure:

The data may be retained. Many AI platforms store session data on vendor infrastructure. Without a Business Associate Agreement (BAA) in place, that retention constitutes unauthorized disclosure of PHI under HIPAA.

The data may be used for model training. Some platforms use submitted content to improve their models. If patient information is used for training without authorization, it is no longer under the attorney's or patient's control.

Client confidentiality may be implicated. Transmitting a client's confidential medical history to a third-party service raises questions under ABA Model Rule 1.6 (Confidentiality of Information) about whether that disclosure was authorized and whether reasonable precautions were taken.

The fix is not to stop using AI tools. It is to ensure that no PHI reaches those tools in the first place.

What Needs to Be Redacted: PHI in Medical Records

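HIPAA defines 18 categories of identifiers that must be removed to achieve Safe Harbor de-identification. In a typical PI case medical record, the most common include patient names, dates more specific than the year, phone numbers, Social Security numbers, and medical record numbers. The full list of 18 appears under People Also Ask at the end of this guide.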

Beyond the identifiers themselves, medical records frequently contain embedded PHI inside narrative text: "Patient Jane Doe, born March 4, 1978, presented to Dr. Ramirez at Chicago General on January 12, 2025." Standard find-and-replace tools do not catch contextual PHI of this type. Automated redaction using natural language processing (NLP) does.
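
To see why pattern matching alone falls short, here is a minimal Python sketch that runs the example sentence above through a few regular expressions. The pattern set and the `redact_patterns` function are illustrative assumptions, not part of any product, and real records would need far broader coverage.

```python
import re

# Illustrative patterns only (assumed for this sketch); real records
# need many more formats and identifier types.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "DATE": re.compile(
        r"\b(?:January|February|March|April|May|June|July|August|"
        r"September|October|November|December)\s+\d{1,2},\s+\d{4}\b"
    ),
}


def redact_patterns(text: str) -> str:
    """Replace every pattern match with a bracketed category label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


sample = (
    "Patient Jane Doe, born March 4, 1978, presented to Dr. Ramirez "
    "at Chicago General on January 12, 2025."
)
redacted = redact_patterns(sample)
print(redacted)
# Patient Jane Doe, born [DATE], presented to Dr. Ramirez at Chicago General on [DATE].
```

The dates are caught because they follow a predictable format. The patient name, the physician, and the facility pass through untouched, which is the gap contextual detection is meant to close.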

To understand how automated redaction handles PHI across different file types used in legal cases, see our guide on how to redact PDF documents for legal and compliance workflows.

Why Incognito Mode and Privacy Settings Are Not Enough

Two common misconceptions among attorneys new to AI compliance:

Incognito mode affects only browser-side history. It has no effect on what the AI vendor receives, processes, or stores server-side.

Disabling chat history may prevent the AI from surfacing prior sessions in the UI. It does not govern whether the vendor retains your submission data for compliance, training, or debugging purposes.

The only approach that reliably prevents PHI from reaching an AI tool is removing it from the document before submission.

The Correct Workflow: Redact First, Then Upload

The redact-first workflow has five steps:

  1. Receive and store. Medical records land in your case management system, encrypted at rest. No external tool touches them yet.
  2. Identify PHI. Every document is scanned for the 18 HIPAA Safe Harbor identifiers plus any contextual PHI in narrative text.
  3. Redact. All identified PHI is removed or obscured. The redacted copy is saved separately. The original is preserved intact.
  4. Upload to AI. Only the de-identified version is submitted to the AI tool. The AI sees no PHI.
  5. Work with the output. Summaries, chronologies, and draft text generated by the AI do not reference identifiable patient information. If PHI needs to be reintroduced into a final work product, that happens within your secure case management environment, not in the AI tool.
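
Steps 3 through 5 can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the identifiers have already been found in Step 2, the record is plain text, and the function names and `[ID_n]` placeholder format are hypothetical. The point it shows is that the placeholder key never leaves your own environment.

```python
from dataclasses import dataclass, field


@dataclass
class RedactionResult:
    text: str  # de-identified text, the only thing submitted to the AI tool
    key: dict[str, str] = field(default_factory=dict)  # placeholder -> original, kept locally


def redact_with_placeholders(text: str, identifiers: list[str]) -> RedactionResult:
    """Step 3: swap each known identifier for a neutral placeholder."""
    key: dict[str, str] = {}
    # Longest first, so "Jane Doe" is replaced before a shorter overlap like "Doe".
    ordered = sorted(identifiers, key=len, reverse=True)
    for number, value in enumerate(ordered, start=1):
        placeholder = f"[ID_{number}]"
        if value in text:
            text = text.replace(value, placeholder)
            key[placeholder] = value
    return RedactionResult(text, key)


def reintroduce(ai_output: str, key: dict[str, str]) -> str:
    """Step 5: restore identifiers locally, inside the secure environment."""
    for placeholder, value in key.items():
        ai_output = ai_output.replace(placeholder, value)
    return ai_output


record = "Jane Doe was seen by Dr. Ramirez on January 12, 2025."
result = redact_with_placeholders(
    record, ["Jane Doe", "Dr. Ramirez", "January 12, 2025"]
)

# Step 4: only result.text would be uploaded. Here a string stands in
# for whatever the AI tool returns.
ai_summary = "Summary: " + result.text
final_text = reintroduce(ai_summary, result.key)
```

Replacing dates with placeholders hides the chronology from the AI tool, so in practice you may prefer a different treatment for dates, depending on what you need the tool to do.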

The critical step is Step 3, and that is where the choice between manual and automated redaction determines whether this workflow is practical at scale.

Automated Redaction vs. Manual Redaction

Most PI attorneys who redact documents manually do so by opening the PDF, searching for known identifiers, and applying black boxes or text strikethroughs. This approach has significant limitations: it is slow (a 200-page record can take 45 to 90 minutes), and searching for known identifiers misses PHI embedded in narrative text.

For a firm handling five cases a month with a few hundred pages each, manual redaction is manageable. Barely. For firms with higher volume, or those building AI-assisted workflows into their standard practice, automated redaction is the only approach that does not create a new bottleneck. 

Step-by-Step: Using VIDIZMO Redactor Before Any AI Tool

VIDIZMO Redactor is an AI-powered redaction platform designed for document-heavy workflows. For PI attorneys preparing medical records for AI processing, the workflow looks like this:

Step 1: Upload the document. Drag the medical record PDF into Redactor or connect via folder watch. Redactor accepts PDFs, scanned documents, DOCX, and image files. Scanned physician notes and handwritten records are processed through optical character recognition (OCR) and intelligent character recognition (ICR).

Step 2: Run automated PHI detection. Redactor's AI scans the document for 40+ PHI and PII types, including all 18 HIPAA Safe Harbor identifiers plus contextual identifiers in narrative text. Detection uses both pattern recognition (for structured data like SSNs and phone numbers) and NLP-based contextual analysis (for names and dates embedded in prose).
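
The idea of combining two detectors can be illustrated generically. The Python sketch below is not Redactor's implementation or API. It assumes a single regular expression for the pattern side and uses a fixed list of names from the case file as a stand-in for the NLP side, then merges overlapping hits so each stretch of text is masked once. All function names are hypothetical.

```python
import re

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def pattern_spans(text: str) -> list[tuple[int, int]]:
    """Structured identifiers found by pattern matching."""
    return [(m.start(), m.end()) for m in SSN.finditer(text)]


def name_spans(text: str, known_names: list[str]) -> list[tuple[int, int]]:
    """Stand-in for contextual detection: look up names known from the case file."""
    spans = []
    for name in known_names:
        for m in re.finditer(re.escape(name), text):
            spans.append((m.start(), m.end()))
    return spans


def merge_spans(spans: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Combine overlapping or touching spans into single ranges."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def redact_spans(text: str, spans: list[tuple[int, int]], mask: str = "[REDACTED]") -> str:
    """Rebuild the text with every merged span replaced by the mask."""
    pieces, cursor = [], 0
    for start, end in merge_spans(spans):
        pieces.append(text[cursor:start])
        pieces.append(mask)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "Jane Doe (SSN 123-45-6789) was seen by Dr. Ramirez."
spans = pattern_spans(text) + name_spans(text, ["Jane Doe", "Doe", "Dr. Ramirez"])
print(redact_spans(text, spans))
# [REDACTED] (SSN [REDACTED]) was seen by [REDACTED].
```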

Step 3: Review and adjust. Redactor flags all detected PHI for review. You can increase or decrease the confidence threshold, manually add identifiers the AI missed, or remove false positives before finalizing. For straightforward records with standard formatting, most attorneys accept the automated output directly.
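
The review logic can be pictured as a simple filter. The Python sketch below is a generic illustration, not Redactor's data model: the `Detection` class, the confidence scores, and the `finalize` function are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    text: str          # the flagged string
    category: str      # e.g. "NAME", "SSN"
    confidence: float  # detector's score between 0 and 1 (hypothetical)


def finalize(detections, threshold, manual_additions=(), false_positives=()):
    """Keep detections at or above the threshold, drop reviewer-rejected
    items, then append anything the reviewer added by hand."""
    kept = [
        d for d in detections
        if d.confidence >= threshold and d.text not in false_positives
    ]
    kept.extend(manual_additions)
    return kept


detections = [
    Detection("Jane Doe", "NAME", 0.97),
    Detection("123-45-6789", "SSN", 0.99),
    Detection("General", "NAME", 0.55),
    Detection("Patient", "NAME", 0.82),  # a false positive the reviewer rejects
]
missed = [Detection("Dr. Ramirez", "NAME", 1.0)]  # added manually

final = finalize(detections, threshold=0.80,
                 manual_additions=missed, false_positives={"Patient"})
print([d.text for d in final])
# ['Jane Doe', '123-45-6789', 'Dr. Ramirez']
```

Lowering the threshold redacts more aggressively: at 0.50 the low-confidence "General" hit would be kept as well. For PHI, over-redaction is usually the safer error.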

Step 4: Generate the redacted copy. Redactor produces a clean redacted version. The original is preserved separately in the system. You receive a de-identified PDF ready to upload to any AI tool.
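
If you script any part of this yourself, the same rule applies: write the redacted copy to a separate location and confirm the original was not altered. A minimal Python sketch, assuming plain text files stand in for PDFs and using a hypothetical `save_redacted_copy` helper:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Fingerprint a file so any change to it can be detected."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def save_redacted_copy(original: Path, redacted_text: str, out_dir: Path) -> Path:
    """Write the redacted text to out_dir and verify the original is untouched."""
    before = sha256_of(original)
    out_dir.mkdir(parents=True, exist_ok=True)
    # The ".redacted" infix guarantees a different file name from the original.
    target = out_dir / f"{original.stem}.redacted{original.suffix}"
    target.write_text(redacted_text, encoding="utf-8")
    if sha256_of(original) != before:
        raise RuntimeError(f"Original file changed during redaction: {original}")
    return target
```

Only the file returned by `save_redacted_copy` would go to the AI tool. The original stays where it was.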

Step 5: Confirm the audit log. Every redaction decision is logged: which identifier was redacted, the redaction category, the timestamp, and the user who processed the file. This log is available for review in the event of a compliance audit or bar inquiry.
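
For a sense of what such a log holds, here is a generic Python sketch of one entry per redaction, written as one JSON object per line. The field names are assumptions, not Redactor's actual log format. The sketch records where each redaction occurred rather than the redacted value, so the log itself does not become a second copy of the PHI.

```python
import json
from datetime import datetime, timezone


def audit_entry(file_name: str, category: str, page: int, user: str) -> dict:
    """Describe one redaction: what kind, where, when, and by whom."""
    return {
        "file": file_name,
        "category": category,  # e.g. "SSN", "NAME", "DATE"
        "page": page,          # location only; the redacted value is not stored
        "user": user,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def to_jsonl(entries: list[dict]) -> str:
    """Serialize entries as JSON Lines, ready to append to a log file."""
    return "".join(json.dumps(entry) + "\n" for entry in entries)


entries = [
    audit_entry("smith_radiology.pdf", "NAME", 1, "paralegal01"),
    audit_entry("smith_radiology.pdf", "SSN", 3, "paralegal01"),
]
log_text = to_jsonl(entries)
```

The file name and user ID above are made up for the example.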

The entire process for a 200-page medical record typically takes less time than a single manual review pass.

Redact medical records in minutes before uploading to any AI tool. Try VIDIZMO Redactor or explore healthcare data redaction features to see how it fits your firm's workflow.

Conclusion

The question is no longer whether personal injury attorneys should use AI tools for medical record workflows. It is whether they are doing it correctly. Redacting medical records before AI upload comes down to one principle: no PHI should reach an external AI tool. Every identifier must be removed before submission.

Manual redaction can accomplish this in low volumes. At any real scale, automated redaction using a purpose-built tool is the only approach that is both thorough and sustainable.

People Also Ask

What PHI needs to be removed from medical records before using AI?

HIPAA's Safe Harbor standard identifies 18 categories: patient name, geographic data smaller than state, all dates except year, phone numbers, fax numbers, email addresses, Social Security numbers, medical record numbers, health plan beneficiary numbers, account numbers, certificate and license numbers, vehicle identifiers, device identifiers, web URLs, IP addresses, biometric identifiers, full-face photographs, and any other unique identifying number or code. In practice, medical records contain most of these.
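
The "all dates except year" rule is one of the easier ones to picture in code. The Python sketch below reduces two common date formats to the year alone. It is a narrow illustration with assumed patterns, not a complete date handler, and it does not address the Safe Harbor requirement that ages over 89, and dates that reveal such an age, be aggregated rather than kept as a year.

```python
import re

MONTHS = (
    "January|February|March|April|May|June|July|August|"
    "September|October|November|December"
)
# "March 4, 1978" style; the year is captured as group 1.
LONG_DATE = re.compile(rf"\b(?:{MONTHS})\s+\d{{1,2}},\s+(\d{{4}})\b")
# "01/12/2025" style; the year is captured as group 1.
NUMERIC_DATE = re.compile(r"\b\d{1,2}/\d{1,2}/(\d{4})\b")


def generalize_dates(text: str) -> str:
    """Replace full dates with the year only."""
    text = LONG_DATE.sub(r"\1", text)
    return NUMERIC_DATE.sub(r"\1", text)


print(generalize_dates("Born March 4, 1978; seen 01/12/2025."))
# Born 1978; seen 2025.
```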

Can I use AI tools for medical records without redacting first?

Only if the AI tool vendor has signed a Business Associate Agreement with your firm and the deployment meets HIPAA's Security Rule requirements. Most general-purpose AI tools do not offer BAAs to small firms. Without a BAA, uploading PHI to these tools creates HIPAA liability.

What is the difference between redaction and de-identification?

Redaction removes or obscures specific PHI from a document. De-identification is the broader HIPAA standard for ensuring a document contains no information that could reasonably be used to identify an individual. A properly redacted document that removes all 18 Safe Harbor identifiers qualifies as de-identified under HIPAA's Safe Harbor method, provided you have no actual knowledge that the remaining information could be used to identify the individual.

How long does it take to redact a 200-page medical record?

Manually, 45 to 90 minutes depending on the document complexity and the reviewer's thoroughness. With automated redaction software, the same document typically processes in under five minutes, with a brief review pass to confirm accuracy.
