How to Redact Medical Images (PET, CT, MRI Scans) While Protecting PHI

by Ali Rind, Last updated: March 9, 2026

A hospital shares a set of CT scans with an external research partner for a multi-site oncology study. The imaging team strips the patient name from the file header. But the scan itself still carries burned-in overlays showing the patient's date of birth, medical record number, and referring physician, baked directly into the image pixels. The research partner now has identifiable patient data, and the hospital has a HIPAA compliance gap.

Medical images are not like standard documents. PHI does not sit neatly in text fields that a find-and-replace can catch. It is embedded in DICOM metadata headers, burned into image overlays, attached as embedded radiology reports, and in some cases visible in the scan content itself. Standard redaction tools built for PDFs and Word documents cannot handle any of this.

This post covers where PHI hides in medical images, why conventional redaction approaches fail, and how AI-powered redaction handles the specific challenges of PET, CT, MRI, and other diagnostic imaging files.

Where PHI Hides in Medical Images

Medical imaging files carry patient data in multiple layers, and each layer requires a different detection and redaction approach.

DICOM Metadata Headers

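Digital Imaging and Communications in Medicine (DICOM) is the international standard for storing and exchanging medical images. Every DICOM file contains a structured metadata header with dozens of fields, many of which hold PHI, including Patient's Name (0010,0010), Patient ID (0010,0020), Patient's Birth Date (0010,0030), Patient's Address (0010,1040), Study Date (0008,0020), Accession Number (0008,0050), Institution Name (0008,0080), and Referring Physician's Name (0008,0090).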

The DICOM standard defines over 40 tags that can contain individually identifiable health information. Simply clearing the Patient Name field is not sufficient. The remaining tags can still identify the patient, especially when combined.
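To see why clearing one field is not enough, the minimal sketch below models a DICOM header as a plain Python dict keyed by (group, element) tag numbers. It then reports which identifying tags still hold values. The tag numbers are standard DICOM tags. The dict representation and the short PHI_TAGS list are simplifying assumptions. A real tool would read files with a DICOM library and check the full attribute list from the DICOM confidentiality profile (PS3.15, Annex E).

```python
# Illustrative subset of PHI-bearing tags, keyed by (group, element).
# A real policy would cover the full confidentiality-profile list.
PHI_TAGS = {
    (0x0010, 0x0010): "Patient's Name",
    (0x0010, 0x0020): "Patient ID",
    (0x0010, 0x0030): "Patient's Birth Date",
    (0x0010, 0x1040): "Patient's Address",
    (0x0008, 0x0020): "Study Date",
    (0x0008, 0x0050): "Accession Number",
    (0x0008, 0x0080): "Institution Name",
    (0x0008, 0x0090): "Referring Physician's Name",
}


def remaining_phi(header: dict) -> list:
    """Return the names of PHI tags that still hold a non-empty value."""
    return sorted(
        name for tag, name in PHI_TAGS.items()
        if str(header.get(tag, "")).strip()
    )


# Patient's Name has been cleared, but other identifiers remain.
header = {
    (0x0010, 0x0010): "",
    (0x0010, 0x0020): "MRN12345",
    (0x0010, 0x0030): "19580412",
    (0x0008, 0x0090): "Doe^Jane",
    (0x0008, 0x0060): "CT",  # Modality: not PHI, and not in PHI_TAGS
}
print(remaining_phi(header))
# ['Patient ID', "Patient's Birth Date", "Referring Physician's Name"]
```

A check like this is a useful verification step after any de-identification pass, since a non-empty result means the file should not be released.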

Burned-In Overlays

Many imaging modalities embed patient information directly into the pixel data of the image. This is called "burned-in" data because it is rendered as part of the image itself, not stored as separate metadata that can be stripped independently.

Burned-in overlays commonly include patient name and ID in the corner of the scan image, date of birth and age at time of study, accession numbers and study dates, institutional name and department identifiers, referring and performing physician names, and technologist initials.

Burned-in data is the most challenging PHI to remove from medical images. Because it is part of the pixel data, metadata-stripping tools cannot detect or remove it. It requires visual analysis, either manual pixel-by-pixel inspection or AI-powered optical character recognition (OCR) applied to the image content.

Embedded Reports and Annotations

Medical imaging studies often include embedded objects beyond the scan itself. Radiology reports may be stored as encapsulated PDF documents within the DICOM study. Other common examples include:

- clinical annotations added by radiologists or referring physicians;
- structured reports (DICOM SR) that link diagnostic findings to patient identifiers;
- key images selected by radiologists, with annotated findings that reference patient data.

These embedded documents carry the same PHI risks as standalone clinical documents, including patient names, dates, diagnoses, and provider information, but they travel inside the imaging file and are invisible to tools that only process the image layer.
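One way to make sure embedded objects are not skipped is to route every instance in a study by its SOP Class UID (tag (0008,0016)). The UIDs below are standard DICOM SOP Class UIDs. The routing function and its layer names are hypothetical, shown only as a sketch of the idea.

```python
# Standard DICOM SOP Class UIDs, read from tag (0008,0016).
ENCAPSULATED_PDF = "1.2.840.10008.5.1.4.1.1.104.1"
STRUCTURED_REPORTS = {
    "1.2.840.10008.5.1.4.1.1.88.11",  # Basic Text SR
    "1.2.840.10008.5.1.4.1.1.88.22",  # Enhanced SR
    "1.2.840.10008.5.1.4.1.1.88.33",  # Comprehensive SR
}


def layers_for(sop_class_uid: str) -> list:
    """Hypothetical routing: which redaction layers an instance needs."""
    layers = ["metadata"]  # every DICOM instance carries a header
    if sop_class_uid == ENCAPSULATED_PDF:
        layers.append("embedded_pdf")
    elif sop_class_uid in STRUCTURED_REPORTS:
        layers.append("sr_text")
    else:
        # Simplification: treat everything else (CT, MR, PET, secondary
        # capture, ...) as an image whose pixels may hold burned-in text.
        layers.append("pixel_ocr")
    return layers


print(layers_for("1.2.840.10008.5.1.4.1.1.2"))      # CT Image Storage
# ['metadata', 'pixel_ocr']
print(layers_for("1.2.840.10008.5.1.4.1.1.104.1"))  # Encapsulated PDF
# ['metadata', 'embedded_pdf']
```

The default branch is deliberately conservative. Sending an unknown object type to pixel OCR wastes some processing, but silently skipping it could leak a report.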

Secondary Capture and Derived Images

When medical images are screen-captured, exported to standard image formats (JPEG, PNG, TIFF), or converted for presentations and publications, the resulting files may carry patient information visible in the original scan overlay (now permanently embedded as pixels), filename conventions that include patient names, MRNs, or dates (e.g., SmithJohn_MRN12345_CT_20260215.jpg), and EXIF metadata inherited from the capture device.

These secondary captures are often the files that get shared most broadly, in research presentations, educational materials, clinical case discussions, and second-opinion consultations, making them high-risk vectors for PHI exposure.

Why Standard Redaction Tools Fail with Medical Images

Most redaction software is designed for one of two things: redacting text in documents or redacting faces in video. Neither approach handles the layered PHI structure of medical imaging files.

Text-based redaction tools built for PDFs, Word documents, and spreadsheets cannot read DICOM file structures or metadata headers, cannot detect burned-in text rendered as pixels in an image, cannot process embedded objects within a DICOM wrapper, and have no awareness of medical imaging standards or healthcare-specific identifier patterns.

Video and image redaction tools built for faces, license plates, and objects can detect visual elements in the image but lack OCR capabilities for burned-in text overlays. They cannot access or modify DICOM metadata headers, do not recognize healthcare-specific identifiers like MRNs, accession numbers, or NPI numbers, and cannot handle the multi-layer structure of a DICOM file.

DICOM de-identification utilities are specialized for metadata stripping. They handle metadata headers effectively but typically ignore burned-in overlays, do not process embedded or structured reports, and cannot redact secondary captures in standard image formats. They also tend to lack AI-powered detection, relying instead on predefined tag lists that miss custom or institution-specific fields.
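Fixed tag lists are especially likely to miss private data elements. Vendors and institutions use these elements for custom fields. DICOM PS3.5 gives private elements odd group numbers, excluding the reserved groups 0001, 0003, 0005, 0007, and FFFF. The minimal sketch below flags and drops them. It uses the same simplified dict-based header as the earlier examples, and the sample vendor fields are hypothetical.

```python
# Odd groups 0001, 0003, 0005, 0007 and FFFF are reserved, not private.
RESERVED_ODD_GROUPS = {0x0001, 0x0003, 0x0005, 0x0007, 0xFFFF}


def is_private(tag) -> bool:
    group, _element = tag
    return group % 2 == 1 and group not in RESERVED_ODD_GROUPS


def private_tags(header: dict) -> list:
    """List private tags that a fixed standard-tag list would never touch."""
    return sorted(tag for tag in header if is_private(tag))


def drop_private(header: dict) -> dict:
    """Conservative policy sketch: remove every private element."""
    return {tag: value for tag, value in header.items() if not is_private(tag)}


header = {
    (0x0010, 0x0010): "",                # Patient's Name, already cleared
    (0x0009, 0x1001): "SITE-A MRN 998",  # hypothetical vendor field
    (0x0029, 0x1010): b"\x00\x01",       # hypothetical vendor blob
}
print([f"({g:04X},{e:04X})" for g, e in private_tags(header)])
# ['(0009,1001)', '(0029,1010)']
```

Dropping all private elements is the safest default. If some private fields are needed for research, they would have to be reviewed and allow-listed individually.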

The gap is clear: no single-purpose tool covers all the layers where PHI exists in medical imaging. Healthcare organizations that use a metadata stripper for DICOM headers and a separate tool for document redaction still miss burned-in overlays, embedded reports, and secondary captures. Learn more about the broader limitations of general-purpose tools in our guide to PHI redaction in healthcare documents.

A Multi-Layer Approach to Medical Image Redaction

Effective medical image redaction requires addressing every layer where PHI can exist, covering metadata, pixel-level overlays, embedded documents, and derived files, in a coordinated workflow.

Layer 1: DICOM Metadata Redaction

The structured metadata header is the most straightforward layer to address. A DICOM-aware redaction tool should meet four requirements:

- identify and redact every HIPAA-relevant DICOM tag, not just Patient Name and Patient ID but the full set of 40+ tags that can contain individually identifiable information;
- support configurable tag policies, so organizations can decide which tags to clear, replace with pseudonyms, or retain for research utility;
- preserve the study-level identifiers needed for data integrity while removing patient-level identifiers;
- log every metadata modification for audit purposes.
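A configurable tag policy with an audit log can be sketched in a few lines. The policy format, action names, and ANON- prefix are hypothetical. The sketch draws random pseudonyms with Python's secrets module rather than hashing the Patient ID. HIPAA's re-identification code provision (45 CFR 164.514(c)) requires codes that are not derived from information about the individual.

```python
import secrets

# Hypothetical tag policy: "clear", "pseudonymize", or "retain".
POLICY = {
    (0x0010, 0x0010): "clear",         # Patient's Name
    (0x0010, 0x0020): "pseudonymize",  # Patient ID -> research ID
    (0x0010, 0x0030): "clear",         # Patient's Birth Date
    (0x0008, 0x0060): "retain",        # Modality, kept for research utility
}


def apply_policy(header: dict, policy: dict, pseudonyms: dict):
    """Return a redacted copy of `header` and a list of audit entries.

    `pseudonyms` maps original values to random research IDs; passing the
    same dict for every file keeps one patient's ID consistent across a study.
    """
    out, audit = dict(header), []  # never modify the original header
    for tag, action in policy.items():
        if tag not in out or action == "retain":
            continue  # absent or retained tags are unchanged and not logged
        if action == "clear":
            out[tag] = ""
        elif action == "pseudonymize":
            # Random code, not derived from the original value.
            out[tag] = pseudonyms.setdefault(out[tag], "ANON-" + secrets.token_hex(6))
        else:
            raise ValueError(f"unknown action {action!r}")
        # Log what was done, but never the original (PHI) value.
        audit.append({"tag": f"({tag[0]:04X},{tag[1]:04X})", "action": action})
    return out, audit


pseudonyms = {}
h1 = {(0x0010, 0x0010): "Smith^John", (0x0010, 0x0020): "MRN12345", (0x0008, 0x0060): "CT"}
h2 = {(0x0010, 0x0020): "MRN12345", (0x0008, 0x0060): "PT"}
r1, log1 = apply_policy(h1, POLICY, pseudonyms)
r2, log2 = apply_policy(h2, POLICY, pseudonyms)
print(r1[(0x0010, 0x0020)] == r2[(0x0010, 0x0020)])  # True: same patient, same ID
print(log1)
# [{'tag': '(0010,0010)', 'action': 'clear'}, {'tag': '(0010,0020)', 'action': 'pseudonymize'}]
```

The pseudonym table is itself a re-identification key. It would need to be stored separately under access control, or discarded if re-identification is never required.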

Layer 2: Burned-In Overlay Detection and Redaction

This is where AI-powered redaction becomes essential. Burned-in overlays are pixel data and cannot be stripped by metadata tools. Redacting them requires four capabilities:

- OCR applied to the image content, to detect text rendered as pixels in overlay regions;
- pattern matching, to identify PHI-specific text such as names, dates, MRNs, accession numbers, and physician names;
- contextual AI recognition, to catch identifiers that do not follow standard patterns;
- targeted pixel redaction, to obscure only the PHI regions while preserving the diagnostic content of the image.
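The pattern-matching step can be sketched as follows. The OCR output format, (text, (x, y, w, h)) pairs, is an assumption, and any OCR engine would need an adapter to produce it. The MRN and accession patterns are illustrative only, since real formats are institution-specific. Names and physician names generally cannot be caught by regular expressions and need the contextual recognition described above, which this sketch does not attempt.

```python
import re

# Illustrative patterns only. More specific rules come first so that, for
# example, an 8-digit MRN is labeled "mrn" rather than "date".
PHI_PATTERNS = [
    ("mrn", re.compile(r"\bMRN[:#]?\s*\d{5,10}\b", re.IGNORECASE)),
    ("accession", re.compile(r"\bACC[:#]?\s*[A-Z0-9]{6,12}\b", re.IGNORECASE)),
    ("date", re.compile(r"\b(?:\d{4}-\d{2}-\d{2}|\d{2}/\d{2}/\d{4}|\d{8})\b")),
]


def find_phi(ocr_tokens):
    """ocr_tokens: (text, (x, y, w, h)) pairs, as an assumed OCR output."""
    hits = []
    for text, box in ocr_tokens:
        for label, pattern in PHI_PATTERNS:
            if pattern.search(text):
                hits.append((label, text, box))
                break  # one match is enough to mark this box for redaction
    return hits


tokens = [
    ("DOB: 04/12/1958", (8, 8, 120, 14)),
    ("MRN 12345678", (8, 24, 110, 14)),
    ("ACC: A1B2C3D4", (8, 40, 110, 14)),
    ("AXIAL 5.0mm", (400, 8, 90, 14)),  # acquisition info, not PHI
]
for label, text, box in find_phi(tokens):
    print(label, text, box)
```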

The clinical preservation requirement is critical. Over-redacting, meaning obscuring diagnostic regions of the scan alongside PHI overlays, destroys the clinical value of the image. AI-powered detection with configurable confidence thresholds allows teams to balance thoroughness with precision.
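The trade-off between thoroughness and precision can be made concrete with a small sketch. It keeps only detections at or above a confidence threshold, then overwrites just those bounding boxes on a copy of the image. The toy image is a nested list of grayscale values to keep the example dependency-free. A real pipeline would operate on the decoded pixel array. The (label, confidence, box) detections are hypothetical.

```python
def select(detections, threshold):
    """Keep detections whose confidence is at or above `threshold`."""
    return [d for d in detections if d[1] >= threshold]


def redact_boxes(pixels, boxes, fill=0):
    """Return a copy of `pixels` with only the given boxes overwritten."""
    out = [row[:] for row in pixels]  # leave the original image untouched
    height, width = len(out), len(out[0])
    for x, y, w, h in boxes:
        # Clip each box to the image so out-of-range boxes cannot fail.
        for r in range(max(0, y), min(height, y + h)):
            for c in range(max(0, x), min(width, x + w)):
                out[r][c] = fill
    return out


image = [[100] * 8 for _ in range(6)]  # toy 8x6 grayscale "scan"
detections = [
    ("mrn", 0.92, (0, 0, 4, 2)),   # confident hit in the top-left overlay
    ("date", 0.40, (4, 4, 3, 2)),  # low-confidence hit
]
for threshold in (0.5, 0.3):
    kept = select(detections, threshold)
    redacted = redact_boxes(image, [box for _, _, box in kept])
    blanked = sum(v == 0 for row in redacted for v in row)
    print(threshold, len(kept), blanked)
# 0.5 1 8
# 0.3 2 14
```

Lowering the threshold blanks more pixels, which raises sensitivity. If a low-confidence detection lands on anatomy, it also risks exactly the over-redaction described above.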

Layer 3: Embedded Document and Report Redaction

Radiology reports, structured reports, and clinical annotations embedded within DICOM files need the same redaction treatment as standalone documents. This means extracting embedded PDFs and structured reports from the DICOM wrapper, applying document-level PII detection covering patient names, dates, MRNs, diagnoses, and provider identifiers, redacting identified PHI, and re-embedding the redacted documents into the DICOM file or exporting them separately.
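Once report text has been extracted from the DICOM wrapper, the redaction step might look like the minimal sketch below. The rules are illustrative assumptions. The known_names argument is a hypothetical input that could be built from the header's Patient's Name and physician fields. DICOM stores person names as Family^Given, so they would need reformatting first. Extraction and re-embedding are not shown.

```python
import re

# Illustrative rules; real deployments would add NER for names and
# institution-specific identifier formats.
RULES = [
    (re.compile(r"\bMRN[:#]?\s*\d{5,10}\b", re.IGNORECASE), "[MRN]"),
    (re.compile(r"\b\d{2}/\d{2}/\d{4}\b"), "[DATE]"),
]


def redact_report(text: str, known_names=()) -> str:
    """Redact extracted report text, replacing PHI with category tokens."""
    for name in known_names:
        text = re.sub(re.escape(name), "[NAME]", text, flags=re.IGNORECASE)
    for pattern, token in RULES:
        text = pattern.sub(token, text)
    return text


report = "Patient John Smith, MRN 1234567, exam 02/15/2026. Referring: Dr. Alice Wong."
print(redact_report(report, known_names=["John Smith", "Alice Wong"]))
# Patient [NAME], [MRN], exam [DATE]. Referring: Dr. [NAME].
```

Replacing identifiers with category tokens rather than deleting them keeps the report readable for reviewers. It also shows at a glance what was removed.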

A platform that handles both medical imaging and document redaction in a single workflow eliminates the gap where a de-identified scan ships with an un-redacted radiology report still attached. See how a unified approach works in our overview of healthcare redaction software.

Layer 4: Secondary Captures and Derived Files

Screen captures, exported JPEGs, and presentation slides derived from medical images require image-level OCR to detect patient information burned into the exported file, face detection for images that include patient photos (common in dermatology, ophthalmology, and surgical imaging), filename sanitization to catch patient data embedded in file naming conventions, and metadata stripping to remove EXIF data from standard image formats.
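Filename sanitization is harder than it looks. In a name like SmithJohn_MRN12345_CT_20260215.jpg, a pattern rule could catch the MRN and the date, but not "SmithJohn". The minimal sketch below therefore replaces the whole stem with a random identifier and keeps only the extension. The key_table parameter is a hypothetical way to preserve the link back to the original. EXIF stripping is not shown because it would require an imaging library.

```python
import uuid
from pathlib import PurePath


def sanitize_filename(name: str, key_table: dict) -> str:
    """Replace the whole stem with a random ID, keeping only the extension."""
    new_name = uuid.uuid4().hex + PurePath(name).suffix.lower()
    # The link back to the original name is itself sensitive: store it
    # separately under access control, or do not keep it at all.
    key_table[new_name] = name
    return new_name


key_table = {}
safe = sanitize_filename("SmithJohn_MRN12345_CT_20260215.jpg", key_table)
print(safe)  # 32 random hex characters followed by ".jpg"
```

Non-identifying descriptors such as the modality could be added back to the new name deliberately if users need them.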

Common Scenarios Requiring Medical Image Redaction

Research Data Sharing

Multi-site clinical studies require de-identified imaging datasets shared across institutions. Institutional Review Board (IRB) protocols typically require data to be de-identified before it leaves the originating institution, commonly by applying HIPAA's Safe Harbor method and removing all 18 identifier categories. A single burned-in overlay or un-cleared metadata tag can disqualify an entire dataset from research use or constitute a reportable breach. Our guide on how to redact PHI in medical records for clinical research covers this workflow in depth.

Second-Opinion Consultations

Physicians sharing imaging studies with external specialists for diagnostic consultation must ensure patient PHI is removed before transmission. This applies whether the consultation is formal (through an established referral network) or informal (a colleague reviewing a scan via email or secure messaging).

Medical Education and Training

Teaching hospitals and medical schools routinely use diagnostic images in lectures, case presentations, and training materials. These images are often shared with large audiences, including residents, students, and visiting faculty, who are not covered under the originating institution's treatment relationship with the patient.

Legal and Insurance Proceedings

Medical images submitted as evidence in malpractice cases, disability claims, or insurance disputes may need selective redaction, removing identifiers of non-party patients visible in the same study or redacting provider information that is not relevant to the proceeding.

Publication and Presentation

Researchers submitting case reports, clinical papers, or conference presentations with imaging data must de-identify all patient information. Journal requirements and conference policies typically align with HIPAA's Safe Harbor method, requiring removal of all 18 identifier categories.
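Two Safe Harbor rules apply directly to DICOM values. All elements of dates except the year must be removed, and ages over 89 must be aggregated into a single "90 or older" category. The minimal sketch below applies both rules to DICOM DA (YYYYMMDD) and AS (nnnD, nnnW, nnnM, nnnY) values. The fill-in conventions are assumptions: 0101 for month and day, and 090Y for the 90+ bucket. The sketch does not handle the related rule that birth years implying an age over 89 must also be aggregated.

```python
def generalize_date(da: str) -> str:
    """Keep only the year of a DICOM DA value (YYYYMMDD).

    DA values must be eight digits, so this sketch pins month and day to
    0101 (an assumed convention) instead of emitting a bare year.
    """
    if len(da) == 8 and da.isdigit():
        return da[:4] + "0101"
    return ""  # malformed or empty: clear it


def generalize_age(age: str) -> str:
    """Aggregate DICOM AS ages (nnnD/W/M/Y) of 90+ years into one bucket."""
    if len(age) != 4 or not age[:3].isdigit() or age[3] not in "DWMY":
        return ""  # malformed: clear it
    # 999 days, weeks or months is still under 90 years, so only Y can exceed 89.
    if age[3] == "Y" and int(age[:3]) >= 90:
        return "090Y"  # assumed convention meaning "90 or older"
    return age


print(generalize_date("19580412"), generalize_age("093Y"), generalize_age("067Y"))
# 19580101 090Y 067Y
```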

How VIDIZMO Redactor Handles Medical Image Redaction

VIDIZMO Redactor addresses the multi-layer PHI challenge in medical images through a combination of DICOM-specific capabilities and its broader AI-powered redaction platform.

For DICOM support, Redactor provides native DICOM file processing, redacting PHI embedded in medical imaging metadata and burned-in overlays without converting to standard image formats first, along with metadata header redaction covering the full range of HIPAA-relevant DICOM tags.

For AI-powered visual detection, OCR detects burned-in text overlays, including patient names, dates, MRNs, accession numbers, and physician identifiers rendered as pixels in the image. Face detection covers patient-identifiable features in clinical photographs and surgical imaging. More than 40 PII and PHI types are detected using both pattern matching and contextual AI recognition. Confidence thresholds are configurable from 25% to 90%. Lower thresholds favor sensitivity for research de-identification, while higher thresholds give a more balanced trade-off for clinical consultation.

For embedded document redaction, PDF, DOCX, and image redaction are handled within the same platform, so radiology reports, structured reports, and clinical annotations attached to imaging studies are processed alongside the image itself. OCR handles scanned documents and image-embedded text across 255+ supported file formats.

For workflow and compliance, audit trails log every detection and redaction decision, which is essential for IRB compliance and HIPAA documentation. Redaction copy generation preserves the original imaging study untouched. Bulk processing has been tested with over 1.1 million files for institutions managing large imaging archives. Split-screen comparison supports radiologist or privacy officer review of redacted versus original files. HIPAA-compliant deployments are supported with BAA/DPA available, and SaaS, government cloud, on-premises, and hybrid deployment options ensure imaging data never leaves the organization's approved environment during redaction.

To understand the full scope of the platform beyond imaging, explore the VIDIZMO healthcare data redaction software page or read how redaction for hospital systems addresses PHI across documents, video, and audio in a single workflow.

Explore how Redactor handles PHI redaction across medical imaging workflows: request a demo.

Conclusion

Medical images are among the most PHI-dense files in healthcare. Patient data is layered across DICOM metadata headers, burned-in pixel overlays, embedded radiology reports, and derived files, and each layer requires a different detection and redaction approach. Standard redaction tools built for documents or video cannot handle this layered structure, and DICOM-only de-identification utilities miss burned-in overlays and embedded reports.

Effective medical image redaction requires a platform that addresses all four layers in a coordinated workflow: metadata redaction, AI-powered overlay detection, embedded document processing, and secondary capture handling. Audit trails and configurable sensitivity controls ensure the process is both defensible and clinically appropriate.