
From DICOM Chaos to Training-Ready Data: Our Dataset Pipeline for Medical AI – Part 3

Part 3: Prepare

The Step Where “Model Bugs” Are Usually Born

Once scans are organized and annotations exist, it is tempting to treat “prepare” as a quick preprocessing script. This is where teams get hurt, because prepare rarely fails loudly.

Instead, it produces training data that is:

  • Plausible
  • Consistent-looking
  • But wrong

Wrong training data is worse than broken training code. Broken code fails immediately, while wrong data still trains a model, so you lose time and end up chasing the wrong cause.

What Prepare Actually Does

Prepare takes your organized data and turns it into training artifacts:

  • Precomputed crops / patches
  • Resized volumes
  • Resampled labels
  • Split manifests (train/val/test lists)
  • Cached augmentations
  • Derived targets
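To make the resampling step concrete, here is a minimal Python sketch. It assumes NumPy volumes in (z, y, x) order and uses SciPy's `ndimage.zoom`. The function name `resample_pair` and its parameters are illustrative, not our actual pipeline code. The point it illustrates is that images and labels need different interpolation: a label resampled with linear interpolation gets fractional class IDs along boundaries, which is exactly the "plausible but wrong" failure described above.

```python
import numpy as np
from scipy import ndimage


def resample_pair(image, label, spacing, target_spacing):
    """Resample an image/label pair from `spacing` to `target_spacing` (mm per voxel).

    Hypothetical helper for illustration. The image uses linear interpolation
    (order=1); the label uses nearest neighbour (order=0) so class IDs are
    never blended into values that do not exist in the annotation.
    """
    # Zoom factor per axis: old spacing / new spacing (2.5 mm -> 1.0 mm gives 2.5x more voxels).
    zoom = [s / t for s, t in zip(spacing, target_spacing)]
    image_rs = ndimage.zoom(image.astype(np.float32), zoom, order=1)
    label_rs = ndimage.zoom(label, zoom, order=0)
    # Both outputs must share one grid, otherwise every overlay is silently shifted.
    assert image_rs.shape == label_rs.shape, (image_rs.shape, label_rs.shape)
    return image_rs, label_rs
```

Note that `ndimage.zoom` aligns the corner voxels by default, so the output spacing only approximates `target_spacing`. A pipeline that must preserve physical coordinates exactly would typically resample with a library such as SimpleITK, which carries origin and direction along with spacing.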

It also often includes decisions like:

  • What spacing to resample to
  • How to crop around anatomy
  • What augmentations are allowed
  • What “empty” masks mean
  • How to handle partial labels
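Cropping around anatomy is a good example of such a decision. A minimal sketch, assuming a NumPy image/label pair already on the same grid and a hypothetical `crop_around_mask` helper with a fixed voxel margin: it crops to the mask bounding box, checks that no foreground was cut off, and refuses empty masks. Raising on an empty mask is our illustrative policy choice here; the point is that "what does empty mean" gets decided explicitly rather than by accident.

```python
import numpy as np


def crop_around_mask(image, label, margin=8):
    """Crop image and label to the label's bounding box plus `margin` voxels per side.

    Hypothetical helper for illustration. Raises on an empty label instead of
    silently returning an arbitrary crop.
    """
    fg = np.argwhere(label > 0)  # (N, ndim) coordinates of foreground voxels
    if fg.size == 0:
        raise ValueError("empty mask: decide explicitly how to crop this case")
    lo = np.maximum(fg.min(axis=0) - margin, 0)
    hi = np.minimum(fg.max(axis=0) + 1 + margin, label.shape)
    region = tuple(slice(a, b) for a, b in zip(lo, hi))
    img_crop, lab_crop = image[region], label[region]
    # Sanity check: the crop must keep every foreground voxel.
    assert (lab_crop > 0).sum() == (label > 0).sum(), "anatomy cut by crop"
    return img_crop, lab_crop, region
```

Returning `region` as well makes it possible to map predictions on the crop back into the full volume later.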

Prepare is production code, and it should be treated that way.

Validation in Prepare: Your Lie Detector

We validate prepare using two pillars:

1) Visual tests (random cases, always)

For a random sample of cases:

  • Display image + annotation overlay
  • Display training crop + annotation overlay
  • Display augmentations (rotate/flip/intensity)
  • Ensure the anatomy stays inside the crop
  • Ensure labels aren’t shifted, degraded, or resampled incorrectly
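Parts of this checklist can be scripted so the review takes minutes instead of an afternoon. The sketch below assumes 2D NumPy slices and uses hypothetical helper names (`overlay`, `paired_flip`, `pick_random_cases`). It builds a red-tinted overlay, applies the same flip to image and mask, and draws a fresh random sample of cases on each run. Rendering is left to whatever viewer you already use; a human still has to look at the result.

```python
import numpy as np


def overlay(image2d, mask2d, alpha=0.4):
    """Return an RGB uint8 overlay: grayscale image with mask pixels tinted red."""
    img = image2d.astype(np.float32)
    # Robust intensity window (1st-99th percentile) so outliers don't wash out the slice.
    lo, hi = np.percentile(img, [1, 99])
    img = np.clip((img - lo) / max(hi - lo, 1e-6), 0.0, 1.0)
    rgb = np.stack([img, img, img], axis=-1)
    m = mask2d > 0
    rgb[m] = (1 - alpha) * rgb[m] + alpha * np.array([1.0, 0.0, 0.0])
    return (rgb * 255).round().astype(np.uint8)


def paired_flip(image, mask, axis):
    """Apply the same geometric augmentation (a flip along `axis`) to image and mask."""
    return np.flip(image, axis=axis).copy(), np.flip(mask, axis=axis).copy()


def pick_random_cases(case_ids, k=5, seed=None):
    """Sample k distinct cases for review; seed=None gives different cases each run."""
    rng = np.random.default_rng(seed)
    return list(rng.choice(case_ids, size=min(k, len(case_ids)), replace=False))
```

For example, `matplotlib.pyplot.imsave("case_overlay.png", overlay(img_slice, mask_slice))` writes one overlay to disk. Producing it before and after `paired_flip` (or any other augmentation) makes a misaligned image/label pair obvious at a glance.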

If you do nothing else: do this.

Visual inspection catches the “silent failure” class of bugs better than anything.

2) Dataset unit tests (cheap, powerful)

These are simple checks that save weeks:

  • Label coverage (% positive voxels)
  • Non-empty masks where expected
  • Bounding box distributions
  • Image/label shape match
  • Spacing consistency between image and label
  • Patient IDs don’t overlap across splits
  • Intensity histogram comparisons per dataset
  • Class balance across splits
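As an illustration of how small these tests can be, here is a sketch covering a subset of the list: shape match, spacing match, non-empty masks, label coverage, patient leakage across splits, and positive-case balance per split. It assumes each case is a dict with hypothetical keys (`case_id`, `patient_id`, `split`, `image`, `label`, `image_spacing`, `label_spacing`, `expect_foreground`). The coverage bounds are placeholder values to tune per task, not recommendations.

```python
from collections import defaultdict

import numpy as np


def check_cases(cases, min_coverage=1e-4, max_coverage=0.5):
    """Per-case checks; returns a list of human-readable error strings."""
    errors = []
    for c in cases:
        cid = c["case_id"]
        if c["image"].shape != c["label"].shape:
            errors.append(f"{cid}: shape {c['image'].shape} != {c['label'].shape}")
            continue  # remaining checks assume image and label share a grid
        if not np.allclose(c["image_spacing"], c["label_spacing"], atol=1e-3):
            errors.append(f"{cid}: spacing mismatch")
        coverage = float((c["label"] > 0).mean())  # fraction of positive voxels
        if c["expect_foreground"]:
            if coverage == 0:
                errors.append(f"{cid}: empty mask where foreground expected")
            elif not (min_coverage <= coverage <= max_coverage):
                errors.append(f"{cid}: label coverage {coverage:.2%} out of range")
    return errors


def check_patient_leakage(cases):
    """Return patient IDs that appear in more than one split."""
    splits_per_patient = defaultdict(set)
    for c in cases:
        splits_per_patient[c["patient_id"]].add(c["split"])
    return sorted(p for p, s in splits_per_patient.items() if len(s) > 1)


def positive_fraction_per_split(cases):
    """Fraction of cases with any foreground, per split, for comparing class balance."""
    counts = defaultdict(lambda: [0, 0])  # split -> [positive cases, total cases]
    for c in cases:
        counts[c["split"]][0] += int((c["label"] > 0).any())
        counts[c["split"]][1] += 1
    return {s: pos / tot for s, (pos, tot) in counts.items()}
```

In a pytest suite, each check becomes a one-liner such as `assert not check_patient_leakage(cases)`, and the per-split positive fractions can be compared against whatever tolerance fits the dataset.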

These tests don’t need to be fancy. They just need to exist, because a dataset pipeline without unit tests is an accident waiting to happen.

Next: Part 4 — Validation + Test: Keeping Your Metric Real and Your Test Set Meaningful

© All rights reserved to RSIP Vision 2023
