
From DICOM Chaos to Training-Ready Data: Our Dataset Pipeline for Medical AI – Part 3

Part 3: Prepare

The Step Where “Model Bugs” Are Usually Born

Once scans are organized and annotations exist, it is tempting to treat “prepare” as a quick preprocessing script. This is where teams get hurt, because prepare rarely fails loudly.

It produces training data that is:

  • Plausible
  • Consistent-looking
  • But wrong

Wrong training data is worse than broken training code: broken code crashes and gets fixed, while wrong data trains without complaint, wastes time, and misleads you.

What Prepare Actually Does

Prepare takes your organized data and turns it into training artifacts:

  • Precomputed crops / patches
  • Resized volumes
  • Resampled labels
  • Split manifests (train/val/test lists)
  • Cached augmentations
  • Derived targets
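As an illustration of one artifact from this list, a split manifest can be built deterministically at the patient level, so that every scan from the same patient lands in the same split. This is a minimal sketch, not the pipeline described in this series: the function names `assign_split` and `build_manifest`, the `salt` value, and the 80/10/10 fractions are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict


def assign_split(patient_id, salt="v1", val_frac=0.1, test_frac=0.1):
    """Map a patient ID to a split using a stable hash (hypothetical scheme)."""
    digest = hashlib.sha256(f"{salt}:{patient_id}".encode("utf-8")).hexdigest()
    # First 32 bits of the hash, scaled to a number in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    if bucket < test_frac:
        return "test"
    if bucket < test_frac + val_frac:
        return "val"
    return "train"


def build_manifest(cases):
    """cases: iterable of (patient_id, scan_id) pairs -> {split: [scan_id, ...]}."""
    manifest = defaultdict(list)
    for patient_id, scan_id in cases:
        # The split depends only on the patient, never on the individual scan.
        manifest[assign_split(patient_id)].append(scan_id)
    return {split: sorted(ids) for split, ids in manifest.items()}
```

Because the assignment depends only on the patient ID and the salt, rerunning prepare on a grown dataset does not move existing patients between splits.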

It also often includes decisions like:

  • What spacing to resample to
  • How to crop around anatomy
  • What augmentations are allowed
  • What “empty” masks mean
  • How to handle partial labels
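One way to keep these decisions visible is to record them in a single explicit configuration object rather than scattering them through a script. The sketch below is only an assumption about how that could look: `PrepareConfig`, its field names, and every default value are hypothetical and not taken from the pipeline described here.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PrepareConfig:
    """Hypothetical record of the decisions a prepare run depends on."""
    target_spacing_mm: tuple = (1.0, 1.0, 1.0)   # what spacing to resample to
    crop_margin_mm: float = 20.0                 # how much context to keep around anatomy
    allowed_augmentations: tuple = ("rotate", "flip", "intensity")
    empty_mask_means: str = "negative_case"      # as opposed to "not_annotated"
    partial_label_policy: str = "ignore_unlabeled"

    def fingerprint(self):
        """Short, stable hash of the settings, usable as a cache or artifact tag."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
```

Tagging prepared artifacts with the fingerprint makes it harder to train on crops that were produced under a different spacing or label policy than the one you think you are using.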

Prepare is production code, and it should be treated that way.

Validation in Prepare: Your Lie Detector

We validate prepare using two pillars:

1) Visual tests (random cases, always)

For random cases:

  • Display image + annotation overlay
  • Display training crop + annotation overlay
  • Display augmentations (rotate/flip/intensity)
  • Ensure the anatomy stays inside the crop
  • Ensure labels aren’t shifted, degraded, or resampled incorrectly

If you do nothing else: do this.

Visual inspection catches the “silent failure” class of bugs better than anything.
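The overlay and the "anatomy stays inside the crop" check can be sketched with NumPy alone. This is an illustrative sketch under assumptions: the helper names `overlay_mask` and `mask_inside_crop` are hypothetical, the mask is assumed to be binary, and the crop is assumed to be an axis-aligned box given in voxel coordinates.

```python
import numpy as np


def overlay_mask(image_2d, mask_2d, alpha=0.4):
    """Blend a binary mask in red over a grayscale slice; returns float RGB in [0, 1]."""
    img = image_2d.astype(np.float32)
    span = float(img.max() - img.min())
    # Normalize intensities to [0, 1]; a constant image becomes all zeros.
    norm = (img - img.min()) / span if span > 0 else np.zeros_like(img)
    rgb = np.stack([norm, norm, norm], axis=-1)
    red = np.array([1.0, 0.0, 0.0], dtype=np.float32)
    fg = mask_2d > 0
    rgb[fg] = (1.0 - alpha) * rgb[fg] + alpha * red
    return rgb


def mask_inside_crop(mask, crop_start, crop_size):
    """True if every positive voxel of `mask` lies inside the crop box."""
    coords = np.argwhere(mask > 0)
    if coords.size == 0:
        # Nothing can be cut off; whether an empty mask is acceptable
        # is a separate check.
        return True
    lo = np.asarray(crop_start)
    hi = lo + np.asarray(crop_size)
    return bool(np.all(coords >= lo) and np.all(coords < hi))
```

The array returned by `overlay_mask` can be handed to any image viewer, for example matplotlib's `imshow`, for a handful of randomly sampled cases on every prepare run.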

2) Dataset unit tests (cheap, powerful)

These are simple checks that save weeks:

  • Label coverage (% positive voxels)
  • Non-empty masks where expected
  • Bounding box distributions
  • Image/label shape match
  • Spacing consistency between image and label
  • Patient IDs don’t overlap across splits
  • Intensity histogram comparisons per dataset
  • Class balance across splits
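Several of the per-case checks in this list fit in one small function. The following is a sketch under assumptions, not the authors' test suite: `check_case` and its parameters are hypothetical, the label is assumed to be an array where nonzero means foreground, and the coverage threshold is a placeholder to tune per task.

```python
import numpy as np


def check_case(image, label, image_spacing, label_spacing,
               expect_foreground=True, max_coverage=0.5, atol=1e-3):
    """Return a list of human-readable problems for one prepared case."""
    problems = []
    if image.shape != label.shape:
        problems.append(f"shape mismatch: image {image.shape} vs label {label.shape}")
    if len(image_spacing) != len(label_spacing) or not np.allclose(
        image_spacing, label_spacing, atol=atol
    ):
        problems.append(f"spacing mismatch: image {image_spacing} vs label {label_spacing}")
    # Label coverage: fraction of voxels that are foreground.
    coverage = float(np.count_nonzero(label)) / max(label.size, 1)
    if expect_foreground and coverage == 0.0:
        problems.append("empty mask where foreground was expected")
    if coverage > max_coverage:
        problems.append(f"suspiciously high label coverage: {coverage:.1%}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a single pass over the dataset produce a complete report.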

These tests don’t need to be fancy. They just need to exist: a dataset pipeline without unit tests is an accident waiting to happen.
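The split-level checks from the list above (patient overlap and class balance) need nothing beyond the standard library. This is a minimal sketch with assumed inputs: `find_patient_leaks` and `class_fractions` are hypothetical helpers, and the splits are assumed to be available as plain lists of patient IDs and class labels.

```python
from collections import Counter
from itertools import combinations


def find_patient_leaks(splits):
    """splits: {split_name: iterable of patient IDs}. Returns {(a, b): shared IDs}."""
    leaks = {}
    for a, b in combinations(sorted(splits), 2):
        shared = set(splits[a]) & set(splits[b])
        if shared:
            leaks[(a, b)] = sorted(shared)
    return leaks


def class_fractions(labels_by_split):
    """labels_by_split: {split_name: list of class labels} -> per-split class fractions."""
    out = {}
    for split, labels in labels_by_split.items():
        counts = Counter(labels)
        total = sum(counts.values())
        out[split] = {cls: n / total for cls, n in counts.items()} if total else {}
    return out
```

In a test suite, the first function would typically be asserted to return an empty dict, and the second compared across splits against a tolerance chosen for the task.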

Next: Part 4 — Validation + Test: Keeping Your Metric Real and Your Test Set Meaningful




© All rights reserved to RSIP Vision 2023
