Part 3: Prepare
The Step Where “Model Bugs” Are Usually Born
Once scans are organized and annotations exist, the temptation is to treat “prepare” as a quick preprocessing script. This is where teams get hurt because prepare rarely fails loudly.
It produces training data that is:
- Plausible
- Consistent-looking
- But wrong
Wrong training data is worse than broken training code: broken code crashes and gets fixed, while wrong data quietly trains a model that looks reasonable and sends you debugging the wrong thing for weeks.
What Prepare Actually Does
Prepare takes your organized data and turns it into training artifacts:
- Precomputed crops / patches
- Resized volumes
- Resampled labels
- Split manifests (train/val/test lists; see the sketch after this list)
- Cached augmentations
- Derived targets
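To make one of those artifacts concrete: a split manifest is just a file that pins down which patients belong to which split. The sketch below is one minimal way to produce it, assuming a flat directory of NIfTI files named by patient ID; the file layout, the ratios, and the `prepared` path are illustrative assumptions, not a prescription.

```python
import json
import random
from pathlib import Path

# Hypothetical layout: one image/label pair per patient, e.g. <patient_id>_image.nii.gz
DATA_ROOT = Path("prepared")          # illustrative path
SPLIT_RATIOS = (0.7, 0.15, 0.15)      # train / val / test, an assumed choice

def make_split_manifest(data_root: Path, seed: int = 17) -> dict:
    """Split at the patient level so no patient ends up in two splits."""
    patient_ids = sorted({p.name.split("_")[0] for p in data_root.glob("*_image.nii.gz")})
    rng = random.Random(seed)          # fixed seed -> reproducible splits
    rng.shuffle(patient_ids)

    n = len(patient_ids)
    n_train = int(n * SPLIT_RATIOS[0])
    n_val = int(n * SPLIT_RATIOS[1])
    return {
        "train": patient_ids[:n_train],
        "val": patient_ids[n_train:n_train + n_val],
        "test": patient_ids[n_train + n_val:],
    }

if __name__ == "__main__":
    manifest = make_split_manifest(DATA_ROOT)
    Path("splits.json").write_text(json.dumps(manifest, indent=2))
```

The point of writing the manifest to disk is that every later stage reads the same file instead of re-rolling its own split.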
It also often includes decisions like:
- What spacing to resample to
- How to crop around anatomy
- What augmentations are allowed
- What “empty” masks mean
- How to handle partial labels
It’s production code — and should be treated that way.
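Treating it as production code mostly means writing those decisions down as explicit, reviewable parameters instead of burying them in a notebook. Here is a minimal sketch of what that can look like, assuming volumes arrive as NumPy arrays with known voxel spacing; the target spacing, crop margin, and function names are illustrative, not a prescribed implementation.

```python
from dataclasses import dataclass

import numpy as np
from scipy import ndimage

@dataclass(frozen=True)
class PrepareConfig:
    # Every "silent" decision from the list above lives here, in one reviewable place.
    target_spacing: tuple = (1.0, 1.0, 1.0)   # mm, assumed isotropic target
    crop_margin_vox: int = 16                  # margin around the labeled anatomy
    allow_empty_masks: bool = False            # what an all-zero mask means for this task

def resample_pair(image: np.ndarray, label: np.ndarray,
                  spacing: tuple, cfg: PrepareConfig):
    """Resample image (linear) and label (nearest) with the SAME zoom factors."""
    zoom = [s / t for s, t in zip(spacing, cfg.target_spacing)]
    image_rs = ndimage.zoom(image, zoom, order=1)   # trilinear for intensities
    label_rs = ndimage.zoom(label, zoom, order=0)   # nearest keeps label values intact
    assert image_rs.shape == label_rs.shape, "image/label shape mismatch after resampling"
    return image_rs, label_rs

def crop_around_label(image: np.ndarray, label: np.ndarray, cfg: PrepareConfig):
    """Crop a box around the foreground, padded by a fixed margin."""
    fg = np.argwhere(label > 0)
    if fg.size == 0:
        if cfg.allow_empty_masks:
            return image, label        # decision: keep empty cases whole
        raise ValueError("empty mask where foreground was expected")
    lo = np.maximum(fg.min(axis=0) - cfg.crop_margin_vox, 0)
    hi = np.minimum(fg.max(axis=0) + cfg.crop_margin_vox + 1, label.shape)
    sl = tuple(slice(a, b) for a, b in zip(lo, hi))
    return image[sl], label[sl]
```

Note the different interpolation orders: linear for intensities, nearest for labels. Interpolating labels like intensities is one of the most common ways masks get silently degraded.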
Validation in Prepare: Your Lie Detector
We validate prepare using two pillars:
1) Visual tests (random cases, always)
For random cases:
- Display image + annotation overlay
- Display training crop + annotation overlay
- Display augmentations (rotate/flip/intensity)
- Ensure the anatomy stays inside the crop
- Ensure labels aren’t shifted, degraded, or resampled incorrectly
If you do nothing else: do this.
Visual inspection catches the “silent failure” class of bugs better than anything.
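A minimal version of this is a script that dumps overlay PNGs for a few random cases so a human can flip through them. The sketch below assumes 3D NumPy volumes and uses matplotlib; the middle-slice choice, the flip augmentation, and the tuple layout of `cases` are all illustrative assumptions.

```python
import random
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np

def save_overlay(image: np.ndarray, label: np.ndarray, out_path: Path, title: str):
    """Save the middle axial slice with the label contoured on top."""
    z = image.shape[0] // 2
    fig, ax = plt.subplots(figsize=(5, 5))
    ax.imshow(image[z], cmap="gray")
    if label[z].any():
        ax.contour(label[z], levels=[0.5], colors="r", linewidths=1.0)
    ax.set_title(title)
    ax.axis("off")
    fig.savefig(out_path, dpi=150, bbox_inches="tight")
    plt.close(fig)

def visual_spot_check(cases: list, out_dir: Path, n: int = 8, seed: int = 0):
    """cases: list of (case_id, image, label, crop_image, crop_label) tuples."""
    out_dir.mkdir(parents=True, exist_ok=True)
    rng = random.Random(seed)
    for case_id, image, label, crop_img, crop_lbl in rng.sample(cases, min(n, len(cases))):
        save_overlay(image, label, out_dir / f"{case_id}_full.png", f"{case_id} full")
        save_overlay(crop_img, crop_lbl, out_dir / f"{case_id}_crop.png", f"{case_id} crop")
        # The same overlay on a flipped copy makes obviously wrong augmentations jump out.
        save_overlay(np.flip(crop_img, axis=-1), np.flip(crop_lbl, axis=-1),
                     out_dir / f"{case_id}_flip.png", f"{case_id} flip")
```

Ten minutes of scrolling through these images will catch shifted masks, cropped-out anatomy, and broken augmentations faster than any metric will.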
2) Dataset unit tests (cheap, powerful)
These are simple checks that save weeks:
- Label coverage (% positive voxels)
- Non-empty masks where expected
- Bounding box distributions
- Image/label shape match
- Spacing consistency between image and label
- Patient IDs don’t overlap across splits
- Intensity histogram comparisons per dataset
- Class balance across splits
These tests don’t need to be fancy; they just need to exist. A dataset pipeline without unit tests is an accident waiting to happen.
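A handful of pytest-style assertions covers most of the list above. The sketch below assumes prepared cases are stored one `.npz` per case with `image`, `label`, and `spacing` arrays, and that the split manifest is a JSON file; those storage details, and the coverage bounds, are placeholders for whatever your pipeline actually produces.

```python
import json
from pathlib import Path

import numpy as np
import pytest

def load_prepared_cases(root: Path = Path("prepared")):
    """Hypothetical storage: one .npz per case with image, label, spacing arrays."""
    for path in sorted(root.glob("*.npz")):
        with np.load(path) as data:
            yield path.stem, data["image"], data["label"], tuple(data["spacing"])

def load_split_manifest(path: str = "splits.json") -> dict:
    return json.loads(Path(path).read_text())

@pytest.fixture(scope="session")
def cases():
    return list(load_prepared_cases())

def test_shapes_and_spacing_match(cases):
    for case_id, image, label, spacing in cases:
        assert image.shape == label.shape, f"{case_id}: image/label shape mismatch"
        assert len(spacing) == image.ndim, f"{case_id}: spacing rank mismatch"

def test_masks_not_empty(cases):
    for case_id, _, label, _ in cases:
        assert label.any(), f"{case_id}: unexpectedly empty mask"

def test_label_coverage_is_sane(cases):
    # Coverage bounds are task-specific; these numbers are placeholders.
    for case_id, _, label, _ in cases:
        coverage = float(np.count_nonzero(label)) / label.size
        assert 1e-5 < coverage < 0.5, f"{case_id}: suspicious foreground fraction {coverage:.4f}"

def test_no_patient_leaks_across_splits():
    manifest = load_split_manifest()
    train, val, test = map(set, (manifest["train"], manifest["val"], manifest["test"]))
    assert not (train & val) and not (train & test) and not (val & test)
```

Because they run against the prepared artifacts, they can run on every change to the prepare code, not just once.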
Next: Part 4 — Validation + Test: Keeping Your Metric Real and Your Test Set Meaningful