Itai Weiss
In Part 1, we introduced a 3-layer testing framework for AI projects: unit tests, smoke tests, and golden set tests. Now let’s look at how to actually set this up with pytest and GitLab CI—including the infrastructure details that tutorials often skip.
Organizing Your Tests
Create three directories matching the framework: tests/unit/ for fast deterministic tests, tests/smoke/ for pipeline sanity checks, and tests/golden/ for fixed dataset tests. Use a conftest.py to automatically apply pytest markers based on directory. This lets you run each layer separately with pytest -m unit or pytest -m golden.
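One way to wire this up is a small hook in the top-level conftest.py. A minimal sketch, assuming tests are collected with node IDs rooted at tests/ (pytest_collection_modifyitems is standard pytest API; the directory names match the layout above):

```python
# conftest.py (project root)
# Tags each test with a marker derived from its parent directory,
# so `pytest -m unit` / `pytest -m smoke` / `pytest -m golden` work.
import pytest

LAYERS = ("unit", "smoke", "golden")

def pytest_collection_modifyitems(config, items):
    for item in items:
        for layer in LAYERS:
            # item.nodeid looks like "tests/unit/test_foo.py::test_bar"
            if f"tests/{layer}/" in item.nodeid:
                item.add_marker(getattr(pytest.mark, layer))
```

Remember to declare the three markers (for example under `[tool.pytest.ini_options]` in pyproject.toml, or in pytest.ini) so pytest doesn't warn about unknown marks.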
The Pipeline
Your .gitlab-ci.yml should define two jobs. The first runs unit tests on every push—these are fast and catch logic errors immediately. The second runs smoke and golden tests only on merge requests—these take longer and may need GPU access, so you don’t want them blocking every commit.
Here's an example of what such a .gitlab-ci.yml might look like (a sketch: the image names, tags, and install steps are placeholders; adapt them to your project):
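```yaml
# .gitlab-ci.yml -- a sketch; images, tags, and install steps are placeholders
stages:
  - test

unit-tests:
  stage: test
  image: python:3.11          # any image with your toolchain
  tags:
    - cpu
  script:
    - pip install -r requirements.txt
    - pytest -m unit
  rules:
    - if: $CI_PIPELINE_SOURCE == "push"     # fast checks on every push

smoke-and-golden-tests:
  stage: test
  image: <YOUR_GPU_IMAGE>     # e.g. a CUDA-enabled PyTorch image
  tags:
    - gpu                     # routes the job to the GPU runner
  script:
    - pip install -r requirements.txt
    - pytest -m "smoke or golden"
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"   # heavier checks on MRs only
```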
Registering a Runner
Most runner setup guides start at config.toml, but the first real decision is how the runner is attached to a project. In modern GitLab, the recommended flow is: create the runner in the GitLab UI to get a short-lived runner authentication token, then register the runner from the server so it writes its local config.toml.
In the GitLab UI (project runner)
- Go to your project → Settings → CI/CD.
- Expand Runners, then choose Create project runner.
- Choose the OS where the runner will run.
- Set Tags (for AI work, tags like gpu / cpu are worth being strict about). If you don’t want this runner picking up random jobs, don’t enable “run untagged”.
- Create the runner and copy the runner authentication token (tokens typically look like glrt-…). This token is only shown briefly during the creation/registration flow.
On the runner host (server-side registration)
After you’ve installed GitLab Runner on the machine, run the interactive registration:
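```bash
sudo gitlab-runner register
```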
You’ll be prompted for:
- GitLab instance URL
- Runner authentication token
- Runner description
- Job tags
- Executor type (use docker for the containerized workflow)
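For provisioning scripts, the same registration can be done non-interactively. A sketch, with the URL and token as placeholders (with the newer glrt- tokens, the description and tags are typically taken from the runner you created in the UI rather than from flags):

```bash
sudo gitlab-runner register \
  --non-interactive \
  --url "https://gitlab.example.com" \
  --token "glrt-<YOUR_RUNNER_TOKEN>" \
  --executor "docker" \
  --docker-image "python:3.11"   # default job image; jobs can override it
```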
Once you finish registration, GitLab Runner saves everything locally in config.toml. Treat that file like a secret: it contains the runner authentication token, and anyone who gets it can potentially “clone” the runner identity.
Setting Up the Runner
For AI projects, use GitLab's Docker executor rather than the shell executor. Each job runs in a fresh container, which gives you isolation and reproducibility. You'll need to install the NVIDIA Container Toolkit on the runner host so containers can access the GPU, then configure the runner with runtime = "nvidia" to enable GPU passthrough.
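After registration, the relevant section of config.toml ends up looking roughly like this (a sketch; the name, URL, token, and image are placeholders):

```toml
[[runners]]
  name = "gpu-runner-01"
  url = "https://gitlab.example.com"
  token = "glrt-<YOUR_RUNNER_TOKEN>"
  executor = "docker"
  [runners.docker]
    image = "<YOUR_GPU_IMAGE>"   # default image; jobs can override it
    runtime = "nvidia"           # GPU passthrough via NVIDIA Container Toolkit
```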
Mounting Data and Resources
Here's what tutorials often miss: your development container and your CI runner need access to the same data. If you're using NFS mounts for datasets in your devcontainer, configure the runner to mount those same paths into CI containers. Add volume mounts in your runner configuration: for example, <HOST_GOLDEN_DATA_PATH>:<CONTAINER_GOLDEN_DATA_PATH>:ro for read-only dataset access and <HOST_SCRATCH_PATH>:<CONTAINER_SCRATCH_PATH>:rw for writable scratch space (see the config sketch below). This way, golden tests can read fixed datasets without copying anything.
For PyTorch projects, also increase the shared memory size in your runner config. Docker's 64 MB default is too small for DataLoader with multiple workers, and without the increase you'll get cryptic crashes (typically workers killed by bus errors).
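Both the mounts and the shared memory size live in the [runners.docker] section of the same config.toml. A sketch, with placeholder paths and an arbitrary 8 GB size:

```toml
[runners.docker]
  volumes = [
    "<HOST_GOLDEN_DATA_PATH>:<CONTAINER_GOLDEN_DATA_PATH>:ro",  # fixed datasets, read-only
    "<HOST_SCRATCH_PATH>:<CONTAINER_SCRATCH_PATH>:rw",          # writable scratch space
  ]
  shm_size = 8589934592  # in bytes (8 GB); Docker's 64 MB default starves DataLoader workers
```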
Final Thoughts
Most AI CI failures aren’t bugs—they’re symptoms of a system that changes every week. This 3-layer framework gives you rock-solid logic tests, pipeline survival checks, and behavioral sanity grounded in domain knowledge. Once your runner is configured with GPU access and the right data mounts, CI becomes a tool that works with your AI system instead of against it.
Here’s the question to leave with: what are the domain rules in your problem that you can turn into golden set tests tomorrow?