Published on: September 12, 2022
Learn cache types, as well as when and how to use them.
If you've ever worked with GitLab CI/CD, you have probably needed a cache to share content between jobs at some point. The decentralized nature of GitLab CI/CD is a strength, but it can confuse even the best of us when we try to wire everything together. For instance, we need to know the difference between artifacts and cache, and where and how to place each piece of configuration.
This visual guide will help with both challenges.
The concepts may seem to overlap because they are about sharing content between jobs, but they actually are fundamentally different:
Here is a simple sentence to remember if you struggle between choosing cache or artifact:
Cache is here to speed up your job but it may not exist, so don't rely on it.
This article will focus on cache.
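To make the "don't rely on it" rule concrete, here is a minimal sketch of a job that still works when the cache is missing. The job name, the deps/ folder, and both scripts are hypothetical placeholders, not part of any real project:

```yaml
# Hypothetical job: deps/ and both scripts are placeholders.
build:
  cache:
    paths:
      - deps/
  script:
    # Fetch dependencies only when the cache did not restore them.
    - test -d deps || ./fetch-deps.sh
    - ./build.sh
```

The job is only faster with a warm cache; it produces the same result with a cold one.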
We'll start with a simple representation of the GitLab CI/CD pipeline model and ignore (for now) that jobs can be executed on any runner and host. This will help you get the basics.
Let's say you have:
If you want a local cache between all your jobs running on the same runner, use the cache statement in your .gitlab-ci.yml:
default:
  cache:
    paths:
      - relative/path/to/folder/*.ext
      - relative/path/to/another_folder/
      - relative/path/to/file
Using the predefined variable CI_COMMIT_REF_NAME as the cache key, you can ensure the cache is tied to a specific branch:
default:
  cache:
    key: $CI_COMMIT_REF_NAME
    paths:
      - relative/path/to/folder/*.ext
      - relative/path/to/another_folder/
      - relative/path/to/file
Using the predefined variable CI_JOB_NAME as the cache key, you can ensure the cache is tied to a specific job:
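A minimal sketch of the per-job variant, reusing the same placeholder paths as in the branch example:

```yaml
default:
  cache:
    # One cache per job name: jobs with different names never share this cache.
    key: $CI_JOB_NAME
    paths:
      - relative/path/to/folder/*.ext
      - relative/path/to/another_folder/
      - relative/path/to/file
```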
If you don't want to use a volume for caching (for example, to make debugging or disk cleanup easier), you can configure a bind mount for Docker volumes while registering the runner. With this setup, you do not need the cache statement in your .gitlab-ci.yml:
#!/bin/bash
gitlab-runner register \
  --name="Bind-Mount Runner" \
  --docker-volumes="/host/path:/container/path:rw" \
  ...
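On the pipeline side, a job on that runner can then read and write the mounted path directly. The sketch below assumes the /container/path mount from the registration command; the job name and the deps subfolder are hypothetical:

```yaml
# Hypothetical job: no cache keyword, the bind mount acts as the cache.
build:
  script:
    # /container/path is the container side of the bind mount; files written
    # here stay on the host under /host/path after the job ends.
    - mkdir -p /container/path/deps
    - ls /container/path/deps
```

Unlike the cache statement, nothing archives or restores these files for you: the job sees whatever is on the host at that moment.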
In fact, this setup even allows you to share a cache between jobs running on the same host without requiring you to set up a distributed cache (which we'll talk about later):
#!/bin/bash
gitlab-runner register \
  --name="Bind-Mount Runner X" \
  --docker-volumes="/host/path:/container/path:rw" \
  ...
gitlab-runner register \
  --name="Bind-Mount Runner Y" \
  --docker-volumes="/host/path:/container/alt/path:rw" \
  ...
If you want to have a shared cache between all your jobs running on multiple runners and hosts, use the [runners.cache] section in your config.toml:
[[runners]]
  name = "Distributed-Cache Runner"
  ...
  [runners.cache]
    Type = "s3"
    Path = "bucket/path/prefix"
    Shared = true
    [runners.cache.s3]
      ServerAddress = "s3.amazonaws.com"
      AccessKey = "<changeme>"
      SecretKey = "<changeme>"
      BucketName = "foobar"
      BucketLocation = "us-east-1"
Using the predefined variable CI_COMMIT_REF_NAME as the cache key, you can ensure the cache is tied to a specific branch between multiple runners and hosts:
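As a rough sketch, the .gitlab-ci.yml side could look like the following. It assumes every runner involved uses the same [runners.cache] settings; the job names, runner tags, and scripts are hypothetical and only there to show two jobs landing on different hosts:

```yaml
default:
  cache:
    key: $CI_COMMIT_REF_NAME
    paths:
      - relative/path/to/another_folder/

# Hypothetical jobs and runner tags: each job may run on a different host.
prepare:
  stage: build
  tags: [host-a]
  script:
    - ./prepare.sh   # hypothetical script that fills the cached folder

verify:
  stage: test
  tags: [host-b]
  script:
    - ./verify.sh    # hypothetical script that reads the cached folder
```

Even with a distributed cache, the earlier rule still applies: the cache may be missing, so the second job should not depend on it.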
The simplifying assumptions above should help you build a solid understanding of the concepts and possibilities.
In real life, you'll face more complex wiring and we hope this article will help you as a visual cheatsheet along with the reference documentation.
Just to give you a sneak peek, here is an exercise for you:
Happy caching, folks!
Cover image by Alina Grubnyak on Unsplash