Runnerset and PVC #1605
-
When using a RunnerSet with persistent storage, the storage is lost between jobs every time we scale up or down, and also when we use an ephemeral RunnerSet. The StatefulSet is recreated together with the job pod, which causes the PVC to be removed, so a new binding has to happen. As a result, it takes about 5 minutes for a new runner to start every time. We have resorted to using persistent runners for now, but we would love to be able to use an ephemeral RunnerSet together with a Docker image cache etc., as documented in the README.

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerSet
metadata:
  name: cached-runnerset-a
spec:
  ephemeral: true
  group: gaas
  enterprise: removed
  labels:
    - gaas-cached
    - cached-runnerset-a
  selector:
    matchLabels:
      app: cached-runnerset-a
  serviceName: pipeline
  template:
    metadata:
      labels:
        app: cached-runnerset-a
    spec:
      serviceAccountName: arc-runnerset
      containers:
        - name: runner
          env:
            - name: HTTP_PROXY
              value: "removed"
            - name: HTTPS_PROXY
              value: "removed"
            - name: NO_PROXY
              value: "removed"
            - name: DISABLE_RUNNER_UPDATE
              value: "true"
          resources:
            limits:
              cpu: 500m
              memory: "1Gi"
            requests:
              cpu: 100m
              memory: "1Gi"
        - name: docker
          env:
            - name: HTTP_PROXY
              value: "removed"
            - name: HTTPS_PROXY
              value: "removed"
            - name: NO_PROXY
              value: "removed"
          resources:
            limits:
              cpu: 500m
              memory: "4Gi"
            requests:
              cpu: 100m
              memory: "4Gi"
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: /var/lib/docker
              name: var-lib-docker-a
  volumeClaimTemplates:
    - metadata:
        name: var-lib-docker-a
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
        storageClassName: arc-var-lib-docker

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: arc-var-lib-docker
  labels:
    content: arc-var-lib-docker
provisioner: csi.vsphere.vmware.com
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

Is our configuration wrong? Or is this how it is supposed to work, i.e. the PVC is removed and our provisioning is just too slow for this configuration? I would love some answers if possible.
Replies: 6 comments 19 replies
-
AFAIK K8s doesn't allow dynamically rebinding a bound PVC to another PV, or remounting a bound PVC/PV to another pod. The PV needs to be unmounted and unbound before it can be reused by another pod. The same applies to runner pods. Note that it's not the PVCs that are reused. PVs are reused, by newly created PVCs being bound to existing eligible PVs.
I think "every time" is not correct. Due to how it works, a new RunnerSet StatefulSet's PVC needs an already "Available" PV for reuse. If there's no such PV, the K8s PVC controller will dynamically create a PV, and that might take 5 minutes for that specific pod. This happens when your PV provisioner and the K8s control plane are slow enough that the PV can't be unbound from the old runner pod before the new StatefulSet gets created. After a few trials, you might see the number of PVs max out at a certain number, depending on how many new runner StatefulSets/pods can be created concurrently (and that highly depends on your workflows, not on ARC, K8s, or GitHub Actions). After that, every newly created StatefulSet PVC will see at least one available PV to reuse, so the "5m" delay you saw should disappear. Would you mind confirming?
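To make the "PV count maxes out" reasoning above concrete, here is a toy Python model of it. It is only a sketch under simplifying assumptions, not ARC or Kubernetes code: the names `PVPool` and `simulate` are made up for illustration, and the model assumes every PV is fully unbound and back in the "Available" phase before the next round of runners starts (i.e. it ignores the slow-unbind race described above).

```python
from dataclasses import dataclass


@dataclass
class PVPool:
    """Toy model: a new PVC binds to an Available PV if one exists."""
    available: int = 0    # PVs currently in the "Available" phase
    bound: int = 0        # PVs currently bound to a runner's PVC
    provisioned: int = 0  # total PVs dynamically provisioned so far

    def start_runner(self) -> bool:
        """Bind a new PVC; return True if an existing PV was reused."""
        if self.available > 0:
            self.available -= 1
            self.bound += 1
            return True
        # No Available PV: a new one is provisioned (the slow path).
        self.provisioned += 1
        self.bound += 1
        return False

    def finish_runner(self) -> None:
        """Runner pod and PVC are deleted; the PV becomes Available again."""
        self.bound -= 1
        self.available += 1


def simulate(runners_per_round: list[int]) -> tuple[list[int], int]:
    """Return (slow starts per round, total PVs provisioned)."""
    pool = PVPool()
    slow_starts = []
    for n in runners_per_round:
        # All n runners of this round start concurrently...
        slow = sum(0 if pool.start_runner() else 1 for _ in range(n))
        # ...and all finish before the next round begins (assumption).
        for _ in range(n):
            pool.finish_runner()
        slow_starts.append(slow)
    return slow_starts, pool.provisioned


if __name__ == "__main__":
    print(simulate([2, 3, 3, 1, 3]))
```

In this model, slow starts only happen while the peak concurrency is still growing; once the pool holds as many PVs as the largest round needs, every new PVC finds an Available PV.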
-
The problem seems to be that the CSI driver my enterprise on-prem cluster is running has some kind of bug, as it works on my local cluster. Thanks for the quick answer.
-
@mumoshu something related to this question: I would like to understand how ARC decides which PV to use. As the PVC is deleted when the pod is deleted and the volume becomes available, how does ARC know which PV to use? Second thing: the PV / PVC created through ARC even with the
-
As someone who has worked on k8s storage internals, I wanted to share the perspective that the PVC deletion and PVC-PV unbind logic is at odds with how the k8s storage system is intended to work. The RunnerSet PVC logic has a few issues:
In order to reuse cache stored in a volume, the PVC can be left in place. A new StatefulSet runner replica with the same index as the PVC (as suffixes in their names) will automatically mount the volume the PVC references.
Curious what folks think, and please correct me if there's anything I misunderstood. I'm also curious to hear more about the original intention of the PVC deletion and PVC-PV unbind logic, and happy to help brainstorm other ways to solve the original problem.
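The index matching mentioned above follows from the standard StatefulSet naming scheme: the pod is named `<statefulset>-<ordinal>` and its PVC is named `<volumeClaimTemplate>-<pod>`. A minimal Python sketch of that mapping follows. The helper names `pod_name` and `pvc_name` are hypothetical, and the example assumes a StatefulSet called `cached-runnerset-a`; the StatefulSets that ARC actually creates for a RunnerSet may be named differently.

```python
def pod_name(statefulset: str, ordinal: int) -> str:
    """StatefulSet pods are named <statefulset>-<ordinal>."""
    return f"{statefulset}-{ordinal}"


def pvc_name(claim_template: str, statefulset: str, ordinal: int) -> str:
    """PVCs from volumeClaimTemplates are named <template>-<pod name>."""
    return f"{claim_template}-{pod_name(statefulset, ordinal)}"


if __name__ == "__main__":
    # Assumed StatefulSet name, for illustration only.
    # A recreated replica 0 resolves to the same PVC name as before,
    # so a PVC that was left in place gets mounted again.
    print(pvc_name("var-lib-docker-a", "cached-runnerset-a", 0))
```

Because the name is derived only from the template name, the StatefulSet name, and the ordinal, a replica that comes back with the same index looks up the same claim, which is why leaving the PVC in place preserves the cache.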
-
Hello, I have registered the GitHub Actions runner in a GKE cluster at the organisation level. How can we use the same Docker cache PV with multiple runner pods? When I run pipelines in multiple repos within the organisation, how do the runner pods use the same PV to store the Docker cache? I understand that if only one runner pod is running at a time, it will use the existing PV to store the Docker cache. But when multiple runner pods (more than one or two) are running at the same time, how will they use the same PV?
-
Hi