Skip to content

Commit

Permalink
[CI] Switch away from kubernetes executor mode
Browse files Browse the repository at this point in the history
We are running into reliability problems with the kubernetes executor mode,
hypothesized currently to be due to a complex interaction with
konnectivity-agent pods dying and subsequently killing in-process execs through
the k8s control plane. The plan is to switch back to the in-container executor
mode for now while we sort out the issue upstream with the konnectivity
developers.
  • Loading branch information
boomanaiden154 committed Jan 21, 2025
1 parent 6cc5dcf commit fec632c
Show file tree
Hide file tree
Showing 3 changed files with 12 additions and 85 deletions.
29 changes: 0 additions & 29 deletions premerge/linux_container_pod_template.yaml

This file was deleted.

54 changes: 11 additions & 43 deletions premerge/linux_runners_values.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -4,17 +4,6 @@ githubConfigSecret: "github-token"
minRunners: 0
maxRunners: 4

containerMode:
type: "kubernetes"
kubernetesModeWorkVolumeClaim:
accessModes: ["ReadWriteOnce"]
storageClassName: "standard-rwo"
resources:
requests:
storage: "200Gi"
kubernetesModeServiceAccount:
annotations:

template:
metadata:
annotations:
Expand All @@ -29,49 +18,28 @@ template:
premerge-platform: linux
containers:
- name: runner
image: ghcr.io/actions/actions-runner:latest
command: ["/home/runner/run.sh"]
image: ghcr.io/llvm/ci-ubuntu-22.04-agent:latest
command: ["/home/gha/actions-runner/run.sh"]
resources:
# The container will be scheduled on the same node as this runner.
# This means if we don't set the CPU request high-enough here, 2
# containers will be scheduled on the same pod, meaning 2 jobs.
# If we don't set the CPU request high-enough here, 2 runners might
# be scheduled on the same pod, meaning 2 jobs, and they will starve
# each other.
#
# This number should be:
# - greater than number_of_cores / 2:
# A value lower than that could allow the scheduler to put 2
# runners in the same pod. Meaning 2 containers in the same pod.
# Meaning 2 jobs sharing the resources.
# runners on the same node. Meaning 2 jobs sharing the resources of
# a single node.
# - lower than number_of_cores:
# Each pod has some basic services running (metrics for ex). Those
# already require some amount of CPU (~0.5). This means we don't
# exactly have N cores to allocate, but N - epsilon.
#
# Memory however shall be handled at the container level. The runner
# itself doesn't need much, just using something enough not to get
# OOM killed.
# We also need to request sufficient memory to not get OOM killed.
requests:
cpu: 55
memory: "8Gi"
memory: "200Gi"
limits:
cpu: 64
memory: "8Gi"
env:
- name: ACTIONS_RUNNER_CONTAINER_HOOKS
value: /home/runner/k8s/index.js
- name: ACTIONS_RUNNER_POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
value: "true"
- name: ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE
value: "/home/runner/pod-config/linux-container-pod-template.yaml"
volumeMounts:
- name: container-pod-config
mountPath: /home/runner/pod-config
securityContext:
fsGroup: 123
volumes:
- name: container-pod-config
configMap:
name: linux-container-pod-template
memory: "256Gi"

14 changes: 1 addition & 13 deletions premerge/main.tf
Original file line number Diff line number Diff line change
Expand Up @@ -199,17 +199,6 @@ resource "kubernetes_secret" "windows_github_pat" {
}


resource "kubernetes_config_map" "linux_container_pod_template" {
metadata {
name = "linux-container-pod-template"
namespace = "llvm-premerge-linux-runners"
}

data = {
"linux-container-pod-template.yaml" : "${file("linux_container_pod_template.yaml")}"
}
}

resource "helm_release" "github_actions_runner_controller" {
name = "llvm-premerge-controller"
namespace = "llvm-premerge-controller"
Expand All @@ -235,9 +224,8 @@ resource "helm_release" "github_actions_runner_set_linux" {

depends_on = [
kubernetes_namespace.llvm_premerge_linux_runners,
kubernetes_config_map.linux_container_pod_template,
kubernetes_secret.linux_github_pat,
helm_release.github_actions_runner_controller
kubernetes_secret.linux_github_pat
]
}

Expand Down

0 comments on commit fec632c

Please sign in to comment.