~sircmpwn/sr.ht-dev

builds.sr.ht: worker: prepare Kubernetes support v1 PROPOSED

This patchset makes a number of changes to the worker and related tools
so that they can be run in Kubernetes. It is designed in such a way
that merging these changes has no effect on existing systems, unless
one explicitly enables certain configuration settings.

The big picture is roughly this: the worker and an SSH dispatch
component run in Kubernetes. The worker runs as a dedicated service
account, which is given permission to manage a certain (ideally not its
own) namespace. It then runs build jobs as Kubernetes batch jobs in
that namespace.

The worker and the runner-shell (used by the SSH dispatch component)
only have to adapt to the fact that build ports are addressed by a
variable host name rather than a TCP port.
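
As a sketch of that scheme (not part of the patches themselves; the
`ssh_target` helper and its defaults are hypothetical), the same build
"port" either stays a TCP port on localhost or becomes part of a
per-job service host name:

```python
def ssh_target(port, k8s_port_prefix="", k8s_jobs_namespace=""):
    """Turn a build "port" into ssh arguments.

    Classic setups dial a TCP port on localhost; Kubernetes setups dial
    a service whose name embeds the port (and, optionally, a namespace).
    """
    if k8s_port_prefix:
        host = f"{k8s_port_prefix}{port}"
        if k8s_jobs_namespace:
            host = f"{host}.{k8s_jobs_namespace}"
        return ["ssh", f"build@{host}"]
    return ["ssh", "-p", str(port), "build@localhost"]

# Classic: ssh -p 22000 build@localhost
classic = ssh_target(22000)
# Kubernetes: ssh build@builds-port-22000.build-jobs
k8s = ssh_target(22000, "builds-port-", "build-jobs")
```

Patches 2 and 3 implement this same branching in the worker (Go) and
the runner-shell (Python) respectively.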

The main construction site is the image control script, which the
worker currently calls for all relevant image actions (boot, cleanup,
package install, etc.). While its being a shell script makes it quite
flexible, the changes in there are not exactly pretty. I hope, however,
that the patchset gives a decent impression of what is required to make
this all work.

There is one small caveat that this does not address: the worker keeps
a list of builds in memory, and considers any build it does not know
about to be nonexistent. Hence, one must take care to run only a single
instance of it (per ingress, that is). This is not exactly great, but
can be addressed at a later time.
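
To illustrate the caveat (toy classes, not the actual worker code):
each instance answers only for jobs in its own in-memory map, so a job
started by one instance looks nonexistent to another:

```python
class Worker:
    """Minimal stand-in for the worker's in-memory build registry."""

    def __init__(self):
        self.jobs = {}          # job_id -> state, kept only in memory

    def start(self, job_id):
        self.jobs[job_id] = "running"

    def info(self, job_id):
        # A job started by a *different* instance looks nonexistent here.
        return self.jobs.get(job_id)

a, b = Worker(), Worker()
a.start(42)
print(a.info(42))   # "running"
print(b.info(42))   # None: instance b has never heard of job 42
```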

Conrad Hoffmann (4):
  Add config settings for basic Kubernetes support
  worker: basic support for running in Kubernetes
  runner-shell: basic Kubernetes support
  images/control: basic Kubernetes support

 config.example.ini |  13 +++++
 images/control     | 122 +++++++++++++++++++++++++++++++++++++++++++--
 runner-shell       |  18 +++++--
 worker/context.go  |  28 ++++++++---
 4 files changed, 164 insertions(+), 17 deletions(-)

-- 
2.41.0
builds.sr.ht/patches: FAILED in 7m23s

[worker: prepare Kubernetes support][0] from [Conrad Hoffmann][1]

[0]: https://lists.sr.ht/~sircmpwn/sr.ht-dev/patches/42856
[1]: mailto:ch@bitfehler.net

✓ #1027222 SUCCESS builds.sr.ht/patches/alpine.yml    https://builds.sr.ht/~sircmpwn/job/1027222
✓ #1027224 SUCCESS builds.sr.ht/patches/debian.yml    https://builds.sr.ht/~sircmpwn/job/1027224
✗ #1027223 FAILED  builds.sr.ht/patches/archlinux.yml https://builds.sr.ht/~sircmpwn/job/1027223
Copy & paste the following snippet into your terminal to import this patchset into git:

curl -s https://lists.sr.ht/~sircmpwn/sr.ht-dev/patches/42856/mbox | git am -3

[PATCH builds.sr.ht 1/4] Add config settings for basic Kubernetes support Export this patch

This commit adds two new config settings for the worker which, when
set, indicate that the worker and related components (SSH dispatch,
build jobs) are running in Kubernetes and should adapt their behavior
accordingly. The settings provide the information the components need
to communicate with the individual build jobs.

The settings have no effect until support for them is implemented in
the individual components.
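
As a rough illustration of the "only when set" semantics (a sketch
using the stdlib configparser with fallback-to-empty lookups; the
example config values are the ones suggested in config.example.ini):

```python
import configparser

# Hypothetical config.ini fragment enabling the two new settings.
EXAMPLE = """
[builds.sr.ht::worker]
k8s-port-prefix=builds-port-
k8s-jobs-namespace=build-jobs
"""

cfg = configparser.ConfigParser()
cfg.read_string(EXAMPLE)
section = cfg["builds.sr.ht::worker"]

# Both settings are optional: an unset key (falling back to "") means
# "not running in Kubernetes", so existing setups keep their behavior.
prefix = section.get("k8s-port-prefix", "")
namespace = section.get("k8s-jobs-namespace", "")
k8s_enabled = bool(prefix)
```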

Signed-off-by: Conrad Hoffmann <ch@bitfehler.net>
---
 config.example.ini | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/config.example.ini b/config.example.ini
index 2061fc8..8f384cc 100644
--- a/config.example.ini
+++ b/config.example.ini
@@ -150,5 +150,18 @@ trigger-from=
 s3-bucket=
 s3-prefix=
 
+# EXPERIMENTAL!
+#
+# Setting this value assumes the build runner is running in Kubernetes and
+# build jobs should be run as Kubernetes batch jobs. The runner will name
+# various objects (services, jobs) using this name and appending a "port" number
+# (which is an implementation detail).
+#k8s-port-prefix=builds-port-
+#
+# The Kubernetes namespace that build jobs are run in. The build runner must be
+# running as a service account that has permissions to manage jobs and services
+# in this namespace.
+#k8s-jobs-namespace=build-jobs
+
 [meta.sr.ht]
 origin=http://meta.sr.ht.local
-- 
2.41.0

[PATCH builds.sr.ht 2/4] worker: basic support for running in Kubernetes Export this patch

This commit adds proper handling of two new config settings for running
the worker in Kubernetes. Based on their values, the worker takes a
different approach when SSHing into build jobs: instead of using the
build port as a TCP port, it uses it to construct a host name.

These changes do not affect the worker's behavior if the respective
config options are not set.

NOTE: 9front is not yet supported

Signed-off-by: Conrad Hoffmann <ch@bitfehler.net>
---
 worker/context.go | 28 ++++++++++++++++++++--------
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/worker/context.go b/worker/context.go
index 4cd1709..3bc0008 100644
--- a/worker/context.go
+++ b/worker/context.go
@@ -272,14 +272,26 @@ func (ctx *JobContext) SSH(args ...string) *exec.Cmd {
 			"-h", "127.0.0.1",
 			"-Gc", strings.Join(args, " "))
 	case "ssh":
-		return exec.CommandContext(ctx.Context, "ssh",
-			append([]string{"-q",
-				"-p", sport,
-				"-o", "UserKnownHostsFile=/dev/null",
-				"-o", "StrictHostKeyChecking=no",
-				"-o", "LogLevel=quiet",
-				"build@localhost",
-			}, args...)...)
+		baseArgs := []string{"-q",
+			"-o", "UserKnownHostsFile=/dev/null",
+			"-o", "StrictHostKeyChecking=no",
+			"-o", "LogLevel=quiet",
+		}
+
+		target := "build@localhost"
+		portPrefix, useHostBased := config.Get("builds.sr.ht::worker", "k8s-port-prefix")
+		if useHostBased {
+			ns, ok := config.Get("builds.sr.ht::worker", "k8s-jobs-namespace")
+			if ok {
+				target = fmt.Sprintf("build@%s%s.%s", portPrefix, sport, ns)
+			} else {
+				target = fmt.Sprintf("build@%s%s", portPrefix, sport)
+			}
+			baseArgs = append(baseArgs, target)
+		} else {
+			baseArgs = append(baseArgs, "-p", sport, target)
+		}
+		return exec.CommandContext(ctx.Context, "ssh", append(baseArgs, args...)...)
 	default:
 		panic(errors.New("Unknown login command"))
 	}
-- 
2.41.0

[PATCH builds.sr.ht 3/4] runner-shell: basic Kubernetes support Export this patch

This commit handles two new config settings for the worker which, when
set, indicate that the runner-shell and build jobs are running in
Kubernetes. The runner-shell then takes a slightly different approach
when SSHing into build jobs: instead of using the build port as a TCP
port, it uses it to construct a host name.

These changes do not affect the current behavior if the respective
config options are not set.

Signed-off-by: Conrad Hoffmann <ch@bitfehler.net>
---
 runner-shell | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/runner-shell b/runner-shell
index 9178b1f..aa2d251 100755
--- a/runner-shell
+++ b/runner-shell
@@ -30,6 +30,8 @@ job_id = int(cmd[1])
 cmd = cmd[2:]
 
 bind_address = cfg("builds.sr.ht::worker", "bind-address", "0.0.0.0:8080")
+k8s_port_prefix = cfg("builds.sr.ht::worker", "k8s-port-prefix", "")
+k8s_jobs_namespace = cfg("builds.sr.ht::worker", "k8s-jobs-namespace", "")
 
 def get_info(job_id):
     r = requests.get(f"http://{bind_address}/job/{job_id}/info")
@@ -67,14 +69,22 @@ def connect(job_id, info):
     except:
         pass # non-interactive
     redis.incr(f"builds.sr.ht-shell-{job_id}")
-    subprocess.call([
+    port = str(info["port"])
+    ssh = [
         "ssh", "-qt",
-        "-p", str(info["port"]),
         "-o", "UserKnownHostsFile=/dev/null",
         "-o", "StrictHostKeyChecking=no",
         "-o", "LogLevel=quiet",
-        "build@localhost",
-    ] + cmd)
+    ]
+    if k8s_port_prefix:
+        target = f"build@{k8s_port_prefix}{port}"
+        if k8s_jobs_namespace:
+            target = f"{target}.{k8s_jobs_namespace}"
+        ssh += [target]
+    else:
+        ssh += ["-p", port, "build@localhost"]
+
+    subprocess.call(ssh + cmd)
     n = redis.decr(f"builds.sr.ht-shell-{job_id}")
     if n == 0:
         requests.post(f"http://{bind_address}/job/{job_id}/terminate")
-- 
2.41.0

[PATCH builds.sr.ht 4/4] images/control: basic Kubernetes support Export this patch

This commit adds the option to execute build jobs as Kubernetes batch
jobs. This currently requires setting a number of variables in
/etc/image-control.conf (some of which have to match certain values
from config.ini).

The qemu Docker image is the same one used by regular setups (see
images/qemu). All the YAML required to glue this together will be
published soon. The worker itself (which calls this script) should be
running in-cluster, as a service account that has the required
permissions to manage the namespace for the build jobs.
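
A condensed sketch of the glue the generated YAML sets up (plain dicts
and a hypothetical `manifests()` helper, not the script itself): the
Service is named after the build "port" and selects the job's pods via
a matching label, which is what makes the `${k8s_port_prefix}${port}`
host name dialable:

```python
def manifests(port, prefix="builds-port-", namespace="build-jobs"):
    """Model the Job/Service pair the control script emits per build."""
    name = f"{prefix}{port}"
    job = {
        "kind": "Job",
        "metadata": {"name": name, "namespace": namespace},
        # Pods created for this Job carry the label the Service selects on.
        "pod_labels": {"job": name},
    }
    service = {
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "selector": {"job": name},
        "ports": [{"port": 22, "targetPort": 22}],
    }
    return job, service

job, svc = manifests(22000)
# The selector matches the pod labels, so the cluster DNS name
# builds-port-22000.build-jobs routes to the QEMU pod's SSH port.
assert svc["selector"] == job["pod_labels"]
```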

Signed-off-by: Conrad Hoffmann <ch@bitfehler.net>
---
 images/control | 122 +++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 117 insertions(+), 5 deletions(-)

diff --git a/images/control b/images/control
index b468e37..e02729b 100755
--- a/images/control
+++ b/images/control
@@ -1,6 +1,9 @@
 #!/bin/sh -eu
 self=$(readlink -f $0)
 self=$(dirname "$self")
+# The actual images might be in a different place than this script and the meta
+# data. If so, $images should be configured via /etc/image-control.conf
+images="$self"
 
 if [ -f /etc/image-control.conf ]
 then
@@ -21,14 +24,22 @@ ssh_opts="-o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no"
 guestport=22
 
 guest_ssh() {
-	ssh $ssh_opts "$@"
+	if [ "$default_means" = "k8s" ]; then
+		# Pretty horrible, but should do until all this gets ported to Go
+		ssh -p 22 $ssh_opts -o "Hostname=builds-port-${port}.build-jobs" "$@"
+	else
+		ssh $ssh_opts "$@"
+	fi
 }
 
 cpu_opts() {
-	if [ "$(uname -m)" = "$1" ] && [ -e /dev/kvm ]
+	if [ "$(uname -m)" = "$1" ]
 	then
-		printf "%s" "-cpu host -enable-kvm"
-		return
+		if [ -e /dev/kvm ] || [ "$default_means" = "k8s" ]
+		then
+			printf "%s" "-cpu host -enable-kvm"
+			return
+		fi
 	fi
 	case "$1" in
 		aarch64)
@@ -47,6 +58,94 @@ cpu_opts() {
 	esac
 }
 
+_k8s_boot() {
+# The following variables have to be set in /etc/image-control.conf:
+#
+# - $default_means: must be "k8s"
+# - $images: path where the actual build images are mounted
+# - $k8s_port_prefix: must match `k8s-port-prefix` in the
+#   [builds.sr.ht::worker] section of config.ini
+# - $k8s_jobs_namespace: must match `k8s-jobs-namespace` in the
+#   [builds.sr.ht::worker] section of config.ini
+# - $k8s_qemu_image_ref: reference to the QEMU docker image, e.g.
+#   registry.example.org/qemu:latest
+# - $k8s_build_images_pvc: name of the persistent volume claim for a volume
+#   containing the actual build images
+# - $k8s_kvm_resource: the name under which a device plugin makes the host's
+#   /dev/kvm device available
+#
+	port_name="${k8s_port_prefix}${port}"
+	cat <<EOF | tee /tmp/k8s-$port.id | kubectl apply -f - > /dev/null
+apiVersion: batch/v1
+kind: Job
+metadata:
+  name: ${port_name}
+  namespace: ${k8s_jobs_namespace}
+  labels:
+    job: ${port_name}
+spec:
+  template:
+    metadata:
+      labels:
+        job: ${port_name}
+    spec:
+      containers:
+      - name: qemu
+        image: ${k8s_qemu_image_ref}
+        command:
+        - "/bin/${qemu:-qemu-system-$arch}"
+        - "-m"
+        - "${MEMORY:-4096}"
+        - "-smp"
+        - "cpus=2"
+        - "-net"
+        - "nic,model=virtio"
+        - "-net"
+        - "user,hostfwd=tcp::22-:$guestport"
+        - "-display"
+        - "none"
+        - "-device"
+        - "virtio-rng-pci"
+        - "-device"
+        - "virtio-balloon"
+        - "-drive"
+        - "file=$wd/$arch/root.img.qcow2,media=disk,snapshot=on,${driveopts:-if=virtio}"
+$(for arg; do printf "        - \"$arg\"\n"; done)
+        volumeMounts:
+          - name: build-images
+            mountPath: /var/lib/images
+          - name: tmp
+            mountPath: /var/tmp
+        resources:
+          limits:
+            ${k8s_kvm_resource}: 1
+      volumes:
+        - name: build-images
+          persistentVolumeClaim:
+            claimName: ${k8s_build_images_pvc}
+            readOnly: false
+        - name: tmp
+          emptyDir:
+            medium: Memory
+            sizeLimit: 2Gi
+      restartPolicy: Never
+  backoffLimit: 0
+---
+apiVersion: v1
+kind: Service
+metadata:
+  name: ${port_name}
+  namespace: ${k8s_jobs_namespace}
+spec:
+  selector:
+    job: ${port_name}
+  ports:
+    - protocol: TCP
+      port: 22
+      targetPort: 22
+EOF
+}
+
 _docker_boot() {
 	docker run -d \
 		-v "$self/$base":/base:ro \
@@ -113,6 +212,9 @@ _boot() {
 	if [ "$means" = "docker" ]
 	then
 		_docker_boot "$@"
+	elif [ "$means" = "k8s" ]
+	then
+		_k8s_boot "$@"
 	elif [ "$means" = "qemu" ]
 	then
 		_qemu_boot "$@"
@@ -133,7 +235,7 @@ cmd_boot() {
 	then
 		arch="$default_arch"
 	fi
-	if [ ! -e "$self/$base/$arch/root.img.qcow2" ]
+	if [ ! -e "$images/$base/$arch/root.img.qcow2" ]
 	then
 		printf "Image '%s' is not available for arch '%s'\n" "$base" "$arch" >&2
 		exit 1
@@ -150,6 +252,9 @@ cmd_boot() {
 	if [ "$means" = "docker" ]
 	then
 		wd="/base"
+	elif [ "$means" = "k8s" ]
+	then
+		wd="/var/lib/images/$base"
 	elif [ "$means" = "qemu" ]
 	then
 		wd="$self/$base"
@@ -206,6 +311,13 @@ cmd_cleanup() {
 			kill $cid || true
 			rm -f /tmp/qemu-$port.id
 		fi
+		if [ -e /tmp/k8s-$port.id ]
+		then
+			guest_ssh -p $port build@localhost $poweroff_cmd || true
+			sleep 2
+			kubectl delete --timeout=5s --ignore-not-found=true -f /tmp/k8s-$port.id || true
+			rm -f /tmp/k8s-$port.id
+		fi
 	fi
 }

-- 
2.41.0