Last year, I was reading about the PSP deprecation and started wondering what the solutions might look like in the future. Fortunately, there are already several policy engines available, such as OPA Gatekeeper and Kyverno.
With the help of a policy engine, not only can we ensure workloads are compliant with selected, predefined rules, but we can also implement custom company policies like:
- Schedule workloads to spot instances based on certain criteria for better cost savings, and put delicate ones on on-demand instances.
- Add preStop hooks for containers that have ports open (like ingress-nginx!)1.
- Patch image versions to leverage the local cache and speed things up (e.g., a fixed version for amazon/aws-cli).
- Restrict home-made services from exposing endpoints that are not ready at the moment (publishNotReadyAddresses).
- Restrict service load balancers (see the sketch below).
- Restrict modifications to ingress annotations that try to use an arbitrary proxy buffer size.
- …and many, many more, all without requiring any intervention or modification from users.
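To give a taste of what these look like in practice, here is a rough sketch of the "restrict service load balancers" rule above expressed as a Kyverno ClusterPolicy. The policy name and message are made up for illustration; tune the match and the failure action to your own needs.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  # Hypothetical name, for illustration only.
  name: restrict-loadbalancer-services
spec:
  validationFailureAction: enforce
  rules:
    - name: no-loadbalancer-service
      match:
        resources:
          kinds:
            - "Service"
      validate:
        message: "Services of type LoadBalancer are not allowed in this cluster."
        # Reject any Service whose ".spec.type" is "LoadBalancer".
        pattern:
          spec:
            type: "!LoadBalancer"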
Policy engines are just fascinating. I also learned a few things from them by making my own admission webhooks. You should be able to achieve most of these requirements with policy engines alone, though.
Policies are meant to be enforced. Documents and meetings alone just won't stop bad uses of Kubernetes (intentional or unintentional).
Your cluster, your rules2.
Jobs that just won't go away#
We constantly maintain and improve our cluster policies because operational issues never end.
Recently, I noticed more and more dangling jobs floating around, and their number kept increasing. Apparently, these jobs were created directly (whether by a service or a user) rather than by a controller like CronJob.
It's not necessarily a bad practice – but the Completed (or Errored) jobs just won't disappear.
Fortunately, there is a TTL-after-finished Controller that can help.
To quote from the enhancement proposal:
Motivation
… it’s difficult for the users to clean them up automatically, and those Jobs and Pods can accumulate and overload a Kubernetes cluster very easily.
User Stories
The users keep creating Jobs in a small Kubernetes cluster with 4 nodes. The Jobs accumulates over time, and 1 year later, the cluster ended up with more than 100k old Jobs. This caused etcd hiccups, long high latency etcd requests, and eventually made the cluster unavailable.
Our clusters are definitely nowhere close to 100k at this point. But I've seen 3k finished jobs in a really small cluster before, and that already terrified me.
The answer to this problem seems very straightforward: just add .spec.ttlSecondsAfterFinished to your Job and it's done.
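For reference, here is what that looks like on a standalone Job, loosely adapted from the pi example in the Kubernetes documentation (the name and TTL value here are arbitrary):
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-with-ttl   # arbitrary example name
spec:
  # Delete the Job (and its Pods) 100 seconds after it finishes.
  ttlSecondsAfterFinished: 100
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never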
But is it really that “happily ever after”?
Yes and no. You can't expect everyone who creates a Job directly to always set that field. So what should we do now?
Since this is a post about policy engines, yeah, let's leverage the policy engine.
We will set .spec.ttlSecondsAfterFinished on a Job whenever there is no .metadata.ownerReferences defined (i.e., it's not created by a controller like CronJob).
Prerequisites#
- Kubernetes >= 1.12
  - The TTL-after-finished Controller's feature state is alpha in 1.12, beta in 1.21, and stable in 1.23.
  - If you are using Amazon EKS like me, features are only available after they enter the beta feature state. That is, you can only use the TTL-after-finished Controller on Amazon EKS >= 1.21.
- Your policy engine of choice.
Here we will use Kyverno's ClusterPolicy as an example, but you should be able to implement it with any other solution on the market.
Example ClusterPolicy for Kyverno#
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  annotations:
    policies.kyverno.io/title: Add TTL to dangling Job
    policies.kyverno.io/category: The cool company policy collection
    policies.kyverno.io/description: >-
      Automatically clean dangling jobs by adding TTL to spec.
  name: add-ttl-to-dangling-job
spec:
  background: false
  failurePolicy: Ignore
  validationFailureAction: enforce
  rules:
    - name: add-ttl-to-dangling-job
      match:
        resources:
          kinds:
            # We only deal with Job in this policy
            - "Job"
      preconditions:
        any:
          # If the Job is created by a CronJob, it will have the ".metadata.ownerReferences"
          # field, which is an array. But the exact value doesn't really matter here;
          # we just want to know whether this field exists.
          #
          # The following line says:
          # if there is no ".metadata.ownerReferences", fall back to an empty string ('').
          - key: "{{ request.object.metadata.ownerReferences || '' }}"
            operator: Equals
            # And if the value is an empty string, it means there is no ".metadata.ownerReferences".
            # That's the kind of Job we want to set ".spec.ttlSecondsAfterFinished" on.
            value: ''
      mutate:
        patchStrategicMerge:
          spec:
            # Add ".spec.ttlSecondsAfterFinished" (only when it's not specified),
            # so the Job will be deleted 15 minutes after completion.
            # Set it to the value you want.
            +(ttlSecondsAfterFinished): 900
Policy engine in action#
It’s important to validate whether the policy actually works; let’s leverage k3d again.
Start k3d#
$ k3d cluster create
# ...omitted
INFO[0008] Starting Node 'k3d-k3s-default-serverlb'
INFO[0015] Injecting records for hostAliases (incl. host.k3d.internal) and for 3 network members into CoreDNS configmap...
INFO[0017] Cluster 'k3s-default' created successfully!
# ...omitted
Install Kyverno with Helm Chart#
$ helm repo add kyverno https://kyverno.github.io/kyverno/
$ helm repo update
$ helm install kyverno kyverno/kyverno --namespace kyverno --create-namespace
NAME: kyverno
LAST DEPLOYED: Sun Jul 10 01:25:23 2022
# ...omitted
Thank you for installing kyverno! Your release is named kyverno.
Apply the ClusterPolicy#
First, save the ClusterPolicy above as a file, e.g. add-ttl-to-dangling-job.yaml.
$ kubectl apply -f add-ttl-to-dangling-job.yaml
# Or if you are feeling lazy, use the following command:
# Caution: Always check what's in the file first before applying anything!
$ kubectl apply -f https://blog.wtcx.dev/2022/07/09/automatically-clean-up-dangling-jobs-with-policy-engine/add-ttl-to-dangling-job.yaml
Create Job directly#
You can use the Job example from the Kubernetes documentation:
$ kubectl apply -f https://kubernetes.io/examples/controllers/job.yaml
job.batch/pi created
Examine the Job#
$ kubectl get job pi -o yaml
apiVersion: batch/v1
kind: Job
metadata:
  # There is no ".ownerReferences" under "metadata".
  annotations:
    # ...omitted
    # You can see the modifications done by Kyverno here
    policies.kyverno.io/last-applied-patches: |
      add-ttl-to-dangling-job.add-ttl-to-dangling-job.kyverno.io: added /spec/ttlSecondsAfterFinished
    # ...omitted
  name: pi
  namespace: default
  # ...omitted
spec:
  # ...omitted
  # The following field is added by the ClusterPolicy
  ttlSecondsAfterFinished: 900
So, from what we can see here, the ClusterPolicy works as expected.
Now, let's make sure Kyverno doesn't touch Jobs created by a CronJob.
Create a CronJob#
Again, let's just use the CronJob example from the Kubernetes documentation:
$ kubectl apply -f https://kubernetes.io/examples/application/job/cronjob.yaml
cronjob.batch/hello created
Examine the Job (created by CronJob)#
The Job created by the CronJob will be named with a suffix, so get its name first.
$ kubectl get job
NAME             COMPLETIONS   DURATION   AGE
pi               1/1           36s        10m
hello-27623144   1/1           8s         32s
$ kubectl get job hello-27623144 -o yaml
apiVersion: batch/v1
kind: Job
metadata:
  annotations:
    # There is no Kyverno annotation here
    batch.kubernetes.io/job-tracking: ""
    # ...omitted
  name: hello-27623144
  namespace: default
  # This is the ".metadata.ownerReferences" we kept talking about before!
  ownerReferences:
  - apiVersion: batch/v1
    blockOwnerDeletion: true
    controller: true
    kind: CronJob
    name: hello
    uid: d4910a7c-dc57-4563-8611-e6f58a1cb5e1
  # ...omitted
spec:
  # You won't see the ".spec.ttlSecondsAfterFinished" field here.
  backoffLimit: 6
  completionMode: NonIndexed
  completions: 1
  parallelism: 1
  selector:
    matchLabels:
      controller-uid: 3dbaeff8-3163-4b32-9946-ececad06e965
  suspend: false
  template:
    # ...omitted
Cleanup#
$ k3d cluster delete
Clean dangling Jobs manually one last time#
The following gives you an idea of which Jobs are not owned by higher-level controllers:
$ kubectl get job -o json -A | jq -r '.items[] | select(.metadata.ownerReferences == null and .status.active == null) | .metadata.name'
To delete these Jobs:
$ kubectl get job -o json -A | jq -r '.items[] | select(.metadata.ownerReferences == null and .status.active == null) | "kubectl delete job -n " + .metadata.namespace + " " + .metadata.name' | xargs -I {} bash -c "{}"
Conclusion#
It's pretty common that users are not aware of potential issues like massive numbers of dangling jobs.
However, problems usually come from the areas no one pays attention to. At the end of the day, it's still the admin's job (no pun intended) to make sure things run as smoothly as possible.