Here's a story. It was late in the evening, and I was about to leave the office. I saw a group of developers sitting in a corner, frantically doing something on their laptops. I went up to the guys and asked: "What happened?"
A little earlier, at nine o'clock in the evening, one of the developers was getting ready to go home. He decided: "I'm going to scale my application down to one instance." He pressed the "1" key, and the Internet froze for a moment. He hit that one key again and again, then clicked Enter. He poked at everything he could. Then the Internet came back to life, and everything began to scale up to 11 111 111 111 111.
True, this story did not happen on Kubernetes; at the time, it was Nomad. It ended with this: after an hour of our attempts to stop Nomad's persistent scaling, Nomad replied that it would not stop scaling and would not do anything else. And then it crashed.
Naturally, I tried to repeat the same thing on Kubernetes. Scaling to eleven billion pods did not please Kubernetes; it responded: "I can't. That exceeds internal caps." But it could launch 1,000,000,000 pods. In response to one billion, Kubernetes did not crash. It started to scale. The further the process went, the more time it took to create new pods, but the process kept going.
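A rough sketch of that experiment, assuming a Deployment named test already exists (the deployment name and the exact replica counts are my assumptions, not the original setup):

# Eleven billion replicas is rejected: the replicas field is a 32-bit integer,
# so anything above 2147483647 fails API validation.
kubectl scale deployment test --replicas=11111111111

# One billion replicas is accepted, and the control plane starts grinding through it.
kubectl scale deployment test --replicas=1000000000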
If I can run pods in my namespace without any cap, then even without requests and limits I can launch several pods whose workloads start exhausting the nodes' memory and CPU. On top of that, when I launch that many pods, information about them has to go to the storage, that is, etcd. And when too much information arrives there, the storage begins to respond too slowly, and Kubernetes starts to slow down.
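A minimal sketch of such a workload, assuming the busybox image and a hypothetical name; note that the spec deliberately has no resources section, so nothing caps its CPU usage:

kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpu-burner          # hypothetical name
spec:
  replicas: 10
  selector:
    matchLabels:
      app: cpu-burner
  template:
    metadata:
      labels:
        app: cpu-burner
    spec:
      containers:
      - name: burn
        image: busybox
        # A busy loop that eats a full CPU core; no requests or limits are set.
        command: ["sh", "-c", "while true; do :; done"]
EOF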
Here lies the problem: the Kubernetes control plane is not one central gizmo but several components, in particular the controller manager, the scheduler, and so on. All of these tools start doing unnecessary, pointless work at the same time, and it takes more and more time as it goes on. The controller manager keeps creating new pods. The scheduler tries to find a node for them, and you will most likely run out of new nodes in your cluster fairly soon. The Kubernetes cluster starts to run slower and slower.
But I decided to go even further. As you know, Kubernetes has a concept called a Service. By default, Services are implemented with iptables rules on your cluster nodes. If you run, say, one billion pods and then use a script to force Kubernetes to create new Services:
for i in {1..1111111}; do
    kubectl expose deployment test --port 80 \
        --overrides="{\"apiVersion\": \"v1\", \"metadata\": {\"name\": \"nginx$i\"}}";
done
On the cluster nodes, new iptables rules will be generated almost instantly. Moreover, with that many pods behind each Service, roughly one billion iptables rules will be generated per Service.

I checked this with several thousand pods, up to ten thousand. At that threshold, even getting SSH access to a node becomes problematic, because packets that have to traverse such a number of rules start performing rather poorly.
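If you want to see the effect on a node for yourself, one quick way to gauge it, assuming kube-proxy runs in its default iptables mode and you have root access to the node, is to count the Service-related entries in the nat table:

# Count the KUBE-SVC / KUBE-SEP rules kube-proxy has programmed on this node.
sudo iptables-save -t nat | grep -c -e KUBE-SVC -e KUBE-SEP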
All of this can be solved with the help of Kubernetes itself. There is a ResourceQuota object. It sets the amount of resources and the number of objects available to a namespace in the cluster. We can create such an object, as a yaml manifest, in each namespace of the Kubernetes cluster. Using this object, we can say that this namespace has been allocated a certain amount of requests and limits, and that it is allowed to hold, say, ten services and ten pods. As a result, within a properly configured environment, a developer can press 1 for hours without doing any harm. Kubernetes will tell him: "You cannot scale your pods to that number because the resource quota is exceeded." That's it, the problem is solved.
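A minimal sketch of such a ResourceQuota, assuming a namespace called dev-team; the specific numbers are placeholders, not recommendations:

kubectl apply -n dev-team -f - <<EOF
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
spec:
  hard:
    pods: "10"            # at most ten pods in the namespace
    services: "10"        # at most ten services
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
EOF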
Documentation is here.
One problematic point emerges here, though. You can now feel how involved it is to create a namespace in Kubernetes: to set one up properly, we need to take care of a whole bunch of things (a scripted version of these steps follows the list).
ResourceQuota + LimitRange + RBAC
- Create a namespace
- Create a LimitRange inside it
- Create a ResourceQuota inside it
- Create a ServiceAccount for CI
- Create a RoleBinding for CI and for users
- Optionally launch the required service pods
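Here are those steps as plain kubectl commands; the namespace name, quota values, and the choice of the built-in edit role are all assumptions for illustration:

# 1. Namespace
kubectl create namespace dev-team

# 2. LimitRange: default requests and limits for containers that do not set their own
kubectl apply -n dev-team -f - <<EOF
apiVersion: v1
kind: LimitRange
metadata:
  name: team-limits
spec:
  limits:
  - type: Container
    defaultRequest:
      cpu: 100m
      memory: 128Mi
    default:
      cpu: 500m
      memory: 512Mi
EOF

# 3. ResourceQuota (same idea as the yaml example above)
kubectl create quota team-quota -n dev-team \
    --hard=pods=10,services=10,requests.cpu=4,requests.memory=8Gi,limits.cpu=8,limits.memory=16Gi

# 4. ServiceAccount for CI
kubectl create serviceaccount ci -n dev-team

# 5. RoleBinding for CI (and similar ones for users)
kubectl create rolebinding ci-edit -n dev-team \
    --clusterrole=edit --serviceaccount=dev-team:ci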
Therefore, taking this opportunity, I would like to share my own work. There is a thing called the Operator SDK. It is a way to write operators for your Kubernetes cluster, in this case using Ansible.
At first, we had all of this written as plain Ansible, and then I looked at what the Operator SDK is and rewrote the Ansible role into an operator. This operator lets you create an object in the Kubernetes cluster called a Team. Within a Team, it lets you describe that team's environment in yaml, and within the team's environment it lets us state that we allocate this many resources and no more. You can find it here.
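The exact schema depends on that operator, but purely as an illustration, a custom resource of this kind might look roughly like the following; the API group, kind, and field names here are hypothetical, not the operator's actual API:

kubectl apply -f - <<EOF
apiVersion: example.com/v1alpha1    # hypothetical API group and version
kind: Team
metadata:
  name: dev-team
spec:
  quota:                            # illustrative fields only
    pods: 10
    services: 10
    requests.cpu: "4"
    requests.memory: 8Gi
EOF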