For production clusters, you might want to scale your clusters automatically on demand. Konvoy provides an autoscaling feature that works at the node pool level. Node pools can be configured with autoscaling properties such as the minimum and maximum size of the chosen pools. The Konvoy autoscaler cannot scale a pool up or down beyond the defined maximum and minimum thresholds.
Konvoy uses the cluster-autoscaler community tool for autoscaling.
Add autoscaling capabilities to the “worker” pool
Enabling autoscaling on a cluster consists of the following steps:
First, you need to have a running Konvoy cluster without autoscaling enabled. Follow the instructions in the Install section for your specific cloud provider. Remember that autoscaling is only available on AWS and Azure!
Configure minimum and maximum node pool size
Next, add autoscaling properties to your cluster.yaml. In the following example, the cluster specification enables autoscaling for the “worker” node pool:
kind: ClusterProvisioner
apiVersion: konvoy.mesosphere.io/v1beta2
metadata:
  name: mycluster
spec:
  provider: aws
  aws:
    region: us-west-2
    availabilityZones:
    - us-west-2c
  nodePools:
  - name: worker
    count: 2
    machine:
      type: m5.2xlarge
    autoscaling:
      minSize: 2
      maxSize: 4
  - name: control-plane
    controlPlane: true
    count: 3
    machine:
      type: m5.xlarge
  version: v1.5.0
The worker pool scales up to a maximum of 4 machines and scales down to a minimum of 2 machines. Scaling decisions are based on actual resource usage and requested resources, as detailed below.
When deploying the cluster for the first time using konvoy up, you must ensure the initial cluster size satisfies the resource requirements of your addons. The autoscaler will not scale up the cluster if konvoy up did not succeed. For instance, this can happen when you create an underprovisioned cluster where the addons do not fit, so the installation fails. A possible workaround is to first run konvoy deploy kubernetes to deploy the autoscaler, and then run konvoy deploy addons, as sketched below. This approach allows the cluster to autoscale in order to satisfy the requirements of the selected addons.
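A minimal sketch of that workaround, run from the directory containing cluster.yaml:

# Provision the machines and deploy Kubernetes, including the autoscaler:
konvoy deploy kubernetes

# Deploy the addons afterwards; the autoscaler can now grow the worker pool
# (up to maxSize) if the addons request more capacity than the initial nodes offer:
konvoy deploy addons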
Deploy the updated configuration
After the cluster specification is configured, run konvoy up -y to apply the required changes to the cluster and to start autoscaling the worker pool on demand. When the command completes successfully, certain configuration files of the cluster are stored in Kubernetes. This keeps the cluster state up-to-date for any change triggered by the autoscaler and prevents the system from having multiple or outdated cluster specifications.
The following files are stored in Kubernetes when using the autoscaling feature:
cluster.yaml: the cluster specification is stored in Kubernetes in the konvoy namespace as part of the KonvoyCluster resource:

kubectl get konvoycluster -n konvoy
NAMESPACE   NAME        DISPLAY NAME   STATUS        PROVIDER   AGE    PROVISIONING PAUSED
konvoy      mycluster   New Cluster    Provisioned   aws        6d2h
The KonvoyCluster custom Kubernetes resource has the following structure:
apiVersion: kommander.mesosphere.io/v1beta1
kind: KonvoyCluster
metadata:
  generateName: konvoy-
  annotations:
    kommander.mesosphere.io/display-name: Some display name
spec:
  cluster: ...Konvoy Kubernetes cluster configuration as in cluster.yaml...
  provisioner: ...Konvoy provisioner configuration as in cluster.yaml...
  cloudProviderAccountRef:
    name: ...name of cloud provider account if created otherwise created by default...
  provisioningPaused: ...flag to pause any provisioning actions...
  terraformExtrasProvisionerRef:
    name: ...Konvoy Terraform extra provisioner files if used (optional)...
status:
  adminConfSecretRef:
    name: ...kubeconfig to access the installed cluster...
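To inspect the stored specification, you can read the resource back with kubectl, using the mycluster name from the example above:

# Print the full KonvoyCluster object, including the embedded cluster specification:
kubectl get konvoycluster mycluster -n konvoy -o yaml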
ssh-keys: your SSH credentials are stored in a Kubernetes Secret in the konvoy namespace.
admin.conf: your kubeconfig file, while present in your working directory, is also stored in the cluster as a Secret in the konvoy namespace.
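While konvoy pull (described below) is the supported way to fetch these files, the following hypothetical sketch shows where the kubeconfig lives; the exact Secret name depends on your cluster and is an assumption here:

# Find the Secret that backs status.adminConfSecretRef of the KonvoyCluster:
kubectl get konvoycluster mycluster -n konvoy \
  -o jsonpath='{.status.adminConfSecretRef.name}'

# Decode the kubeconfig from that Secret; the data key "admin.conf" is an assumption:
kubectl get secret <secret-name> -n konvoy \
  -o jsonpath='{.data.admin\.conf}' | base64 --decode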
terraform.tfstate: this Terraform file stores the state of the cluster and is crucial for keeping the infrastructure configuration up-to-date. The Konvoy autoscaler pushes this file to Kubernetes, like it does cluster.yaml and other files, to keep the Terraform state current for when the autoscaler triggers scaling actions.
extras/provisioner: all the Terraform extra files, if present in the working directory, are stored in Kubernetes in the konvoy namespace.
When all the cluster state is stored in Kubernetes, users can find all of their configuration in the konvoy namespace, e.g. with kubectl get all -n konvoy.
The Konvoy autoscaler deploys two pods:

kubectl get pods -n konvoy
NAME                                                              READY   STATUS    RESTARTS   AGE
mycluster-kbk4w                                                   1/1     Running   0          5s
mycluster-kubeaddons-konvoy-cluster-autoscaler-55f48c876dp2z9h    1/1     Running   0          98m
To make any future change to the configuration of the Konvoy cluster, you must use the konvoy pull command to fetch the required files into your working directory.
konvoy pull -h
Pull cluster state

Usage:
  konvoy pull state [flags]

Flags:
      --cluster-name string   Name used to prefix the cluster and all the created resources (default "mycluster")
  -h, --help                  help for pull
      --verbose               enable debug level logging
konvoy pull
Kubernetes cluster state pulled successfully!
konvoy pull fetches the cluster state, and creates or updates the required files to continue
operating the cluster from the working directory.
We recommend running konvoy pull right before making any changes to the Konvoy cluster, for example as part of the workflow sketched below.
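A minimal sketch of that workflow:

# Sync the working directory with the cluster state stored in Kubernetes:
konvoy pull

# Make the desired changes, e.g. edit the autoscaling bounds in cluster.yaml,
# then apply them; konvoy up also stores the updated state back in Kubernetes:
konvoy up -y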
In the future, Konvoy will warn users when there are differences between the Konvoy cluster state in the
working directory and the cluster state in Kubernetes.
In addition to konvoy pull, Konvoy provides a command to store the cluster state in Kubernetes: konvoy push.
konvoy push -h
Push cluster state

Usage:
  konvoy push state [flags]

Flags:
      --cloud-provider-account-name string   Name of the Cloud Provider Account used to create the resources
      --force                                Force push the cluster state
  -h, --help                                 help for push
      --verbose                              enable debug level logging
The push command stores the cluster state in Kubernetes on demand. It allows users to specify cloud provider credentials that differ from those used during the cluster bootstrap operation. This is especially important when temporary credentials were used to bootstrap the cluster.
konvoy push --cloud-provider-account-name=my-specific-aws-cloudaccount
Kubernetes cluster state pushed successfully!
The CloudProviderAccount resource referenced by --cloud-provider-account-name has the following structure:

apiVersion: kommander.mesosphere.io/v1beta1
kind: CloudProviderAccount
metadata:
  generateName: my-specific-aws-cloudaccount
  annotations:
    kommander.mesosphere.io/display-name: Some display name
spec:
  provider: aws
  credentialsRef:
    name: ...name of the secret with the credentials (see below)...
An example of a Kubernetes Secret with these credentials could be:
apiVersion: v1
kind: Secret
metadata:
  generateName: myawscreds-
data:
  credentials: ...aws credentials file content...
  config: ...optional aws config file content...
type: kommander.mesosphere.io/aws-credentials
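Both resources could be created with kubectl; note that manifests using generateName must go through kubectl create rather than kubectl apply. The file names and the konvoy target namespace here are assumptions:

# Create the Secret with the AWS credentials:
kubectl create -f myawscreds-secret.yaml -n konvoy

# Create the CloudProviderAccount referencing that Secret:
kubectl create -f my-aws-cloudaccount.yaml -n konvoy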
Changing an existing autoscaling configuration
To change any autoscaling configuration of your cluster, edit cluster.yaml and adapt the autoscaling property, then run konvoy up again to apply the changes.
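For example, to let the worker pool from the earlier specification grow to 6 machines, raise maxSize and re-run konvoy up (only the relevant fragment of cluster.yaml is shown):

nodePools:
- name: worker
  count: 2
  machine:
    type: m5.2xlarge
  autoscaling:
    minSize: 2
    maxSize: 6   # raised from 4; applied by the next konvoy up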
Autoscaling an air-gapped cluster
In an air-gapped cluster, you need to specify some additional configuration for the autoscaling functionality to work without access to the Internet.
Configuring auto-provisioning with a local Docker registry is mandatory and explained in the air-gapped installation documentation.
You will also need to configure the autoscaler to use a local Helm chart repository running as a pod in the cluster:
kind: ClusterConfiguration
apiVersion: konvoy.mesosphere.io/v1beta2
metadata:
  name: clustername
spec:
  autoProvisioning:
    config:
      clusterAutoscaler:
        chartRepo: http://konvoy-addons-chart-repo.kubeaddons.svc:8879
Finally, you must also configure the autoscaler with a static mapping of Kubernetes versions to Kubernetes Base Addons versions, to prevent it from trying to list tags from the remote GitHub URL.
kind: ClusterConfiguration
apiVersion: konvoy.mesosphere.io/v1beta2
metadata:
  name: clustername
spec:
  autoProvisioning:
    config:
      kubeaddonsRepository:
        versionStrategy: mapped-kubernetes-version
        versionMap:
          1.17.9: stable-1.17-2.0.2
Putting it all together, the configuration would be as follows:
kind: ClusterConfiguration
apiVersion: konvoy.mesosphere.io/v1beta2
metadata:
  name: clustername
spec:
  autoProvisioning:
    config:
      konvoy:
        imageRepository: 10.0.127.6:5000/mesosphere/konvoy
      webhook:
        extraArgs:
          konvoy.docker-registry-url: https://myregistry:443
          #konvoy.docker-registry-insecure-skip-tls-verify: false
          konvoy.docker-registry-username: "myuser"
          konvoy.docker-registry-password: "mypassword"
      clusterAutoscaler:
        chartRepo: http://konvoy-addons-chart-repo.kubeaddons.svc:8879
      kubeaddonsRepository:
        versionStrategy: mapped-kubernetes-version
        versionMap:
          1.17.9: testing-2.0.0-5
  ...
Autoscaler scaling decision making
The Konvoy autoscaler scales clusters based on the following conditions:
- there are pods that failed to run in the cluster due to insufficient resources (see the example after this list).
- there are nodes in the cluster that have been underutilized for an extended period of time and their pods can be placed on other existing nodes.
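For instance, the first condition is triggered when a workload requests more capacity than the current nodes provide. A hypothetical sketch, sized against the m5.2xlarge workers (8 vCPUs each) from the earlier example:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpu-hungry
spec:
  replicas: 8
  selector:
    matchLabels:
      app: cpu-hungry
  template:
    metadata:
      labels:
        app: cpu-hungry
    spec:
      containers:
      - name: worker
        image: busybox
        command: ["sleep", "86400"]
        resources:
          requests:
            # 8 replicas x 4 CPUs = 32 CPUs, far more than two m5.2xlarge nodes
            # (16 vCPUs total) can offer, so some pods stay Pending and the
            # autoscaler adds machines to the worker pool (up to maxSize).
            cpu: "4"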
Likewise, the Konvoy autoscaler does not scale clusters under the following conditions:
- One or more pods on the candidate node to be deleted are:
  - Pods with a restrictive PodDisruptionBudget (see the example after this list).
  - Kube-system pods that:
    - are not run on the node by default, *
    - don’t have a pod disruption budget set or their PDB is too restrictive. *
  - Pods that are not backed by a controller object (so not created by a deployment, replica set, job, stateful set, etc.). *
  - Pods with local storage. *
  - Pods with the priorityClassName: system-cluster-critical property set on the pod spec (to prevent your pod from being evicted).
  - Pods that cannot be moved elsewhere due to various constraints (lack of resources, non-matching node selectors or affinity, matching anti-affinity, etc.).
  - Pods that have the following annotation set: "cluster-autoscaler.kubernetes.io/safe-to-evict": "false"
- the Konvoy machine specification does not provide enough resources to schedule the pending applications.

* Unless the pod has the annotation "cluster-autoscaler.kubernetes.io/safe-to-evict": "true" set.
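As an illustration of the PodDisruptionBudget condition referenced above, a hypothetical budget restrictive enough to block node drain could look like this (names and counts are made up; policy/v1beta1 matches the Kubernetes 1.17 era of this document):

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  # If the Deployment behind "myapp" runs exactly 2 replicas, requiring 2
  # available pods means no pod may ever be evicted, so the autoscaler
  # cannot drain (and therefore cannot delete) the nodes running them.
  minAvailable: 2
  selector:
    matchLabels:
      app: myapp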
To disable the autoscaling of a node pool, remove the autoscaling property from the node pool.
When scaling down, the Konvoy autoscaler does not follow a specific algorithm to select nodes for deletion; it simply picks one node from the node pool. Consequently, scaling down nodes can cause certain disruptions.
Autoscaler metrics and events
The Konvoy autoscaler provides some custom metrics that are automatically scraped by our Prometheus addon, if configured. These metrics contain the cluster_autoscaler_ prefix and can be searched and consumed from Grafana or the Prometheus console.
Whenever a scaling decision is triggered successfully, a new event is registered in the respective KonvoyCluster resource in Kubernetes. The two events representing scale-up and scale-down decisions are shown below:
Events:
  Type    Reason                   Age   From                Message
  ----    ------                   ----  ----                -------
  Normal  ClusterScaleUpSuccess    13m   cluster-autoscaler  2 machine(s) added to nodepool "worker" by autoscaler (provider: konvoy)
  ...
  Normal  ClusterScaleDownSuccess  3m    cluster-autoscaler  1 machine removed from nodepool "worker" by autoscaler (provider: konvoy)
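These events can be inspected with kubectl, for example by describing the resource (using the mycluster name from the earlier examples):

# The Events section at the end of the output lists the autoscaler's scaling decisions:
kubectl describe konvoycluster mycluster -n konvoy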