Cluster API Operator
The Cluster API Operator is a Kubernetes Operator designed to empower cluster administrators to handle the lifecycle of Cluster API providers within a management cluster using a declarative approach. It aims to improve user experience in deploying and managing Cluster API, making it easier to handle day-to-day tasks and automate workflows with GitOps.
This operator leverages a declarative API and extends the capabilities of the clusterctl CLI, allowing greater flexibility and configuration options for cluster administrators.
Features
- Offers a declarative API that simplifies the management of Cluster API providers and enables GitOps workflows.
- Facilitates provider upgrades and downgrades, making them more convenient for distributed teams and CI pipelines.
- Aims to support air-gapped environments without direct access to GitHub/GitLab.
- Leverages controller-runtime configuration API for a more flexible Cluster API providers setup.
- Provides a transparent and effective way to interact with various Cluster API components on the management cluster.
Getting started
User guide
This section contains a quick start and the concepts relevant to a new operator user.
Concepts
CoreProvider
A component responsible for providing the fundamental building blocks of the Cluster API. It defines and implements the main Cluster API resources such as Clusters, Machines, and MachineSets, and manages their lifecycle. This includes:
- Defining the main Cluster API resources and their schemas.
- Implementing the logic for creating, updating, and deleting these resources.
- Managing the overall lifecycle of Clusters, Machines, and MachineSets.
- Providing the base upon which other providers like BootstrapProvider and InfrastructureProvider build.
BootstrapProvider
A component responsible for turning a server into a Kubernetes node as well as for:
- Generating the cluster certificates, if not otherwise specified
- Initializing the control plane, and gating the creation of other nodes until it is complete
- Joining control plane and worker nodes to the cluster
ControlPlaneProvider
A component responsible for managing the control plane of a Kubernetes cluster. This includes:
- Provisioning the control plane nodes.
- Managing the lifecycle of the control plane, including upgrades and scaling.
InfrastructureProvider
A component responsible for the provisioning of infrastructure/computational resources required by the Cluster or by Machines (e.g. VMs, networking, etc.). For example, cloud Infrastructure Providers include AWS, Azure, and Google, and bare metal Infrastructure Providers include VMware, MAAS, and metal3.io.
AddonProvider
A component that extends the functionality of Cluster API by providing a solution for managing the installation, configuration, upgrade, and deletion of Cluster add-ons using Helm charts.
IPAMProvider
A component that manages pools of IP addresses using Kubernetes resources. It serves as a reference implementation for IPAM providers, but can also be used as a simple replacement for DHCP.
Quickstart
This is a quickstart guide for getting Cluster API Operator up and running on your Kubernetes cluster.
For more detailed information, please refer to the full documentation.
Prerequisites
- A running Kubernetes cluster.
- kubectl for interacting with the management cluster.
- Cert Manager for managing operator certificates.
- Helm for installing the operator on the cluster (optional).
Install and configure Cluster API Operator
Configuring credentials for cloud providers
Instead of using environment variables as clusterctl does, Cluster API Operator uses Kubernetes secrets to store credentials for cloud providers. Refer to the provider documentation to find out which credentials are required.
This example uses the AWS provider, but the same approach can be used for other providers.
export CREDENTIALS_SECRET_NAME="credentials-secret"
export CREDENTIALS_SECRET_NAMESPACE="default"
kubectl create secret generic "${CREDENTIALS_SECRET_NAME}" --from-literal=AWS_B64ENCODED_CREDENTIALS="${AWS_B64ENCODED_CREDENTIALS}" --namespace "${CREDENTIALS_SECRET_NAMESPACE}"
Installing Cluster API Operator
Add the CAPI Operator and cert-manager Helm repositories:
helm repo add capi-operator https://kubernetes-sigs.github.io/cluster-api-operator
helm repo add jetstack https://charts.jetstack.io --force-update
helm repo update
Install cert manager:
helm install cert-manager jetstack/cert-manager --namespace cert-manager --create-namespace --set installCRDs=true
Deploy Cluster API components with the Docker provider using a single command during operator installation:
helm install capi-operator capi-operator/cluster-api-operator --create-namespace -n capi-operator-system --set infrastructure=docker --set configSecret.name=${CREDENTIALS_SECRET_NAME} --set configSecret.namespace=${CREDENTIALS_SECRET_NAMESPACE} --wait --timeout 90s
Docker provider can be replaced by any provider supported by clusterctl.
Other options for installing Cluster API Operator are described in installation documentation.
Example API Usage
Deploy latest version of core Cluster API components:
apiVersion: operator.cluster.x-k8s.io/v1alpha2
kind: CoreProvider
metadata:
  name: cluster-api
  namespace: capi-system
Deploy the Cluster API AWS provider with a specific version and a config secret (custom manager options and flags are shown in the Examples of API Usage section):
---
apiVersion: operator.cluster.x-k8s.io/v1alpha2
kind: InfrastructureProvider
metadata:
  name: aws
  namespace: capa-system
spec:
  version: v2.1.4
  configSecret:
    name: aws-variables
Installation
This section describes how to install the cluster-api-operator components.
Prerequisites
Before installing the Cluster API Operator, you must first ensure that cert-manager is installed, as the operator does not manage cert-manager installations. To install cert-manager, run the following command:
kubectl apply -f https://github.com/jetstack/cert-manager/releases/latest/download/cert-manager.yaml
Wait for cert-manager to be ready before proceeding.
After cert-manager is successfully installed, you can proceed installing the Cluster API operator.
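One way to wait for cert-manager is to block until its Deployments report the Available condition. The following is a minimal sketch, not part of the operator tooling: the function name is hypothetical, and the `cert-manager` namespace and `300s` timeout are assumed defaults that may differ in your installation.

```shell
# Block until every Deployment in the given namespace is Available.
# Assumptions: cert-manager runs in the "cert-manager" namespace and
# five minutes is long enough; both can be overridden by arguments.
wait_for_cert_manager() {
  local namespace="${1:-cert-manager}"
  local timeout="${2:-300s}"
  kubectl wait deployment --all \
    --namespace "${namespace}" \
    --for=condition=Available \
    --timeout="${timeout}"
}
```

Calling `wait_for_cert_manager` with no arguments uses the assumed defaults; `wait_for_cert_manager my-namespace 60s` overrides both.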
Plugin installation
The cluster-api-operator plugin can be installed using krew, the kubectl plugin manager.
Prerequisites
krew installed on your system. See the krew installation guide for instructions.
Steps
- Add the cluster-api-operator plugin index to krew:
kubectl krew index add operator https://github.com/kubernetes-sigs/cluster-api-operator.git
- Install the cluster-api-operator plugin:
kubectl krew install operator/clusterctl-operator
- Verify the installation:
kubectl operator
This should print help information for the kubectl operator plugin.
The cluster-api-operator plugin is now installed and ready to use with kubectl.
Optionally: installing as a clusterctl plugin
Typically the plugin is installed under ~/.krew/bin/kubectl-operator, which will be on your $PATH after a correct krew installation. If you want to use the plugin with clusterctl, you need a copy of this file whose name is prefixed with clusterctl- instead, like so:
cp ~/.krew/bin/kubectl-operator ~/.krew/bin/clusterctl-operator
After that, the plugin is available to use as a clusterctl plugin:
clusterctl operator --help
Upgrade
To upgrade your plugin to a new release of cluster-api-operator, run:
kubectl krew upgrade
Using Manifests from Release Assets
You can install the Cluster API operator directly by applying the latest release assets:
kubectl apply -f https://github.com/kubernetes-sigs/cluster-api-operator/releases/latest/download/operator-components.yaml
Using Helm Charts
Alternatively, you can install the Cluster API operator using Helm charts:
helm repo add capi-operator https://kubernetes-sigs.github.io/cluster-api-operator
helm repo update
helm install capi-operator capi-operator/cluster-api-operator --create-namespace -n capi-operator-system
Installing providers using Helm chart
The operator Helm chart supports a "quickstart" option for bootstrapping a management cluster. The user experience is relatively similar to clusterctl init:
helm install capi-operator capi-operator/cluster-api-operator --create-namespace -n capi-operator-system --set infrastructure=docker:v1.4.2 --wait --timeout 90s # core Cluster API with kubeadm bootstrap and control plane providers will also be installed
helm install capi-operator capi-operator/cluster-api-operator --create-namespace -n capi-operator-system --set infrastructure="docker;azure" --wait --timeout 90s # core Cluster API with kubeadm bootstrap and control plane providers will also be installed
helm install capi-operator capi-operator/cluster-api-operator --create-namespace -n capi-operator-system --set infrastructure="capd-custom-ns:docker:v1.4.2;capz-custom-ns:azure:v1.10.0" --wait --timeout 90s # core Cluster API with kubeadm bootstrap and control plane providers will also be installed
helm install capi-operator capi-operator/cluster-api-operator --create-namespace -n capi-operator-system --set core=cluster-api:v1.4.2 --set controlPlane=kubeadm:v1.4.2 --set bootstrap=kubeadm:v1.4.2 --set infrastructure=docker:v1.4.2 --wait --timeout 90s
For more complex operations, please refer to our API documentation.
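Judging only from the commands above, each provider value appears to be a `;`-separated list of entries of the form `name`, `name:version`, or `namespace:name:version`. The sketch below splits such a value into its parts to make that reading explicit. It is an illustration inferred from the examples, not the chart's actual parsing logic, and both function names are hypothetical.

```shell
# Split one entry into "namespace name version"; a missing field prints as "-".
# The three accepted shapes are inferred from the helm examples above.
parse_provider_entry() {
  local entry="$1" namespace="-" name="-" version="-" rest
  case "${entry}" in
    *:*:*)
      namespace="${entry%%:*}"   # text before the first colon
      rest="${entry#*:}"         # text after the first colon
      name="${rest%%:*}"
      version="${rest#*:}"
      ;;
    *:*)
      name="${entry%%:*}"
      version="${entry#*:}"
      ;;
    *)
      name="${entry}"
      ;;
  esac
  printf '%s %s %s\n' "${namespace}" "${name}" "${version}"
}

# Parse a full value such as "docker;azure", one output line per provider.
parse_provider_value() {
  printf '%s\n' "$1" | tr ';' '\n' | while IFS= read -r entry; do
    [ -n "${entry}" ] || continue   # skip empty entries
    parse_provider_entry "${entry}"
  done
}
```

For instance, `parse_provider_value "capd-custom-ns:docker:v1.4.2;capz-custom-ns:azure:v1.10.0"` prints one line for each of the two providers.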
Topics
This section contains information about enabling and configuring various features of Cluster API Operator.
Cluster API Provider Lifecycle
This section contains lifecycle operations a user can perform on a provider manifest, such as:
- Install
- Upgrade
- Modify
- Delete
Installing a Provider
To install a new Cluster API provider with the Cluster API Operator, create a provider object together with a secret holding its variables, as shown in the first example of the Examples of API Usage section.
The operator processes a provider object by applying the following rules:
- The CoreProvider is installed first; other providers will be requeued until the core provider exists.
- Before installing any provider, the following pre-flight checks are executed:
  - No other instance of the same provider (same Kind, same name) should exist in any namespace.
  - The Cluster API contract (e.g., v1beta1) must match the contract of the core provider.
- The operator sets conditions on the provider object to surface any installation issues, including pre-flight checks and/or order of installation.
- If the FetchConfiguration is not defined, the operator applies the embedded fetch configuration for the given kind and ObjectMeta.Name specified in the Cluster API code.
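Because installation issues are surfaced as conditions on the provider object, reading those conditions is a natural first debugging step. The helper below is a sketch with a hypothetical name; it assumes the conditions live under `status.conditions` and carry `type`, `status`, and `message` fields, as is usual for Cluster API conditions.

```shell
# Print "type=status message" for each condition on a provider object.
# Assumption: conditions are stored under .status.conditions.
show_provider_conditions() {
  local kind="$1" name="$2" namespace="$3"
  kubectl get "${kind}" "${name}" --namespace "${namespace}" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status} {.message}{"\n"}{end}'
}
```

For example, `show_provider_conditions coreprovider cluster-api capi-system` lists the conditions of the core provider used elsewhere in this document.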
The installation process, managed by the operator, aligns with the implementation underlying the clusterctl init command and includes these steps:
- Fetching provider artifacts (the components.yaml and metadata.yaml files).
- Applying image overrides, if any.
- Replacing variables in the infrastructure-components from EnvVar and Secret.
- Applying the resulting YAML to the cluster.
Differences between the operator and clusterctl init include:
- The operator installs one provider at a time while clusterctl init installs a group of providers in a single operation.
- The operator stores fetched artifacts in a config map for reuse during subsequent reconciliations.
- The operator uses a Secret, while clusterctl init relies on environment variables and a local configuration file.
Upgrading a Provider
To trigger an upgrade for a Cluster API provider, change the spec.version field. All providers must follow the golden rule of respecting the same Cluster API contract supported by the core provider.
The operator performs the upgrade by:
- Deleting the current provider components, while preserving CRDs, namespaces, and user objects.
- Installing the new provider components.
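Changing the version field can be done by editing the YAML, or with a JSON merge patch. The following sketch shows the second option; the function names are hypothetical, and any target version you pass is your own choice, not something this document prescribes.

```shell
# Build a JSON merge patch that sets spec.version to the given value.
version_patch() {
  printf '{"spec":{"version":"%s"}}' "$1"
}

# Patch the provider object; the operator is then expected to reconcile
# the change as an upgrade, as described above.
upgrade_provider() {
  local kind="$1" name="$2" namespace="$3" version="$4"
  kubectl patch "${kind}" "${name}" --namespace "${namespace}" \
    --type merge --patch "$(version_patch "${version}")"
}
```

A call would look like `upgrade_provider infrastructureprovider aws capa-system <new-version>`. In a GitOps workflow the same change would instead be committed to the provider manifest.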
Differences between the operator and clusterctl upgrade apply include:
- The operator upgrades one provider at a time while clusterctl upgrade apply upgrades a group of providers in a single operation.
- With the declarative approach, users are responsible for manually editing the Provider objects' YAML, while clusterctl upgrade apply --contract automatically determines the latest available versions for each provider.
Modifying a Provider
In addition to changing a provider version (upgrades), the operator supports modifying other provider fields such as controller flags and variables. This can be achieved through kubectl edit or kubectl apply to the provider object.
The operation works similarly to upgrades: the current provider instance is deleted while preserving CRDs, namespaces, and user objects. Then, a new provider instance with the updated flags/variables is installed.
Note: clusterctl currently does not support this operation.
Deleting a Provider
To remove the installed providers and all related Kubernetes objects, delete the corresponding CRs, for example:
kubectl delete infrastructureprovider azure
kubectl delete coreprovider cluster-api
Configuration
This section contains a list of frequent configuration tasks for CAPI Operator providers.
Air-gapped Environment
To install Cluster API providers in an air-gapped environment using the operator, address the following issues:
- Configure the operator for an air-gapped environment:
  - Manually fetch and store a Helm chart for the operator.
  - Provide image overrides for the operator from an accessible image repository.
- Configure providers for an air-gapped environment:
  - Provide fetch configuration for each provider from an accessible location (e.g., an internal GitHub repository) or from pre-created ConfigMaps within the cluster.
  - Provide image overrides for each provider to pull images from an accessible image repository.
Example Usage:
As an admin, I need to fetch the Azure provider components from within the cluster because I am working in an air-gapped environment.
In this example, there is a ConfigMap in the capz-system namespace that defines the components and metadata of the provider.
The Azure InfrastructureProvider is configured with a fetchConfig specifying the label selector, allowing the operator to determine the available versions of the Azure provider. Since the provider's version is set to v1.9.3, the operator uses the components information from the ConfigMap with the matching label to install the Azure provider.
---
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    provider-components: azure
  name: v1.9.3
  namespace: capz-system
data:
  components: |
    # Components for v1.9.3 YAML go here
  metadata: |
    # Metadata information goes here
---
apiVersion: operator.cluster.x-k8s.io/v1alpha2
kind: InfrastructureProvider
metadata:
  name: azure
  namespace: capz-system
spec:
  version: v1.9.3
  configSecret:
    name: azure-variables
  fetchConfig:
    selector:
      matchLabels:
        provider-components: azure
When manifests do not fit into a ConfigMap
A ConfigMap is limited to a maximum size of 1 MiB. If the manifests exceed this size, Kubernetes returns an error and the provider installation fails. To avoid this, you can compress the manifests and store them in the ConfigMap in compressed form.
For example, you have two files: components.yaml and metadata.yaml. To create a working ConfigMap:
- Archive components.yaml using the gzip CLI tool:
gzip -c components.yaml > components.gz
- Create a ConfigMap manifest from the archived data:
kubectl create configmap v1.9.3 --namespace=capz-system --from-file=components=components.gz --from-file=metadata=metadata.yaml --dry-run=client -o yaml > configmap.yaml
- Edit the file by adding the "provider.cluster.x-k8s.io/compressed: true" annotation:
yq eval -i '.metadata.annotations += {"provider.cluster.x-k8s.io/compressed": "true"}' configmap.yaml
Note: without this annotation, the operator cannot determine whether the data is compressed.
- Add labels that will be used to match the ConfigMap in the fetchConfig section of the provider:
yq eval -i '.metadata.labels += {"my-label": "label-value"}' configmap.yaml
- Create the ConfigMap in your Kubernetes cluster using kubectl:
kubectl create -f configmap.yaml
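Whether compression is needed at all can be checked before building the ConfigMap. The sketch below compares a file's size with the 1 MiB limit mentioned above and compresses it only when necessary. The function names are hypothetical, and the check looks at a single file, whereas the real limit applies to the ConfigMap as a whole, so treat it as a rough guide.

```shell
# ConfigMap size limit from the text above: 1 MiB.
CONFIGMAP_LIMIT_BYTES=1048576

# Succeeds (exit status 0) when the file is larger than the limit.
exceeds_configmap_limit() {
  local size
  size=$(( $(wc -c < "$1") ))   # arithmetic expansion strips padding from wc
  [ "${size}" -gt "${CONFIGMAP_LIMIT_BYTES}" ]
}

# Print the path that should back the "components" key:
# a gzip archive if the source is too large, otherwise the source itself.
prepare_components() {
  local src="$1" out="$2"
  if exceeds_configmap_limit "${src}"; then
    gzip -c "${src}" > "${out}"
    echo "${out}"
  else
    echo "${src}"
  fi
}
```

`prepare_components components.yaml components.gz` prints the file to pass to `--from-file=components=...`. If it prints the archive, the compressed annotation still has to be added as described in the steps above.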
Injecting additional manifests
It is possible to inject additional manifests when installing/upgrading a provider. This can be useful when you need to add extra RBAC resources to the provider controller, for example.
The field AdditionalManifests is a reference to a ConfigMap that contains additional manifests, which will be applied together with the provider components. The key for storing these manifests has to be manifests.
The manifests are applied only once when a certain release is installed/upgraded. If the namespace is not specified, the namespace of the provider will be used. There is no validation of the YAML content inside the ConfigMap.
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: additional-manifests
  namespace: capi-system
data:
  manifests: |
    # Additional manifests go here
---
apiVersion: operator.cluster.x-k8s.io/v1alpha2
kind: CoreProvider
metadata:
  name: cluster-api
  namespace: capi-system
spec:
  additionalManifests:
    name: additional-manifests
Examples of API Usage
In this section we provide some concrete examples of CAPI Operator API usage for various use-cases.
- As an admin, I want to install the aws infrastructure provider with specific controller flags.
apiVersion: v1
kind: Secret
metadata:
  name: aws-variables
  namespace: capa-system
type: Opaque
data:
  AWS_B64ENCODED_CREDENTIALS: ...
---
apiVersion: operator.cluster.x-k8s.io/v1alpha2
kind: InfrastructureProvider
metadata:
  name: aws
  namespace: capa-system
spec:
  version: v2.1.4
  configSecret:
    name: aws-variables
  manager:
    # These are top-level controller manager flags, supported by all the providers.
    # These flags come with sensible defaults, thus requiring no or minimal
    # changes for the most common scenarios.
    metrics:
      bindAddress: ":8181"
    syncPeriod: "500s"
  fetchConfig:
    url: https://github.com/kubernetes-sigs/cluster-api-provider-aws/releases
  deployment:
    containers:
    - name: manager
      args:
        # These are controller flags that are specific to a provider; usage
        # is reserved for advanced scenarios only.
        "--awscluster-concurrency": "12"
        "--awsmachine-concurrency": "11"
- As an admin, I want to install the aws infrastructure provider but override the container image of the CAPA deployment.
---
apiVersion: operator.cluster.x-k8s.io/v1alpha2
kind: InfrastructureProvider
metadata:
  name: aws
  namespace: capa-system
spec:
  version: v2.1.4
  configSecret:
    name: aws-variables
  deployment:
    containers:
    - name: manager
      imageUrl: "gcr.io/myregistry/capa-controller:v2.1.4-foo"
- As an admin, I want to change the resource limits for the manager pod in my control plane provider deployment.
---
apiVersion: operator.cluster.x-k8s.io/v1alpha2
kind: ControlPlaneProvider
metadata:
  name: kubeadm
  namespace: capi-kubeadm-control-plane-system
spec:
  version: v1.4.3
  configSecret:
    name: capi-variables
  deployment:
    containers:
    - name: manager
      resources:
        limits:
          cpu: 100m
          memory: 30Mi
        requests:
          cpu: 100m
          memory: 20Mi
- As an admin, I would like to fetch my azure provider components from a specific repository which is not the default.
---
apiVersion: operator.cluster.x-k8s.io/v1alpha2
kind: InfrastructureProvider
metadata:
  name: myazure
  namespace: capz-system
spec:
  version: v1.9.3
  configSecret:
    name: azure-variables
  fetchConfig:
    url: https://github.com/myorg/awesome-azure-provider/releases
- As an admin, I would like to use the default fetch configurations by simply specifying the expected Cluster API provider names such as aws, vsphere, azure, kubeadm, talos, or cluster-api instead of having to explicitly specify the fetch configuration. In the example below, since we are using 'vsphere' as the name of the InfrastructureProvider, the operator will fetch its configuration from url: https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/releases by default.
See more examples in the air-gapped environment section.
---
apiVersion: operator.cluster.x-k8s.io/v1alpha2
kind: InfrastructureProvider
metadata:
  name: vsphere
  namespace: capv-system
spec:
  version: v1.6.1
  configSecret:
    name: vsphere-variables
Patching provider manifests
Provider manifests can be patched using JSON merge patches. This can be useful when you need to modify the provider manifests that are fetched from the repository. To patch provider manifests, use spec.resourcePatches, which takes an array of patches:
---
apiVersion: operator.cluster.x-k8s.io/v1alpha2
kind: CoreProvider
metadata:
  name: cluster-api
  namespace: capi-system
spec:
  resourcePatches:
    - |
      apiVersion: v1
      kind: Service
      metadata:
        labels:
          test-label: test-value
More information about JSON merge patches can be found here: https://datatracker.ietf.org/doc/html/rfc7396
There are a couple of rules for the patch to match a manifest:
- The kind field must match the target object.
- If apiVersion is specified, the patch will only be applied to matching objects.
- If metadata.name and metadata.namespace are not specified, the patch will be applied to all objects of the specified kind.
- If metadata.name is specified, the patch will be applied to the object with the specified name. This is for cluster-scoped objects.
- If both metadata.name and metadata.namespace are specified, the patch will be applied to the object with the specified name and namespace.
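The matching rules above can be read as a small decision procedure. The sketch below encodes them as a shell predicate for illustration only. It is not the operator's implementation, the function name is hypothetical, and the case of a patch that sets only metadata.namespace is not covered by the rules, so the sketch simply ignores the namespace unless a name is also given.

```shell
# Return 0 if a patch selects an object, 1 otherwise.
# Arguments: patch kind, apiVersion, name, namespace, followed by the
# same four fields of the target object. Pass "" for omitted patch fields.
patch_matches() {
  local p_kind="$1" p_api="$2" p_name="$3" p_ns="$4"
  local o_kind="$5" o_api="$6" o_name="$7" o_ns="$8"

  # The kind must always match.
  [ "${p_kind}" = "${o_kind}" ] || return 1

  # apiVersion restricts the match only when the patch specifies it.
  if [ -n "${p_api}" ] && [ "${p_api}" != "${o_api}" ]; then
    return 1
  fi

  # With no name in the patch, every object of the kind matches.
  [ -n "${p_name}" ] || return 0

  # The name must match when given.
  [ "${p_name}" = "${o_name}" ] || return 1

  # The namespace is compared only when given together with a name.
  if [ -n "${p_ns}" ] && [ "${p_ns}" != "${o_ns}" ]; then
    return 1
  fi
  return 0
}
```

With these rules, the patch in the example above (kind Service, apiVersion v1, no name or namespace) would select every v1 Service among the provider manifests.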
Provider Spec
- ProviderSpec: desired state of the Provider, consisting of:
  - Version (string): provider version (e.g., "v0.1.0")
  - Manager (optional ManagerSpec): controller manager properties for the provider
  - Deployment (optional DeploymentSpec): deployment properties for the provider
  - ConfigSecret (optional SecretReference): reference to the config secret
  - FetchConfig (optional FetchConfiguration): how the operator will fetch components and metadata
  YAML example:
  ...
  spec:
    version: "v0.1.0"
    manager:
      maxConcurrentReconciles: 5
    deployment:
      replicas: 1
    configSecret:
      name: "provider-secret"
    fetchConfig:
      url: "https://github.com/owner/repo/releases"
  ...
- ManagerSpec: controller manager properties for the provider, consisting of:
  - ProfilerAddress (optional string): pprof profiler bind address (e.g., "localhost:6060")
  - MaxConcurrentReconciles (optional int): maximum number of concurrent reconciles
  - Verbosity (optional int): logs verbosity
  - FeatureGates (optional map[string]bool): provider specific feature flags
  YAML example:
  ...
  spec:
    manager:
      profilerAddress: "localhost:6060"
      maxConcurrentReconciles: 5
      verbosity: 1
      featureGates:
        FeatureA: true
        FeatureB: false
  ...
- DeploymentSpec: deployment properties for the provider, consisting of:
  - Replicas (optional int): number of desired pods
  - NodeSelector (optional map[string]string): node label selector
  - Tolerations (optional []corev1.Toleration): pod tolerations
  - Affinity (optional corev1.Affinity): pod scheduling constraints
  - Containers (optional []ContainerSpec): list of deployment containers
  - ServiceAccountName (optional string): pod service account
  - ImagePullSecrets (optional []corev1.LocalObjectReference): list of image pull secrets specified in the Deployment
  YAML example:
  ...
  spec:
    deployment:
      replicas: 2
      nodeSelector:
        disktype: ssd
      tolerations:
      - key: "example"
        operator: "Exists"
        effect: "NoSchedule"
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: "example"
                operator: "In"
                values:
                - "true"
      containers:
      - name: "containerA"
        imageUrl: "example.com/repo/image-name:v1.0.0"
        args:
          exampleArg: "value"
  ...
- ContainerSpec: container properties for the provider, consisting of:
  - Name (string): container name
  - ImageURL (optional string): container image URL
  - Args (optional map[string]string): extra provider specific flags
  - Env (optional []corev1.EnvVar): environment variables
  - Resources (optional corev1.ResourceRequirements): compute resources
  - Command (optional []string): override container's entrypoint array
  YAML example:
  ...
  spec:
    deployment:
      containers:
      - name: "example-container"
        imageUrl: "example.com/repo/image-name:v1.0.0"
        args:
          exampleArg: "value"
        env:
        - name: "EXAMPLE_ENV"
          value: "example-value"
        resources:
          limits:
            cpu: "1"
            memory: "1Gi"
          requests:
            cpu: "500m"
            memory: "500Mi"
        command:
        - "/bin/bash"
  ...
- FetchConfiguration: components and metadata fetch options, consisting of:
  - URL (optional string): URL for remote GitHub repository releases (e.g., "https://github.com/owner/repo/releases")
  - Selector (optional metav1.LabelSelector): label selector to use for fetching provider components and metadata from ConfigMaps stored in the cluster
  YAML example:
  ...
  spec:
    fetchConfig:
      url: "https://github.com/owner/repo/releases"
      selector:
        matchLabels:
  ...
- SecretReference: pointer to a secret object, consisting of:
  - Name (string): name of the secret
  - Namespace (optional string): namespace of the secret, defaults to the provider object namespace
  YAML example:
  ...
  spec:
    configSecret:
      name: capa-secret
      namespace: capa-system
  ...
Deleting providers
To remove all installed providers and all related Kubernetes objects, delete the following CRs:
kubectl delete coreprovider --all --all-namespaces
kubectl delete infrastructureprovider --all --all-namespaces
kubectl delete bootstrapprovider --all --all-namespaces
kubectl delete controlplaneprovider --all --all-namespaces
kubectl delete ipamprovider --all --all-namespaces
kubectl delete addonprovider --all --all-namespaces
Basic Cluster API provider installation
This section provides an example of a CAPZ provider installation.
Installing the CoreProvider
The first step is to install the CoreProvider, which is responsible for managing the Cluster API CRDs and the Cluster API controller.
You can use any existing namespace for providers. However, before creating a provider object, make sure the specified namespace has been created. In the example below, we use the capi-system namespace. You can create this namespace either through the CLI by running kubectl create namespace capi-system, or by using the declarative approach described in the official Kubernetes documentation.
Example:
apiVersion: operator.cluster.x-k8s.io/v1alpha2
kind: CoreProvider
metadata:
  name: cluster-api
  namespace: capi-system
spec:
  version: v1.4.3
Note: Only one CoreProvider can be installed at the same time on a single cluster.
Installing Azure Infrastructure Provider
Next, install the Azure Infrastructure Provider. Before that, ensure that the capz-system namespace exists.
Since the provider requires variables to be set, create a secret containing them in the same namespace as the provider. It is also recommended to include a github-token in the secret. The operator uses this token when fetching the provider's release artifacts from GitHub; without it, the operator may exceed the GitHub API rate limit, which can prevent the provider from being installed. Like clusterctl, the token needs only the repo scope.
---
apiVersion: v1
kind: Secret
metadata:
  name: azure-variables
  namespace: capz-system
type: Opaque
stringData:
  AZURE_CLIENT_ID_B64: Zm9vCg==
  AZURE_CLIENT_SECRET_B64: Zm9vCg==
  AZURE_SUBSCRIPTION_ID_B64: Zm9vCg==
  AZURE_TENANT_ID_B64: Zm9vCg==
  github-token: ghp_fff
---
apiVersion: operator.cluster.x-k8s.io/v1alpha2
kind: InfrastructureProvider
metadata:
  name: azure
  namespace: capz-system
spec:
  version: v1.9.3
  configSecret:
    name: azure-variables
Developer
This section contains regular developer tasks, such as:
- Release
- Development guide
- Version migration
Releasing New Versions
This document describes the release process for the Cluster API Operator.
- Create a new release branch and cut a release tag.
git checkout -b release-0.1
git push -u upstream release-0.1
# Export the tag of the release to be cut, e.g.:
export RELEASE_TAG=v0.1.1
# Create tags locally
# Warning: The test tag MUST NOT be an annotated tag.
git tag -s -a ${RELEASE_TAG} -m ${RELEASE_TAG}
git tag test/${RELEASE_TAG}
# Push tags
# Note: `upstream` must be the remote pointing to `github.com/kubernetes-sigs/cluster-api-operator`.
git push upstream ${RELEASE_TAG}
git push upstream test/${RELEASE_TAG}
Note: You may encounter an ioctl error during tagging. To resolve this, set the GPG_TTY environment variable with export GPG_TTY=$(tty).
This will trigger a release GitHub action that creates a release with operator components and the Helm chart. Concurrently, a Prow job will start to publish operator images to the staging registry.
- Wait for the images to appear in the staging registry.
- Create a GitHub Personal access token if you don't already have one. It is used for opening a PR to promote the images from staging to production.
export GITHUB_TOKEN=<your GH token>
export USER_FORK=<your GH account name>
make promote-images
After it has been tested, merge the PR and verify that the image is present in the production registry.
docker pull registry.k8s.io/capi-operator/cluster-api-operator:${RELEASE_TAG}
- Switch back to the main branch and update index.yaml and clusterctl-operator.yaml. These are the sources for the operator Helm chart repository and the local krew plugin manifest index, respectively.
git checkout main
make update-helm-plugin-repo
- Create a PR with the changes.
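In the tagging step above, the branch release-0.1 is paired with the tag v0.1.1, which suggests that a release branch is named after the major and minor parts of the tag. Under that assumption, a small helper (hypothetical, not part of the repository's tooling) can derive the branch name and reject malformed tags before anything is pushed:

```shell
# Derive the release branch (release-X.Y) from a tag of the form vX.Y.Z.
# Returns 1 and prints nothing if the tag does not have that shape.
# Assumption: branches are named after the tag's major.minor version.
release_branch_for_tag() {
  local tag="$1" version
  case "${tag}" in
    v*) ;;
    *) return 1 ;;
  esac
  version="${tag#v}"
  case "${version}" in
    *[!0-9.]* | *..* | .* | *.) return 1 ;;   # only digits and single dots
    *.*.*.*) return 1 ;;                      # too many components
    *.*.*) ;;                                 # exactly X.Y.Z
    *) return 1 ;;                            # too few components
  esac
  echo "release-${version%.*}"
}
```

With `RELEASE_TAG` exported as in the first step, `release_branch_for_tag "${RELEASE_TAG}"` prints the branch the tag is expected to be cut from.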
Setup jobs and dashboards for a new release branch
The goal of this task is to have test coverage for the new release branch and results in testgrid. We currently run CI jobs only on main and the latest stable release branch (release-0.5 is used as an example below), and all configurations are hosted in the test-infra repo.
- Create new jobs based on the jobs running against our main branch:
  - Copy test-infra/config/jobs/kubernetes-sigs/cluster-api-operator/cluster-api-operator-periodics-main.yaml to test-infra/config/jobs/kubernetes-sigs/cluster-api-operator/cluster-api-operator-periodics-release-0-5.yaml.
  - Copy test-infra/config/jobs/kubernetes-sigs/cluster-api-operator/cluster-api-operator-presubmits-main.yaml to test-infra/config/jobs/kubernetes-sigs/cluster-api-operator/cluster-api-operator-presubmits-release-0-5.yaml.
  - Modify the following:
    - Rename the jobs, e.g.: periodic-cluster-api-operator-test-main => periodic-cluster-api-operator-test-release-0-5.
    - Change annotations.testgrid-dashboards to sig-cluster-lifecycle-cluster-api-operator-0.5.
    - Change annotations.testgrid-tab-name, e.g. capi-operator-test-main => capi-operator-test-release-0-5.
    - For periodics additionally:
      - Change extra_refs[].base_ref to release-0.5 (for repo: cluster-api-operator).
    - For presubmits additionally: Adjust branches: ^main$ => ^release-0.5$.
- Create a new dashboard for the new branch in: test-infra/config/testgrids/kubernetes/sig-cluster-lifecycle/config.yaml (dashboard_groups and dashboards).
  - Add a new entry sig-cluster-lifecycle-cluster-api-operator-0.5 in both dashboard_groups and dashboards lists.
- Remove tests for the previous release branch.
  - For example, let's assume we just created tests for v0.5, then we can now drop test coverage for the release-0.4 branch.
- Verify the jobs and dashboards a day later by taking a look at: https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-operator-0.5.
Prior art: https://github.com/kubernetes/test-infra/pull/30372
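The renames in the steps above follow a regular pattern: dots become dashes in job and file names, while the dashboard name and branch regex keep the dotted version. The helpers below sketch that pattern for an arbitrary release branch. They are illustrative only, with hypothetical function names, and the patterns are inferred from the release-0.5 example rather than taken from test-infra itself.

```shell
# release-0.5 -> release-0-5 (used in job names and job file names)
job_suffix() {
  echo "$1" | tr '.' '-'
}

# release-0.5 -> periodic-cluster-api-operator-test-release-0-5
periodic_job_name() {
  echo "periodic-cluster-api-operator-test-$(job_suffix "$1")"
}

# release-0.5 -> cluster-api-operator-periodics-release-0-5.yaml
periodics_file() {
  echo "cluster-api-operator-periodics-$(job_suffix "$1").yaml"
}

# release-0.5 -> sig-cluster-lifecycle-cluster-api-operator-0.5
dashboard_name() {
  echo "sig-cluster-lifecycle-cluster-api-operator-${1#release-}"
}

# release-0.5 -> ^release-0.5$ (presubmit branch filter)
presubmit_branch_regex() {
  printf '^%s$\n' "$1"
}
```

Running these with the new branch name gives the values to substitute into the copied job files and the testgrid configuration.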
Version migration
This section provides an overview of relevant changes between versions of Cluster API Operator and their direct successors.
Cluster API Operator v1alpha1 compared to v1alpha2
This document provides an overview of relevant changes between Cluster API Operator API v1alpha1 and v1alpha2 for consumers of our Go API.
Changes by Kind
The changes below affect all v1alpha1 provider kinds: CoreProvider, ControlPlaneProvider, BootstrapProvider and InfrastructureProvider.
API Changes
This section describes changes that were introduced in v1alpha2 API and how to update your templates to the new version.
ImageMeta -> imageURL conversion
In v1alpha1 we use ImageMeta object that consists of 3 parts:
- Repository (optional string): image registry (e.g., "example.com/repo")
- Name (optional string): image name (e.g., "provider-image")
- Tag (optional string): image tag (e.g., "v1.0.0")
In v1alpha2 it is just a string, which represents the URL, e.g. example.com/repo/image-name:v1.0.0.
Example:
v1alpha1
spec:
  deployment:
    containers:
    - name: manager
      image:
        repository: "example.com/repo"
        name: "image-name"
        tag: "v1.0.0"
v1alpha2
spec:
  deployment:
    containers:
    - name: manager
      imageURL: "example.com/repo/image-name:v1.0.0"
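The mapping between the two forms can be sketched in Go as follows. This is only an illustration: the ImageMeta struct and the imageURL helper are hypothetical stand-ins written for this example, not the operator's actual conversion code, and the handling of unset parts is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// ImageMeta is a hypothetical stand-in for the v1alpha1 image object.
// All three parts are optional, so an empty string means "not set".
type ImageMeta struct {
	Repository string
	Name       string
	Tag        string
}

// imageURL joins the three v1alpha1 parts into a single v1alpha2-style string.
// Assumption: parts that are not set are simply left out of the result.
func imageURL(m ImageMeta) string {
	url := m.Name
	if m.Repository != "" {
		// Avoid a double slash if the repository already ends with one.
		url = strings.TrimSuffix(m.Repository, "/") + "/" + m.Name
	}
	if m.Tag != "" {
		url += ":" + m.Tag
	}
	return url
}

func main() {
	old := ImageMeta{Repository: "example.com/repo", Name: "image-name", Tag: "v1.0.0"}
	fmt.Println(imageURL(old)) // prints example.com/repo/image-name:v1.0.0
}
```

When migrating templates by hand, the same rule applies: concatenate repository, name and tag into the one string field.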
secretName/secretNamespace -> configSecret conversion
In v1alpha1 there are 2 separate top-level fields that point to a config secret: secretName and secretNamespace. In v1alpha2 they are replaced by a single configSecret object that has 2 fields: name and namespace.
Example:
v1alpha1
spec:
  secretName: azure-variables
  secretNamespace: capz-system
v1alpha2
spec:
  configSecret:
    name: azure-variables
    namespace: capz-system
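For consumers of the Go API, the same move can be sketched as below. The legacySpec and secretReference types and the toConfigSecret helper are hypothetical names used only for this illustration; returning nil when no secret name is set is an assumption, not documented behavior.

```go
package main

import "fmt"

// legacySpec is a hypothetical subset of a v1alpha1 provider spec.
type legacySpec struct {
	SecretName      string
	SecretNamespace string
}

// secretReference is a hypothetical stand-in for the v1alpha2 configSecret object.
type secretReference struct {
	Name      string
	Namespace string
}

// toConfigSecret moves the two top-level fields into one nested object.
// Assumption: a spec without a secret name has no config secret at all.
func toConfigSecret(s legacySpec) *secretReference {
	if s.SecretName == "" {
		return nil
	}
	return &secretReference{Name: s.SecretName, Namespace: s.SecretNamespace}
}

func main() {
	ref := toConfigSecret(legacySpec{SecretName: "azure-variables", SecretNamespace: "capz-system"})
	fmt.Printf("configSecret: name=%s namespace=%s\n", ref.Name, ref.Namespace)
}
```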
Developer Guide
Prerequisites
Docker
Iterating on the Cluster API Operator involves repeatedly building Docker containers.
A Cluster
You'll likely want an existing cluster as your management cluster. The easiest way to do this is with kind v0.9 or newer, as explained in the quick start.
Make sure your cluster is set as the default for kubectl. If it's not, you will need to modify subsequent kubectl commands below.
kubectl
kubectl for interacting with the management cluster.
Helm
Helm for installing the operator on the cluster (optional).
A container registry
If you're using kind, you'll need a way to push your images to a registry so they can be pulled. You can instead side-load all images, but the registry workflow is lower-friction.
Most users test with GCR, but you could also use something like Docker Hub.
If you choose not to use GCR, you'll need to set the REGISTRY environment variable.
Kustomize
You'll need to install kustomize. There is a version of kustomize built into kubectl, but it does not have all the features of kustomize v3 and will not work.
Kubebuilder
You'll need to install kubebuilder.
Cert-Manager
You'll need to deploy cert-manager components on your management cluster, using kubectl:
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.2/cert-manager.yaml
Ensure the cert-manager webhook service is ready before creating the Cluster API Operator components.
This can be done by following instructions for manual verification from the cert-manager website. Note: make sure to follow instructions for the release of cert-manager you are installing.
Development
Option 1: Tilt
Tilt is a tool for quickly building, pushing, and reloading Docker containers as part of a Kubernetes deployment.
Once you have a running Kubernetes cluster, you can run:
tilt up
That's it! Tilt will automatically reload the deployment to your local cluster every time you make a code change.
Option 2: The kustomize way
# Build all the images
make docker-build
# Push images
make docker-push
# Apply the manifests
kustomize build config/default | ./hack/tools/bin/envsubst | kubectl apply -f -
Reference
API Reference
Cluster API Operator currently exposes the following APIs:
- Cluster API Operator Custom Resource Definitions (CRDs): documentation
- Golang APIs: godoc
Glossary
The lexicon used in this document is described in more detail here. Any discrepancies should be rectified in the main Cluster API glossary.
Code of Conduct
Kubernetes Community Code of Conduct
Please refer to our Kubernetes Community Code of Conduct
Contributing
Contributing Guidelines
Welcome to Kubernetes. We are excited about the prospect of you joining our community! The Kubernetes community abides by the CNCF code of conduct. Here is an excerpt:
As contributors and maintainers of this project, and in the interest of fostering an open and welcoming community, we pledge to respect all people who contribute through reporting issues, posting feature requests, updating documentation, submitting pull requests or patches, and other activities.
Getting Started
We have full documentation on how to get started contributing here:
- Contributor License Agreement - Kubernetes projects require that you sign a Contributor License Agreement (CLA) before we can accept your pull requests
- Kubernetes Contributor Guide - Main contributor documentation, or you can just jump directly to the contributing section
- Contributor Cheat Sheet - Common resources for existing developers
Mentorship
- Mentoring Initiatives - We have a diverse set of mentorship programs available that are always looking for volunteers!
CI Jobs
This document provides an overview of our jobs running via Prow, GitHub Actions and Google Cloud Build. It also documents the cluster-api-operator specific configuration in test-infra.
Builds and Tests running on the main branch
NOTE: To see which test jobs execute which tests or e2e tests, you can click on the links which lead to the respective test overviews in testgrid.
The dashboards for the ProwJobs can be found here: https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-operator
More details about ProwJob configurations can be found here.
Presubmits
Prow Presubmits:
- mandatory for merge, always run:
  - pull-cluster-api-operator-build-main: ./scripts/ci-build.sh
  - pull-cluster-api-operator-make-main: ./scripts/ci-make.sh
  - pull-cluster-api-operator-verify-main: ./scripts/ci-verify.sh
- mandatory for merge, run if Go code changes:
  - pull-cluster-api-operator-test-main: ./scripts/ci-test.sh
  - pull-cluster-api-operator-e2e-main: ./scripts/ci-e2e.sh
- optional for merge, run if Go code changes:
  - pull-cluster-api-operator-apidiff-main: ./scripts/ci-apidiff.sh
GitHub Presubmit Workflows:
- PR golangci-lint: golangci/golangci-lint-action
  - Runs golangci-lint. Can be run locally via make lint.
- PR verify: kubernetes-sigs/kubebuilder-release-tools verifier
  - Verifies the PR titles have a valid format, i.e. contain one of the valid icons.
  - Verifies the PR description is valid, i.e. is long enough.
- PR dependabot (run on dependabot PRs)
  - Regenerates Go modules and code.
Other GitHub workflows
- release (runs when tags are pushed)
  - Creates a GitHub release with release notes for the tag.
- book publishing
  - Deploys operator book to GitHub Pages.
Postsubmits
Prow Postsubmits:
- post-cluster-api-operator-push-images Google Cloud Build: make release-staging
Periodics
Prow Periodics:
- periodic-cluster-api-operator-test-main: ./scripts/ci-test.sh
- periodic-cluster-api-operator-e2e-main: ./scripts/ci-e2e.sh
Test-infra configuration
- config/jobs/image-pushing/k8s-staging-cluster-api.yaml
  - Configures postsubmit job to push images and manifests.
- config/jobs/kubernetes-sigs/cluster-api-operator/
  - Configures Cluster API Operator presubmit and periodic jobs.
- config/testgrids/kubernetes/sig-cluster-lifecycle/config.yaml
  - Configures Cluster API Operator testgrid dashboards.
- config/prow/plugins.yaml
  - approve: disable auto-approval of PR authors, ignore GitHub reviews (/approve is explicitly required)
  - lgtm: enables retaining lgtm through squash
  - require_matching_label: configures needs-triage
  - plugins: enables require-matching-label plugin
  - external_plugins: enables cherrypicker plugin
- label_sync/labels.yaml
  - Configures labels for the cluster-api-operator repository.
Provider List
The Cluster API Operator introduces new API types: CoreProvider, BootstrapProvider, ControlPlaneProvider, InfrastructureProvider, AddonProvider and IPAMProvider. These six provider types share common Spec and Status types, ProviderSpec and ProviderStatus, respectively.
The CRDs are scoped to be namespaced, allowing RBAC restrictions to be enforced if needed. This scoping also enables the installation of multiple versions of controllers (grouped within namespaces) in the same management cluster.
Related Golang structs can be found in the Cluster API Operator repository.
Below are the new API types being defined, with shared types used for Spec and Status among the different provider types:
CoreProvider
type CoreProvider struct {
    metav1.TypeMeta `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`
    Spec ProviderSpec `json:"spec,omitempty"`
    Status ProviderStatus `json:"status,omitempty"`
}
BootstrapProvider
type BootstrapProvider struct {
    metav1.TypeMeta `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`
    Spec ProviderSpec `json:"spec,omitempty"`
    Status ProviderStatus `json:"status,omitempty"`
}
ControlPlaneProvider
type ControlPlaneProvider struct {
    metav1.TypeMeta `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`
    Spec ProviderSpec `json:"spec,omitempty"`
    Status ProviderStatus `json:"status,omitempty"`
}
InfrastructureProvider
type InfrastructureProvider struct {
    metav1.TypeMeta `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`
    Spec ProviderSpec `json:"spec,omitempty"`
    Status ProviderStatus `json:"status,omitempty"`
}
AddonProvider
type AddonProvider struct {
    metav1.TypeMeta `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`
    Spec AddonProviderSpec `json:"spec,omitempty"`
    Status AddonProviderStatus `json:"status,omitempty"`
}
IPAMProvider
type IPAMProvider struct {
    metav1.TypeMeta `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`
    Spec IPAMProviderSpec `json:"spec,omitempty"`
    Status IPAMProviderStatus `json:"status,omitempty"`
}
The following sections provide details about ProviderSpec and ProviderStatus, which are shared among all the provider types.
Provider Status
ProviderStatus: observed state of the Provider, consisting of:
- Contract (optional string): core provider contract being adhered to (e.g., "v1beta1")
- Conditions (optional clusterv1.Conditions): current service state of the provider
- ObservedGeneration (optional int64): latest generation observed by the controller
- InstalledVersion (optional string): version of the provider that is installed
YAML example:
status:
  contract: "v1beta1"
  conditions:
  - type: "Ready"
    status: "True"
    reason: "ProviderAvailable"
    message: "Provider is available and ready"
  observedGeneration: 1
  installedVersion: "v0.1.0"
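A client that reads this status might decide whether a provider is usable roughly as sketched below. The providerStatus and condition types are simplified, hypothetical stand-ins for the fields listed above (the real Conditions field uses clusterv1.Conditions), and the isReady rule is an assumption made for illustration, not the operator's own logic.

```go
package main

import "fmt"

// condition is a simplified, hypothetical stand-in for one clusterv1 condition.
type condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// providerStatus is a hypothetical mirror of the ProviderStatus fields listed above.
type providerStatus struct {
	Contract           string
	Conditions         []condition
	ObservedGeneration int64
	InstalledVersion   string
}

// isReady reports whether the controller has observed at least the given
// generation and a "Ready" condition with status "True" is present.
func isReady(s providerStatus, generation int64) bool {
	if s.ObservedGeneration < generation {
		return false // the status may still describe an older spec
	}
	for _, c := range s.Conditions {
		if c.Type == "Ready" {
			return c.Status == "True"
		}
	}
	return false
}

func main() {
	status := providerStatus{
		Contract: "v1beta1",
		Conditions: []condition{{
			Type:    "Ready",
			Status:  "True",
			Reason:  "ProviderAvailable",
			Message: "Provider is available and ready",
		}},
		ObservedGeneration: 1,
		InstalledVersion:   "v0.1.0",
	}
	fmt.Println(isReady(status, 1))
}
```

Comparing ObservedGeneration against the object's current generation guards against acting on a status that was written before the latest spec change was processed.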