
Kubernetes Control Plane: API Server, etcd, Scheduler and Controller Manager

  • Author: Trần Trung
  • Published On: 20 May 2025

Deep Inside the Kubernetes Control Plane: API Server, etcd, Scheduler and Controller Manager

Welcome to the journey of exploring Kubernetes. To truly master Kubernetes, it is essential to understand its brain: the Kubernetes Control Plane. Today, we will explore four components of the Control Plane: the Kube-API Server, the etcd state store, the Kube-Scheduler, and the Kube Controller Manager. This article provides an easy-to-understand overview to help you work more confidently.

Kube-API Server: The Critical Communication Gateway

Think of the Kube-API Server as the gatekeeper for your entire Kubernetes cluster. It’s the only thing that users, command-line tools (like kubectl), UIs, and even other components in the Control Plane and on worker nodes (via Kubelets) directly interact with. Every request, whether it’s creating a new Pod, getting information about a Service, or updating a Deployment, must go through the Kube-API Server.

Main Functions of Kube-API Server

The Kube-API Server performs many important tasks, ensuring the smooth and secure operation of the cluster:

  • Request Handling: It exposes a RESTful API (application programming interface) over HTTP/HTTPS. This allows clients to send requests using standard HTTP methods (GET, POST, PUT, DELETE, PATCH) to manage Kubernetes resources. For example, when you run the kubectl get pods command, kubectl sends a GET request to the API Server.
  • Authentication: Before performing any action, the API Server needs to determine the identity of the person or service sending the request – “Who are you?”. Kubernetes supports multiple authentication mechanisms such as client certificates, bearer tokens (including Service Account tokens, OIDC tokens), or even custom authentication webhooks.
  • Authorization: After knowing who you are, the API Server checks whether you have the right to perform the requested action on a specific resource – “What are you allowed to do?”. The most common and powerful mechanism is Role-Based Access Control (RBAC) , which lets you define Roles with specific permissions and grant them to users or service accounts via RoleBindings (a minimal RBAC sketch follows this list). Other authorization modes include ABAC, Node Authorization, and Webhook Authorization.
  • Validation: When a request is received to create or update a resource, the API Server checks that the definition (YAML or JSON manifest) of the resource conforms to the correct structure (schema) and rules. If there is an error, the request is rejected.
  • Admission Controllers: This is a further layer of control that runs after authentication and authorization. Admission Controllers can intercept and, in some cases, modify requests before they are persisted to etcd. They come in two flavors: mutating admission (which can modify objects, e.g. Mutating Admission Webhooks) and validating admission (which can only inspect and reject, e.g. Validating Admission Webhooks). They help enforce security policies, naming rules, resource limits, and more. For example, a validating webhook can ensure that every container image comes from an approved private registry.
  • Storing State to etcd: After a request has been authenticated, authorized, and validated (including admission control), the API Server writes the desired state of that resource to etcd. This is a very important step, as etcd is where the entire state of the cluster is stored.
  • Provide Watch Mechanism: The API Server allows clients to "watch" changes to resources. When a resource is created, updated, or deleted, the API Server sends notifications to watching clients, helping other components (such as the Scheduler and Controller Manager) react promptly to changes in the cluster.
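
To make the authorization step concrete, here is a minimal RBAC sketch: a Role granting read-only access to Pods in one namespace, bound to a single user. The names (pod-reader, dev, jane) are illustrative, not taken from this article.

```yaml
# Minimal RBAC sketch; all names are illustrative.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: dev
rules:
  - apiGroups: [""]            # "" is the core API group (pods, services, ...)
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: dev
subjects:
  - kind: User
    name: jane                 # the authenticated identity this binding applies to
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```

With this in place, a GET or LIST request for Pods in the dev namespace from jane passes the authorization check, while a DELETE on the same Pods would be rejected.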

In simple terms, the Kube-API Server is the nerve center where all information comes in, gets processed, validated, and then distributed to other components or stored securely. Without it, there is no way to manage or interact with the Kubernetes cluster.

etcd: Kubernetes' Trusted State Store

If the Kube-API Server is the gateway, then etcd is the memory, the “single source of truth” for the entire Kubernetes cluster. It is a distributed, consistent, and highly reliable key-value store designed to store and manage cluster state and configuration data.

Features and Role of Etcd

  • Stores the Entire Cluster State: Everything about nodes, pods, services, deployments, secrets, configmaps, and all other Kubernetes resources is stored in etcd. It holds not only the desired state that users declare, but also the actual state of those resources as it is reported back by the controllers and Kubelets (a sketch of such a stored object follows this list).
  • Consistency and High Reliability: etcd uses the Raft consensus algorithm to ensure that data written to it is consistent across all etcd members (usually 3 or 5 nodes, to provide high availability and fault tolerance). This is extremely important, because if the state data were inconsistent, the entire Kubernetes cluster could malfunction.
  • Communication Only via Kube-API Server: An important point is that other Kubernetes components (except Kube-API Server itself) do not communicate directly with etcd. All data reads/writes to etcd must go through Kube-API Server. This allows API Server to enforce security policies, authentication, authorization, and validation before data is persisted, while also optimizing access and reducing load on etcd.
  • Watch Support: Similar to API Server, etcd also supports a lower-level "watch" mechanism, allowing API Server to efficiently monitor data changes and then notify its clients.
  • Backup and Recovery: The data in etcd is critical. Regular etcd backups are an integral part of Kubernetes administration so that the cluster state can be restored in the event of a catastrophic failure.
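
To make the “single source of truth” idea concrete: the API Server persists every object to etcd as a serialized API object, by default under a key such as /registry/pods/<namespace>/<name>. Below is a trimmed, illustrative sketch of what one stored Pod object contains; every field value here is made up, and real stored objects carry many more fields.

```yaml
# Illustrative sketch of a Pod as persisted via the API Server
# (stored in etcd, by default under a key like /registry/pods/default/web-7d4b9).
apiVersion: v1
kind: Pod
metadata:
  name: web-7d4b9              # hypothetical name
  namespace: default
spec:                          # desired state, as declared by the client or a controller
  containers:
    - name: web
      image: nginx:1.27
status:                        # actual state, reported back through the API Server
  phase: Running
  podIP: 10.244.1.23
```

The spec/status split is what makes the reconciliation model (described later for the Controller Manager) possible: spec says what you want, status records what currently exists.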

Think of etcd as a highly confidential and heavily guarded accounting ledger of a company. Every transaction, every asset, every contract (corresponding to Kubernetes resources and configurations) is accurately and consistently recorded here. Only the chief accountant (Kube-API Server) has the right to write and read directly from this ledger, ensuring the integrity of the information.

Kube-Scheduler: The Master Coordinator

Once a Pod is created and its information is saved to etcd via the API Server, the next important question is: Which worker node should this Pod run on? This is where the Kube-Scheduler comes in. The Scheduler is a Control Plane component that keeps track of newly created Pods that are not assigned to a node (Pods with an empty .spec.nodeName field) and decides on the most suitable node to run that Pod.

Kube-Scheduler Scheduling Process

The Scheduler's decision-making process is not trivial; it consists of two main steps (a Pod manifest sketch showing how these constraints are expressed follows the list):

  1. Filtering:
    • In this step, the Scheduler will filter out the list of nodes that are not suitable to run the Pod. Filtering criteria can include:
      • Pod resource requirements (CPU, memory): Does the Node have enough resources to meet the requirements?
      • Node selectors and affinity/anti-affinity rules: Does the Pod require running on a node with a specific label, or cannot run with/must run with other Pods?
      • Taints and Tolerations: A taint on a node repels Pods by default; only Pods that declare a matching toleration can be scheduled onto that tainted node.
      • Volume conflicts: Can the Node satisfy the Pod's storage volume request (e.g. does the Node already have a persistent volume mounted that the Pod requires)?
      • Port conflicts: Is the Pod requesting a hostPort that is already in use on the node?
    • The result of this step is a list of candidate nodes that are capable of running the Pod. If no node passes the filters, the Pod remains in the Pending state until a suitable node becomes available.
  2. Scoring:
    • Once it has a list of candidate nodes, the Scheduler scores each node based on a set of priority rules. The goal is to select the “best” node.
    • Scoring rules may include:
      • Prioritize nodes with lots of free resources (to avoid concentrating too many Pods on one node).
      • Image locality: Prioritize nodes that already have the container image that the Pod needs (to reduce Pod startup time).
      • Spread Pods across failure domains: Spread Pods of the same service/deployment across different failure domains (e.g. different availability zones in the cloud).
      • Node affinity/anti-affinity and Pod affinity/anti-affinity with weights.
    • The node with the highest score is selected. If multiple nodes share the highest score, the Scheduler picks one of them at random.
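
Most of the signals the Scheduler filters and scores on are declared in the Pod spec itself. Here is a hedged sketch of a Pod carrying a few of the constraints listed above; the label keys, taint key, image, and resource sizes are all illustrative assumptions, not values from this article.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-worker                 # illustrative name
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0   # hypothetical image
      resources:
        requests:                  # filtering: the node must have this much unreserved capacity
          cpu: "500m"
          memory: "256Mi"
  nodeSelector:                    # filtering: only nodes carrying this label qualify
    disktype: ssd
  tolerations:                     # filtering: allows placement on nodes tainted with this key
    - key: "dedicated"
      operator: "Equal"
      value: "batch"
      effect: "NoSchedule"
```

Affinity/anti-affinity rules and topology spread constraints are expressed in the same place (under spec.affinity and spec.topologySpreadConstraints) and feed both the filtering and scoring phases.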

Once a node is selected, the Scheduler updates the .spec.nodeName field of the Pod object via the API Server. From that point, the Kubelet on the selected node recognizes that the Pod has been assigned to it and begins pulling images and starting the Pod's containers.
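
As a side note, the default scheduler records this placement decision by creating a Binding object against the Pod's binding subresource; the API Server then sets .spec.nodeName accordingly. A minimal sketch (the Pod and node names are illustrative):

```yaml
apiVersion: v1
kind: Binding
metadata:
  name: api-worker               # must match the Pod being bound
  namespace: default
target:
  apiVersion: v1
  kind: Node
  name: worker-2                 # the node chosen by the Scheduler
```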

Kube-Scheduler acts as an intelligent traffic orchestrator, ensuring that “vehicles” (Pods) are optimally allocated to “lanes” (Nodes), load balancing, compliance with rules and policies, and maximizing the performance and reliability of the entire system.

Kube Controller Manager: The Team That Maintains the Desired State

We know that the Kube-API Server receives requests, etcd stores state, and the Kube-Scheduler decides where Pods run. So who makes sure that the actual state of the cluster always matches the desired state that the user has declared? That is the job of the Kube Controller Manager.

The Kube Controller Manager is not a single controller; it is a daemon process that runs many independent control loops, called controllers. Each controller is responsible for monitoring a specific type of resource in the cluster and trying to bring the current state closer to the desired state.

How it Works: Reconciliation Loop

Each controller in the Kube Controller Manager operates according to a common principle, called the reconciliation loop (a concrete sketch follows the list):

  1. Observe: The Controller uses the Kube-API Server's "watch" mechanism to monitor the current state of the resources it manages.
  2. Compare: It compares this current state with the desired state stored in etcd (via API Server).
  3. Act: If there is a difference, the controller takes necessary actions (by sending requests to the API Server) to adjust the current state to match the desired state.
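
As a concrete (and illustrative) example of this loop: if the ReplicaSet controller observed the object below, it would compare the desired three replicas against the two that actually exist and create one more Pod through the API Server. All names and the image are assumptions for the sketch.

```yaml
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: web-6f9c                 # illustrative name
spec:
  replicas: 3                    # desired state
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.27
status:
  replicas: 2                    # observed state: one Pod short, so the controller acts
```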

This process repeats over and over again, making Kubernetes self-healing and stable. You just declare “I want the system to look like this”, and the controllers work tirelessly to make it happen.

Some Important Controllers

Kube Controller Manager includes several types of controllers, each with its own specific purpose. Here are a few typical examples:

  • Node Controller: Monitors the health of worker nodes. If a node becomes unresponsive (for example, due to a hardware failure), the Node Controller marks it as `NotReady` and can trigger eviction of the Pods running on it, so that their owning controllers recreate them on healthy nodes.
  • Replication Controller / ReplicaSet Controller: Ensures that a certain number of replicas of a Pod are always maintained. If the actual number of Pods is less than desired, it creates more; if there are too many, it deletes some. ReplicaSet is the more modern mechanism and is usually managed by a Deployment.
  • Deployment Controller: Manages application deployments and updates in a controlled manner through ReplicaSets. It supports update strategies such as rolling updates and rollbacks.
  • Service Controller: Interacts with the cloud provider's infrastructure to create, update, and delete load balancers when you create a Service of type `LoadBalancer` (see the sketch after this list).
  • Endpoints Controller / EndpointSlice Controller: Populates and updates the Endpoints object (or EndpointSlice for better performance) with a list of IP addresses and ports of Pods that match a Service's selector. This information is used by Kube-Proxy to route traffic.
  • Namespace Controller: Manages the lifecycle of Namespaces. When a Namespace is deleted, it ensures all resources within that Namespace are also cleaned up.
  • Service Account & Token Controllers: Automatically create default ServiceAccounts for each Namespace and manage API tokens for ServiceAccounts, allowing Pods to securely interact with the API Server.
  • CronJob Controller: Manages CronJobs, ensuring Jobs are created and run on a scheduled basis.
  • Job Controller: Watches Job objects and creates Pods to run them, ensuring the requested work runs to successful completion.
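
To tie two of these controllers to one concrete object: the Service below (all names and ports are illustrative) would prompt the Service controller to provision a cloud load balancer and the Endpoints/EndpointSlice controller to keep the list of backend Pod addresses in sync with the selector.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-lb                   # illustrative name
spec:
  type: LoadBalancer             # Service controller asks the cloud provider for a load balancer
  selector:
    app: web                     # Endpoints/EndpointSlice controllers track Pods with this label
  ports:
    - port: 80                   # port exposed by the Service
      targetPort: 8080           # port the backing Pods listen on
      protocol: TCP
```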

The Kube Controller Manager is like a team of dedicated managers, each responsible for a specific area of work, constantly checking, reconciling, and acting to ensure Kubernetes is running properly.

Smooth Coordination Between Control Plane Components

The components of the Kubernetes Control Plane do not operate in isolation; they work closely together (a sample manifest that triggers this whole flow appears after the steps):

  1. A user or an automated system sends a request (e.g. to create a new Deployment) to the Kube-API Server.
  2. The Kube-API Server authenticates, authorizes, and validates the request, then saves the desired state of the Deployment to etcd.
  3. The Deployment Controller (inside the Kube Controller Manager), which watches the API Server for changes to Deployments, is notified about the new Deployment.
  4. The Deployment Controller reads the Deployment and creates a corresponding ReplicaSet (for example, one that manages 3 replicas of a Pod). It sends the request to create this ReplicaSet to the Kube-API Server, and the API Server saves it to etcd.
  5. The ReplicaSet Controller (also in the Kube Controller Manager) discovers the new ReplicaSet and sees that it needs to create 3 Pods. It creates definitions for the 3 Pods (with no nodes assigned yet) and sends them to the Kube-API Server. The API Server stores the 3 Pod objects in etcd.
  6. The Kube-Scheduler, which is watching the API Server for unassigned Pods, discovers the 3 new Pods. For each Pod, it runs the filtering and scoring process, selects the most suitable node, and updates the Pod's .spec.nodeName via the Kube-API Server. The API Server in turn persists this change to etcd.
  7. The Kubelet on each selected node, which watches the API Server for Pods assigned to it, sees its new Pod. It pulls the container images, starts the containers, and runs the Pod. It also periodically reports the Pod's actual state (e.g. Running, Succeeded, Failed) to the Kube-API Server, which saves it to etcd.
  8. Other controllers (e.g. the Endpoints controller) also watch for relevant changes (such as a new running Pod whose labels match a Service's selector) and update the corresponding resources (e.g. Endpoints objects) via the Kube-API Server.
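
Here is a hedged sketch of the kind of manifest that would kick off steps 1 to 8 above; the name, labels, image, and port are all illustrative assumptions.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                      # illustrative name
spec:
  replicas: 3                    # the ReplicaSet controller will maintain 3 Pods (step 5)
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.27      # hypothetical image
          ports:
            - containerPort: 80
```

Applying it (for example with kubectl apply -f) is step 1; everything that follows is the Control Plane components reacting to each other's writes through the Kube-API Server.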

This cycle continues, with the Kube-API Server as the orchestration hub, etcd as the source of truth, the Kube-Scheduler deciding placement, and the Kube Controller Manager making sure everything goes according to plan.

Conclusion

Through this article, I hope you have a deeper and clearer view of the four components of the Kubernetes Control Plane: the Kube-API Server, etcd, the Kube-Scheduler, and the Kube Controller Manager. Each component plays an indispensable role in creating a powerful, flexible, and highly self-healing container orchestration platform.

Understanding how they work will not only help you troubleshoot more effectively, but will also open up the possibility of customizing and optimizing your Kubernetes cluster. Enjoy your Kubernetes experience!
