Connect Kubernetes Events and Metrics
Kubernetes is the de facto standard for cloud-native container orchestration, powerful enough to orchestrate nearly every service powering a modern business. But even with all its capabilities, monitoring and troubleshooting can still be a black box. Despite abundant documentation, not to mention tools, developers and operators often lack the right knowledge or guidance to manage and troubleshoot Kubernetes environments effectively. In short, it’s difficult to turn Kubernetes events and metrics into actionable insights.
When changes occur in a Kubernetes cluster, you have to know what information to look for, where to look for it, and how to access it across several Kubernetes tools. It’s no easy feat considering all the layers — from containers to clusters — that go into a successful Kubernetes deployment. Then, you have to figure out which line or lines in the dozens of configuration files to fix.
What should be simple is actually very complex. So what do developers need to know to get the most out of Kubernetes events and metrics?
What can Kubernetes do out of the box?
Kubernetes exposes a lot of events and metrics out of the box that, used well, can support effective observability. The key is putting them all in the right context.
Kubernetes offers a detailed view of clusters for alerting and monitoring in the form of events, which are generated when resources like nodes, deployments, or pods change state. The catch is that events don’t persist: by default, Kubernetes retains them for only about an hour, so they must be exported somewhere durable to be useful later.
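Because events age out quickly, a common first step is to dump and filter them before they expire. Below is a minimal sketch, assuming input shaped like the output of `kubectl get events -o json`; the sample payload is hand-written for illustration, not pulled from a real cluster:

```python
import json

def warning_events(events_json: str) -> list[dict]:
    """Filter a `kubectl get events -o json` dump down to Warning events."""
    items = json.loads(events_json)["items"]
    return [
        {
            "reason": e.get("reason"),
            "object": e.get("involvedObject", {}).get("name"),
            "message": e.get("message"),
        }
        for e in items
        if e.get("type") == "Warning"
    ]

# Hand-written sample payload (shape mirrors the Events API):
sample = json.dumps({
    "items": [
        {"type": "Normal", "reason": "Scheduled",
         "involvedObject": {"name": "web-0"},
         "message": "Successfully assigned default/web-0 to node-1"},
        {"type": "Warning", "reason": "BackOff",
         "involvedObject": {"name": "web-1"},
         "message": "Back-off restarting failed container"},
    ]
})
print(warning_events(sample))
```

In practice you would ship the filtered events to long-term storage rather than print them, but the filtering step looks the same.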
The Events to Track
- Container startup and shutdown events
- Scaling events (e.g., replicating or deleting pods)
- Changes in the state of objects (e.g., pod updates, status changes, etc.)
- Failures (e.g., pod crashes or node unavailability)
- Scheduling events, such as successful assignment of a pod to a node
- Resource utilization events (e.g., a pod consuming too much memory or CPU)
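One way to sort a raw event stream into the categories above is a lookup on the event’s `reason` field. The grouping below is illustrative, not an official Kubernetes taxonomy:

```python
# Illustrative mapping of common Kubernetes event reasons to the
# categories listed above; not exhaustive and not an official taxonomy.
EVENT_CATEGORIES = {
    "Started": "container lifecycle",
    "Killing": "container lifecycle",
    "ScalingReplicaSet": "scaling",
    "SuccessfulCreate": "scaling",
    "SuccessfulDelete": "scaling",
    "Scheduled": "scheduling",
    "FailedScheduling": "scheduling",
    "BackOff": "failure",
    "NodeNotReady": "failure",
    "OOMKilling": "resource utilization",
}

def categorize(reason: str) -> str:
    """Bucket an event reason into one of the tracked categories."""
    return EVENT_CATEGORIES.get(reason, "other")

print(categorize("OOMKilling"))        # resource utilization
print(categorize("FailedScheduling"))  # scheduling
```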
Metrics are equally critical at every level, from the cluster down to the worker nodes, pods, and containers. Metrics can be pulled from the Metrics Server, which gathers data from each node’s kubelet and exposes it through the Metrics API; from kube-state-metrics, which listens to the Kubernetes API and generates metrics about object state; and from cAdvisor, which exposes container usage and performance metrics.
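kube-state-metrics and cAdvisor both expose their data in the Prometheus text exposition format. A minimal parser sketch for simple gauge/counter lines follows; histograms, timestamps, and label escaping are deliberately ignored, and the sample text is hand-written:

```python
import re

# Matches lines like: metric_name{label="value",...} 1.5
LINE = re.compile(r'^(\w+)(?:\{(.*)\})?\s+([0-9.eE+-]+)$')

def parse_metrics(text: str) -> list[tuple[str, dict, float]]:
    """Parse simple lines of the Prometheus text exposition format
    (gauges/counters only; skips comments and unparseable lines)."""
    out = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = LINE.match(line)
        if not m:
            continue
        name, raw_labels, value = m.groups()
        labels = dict(re.findall(r'(\w+)="([^"]*)"', raw_labels or ""))
        out.append((name, labels, float(value)))
    return out

sample = """\
# HELP kube_pod_status_phase The pods current phase.
kube_pod_status_phase{namespace="default",pod="web-0",phase="Running"} 1
kube_pod_status_phase{namespace="default",pod="web-1",phase="Pending"} 0
"""
print(parse_metrics(sample))
```

A real scraper would also handle metric families, histograms, and the `TYPE`/`HELP` metadata, which is what Prometheus itself does for you.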
The Metrics That Matter
- Resource utilization (e.g., CPU and memory usage of nodes and pods).
- Pod and node status (e.g., the number of running, pending, and failed pods).
- Autoscaling metrics (e.g., Horizontal Pod Autoscaler replica counts and Cluster Autoscaler node scale-ups).
- Cluster health (e.g., the number of unschedulable nodes).
- Network traffic and error count (to monitor network connectivity and detect issues).
- API server request latencies (to monitor the performance of the Kubernetes API server).
- Container runtime (e.g., the number of containers in a running state).
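Comparing resource usage against capacity means decoding Kubernetes quantity notation (e.g., `250m` CPU, `512Mi` memory). The sketch below handles only the common suffixes; real quantities also allow decimal suffixes like `M` and `G`:

```python
# Binary-SI memory suffixes; real Kubernetes quantities also accept
# decimal suffixes (K, M, G), which this sketch does not handle.
MEM_SUFFIX = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}

def parse_cpu(q: str) -> float:
    """Convert a CPU quantity ("250m" or "2") to cores."""
    if q.endswith("m"):
        return float(q[:-1]) / 1000.0
    return float(q)

def parse_memory(q: str) -> int:
    """Convert a memory quantity ("512Mi", "2Gi", or plain bytes) to bytes."""
    for suffix, mult in MEM_SUFFIX.items():
        if q.endswith(suffix):
            return int(float(q[: -len(suffix)]) * mult)
    return int(q)

def utilization(usage: str, capacity: str, parse=parse_cpu) -> float:
    """Fraction of capacity consumed, e.g., for an alert threshold."""
    return parse(usage) / parse(capacity)

print(utilization("250m", "2"))                         # CPU: 0.125
print(utilization("512Mi", "2Gi", parse=parse_memory))  # memory: 0.25
```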
Complexity and Lack of Context
There’s a lot of data, but it’s not immediately useful. Seeing it in a log stream or metric chart doesn’t, on its own, provide enough context or direction. There are options like Kubernetes Dashboard, Prometheus, Grafana, and Stackdriver, among others. But the toolset gets broad quickly, and you end up pulling context from different tools and sources to see what really happened, who changed what, and what needs to be addressed to fix the issue.
Fortunately, there’s a way to view all crucial Kubernetes events and metrics in one system without worrying about information overload or a lack of actionable insight. CtrlStack provides a central point of view for Kubernetes events and metrics.
Join our upcoming webinar Change Intelligence 101: Connecting Kubernetes Events with Metrics to learn how CtrlStack lets you view all the right information in the right context, and how expanding even beyond Kubernetes for more context can speed up root cause analysis.