Custom Metrics API
Introduction
The Custom Metrics API is a powerful extension to Kubernetes that allows you to expose application-specific metrics to the Kubernetes control plane. This integration is particularly valuable when you need to scale applications based on metrics beyond the standard CPU and memory usage (which are provided by the Resource Metrics API).
In a Prometheus and Kubernetes environment, the Custom Metrics API bridges the gap between your application metrics collected by Prometheus and the Kubernetes autoscaling system. This enables more intelligent scaling decisions based on business-relevant metrics like request latency, queue lengths, or concurrent users.
Understanding the Custom Metrics API
What is the Custom Metrics API?
The Custom Metrics API is an extension API that follows Kubernetes' aggregation layer pattern. It allows third-party solutions like Prometheus to expose application-specific metrics to Kubernetes components, particularly the Horizontal Pod Autoscaler (HPA).
By default, Kubernetes can only scale based on CPU and memory usage. With the Custom Metrics API, you can scale based on any metric that makes sense for your application:
- Request count or rate
- Queue length
- Response time
- Error rates
- Business-specific metrics (like active users)
Architecture Overview
The architecture involves several components working together:
- Your application pods expose metrics via Prometheus exporters
- Prometheus scrapes and stores these metrics
- Prometheus Adapter translates between Prometheus queries and the Kubernetes Custom Metrics API
- Kubernetes HPA queries these metrics and makes scaling decisions
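You can see which of these metrics APIs are already registered in your cluster via the aggregation layer. A quick check (the resource metrics group is served by the Metrics Server, and the custom metrics group will appear once the Prometheus Adapter is installed in the steps below):
# List the registered metrics APIs (resource, custom, and external)
kubectl get apiservice | grep metrics.k8s.io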
Setting Up the Custom Metrics API
Let's walk through setting up the Custom Metrics API with Prometheus in your Kubernetes cluster.
Prerequisites
- A running Kubernetes cluster (v1.23+, since the examples use the autoscaling/v2 API)
- Prometheus installed in your cluster
- Helm (optional, but recommended for easier installation)
Installing the Prometheus Adapter
The Prometheus Adapter is the component that implements the Custom Metrics API and makes Prometheus metrics available to Kubernetes.
Using Helm (Recommended)
# Add the Prometheus Community Helm repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# Install the Prometheus Adapter
helm install prometheus-adapter prometheus-community/prometheus-adapter \
--namespace monitoring \
--create-namespace \
--set prometheus.url=http://prometheus-server.monitoring.svc.cluster.local \
--set prometheus.port=9090
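If you prefer a values file over --set flags, the equivalent configuration (assuming the same Prometheus service name and namespace) can be passed with -f values.yaml:
# values.yaml -- equivalent to the --set flags above
prometheus:
  url: http://prometheus-server.monitoring.svc.cluster.local
  port: 9090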
Using YAML Manifests
If you prefer to use raw YAML manifests, you can apply the following configuration:
apiVersion: apps/v1
kind: Deployment
metadata:
name: prometheus-adapter
namespace: monitoring
spec:
replicas: 1
selector:
matchLabels:
app: prometheus-adapter
template:
metadata:
labels:
app: prometheus-adapter
spec:
containers:
- name: prometheus-adapter
image: registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.10.0
args:
- --prometheus-url=http://prometheus-server.monitoring.svc.cluster.local:9090
- --metrics-relist-interval=1m
- --v=4
- --secure-port=6443
ports:
- containerPort: 6443
---
apiVersion: v1
kind: Service
metadata:
name: prometheus-adapter
namespace: monitoring
spec:
ports:
- port: 443
targetPort: 6443
selector:
app: prometheus-adapter
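Note that these manifests cover only the Deployment and Service. To actually serve the Custom Metrics API, the adapter also needs a ServiceAccount with the appropriate RBAC bindings, TLS serving certificates, and an APIService object that registers it with the aggregation layer (the Helm chart creates all of these for you). A minimal sketch of the APIService registration, assuming the Service defined above:
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  service:
    name: prometheus-adapter
    namespace: monitoring
  group: custom.metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true   # for testing only; use caBundle in production
  groupPriorityMinimum: 100
  versionPriority: 100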
Configuring the Prometheus Adapter
The Prometheus Adapter needs configuration to know which Prometheus metrics should be exposed through the Custom Metrics API. Create a ConfigMap with your adapter configuration:
apiVersion: v1
kind: ConfigMap
metadata:
name: adapter-config
namespace: monitoring
data:
config.yaml: |
rules:
- seriesQuery: '{__name__=~"http_requests_total"}'
resources:
overrides:
kubernetes_namespace:
resource: namespace
kubernetes_pod_name:
resource: pod
name:
matches: "http_requests_total"
as: "http_requests_per_second"
metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
This configuration exposes the http_requests_total Prometheus metric as http_requests_per_second to the Custom Metrics API, calculating a 2-minute rate and allowing Kubernetes to query it per pod or per namespace.
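To make the template concrete: for pods in the default namespace, the adapter expands <<.Series>>, <<.LabelMatchers>>, and <<.GroupBy>> into ordinary PromQL, roughly like the following (the pod name here is purely illustrative):
sum(rate(http_requests_total{kubernetes_namespace="default",kubernetes_pod_name="sample-app-8446f84d77-2zvxq"}[2m])) by (kubernetes_pod_name)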
Using Custom Metrics for Autoscaling
Once you have the Custom Metrics API set up, you can create HPAs that use these metrics for scaling decisions.
Creating an HPA with Custom Metrics
Here's an example of an HPA that scales a service based on HTTP request rate:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: sample-app-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: sample-app
minReplicas: 1
maxReplicas: 10
metrics:
- type: Pods
pods:
metric:
name: http_requests_per_second
target:
type: AverageValue
averageValue: 500m # 500 milli-requests per second, or 0.5 requests/second
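Under the hood, the HPA controller uses the standard formula desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue). For example, with the 500m target above, 4 pods averaging 1.5 requests per second (1500m) each would be scaled to ceil(4 × 1500m / 500m) = 12 replicas, which is then capped at maxReplicas (10 here).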
Verifying the Setup
You can verify that your custom metrics are available to Kubernetes using the following commands:
# Check if the API service is available
kubectl get apiservice v1beta1.custom.metrics.k8s.io
# List available custom metrics
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .
# Get a specific metric for pods in a namespace
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/http_requests_per_second" | jq .
Example output:
{
"kind": "MetricValueList",
"apiVersion": "custom.metrics.k8s.io/v1beta1",
"metadata": {
"selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/http_requests_per_second"
},
"items": [
{
"describedObject": {
"kind": "Pod",
"namespace": "default",
"name": "sample-app-8446f84d77-2zvxq",
"apiVersion": "/v1"
},
"metricName": "http_requests_per_second",
"timestamp": "2023-08-15T14:30:22Z",
"value": "250m"
}
]
}
Practical Example: Scaling a Web Service
Let's walk through a complete practical example of setting up a web service with custom metrics for autoscaling.
Step 1: Deploy a Sample Application with Prometheus Metrics
Here's a simple web application that exposes Prometheus metrics:
apiVersion: apps/v1
kind: Deployment
metadata:
name: sample-web-app
labels:
app: sample-web-app
spec:
replicas: 1
selector:
matchLabels:
app: sample-web-app
template:
metadata:
labels:
app: sample-web-app
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8080"
prometheus.io/path: "/metrics"
spec:
containers:
- name: sample-web-app
image: ghcr.io/stefanprodan/podinfo:latest
ports:
- containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
name: sample-web-app
spec:
selector:
app: sample-web-app
ports:
- port: 80
targetPort: 8080
This sample application automatically exposes Prometheus metrics at /metrics.
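Before configuring the adapter, it's worth confirming what the app actually exports. A quick check (assuming you saved the manifests above as sample-web-app.yaml; the exact metric names depend on the podinfo image version, so adjust the adapter's seriesQuery if they differ):
# Deploy the app and service
kubectl apply -f sample-web-app.yaml
# In one terminal, forward the service port
kubectl port-forward svc/sample-web-app 8080:80
# In another terminal, inspect the exposed metrics
curl -s http://localhost:8080/metrics | grep http_requests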
Step 2: Configure the Prometheus Adapter for the App's Metrics
Update your Prometheus Adapter configuration to recognize the application's metrics:
apiVersion: v1
kind: ConfigMap
metadata:
name: adapter-config
namespace: monitoring
data:
config.yaml: |
rules:
- seriesQuery: '{__name__=~"http_requests_total",kubernetes_pod_name!=""}'
resources:
overrides:
kubernetes_namespace:
resource: namespace
kubernetes_pod_name:
resource: pod
name:
matches: "http_requests_total"
as: "http_requests_per_second"
metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
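The adapter reads its configuration at startup, so after updating the ConfigMap, restart it so the new rules are picked up (assuming you saved the ConfigMap above as adapter-config.yaml and used the deployment name from the earlier install; adjust if yours differs):
kubectl apply -f adapter-config.yaml
kubectl rollout restart deployment/prometheus-adapter -n monitoring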
Step 3: Create HPA for the Sample Application
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: sample-web-app-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: sample-web-app
minReplicas: 1
maxReplicas: 10
metrics:
- type: Pods
pods:
metric:
name: http_requests_per_second
target:
type: AverageValue
averageValue: 10 # Scale when pods average 10 requests per second
Step 4: Generate Traffic to Test the Autoscaling
You can use a tool like hey or siege to generate traffic to your service:
# Install hey if needed
go install github.com/rakyll/hey@latest
# Generate traffic (the URL below is the cluster-internal service DNS name,
# so run hey from a machine or pod that can reach the cluster network)
hey -z 5m -q 20 -c 50 http://sample-web-app.default.svc.cluster.local/
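If you'd rather not install hey locally, a throwaway pod inside the cluster works just as well (a minimal sketch using busybox):
kubectl run load-generator --rm -it --image=busybox:1.36 --restart=Never -- \
  /bin/sh -c "while true; do wget -q -O- http://sample-web-app.default.svc.cluster.local; done"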
While traffic is being generated, you can monitor your HPA and observe scaling:
kubectl get hpa sample-web-app-hpa --watch
Example output:
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
sample-web-app-hpa Deployment/sample-web-app 8500m/10 1 10 1 45s
sample-web-app-hpa Deployment/sample-web-app 12500m/10 1 10 1 60s
sample-web-app-hpa Deployment/sample-web-app 12500m/10 1 10 2 75s
sample-web-app-hpa Deployment/sample-web-app 8300m/10 1 10 2 90s
Advanced Configurations
External Metrics API
In addition to the Custom Metrics API, Kubernetes offers the External Metrics API, which allows scaling based on metrics not tied to any Kubernetes object, such as AWS SQS queue length or CloudWatch metrics.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: queue-processor-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: queue-processor
minReplicas: 1
maxReplicas: 10
metrics:
- type: External
external:
metric:
name: sqs_messages
selector:
matchLabels:
queue_name: processing-queue
target:
type: AverageValue
averageValue: 30
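Serving external metrics requires an adapter that implements the External Metrics API; the Prometheus Adapter can do so via its externalRules configuration section, and cloud-specific adapters exist as well. You can check whether the API is registered and what it exposes with:
kubectl get apiservice v1beta1.external.metrics.k8s.io
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .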
Multi-Metric and Compound Scaling Decisions
You can define multiple metrics in a single HPA, enabling more sophisticated scaling decisions:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: complex-app-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: complex-app
minReplicas: 1
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 50
- type: Pods
pods:
metric:
name: http_requests_per_second
target:
type: AverageValue
averageValue: 1000m
- type: Pods
pods:
metric:
name: latency_seconds
target:
type: AverageValue
averageValue: 0.5
In this example, the HPA will scale based on whichever metric requires the highest number of replicas.
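If the combined result scales up or down too aggressively, autoscaling/v2 also lets you tune this with a behavior section on the same HPA, added under spec alongside metrics. A sketch that slows scale-down (the values are illustrative):
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 minutes before scaling down
      policies:
      - type: Pods
        value: 1
        periodSeconds: 60               # remove at most one pod per minute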
Troubleshooting
Common Issues and Solutions
- Metrics not appearing in the Custom Metrics API
Verify your adapter configuration and make sure the metrics exist in Prometheus:
# Check if Prometheus is collecting the metrics
kubectl port-forward -n monitoring svc/prometheus-server 9090:80
# Then visit http://localhost:9090 and query your metrics
# Check adapter logs
kubectl logs -n monitoring deployment/prometheus-adapter
- HPA not scaling based on custom metrics
Check if the HPA can see the metrics:
kubectl describe hpa sample-web-app-hpa
Look for warnings or errors in the events section.
- "Unable to fetch metrics" errors
This often indicates connectivity issues between the adapter and Prometheus. Verify the Prometheus URL in the adapter configuration.
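A quick way to tell whether the aggregated API itself is healthy (as opposed to individual metrics being missing) is to inspect the APIService status; an Available condition of False usually points at the adapter's Service, TLS, or RBAC setup:
kubectl describe apiservice v1beta1.custom.metrics.k8s.io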
Summary
The Custom Metrics API is a powerful extension to Kubernetes that enables autoscaling based on application-specific metrics collected by Prometheus. By setting up the Prometheus Adapter, you can bridge the gap between your application's business metrics and Kubernetes' scaling mechanisms.
Key points to remember:
- The Custom Metrics API allows Kubernetes to scale based on application-specific metrics beyond CPU and memory
- The Prometheus Adapter implements this API by translating Prometheus metrics into a format Kubernetes understands
- Configuring the adapter requires careful mapping between Prometheus metrics and Kubernetes resources
- HPAs can use these custom metrics for making scaling decisions
- You can combine multiple metrics for more sophisticated scaling behavior
By mastering the Custom Metrics API, you can create more intelligent, business-aware scaling behaviors that better match your application's actual needs.
Exercises
- Set up the Custom Metrics API in your Kubernetes cluster using the Prometheus Adapter.
- Configure the adapter to expose a custom metric from your application.
- Create an HPA that scales your application based on this custom metric.
- Test the scaling behavior by generating load on your application.
- Advanced: Create an HPA that uses multiple metrics (both resource and custom metrics) for scaling decisions.
Additional Resources
- Kubernetes HPA Documentation
- Prometheus Adapter GitHub Repository
- Custom Metrics API Design
- Kubernetes Metrics Server (for the core Resource Metrics API)