Prometheus Operator

TL;DR:

Use kbst add service prometheus to add Prometheus Operator to your platform
The kbst CLI scaffolds the Terraform module boilerplate for you
Kubestack platform service modules bundle upstream manifests and are fully customizable

Use the module

The kbst CLI helps you scaffold the Terraform code to provision Prometheus Operator on your platform. It calls the module once per cluster and sets the correct source and the latest version for the module. It also makes sure the module's configuration and configuration_base_key match your platform.

# add Prometheus Operator service to all platform clusters
kbst add service prometheus

# or optionally only add Prometheus Operator to a single cluster
# 1. list existing platform modules
kbst list
aks_gc0_westeurope
eks_gc0_eu-west-1
gke_gc0_europe-west1

# 2. add Prometheus Operator to a single cluster
kbst add service prometheus --cluster-name aks_gc0_westeurope

Scaffolding the boilerplate is convenient, but platform service modules are fully documented, standard Terraform modules. They can also be used standalone without the Kubestack framework.

Customize resources

All Kubestack platform service modules support the same module attributes and configuration as all other Kubestack modules. The module configuration is a Kustomization set in the per-environment configuration map, following Kubestack's inheritance model.

The example below shows some options to customize the resources provisioned by the Prometheus Operator module.

module "example_prometheus" {
  providers = {
    kustomization = kustomization.example
  }
  source  = "kbst.xyz/catalog/prometheus/kustomization"
  version = "0.75.1-kbst.0"
  configuration = {
    apps = {
      # change the namespace of all resources
      namespace = var.example_prometheus_namespace

      # or add an annotation
      common_annotations = {
        "terraform-workspace" = terraform.workspace
      }

      # use images to pull from an internal proxy
      # and avoid being rate limited
      images = [{
        # refers to the 'pod.spec.container.name' to modify the 'image' attribute of
        name     = "container-name"

        # customize the 'registry/name' part of the image
        new_name = "reg.example.com/nginx"
      }]
    }
    ops = {
      # scale down replicas in ops
      replicas = [{
        # refers to the 'metadata.name' of the resource to scale
        name = "example"

        # sets the desired number of replicas
        count = 1
      }]
    }
  }
}

In addition to the example attributes shown above, modules also support secret_generator, config_map_generator, patches and many other Kustomization attributes.

Full documentation on how to customize a module's Kubernetes resources is available in the platform service module configuration section of the framework documentation.

Usage

Using the Prometheus operator to provision a Prometheus instance and start monitoring an application requires three steps:

1. Deploy the operator
2. Provision a Prometheus instance
3. Create ServiceMonitors for each service exposing metrics

The first step, deploying the Prometheus operator, is covered by the instructions on the install tab.

Prometheus Instance Manifests

Next, we can continue with step number two and use the operator to provision our Prometheus instance.

Each Prometheus instance needs read-only access to the Kubernetes API in order to keep its monitoring targets up to date. To make this easier, the clusterwide base includes two ClusterRoles: one for the operator, called prometheus-operator, and one for the instances, named prometheus-instance.

So, to provision a Prometheus instance, we first need to create a ServiceAccount and a RoleBinding linking that service account to the prometheus-instance cluster role. Finally, we can create a Prometheus resource that instructs the operator to provision a Prometheus instance that uses our service account. The example below does just that.

---
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example-instance
  namespace: default
  labels:
    prometheus: example-instance
spec:
  serviceAccountName: prometheus-example-instance
  serviceMonitorSelector:
    matchLabels:
      prometheus-instance: example-instance
  resources:
    requests:
      # by default the operator requests 2Gi of memory
      # adapt the line below if required to schedule pods
      memory: 2Gi

---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus-example-instance
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-instance
subjects:
- kind: ServiceAccount
  name: prometheus-example-instance
  namespace: default

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-example-instance
  namespace: default
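
To illustrate what read-only access means for an instance, the following is a minimal sketch of a ClusterRole of this kind. It is not the prometheus-instance role shipped in the clusterwide base, whose exact rules may differ. The name prometheus-instance-sketch and the rule list are assumptions, modeled on the permissions Prometheus typically needs to discover scrape targets in a namespace.

```yaml
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  # hypothetical name, chosen to avoid clashing with the role from the base
  name: prometheus-instance-sketch
rules:
# read-only access to the resources used for target discovery
- apiGroups: [""]
  resources:
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
# assumption: read access to config maps, no write verbs anywhere
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
```

Because the example above binds the role with a RoleBinding rather than a ClusterRoleBinding, the permissions apply only inside the RoleBinding's namespace, here default.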

ServiceMonitors Manifest

Finally, we need to create a ServiceMonitor. It instructs the operator to configure our Prometheus instance to scrape metrics from each replica of an application that exposes its metrics in the Prometheus format.

Since this step depends on the application and where it exposes its metrics, you will have to adapt the example below.

There are three things to note. First, the metadata.labels need to match the spec.serviceMonitorSelector.matchLabels from the Prometheus resource. Next, the spec.selector.matchLabels below need to match the metadata.labels set for the service of the application you are trying to monitor. Finally, the port in spec.endpoints needs to match the port name configured in the application's service.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: application-example
  labels:
    # this label instructs Prometheus to include this ServiceMonitor
    # based on the `spec.serviceMonitorSelector.matchLabels` above
    prometheus-instance: example-instance
spec:
  selector:
    matchLabels:
      # this selector is how the `ServiceMonitor` finds the
      # application's service
      app: application-example
  endpoints:
  # this tells Prometheus on what port the app exposes its metrics
  - port: web
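
For reference, a Service that the ServiceMonitor above would select could look like the sketch below. It is illustrative only: the port number 8080 and the pod label used in spec.selector are assumptions, and your application's service will differ. What matters is that metadata.labels carries app: application-example and that the port is named web.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: application-example
  labels:
    # matched by the ServiceMonitor's `spec.selector.matchLabels`
    app: application-example
spec:
  selector:
    # assumption: the application's pods carry this label
    app: application-example
  ports:
  # the port name is what the ServiceMonitor's `spec.endpoints` refers to
  - name: web
    # assumption: hypothetical port the application serves metrics on
    port: 8080
    targetPort: 8080
```

Unless a path is set on the endpoint, Prometheus scrapes /metrics on that port. A ServiceMonitor without a namespaceSelector only selects services in its own namespace, so in this sketch both resources are expected to live in the same namespace.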

Prometheus UI

To check whether Prometheus has started scraping your application's metrics, you can access the Prometheus UI using kubectl port-forward and use it to run a query.

kubectl port-forward prometheus-example-instance-0 9090

Advanced Configuration

The instructions above only cover the basics. The Prometheus operator has many more features and options. For more extensive documentation on Prometheus operator configuration, please refer to the official documentation.

End-to-End Walkthrough Example

For a more in-depth example, check out the end-to-end walkthrough article, Deploying Prometheus Operator via the Kubestack Catalog, on the Kubestack blog.