Version: 2.0 🆕

Runbooks

Introduction

A Runbook is a predefined action that Devtron runs to apply a change, such as resizing resources or hibernating a namespace. When you approve an AI recommendation, its linked runbook carries out the change with the required approvals.

Using AI-generated Runbooks

Whenever AI detects an optimization opportunity, it automatically generates a corresponding runbook to carry out the recommended change once approved. These runbooks are auto-linked from Notifications of AI Recommendations.

Figure 1: Runbook Listing

Example of Remediation Action

When AI recommends a cost optimization such as reducing memory allocation, the linked runbook carries out that change. For example, it can scale down a pod's memory limit from 6 Gi to 3 Gi across selected clusters.

Using Your Runbook

If you need a runbook that goes beyond what AI generates automatically, you can create your own in Devtron.

Info: Follow this section only if you want to create a runbook different from the one generated by AI.

  1. From the left navigation, go to AI Recommendations → Runbooks.

  2. Click Create Runbook.

Figure 2: Creating New Runbook

  3. Enter the following details:

    • Name - Example: update-resource-limits
    • Description - Example: Updates CPU and memory limits for workloads.
  4. Click Create Runbook to save.


Add Runbook Spec

You can edit your runbooks here. Each runbook follows a YAML structure that defines its metadata, tags, and executable steps.

Figure 3: Edit Runbook Spec

Use the YAML editor in Devtron to paste and modify the following structure.

apiVersion: devtron.ai/v1
kind: Runbook
metadata:
  name: <name of the runbook>
  description: <description of the runbook, specifying its purpose and usage>
  tags:
    - <tag1 specifying category or type>
    - <tag2 specifying purpose>
spec:
  steps:
    - name: <name of the step>
      action: <predefined action to be executed>
      type: <type of action, devtron-action, kubernetes-action, custom-action>
      parameters:
        param1: <value for parameter 1>
        param2: <value for parameter 2>
      onFailure:
        - nextStep: <name of the next step to execute on failure>
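
As an illustration, the template might be filled in as shown here. This is a sketch rather than an official sample: the name and description reuse the examples from the creation form above, the tags and the step name are hypothetical, and the step borrows its action and type from Example 1 later on this page. The nesting (tags under metadata, parameters under each step) is also an assumption.

```yaml
apiVersion: devtron.ai/v1
kind: Runbook
metadata:
  name: update-resource-limits
  description: Updates CPU and memory limits for workloads.
  tags:
    - cost-optimization            # hypothetical tag
    - rightsizing                  # hypothetical tag
spec:
  steps:
    - name: inspect-deployment     # hypothetical step name
      action: get-k8s-workload-controller-manifest
      type: kubectl-get
      parameters:
        clusterId: "{{.clusterId}}"
        group: "{{.group}}"
        version: "{{.version}}"
        kind: "Deployment"
        namespace: "{{.namespace}}"
        resourceName: "{{.resourceName}}"
```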

Each step in the runbook spec represents one operation that can interact with Kubernetes resources, Devtron apps, or external systems. Below are the most commonly used predefined actions supported by Devtron runbooks.

Example 1: Get Deployment Manifest

Retrieves the manifest of a specified deployment in a Kubernetes cluster.

spec:
  steps:
    - name: <name of the step>
      action: get-k8s-workload-controller-manifest
      type: kubectl-get
      parameters:
        clusterId: "{{.clusterId}}"
        group: "{{.group}}"
        version: "{{.version}}"
        kind: "Deployment"
        namespace: "{{.namespace}}"
        resourceName: "{{.resourceName}}"

When to use

To inspect the configuration of an existing deployment before applying any changes.

Example 2: Update Resource Spec in Deployment Manifest

Updates the CPU and memory requests or limits for a container inside a Kubernetes workload.

spec:
  steps:
    - name: <name of the step>
      action: update-k8s-workload-resource-spec
      type: kubectl-patch
      parameters:
        clusterId: "{{.clusterId}}"
        group: "{{.group}}"
        version: "{{.version}}"
        kind: "Pod"
        namespace: "{{.namespace}}"
        resourceName: "{{.resourceName}}"
        patch:
          spec:
            container:
              name: "{{.containerName}}"
              resources:
                requests:
                  cpu: "{{.newCpuRequestValue}}"
                  memory: "{{.newMemoryRequestValue}}"
                limits:
                  cpu: "{{.newCpuLimitValue}}"
                  memory: "{{.newMemoryLimitValue}}"

When to use

To rightsize workload resource consumption and optimize costs.

Example 3: Update Resource Spec in Devtron Apps Config

Applies resource specification updates within Devtron-managed application configurations.

spec:
  steps:
    - name: <name of the step>
      action: update-resource-spec-devtron-apps-config
      type: devtron-app-patch
      parameters:
        clusterId: "{{.clusterId}}"
        group: "{{.group}}"
        version: "{{.version}}"
        kind: "Pod"
        namespace: "{{.namespace}}"
        resourceName: "{{.resourceName}}"
        patch:
          spec:
            container:
              name: "{{.containerName}}"
              resources:
                requests:
                  cpu: "{{.newCpuRequestValue}}"
                  memory: "{{.newMemoryRequestValue}}"
                limits:
                  cpu: "{{.newCpuLimitValue}}"
                  memory: "{{.newMemoryLimitValue}}"

When to use

To modify resource values for Devtron-managed apps directly through the configuration interface.

Example 4: Update Resource Spec in Helm Chart Values

Modifies resource settings defined within Helm chart values YAML files.

spec:
  steps:
    - name: <name of the step>
      action: update-resource-spec-helm-chart-values-yaml
      type: helm-chart-patch
      parameters:
        clusterId: "{{.clusterId}}"
        group: "{{.group}}"
        version: "{{.version}}"
        kind: "Pod"
        namespace: "{{.namespace}}"
        resourceName: "{{.resourceName}}"
        patch:
          spec:
            container:
              name: "{{.containerName}}"
              resources:
                requests:
                  cpu: "{{.newCpuRequestValue}}"
                  memory: "{{.newMemoryRequestValue}}"
                limits:
                  cpu: "{{.newCpuLimitValue}}"
                  memory: "{{.newMemoryLimitValue}}"

When to use

To synchronize Helm chart values with runtime resource adjustments.

Example 5: Webhook to Any Service

Sends a webhook to an external service for integrations such as Slack notifications, monitoring tools, or CI/CD triggers.

spec:
  steps:
    - name: <name of the step>
      action: webhook
      type: devtron-action
      parameters:
        url: <url to which the webhook needs to be sent>
        headers: <headers to be included in the webhook>
        httpMethod: <HTTP method to be used (GET, POST, etc.)>
        body: <body of the webhook>

When to use

To notify other systems or trigger automated workflows upon completion of a Devtron runbook.
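
The actions above can also be combined in a single runbook. The sketch below inspects a workload, applies the 6 Gi to 3 Gi memory-limit reduction mentioned earlier on this page, and uses onFailure to jump to a webhook step if either operation fails. Several details are assumptions not confirmed by this page: that steps run in the order listed, that nextStep refers to a step by its name, that literal values such as "3Gi" are accepted in place of template placeholders, that headers takes a key-value map, and that body takes a string. The step names and the URL are hypothetical.

```yaml
spec:
  steps:
    - name: inspect-deployment          # hypothetical step name
      action: get-k8s-workload-controller-manifest
      type: kubectl-get
      parameters:
        clusterId: "{{.clusterId}}"
        group: "{{.group}}"
        version: "{{.version}}"
        kind: "Deployment"
        namespace: "{{.namespace}}"
        resourceName: "{{.resourceName}}"
      onFailure:
        - nextStep: notify-team         # skip the patch if the manifest cannot be read
    - name: reduce-memory-limit         # hypothetical step name
      action: update-k8s-workload-resource-spec
      type: kubectl-patch
      parameters:
        clusterId: "{{.clusterId}}"
        group: "{{.group}}"
        version: "{{.version}}"
        kind: "Pod"
        namespace: "{{.namespace}}"
        resourceName: "{{.resourceName}}"
        patch:
          spec:
            container:
              name: "{{.containerName}}"
              resources:
                requests:
                  cpu: "{{.newCpuRequestValue}}"
                  memory: "{{.newMemoryRequestValue}}"
                limits:
                  cpu: "{{.newCpuLimitValue}}"
                  memory: "3Gi"         # assumed literal; reduced from 6Gi
      onFailure:
        - nextStep: notify-team
    - name: notify-team                 # hypothetical step name
      action: webhook
      type: devtron-action
      parameters:
        url: "https://example.com/hooks/runbook"   # placeholder URL
        headers:                        # assumed to be a key-value map
          Content-Type: "application/json"
        httpMethod: "POST"
        body: '{"text": "Devtron runbook update-resource-limits has run"}'
```

Routing both failure paths to the same notification step keeps the runbook short. If the assumption about sequential execution holds, the webhook also fires after a successful patch, so the receiving service gets one message per run either way.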


Approval Types

Before execution, every AI-generated runbook requires an approval decision. You can approve or reject its execution for specific clusters and different durations.

Figure 4: Approve or Reject Runbook

When you take an action, Devtron applies the following logic:

  • If you approve or reject a runbook, the decision auto-applies to all the recommendations linked to that runbook across the selected clusters.
  • If you approve or reject an individual recommendation, the decision applies only to the specific cluster where that recommendation originated.

Approve Options

| Option | Behavior | Example Use Case |
|---|---|---|
| Forever | All future runs of this runbook are auto-approved indefinitely. | For dev or sandbox clusters where downtime or failed runs are acceptable and you want continuous savings. |
| Till date & time | Auto-approves until a specific expiry date and time. | During a maintenance window or before a critical demo, so changes are applied automatically until that period ends. |
| For duration | Auto-approves temporarily for a set number of hours. | For short tests or limited-time fixes, such as approving remediation for the next few hours. |

Reject Options

| Option | Behavior | Example Use Case |
|---|---|---|
| Forever | Blocks all future runs of this runbook permanently. | For production clusters where any automated remediation is risky or unwanted. |
| Till date & time | Rejects runs until a specific expiry date and time. | When you want the cluster to stay stable (e.g., during a product demo or release). |
| For duration | Rejects runs temporarily for a few hours. | To pause remediation during high-traffic periods or while verifying manual changes. |

What happens when the approval or rejection period expires?

When an approval or rejection period ends, the runbook status resets to Action Pending, and you must approve or reject the runbook again.


Audit Logs

Every Runbook logs:

  • Created / Updated / Approved / Rejected actions
  • User, timestamp, and resource
  • Full JSON payload for traceability

Figure 5: Audit Log

You can access this under AI Recommendations → Runbooks → Audit Logs.