Version: 2.0 🆕

Runbooks

Introduction

A Runbook is a predefined action that Devtron runs to apply a change, such as resizing resources or hibernating a namespace. When you approve an AI recommendation, its linked runbook carries out the change with the required approvals.

Using AI-generated Runbooks

Whenever AI detects an optimization opportunity, it automatically generates a corresponding runbook to carry out the recommended change once approved. These runbooks are auto-linked from Notifications of AI Recommendations.

Figure 1: Runbook Listing

Example of Remediation Action

When AI recommends a cost optimization such as reducing memory allocation, the linked runbook carries out that change.
For example, it can scale down a pod's memory limit from 6 Gi to 3 Gi across selected clusters.
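
As a rough sketch, a step for this memory reduction might look like the following. It reuses the `update-k8s-workload-resource-spec` action documented under Add Runbook Spec. The step name, the literal `3Gi` value, and the assumption that a patch may list only the field being changed are illustrative and not confirmed behavior.

```yaml
# Illustrative sketch only. The step name and literal values are assumptions.
spec:
  steps:
    - name: reduce-memory-limit          # hypothetical step name
      action: update-k8s-workload-resource-spec
      type: kubectl-patch
      parameters:
        clusterId: "{{.clusterId}}"
        group: "{{.group}}"
        version: "{{.version}}"
        kind: "Pod"
        namespace: "{{.namespace}}"
        resourceName: "{{.resourceName}}"
        patch:
          spec:
            container:
              name: "{{.containerName}}"
              resources:
                limits:
                  memory: "3Gi"          # reduced from 6Gi in this example
```

In an AI-generated runbook the value would more likely arrive through a template variable such as `{{.newMemoryLimitValue}}` than as a hard-coded quantity.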

Next Steps

Verify/Edit Runbook Spec
Approve/Reject Runbook

Using Your Runbook

If you need a runbook that differs from the ones AI generates automatically, you can create your own in Devtron.

info

Follow this section only if you want to create a runbook different from the one generated by AI.

From the left navigation, go to AI Recommendations → Runbooks.
Click Create Runbook.

Figure 2: Creating New Runbook

Enter the following details:
- Name - Example: update-resource-limits
- Description - Example: Updates CPU and memory limits for workloads.
Click Create Runbook to save.
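
For orientation, the name and description entered above presumably correspond to the `metadata` fields of the runbook spec described under Add Runbook Spec. The sketch below shows that assumed mapping. The tag value is hypothetical, and the spec Devtron actually generates may differ.

```yaml
# Assumed mapping of the form fields to the runbook metadata; verify in the YAML editor.
apiVersion: devtron.ai/v1
kind: Runbook
metadata:
  name: update-resource-limits
  description: Updates CPU and memory limits for workloads.
  tags:
    - cost-optimization   # hypothetical tag
```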

Next Steps

Verify/Edit Runbook Spec
Approve/Reject Runbook

Add Runbook Spec

You can edit your runbooks here. Each runbook follows a YAML structure that defines its metadata, tags, and executable steps.

Figure 3: Edit Runbook Spec

Paste the following structure into the YAML editor in Devtron and modify it as needed.

apiVersion: devtron.ai/v1
kind: Runbook
metadata:
  name: <name of the runbook>
  description: <description of the runbook, specifying its purpose and usage>
  tags:
    - <tag1 specifying category or type>
    - <tag2 specifying purpose>
spec:
  steps:
    - name: <name of the step>
      action: <predefined action to be executed>
      type: <type of action: devtron-action, kubernetes-action, or custom-action>
      parameters:
        param1: <value for parameter 1>
        param2: <value for parameter 2>
      onFailure:
        - nextStep: <name of the next step to execute on failure>
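
To show how the pieces of this template could fit together, here is a minimal sketch of a filled-in runbook with three steps, where a failed patch hands off to a notification step through `onFailure`. The runbook name, step names, tags, and webhook URL are hypothetical. How Devtron orders steps, and whether a step referenced by `onFailure` is skipped on success, is not stated here, so treat the control flow as an assumption to verify.

```yaml
# Illustrative sketch only. Names, tags, and the URL are placeholders.
apiVersion: devtron.ai/v1
kind: Runbook
metadata:
  name: rightsize-with-alert             # hypothetical runbook name
  description: Patches workload resources and sends a webhook if the patch fails.
  tags:
    - cost-optimization                  # hypothetical tag
    - rightsizing                        # hypothetical tag
spec:
  steps:
    - name: inspect-workload
      action: get-k8s-workload-controller-manifest
      type: kubectl-get
      parameters:
        clusterId: "{{.clusterId}}"
        group: "{{.group}}"
        version: "{{.version}}"
        kind: "Deployment"
        namespace: "{{.namespace}}"
        resourceName: "{{.resourceName}}"
    - name: patch-resources
      action: update-k8s-workload-resource-spec
      type: kubectl-patch
      parameters:
        clusterId: "{{.clusterId}}"
        group: "{{.group}}"
        version: "{{.version}}"
        kind: "Pod"
        namespace: "{{.namespace}}"
        resourceName: "{{.resourceName}}"
        patch:
          spec:
            container:
              name: "{{.containerName}}"
              resources:
                limits:
                  memory: "{{.newMemoryLimitValue}}"
      onFailure:
        - nextStep: notify-failure       # must match the name of another step
    - name: notify-failure
      action: webhook
      type: devtron-action
      parameters:
        url: "https://hooks.example.com/runbook-failed"   # placeholder URL
        httpMethod: "POST"
        body: '{"text": "Runbook step patch-resources failed"}'
```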

Each step in the runbook spec represents one operation that can interact with Kubernetes resources, Devtron apps, or external systems. Below are the most commonly used predefined actions supported by Devtron runbooks.

Example 1: Get Deployment Manifest

Retrieves the manifest of a specified deployment in a Kubernetes cluster.

spec:
  steps:
    - name: <name of the step>
      action: get-k8s-workload-controller-manifest
      type: kubectl-get
      parameters:
        clusterId: "{{.clusterId}}"
        group: "{{.group}}"
        version: "{{.version}}"
        kind: "Deployment"
        namespace: "{{.namespace}}"
        resourceName: "{{.resourceName}}"

When to use

To inspect the configuration of an existing deployment before applying any changes.

Example 2: Update Resource Spec in Deployment Manifest

Updates the CPU and memory requests or limits for a container inside a Kubernetes workload.

spec:
  steps:
    - name: <name of the step>
      action: update-k8s-workload-resource-spec
      type: kubectl-patch
      parameters:
        clusterId: "{{.clusterId}}"
        group: "{{.group}}"
        version: "{{.version}}"
        kind: "Pod"
        namespace: "{{.namespace}}"
        resourceName: "{{.resourceName}}"
        patch:
          spec:
            container:
              name: "{{.containerName}}"
              resources:
                requests:
                  cpu: "{{.newCpuRequestValue}}"
                  memory: "{{.newMemoryRequestValue}}"
                limits:
                  cpu: "{{.newCpuLimitValue}}"
                  memory: "{{.newMemoryLimitValue}}"

When to use

To rightsize workload resource consumption and optimize costs.

Example 3: Update Resource Spec in Devtron Apps Config

Applies resource specification updates within Devtron-managed application configurations.

spec:
  steps:
    - name: <name of the step>
      action: update-resource-spec-devtron-apps-config
      type: devtron-app-patch
      parameters:
        clusterId: "{{.clusterId}}"
        group: "{{.group}}"
        version: "{{.version}}"
        kind: "Pod"
        namespace: "{{.namespace}}"
        resourceName: "{{.resourceName}}"
        patch:
          spec:
            container:
              name: "{{.containerName}}"
              resources:
                requests:
                  cpu: "{{.newCpuRequestValue}}"
                  memory: "{{.newMemoryRequestValue}}"
                limits:
                  cpu: "{{.newCpuLimitValue}}"
                  memory: "{{.newMemoryLimitValue}}"

When to use

To modify resource values for Devtron-managed apps directly through the configuration interface.

Example 4: Update Resource Spec in Helm Chart Values

Modifies resource settings defined within Helm chart values YAML files.

spec:
  steps:
    - name: <name of the step>
      action: update-resource-spec-helm-chart-values-yaml
      type: helm-chart-patch
      parameters:
        clusterId: "{{.clusterId}}"
        group: "{{.group}}"
        version: "{{.version}}"
        kind: "Pod"
        namespace: "{{.namespace}}"
        resourceName: "{{.resourceName}}"
        patch:
          spec:
            container:
              name: "{{.containerName}}"
              resources:
                requests:
                  cpu: "{{.newCpuRequestValue}}"
                  memory: "{{.newMemoryRequestValue}}"
                limits:
                  cpu: "{{.newCpuLimitValue}}"
                  memory: "{{.newMemoryLimitValue}}"

When to use

To synchronize Helm chart values with runtime resource adjustments.

Example 5: Webhook to Any Service

Sends a webhook to an external service for integrations such as Slack notifications, monitoring tools, or CI/CD triggers.

spec:
  steps:
    - name: <name of the step>
      action: webhook
      type: devtron-action
      parameters:
        url: <url to which the webhook is sent>
        headers: <headers to include in the webhook request>
        httpMethod: <HTTP method to use, such as GET or POST>
        body: <body of the webhook request>

When to use

To notify other systems or trigger automated workflows upon completion of a Devtron runbook.
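
A filled-in version of the webhook step might look like the sketch below. The URL is a placeholder, and several details are assumptions rather than documented behavior: that `headers` accepts a key-value mapping, that `body` is passed as a JSON string, and that template variables such as `{{.resourceName}}` are expanded inside the body.

```yaml
# Illustrative sketch only. The URL, header format, and body templating are assumptions.
spec:
  steps:
    - name: notify-on-completion         # hypothetical step name
      action: webhook
      type: devtron-action
      parameters:
        url: "https://hooks.example.com/services/placeholder"
        headers:
          Content-Type: "application/json"
        httpMethod: "POST"
        body: '{"text": "Runbook finished for {{.resourceName}} in {{.namespace}}"}'
```

Keeping the body in single quotes lets the inner JSON use double quotes without escaping.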

Approval Types

Before execution, every AI-generated runbook requires an approval decision. You can approve or reject its execution for specific clusters and for a chosen duration.

Figure 4: Approve or Reject Runbook

When you take an action, Devtron applies the following logic:

If you approve or reject a runbook, the decision automatically applies to all recommendations linked to that runbook across the selected clusters.
If you approve or reject an individual recommendation, the decision applies only to the specific cluster where the recommendation originated.

Approve Options

Option	Behavior	Example Use Case
Forever	All future runs of this runbook are auto-approved indefinitely.	For dev or sandbox clusters where downtime or failed runs are acceptable and you want continuous savings.
Till date & time	Auto-approves until a specific expiry date and time.	During a maintenance window or before a critical demo, so changes are applied automatically until that period ends.
For duration	Auto-approves temporarily for a set number of hours.	For short tests or limited-time fixes, such as approving remediation for the next few hours.

Reject Options

Option	Behavior	Example Use Case
Forever	Blocks all future runs of this runbook permanently.	For production clusters where any automated remediation is risky or unwanted.
Till date & time	Rejects runs until a specific expiry date and time.	When you want the cluster to stay stable (e.g., during a product demo or release).
For duration	Rejects runs temporarily for a few hours.	To pause remediation during high-traffic periods or while verifying manual changes.

What happens when the approval or rejection period expires?

When an approval or rejection period ends, the runbook status resets to Action Pending, and you must approve or reject it again.

Audit Logs

Every Runbook logs:

Created / Updated / Approved / Rejected actions
User, timestamp, and resource
Full JSON payload for traceability

Figure 5: Audit Log

You can access this under AI Recommendations → Runbooks → Audit Logs.
