Auto-Rollback

Auto-rollback reverts a deployment when a CloudWatch alarm fires, that is, when the alarm enters the ALARM state. This provides a safety net for production deployments: if your health metrics degrade, DevRamps rolls back to the last known good revision without manual intervention.

How It Works

  1. You configure a CloudWatch alarm in your AWS account that monitors your application's health (e.g., error rates, latency, CPU utilization).
  2. You reference the alarm name in your pipeline stage configuration.
  3. During bootstrap, DevRamps sets up an EventBridge rule in your account to forward alarm state changes.
  4. When the alarm fires:
    • During a deployment: DevRamps cancels the current deployment and rolls back to the last successful revision.
    • After a deployment: DevRamps blocks new deployments to the stage until the alarm clears.
  5. When the alarm clears (returns to OK): The deployment blocker is removed and deployments can resume.
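To make step 3 more concrete, here is a rough sketch of what an EventBridge rule matching alarm state changes can look like in Terraform. This is an illustration only: DevRamps creates its own rule during bootstrap, so you do not need to define one, and the rule name below is hypothetical. The actual rule, pattern, and target that DevRamps provisions are not described on this page and may differ.

```hcl
# Illustrative only. DevRamps sets up its own rule during bootstrap;
# the resource and rule names here are hypothetical.
resource "aws_cloudwatch_event_rule" "alarm_state_change" {
  name        = "example-alarm-state-change"
  description = "Matches ALARM and OK transitions for a single CloudWatch alarm"

  # CloudWatch publishes an event with this source and detail-type
  # every time an alarm changes state.
  event_pattern = jsonencode({
    source        = ["aws.cloudwatch"]
    "detail-type" = ["CloudWatch Alarm State Change"]
    detail = {
      alarmName = ["my-service-health-alarm"]
      state = {
        value = ["ALARM", "OK"]
      }
    }
  })
}
```

Matching both ALARM and OK corresponds to steps 4 and 5 above: one transition triggers the rollback or blocker, the other clears it.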

Configuration

Add auto_rollback_alarm_name to any stage:

stages:
  - name: production
    account_id: "222222222222"
    region: us-east-1
    auto_rollback_alarm_name: my-service-health-alarm
    vars:
      env: production

The alarm must exist in the same AWS account and region as the stage.
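If you manage the alarm with Terraform, one way to guard against creating it in the wrong place is to pin the AWS provider to the stage's account and region. The snippet below is a minimal sketch that assumes the alarm lives in its own Terraform configuration and reuses the example values from the stage above.

```hcl
# Minimal sketch: pin the provider to the same account and region as the stage.
provider "aws" {
  # Must match the stage's `region`.
  region = "us-east-1"

  # Must match the stage's `account_id`. Terraform refuses to run
  # if the active credentials belong to any other account.
  allowed_account_ids = ["222222222222"]
}
```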

Alarm Behavior

Alarm fires during deployment (stage is In Progress)

DevRamps:

  1. Cancels the current deployment.
  2. Finds the last successfully deployed revision.
  3. Starts a rollback deployment with that revision.
  4. Skips infrastructure synthesis during rollback (to avoid unnecessary Terraform changes).
  5. Blocks the stage and the next stage from receiving new deployments. The next stage is blocked too because, once the current stage has rolled back, the next stage may be running a newer (potentially broken) revision. Blocking it prevents further inconsistency between the two stages.

Alarm fires after deployment (stage is Succeeded)

DevRamps:

  1. Adds a deployment blocker to the stage.
  2. Queues new pushes to this stage without executing them.
  3. Removes the blocker when the alarm clears, so queued deployments can proceed.

Alarm fires during rollback (stage is Rolling Back)

No action is taken -- the rollback is already in progress.

Alarm clears (returns to OK)

DevRamps removes the deployment blocker, allowing deployments to resume.

After a Rollback

After an auto-rollback completes, the stage is blocked from automatic promotions. This prevents the problematic code from being immediately redeployed. To resume normal operations:

  1. Investigate and fix the issue that caused the alarm.
  2. Manually unblock the stage in the dashboard.
  3. Push the fix -- the new deployment will proceed through the pipeline normally.

Setting Up CloudWatch Alarms

To use auto-rollback, you need a CloudWatch alarm in your target account. Here's an example Terraform configuration:

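resource "aws_cloudwatch_metric_alarm" "service_health" {
  alarm_name          = "my-service-health-alarm"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  # Count of 5xx responses returned by the targets behind the load balancer.
  metric_name         = "HTTPCode_Target_5XX_Count"
  namespace           = "AWS/ApplicationELB"
  period              = 60
  statistic           = "Sum"
  threshold           = 10
  # This metric is only reported when it is non-zero, so treat missing data as healthy.
  treat_missing_data  = "notBreaching"

  dimensions = {
    LoadBalancer = aws_lb.main.arn_suffix
  }

  alarm_actions = [] # DevRamps handles the alarm via EventBridge
}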

The alarm name in your Terraform must match the auto_rollback_alarm_name in your pipeline YAML. You can configure one alarm per stage.

Choosing alarm metrics

Good metrics for auto-rollback alarms include:

  • 5xx error rate — Catches application errors introduced by a bad deployment.
  • P99 latency — Detects performance regressions.
  • Target group healthy host count — Catches containers failing to start.
  • Custom application metrics — Any business-critical metric your application emits.
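The first three metrics in this list could be expressed as alarms along the following lines. Treat this as a sketch, not a recommended configuration: the alarm names, thresholds, and the `aws_lb_target_group.main` resource are assumptions made for illustration, and `aws_lb.main` is the load balancer from the earlier example. Because each stage references a single alarm, you would pick one of these per stage.

```hcl
# 5xx error rate: percentage of requests that ended in a target 5xx,
# computed with metric math. The 5% threshold is illustrative.
resource "aws_cloudwatch_metric_alarm" "error_rate" {
  alarm_name          = "my-service-5xx-rate-alarm" # hypothetical name
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  threshold           = 5
  treat_missing_data  = "notBreaching" # no errors reported means no data

  metric_query {
    id          = "rate"
    expression  = "100 * errors / requests"
    label       = "5xx error rate (%)"
    return_data = true
  }

  metric_query {
    id = "errors"
    metric {
      metric_name = "HTTPCode_Target_5XX_Count"
      namespace   = "AWS/ApplicationELB"
      period      = 60
      stat        = "Sum"
      dimensions  = { LoadBalancer = aws_lb.main.arn_suffix }
    }
  }

  metric_query {
    id = "requests"
    metric {
      metric_name = "RequestCount"
      namespace   = "AWS/ApplicationELB"
      period      = 60
      stat        = "Sum"
      dimensions  = { LoadBalancer = aws_lb.main.arn_suffix }
    }
  }
}

# P99 latency: 99th-percentile target response time above 1 second
# for two consecutive minutes. The threshold is illustrative.
resource "aws_cloudwatch_metric_alarm" "p99_latency" {
  alarm_name          = "my-service-p99-latency-alarm" # hypothetical name
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "TargetResponseTime"
  namespace           = "AWS/ApplicationELB"
  period              = 60
  extended_statistic  = "p99"
  threshold           = 1 # seconds

  dimensions = { LoadBalancer = aws_lb.main.arn_suffix }
}

# Healthy host count: fewer than one healthy target for two consecutive minutes.
resource "aws_cloudwatch_metric_alarm" "healthy_hosts" {
  alarm_name          = "my-service-healthy-hosts-alarm" # hypothetical name
  comparison_operator = "LessThanThreshold"
  evaluation_periods  = 2
  metric_name         = "HealthyHostCount"
  namespace           = "AWS/ApplicationELB"
  period              = 60
  statistic           = "Minimum"
  threshold           = 1

  dimensions = {
    LoadBalancer = aws_lb.main.arn_suffix
    TargetGroup  = aws_lb_target_group.main.arn_suffix # assumed resource
  }
}
```

A rate-based alarm is less sensitive to traffic volume than a raw count, which can matter when the same threshold has to cover both quiet and busy periods.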

Limitations

  • Auto-rollback can only roll back to a previous revision that has been successfully deployed. If the stage has never had a successful deployment, rollback isn't possible.
  • Infrastructure synthesis (Terraform) is skipped during auto-rollback to keep the rollback fast. If you need to revert infrastructure changes, use a manual rollback.