CI/CD Phased Rollout
Introduction
A phased rollout (also known as incremental deployment or progressive delivery) is a CI/CD deployment strategy that gradually releases new versions of an application to subsets of users before deploying to the entire user base. Unlike a traditional all-at-once deployment, phased rollouts reduce risk by allowing teams to validate changes with a limited audience before affecting all users.
This approach is especially valuable when you're uncertain about how a new release might perform in production or when you want to gather user feedback on new features incrementally.
Why Use Phased Rollouts?
Phased rollouts offer several key benefits:
- Reduced Risk: By limiting the scope of deployment, potential issues affect fewer users
- Early Detection: Problems can be identified before they impact the entire user base
- Controlled Testing: Real-world usage data can be gathered from a subset of users
- Easy Rollback: If problems occur, only a portion of the infrastructure needs to be reverted
- Confidence Building: Teams can gain confidence in changes before full deployment
Common Phased Rollout Strategies
Let's explore the most popular phased rollout approaches:
1. Canary Deployments
Canary deployments (named after the "canary in a coal mine" concept) direct a small percentage of traffic to the new version while the majority continues to use the stable version.
Example Implementation with Kubernetes:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-service
spec:
  hosts:
  - my-service
  http:
  - route:
    - destination:
        host: my-service
        subset: v1
      weight: 95
    - destination:
        host: my-service
        subset: v2
      weight: 5
In this example, Istio (a service mesh) is used to direct 5% of traffic to the new version (v2) while keeping 95% on the stable version (v1).
2. Blue/Green with Phased Cutover
In this approach, you deploy the new version (green) alongside the existing version (blue), but instead of switching all traffic at once, you gradually shift users from blue to green.
Example Implementation with AWS:
# Using AWS CloudFormation with weighted DNS routing
Resources:
  DNSRecord:
    Type: "AWS::Route53::RecordSet"
    Properties:
      HostedZoneName: "example.com."
      Name: "service.example.com"
      Type: "A"
      SetIdentifier: "Primary"
      Weight: 90
      AliasTarget:
        HostedZoneId: !GetAtt BlueLoadBalancer.CanonicalHostedZoneID
        DNSName: !GetAtt BlueLoadBalancer.DNSName
  NewVersionDNSRecord:
    Type: "AWS::Route53::RecordSet"
    Properties:
      HostedZoneName: "example.com."
      Name: "service.example.com"
      Type: "A"
      SetIdentifier: "Secondary"
      Weight: 10
      AliasTarget:
        HostedZoneId: !GetAtt GreenLoadBalancer.CanonicalHostedZoneID
        DNSName: !GetAtt GreenLoadBalancer.DNSName
This AWS CloudFormation example sets up weighted DNS routing with 10% of traffic going to the new (green) environment.
3. Feature Flags / Feature Toggles
This strategy involves deploying code with new features turned off by default, then gradually enabling them for specific user segments.
// Example feature flag implementation
const betaUserIds = ['user-123', 'user-456']; // example IDs enrolled in the beta group

const userIsInBetaGroup = (userId) => {
  // Check if the user is in the beta test group
  return betaUserIds.includes(userId);
};

// In application code
if (featureFlags.newDashboard.isEnabledFor(currentUser) ||
    userIsInBetaGroup(currentUser.id)) {
  // Show the new dashboard
  renderNewDashboard();
} else {
  // Show the existing dashboard
  renderClassicDashboard();
}
Many companies offer feature flag services (LaunchDarkly, Split.io, etc.) that make it easy to control which users see which features.
Implementing a Phased Rollout Pipeline
Let's walk through how to set up a basic phased rollout pipeline:
Step 1: Define Your Deployment Stages
First, define the stages and thresholds for your phased rollout:
# Example configuration in a CI/CD pipeline
phases:
  - name: canary
    percentage: 5
    duration: 60m
    successCriteria:
      errorRate: < 0.1%
      latency99thPercentile: < 200ms
  - name: phase1
    percentage: 25
    duration: 120m
    successCriteria:
      errorRate: < 0.1%
      latency99thPercentile: < 200ms
  - name: phase2
    percentage: 50
    duration: 60m
    successCriteria:
      errorRate: < 0.1%
      latency99thPercentile: < 200ms
  - name: complete
    percentage: 100
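The promotion logic implied by this configuration can be sketched in a few lines. The evaluator below is a hypothetical helper (the `phases` array, field names, and return values are illustrative, mirroring the thresholds in the config above) that decides whether a pipeline should promote to the next phase, roll back, or finish:

```javascript
// Hypothetical phase evaluator mirroring the pipeline config above.
// Thresholds: error rate < 0.1%, 99th-percentile latency < 200 ms.
const phases = [
  { name: 'canary',   percentage: 5,   maxErrorRate: 0.1, maxLatencyP99Ms: 200 },
  { name: 'phase1',   percentage: 25,  maxErrorRate: 0.1, maxLatencyP99Ms: 200 },
  { name: 'phase2',   percentage: 50,  maxErrorRate: 0.1, maxLatencyP99Ms: 200 },
  { name: 'complete', percentage: 100 }
];

function evaluatePhase(phase, observed) {
  // The final phase has no success criteria: the rollout is done
  if (phase.maxErrorRate === undefined) return 'done';
  // Any breached threshold aborts the rollout
  if (observed.errorRate > phase.maxErrorRate ||
      observed.latencyP99Ms > phase.maxLatencyP99Ms) {
    return 'rollback';
  }
  // All criteria met: move on to the next phase
  return 'promote';
}
```

In a real pipeline, `observed` would come from your metrics backend at the end of each phase's evaluation window.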
Step 2: Set Up Monitoring and Metrics
Implement comprehensive monitoring to compare performance between versions:
// Example monitoring setup with Prometheus and Node.js
const express = require('express');
const client = require('prom-client');
const app = express();

// Create a Registry to register metrics to
const register = new client.Registry();

// Create metrics. Note: prom-client's startTimer() observes durations in
// seconds, so the metric is named and bucketed accordingly.
const httpRequestDurationSeconds = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['route', 'status_code', 'app_version'],
  buckets: [0.005, 0.05, 0.1, 0.25, 0.5, 1, 2.5]
});

// Register the metric
register.registerMetric(httpRequestDurationSeconds);

// Add middleware to measure request duration
app.use((req, res, next) => {
  const end = httpRequestDurationSeconds.startTimer();
  res.on('finish', () => {
    end({ route: req.path, status_code: res.statusCode, app_version: 'v2' });
  });
  next();
});

// Expose metrics endpoint for Prometheus scraping
app.get('/metrics', async (req, res) => {
  res.setHeader('Content-Type', register.contentType);
  res.send(await register.metrics());
});
Step 3: Implement Automated Rollback Triggers
Set up automated rollback if metrics indicate problems:
// Example rollback logic in a Node.js monitoring service
const axios = require('axios');

async function checkMetrics() {
  try {
    // Query the error rate for the new version
    const errorRateResponse = await axios.get('http://prometheus:9090/api/v1/query', {
      params: {
        query: 'sum(rate(http_requests_total{status=~"5..",app_version="v2"}[5m])) / sum(rate(http_requests_total{app_version="v2"}[5m])) * 100'
      }
    });
    const result = errorRateResponse.data.data.result;
    if (result.length === 0) return; // no samples for the new version yet
    const errorRate = parseFloat(result[0].value[1]);

    // If the error rate exceeds the threshold, trigger a rollback
    if (errorRate > 1.0) {
      console.log(`Error rate too high: ${errorRate}%. Triggering rollback...`);
      await axios.post('http://deployment-controller/rollback', {
        deployment: 'my-application',
        version: 'v2',
        reason: 'Error rate exceeded threshold'
      });
    }
  } catch (error) {
    console.error('Error monitoring metrics:', error);
  }
}

// Check metrics every minute
setInterval(checkMetrics, 60000);
Step 4: Create a Deployment Pipeline
Here's an example GitHub Actions workflow for a phased rollout:
name: Phased Deployment

on:
  push:
    branches: [ main ]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Build and push Docker image
        run: |
          docker build -t myapp:${{ github.sha }} .
          docker push myapp:${{ github.sha }}

  deploy-canary:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to canary (5%)
        run: |
          kubectl set image deployment/myapp-canary myapp=myapp:${{ github.sha }}
          kubectl scale deployment myapp-canary --replicas=1
      - name: Wait and evaluate
        run: |
          sleep 600 # Wait 10 minutes
          # Query the canary error rate; --data-urlencode handles the
          # special characters in the PromQL query safely
          ERROR_RATE=$(curl -s -G http://prometheus:9090/api/v1/query \
            --data-urlencode 'query=sum(rate(http_requests_total{status=~"5..",app_version="v2"}[5m])) / sum(rate(http_requests_total{app_version="v2"}[5m])) * 100' \
            | jq -r '.data.result[0].value[1]')
          if (( $(echo "$ERROR_RATE > 1.0" | bc -l) )); then
            echo "Error rate too high: $ERROR_RATE%. Aborting deployment."
            exit 1
          fi

  deploy-phase1:
    needs: deploy-canary
    runs-on: ubuntu-latest
    steps:
      - name: Scale to 25%
        run: |
          kubectl scale deployment/myapp-canary --replicas=5
          kubectl scale deployment/myapp-stable --replicas=15
      # Similar monitoring steps...

  deploy-phase2:
    needs: deploy-phase1
    runs-on: ubuntu-latest
    steps:
      - name: Scale to 50%
        run: |
          kubectl scale deployment/myapp-canary --replicas=10
          kubectl scale deployment/myapp-stable --replicas=10
      # Similar monitoring steps...

  complete-deployment:
    needs: deploy-phase2
    runs-on: ubuntu-latest
    steps:
      - name: Complete deployment (100%)
        run: |
          kubectl set image deployment/myapp-stable myapp=myapp:${{ github.sha }}
          kubectl scale deployment/myapp-canary --replicas=0
          kubectl scale deployment/myapp-stable --replicas=20
Real-World Example: Web Application Phased Rollout
Let's look at a practical example of how an e-commerce company might implement a phased rollout for a new checkout process:
1. Planning Phase
The team decides on the following rollout plan:
- Initially deploy to 5% of users
- Monitor key metrics: conversion rate, error rate, and checkout time
- If metrics remain stable, increase to 20%, then 50%, then 100%
- Implement automated rollback if critical metrics degrade
2. Implementation
First, they create a feature flag to control the visibility of the new checkout:
// In frontend code
import { initializeFeatureFlags } from './featureFlags';

const featureFlags = initializeFeatureFlags();

function renderCheckoutProcess() {
  // User ID-based percentage rollout
  if (featureFlags.isEnabled('new-checkout-process', {
    userId: currentUser.id,
    percentage: 5 // Starting with 5%
  })) {
    return <NewCheckoutProcess />;
  }
  return <CurrentCheckoutProcess />;
}
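It's worth sketching how an `isEnabled` call like this can honor the `percentage` option: the bucketing must be deterministic per user, so that a user enabled at 5% stays enabled when the rollout widens to 20%. A minimal sketch, assuming a simple hash-based scheme (the `bucketFor` and `isEnabled` helpers below are hypothetical, not a real feature-flag SDK):

```javascript
// Hypothetical percentage-rollout bucketing. Hashing the flag name plus
// user ID to a stable bucket in [0, 100) keeps each user in the same
// cohort across sessions, and raising the percentage only ever adds users.
function bucketFor(userId, flagName) {
  // FNV-1a hash over "flagName:userId"
  let hash = 2166136261;
  const input = `${flagName}:${userId}`;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 16777619);
  }
  return (hash >>> 0) % 100; // stable bucket in [0, 100)
}

function isEnabled(flagName, { userId, percentage }) {
  return bucketFor(userId, flagName) < percentage;
}
```

Because the bucket depends only on the user and the flag, any user inside the 5% cohort is necessarily inside the 20% cohort later, which keeps the experience consistent as the rollout expands.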
3. Monitoring Setup
They set up a dashboard that compares the key metrics between old and new versions:
// Tracking checkout success rates
// (checkoutStartTime is captured via performance.now() when checkout begins)
function trackCheckoutCompletion(isSuccessful, checkoutVersion) {
  analytics.track('checkout_completion', {
    successful: isSuccessful,
    duration: performance.now() - checkoutStartTime,
    version: checkoutVersion,
    // Other relevant data
  });
}

// Called when checkout completes or fails
trackCheckoutCompletion(
  orderProcessed,
  featureFlags.isEnabled('new-checkout-process') ? 'new' : 'current'
);
4. Gradual Expansion
After confirming the 5% rollout is successful, they update the percentage via their feature flag system:
// Update the percentage via API or configuration
featureFlagService.updateFlag('new-checkout-process', {
  percentage: 20, // Increase from 5% to 20%
  description: 'Phase 2 of new checkout rollout'
});
5. Handling Issues
If they detect problems, they can quickly disable the feature:
// Emergency rollback via API
featureFlagService.updateFlag('new-checkout-process', {
  enabled: false,
  description: 'Emergency rollback due to payment issues'
});

// Then notify the team
notificationService.alert('New checkout process rollback triggered due to increased payment failures');
Best Practices for Phased Rollouts
- Define Clear Metrics: Establish the KPIs that will determine success or failure
- Automate Everything: Make progression and rollback decisions automatic when possible
- User Segmentation: Consider rolling out to internal users before external customers
- Consistent Infrastructure: Ensure new and old versions run in similar environments
- Session Stickiness: Make sure users don't bounce between versions during a session
- Feature Toggles: Use feature flags to enable/disable features independent of deployment
- Progressive Timeframes: Start with longer evaluation periods, then shorten as confidence grows
- Comprehensive Logging: Ensure you can trace issues between versions
- Communications Plan: Keep stakeholders informed during the rollout process
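Session stickiness from the list above deserves a concrete sketch: the routing layer assigns a version to each new session (say, via a cookie) and then honors that assignment on every subsequent request. A minimal hypothetical helper (the cookie handling and version labels are illustrative):

```javascript
// Hypothetical sticky-session version chooser. A returning session keeps
// the version it was first assigned; only brand-new sessions are split
// according to the current canary percentage.
function chooseVersion(existingCookie, canaryPercentage) {
  // Returning session: honor the previously assigned version
  if (existingCookie === 'v1' || existingCookie === 'v2') return existingCookie;
  // New session: assign by the current traffic split
  return Math.random() * 100 < canaryPercentage ? 'v2' : 'v1';
}
```

The caller would set the chosen version back as a cookie so the next request takes the deterministic branch; in practice a load balancer or service mesh usually provides this via its own sticky-session or consistent-hashing feature.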
Common Challenges and Solutions
| Challenge | Solution |
|---|---|
| Database schema changes | Use schema migration tools that support backward compatibility |
| API versioning | Implement API versioning to support both old and new clients |
| Monitoring overhead | Use sampling to reduce the volume of metrics while maintaining accuracy |
| Rollback complexity | Create automated rollback scripts and test them before deployment |
| User session consistency | Implement sticky sessions to keep users on the same version |
| Multiple concurrent rollouts | Establish a clear process for prioritizing and scheduling rollouts |
Summary
Phased rollout strategies provide a safer approach to deploying applications by gradually introducing changes to a limited audience before extending to all users. The key benefits include reduced risk, early problem detection, and the ability to gather real user feedback.
By implementing canary deployments, blue/green with phased cutover, or feature flags, development teams can significantly reduce the impact of potential issues while gaining confidence in their changes.
Remember to:
- Define clear metrics for success
- Automate both progression and rollback
- Maintain consistent infrastructure
- Use feature flags for fine-grained control
- Ensure adequate monitoring and alerting
Exercises for Practice
- Basic Exercise: Set up a simple feature flag system in a sample application to toggle between two versions of a UI component.
- Intermediate Exercise: Implement a canary deployment for a sample microservice using Kubernetes and a service mesh like Istio.
- Advanced Exercise: Create a complete phased rollout pipeline with automatic progression and rollback based on performance metrics.
Additional Resources
- Books:
  - "Accelerate: The Science of Lean Software and DevOps" by Nicole Forsgren, Jez Humble, and Gene Kim
  - "Continuous Delivery" by Jez Humble and David Farley
- Tools:
  - Feature Flag Systems: LaunchDarkly, Split.io, Flagsmith
  - Traffic Management: Istio, Linkerd, AWS App Mesh
  - Observability: Prometheus, Grafana, New Relic, Datadog