CI/CD Shadow Deployment

Introduction

Shadow deployment is a powerful risk-mitigation strategy in the continuous integration and continuous deployment (CI/CD) pipeline. Unlike traditional deployment methods where new code immediately replaces existing functionality, shadow deployments run new code alongside the existing production code without exposing it to end users. This allows developers to evaluate how new code would perform in real production environments without risking user experience or system stability.

Think of shadow deployment as having a "ghost" version of your application running in parallel with your live version. The ghost receives the same inputs, but its outputs are only monitored, never returned to users.

Why Use Shadow Deployments?

Shadow deployments offer several advantages:

  1. Risk Reduction: Test new code against real production traffic without affecting users
  2. Performance Validation: Measure how new code performs against production workloads
  3. Bug Detection: Discover issues that might only appear in production environments
  4. Confidence Building: Gain confidence in changes before exposing them to users
  5. Feature Readiness: Determine if a new feature is ready for full deployment

How Shadow Deployments Work

Let's explore how shadow deployments function in a CI/CD pipeline:

  1. User requests are sent to the production environment as normal
  2. Each request is duplicated and sent to both the production service and the shadow service
  3. The production service processes the request and returns results to users
  4. The shadow service processes the identical request, but its results are not returned to users
  5. Results from both services are logged and compared for analysis
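
Before looking at real infrastructure, here's what that flow looks like in a minimal Node.js sketch. This is illustrative only: the hostnames and port are placeholders, and it forwards just the request path, not methods, headers, or bodies.

javascript
// Minimal duplicating proxy (hostnames/ports are placeholders; Node 18+ for built-in fetch)
const http = require("http");

const PRODUCTION_URL = "http://production-app:8080";
const SHADOW_URL = "http://shadow-app:8080";

http.createServer(async (req, res) => {
    try {
        // Steps 1-3: forward the request to production and return its response to the user
        const prodResponse = await fetch(PRODUCTION_URL + req.url);
        res.writeHead(prodResponse.status);
        res.end(await prodResponse.text());

        // Steps 4-5: duplicate the request to the shadow service; log its result, never serve it
        fetch(SHADOW_URL + req.url)
            .then(shadowResponse => {
                console.log(`shadow: ${shadowResponse.status}, production: ${prodResponse.status}`);
            })
            .catch(err => console.error("Shadow request failed:", err));
    } catch (err) {
        console.error("Production request failed:", err);
        res.writeHead(502);
        res.end();
    }
}).listen(80);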

Implementing Shadow Deployments

Let's look at how to implement shadow deployments in different scenarios:

Basic Infrastructure Setup

Here's a simple example using Nginx's mirror module to duplicate traffic:

nginx
http {
    upstream production_backend {
        server production-app:8080;
    }

    upstream shadow_backend {
        server shadow-app:8080;
    }

    server {
        listen 80;

        location / {
            # Send request to production
            proxy_pass http://production_backend;

            # Mirror request to shadow service
            mirror /shadow;
            mirror_request_body on;
        }

        # Handle mirrored requests; Nginx discards the mirror
        # subrequest's response automatically
        location = /shadow {
            internal;
            proxy_pass http://shadow_backend$request_uri;
            proxy_ignore_client_abort on;
            proxy_set_header X-Shadow-Request true;
        }
    }
}
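
Two things to note about this configuration: Nginx discards the mirror subrequest's response automatically, so nothing extra is needed to keep shadow output away from users. Also, the main request is not considered finished until its mirror subrequests complete, so a slow shadow backend can delay subsequent requests on the same keepalive connection; keep an eye on shadow latency.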

Cloud-Based Implementation

For a more robust solution in AWS, you might use AWS Lambda and API Gateway:

javascript
// AWS Lambda function to handle traffic mirroring
exports.handler = async (event) => {
    try {
        // Forward the original request to production
        const productionResponse = await forwardToProduction(event);

        // Asynchronously forward to shadow service without waiting for response
        forwardToShadow(event).catch(err => {
            console.error("Shadow service error:", err);
        });

        // Return only the production response to the user
        return productionResponse;
    } catch (error) {
        console.error("Error in request handler:", error);
        return {
            statusCode: 500,
            body: JSON.stringify({ error: "Internal Server Error" })
        };
    }
};

async function forwardToProduction(event) {
    // Implementation to forward request to production service
    // ...
}

async function forwardToShadow(event) {
    // Implementation to forward request to shadow service
    // ...
    // Log response metrics for comparison
}
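
One caveat with this fire-and-forget pattern: Lambda freezes the execution environment as soon as the handler returns, so the in-flight shadow request may never complete. Safer variants are to await the shadow call with a short timeout, or to publish the event to a queue (SQS, for example) and replay it against the shadow service from a separate function.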

Monitoring and Comparison

The key to effective shadow deployments is proper monitoring and comparison of results:

javascript
// Example code for comparing shadow and production responses
function compareResponses(productionResponse, shadowResponse) {
    const metrics = {
        statusCodeMatch: productionResponse.statusCode === shadowResponse.statusCode,
        responseTimeRatio: shadowResponse.duration / productionResponse.duration,
        payloadSizeRatio: shadowResponse.body.length / productionResponse.body.length,
        errors: []
    };

    // Check for specific response differences
    try {
        const prodData = JSON.parse(productionResponse.body);
        const shadowData = JSON.parse(shadowResponse.body);

        // Compare key fields in the response
        if (prodData.items?.length !== shadowData.items?.length) {
            metrics.errors.push("Item count mismatch");
        }

        // Add more specific comparisons as needed

    } catch (error) {
        metrics.errors.push(`Comparison error: ${error.message}`);
    }

    // Log metrics to your monitoring system
    logMetricsToMonitoringSystem(metrics);

    return metrics;
}
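
Naive payload diffs tend to produce false positives, because fields like timestamps and request IDs legitimately differ between the two services. A common refinement is to strip such volatile fields before comparing. A minimal sketch (the field names are hypothetical, and a real implementation would use an order-insensitive deep comparison rather than string equality):

javascript
// Volatile fields expected to differ between services (illustrative names)
const VOLATILE_FIELDS = ["timestamp", "requestId", "traceId"];

// Return a shallow copy with volatile fields removed
function normalize(data) {
    const copy = { ...data };
    for (const field of VOLATILE_FIELDS) {
        delete copy[field];
    }
    return copy;
}

// Compare the two payloads, ignoring volatile fields
function payloadsMatch(prodData, shadowData) {
    return JSON.stringify(normalize(prodData)) === JSON.stringify(normalize(shadowData));
}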

Real-World Example: E-commerce Product Recommendation

Let's consider a practical example of implementing shadow deployment for a new recommendation algorithm in an e-commerce application:

  1. Current System: Your e-commerce site uses Algorithm A to show product recommendations
  2. New Feature: You've developed Algorithm B that you believe will improve conversion rates

Instead of immediately replacing Algorithm A with Algorithm B, you implement a shadow deployment:

javascript
// Pseudo-code for handling product recommendations
async function getProductRecommendations(userId, productContext) {
    // Get recommendations from the current production algorithm
    const productionRecommendations = await algorithmA.getRecommendations(userId, productContext);

    try {
        // Asynchronously call the shadow algorithm
        algorithmB.getRecommendations(userId, productContext)
            .then(shadowRecommendations => {
                // Log both recommendations for comparison
                compareAndLogRecommendations(
                    userId,
                    productContext,
                    productionRecommendations,
                    shadowRecommendations
                );
            })
            .catch(error => {
                console.error("Shadow algorithm error:", error);
                // Log failure metrics
                logShadowFailure(userId, productContext, error);
            });
    } catch (error) {
        // Shadow errors should never affect the main flow
        console.error("Error setting up shadow call:", error);
    }

    // Return only the production recommendations to the user
    return productionRecommendations;
}

In this example, users continue to receive recommendations from Algorithm A, while Algorithm B runs in parallel. The system logs both sets of recommendations for comparison, allowing you to evaluate if Algorithm B is truly better before making it live.

Graduating from Shadow to Production

Once you've collected enough data and are confident that your shadow deployment is performing well, you can "graduate" it to production. This typically follows these steps:

  1. Data Analysis: Review performance metrics comparing shadow and production
  2. Fix Issues: Address any problems discovered during the shadow period
  3. Controlled Rollout: Use techniques like canary deployment or feature flags to gradually introduce the new code (a minimal sketch follows this list)
  4. Full Deployment: Replace the old code entirely once confident
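
As an illustration of step 3, here's a minimal sketch of a percentage-based rollout for the recommendation example above. The bucketing scheme and the rollout percentage are assumptions, not part of any particular feature-flag library:

javascript
// Percentage-based rollout sketch: a stable fraction of users gets Algorithm B
const crypto = require("crypto");

// Start small (e.g., 5% of users) and increase as confidence grows
let rolloutPercentage = 5;

// Hash the user ID so each user consistently lands in the same bucket (0-99)
function userBucket(userId) {
    const hash = crypto.createHash("sha256").update(String(userId)).digest();
    return hash.readUInt32BE(0) % 100;
}

async function getRecommendationsWithRollout(userId, productContext) {
    if (userBucket(userId) < rolloutPercentage) {
        return algorithmB.getRecommendations(userId, productContext);
    }
    return algorithmA.getRecommendations(userId, productContext);
}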

Challenges and Considerations

While shadow deployments are powerful, they come with challenges:

Resource Requirements

Shadow deployments essentially double the processing resources required for the shadowed components. Ensure your infrastructure can absorb the extra load.
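
If duplicating every request is too expensive, a common compromise is to mirror only a sample of traffic. A minimal sketch, reusing the forwardToShadow helper from earlier (the sampling rate is an arbitrary starting point):

javascript
// Mirror only a fraction of requests to the shadow service
const SHADOW_SAMPLE_RATE = 0.1; // 10% of traffic; tune to your capacity

function maybeForwardToShadow(event) {
    if (Math.random() < SHADOW_SAMPLE_RATE) {
        forwardToShadow(event).catch(err => {
            console.error("Shadow service error:", err);
        });
    }
}

Nginx users can get a similar effect at the proxy layer by combining the split_clients module with the mirror directive.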

Stateful Operations

For operations that change state (like database writes), you need to prevent the shadow service from making actual changes:

javascript
class ShadowDatabaseClient extends DatabaseClient {
    async write(data) {
        if (this.isShadowMode) {
            // In shadow mode, validate the write operation but don't actually perform it
            await this.validateWrite(data);

            // Log what would have happened
            this.logShadowWrite(data);

            // Return a mock response
            return { success: true, id: "shadow-" + Date.now() };
        } else {
            // Normal write for production
            return super.write(data);
        }
    }
}
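
An alternative to stubbing out writes is to point the shadow service at an isolated copy of the datastore, such as a replica restored from a recent backup. That lets write paths execute end to end, at the cost of extra infrastructure and of the copy's state drifting from production over time.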

External Service Calls

Be careful with external API calls that might have side effects:

javascript
async function processPayment(paymentDetails, isShadowMode) {
    if (isShadowMode) {
        // Don't actually charge the customer in shadow mode!
        // Instead, validate the payment details
        const validationResult = await validatePaymentDetails(paymentDetails);

        // Log what would have happened
        logShadowPaymentAttempt(paymentDetails, validationResult);

        return createMockPaymentResponse(validationResult);
    } else {
        // Real payment processing for production
        return paymentProcessor.charge(paymentDetails);
    }
}
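
Where the provider supports it, another option is to point the shadow service at the provider's sandbox environment, so the full request/response path is exercised without any real charges.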

Best Practices

To get the most out of shadow deployments:

  1. Start Small: Begin with non-critical components
  2. Monitor Thoroughly: Compare performance, errors, and business metrics
  3. Set Clear Criteria: Define what success looks like before starting
  4. Handle Side Effects: Be careful with operations that affect external systems
  5. Automate Analysis: Build tools to automatically compare and report differences
  6. Limit Duration: Don't run shadow deployments indefinitely; set a timeframe

Summary

Shadow deployments offer a powerful approach to mitigate risks when deploying new code. By running new code alongside production code without exposing it to users, you can:

  • Test in real production environments
  • Gain confidence in your changes
  • Detect potential issues before they affect users
  • Make data-driven decisions about when to deploy

This technique is particularly valuable for critical systems where downtime or bugs could have significant consequences.

Further Learning

To build on this knowledge:

  • Exercise 1: Design a shadow deployment architecture for a microservice you're familiar with
  • Exercise 2: Create a comparison framework to evaluate metrics between production and shadow systems
  • Exercise 3: Implement a simple shadow deployment for a non-critical component in your application

Related concepts worth exploring:

  • Canary Deployments
  • A/B Testing
  • Feature Flags
  • Blue/Green Deployments
  • Traffic Mirroring

