Ansible Tower Clustering

Introduction

Ansible Tower (or its open-source counterpart AWX) is a powerful web-based solution that adds a user interface, role-based access control, job scheduling, and more to Ansible automation. By default, Ansible Tower runs as a standalone instance, which might be sufficient for small environments. However, as your automation needs grow, you might need more capacity and redundancy.

This is where Ansible Tower Clustering comes into play. Clustering allows you to distribute Ansible automation workloads across multiple nodes, providing high availability, fault tolerance, and increased capacity for your automation platform.

What is Ansible Tower Clustering?

Ansible Tower Clustering is the process of connecting multiple Tower instances together to function as a single logical unit. In a Tower cluster:

All nodes share a common database
All nodes share a common message queue (RabbitMQ)
Job workloads are distributed across all available nodes
If one node fails, others continue to process jobs

Benefits of Clustering

High Availability: If one Tower node fails, the others continue to operate, preventing disruption to your automation workflows.
Increased Capacity: Distribute automation jobs across multiple nodes to handle more concurrent tasks.
Horizontal Scalability: Add more nodes to the cluster as your automation needs grow.
Load Balancing: Evenly distribute user interface traffic and job processing across available nodes.
Redundancy: Eliminate single points of failure in your automation infrastructure.

Prerequisites for Tower Clustering

Before setting up an Ansible Tower cluster, ensure you have:

A minimum of three Tower nodes (recommended for production)
Shared PostgreSQL database (external to Tower nodes)
Shared message queue (RabbitMQ)
Network connectivity between all nodes
Sufficient resources (CPU, memory) on each node
Valid Tower licenses for all nodes

Setting Up an Ansible Tower Cluster

Let's walk through the process of setting up a basic Ansible Tower cluster.

Step 1: Prepare Your Inventory File

First, create an inventory file that defines your Tower nodes, database, and other components:

[tower]
tower1.example.com
tower2.example.com
tower3.example.com

[database]
db.example.com

[all:vars]
admin_password='password'
pg_host='db.example.com'
pg_port='5432'
pg_database='tower'
pg_username='tower'
pg_password='dbpassword'
rabbitmq_port=5672
rabbitmq_vhost=tower
rabbitmq_username='tower'
rabbitmq_password='rabbitpassword'
rabbitmq_cookie=cookiemonster

# Isolated Tower nodes automatically generate an auth token to authenticate
# with the cluster nodes. This token can be re-used to add more nodes later.
# Set to a blank string if you don't want the installer to create one.
# tower_isolated_key=''

Step 2: Run the Tower Setup Script

Execute the Tower setup script with your inventory file:

./setup.sh -i inventory_file

The installation process will:

Install Tower on all nodes
Configure the shared database connection
Set up the shared message queue
Establish cluster communication

Step 3: Verify Cluster Status

After installation, verify your cluster by logging into the Tower web interface. Navigate to Settings → Instance Groups to see all your cluster nodes:

Instance Name    | Capacity | Used Capacity | State
----------------|----------|--------------|--------
tower1.example.com | 100      | 0            | running
tower2.example.com | 100      | 0            | running  
tower3.example.com | 100      | 0            | running

Understanding Instance Groups

In Ansible Tower, nodes are organized into Instance Groups. By default, all nodes belong to the tower instance group, but you can create additional groups for workload isolation.

Creating a Custom Instance Group

Navigate to Settings → Instance Groups
Click Add
Provide a name for your instance group (e.g., production)
Add nodes to this instance group
Click Save

Now you can associate specific job templates with this instance group.

// Example API call to create an instance group
const createInstanceGroup = async () => {
  const response = await fetch('/api/v2/instance_groups/', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': 'Bearer YOUR_TOKEN'
    },
    body: JSON.stringify({
      name: 'production',
      instances: [1, 2, 3] // IDs of Tower nodes
    })
  });
  return await response.json();
};

Configuring Job Templates for Instance Groups

To specify which instance group should run a particular job template:

Edit a job template
Scroll to the Instance Groups section
Select one or more instance groups
Click Save

This allows you to direct specific workloads to designated nodes, which can be useful for resource-intensive or environment-specific jobs.

Capacity Management

Each node in a Tower cluster has a capacity value that determines how many jobs it can run concurrently. By default, this is set based on the number of CPU cores available:

Capacity = (Number of CPU cores - 1)
Minimum capacity: 1

You can adjust this value in the Tower settings:

Navigate to Settings → Instance Groups
Click on an instance
Modify the Capacity Adjustment field
Click Save

# Example: Set capacity of a node using the API
curl -X PATCH \
  https://tower.example.com/api/v2/instances/1/ \
  -H 'Authorization: Bearer TOKEN' \
  -H 'Content-Type: application/json' \
  -d '{"capacity_adjustment": 0.5}'

This would reduce the node's capacity by 50%.

Monitoring Cluster Health

Regularly monitoring your Tower cluster is essential. Here are key metrics to watch:

Instance Status: Ensure all nodes show as "running"
Instance Capacity: Monitor used vs. total capacity
Database Performance: Watch for slow queries or high load
Message Queue Health: Check RabbitMQ status and queue sizes

Tower provides health check endpoints for monitoring:

# Health check endpoint
curl https://tower.example.com/api/v2/ping/

# Expected response
{"ping": "pong"}

Troubleshooting Common Cluster Issues

Issue: Node Not Joining Cluster

Possible Causes:

Network connectivity issues
Firewall blocking communication
Wrong database credentials

Solution: Check Tower logs on the affected node:

sudo tail -f /var/log/tower/tower.log

Issue: Uneven Job Distribution

Possible Causes:

Capacity imbalance
Node health issues

Solution: Check instance capacity settings and adjust if necessary.

Issue: Database Connectivity Problems

Possible Causes:

Database overload
Network issues

Solution: Verify database connection and status:

psql -h db.example.com -U tower -d tower -c "SELECT 1"

Advanced Clustering Features

Isolated Nodes

Isolated nodes are Tower instances that exist in a network-restricted environment but are controlled by the cluster. They're useful for running jobs in segregated networks like DMZs.

To configure isolated nodes:

[tower]
tower1.example.com
tower2.example.com

[isolated_group_dmz]
isolated1.example.com
isolated2.example.com

[isolated_group_dmz:vars]
controller=tower1.example.com

Container Groups

Container Groups allow Tower to dispatch jobs to OpenShift or Kubernetes pods instead of Tower nodes. This provides dynamic scaling based on workload.

To set up a Container Group:

Navigate to Settings → Instance Groups
Click Add
Toggle Container Group
Configure your Kubernetes or OpenShift connection
Click Save

Real-World Example: Scaling for Peak Automation Periods

Let's look at a practical scenario where clustering helps solve a real business problem.

Scenario: A retail company needs to process thousands of inventory updates every weekend, but their standalone Tower instance can't handle the load.

Solution: Implement a cluster architecture with workload scheduling:

Set up a 3-node Tower cluster
Create a dedicated instance group called inventory_processing
Configure weekend job templates to use this instance group
Schedule inventory jobs with appropriate concurrency limits

Result: The company can now process all inventory updates within their maintenance window without overloading their system.

# Example Ansible playbook to run inventory updates
- name: Update Store Inventory
  hosts: all
  gather_facts: false
  tasks:
    - name: Process inventory feeds
      include_role:
        name: process_inventory
      vars:
        store_id: "{{ inventory_hostname }}"
        update_type: full

Summary

Ansible Tower Clustering provides an effective way to scale your automation platform for high availability and increased capacity. By understanding the clustering architecture and properly configuring instance groups, you can build a robust automation infrastructure that meets your organization's needs.

Key takeaways:

Tower clusters share a common database and message queue
Instance groups allow for workload isolation and targeting
Proper capacity planning ensures efficient job execution
Regular monitoring helps maintain cluster health

Additional Resources

Exercises

Set up a three-node Tower cluster in a test environment using virtual machines.
Create two separate instance groups and configure different job templates to target each group.
Simulate a node failure and observe how the cluster handles job distribution.
Write an Ansible playbook that uses the Tower API to check the health of all nodes in your cluster.
Configure a container group and run jobs using Kubernetes or OpenShift pods.

💡 Found a typo or mistake? Click "Edit this page" to suggest a correction. Your feedback is greatly appreciated!

Introduction​

What is Ansible Tower Clustering?​

Benefits of Clustering​

Prerequisites for Tower Clustering​

Setting Up an Ansible Tower Cluster​

Step 1: Prepare Your Inventory File​

Step 2: Run the Tower Setup Script​

Step 3: Verify Cluster Status​

Understanding Instance Groups​

Creating a Custom Instance Group​

Configuring Job Templates for Instance Groups​

Capacity Management​

Monitoring Cluster Health​

Troubleshooting Common Cluster Issues​

Issue: Node Not Joining Cluster​

Issue: Uneven Job Distribution​

Issue: Database Connectivity Problems​

Advanced Clustering Features​

Isolated Nodes​

Container Groups​

Real-World Example: Scaling for Peak Automation Periods​

Summary​

Additional Resources​

Exercises​

Introduction

What is Ansible Tower Clustering?

Benefits of Clustering

Prerequisites for Tower Clustering

Setting Up an Ansible Tower Cluster

Step 1: Prepare Your Inventory File

Step 2: Run the Tower Setup Script

Step 3: Verify Cluster Status

Understanding Instance Groups

Creating a Custom Instance Group

Configuring Job Templates for Instance Groups

Capacity Management

Monitoring Cluster Health

Troubleshooting Common Cluster Issues

Issue: Node Not Joining Cluster

Issue: Uneven Job Distribution

Issue: Database Connectivity Problems

Advanced Clustering Features

Isolated Nodes

Container Groups

Real-World Example: Scaling for Peak Automation Periods

Summary

Additional Resources

Exercises