Terraform Data Sources
Introduction
When working with Terraform, you'll often need to reference resources or information that already exists in your infrastructure or that's defined outside of your Terraform configuration. This is where data sources come in.
Data sources in Terraform allow you to fetch or compute values that can be used elsewhere in your configuration. Unlike resources that create and manage infrastructure, data sources only read information. Think of them as read-only queries that bring external information into your Terraform project.
What Are Data Sources?
Data sources are a way to query external systems and fetch data without actually creating or modifying anything. They help you:
- Reference existing infrastructure not managed by Terraform
- Fetch information from your cloud provider
- Query attributes of resources managed in other Terraform configurations
- Import data from external systems or APIs
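As a rough illustration of the third point, the built-in terraform_remote_state data source can read the outputs of another configuration. This is only a sketch: the bucket name, state key, region, and the vpc_id output are hypothetical and assume the other configuration stores its state in S3 and exports an output with that name.

```hcl
# Read the outputs of a separately managed "network" configuration.
# Bucket, key, and region are placeholder values for illustration.
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "example-terraform-state"
    key    = "network/terraform.tfstate"
    region = "us-east-1"
  }
}

# Assumes the network configuration declares an output named "vpc_id".
output "shared_vpc_id" {
  value = data.terraform_remote_state.network.outputs.vpc_id
}
```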
Basic Syntax
The basic syntax for a data source is:
data "provider_type" "name" {
  [CONFIG ...]
}
Where:
- provider_type is the type of data source (like aws_ami, azurerm_resource_group, etc.)
- name is a unique identifier you choose for this data source
- [CONFIG ...] represents the configuration arguments specific to that data source
Accessing Data Source Attributes
Once you've declared a data source, you can reference its attributes using the syntax:
data.provider_type.name.attribute
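A minimal sketch of this reference syntax, using the AWS provider's aws_caller_identity data source, which takes no arguments. The local name "current" and the output name are arbitrary choices for this example.

```hcl
# Look up the AWS account that the configured credentials belong to.
data "aws_caller_identity" "current" {}

# Reference pattern: data.<type>.<name>.<attribute>
output "account_id" {
  value = data.aws_caller_identity.current.account_id
}
```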
Common Data Source Examples
Example 1: Finding the Latest Amazon Machine Image (AMI)
One of the most common uses of data sources is to find the latest Amazon Machine Image for EC2 instances:
data "aws_ami" "ubuntu" {
  most_recent = true

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }

  owners = ["099720109477"] # Canonical's AWS account ID
}

resource "aws_instance" "web_server" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t2.micro"

  tags = {
    Name = "WebServer"
  }
}
What's happening here?
- We define a data source of type aws_ami named "ubuntu"
- We set filters to find Ubuntu 20.04 images with HVM virtualization
- We specify that we want the most recent image
- We use the AMI ID from this data source when creating our EC2 instance
Output: When you run terraform apply, Terraform will:
- Query AWS to find the latest Ubuntu image matching our criteria
- Store all the image attributes in the data source
- Use the image ID when creating the EC2 instance
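Because the data source stores the image's other attributes as well as its ID, you can inspect which image was selected. The following sketch assumes the aws_ami data source named "ubuntu" from the example above; the output name is an arbitrary choice.

```hcl
# Expose a few attributes of the AMI that the data source resolved to.
output "ubuntu_ami_details" {
  value = {
    id            = data.aws_ami.ubuntu.id
    name          = data.aws_ami.ubuntu.name
    creation_date = data.aws_ami.ubuntu.creation_date
    architecture  = data.aws_ami.ubuntu.architecture
  }
}
```

Running terraform output ubuntu_ami_details after an apply shows the selected image, which helps when checking whether a newer AMI has been picked up.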
Example 2: Reading Environment Variables
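Terraform has no dedicated data source for environment variables, but the external data source (from the hashicorp/external provider) can run a program and read a single JSON object of string values from its output. This example uses it to collect environment variables, and requires bash and jq on the machine running Terraform:
data "external" "env" {
  # Emit all TF_* environment variables as one flat JSON object, e.g. {"TF_LOG": "DEBUG"}
  program = ["bash", "-c", "env | jq -Rn '[inputs | select(startswith(\"TF_\")) | capture(\"^(?<key>[^=]+)=(?<value>.*)$\")] | from_entries'"]
}

output "environment_variables" {
  value     = data.external.env.result
  sensitive = true
}
What's happening here?
- We use the external data source to run a command
- The command emits the environment variables whose names start with "TF_" as a single JSON object with string values, which is the format the external data source expects
- We expose the result as an output value, marked sensitive so it is not printed in plain text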
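The result attribute of the external data source is a map of strings, so individual values can be read by key. The sketch below assumes the data source named "env" from this example; TF_LOG is just one variable that might be present, and the fallback string is arbitrary. The version constraint is an assumption, so adjust it to your setup.

```hcl
terraform {
  required_providers {
    # The external data source lives in a separate provider.
    external = {
      source  = "hashicorp/external"
      version = "~> 2.0" # assumed constraint for illustration
    }
  }
}

# Read one key from the result map, with a fallback if it is missing.
output "tf_log_level" {
  value = lookup(data.external.env.result, "TF_LOG", "not set")
}
```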
Example 3: Fetching IP Ranges for a Service
You can use data sources to get IP ranges for services like AWS:
data "aws_ip_ranges" "ec2" {
  regions  = ["us-east-1", "us-west-2"]
  services = ["ec2"]
}

resource "aws_security_group" "from_ec2" {
  name = "from_ec2"

  ingress {
    # Ports are numbers, not strings
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = data.aws_ip_ranges.ec2.cidr_blocks
  }
}
What's happening here?
- We fetch all IP ranges used by EC2 in specific regions
- We use these IP ranges to create security group rules
How Data Sources Work
Let's break down the lifecycle of a data source:
When you run terraform plan or terraform apply:
- Terraform identifies all data sources in your configuration
- For each data source, Terraform asks the provider to fetch the requested information. This normally happens during planning; if an argument depends on a value that is not known until apply, the read is deferred to the apply phase
- The result is stored in the Terraform state file
- Resources that reference the data source can access its attributes
Advanced Usage Patterns
Computed Data Sources
Data source results can be combined with attributes of managed resources, and a data source's own arguments can also come from other resources:
resource "aws_vpc" "example" {
  cidr_block = "10.0.0.0/16"

  tags = {
    Name = "example-vpc"
  }
}

data "aws_vpc_endpoint_service" "s3" {
  service      = "s3"
  service_type = "Gateway"
}

resource "aws_vpc_endpoint" "s3" {
  vpc_id       = aws_vpc.example.id
  service_name = data.aws_vpc_endpoint_service.s3.service_name
}
What's happening here?
- We create a VPC resource
- We query for the S3 endpoint service details
- We create a VPC endpoint for S3 using both the VPC ID and the service name from the data source
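In the example above the data source does not itself depend on a resource. As a sketch of the other direction, here a data source takes one of its arguments from a managed resource. The resource and data source names are hypothetical, and the CIDR block is a placeholder.

```hcl
# A VPC managed by this configuration.
resource "aws_vpc" "deferred_example" {
  cidr_block = "10.1.0.0/16"
}

# The vpc_id argument is unknown until the VPC exists, so on the first run
# Terraform cannot read this data source while planning and reads it
# during apply instead.
data "aws_route_tables" "in_vpc" {
  vpc_id = aws_vpc.deferred_example.id
}

output "route_table_ids" {
  value = data.aws_route_tables.in_vpc.ids
}
```

On the first plan, values derived from this data source appear as "known after apply", because the reference creates an implicit dependency on the VPC.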
Filtering and Selecting
Many data sources allow you to filter results:
# aws_subnets replaces aws_subnet_ids, which was removed in AWS provider v5
data "aws_subnets" "private" {
  filter {
    name   = "vpc-id"
    values = [aws_vpc.main.id]
  }

  tags = {
    Tier = "Private"
  }
}

resource "aws_lb" "internal" {
  name               = "internal-lb"
  internal           = true
  load_balancer_type = "application"
  subnets            = data.aws_subnets.private.ids
}
What's happening here?
- We're finding all subnets in a VPC that have the tag Tier = "Private"
- We're using these subnet IDs for an internal load balancer
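A filtered list of IDs can also feed a second data source to get per-item details. This is a self-contained sketch, not part of the example above: the variable vpc_id and the names "selected" and "detail" are hypothetical.

```hcl
variable "vpc_id" {
  type        = string
  description = "ID of an existing VPC (hypothetical input for this sketch)"
}

# Step 1: find the IDs of all subnets tagged Tier = "Private" in the VPC.
data "aws_subnets" "selected" {
  filter {
    name   = "vpc-id"
    values = [var.vpc_id]
  }

  tags = {
    Tier = "Private"
  }
}

# Step 2: look up each subnet individually to read its attributes.
data "aws_subnet" "detail" {
  for_each = toset(data.aws_subnets.selected.ids)
  id       = each.value
}

output "private_subnet_cidrs" {
  value = [for s in data.aws_subnet.detail : s.cidr_block]
}
```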
Local Data Sources
Some data sources read local information instead of calling a remote API. For example, local_file (from the hashicorp/local provider) reads a file from disk:
data "local_file" "config" {
  filename = "${path.module}/config.json"
}

output "config_content" {
  value = jsondecode(data.local_file.config.content)
}
What's happening here?
- We read a local file's content
- We decode it as JSON and output it
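Once decoded, individual fields can be picked out with ordinary expressions. The sketch below assumes the local_file data source named "config" from the example above; the app_name key and its default are hypothetical, since the contents of config.json are not shown here.

```hcl
locals {
  # Decode the file once and reuse the result.
  config = jsondecode(data.local_file.config.content)

  # try() falls back to a default if the (hypothetical) key is absent.
  app_name = try(local.config.app_name, "default-app")
}

output "app_name" {
  value = local.app_name
}
```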
Common Data Sources by Provider
AWS
- aws_ami: Find Amazon Machine Images
- aws_availability_zones: List available AZs
- aws_region: Get current region details
- aws_vpc: Find an existing VPC
Azure
- azurerm_resource_group: Reference an existing resource group
- azurerm_virtual_network: Get an existing VNet
- azurerm_subscription: Get details about the current subscription
Google Cloud
- google_compute_image: Find a compute image
- google_compute_zones: List available zones
- google_project: Get details about a project
Best Practices
- Use data sources for read-only operations
  - Data sources should only read information, not modify it
- Handle changes gracefully
  - Data sources might return different values over time (e.g., latest AMI)
  - Where reproducibility matters, narrow the query (for example, a more specific name filter) or pin the value
- Cache data when appropriate
  - Some data sources make API calls that count against rate limits
  - For static data, you might want to store the values in variables instead
- Use count or for_each with data sources
  - Data sources accept count and for_each, and their results can drive count or for_each on resources:
data "aws_availability_zones" "available" {
  state = "available"
}

resource "aws_subnet" "primary" {
  count             = length(data.aws_availability_zones.available.names)
  vpc_id            = aws_vpc.main.id
  availability_zone = data.aws_availability_zones.available.names[count.index]
  cidr_block        = "10.0.${count.index}.0/24"
}
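For the "handle changes gracefully" point, one possible approach is to stop a newly published AMI from forcing replacement of an instance that is already running. This sketch assumes the aws_ami data source named "ubuntu" from Example 1; the resource name is hypothetical.

```hcl
resource "aws_instance" "pinned_web_server" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t2.micro"

  lifecycle {
    # The data source may resolve to a newer AMI on later runs.
    # Ignoring changes to "ami" keeps the existing instance in place;
    # only newly created instances pick up the newer image.
    ignore_changes = [ami]
  }
}
```

The trade-off is that running instances no longer track the latest image automatically, so upgrades have to be triggered deliberately, for example by replacing the instance.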
Common Pitfalls
- Data source depends on a resource that doesn't exist yet
  - Solution: Use depends_on to create explicit dependencies
- Data source returns too many or too few results
  - Solution: Refine your filters or use more specific queries
- Values change between plan and apply
  - Solution: For critical values that shouldn't change, consider hardcoding them
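The first two pitfalls can be sketched together. The example below is an assumption-laden illustration: the variable app_vpc_id, the resource names, the CIDR block, and the minimum of one subnet are all hypothetical, and the postcondition block requires Terraform 1.2 or later.

```hcl
variable "app_vpc_id" {
  type        = string
  description = "ID of an existing VPC (hypothetical input for this sketch)"
}

# A subnet created by this configuration; the CIDR is a placeholder and
# must fall inside the VPC's address range.
resource "aws_subnet" "app" {
  vpc_id     = var.app_vpc_id
  cidr_block = "10.0.1.0/24"

  tags = {
    Tier = "Private"
  }
}

data "aws_subnets" "app_private" {
  filter {
    name   = "vpc-id"
    values = [var.app_vpc_id]
  }

  tags = {
    Tier = "Private"
  }

  # Pitfall 1: nothing in the arguments refers to aws_subnet.app, so
  # declare the dependency explicitly to read after the subnet exists.
  depends_on = [aws_subnet.app]

  # Pitfall 2: fail with a clear message if the query matches nothing.
  lifecycle {
    postcondition {
      condition     = length(self.ids) >= 1
      error_message = "Expected at least one subnet tagged Tier = Private."
    }
  }
}
```

Note that depends_on on a data source defers the read to the apply phase whenever the dependency has pending changes, so values derived from it may show as "known after apply" in the plan.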
Summary
Data sources are a powerful feature in Terraform that allow you to query and use information from existing infrastructure or external systems. They help you:
- Reference resources created outside of Terraform
- Fetch dynamic information from your cloud provider
- Make your configurations more flexible and reusable
- Integrate with existing infrastructure
By mastering data sources, you can build more dynamic and adaptable Terraform configurations that work seamlessly with both managed and unmanaged resources.
Additional Resources
- Terraform Data Sources Documentation
- AWS Provider Data Sources
- Azure Provider Data Sources
- Google Cloud Provider Data Sources
Exercises
- Use the aws_ami data source to find the latest Amazon Linux 2 AMI in your region.
- Create a configuration that uses data sources to fetch all availability zones in your region, and then creates a subnet in each one.
- Use the http data source to fetch information from a public API and use it in your Terraform configuration.
- Create a data source that queries for existing security groups with specific tags, and then references them in a new EC2 instance.