---
title: Python Data Types
description: A comprehensive guide to Python's built-in data types with examples and applications for PyTorch programming
---

# Python Data Types

## Introduction

Understanding data types is fundamental to programming in Python, especially when working with PyTorch for machine learning. Python is dynamically typed, which means you don't need to declare variable types explicitly. However, knowing how Python handles different kinds of data is essential for writing efficient code and avoiding errors when building deep learning models.

In this tutorial, we'll explore Python's built-in data types that you'll frequently use when working with PyTorch. We'll cover how to define, manipulate, and convert between these types with practical examples relevant to machine learning tasks.

## Numeric Types

Python offers several numeric data types that form the foundation of mathematical operations in PyTorch.

### Integers

Integers (`int`) represent whole numbers without decimal points. They're commonly used for indexing, counting, and defining network architecture parameters.

```python
# Integer examples
batch_size = 32
num_epochs = 100
hidden_layers = 3

print(type(batch_size)) # Output: <class 'int'>
print(batch_size) # Output: 32
```

### Floating-Point Numbers

Floating-point numbers (`float`) represent real numbers with decimal points. In PyTorch, most tensors hold floating-point values (`torch.float32` by default).

```python
# Float examples
learning_rate = 0.001
weight_decay = 1e-5
accuracy = 0.97

print(type(learning_rate)) # Output: <class 'float'>
print(learning_rate) # Output: 0.001
```

### Complex Numbers

Complex numbers (`complex`) contain a real and imaginary part. They're less common in basic PyTorch operations but can appear in signal processing applications.

```python
# Complex number example
complex_num = 3 + 4j
print(type(complex_num)) # Output: <class 'complex'>
print(complex_num.real) # Output: 3.0
print(complex_num.imag) # Output: 4.0
```

## Sequential Types

Sequential types store collections of items and are extensively used in PyTorch for batching and data handling.

### Lists

Lists are ordered, mutable collections that can hold items of different types. They're dynamic, allowing you to add, remove, or modify elements.

```python
# List example
dimensions = [224, 224, 3] # Height, width, channels for an image
architectures = ['ResNet', 'VGG', 'Inception']

# Accessing elements
print(dimensions[0]) # Output: 224
print(architectures[1]) # Output: VGG

# Modifying a list
dimensions.append(1) # Add batch dimension
print(dimensions) # Output: [224, 224, 3, 1]

# Slicing
print(architectures[0:2]) # Output: ['ResNet', 'VGG']
```

Lists are commonly used to store sequences of values like model configurations, training metrics over epochs, or class labels.
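
For example, a list of class names lets you map predicted indices back to readable labels. Here's a minimal sketch (the class names and predictions below are made up for illustration):

```python
# Hypothetical class names for a 3-class classifier
class_names = ['cat', 'dog', 'bird']

# Simulated predicted class indices for a small batch
predictions = [0, 2, 2, 1]

# Look up each prediction's label by index
predicted_labels = [class_names[i] for i in predictions]
print(predicted_labels) # Output: ['cat', 'bird', 'bird', 'dog']
```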

### Tuples

Tuples are similar to lists but immutable (cannot be changed after creation). They're useful for fixed collections like image dimensions or model parameters.

```python
# Tuple example
image_shape = (28, 28, 1) # MNIST image shape
version = (1, 0, 2) # Software version number

print(type(image_shape)) # Output: <class 'tuple'>
print(image_shape[0]) # Output: 28

# Tuples are immutable
try:
    image_shape[0] = 32 # This will raise an error
except TypeError as e:
    print("Tuples are immutable:", e)
```

### Strings

Strings (`str`) are sequences of characters used for text data, file paths, or model descriptions.

```python
# String examples
model_name = "ResNet18"
dataset_path = "/data/imagenet/"

# String methods
print(model_name.lower()) # Output: resnet18
print(dataset_path.split('/')) # Output: ['', 'data', 'imagenet', '']
print("Epoch: " + str(10)) # Output: Epoch: 10

# Formatted strings (f-strings)
accuracy = 0.95
print(f"Model accuracy: {accuracy:.1%}") # Output: Model accuracy: 95.0%

Mapping Type: Dictionaries

Dictionaries (`dict`) store key-value pairs, making them efficient for structured data like model configurations or dataset metadata.

```python
# Dictionary example
model_config = {
    "name": "ResNet18",
    "num_classes": 10,
    "learning_rate": 0.001,
    "batch_size": 32
}

# Accessing dictionary values
print(model_config["name"]) # Output: ResNet18

# Adding or modifying entries
model_config["optimizer"] = "Adam"
model_config["learning_rate"] = 0.0005

# Dictionary methods
print(model_config.keys()) # Output: dict_keys(['name', 'num_classes', 'learning_rate', 'batch_size', 'optimizer'])
print(model_config.get("epochs", 100)) # Output: 100 (default value if key doesn't exist)

# Dictionary comprehension
squared = {x: x**2 for x in range(5)}
print(squared) # Output: {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
```
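
It's also common to iterate over a dictionary's key-value pairs, for example to log hyperparameters. A small sketch with an illustrative `hyperparams` dictionary:

```python
# Iterate over key-value pairs, e.g. to log hyperparameters
hyperparams = {"learning_rate": 0.001, "batch_size": 32, "optimizer": "Adam"}
for key, value in hyperparams.items():
    print(f"{key}: {value}")
# Output:
# learning_rate: 0.001
# batch_size: 32
# optimizer: Adam
```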

## Set Types

Sets are unordered collections of unique elements, useful for removing duplicates and membership testing.

```python
# Set example
classes = {"cat", "dog", "bird"}
more_classes = {"dog", "fish", "horse"}

# Set operations
union = classes.union(more_classes)
print(union) # Output (order may vary): {'cat', 'dog', 'bird', 'fish', 'horse'}

intersection = classes.intersection(more_classes)
print(intersection) # Output: {'dog'}

# Add and remove elements
classes.add("lizard")
classes.remove("bird")
print(classes) # Output (order may vary): {'cat', 'dog', 'lizard'}
```
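
Two everyday uses in data pipelines are removing duplicate labels and fast membership testing. A small sketch with made-up label data:

```python
# Remove duplicate class names from a list of labels
raw_labels = ["cat", "dog", "cat", "bird", "dog"]
unique_labels = set(raw_labels)
print(unique_labels) # Output (order may vary): {'cat', 'dog', 'bird'}

# Membership testing is fast for sets
print("cat" in unique_labels) # Output: True
print("horse" in unique_labels) # Output: False
```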

## Boolean Type

The boolean type (`bool`) has two values: `True` and `False`. Booleans are essential for conditional operations and flow control.

```python
# Boolean examples
is_training = True
has_gpu = False

# Boolean operations
print(is_training and has_gpu) # Output: False
print(is_training or has_gpu) # Output: True
print(not is_training) # Output: False

# Comparisons return booleans
accuracy = 0.85
target = 0.9
print(accuracy >= target) # Output: False
print(5 == 5) # Output: True
```
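
In training code, booleans usually drive branching, such as switching between training and evaluation behavior. A minimal sketch with simulated flags:

```python
# Booleans controlling flow in a (simulated) training setup
is_training = True
has_gpu = False

device = "cuda" if has_gpu else "cpu"
print(device) # Output: cpu

if is_training:
    print("Running training step") # Output: Running training step
else:
    print("Running evaluation step")
```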

## None Type

Python has a special `None` type representing the absence of a value. It's often used as a default parameter value or to initialize variables.

```python
# None examples
model = None # No model loaded yet
result = None # No result calculated yet

print(model is None) # Output: True

# Common pattern in PyTorch functions
def create_model(pretrained=None):
    if pretrained is not None:
        # Load pretrained weights
        print("Loading pretrained model")
    else:
        # Initialize randomly
        print("Creating new model")

create_model() # Output: Creating new model
create_model("resnet18") # Output: Loading pretrained model

## Type Conversion

Python allows conversion between different data types, which is useful when preparing data for PyTorch tensors.

```python
# Type conversion examples
# String to int
epoch_str = "10"
epoch_num = int(epoch_str)
print(epoch_num, type(epoch_num)) # Output: 10 <class 'int'>

# Float to int (truncates decimal part)
accuracy = 0.985
accuracy_percent = int(accuracy * 100)
print(accuracy_percent) # Output: 98

# Int to float
batch = 32
batch_float = float(batch)
print(batch_float) # Output: 32.0

# List to tensor (PyTorch specific)
import torch
data_list = [1, 2, 3, 4]
tensor = torch.tensor(data_list)
print(tensor) # Output: tensor([1, 2, 3, 4])

# NumPy array to tensor
import numpy as np
np_array = np.array([1, 2, 3])
tensor = torch.from_numpy(np_array)
print(tensor) # Output: tensor([1, 2, 3], dtype=torch.int64)
```
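
Conversion also works in the other direction: tensor values can be pulled back into native Python types. A small sketch, assuming PyTorch is installed:

```python
import torch

# Tensor back to native Python types
t = torch.tensor([1.5, 2.5, 3.5])
as_list = t.tolist() # Python list of floats
single = t[0].item() # Python float from a one-element tensor

print(as_list, type(as_list)) # Output: [1.5, 2.5, 3.5] <class 'list'>
print(single, type(single)) # Output: 1.5 <class 'float'>
```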

## Real-world PyTorch Applications

Let's see how these data types are used in practical PyTorch scenarios:

### Model Configuration

```python
# Using dictionaries for model configuration
config = {
    "model": "CNN",
    "layers": [
        {"type": "conv", "filters": 32, "kernel_size": 3, "activation": "relu"},
        {"type": "maxpool", "size": 2},
        {"type": "conv", "filters": 64, "kernel_size": 3, "activation": "relu"},
        {"type": "flatten"},
        {"type": "dense", "units": 128, "activation": "relu"},
        {"type": "dense", "units": 10, "activation": "softmax"}
    ],
    "optimizer": {
        "name": "adam",
        "learning_rate": 0.001
    },
    "batch_size": 32,
    "epochs": 10
}

# Accessing nested configuration
print(f"Training for {config['epochs']} epochs with {config['optimizer']['name']} optimizer")
# Output: Training for 10 epochs with adam optimizer
```

### Dataset Processing

```python
# Loading and processing image dataset
image_paths = [
    "data/cat_1.jpg",
    "data/cat_2.jpg",
    "data/dog_1.jpg"
]

# Creating labels using dictionaries
class_to_idx = {
    "cat": 0,
    "dog": 1
}

# Processing file names to get labels
dataset = []
for path in image_paths:
    filename = path.split('/')[-1]
    class_name = filename.split('_')[0]
    label = class_to_idx[class_name]
    dataset.append((path, label))

print(dataset)
# Output: [('data/cat_1.jpg', 0), ('data/cat_2.jpg', 0), ('data/dog_1.jpg', 1)]
```

### Training Loop Stats Tracking

```python
# Using lists to track training metrics
epochs = 5
train_losses = []
val_accuracies = []

# Simulated training loop
for epoch in range(epochs):
    # Training would happen here
    train_loss = 1.0 / (epoch + 1) # Simulated decreasing loss
    val_accuracy = 0.8 + epoch * 0.04 # Simulated increasing accuracy

    # Store metrics
    train_losses.append(train_loss)
    val_accuracies.append(val_accuracy)

    print(f"Epoch {epoch+1}/{epochs}: loss={train_loss:.4f}, accuracy={val_accuracy:.2%}")

# Final stats with dictionary
final_stats = {
    "best_epoch": val_accuracies.index(max(val_accuracies)) + 1,
    "best_accuracy": max(val_accuracies),
    "final_loss": train_losses[-1]
}

print(f"Best model at epoch {final_stats['best_epoch']} with {final_stats['best_accuracy']:.2%} accuracy")

## Summary

Understanding Python's data types is essential for effective PyTorch programming:

- Numeric types (`int`, `float`) form the basis of tensor values and model parameters
- Sequential types (lists, tuples, strings) help organize data and model configurations
- Dictionaries provide structured storage for complex configurations and metadata
- Sets help with unique value handling and set operations
- Booleans enable conditional processing and evaluation
- `None` represents the absence of a value and serves as a default parameter value
- Type conversion facilitates moving between Python data types and PyTorch tensors

These foundational types enable you to manipulate data, configure models, and track results in your PyTorch projects. As you progress, you'll build on this knowledge to work with PyTorch's specialized tensor types, which inherit many behaviors from Python's native numeric types.
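
As a brief preview (assuming PyTorch is installed), the Python type you pass to `torch.tensor` determines the resulting tensor's default dtype:

```python
import torch

# Python types determine the default tensor dtype
print(torch.tensor(3).dtype) # Output: torch.int64
print(torch.tensor(3.0).dtype) # Output: torch.float32
print(torch.tensor(True).dtype) # Output: torch.bool
```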

## Exercises

1. Create a dictionary containing a PyTorch model configuration with at least three layers.
2. Write a function that converts a list of floating-point accuracies into percentage strings.
3. Create a set of unique labels from a list that contains duplicate class names.
4. Write code to track training and validation losses in separate lists and find the epoch with the smallest difference between them.
5. Create a nested data structure (using lists and dictionaries) to represent a dataset with features and labels.
