PyTorch Siamese Networks

Introduction

Siamese Networks are a special type of neural network architecture that consists of two or more identical subnetworks used to generate feature vectors for each input and compare them. Unlike traditional neural networks that learn to classify inputs directly, Siamese Networks learn to differentiate between inputs by computing similarity or dissimilarity metrics between them.

Named after Siamese twins, these networks share exactly the same parameters and weights. This architecture makes them particularly powerful for tasks where we need to compare two examples and determine their similarity, such as:

Face verification and recognition
Signature verification
Similar image retrieval
One-shot learning
Tracking objects in videos

In this tutorial, we'll explore how Siamese Networks work, implement them in PyTorch, and demonstrate their application on a practical example.

Understanding Siamese Networks

Core Concept

The fundamental concept behind Siamese Networks is simple yet powerful:

Take two input examples
Process both through identical neural networks (with shared weights)
Compare the resulting feature representations using a distance metric
Determine similarity based on the distance

Siamese Network Architecture

Key Components

A Siamese Network consists of:

Twin Networks - Identical neural networks that share weights
Feature Extraction - Each network produces an encoding/embedding of its input
Distance Metric - A function that measures similarity between the encodings (e.g., Euclidean distance, cosine similarity)
Loss Function - Typically contrastive loss or triplet loss to train the model

Implementing a Siamese Network in PyTorch

Let's implement a simple Siamese Network for determining if two MNIST digits are the same or different.

Step 1: Setting up the Environment

First, let's import the necessary libraries:

python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

Step 2: Creating a Custom Dataset

We'll create a custom dataset for training our Siamese Network:

python
class SiameseDataset(Dataset):
    def __init__(self, mnist_dataset):
        self.mnist_dataset = mnist_dataset
        self.train = True
        self.transform = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.1307,), (0.3081,))
        ])
        
    def __getitem__(self, index):
        # Randomly choose if the pair should be similar (1) or dissimilar (0)
        should_be_same_class = random.randint(0, 1)
        
        # Get the first image
        img1, label1 = self.mnist_dataset[index]
        
        # Get the second image based on whether it should be the same class
        if should_be_same_class:
            # Find another image of the same class
            indices = [i for i, (_, label) in enumerate(self.mnist_dataset) if label == label1]
            index2 = random.choice(indices)
        else:
            # Find an image of a different class
            indices = [i for i, (_, label) in enumerate(self.mnist_dataset) if label != label1]
            index2 = random.choice(indices)
        
        img2, label2 = self.mnist_dataset[index2]
        
        return (img1, img2), torch.FloatTensor([should_be_same_class])
        
    def __len__(self):
        return len(self.mnist_dataset)

Step 3: Building the Siamese Network Model

Now, let's create our Siamese Network architecture:

python
class SiameseNetwork(nn.Module):
    def __init__(self):
        super(SiameseNetwork, self).__init__()
        # Feature extractor CNN
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=5),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=5),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(128 * 4 * 4, 256),
            nn.ReLU(inplace=True)
        )
        
        # Distance layer (fully connected)
        self.fc = nn.Sequential(
            nn.Linear(256, 128),
            nn.ReLU(inplace=True),
            nn.Linear(128, 1),
            nn.Sigmoid()
        )
        
    def forward_one(self, x):
        # Forward pass for one input
        return self.feature_extractor(x)
        
    def forward(self, input1, input2):
        # Get features for both inputs
        output1 = self.forward_one(input1)
        output2 = self.forward_one(input2)
        
        # Calculate absolute difference between feature vectors
        diff = torch.abs(output1 - output2)
        
        # Pass through fully connected layers
        out = self.fc(diff)
        
        return out

Step 4: Implementing the Training Function

Now we'll create a function to train our Siamese Network:

python
def train_siamese_network(model, train_loader, criterion, optimizer, device, epochs=10):
    model.train()
    losses = []
    
    for epoch in range(epochs):
        running_loss = 0.0
        
        for i, (data, labels) in enumerate(train_loader):
            # Get the pair of images and labels
            (img1, img2), labels = data, labels
            img1, img2, labels = img1.to(device), img2.to(device), labels.to(device)
            
            # Zero the parameter gradients
            optimizer.zero_grad()
            
            # Forward pass
            outputs = model(img1, img2)
            
            # Calculate loss
            loss = criterion(outputs, labels)
            
            # Backward pass and optimize
            loss.backward()
            optimizer.step()
            
            # Print statistics
            running_loss += loss.item()
            if i % 100 == 99:  # print every 100 mini-batches
                print(f'[{epoch + 1}, {i + 1}] loss: {running_loss / 100:.4f}')
                losses.append(running_loss / 100)
                running_loss = 0.0
                
        print(f'Epoch {epoch + 1} completed')
    
    print('Finished Training')
    return losses

Step 5: Setting up the Training Process

Let's set up the data loading and training process:

python
def main():
    # Set random seed for reproducibility
    torch.manual_seed(42)
    
    # Check if GPU is available
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    
    # Load MNIST dataset
    mnist_train = torchvision.datasets.MNIST(
        root='./data', train=True, download=True, transform=transforms.ToTensor())
    
    # Create Siamese dataset
    siamese_train_dataset = SiameseDataset(mnist_train)
    
    # Create data loaders
    train_loader = DataLoader(
        siamese_train_dataset, batch_size=64, shuffle=True, num_workers=4)
    
    # Initialize the model
    model = SiameseNetwork().to(device)
    
    # Define the loss function and optimizer
    criterion = nn.BCELoss()
    optimizer = optim.Adam(model.parameters(), lr=0.0005)
    
    # Train the model
    losses = train_siamese_network(model, train_loader, criterion, optimizer, device, epochs=10)
    
    # Plot the loss curve
    plt.figure(figsize=(10, 5))
    plt.plot(losses)
    plt.title('Training Loss')
    plt.xlabel('Iterations (x100)')
    plt.ylabel('Loss')
    plt.show()
    
    # Save the trained model
    torch.save(model.state_dict(), 'siamese_network.pth')

if __name__ == '__main__':
    main()

Visualizing Results

Let's implement a function to visualize how our Siamese Network performs:

python
def visualize_results(model, test_loader, device, num_samples=5):
    model.eval()
    fig, axes = plt.subplots(num_samples, 3, figsize=(10, num_samples * 3))
    
    with torch.no_grad():
        for i, (data, labels) in enumerate(test_loader):
            if i >= num_samples:
                break
                
            (img1, img2), label = data, labels
            img1, img2 = img1.to(device), img2.to(device)
            
            output = model(img1, img2)
            prediction = "Same" if output.item() > 0.5 else "Different"
            actual = "Same" if label.item() > 0.5 else "Different"
            
            # Display the images and results
            axes[i, 0].imshow(img1.cpu().squeeze(), cmap='gray')
            axes[i, 0].set_title('Image 1')
            axes[i, 0].axis('off')
            
            axes[i, 1].imshow(img2.cpu().squeeze(), cmap='gray')
            axes[i, 1].set_title('Image 2')
            axes[i, 1].axis('off')
            
            axes[i, 2].axis('off')
            axes[i, 2].text(0.5, 0.5, f"Prediction: {prediction}\nActual: {actual}\nConfidence: {output.item():.4f}", 
                           ha='center', va='center', fontsize=12)
    
    plt.tight_layout()
    plt.show()

The output from visualizing several pairs of digits might look like this:

Siamese Network Results

Real-World Applications

Siamese Networks are used in several real-world applications:

1. Face Recognition

Siamese Networks are widely used for facial recognition systems. Rather than classifying a fixed set of faces, they can determine if two face images belong to the same person, even if the system has never seen that person before.

python
# Example pseudo-code for face verification
def verify_face(known_face, new_face, siamese_model, threshold=0.7):
    similarity = siamese_model(known_face, new_face)
    if similarity > threshold:
        return "Same person"
    else:
        return "Different person"

2. Signature Verification

Banks and financial institutions use Siamese Networks to verify if a signature matches the one they have on record.

3. Product Recommendations

E-commerce platforms can use Siamese Networks to recommend similar products based on visual similarity.

4. Person Re-identification

Security systems use Siamese Networks to track individuals across multiple camera views.

5. One-shot Learning

In scenarios where we have very limited data per class, Siamese Networks can learn to recognize new categories from just one or a few examples.

Advanced Techniques with Siamese Networks

Triplet Loss

While our implementation uses contrastive loss, another popular approach is triplet loss. This involves training with triplets of examples:

Anchor (A): The reference example
Positive (P): An example of the same class as the anchor
Negative (N): An example of a different class from the anchor

The goal is to learn embeddings such that the distance between A and P is smaller than the distance between A and N by some margin:

python
class TripletLoss(nn.Module):
    def __init__(self, margin=1.0):
        super(TripletLoss, self).__init__()
        self.margin = margin
        
    def forward(self, anchor, positive, negative):
        distance_positive = (anchor - positive).pow(2).sum(1)
        distance_negative = (anchor - negative).pow(2).sum(1)
        loss = F.relu(distance_positive - distance_negative + self.margin)
        return loss.mean()

Siamese Network with Attention

Adding attention mechanisms to Siamese Networks can help focus on the most discriminative features:

python
class AttentionModule(nn.Module):
    def __init__(self, in_channels):
        super(AttentionModule, self).__init__()
        self.attention = nn.Sequential(
            nn.Conv2d(in_channels, 1, kernel_size=1),
            nn.Sigmoid()
        )
        
    def forward(self, x):
        attention_map = self.attention(x)
        return x * attention_map

Summary

In this tutorial, we've explored Siamese Networks - a powerful neural network architecture for similarity learning:

We learned the core concept of Siamese Networks: using identical neural networks with shared weights to compare inputs
We implemented a Siamese Network in PyTorch for comparing MNIST digits
We explored applications like face recognition, signature verification, and one-shot learning
We discussed advanced techniques like triplet loss and attention mechanisms

Siamese Networks are versatile tools for tasks where we need to measure similarity between examples, especially when we have limited training data or need to compare with previously unseen classes.

Additional Resources

Exercises

Modify the Siamese Network implementation to use triplet loss instead of contrastive loss
Implement a Siamese Network for a different dataset like CIFAR-10
Enhance the network architecture by adding more layers or using pre-trained models like ResNet as the feature extractor
Implement a Siamese Network for text similarity using word embeddings
Build a simple face verification system using the Siamese Network architecture

If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)

Introduction​

Understanding Siamese Networks​

Core Concept​

Key Components​

Implementing a Siamese Network in PyTorch​

Step 1: Setting up the Environment​

Step 2: Creating a Custom Dataset​

Step 3: Building the Siamese Network Model​

Step 4: Implementing the Training Function​

Step 5: Setting up the Training Process​

Visualizing Results​

Real-World Applications​

1. Face Recognition​

2. Signature Verification​

3. Product Recommendations​

4. Person Re-identification​

5. One-shot Learning​

Advanced Techniques with Siamese Networks​

Triplet Loss​

Siamese Network with Attention​

Summary​

Additional Resources​

Exercises​