TensorFlow Recommenders

Recommendation systems are everywhere in our digital lives, from suggesting products on e-commerce platforms to recommending movies on streaming services. TensorFlow Recommenders (TFRS) is a library that simplifies building, evaluating, and deploying recommendation models.

In this guide, we'll explore how to use TensorFlow Recommenders to build effective recommendation systems, even if you're just getting started with machine learning.

What is TensorFlow Recommenders?

TensorFlow Recommenders is an open-source library built on top of TensorFlow that provides tools and algorithms specifically designed for building recommendation systems. It was developed to address the unique challenges that come with recommendation tasks, such as:

Working with sparse user-item interaction data
Handling large-scale categorical features
Balancing multiple objectives (relevance, diversity, freshness)
Creating efficient retrieval systems that can operate on millions or billions of items

TFRS simplifies these tasks by providing modular components that can be easily combined to create custom recommendation models.
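
To make the first challenge concrete, here is a small plain-Python sketch (not part of TFRS) that measures how sparse an interaction log is. The function name and the toy data are hypothetical, chosen only for illustration.

```python
def interaction_density(interactions):
    """Return the fraction of possible (user, item) pairs that were observed."""
    users = {user for user, _ in interactions}
    items = {item for _, item in interactions}
    observed = set(interactions)  # de-duplicate repeated pairs
    return len(observed) / (len(users) * len(items))


# Toy interaction log: 3 users, 3 items, 5 observed pairs out of 9 possible.
toy_interactions = [
    ("u1", "m1"), ("u1", "m2"),
    ("u2", "m2"),
    ("u3", "m3"), ("u3", "m1"),
]

density = interaction_density(toy_interactions)
print(f"density = {density:.2f}")
```

Real interaction logs are usually far sparser than this toy example, which is one reason recommenders learn dense embeddings rather than working on the raw user-item matrix.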

Getting Started with TensorFlow Recommenders

Installation

Before we dive into examples, let's install TensorFlow Recommenders:

bash
pip install tensorflow-recommenders

You'll also need TensorFlow installed:

bash
pip install tensorflow

Basic Imports

To begin using TFRS, we need to import the necessary libraries:

python
import tensorflow as tf
import tensorflow_recommenders as tfrs
import numpy as np
import pandas as pd

Building a Basic Recommendation Model

Let's start with a simple movie recommendation system using the MovieLens dataset. We'll build a model that learns to recommend movies to users based on their past interactions.

Step 1: Prepare the Data

First, let's load and prepare the MovieLens 100K dataset:

python
# Load the MovieLens 100K dataset
ratings = pd.read_csv('https://files.grouplens.org/datasets/movielens/ml-100k/u.data', 
                     sep='\t', names=['user_id', 'movie_id', 'rating', 'timestamp'])

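# Convert the integer IDs to strings first, then to string tensors
# (integer arrays cannot be converted directly to tf.string)
user_ids = tf.convert_to_tensor(ratings['user_id'].unique().astype(str), dtype=tf.string)
movie_ids = tf.convert_to_tensor(ratings['movie_id'].unique().astype(str), dtype=tf.string)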

# Create datasets of users and movies
users = tf.data.Dataset.from_tensor_slices(user_ids)
movies = tf.data.Dataset.from_tensor_slices(movie_ids)

# Create a dataset of (user_id, movie_id) pairs with positive ratings (4 or 5 stars)
positive_ratings = ratings[ratings['rating'] >= 4]
rating_pairs = tf.data.Dataset.from_tensor_slices({
    "user_id": positive_ratings['user_id'].astype(str).values,
    "movie_id": positive_ratings['movie_id'].astype(str).values,
})

# Shuffle the data and split it 80/20 into training and testing.
# Only the positive ratings remain after filtering, so size the split
# from the filtered data rather than from the full 100K ratings.
tf.random.set_seed(42)
num_pairs = len(positive_ratings)
shuffled = rating_pairs.shuffle(num_pairs, seed=42, reshuffle_each_iteration=False)

num_train = int(0.8 * num_pairs)
train = shuffled.take(num_train)
test = shuffled.skip(num_train)

# Batch the data
train_batches = train.batch(8192).cache()
test_batches = test.batch(4096).cache()
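
The filter-then-split logic above can also be sketched in plain Python, which makes it easy to check the split sizes on a handful of rows. This is only an illustration: the helper `split_train_test` and the toy rows are hypothetical, and the split is sized from the number of filtered rows.

```python
import random


def split_train_test(rows, train_fraction=0.8, seed=42):
    """Shuffle rows deterministically, then cut at train_fraction of their count."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    num_train = int(len(shuffled) * train_fraction)
    return shuffled[:num_train], shuffled[num_train:]


# Toy (user_id, movie_id, rating) rows standing in for the real ratings table.
toy_ratings = [
    ("1", "10", 5), ("1", "11", 2), ("2", "10", 4), ("2", "12", 3),
    ("3", "11", 4), ("3", "13", 5), ("4", "12", 1), ("4", "13", 4),
    ("5", "10", 5), ("5", "14", 4),
]

# Keep only positive interactions (rating of 4 or higher), as in the step above.
positive_pairs = [(user, movie) for user, movie, rating in toy_ratings if rating >= 4]
train_rows, test_rows = split_train_test(positive_pairs)
print(len(positive_pairs), len(train_rows), len(test_rows))
```

Note that filtering shrinks the dataset, so the train and test sizes should follow the filtered row count rather than the size of the original file.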

Step 2: Define the Model

Now, let's define our recommendation model using TFRS:

python
class MovieRecommenderModel(tfrs.Model):
    def __init__(self, user_model, movie_model, task):
        super().__init__()
        
        # Set up user and movie representations
        self.user_model = user_model
        self.movie_model = movie_model
        
        # Set up a retrieval task
        self.task = task
        
    def compute_loss(self, features, training=False):
        # Look up embeddings for the user and movie IDs in the batch
        user_embeddings = self.user_model(features["user_id"])
        movie_embeddings = self.movie_model(features["movie_id"])
        
        # Compute the loss
        return self.task(user_embeddings, movie_embeddings)

# Define embedding dimensions
embedding_dimension = 32

# Create user and movie models
user_model = tf.keras.Sequential([
    tf.keras.layers.StringLookup(vocabulary=user_ids, mask_token=None),
    tf.keras.layers.Embedding(len(user_ids) + 1, embedding_dimension)
])

movie_model = tf.keras.Sequential([
    tf.keras.layers.StringLookup(vocabulary=movie_ids, mask_token=None),
    tf.keras.layers.Embedding(len(movie_ids) + 1, embedding_dimension)
])

# Define the task
task = tfrs.tasks.Retrieval(
    metrics=tfrs.metrics.FactorizedTopK(
        candidates=movies.batch(128).map(movie_model)
    )
)

# Create the model
model = MovieRecommenderModel(user_model, movie_model, task)
model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))
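
To build some intuition for what the retrieval task optimizes, here is a rough plain-Python sketch, assuming the default setup in which each user's score for an item is the dot product of their embeddings and the other items in the batch act as negatives. The function names and the tiny embeddings are hypothetical; this is not the library's implementation.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def in_batch_softmax_loss(user_embeddings, item_embeddings):
    """Sum of softmax cross-entropy terms where row i's positive is item i."""
    total = 0.0
    for i, user in enumerate(user_embeddings):
        scores = [dot(user, item) for item in item_embeddings]
        # Log-sum-exp with max subtraction for numerical stability.
        top = max(scores)
        log_normalizer = top + math.log(sum(math.exp(s - top) for s in scores))
        total += log_normalizer - scores[i]
    return total


users = [[1.0, 0.0], [0.0, 1.0]]
aligned_items = [[1.0, 0.0], [0.0, 1.0]]     # each user matches their own item
misaligned_items = [[0.0, 1.0], [1.0, 0.0]]  # each user matches the other item

aligned_loss = in_batch_softmax_loss(users, aligned_items)
misaligned_loss = in_batch_softmax_loss(users, misaligned_items)
print(aligned_loss < misaligned_loss)
```

The loss is lower when each user's embedding points toward the item they actually interacted with, which is what training pushes the two embedding tables to do.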

Step 3: Train the Model

Now let's train our recommendation model:

python
# Train the model
history = model.fit(train_batches, validation_data=test_batches, epochs=5)

# Example output (illustrative; step counts and metric values depend on the data split and will vary):
# Epoch 1/5
# 10/10 [==============================] - 3s 241ms/step - factorized_top_k/top_1_categorical_accuracy: 0.0140 - factorized_top_k/top_5_categorical_accuracy: 0.0658 - factorized_top_k/top_10_categorical_accuracy: 0.1222 - factorized_top_k/top_50_categorical_accuracy: 0.4291 - factorized_top_k/top_100_categorical_accuracy: 0.6192 - loss: 68233.9688 - val_factorized_top_k/top_1_categorical_accuracy: 0.0153 - val_factorized_top_k/top_5_categorical_accuracy: 0.0722 - val_factorized_top_k/top_10_categorical_accuracy: 0.1325 - val_factorized_top_k/top_50_categorical_accuracy: 0.4539 - val_factorized_top_k/top_100_categorical_accuracy: 0.6385 - val_loss: 16499.2715
# ...
# Epoch 5/5
# 10/10 [==============================] - 2s 229ms/step - factorized_top_k/top_1_categorical_accuracy: 0.0425 - factorized_top_k/top_5_categorical_accuracy: 0.1596 - factorized_top_k/top_10_categorical_accuracy: 0.2631 - factorized_top_k/top_50_categorical_accuracy: 0.6608 - factorized_top_k/top_100_categorical_accuracy: 0.8288 - loss: 50321.1562 - val_factorized_top_k/top_1_categorical_accuracy: 0.0440 - val_factorized_top_k/top_5_categorical_accuracy: 0.1650 - val_factorized_top_k/top_10_categorical_accuracy: 0.2685 - val_factorized_top_k/top_50_categorical_accuracy: 0.6697 - val_factorized_top_k/top_100_categorical_accuracy: 0.8349 - val_loss: 12341.4043
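
The `top_k_categorical_accuracy` values in the log report how often the true movie appears among the K highest-scoring candidates. A minimal plain-Python sketch of that idea follows; the function and the toy embeddings are hypothetical and only approximate what the metric does.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k_accuracy(user_embeddings, true_item_indices, item_embeddings, k):
    """Fraction of users whose true item ranks in the top k of all items."""
    hits = 0
    for user, true_index in zip(user_embeddings, true_item_indices):
        scores = [dot(user, item) for item in item_embeddings]
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if true_index in ranked[:k]:
            hits += 1
    return hits / len(user_embeddings)


items = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
users = [[1.0, 0.1], [0.2, 1.0], [0.9, 0.8]]
true_items = [0, 1, 0]  # index of the item each user actually interacted with

top_1 = top_k_accuracy(users, true_items, items, k=1)
top_2 = top_k_accuracy(users, true_items, items, k=2)
print(top_1, top_2)
```

Accuracy can only stay the same or rise as K grows, which matches the pattern in the training log where the top-100 figure is much higher than the top-1 figure.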

Step 4: Create a Recommender

Once the model is trained, we can create a recommender for making predictions:

python
# Create a model for recommendations
index = tfrs.layers.factorized_top_k.BruteForce(model.user_model)

# Index every movie by pairing each movie ID with its embedding
index.index_from_dataset(
    tf.data.Dataset.zip((
        movies.batch(100),
        movies.batch(100).map(model.movie_model)
    ))
)

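# Get recommendations for a specific user
user_id = "42"  # Example user ID
# The index returns (scores, identifiers); the identifiers here are movie IDs
_, recommended_ids = index(tf.constant([user_id]))
print(f"Recommendations for user {user_id}: {recommended_ids[0, :10].numpy()}")

# Example output (IDs are byte strings; actual recommendations depend on training):
# Recommendations for user 42: [b'318' b'169' b'222' b'173' b'733' b'174' b'181' b'313' b'234' b'903']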
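
Conceptually, a brute-force index scores every candidate against the query embedding and keeps the best K. The sketch below shows that idea in plain Python; the helper name and the two-dimensional embeddings are made up for illustration and are not taken from a trained model.

```python
import heapq


def brute_force_top_k(query, candidates, k):
    """Score every candidate by dot product and return the k best (score, id) pairs."""
    scored = (
        (sum(q * c for q, c in zip(query, embedding)), candidate_id)
        for candidate_id, embedding in candidates.items()
    )
    return heapq.nlargest(k, scored)


# Hypothetical movie embeddings keyed by movie ID.
movie_embeddings = {
    "318": [0.9, 0.1],
    "169": [0.1, 0.9],
    "222": [0.6, 0.6],
}
user_embedding = [1.0, 0.2]

top_movies = brute_force_top_k(user_embedding, movie_embeddings, k=2)
print([movie_id for _, movie_id in top_movies])
```

Scoring every candidate is exact but grows linearly with the catalog size, so very large catalogs typically call for an approximate nearest-neighbor index instead.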

Advanced Techniques with TensorFlow Recommenders

Now that we've built a basic model, let's explore some more advanced techniques that TFRS offers.

Two-Tower Model for Content-Based Recommendations

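The basic model above already has two towers (one for users, one for movies), but each tower only embeds an ID. For content-based recommendations, the movie tower can also incorporate item features: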

python
# Assumes a DataFrame `movies_df` with 'movie_id', 'title', and 'genres' columns
# (hypothetical here; it is not loaded earlier in this guide)
movie_titles = tf.data.Dataset.from_tensor_slices({
    'movie_id': movies_df['movie_id'].astype(str).values,
    'title': movies_df['title'].astype(str).values,
    'genres': movies_df['genres'].astype(str).values,
})

# Define a more complex movie tower
class MovieModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        
        # Movie ID embedding
        self.movie_id_embedding = tf.keras.Sequential([
            tf.keras.layers.StringLookup(vocabulary=movie_ids, mask_token=None),
            tf.keras.layers.Embedding(len(movie_ids) + 1, 32)
        ])
        
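        # Genre embedding. TextVectorization has no vocabulary until it is
        # adapted, so call self.genre_vectorizer.adapt(...) on the genre
        # strings before training. mask_zero=True keeps padding tokens out
        # of the average.
        self.genre_vectorizer = tf.keras.layers.TextVectorization(max_tokens=1000)
        self.genre_embedding = tf.keras.Sequential([
            self.genre_vectorizer,
            tf.keras.layers.Embedding(1000, 16, mask_zero=True),
            tf.keras.layers.GlobalAveragePooling1D()
        ])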
        
        # Combine embeddings
        self.combine = tf.keras.layers.Dense(32, activation="relu")
        
    def call(self, inputs):
        movie_id = self.movie_id_embedding(inputs["movie_id"])
        genre = self.genre_embedding(inputs["genres"])
        
        # Combine features
        combined = tf.concat([movie_id, genre], axis=1)
        return self.combine(combined)

# Use this model in our recommender system
complex_movie_model = MovieModel()
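
The feature-combination step in the movie tower boils down to two operations: average the per-token genre vectors into one vector, then concatenate it with the ID embedding. Here is a plain-Python sketch of just those two operations; the vectors and helper names are hypothetical, and the final dense layer is left out.

```python
def mean_pool(token_vectors):
    """Average a list of equal-length vectors element by element."""
    count = len(token_vectors)
    return [sum(column) / count for column in zip(*token_vectors)]


# Hypothetical 2-dimensional genre vectors (a real model learns these).
genre_vectors = {
    "action": [1.0, 0.0],
    "comedy": [0.0, 1.0],
    "drama": [0.5, 0.5],
}


def movie_features(id_vector, genres):
    """Concatenate the movie ID vector with the pooled genre vector."""
    pooled = mean_pool([genre_vectors[genre] for genre in genres])
    return id_vector + pooled  # list concatenation, like tf.concat on axis 1


features = movie_features([0.2, 0.4, 0.6], ["action", "comedy"])
print(features)
```

Pooling gives every movie a fixed-size genre vector no matter how many genres it lists, which is what lets it be concatenated with the ID embedding.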

Multi-Task Learning

TFRS also supports multi-task learning, where we can optimize for multiple objectives simultaneously:

python
class MultiTaskModel(tfrs.Model):
    def __init__(self, user_model, movie_model):
        super().__init__()
        self.user_model = user_model
        self.movie_model = movie_model
        
        # Define tasks
        self.retrieval_task = tfrs.tasks.Retrieval(
            metrics=tfrs.metrics.FactorizedTopK(
                candidates=movies.batch(128).map(movie_model)
            )
        )
        
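        # Ratings are real-valued, so use a regression loss explicitly
        self.rating_task = tfrs.tasks.Ranking(
            loss=tf.keras.losses.MeanSquaredError(),
            metrics=[tf.keras.metrics.RootMeanSquaredError()]
        )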
        
        self.rating_model = tf.keras.Sequential([
            tf.keras.layers.Dense(256, activation="relu"),
            tf.keras.layers.Dense(128, activation="relu"),
            tf.keras.layers.Dense(1)
        ])
        
    def call(self, features):
        user_embeddings = self.user_model(features["user_id"])
        movie_embeddings = self.movie_model(features["movie_id"])
        
        return (
            user_embeddings,
            movie_embeddings,
            self.rating_model(tf.concat([user_embeddings, movie_embeddings], axis=1))
        )
        
    def compute_loss(self, features, training=False):
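        # Requires a "rating" feature in each batch (the Step 1 dataset would
        # need to include the rating column for this model)
        ratings = features.pop("rating")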
        
        user_embeddings, movie_embeddings, rating_predictions = self(features)
        
        # Calculate retrieval loss
        retrieval_loss = self.retrieval_task(user_embeddings, movie_embeddings)
        
        # Calculate rating loss
        rating_loss = self.rating_task(
            labels=ratings,
            predictions=rating_predictions
        )
        
        # Combine the losses (equal weights here; scale either term to prioritize a task)
        return retrieval_loss + rating_loss
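
When the two losses live on very different scales, the weights decide which task dominates. The plain-Python sketch below illustrates this with made-up magnitudes; the helper names and numbers are hypothetical and are not outputs of the model above.

```python
def mean_squared_error(labels, predictions):
    """Average squared difference between labels and predictions."""
    return sum((y - p) ** 2 for y, p in zip(labels, predictions)) / len(labels)


def combine_losses(retrieval_loss, rating_loss,
                   retrieval_weight=1.0, rating_weight=1.0):
    """Weighted sum of the two task losses."""
    return retrieval_weight * retrieval_loss + rating_weight * rating_loss


retrieval_loss = 50000.0  # hypothetical summed retrieval loss for one batch
rating_loss = mean_squared_error([4.0, 5.0, 3.0], [3.5, 4.0, 3.0])

# With equal weights the rating term is a negligible share of the total.
unweighted = combine_losses(retrieval_loss, rating_loss)
# Down-weighting the retrieval term lets the rating task influence training.
rebalanced = combine_losses(retrieval_loss, rating_loss,
                            retrieval_weight=1e-4, rating_weight=1.0)

print(rating_loss / unweighted, rating_loss / rebalanced)
```

In practice the weights are hyperparameters, tuned by checking both the retrieval and the rating metrics on validation data.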

Real-World Application: E-Commerce Product Recommendations

Now let's see how TFRS can be applied in a real-world e-commerce scenario where we want to recommend products to users based on their browsing history.

python
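# Assumes `product_ids` and `category_ids` (string vocabularies) and `products`
# (a tf.data.Dataset of product feature dictionaries) are defined elsewhere.
# Define model layers for user features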
class UserModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        # Demographics embedding
        self.age_embedding = tf.keras.layers.Embedding(100, 16)  # Age bucketized
        self.gender_embedding = tf.keras.layers.Embedding(3, 8)  # M/F/Unknown
        
        # Browsing history embedding
        self.history_embedding = tf.keras.Sequential([
            tf.keras.layers.StringLookup(vocabulary=product_ids, mask_token=None),
            tf.keras.layers.Embedding(len(product_ids) + 1, 32)
        ])
        
        # Pooling layer for history
        self.history_pooling = tf.keras.layers.GlobalAveragePooling1D()
        
        # Combine all features
        self.combine = tf.keras.Sequential([
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dense(32)
        ])
        
    def call(self, inputs):
        # Process user features
        age = self.age_embedding(inputs["age_bucket"])
        gender = self.gender_embedding(inputs["gender"])
        
        # Process browsing history
        history = self.history_embedding(inputs["product_history"])
        history = self.history_pooling(history)
        
        # Combine features
        combined = tf.concat([age, gender, history], axis=1)
        return self.combine(combined)

# Product model with features
class ProductModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        # Product ID embedding
        self.product_embedding = tf.keras.Sequential([
            tf.keras.layers.StringLookup(vocabulary=product_ids, mask_token=None),
            tf.keras.layers.Embedding(len(product_ids) + 1, 32)
        ])
        
        # Category embedding
        self.category_embedding = tf.keras.Sequential([
            tf.keras.layers.StringLookup(vocabulary=category_ids, mask_token=None),
            tf.keras.layers.Embedding(len(category_ids) + 1, 16)
        ])
        
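        # Text description embedding (adapt the TextVectorization layer on the
        # product descriptions before training so it has a vocabulary)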
        self.text_embedding = tf.keras.Sequential([
            tf.keras.layers.TextVectorization(max_tokens=10000),
            tf.keras.layers.Embedding(10000, 32),
            tf.keras.layers.GlobalAveragePooling1D()
        ])
        
        # Combine features
        self.combine = tf.keras.Sequential([
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dense(32)
        ])
        
    def call(self, inputs):
        # Process product features
        product = self.product_embedding(inputs["product_id"])
        category = self.category_embedding(inputs["category"])
        description = self.text_embedding(inputs["description"])
        
        # Combine features
        combined = tf.concat([product, category, description], axis=1)
        return self.combine(combined)

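# Create a retrieval model. tfrs.models.Model is a base class, so we subclass
# it and implement compute_loss ourselves.
class EcommerceModel(tfrs.models.Model):
    def __init__(self, user_model, product_model, task):
        super().__init__()
        self.user_model = user_model
        self.product_model = product_model
        self.task = task

    def compute_loss(self, features, training=False):
        # Both towers read the feature keys they need from the same dictionary
        user_embeddings = self.user_model(features)
        product_embeddings = self.product_model(features)
        return self.task(user_embeddings, product_embeddings)

# Share one ProductModel instance between training and the metric candidates
product_model = ProductModel()
ecommerce_model = EcommerceModel(
    user_model=UserModel(),
    product_model=product_model,
    task=tfrs.tasks.Retrieval(
        metrics=tfrs.metrics.FactorizedTopK(
            candidates=products.batch(128).map(product_model)
        )
    )
)

# Compile the model
ecommerce_model.compile(optimizer=tf.keras.optimizers.Adam(0.01))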
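
The user tower expects `age_bucket` to be an integer index rather than a raw age. One simple way to produce such an index is shown below in plain Python; the boundaries are hypothetical and would normally be chosen from the age distribution in your own data.

```python
import bisect

# Hypothetical bucket boundaries: <18, 18-24, 25-34, 35-44, 45-54, 55-64, 65+
AGE_BOUNDARIES = [18, 25, 35, 45, 55, 65]


def age_bucket(age):
    """Map a raw age to an integer bucket index in [0, len(AGE_BOUNDARIES)]."""
    return bisect.bisect_right(AGE_BOUNDARIES, age)


for age in (17, 18, 30, 70):
    print(age, "->", age_bucket(age))
```

With these boundaries there are seven buckets, so the embedding table needs at least seven rows; the `Embedding(100, 16)` layer above leaves plenty of headroom.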

Deploying TensorFlow Recommenders Models

After training your recommender model, you'll want to deploy it to serve recommendations. Here's a basic approach using TensorFlow Serving:

python
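# Save the index. TensorFlow Serving expects each model version in a numbered
# subdirectory, so export to ".../1". Call the index at least once before
# saving (as in Step 4) so that it can be exported.
tf.saved_model.save(
    index,
    "path/to/export/dir/1"
)

# For deployment, you can use TensorFlow Serving.
# Docker needs an absolute host path for the volume mount:
# docker run -t --rm -p 8501:8501 \
#   -v "/absolute/path/to/export/dir:/models/recommender" \
#   -e MODEL_NAME=recommender \
#   tensorflow/serving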

Then you can make API requests to get recommendations:

python
import requests
import json

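# The exported index takes a single input, so each instance can be the raw
# user ID. If your export differs, inspect its signature with
# `saved_model_cli show` and adjust the payload to match.
data = {
    "instances": ["42"]
}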

response = requests.post(
    "http://localhost:8501/v1/models/recommender:predict", 
    data=json.dumps(data)
)

print(response.json())

Best Practices for Building Recommendation Systems

When building recommendation systems with TFRS, keep these best practices in mind:

Data quality is crucial - Clean your data and handle missing values appropriately
Balance recency and relevance - Consider time-decay factors for older interactions
Evaluate with proper metrics - Use metrics that align with your business goals
Handle the cold start problem - Have strategies for new users and items
Consider diversity and fairness - Avoid filter bubbles by introducing some novelty
Monitor performance over time - Data drift can affect recommendation quality
A/B test before deployment - Compare your new model against existing systems
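
As one way to act on the recency point above, older interactions can be down-weighted with an exponential decay. The sketch below is a minimal plain-Python illustration; the function name and the 30-day half-life are assumptions and should be tuned for your own data.

```python
def time_decay_weight(age_days, half_life_days=30.0):
    """Weight that halves every half_life_days; a brand-new interaction gets 1.0."""
    return 0.5 ** (age_days / half_life_days)


# Interactions from today, one month ago, and two months ago.
for age in (0, 30, 60):
    print(age, "days old ->", time_decay_weight(age))
```

Weights like these could be supplied as per-example sample weights during training so that recent behavior counts for more.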

Summary

In this guide, we've explored TensorFlow Recommenders (TFRS), a powerful library for building recommendation systems. We've covered:

Basic concepts of recommendation systems
Building a simple movie recommender with TFRS
Advanced techniques like two-tower models and multi-task learning
A real-world e-commerce recommendation example
Deployment strategies
Best practices for effective recommendation systems

TensorFlow Recommenders makes it easier to implement complex recommendation models that can scale to millions of users and items. By combining deep learning with specialized recommendation components, TFRS helps you create personalized experiences for your users.

Additional Resources

Exercises

Starter Project: Modify the simple movie recommender to include movie genres as features
Intermediate Project: Build a content-based book recommender using book descriptions and author information
Advanced Project: Create a hybrid recommendation system that combines collaborative filtering with content-based approaches using TFRS's multi-task capabilities

By working through these exercises, you'll gain practical experience with TensorFlow Recommenders and develop the skills needed to build effective recommendation systems for real-world applications.
