TensorFlow Recommenders
Recommendation systems are everywhere in our digital lives, from suggesting products on e-commerce platforms to recommending movies on streaming services. TensorFlow Recommenders (TFRS) is a library that simplifies building, evaluating, and deploying recommendation models.
In this guide, we'll explore how to use TensorFlow Recommenders to build effective recommendation systems, even if you're just getting started with machine learning.
What is TensorFlow Recommenders?
TensorFlow Recommenders is an open-source library built on top of TensorFlow that provides tools and algorithms specifically designed for building recommendation systems. It was developed to address the unique challenges that come with recommendation tasks, such as:
- Working with sparse user-item interaction data
- Handling large-scale categorical features
- Balancing multiple objectives (relevance, diversity, freshness)
- Creating efficient retrieval systems that can operate on millions or billions of items
TFRS simplifies these tasks by providing modular components that can be easily combined to create custom recommendation models.
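To make the first challenge concrete, here is a minimal sketch that measures how sparse an interaction log is. The `interactions` list and the `interaction_density` helper are made up for illustration and are not part of TFRS.

```python
# Hypothetical toy interaction log: (user_id, item_id) pairs
interactions = [
    ("u1", "i1"), ("u1", "i3"),
    ("u2", "i2"),
    ("u3", "i1"), ("u3", "i4"),
]

def interaction_density(pairs):
    """Fraction of the user-item matrix that has an observed interaction."""
    unique_pairs = set(pairs)
    n_users = len({user for user, _ in unique_pairs})
    n_items = len({item for _, item in unique_pairs})
    return len(unique_pairs) / (n_users * n_items)

density = interaction_density(interactions)
print(f"density = {density:.3f}, sparsity = {1 - density:.3f}")
```

Real interaction logs are usually far sparser than this toy example, which is one reason learned embeddings tend to work better than storing the full user-item matrix.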
Getting Started with TensorFlow Recommenders
Installation
Before we dive into examples, let's install TensorFlow Recommenders:
pip install tensorflow-recommenders
You'll also need TensorFlow installed:
pip install tensorflow
Basic Imports
To begin using TFRS, we need to import the necessary libraries:
import tensorflow as tf
import tensorflow_recommenders as tfrs
import numpy as np
import pandas as pd
Building a Basic Recommendation Model
Let's start with a simple movie recommendation system using the MovieLens dataset. We'll build a model that learns to recommend movies to users based on their past interactions.
Step 1: Prepare the Data
First, let's load and prepare the MovieLens 100K dataset:
# Load the MovieLens 100K dataset
ratings = pd.read_csv(
    'https://files.grouplens.org/datasets/movielens/ml-100k/u.data',
    sep='\t', names=['user_id', 'movie_id', 'rating', 'timestamp'])
# The IDs are integers in the file. Convert them to strings in pandas/NumPy,
# because TensorFlow cannot cast integers to tf.string
user_ids = np.unique(ratings['user_id'].to_numpy(dtype=str))
movie_ids = np.unique(ratings['movie_id'].to_numpy(dtype=str))
# Create datasets of users and movies
users = tf.data.Dataset.from_tensor_slices(user_ids)
movies = tf.data.Dataset.from_tensor_slices(movie_ids)
# Create a dataset of (user_id, movie_id) pairs with positive ratings
positive_ratings = ratings[ratings['rating'] >= 4]
rating_pairs = tf.data.Dataset.from_tensor_slices({
    "user_id": positive_ratings['user_id'].to_numpy(dtype=str),
    "movie_id": positive_ratings['movie_id'].to_numpy(dtype=str),
})
# Shuffle once, then split 80/20. Filtering removes many rows, so derive the
# split sizes from the filtered data instead of hard-coding them
tf.random.set_seed(42)
num_pairs = len(positive_ratings)
num_train = int(0.8 * num_pairs)
shuffled = rating_pairs.shuffle(num_pairs, seed=42, reshuffle_each_iteration=False)
train = shuffled.take(num_train)
test = shuffled.skip(num_train)
# Batch the data
train_batches = train.batch(8192).cache()
test_batches = test.batch(4096).cache()
Step 2: Define the Model
Now, let's define our recommendation model using TFRS:
class MovieRecommenderModel(tfrs.Model):
    def __init__(self, user_model, movie_model, task):
        super().__init__()
        # Set up user and movie representations
        self.user_model = user_model
        self.movie_model = movie_model
        # Set up a retrieval task
        self.task = task

    def compute_loss(self, features, training=False):
        # Embed the user and movie IDs
        user_embeddings = self.user_model(features["user_id"])
        movie_embeddings = self.movie_model(features["movie_id"])
        # Compute the loss
        return self.task(user_embeddings, movie_embeddings)

# Define embedding dimensions
embedding_dimension = 32
# Create user and movie models
# (the + 1 leaves room for the out-of-vocabulary index added by StringLookup)
user_model = tf.keras.Sequential([
    tf.keras.layers.StringLookup(vocabulary=user_ids, mask_token=None),
    tf.keras.layers.Embedding(len(user_ids) + 1, embedding_dimension)
])
movie_model = tf.keras.Sequential([
    tf.keras.layers.StringLookup(vocabulary=movie_ids, mask_token=None),
    tf.keras.layers.Embedding(len(movie_ids) + 1, embedding_dimension)
])
# Define the task
task = tfrs.tasks.Retrieval(
    metrics=tfrs.metrics.FactorizedTopK(
        candidates=movies.batch(128).map(movie_model)
    )
)
# Create the model
model = MovieRecommenderModel(user_model, movie_model, task)
model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))
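It can help to see what the retrieval task optimizes. As far as we understand its default behavior, the task scores every user in a batch against every movie in the same batch, treats the matching pair as the positive and the other movies as negatives, and sums a softmax cross-entropy over the batch (which is also why the reported loss values are large). The pure-Python sketch below illustrates that idea with made-up two-dimensional embeddings; `in_batch_softmax_loss` is a hypothetical helper, not a TFRS function.

```python
import math

def in_batch_softmax_loss(user_embeddings, item_embeddings):
    """Summed cross-entropy where row i's positive is item i and the
    other items in the batch act as negatives."""
    total = 0.0
    for i, user in enumerate(user_embeddings):
        # Dot-product score of this user against every item in the batch
        scores = [sum(u * v for u, v in zip(user, item)) for item in item_embeddings]
        # Numerically stable log-sum-exp over the scores
        max_score = max(scores)
        log_norm = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
        total += log_norm - scores[i]
    return total

user_vecs = [[1.0, 0.0], [0.0, 1.0]]
items_aligned = [[2.0, 0.0], [0.0, 2.0]]  # each user points toward its own item
items_swapped = [[0.0, 2.0], [2.0, 0.0]]  # each user points toward the other item

loss_aligned = in_batch_softmax_loss(user_vecs, items_aligned)
loss_swapped = in_batch_softmax_loss(user_vecs, items_swapped)
print(loss_aligned, loss_swapped)
```

The loss is small when each user embedding is closest to its own item and large when it is closer to another item in the batch, which is the signal that pulls matching pairs together during training.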
Step 3: Train the Model
Now let's train our recommendation model:
# Train the model
history = model.fit(train_batches, validation_data=test_batches, epochs=5)
# Output example (illustrative; step counts and metric values will vary between runs):
# Epoch 1/5
# 10/10 [==============================] - 3s 241ms/step - factorized_top_k/top_1_categorical_accuracy: 0.0140 - factorized_top_k/top_5_categorical_accuracy: 0.0658 - factorized_top_k/top_10_categorical_accuracy: 0.1222 - factorized_top_k/top_50_categorical_accuracy: 0.4291 - factorized_top_k/top_100_categorical_accuracy: 0.6192 - loss: 68233.9688 - val_factorized_top_k/top_1_categorical_accuracy: 0.0153 - val_factorized_top_k/top_5_categorical_accuracy: 0.0722 - val_factorized_top_k/top_10_categorical_accuracy: 0.1325 - val_factorized_top_k/top_50_categorical_accuracy: 0.4539 - val_factorized_top_k/top_100_categorical_accuracy: 0.6385 - val_loss: 16499.2715
# ...
# Epoch 5/5
# 10/10 [==============================] - 2s 229ms/step - factorized_top_k/top_1_categorical_accuracy: 0.0425 - factorized_top_k/top_5_categorical_accuracy: 0.1596 - factorized_top_k/top_10_categorical_accuracy: 0.2631 - factorized_top_k/top_50_categorical_accuracy: 0.6608 - factorized_top_k/top_100_categorical_accuracy: 0.8288 - loss: 50321.1562 - val_factorized_top_k/top_1_categorical_accuracy: 0.0440 - val_factorized_top_k/top_5_categorical_accuracy: 0.1650 - val_factorized_top_k/top_10_categorical_accuracy: 0.2685 - val_factorized_top_k/top_50_categorical_accuracy: 0.6697 - val_factorized_top_k/top_100_categorical_accuracy: 0.8349 - val_loss: 12341.4043
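The `top_k_categorical_accuracy` metrics in the log report how often the movie a user actually interacted with appears among the k highest-scoring candidates. A minimal sketch of that calculation, using hypothetical scores rather than real model output:

```python
def top_k_accuracy(score_rows, true_indices, k):
    """Fraction of queries whose true candidate is among the k highest scores."""
    hits = 0
    for scores, true_idx in zip(score_rows, true_indices):
        # Indices of the k best-scoring candidates for this query
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        hits += true_idx in ranked[:k]
    return hits / len(score_rows)

# Hypothetical scores for 3 queries over 4 candidates
score_rows = [
    [0.9, 0.1, 0.3, 0.2],  # true candidate 0 is ranked 1st
    [0.2, 0.4, 0.8, 0.6],  # true candidate 1 is ranked 3rd
    [0.5, 0.7, 0.1, 0.6],  # true candidate 3 is ranked 2nd
]
true_indices = [0, 1, 3]

for k in (1, 2, 3):
    print(k, top_k_accuracy(score_rows, true_indices, k))
```

Accuracy can only stay the same or grow as k grows, which matches the pattern in the training log where top-100 accuracy is much higher than top-1.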
Step 4: Create a Recommender
Once the model is trained, we can create a recommender for making predictions:
# Build a brute-force retrieval index that embeds queries with the user model
index = tfrs.layers.factorized_top_k.BruteForce(model.user_model)
# Index every candidate movie as (movie ID, movie embedding) pairs
index.index_from_dataset(
    tf.data.Dataset.zip((
        movies.batch(100),
        movies.batch(100).map(model.movie_model)
    ))
)
# Get the top 10 recommendations for a specific user
user_id = "42"  # Example user ID
# The index returns (scores, identifiers); the identifiers here are movie IDs
_, recommended_ids = index(tf.constant([user_id]))
print(f"Recommendations for user {user_id}: {recommended_ids[0, :10].numpy()}")
# Output:
# Recommendations for user 42: ['318' '169' '222' '173' '733' '174' '181' '313' '234' '903']
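Conceptually, a brute-force index scores the query embedding against every candidate embedding and keeps the best k. The sketch below shows that idea in plain Python; the candidate vectors and the `brute_force_top_k` helper are invented for illustration.

```python
import heapq

def brute_force_top_k(query, candidates, k):
    """Score the query against every candidate and return the k best IDs."""
    scored = (
        (sum(q * c for q, c in zip(query, vector)), candidate_id)
        for candidate_id, vector in candidates.items()
    )
    return [candidate_id for _, candidate_id in heapq.nlargest(k, scored)]

# Hypothetical movie embeddings keyed by movie ID
candidates = {
    "m1": [1.0, 0.0],
    "m2": [0.0, 1.0],
    "m3": [0.7, 0.7],
    "m4": [-1.0, 0.0],
}
query = [1.0, 0.5]  # hypothetical user embedding

top_two = brute_force_top_k(query, candidates, k=2)
print(top_two)
```

Because every candidate is scored, the cost grows linearly with catalog size. For very large catalogs an approximate index (TFRS ships a ScaNN-based layer) is usually the better fit.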
Advanced Techniques with TensorFlow Recommenders
Now that we've built a basic model, let's explore some more advanced techniques that TFRS offers.
Two-Tower Model for Content-Based Recommendations
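The basic model above already has two towers (one for users, one for movies), but each tower only sees an ID. For content-based recommendations, we can extend the movie tower so that it also incorporates item features: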
# Assume movies_df is a pandas DataFrame with 'movie_id', 'title', and 'genres'
# columns (it is not created earlier in this guide)
movie_titles = tf.data.Dataset.from_tensor_slices({
    'movie_id': movies_df['movie_id'].to_numpy(dtype=str),
    'title': movies_df['title'].to_numpy(dtype=str),
    'genres': movies_df['genres'].to_numpy(dtype=str),
})

# Define a more complex movie tower
class MovieModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        # Movie ID embedding
        self.movie_id_embedding = tf.keras.Sequential([
            tf.keras.layers.StringLookup(vocabulary=movie_ids, mask_token=None),
            tf.keras.layers.Embedding(len(movie_ids) + 1, 32)
        ])
        # Genre embedding. TextVectorization has to learn its vocabulary with
        # adapt() before it can be called
        genre_vectorizer = tf.keras.layers.TextVectorization(max_tokens=1000)
        genre_vectorizer.adapt(movies_df['genres'].to_numpy(dtype=str))
        self.genre_embedding = tf.keras.Sequential([
            genre_vectorizer,
            # mask_zero=True keeps padding tokens out of the average
            tf.keras.layers.Embedding(1000, 16, mask_zero=True),
            tf.keras.layers.GlobalAveragePooling1D()
        ])
        # Combine embeddings
        self.combine = tf.keras.layers.Dense(32, activation="relu")

    def call(self, inputs):
        movie_id = self.movie_id_embedding(inputs["movie_id"])
        genre = self.genre_embedding(inputs["genres"])
        # Combine features
        combined = tf.concat([movie_id, genre], axis=1)
        return self.combine(combined)

# Use this model in our recommender system
complex_movie_model = MovieModel()
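The genre branch turns a variable number of genre tokens into one fixed-size vector by averaging their embeddings. Here is a minimal pure-Python sketch of that pooling step, assuming token ID 0 is padding (as it is for `TextVectorization`) and using a made-up two-dimensional embedding table; `masked_mean_pool` is a hypothetical helper.

```python
def masked_mean_pool(token_ids, embedding_table, pad_id=0):
    """Average the embeddings of all non-padding tokens."""
    vectors = [embedding_table[t] for t in token_ids if t != pad_id]
    dim = len(embedding_table[pad_id])
    if not vectors:
        # Nothing but padding: fall back to a zero vector
        return [0.0] * dim
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

# Hypothetical embedding table: token ID -> vector
embedding_table = {0: [0.0, 0.0], 1: [1.0, 3.0], 2: [3.0, 1.0]}

# Two real genre tokens followed by two padding tokens
pooled = masked_mean_pool([1, 2, 0, 0], embedding_table)

# Concatenate with a (made-up) movie ID embedding, as the tower does
movie_id_vector = [0.5, -0.5]
combined = movie_id_vector + pooled
print(pooled, combined)
```

Without the mask, the two padding tokens would be averaged in as zeros and shrink the result, so movies with few genres would look artificially different from movies with many.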
Multi-Task Learning
TFRS also supports multi-task learning, where we can optimize for multiple objectives simultaneously:
class MultiTaskModel(tfrs.Model):
    def __init__(self, user_model, movie_model,
                 retrieval_weight=1.0, rating_weight=1.0):
        super().__init__()
        self.user_model = user_model
        self.movie_model = movie_model
        # Relative weight of each task in the combined loss
        self.retrieval_weight = retrieval_weight
        self.rating_weight = rating_weight
        # Define tasks
        self.retrieval_task = tfrs.tasks.Retrieval(
            metrics=tfrs.metrics.FactorizedTopK(
                candidates=movies.batch(128).map(movie_model)
            )
        )
        # Ranking defaults to a binary cross-entropy loss, so request a
        # regression loss for numeric ratings
        self.rating_task = tfrs.tasks.Ranking(
            loss=tf.keras.losses.MeanSquaredError(),
            metrics=[tf.keras.metrics.RootMeanSquaredError()]
        )
        self.rating_model = tf.keras.Sequential([
            tf.keras.layers.Dense(256, activation="relu"),
            tf.keras.layers.Dense(128, activation="relu"),
            tf.keras.layers.Dense(1)
        ])

    def call(self, features):
        user_embeddings = self.user_model(features["user_id"])
        movie_embeddings = self.movie_model(features["movie_id"])
        return (
            user_embeddings,
            movie_embeddings,
            self.rating_model(tf.concat([user_embeddings, movie_embeddings], axis=1))
        )

    def compute_loss(self, features, training=False):
        # Each example needs a numeric "rating" alongside the two IDs;
        # read it without mutating the input dictionary
        ratings = tf.cast(features["rating"], tf.float32)
        user_embeddings, movie_embeddings, rating_predictions = self(features)
        # Calculate retrieval loss
        retrieval_loss = self.retrieval_task(user_embeddings, movie_embeddings)
        # Calculate rating loss
        rating_loss = self.rating_task(
            labels=ratings,
            predictions=rating_predictions
        )
        # Combine losses with weights
        return (self.retrieval_weight * retrieval_loss
                + self.rating_weight * rating_loss)
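How the two losses are combined decides which objective dominates training. The sketch below uses hypothetical per-batch loss values to show the effect of different weights; the numbers are invented and only meant to illustrate the difference in scale that a summed retrieval loss and an averaged rating loss can have.

```python
def combined_loss(retrieval_loss, rating_loss, retrieval_weight=1.0, rating_weight=1.0):
    """Weighted sum of the two task losses."""
    return retrieval_weight * retrieval_loss + rating_weight * rating_loss

# Hypothetical per-batch losses on very different scales
retrieval_loss, rating_loss = 900.0, 1.2

for retrieval_weight, rating_weight in [(1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]:
    total = combined_loss(retrieval_loss, rating_loss, retrieval_weight, rating_weight)
    share = rating_weight * rating_loss / total
    print(f"weights=({retrieval_weight}, {rating_weight}) "
          f"total={total:.1f} rating share={share:.3f}")
```

With equal weights the rating term contributes almost nothing here, so in practice the weights are worth tuning as hyperparameters, and setting one weight to zero gives a single-task baseline to compare against.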
Real-World Application: E-Commerce Product Recommendations
Now let's see how TFRS can be applied in a real-world e-commerce scenario where we want to recommend products to users based on their browsing history.
# product_ids and category_ids (lists of string IDs) and products (a
# tf.data.Dataset of product feature dictionaries) are assumed to exist
# for your own catalog; they are not defined in this guide

# Define model layers for user features
class UserModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        # Demographics embedding (inputs are integer indices)
        self.age_embedding = tf.keras.layers.Embedding(100, 16)  # Age bucketized
        self.gender_embedding = tf.keras.layers.Embedding(3, 8)  # M/F/Unknown
        # Browsing history embedding
        self.history_embedding = tf.keras.Sequential([
            tf.keras.layers.StringLookup(vocabulary=product_ids, mask_token=None),
            tf.keras.layers.Embedding(len(product_ids) + 1, 32)
        ])
        # Pooling layer for history
        self.history_pooling = tf.keras.layers.GlobalAveragePooling1D()
        # Combine all features
        self.combine = tf.keras.Sequential([
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dense(32)
        ])

    def call(self, inputs):
        # Process user features
        age = self.age_embedding(inputs["age_bucket"])
        gender = self.gender_embedding(inputs["gender"])
        # Process browsing history: [batch, history_length] -> [batch, 32]
        history = self.history_embedding(inputs["product_history"])
        history = self.history_pooling(history)
        # Combine features
        combined = tf.concat([age, gender, history], axis=1)
        return self.combine(combined)
# Product model with features
class ProductModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        # Product ID embedding
        self.product_embedding = tf.keras.Sequential([
            tf.keras.layers.StringLookup(vocabulary=product_ids, mask_token=None),
            tf.keras.layers.Embedding(len(product_ids) + 1, 32)
        ])
        # Category embedding
        self.category_embedding = tf.keras.Sequential([
            tf.keras.layers.StringLookup(vocabulary=category_ids, mask_token=None),
            tf.keras.layers.Embedding(len(category_ids) + 1, 16)
        ])
        # Text description embedding. TextVectorization has to learn its
        # vocabulary with adapt() before it can be called
        description_vectorizer = tf.keras.layers.TextVectorization(max_tokens=10000)
        description_vectorizer.adapt(
            products.map(lambda x: x["description"]).batch(128)
        )
        self.text_embedding = tf.keras.Sequential([
            description_vectorizer,
            # mask_zero=True keeps padding tokens out of the average
            tf.keras.layers.Embedding(10000, 32, mask_zero=True),
            tf.keras.layers.GlobalAveragePooling1D()
        ])
        # Combine features
        self.combine = tf.keras.Sequential([
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dense(32)
        ])

    def call(self, inputs):
        # Process product features
        product = self.product_embedding(inputs["product_id"])
        category = self.category_embedding(inputs["category"])
        description = self.text_embedding(inputs["description"])
        # Combine features
        combined = tf.concat([product, category, description], axis=1)
        return self.combine(combined)
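# Create a retrieval model. tfrs.models.Model is a base class: subclass it and
# implement compute_loss rather than passing the towers to its constructor
class EcommerceModel(tfrs.models.Model):
    def __init__(self, user_model, product_model, task):
        super().__init__()
        self.user_model = user_model
        self.product_model = product_model
        self.task = task

    def compute_loss(self, features, training=False):
        # Each tower picks the keys it needs from the feature dictionary
        user_embeddings = self.user_model(features)
        product_embeddings = self.product_model(features)
        return self.task(user_embeddings, product_embeddings)

# Share one ProductModel instance between training and the metric candidates
product_model = ProductModel()
ecommerce_model = EcommerceModel(
    user_model=UserModel(),
    product_model=product_model,
    task=tfrs.tasks.Retrieval(
        metrics=tfrs.metrics.FactorizedTopK(
            candidates=products.batch(128).map(product_model)
        )
    )
)
# Compile the model
ecommerce_model.compile(optimizer=tf.keras.optimizers.Adam(0.01))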
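The user tower expects `age_bucket` to be an integer index, not a raw age. A minimal sketch of one way to bucketize ages before feeding them to the model; the boundaries and the `age_to_bucket` helper are assumptions chosen for illustration.

```python
import bisect

# Hypothetical bucket boundaries in years; choose ones that suit your users
AGE_BOUNDARIES = [18, 25, 35, 45, 55, 65]

def age_to_bucket(age):
    """Map an age to a bucket index between 0 and len(AGE_BOUNDARIES)."""
    return bisect.bisect_right(AGE_BOUNDARIES, age)

for age in (17, 18, 30, 70):
    print(age, age_to_bucket(age))
```

With six boundaries there are seven buckets, so an embedding table of size 7 would be enough for this feature. Keras also offers a `Discretization` preprocessing layer if you prefer to keep the bucketing inside the model.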
Deploying TensorFlow Recommenders Models
After training your recommender model, you'll want to deploy it to serve recommendations. Here's a basic approach using TensorFlow Serving:
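# Save the index (call it at least once first, as in Step 4, so it is built).
# TensorFlow Serving expects a numeric version subdirectory, here "1"
tf.saved_model.save(
    index,
    "path/to/export/dir/1"
)
# For deployment, you can use TensorFlow Serving
# (Docker needs an absolute host path for the volume mount)
# docker run -t --rm -p 8501:8501 \
# -v "/absolute/path/to/export/dir:/models/recommender" \
# -e MODEL_NAME=recommender \
# tensorflow/serving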
Then you can make API requests to get recommendations:
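import requests
# The exported index takes a single input (a batch of user IDs), so each
# instance can be passed as a bare value. The input name in the exported
# signature is generated automatically; inspect it with
# `saved_model_cli show --dir path/to/export/dir/1 --all` if you need it
data = {
    "instances": ["42"]
}
response = requests.post(
    "http://localhost:8501/v1/models/recommender:predict",
    json=data
)
print(response.json())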
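The JSON handling on the client side can be sketched without a running server. The helpers below build a row-format request body and pull the recommended IDs out of a decoded response. The output key name `output_2` and the sample response are assumptions; the real key names depend on your exported signature.

```python
import json

def build_predict_payload(user_ids):
    """JSON body for TensorFlow Serving's REST predict endpoint (row format)."""
    return json.dumps({"instances": list(user_ids)})

def extract_recommendations(response_body, ids_key="output_2"):
    """Pull the recommended IDs out of a decoded predict response.

    ids_key is an assumption; check the exported signature for the real name.
    """
    return [row[ids_key] for row in response_body["predictions"]]

# Hypothetical response, shaped like a two-output model in row format
fake_response = {
    "predictions": [
        {"output_1": [3.2, 2.9], "output_2": ["318", "169"]}
    ]
}

payload = build_predict_payload(["42"])
recommended = extract_recommendations(fake_response)
print(payload)
print(recommended)
```

Keeping the key name as a parameter makes it easy to adapt once you have inspected the signature of your own export.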
Best Practices for Building Recommendation Systems
When building recommendation systems with TFRS, keep these best practices in mind:
- Data quality is crucial: clean your data and handle missing values appropriately
- Balance recency and relevance: consider time-decay factors for older interactions
- Evaluate with proper metrics: use metrics that align with your business goals
- Handle the cold start problem: have strategies for new users and items
- Consider diversity and fairness: avoid filter bubbles by introducing some novelty
- Monitor performance over time: data drift can affect recommendation quality
- A/B test before full rollout: compare your new model against existing systems
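The time-decay idea from the list above can be sketched as an exponential weight that halves every fixed number of days. The 30-day half-life and the `time_decay_weight` helper are assumptions for illustration; pick a half-life that matches how quickly tastes change in your domain.

```python
def time_decay_weight(age_days, half_life_days=30.0):
    """Weight of an interaction that happened age_days ago."""
    return 0.5 ** (age_days / half_life_days)

# Hypothetical interactions: (movie_id, age of the interaction in days)
interactions = [("318", 0), ("169", 30), ("222", 60)]

weights = {movie_id: time_decay_weight(age) for movie_id, age in interactions}
print(weights)
```

Weights like these could be passed as per-example weights during training (the TFRS retrieval task accepts a `sample_weight` argument) so that recent interactions influence the model more than old ones.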
Summary
In this guide, we've explored TensorFlow Recommenders (TFRS), a powerful library for building recommendation systems. We've covered:
- Basic concepts of recommendation systems
- Building a simple movie recommender with TFRS
- Advanced techniques like two-tower models and multi-task learning
- A real-world e-commerce recommendation example
- Deployment strategies
- Best practices for effective recommendation systems
TensorFlow Recommenders makes it easier to implement complex recommendation models that can scale to millions of users and items. By combining deep learning with specialized recommendation components, TFRS helps you create personalized experiences for your users.
Exercises
- Starter Project: Modify the simple movie recommender to include movie genres as features
- Intermediate Project: Build a content-based book recommender using book descriptions and author information
- Advanced Project: Create a hybrid recommendation system that combines collaborative filtering with content-based approaches using TFRS's multi-task capabilities
By working through these exercises, you'll gain practical experience with TensorFlow Recommenders and develop the skills needed to build effective recommendation systems for real-world applications.