While everyone seems captivated by AGI and busy building Just-Another-Chatbot (JAC™), they’re overlooking the real problems that can be solved (and the money to be made) through practical ML Engineering. In this inaugural series of blog posts, I’ll dive deep into one such problem: bootstrapping and building a brand recommender system from the ground up. Drawing from my experience as an engineer, executive, and consultant across multiple consumer tech companies and marketplaces, I’ll guide you through the process of creating a recommendation engine that not only predicts customer preferences but also enhances their overall shopping experience.
We’ll explore the journey from raw data to a sophisticated system that can significantly boost your bottom line while offering your customers a personalized and delightful experience. So, let’s embark on this data-driven adventure and unlock the potential of personalized recommendations in e-commerce. By the end of this series, you’ll have the tools to transform your e-commerce platform into a cash machine that keeps both your customers and CFOs happy.
The Problem: E-commerce Is a Jungle (And Your Customers Are Lost)
Imagine you’ve just launched an e-commerce platform. It’s sleek, efficient, and boasts an impressive array of brands. Initially, you’re confident in its success. However, user feedback quickly reveals a common challenge in the e-commerce world:
- “I’m overwhelmed by the number of options.”
- “The recommendations don’t seem relevant to my interests.”
- “I can’t find products that match my specific needs.”
These concerns are not uncommon. In the realm of online retail, the balance between variety and accessibility is crucial. Insufficient options can leave customers feeling limited, while an overabundance can lead to decision fatigue. This phenomenon, often referred to as the “paradox of choice,” can significantly impact user experience and, consequently, your conversion rates.
This, my friends, is why a good recommendation system is invaluable. It’s like having a wise, all-knowing friend who gently guides your customers to their next favorite purchase. And today, we’re going to build that friend from scratch.
Step 0: The “Oh Crap, We Have No Data” Phase - Totally Random Model
Let’s start at the beginning (a very good place to start…). You’ve just launched your e-commerce platform. It looks great, it works smoothly, but there’s one big problem: you have no data. Your recommendation system is a blank slate. What do you do?
Take a deep breath and repeat after me: “Random is better than nothing.” Now, I know what you’re thinking. “But this is just throwing darts blindfolded! How is this helping anyone?” And you’re right, it’s not ideal. But here’s the secret: it’s not about being perfect; it’s about starting the flywheel.
Every time a user sees a random recommendation, you’re gathering data. Maybe they ignore it (data point!). Maybe they click on it (data point!). Maybe they buy it (cha-ching and data point!). Every product interaction (or lack thereof) is a breadcrumb that will lead you out of the data desert.
Let’s whip up a quick Python function to generate random recommendations:
import random

def get_random_recommendations(all_brands, n=5):
    return random.sample(all_brands, min(n, len(all_brands)))

# Example usage (df is assumed to be a DataFrame of your brand catalog / logged interactions)
all_brands = df['recommending_brand'].unique().tolist()
print(get_random_recommendations(all_brands))
Pro tip: While you’re showing random recommendations, make sure you’re logging EVERYTHING. Every view, every click, every purchase. This data will be worth its weight in gold later on. Trust me, future you will thank present you for this foresight. And while you’re at it, make sure the data is high quality. Algorithms are fickle and the state of the art changes every week - nay, every day! - but poor-quality data, once logged, sets the ceiling on everything you build on top of it.
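To make “log EVERYTHING” a bit more concrete, here’s a minimal sketch of what an event logger could look like. The schema (user_id, brand, event_type, timestamp) and the append-to-a-JSONL-file approach are illustrative assumptions; in practice you’d write to whatever event pipeline or warehouse you already have:

import json
import time

def log_event(user_id, brand, event_type, log_path="interaction_log.jsonl"):
    # Append one interaction event (impression, click, purchase, ...) as a JSON line
    event = {
        "user_id": user_id,
        "brand": brand,
        "event_type": event_type,
        "timestamp": time.time(),
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(event) + "\n")

# Example usage: user 42 was shown a random recommendation and clicked it
log_event(user_id=42, brand="Nike", event_type="impression")
log_event(user_id=42, brand="Nike", event_type="click")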
Step 1: The “We Have Some Data, But It’s Not About Users” Phase - Feature-Based Clustering Model
Alright, so you’ve been running your random recommendation engine for a while. You’ve got some sales, you’ve got some brand data, but you still don’t have enough user interaction data to build a proper collaborative filtering system. Don’t worry, we’re going to make lemonade out of these lemons.
Enter: Feature-Based Clustering.
Now, gather ’round, because I’m about to share a secret: brands, like people, have personalities. And just like you wouldn’t set up your quiet, bookish friend with your party-animal cousin (trust me, I’ve made that mistake), you shouldn’t be recommending wildly dissimilar brands to your users.
Let’s create a simple example using K-means clustering. Don’t let the fancy name scare you - it’s just a way of grouping similar things together.
from sklearn.cluster import KMeans
import numpy as np
import random

# Example brand features (price, target_age, sportiness)
brand_features = {
    "Nike": [80, 25, 9],
    "Adidas": [75, 30, 8],
    "Puma": [60, 28, 7],
    "Reebok": [65, 35, 6],
    "Under Armour": [70, 27, 9],
    "New Balance": [85, 40, 5],
    "Asics": [90, 35, 8],
    "Converse": [55, 22, 3],
    "Vans": [50, 20, 2],
    "Skechers": [45, 45, 4]
}

def cluster_brands(brand_features, n_clusters=3):
    brands = list(brand_features.keys())
    features = np.array(list(brand_features.values()))

    kmeans = KMeans(n_clusters=n_clusters, random_state=42)
    clusters = kmeans.fit_predict(features)

    brand_clusters = {brand: cluster for brand, cluster in zip(brands, clusters)}
    return brand_clusters

brand_clusters = cluster_brands(brand_features)
print(brand_clusters)

def get_cluster_recommendations(purchased_brand, brand_clusters, n=5):
    cluster = brand_clusters[purchased_brand]
    cluster_brands = [brand for brand, c in brand_clusters.items() if c == cluster]
    return random.sample(cluster_brands, min(n, len(cluster_brands)))

# Example usage
purchased_brand = "Nike"
print(get_cluster_recommendations(purchased_brand, brand_clusters))
Let me share a real-world example that illustrates the power of this approach. Back in the day I worked with a startup that had a range of products - everything from $5 friendship bracelets that screamed “summer camp chic” to $500 mid-century modern chairs that whispered “I have a trust fund.” Their recommendation system? About as sophisticated as a Magic 8-Ball with a hangover. It was the digital equivalent of that one clueless sales associate who tries to upsell you a tuxedo when you’re shopping for gym shorts.
The result? Their conversion rate was lower than my undergrad GPA (and trust me, that’s saying something). They addressed this by implementing a clustering system based on various product attributes. We’re talking price, category, style, color palette - if it could be quantified, they clustered it. And boom! Faster than you can say “artisanal hand-knitted cat sweater,” their click-through rates went through the roof! Their engineering blog was practically giddy with excitement (in that restrained, data-scientist kind of way) about the boost in overall sales.
Now, a good data science algorithm needs to do at least two things (and definitely the second of these two):
1. It needs to improve a KPI - in this case conversions, although it can be anything (e.g. if you are a Ferengi it will invariably be profits, as laid out in the Ferengi Rules of Acquisition).
2. Regardless of whether it does 1., it needs to light the way to how you might improve the KPI in the future.
In this case, it wasn’t just the algorithm doing the work automatically, but the entire system we built around it. We maintained a degree of randomness in our recommendations to continue generating new data and learning from it. It’s like saying, “We know you’re interested in vintage decor, but have you considered this modern minimalist piece?” This approach not only serves immediate customer interests but also introduces them to new products they might enjoy.

The lesson here? Even a relatively simple clustering approach can significantly improve your recommendation engine. It’s not about having the fanciest algorithm on the block; it’s about understanding your customers and not trying to sell snowshoes to someone shopping for flip-flops.

So, whether you’re dealing with luxury items and budget options, or niche products and mainstream goods, remember: cluster wisely, but maintain some variety. Your conversion rates (and your customers) will thank you.
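Circling back to the “maintain some variety” point: one simple way to keep the data flywheel spinning is to blend a few random picks into every cluster-based recommendation. Here’s a minimal sketch reusing the get_cluster_recommendations function from earlier (plus plain random.sample for the exploration slots); the 20% exploration_rate is a made-up number you’d want to tune and A/B test for your own catalog:

import random

def get_recommendations_with_exploration(purchased_brand, brand_clusters, all_brands,
                                         n=5, exploration_rate=0.2):
    # Reserve a slice of the slots for exploration so we keep learning about other brands
    n_random = max(1, int(n * exploration_rate))
    n_cluster = n - n_random

    cluster_recs = get_cluster_recommendations(purchased_brand, brand_clusters, n=n_cluster)
    remaining = [b for b in all_brands if b not in cluster_recs and b != purchased_brand]
    random_recs = random.sample(remaining, min(n_random, len(remaining)))

    return cluster_recs + random_recs

# Example usage
all_brands = list(brand_features.keys())
print(get_recommendations_with_exploration("Nike", brand_clusters, all_brands))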
Step 2: The “Now We’re Cooking with Gas” Phase - Purchase-Based Association Model
Alright, my data-hungry friends, we’ve arrived at the juicy part. You’ve been diligently collecting user interaction data (you have, haven’t you?), and now it’s time to put it to use. We’re going to build a purchase-based association model.
This is where the magic really starts to happen. We’re going to create a system that understands that people who buy brand A often buy brand B, even if we don’t know why. It’s like being a really good matchmaker without understanding the intricacies of human psychology.
Let’s cook up a simple association model:
from collections import defaultdict

def build_association_model(purchase_data):
    associations = defaultdict(lambda: defaultdict(int))
    for purchase in purchase_data:
        for i, brand1 in enumerate(purchase):
            for brand2 in purchase[i+1:]:
                associations[brand1][brand2] += 1
                associations[brand2][brand1] += 1
    return associations

# Example purchase data
purchase_data = [
    ["Nike", "Adidas"],
    ["Nike", "Under Armour"],
    ["Adidas", "Puma"],
    ["Puma", "Reebok"],
    ["Nike", "Converse"],
    ["Vans", "Converse"],
    ["New Balance", "Asics"],
    ["Skechers", "New Balance"]
]

association_model = build_association_model(purchase_data)

def get_associated_brands(brand, association_model, n=5):
    associated = sorted(association_model[brand].items(), key=lambda x: x[1], reverse=True)
    return [b for b, _ in associated[:n]]

# Example usage
purchased_brand = "Nike"
print(get_associated_brands(purchased_brand, association_model))
Now, let me tell you why this is a game-changer. Consider the case of Amazon, the e-commerce giant. In their early days, they primarily sold books. But as they expanded into other product categories, they faced a massive challenge: how to effectively cross-sell across these diverse categories? Their solution was to implement a sophisticated association model, much like the one we’ve just built (though admittedly, theirs was far more complex). This “item-to-item collaborative filtering” approach, as they called it, allowed them to say, “Customers who bought this item also bought…”
The impact was significant. According to a paper published by Amazon’s engineers in 2003 titled “Amazon.com Recommendations: Item-to-Item Collaborative Filtering”, this recommendation system offered substantial advantages over traditional collaborative filtering techniques as it could:
- Handle a massive scale of data - tens of millions of customers and millions of catalog items.
- Produce high-quality recommendations in real-time, scaling well to sudden spikes in traffic.
- Recommend across diverse product categories, from books to electronics to clothing.
While the paper doesn’t provide specific sales figures, it does mention that Amazon’s recommendation system significantly improved click-through and conversion rates compared to untargeted content such as top sellers.
Step 3: The “Six Degrees of Kevin Bacon” Phase - Transitive Association Model
Ever played “Six Degrees of Kevin Bacon”? Well, we’re about to do something similar with our brands, and it’s going to blow your recommendation socks off.
Direct associations are great, but they’re limited. What if we could create a web of associations, where brand A is connected to brand B, which is connected to brand C, creating an indirect link between A and C? It’s like being at a party where your network explodes as friends introduce you to their friends.
By considering second-order and third-order connections, we can uncover hidden relationships in our data, leading to nuanced and unexpected recommendations. Here’s how to do it in < 10 lines of code:
def expand_adjacency_matrix(adj_matrix, max_order=5, weight_factor=0.5):
    expanded_matrix = adj_matrix.copy()
    current_matrix = normalize_matrix(adj_matrix)

    for order in range(2, max_order + 1):
        current_matrix = current_matrix.dot(normalize_matrix(adj_matrix))
        weighted_connections = current_matrix * (weight_factor ** (order - 1))
        expanded_matrix += weighted_connections

    return normalize_matrix(expanded_matrix)
Let’s break this down:
- We start with our original adjacency matrix of direct brand connections.
- We calculate higher-order connections up to a specified maximum (default 5, because six degrees of separation, right?).
- Here’s the clever bit: we apply a weight factor. Each higher-order connection gets reduced weight because your friend’s friend’s friend’s opinion shouldn’t count as much as your direct friend’s.
- We add all these weighted connections to our original matrix, creating a rich tapestry of brand relationships.
- Finally, we normalize everything to keep our numbers manageable.
The result? A supercharged adjacency matrix capturing not just direct relationships, but a whole network of indirect connections.
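One thing the snippet glosses over: normalize_matrix isn’t defined anywhere above. Any scheme that keeps the scores on a comparable scale will do; here’s one plausible row-normalization version, offered as an assumption rather than the “official” implementation:

import numpy as np

def normalize_matrix(matrix):
    # Scale each row so its entries sum to 1; rows with no connections stay all-zero
    matrix = np.asarray(matrix, dtype=float)
    row_sums = matrix.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # avoid dividing by zero for isolated brands
    return matrix / row_sums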
Now, imagine a niche online bookstore specializing in obscure academic texts. Their inventory is so specific that direct associations are as rare as a first-edition Gutenberg Bible at a yard sale.
A customer buys “The Mating Habits of 12th Century Mongolian Horses” (yes, I’m having fun with these titles). In a traditional system, we’d be stuck. But with our transitive associations, we might recommend “The Economic Impact of Horse Trading in Medieval Asia”, even if no one had ever purchased these together. Our system could find a chain of associations linking them through other related books.
This approach enables academics to joyfully tumble down rabbit holes of related obscure topics, much to the store owner’s delight. It illustrates the power of transitive associations, especially for businesses with niche catalogs or sparse collaborative filtering matrices. We’re transforming a sparse matrix into a dense one, uncovering “hidden” connections that create a discovery engine to surprise and delight customers, potentially leading them down purchasing paths they never knew existed.
Step 4: Fine-Tuning the Magic - Hyperparameters and Evaluation
Alright, data enthusiasts, we’ve built our transitive association model, but now it’s time to give it that extra polish. Think of it as tuning a high-performance engine - we need to adjust the nitrous levels just right.
Hyperparameter Tuning: Finding the Sweet Spot
Remember our expand_adjacency_matrix function? It comes with two key hyperparameters:
- max_order: How far down the rabbit hole of connections we’re willing to go
- weight_factor: Our trust factor for the friend of a friend of a friend
These aren’t just arbitrary numbers we pulled out of a magician’s hat. They’re the secret sauce that can make or break our recommendations.
Let’s take a closer look at weight_factor. Set it too high, and you might end up recommending winter parkas to someone shopping for swimwear. Set it too low, and you’re barely scratching the surface of potential connections.
So how do we find the Goldilocks zone? Enter: hyperparameter tuning. It’s like finding the perfect recipe, but instead of slurping soup, we’re crunching numbers.
def evaluate_model(adj_matrix, test_data, max_order, weight_factor):
    expanded_matrix = expand_adjacency_matrix(adj_matrix, max_order, weight_factor)
    return calculate_hit_rate(expanded_matrix, test_data)

# Grid search for best hyperparameters
# (train_adj_matrix and test_data come from the train/test split described below)
best_score = 0
best_params = {}

for max_order in range(2, 7):
    for weight_factor in [0.1, 0.3, 0.5, 0.7, 0.9]:
        score = evaluate_model(train_adj_matrix, test_data, max_order, weight_factor)
        if score > best_score:
            best_score = score
            best_params = {'max_order': max_order, 'weight_factor': weight_factor}

print(f"Best parameters: {best_params}")
print(f"Best score: {best_score}")
Model Evaluation: Separating the Wheat from the Chaff
Now, how do we know if our model is actually any good? We can’t just take it out for a test drive on the same roads we built it on. That’s where train-test splits come in handy.
Here’s our game plan:
- Split your data into training and test sets. Think of it as studying for an exam (training) and then taking the final (testing).
- Build your adjacency matrix using only the training data.
- Use your tuned model to make predictions on the test set.
- Compare your transitive model against simpler approaches, like random recommendations or direct associations only.
Let’s see how it’s done:
def compare_models(train_data, test_data):
    # Build adjacency matrix from train data
    train_adj_matrix = build_adjacency_matrix(train_data)

    # Random model (aka "The Dart Board Approach")
    # random_recommendations: a baseline scoring matrix, assumed to be defined elsewhere
    random_score = calculate_hit_rate(random_recommendations, test_data)

    # Direct associations model (aka "The One-Track Mind") - first-order connections only
    direct_score = calculate_hit_rate(train_adj_matrix, test_data)

    # Transitive model with best hyperparameters (aka "The Six Degrees of Kevin Bacon")
    transitive_score = calculate_hit_rate(
        expand_adjacency_matrix(train_adj_matrix, **best_params),
        test_data
    )

    print(f"Random model score: {random_score}")
    print(f"Direct associations score: {direct_score}")
    print(f"Transitive model score: {transitive_score}")

compare_models(train_data, test_data)
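And if you’re wondering where train_data and test_data come from in the first place, one simple option is to hold out a slice of your co-purchase data. A minimal sketch using scikit-learn, assuming your raw data looks like the purchase_data list from Step 2 (you’d then map brand names to matrix row positions before scoring):

from sklearn.model_selection import train_test_split

# 80/20 split of co-purchase records; the adjacency matrix is built from train_data only,
# so the held-out pairs are never seen during "training"
train_data, test_data = train_test_split(purchase_data, test_size=0.2, random_state=42)

print(f"Training records: {len(train_data)}, test records: {len(test_data)}")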
The Secret Sauce: Continuous Improvement
Now, here’s what separates the good recommendation systems from the great ones: continuous improvement. Everything we’ve built so far is just the foundation. The real magic happens when you start iterating and refining.
- Fine-tune your clustering: As you gather more data, you might discover that certain features are more predictive than others. Don’t hesitate to adjust your approach.
- Adjust association weights: Consider the context of purchases. Perhaps items bought together in the same transaction should carry more weight than those bought by the same customer on different days (see the sketch after this list).
- Optimize hyperparameters: Regularly revisit your max_order and weight_factor settings. As your data grows and evolves, so too should your model’s parameters.
- Incorporate user feedback: If customers consistently ignore certain recommendations, use that information to refine your model.
- A/B test rigorously: Test different versions of your model against each other. Let the data guide your decisions on which approaches work best for your specific use case.
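As promised above, here’s what the “same transaction vs. same customer” weighting could look like - a hypothetical tweak to the Step 2 builder, with both weights invented purely for illustration:

from collections import defaultdict

def build_weighted_association_model(transactions, customer_histories,
                                     same_basket_weight=3.0, same_customer_weight=1.0):
    # transactions: baskets of brands bought together in one order;
    # customer_histories: all brands a single customer bought over time (assumed formats)
    associations = defaultdict(lambda: defaultdict(float))

    def add_pairs(baskets, weight):
        for basket in baskets:
            for i, brand1 in enumerate(basket):
                for brand2 in basket[i+1:]:
                    associations[brand1][brand2] += weight
                    associations[brand2][brand1] += weight

    add_pairs(transactions, same_basket_weight)          # bought together right now: strong signal
    add_pairs(customer_histories, same_customer_weight)  # bought by the same person over time: weaker
    return associations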
Remember, the goal isn’t perfection - it’s continuous improvement. Aim to build a system that consistently outperforms random chance, and then focus on making it a little better every day.
The Real-World Impact: Beyond the Numbers
Now, I won’t give you specific numbers here because, let’s face it, your mileage may vary. But if you’ve done everything right, your transitive model should be outperforming the others like a sports car in a bicycle race. But here’s the real kicker: this isn’t just about better numbers on a spreadsheet. It’s about creating a recommendation system that feels almost eerily intuitive to your customers.
While we’ve covered significant ground in this post, we’ve only scratched the surface of what’s possible. In our next installment, we’ll explore how cutting-edge AI techniques can take our recommendation system to the next level. We’ll delve into methods that can create truly bespoke shopping experiences, predicting not just what a customer might want now, but what they’ll want next. From leveraging deep learning to harnessing the power of contextual bandits, we’ll explore how to create a recommendation engine that doesn’t just react to customer behavior, but anticipates it. It’s the difference between a skilled salesperson and a personal shopping psychic (with a Ph.D. in data science).

In the end, that’s what separates a good recommendation system from a great one. It’s not just about predicting what customers want - it’s about inspiring them, delighting them, and yes, maybe even surprising them a little. So go forth, tune those hyperparameters, split that data, and may your conversion rates be ever in your favor!