Recommendations: from vanilla to personalization – Part III.


Here’s Part III of my series about Machine Learning and personalization – read the first article here, and the second one here.

How to personalize recommendations

Simply recommending specific and relevant products to customers cannot be called “personalization.” To really personalize recommendations, beyond the what question, we need to grasp the bigger picture by addressing three additional questions:

  • When: When is the right time to reach the customer, either to make new recommendations or to push them towards a purchase?
  • Why: Why do we have customer churn? We need to know the reasons, or at least have some idea of why customers stop using our products or leave us. (I’m not talking about learning true causality from data; that would be foolish, since a purchase is subject to countless external factors we cannot track.)
  • How: How can we reverse the churn? Once we know something about a churning customer, we can take action to reverse their propensity to churn; this is very true for durable goods and to some extent in the case of fast-moving goods, too.

Propensity-to-churn modeling

To address the when question, the usual approach is to take, as a key performance indicator (KPI), a binary label of loyal vs. churning customer at a specific snapshot in time, and then to train propensity-to-churn models as binary classifiers that output churn likelihoods.
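As a minimal sketch, here is how such a model could be trained on a synthetic churn snapshot; all feature names, data, and labels below are made up for illustration:

```python
# Hypothetical sketch: train a propensity-to-churn model as a binary
# classifier on a churn snapshot. Features and labels are synthetic.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# Synthetic snapshot: one row per customer at a fixed point in time.
X = np.column_stack([
    rng.integers(0, 30, n),     # days_since_last_visit (made up)
    rng.poisson(5, n),          # products_viewed_in_session (made up)
    rng.uniform(0, 600, n),     # session_duration_seconds (made up)
])
# Synthetic label: 1 = churning, 0 = loyal, with some label noise.
y = ((X[:, 0] > 14) ^ (rng.random(n) < 0.1)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# The model outputs churn likelihoods, not just hard labels.
churn_likelihood = model.predict_proba(X_test)[:, 1]
print(churn_likelihood[:3])
```

The likelihoods, rather than the hard 0/1 predictions, are what the rest of the pipeline consumes.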

Once such models have been trained, we can use them to address the why question.

More specifically, from model explainability techniques (e.g. most important features, most prominent decision rules), we can extract insights explaining what makes a customer churn.

Let’s dive a little deeper into these insights and, more precisely, into what they provide.

Customer segmentation

Let’s take the example of training a propensity-to-churn model.

Usually we engineer features from very different, even heterogeneous, data sources: anything from Google Analytics 360 data to, say, weather forecast data.

When we are faced with such different data sources, we generally encounter two caveats:

  • It is usually unlikely that our features exhibit linear interactions with one another or with the target variable; that’s why models built from linear components, whether plain linear models or neural nets (which stack linear layers), are not the norm here.
  • Customer segmentation is essentially clustering, and clustering instances on features that originally lie on different scales remains tricky, even after normalization.

Decision tree ensembles are therefore usually the best models to deploy.

To perform robust customer segmentation, instead of clustering on the original features, normalized or not, we work from the instances themselves.

Each instance is embedded into the space of the decision rules generated by the decision trees; this embedding yields sparse representations that all lie on the same scale.

Such an embedding was popularized by Friedman and Popescu when they published their seminal RuleFit paper in 2005.
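A minimal sketch of this idea on synthetic data: each leaf of a tree ensemble corresponds to a conjunction of decision rules, so one-hot leaf indicators give a sparse, same-scale binary embedding that we can cluster on.

```python
# Sketch (synthetic data): embed customers into the space of decision
# rules via the leaves of a tree ensemble, then cluster the embeddings.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))  # stand-in for heterogeneous raw features
y = (X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)

# Shallow trees keep the underlying rules short and interpretable.
forest = RandomForestClassifier(
    n_estimators=50, max_depth=4, random_state=0
).fit(X, y)

# apply() returns, per customer, the index of the leaf reached in each tree.
leaves = forest.apply(X)                           # shape: (500, 50)
embedding = OneHotEncoder().fit_transform(leaves)  # sparse 0/1 matrix

# Cluster in rule space: every dimension is binary, so no rescaling needed.
segments = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(embedding)
print(np.bincount(segments))                       # customers per segment
```

The number of clusters (4 here) is an arbitrary choice for the sketch; in practice it would be tuned like any other clustering hyperparameter.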

Segment trends

Once we have our customer segments, the trends are essentially given by the most prominent decision rules triggered in each group, answering questions like:

  • What are the common purchase behavior characteristics?
  • What are the common characteristics when customers churn?

Computing them is a matter of simple statistics.

Such decision rules can also be ranked by importance, just as features are ranked by feature importance.

A simple linear model like Lasso regression can do the job, exactly as in Friedman and Popescu’s RuleFit.
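A toy sketch of this rule-scoring step, with hand-crafted rules and synthetic data standing in for rules mined from a real ensemble:

```python
# RuleFit-style rule scoring on made-up data: represent each candidate
# decision rule as a 0/1 column, then let an L1-penalized linear model
# (Lasso) pick and weight the most important rules.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 1000
days_inactive = rng.integers(0, 30, n)
products_viewed = rng.poisson(5, n)

# Binary rule features (rules are illustrative, not learned here).
rules = {
    "days_inactive > 14": (days_inactive > 14).astype(float),
    "products_viewed < 3": (products_viewed < 3).astype(float),
    "products_viewed > 10": (products_viewed > 10).astype(float),
}
R = np.column_stack(list(rules.values()))
# Synthetic churn score driven mostly by the first two rules.
churn = 0.6 * R[:, 0] + 0.3 * R[:, 1] + rng.normal(scale=0.1, size=n)

lasso = Lasso(alpha=0.01).fit(R, churn)
# Rank rules by the magnitude of their Lasso coefficient.
ranking = sorted(zip(rules, np.abs(lasso.coef_)), key=lambda t: -t[1])
for rule, weight in ranking:
    print(f"{rule}: {weight:.2f}")
```

The L1 penalty drives the weight of unhelpful rules toward zero, so the surviving rules double as a compact explanation of each segment.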

Personalized diagnosis

Trends are made out of a group of instances, i.e., a subgroup of the population.

In this sense, the model analysis is a global interpretation.

To diagnose a specific instance, we need to go to a local interpretation, at the scale of a single instance/point, i.e., a single customer.

To do so, various methods exist: local surrogate models like LIME, computing Shapley values in a Monte-Carlo fashion, or explanation models such as KernelSHAP and TreeSHAP.

Local surrogate models suffer from several caveats, beginning with choosing the right neighborhood, i.e., the kernel and its hyperparameters, which can greatly change the output.

Shapley methods are usually preferred as their interpretation is more straightforward. For each customer, we know precisely the most prominent feature values influencing the model’s decision to output such a customer as more or less likely to churn.

Moreover, we also know the force and the direction of the contribution of each prominent feature value to the churn prediction.
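To make the Monte-Carlo flavor concrete, here is a minimal Shapley estimator for a single customer, using a hand-written scoring function as a stand-in for the trained churn model (in practice, a library such as SHAP would be used):

```python
# Minimal Monte-Carlo estimate of Shapley values for one instance.
# churn_score is a toy stand-in for model.predict_proba(...)[:, 1].
import numpy as np

rng = np.random.default_rng(0)

def churn_score(x):
    # Hand-written linear scorer, purely for illustration.
    return 0.4 * x[0] + 0.2 * x[1] - 0.1 * x[2]

background = rng.normal(size=(200, 3))  # reference population
customer = np.array([1.0, 2.0, 0.5])    # the instance to explain

def shapley_mc(f, x, background, n_samples=2000, rng=rng):
    d = len(x)
    phi = np.zeros(d)
    for _ in range(n_samples):
        perm = rng.permutation(d)
        # Start from a random background instance ...
        z = background[rng.integers(len(background))].copy()
        prev = f(z)
        for j in perm:
            z[j] = x[j]           # ... and reveal features in random order.
            cur = f(z)
            phi[j] += cur - prev  # marginal contribution of feature j
            prev = cur
    return phi / n_samples

phi = shapley_mc(churn_score, customer, background)
print(np.round(phi, 2))
```

The sign of each value gives the direction of the contribution (pushing the churn prediction up or down) and its magnitude gives the force, which is exactly the per-customer reading described above.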

Here is an illustration of using Shapley values on a dummy use case.

We are a hypothetical website selling all sorts of electronics, from TVs and cameras to drones and weight balances.

Our problem statement is straightforward: 

  • It’s the Black Friday sale. We have customers currently on the website, scrolling many pages. Basically they are in session and we can track their activities via Google Analytics 360.

Let’s run our propensity-to-churn model to discover which customers are likely to churn, i.e., not purchase the items put in their cart at the end of the session.

Our churn model is running in real-time giving online predictions while our customers are still in session.

We are also running SHAP in real time to interpret, for each customer (especially the churning ones), what makes the model think they might churn.

Customer A is checking products related to TVs.

Customer A is not churning, with a predicted likelihood to churn of only 0.06.

The feature values driving the 0.06 prediction are:

  • The time period of Black Friday.
  • The high number of similar products checked: 10 different TVs.

Those feature values are highlighted in blue and make the prediction drop.

Since Customer A is not likely to churn, let’s not take any action.

Customer B is checking products related to weight balances.

Customer B is churning, with a predicted likelihood to churn of 0.71.

The feature values driving the 0.71 prediction are:

  • The time period of Black Friday; weight balances may not be very popular.
  • The low number of similar products checked: two different weight balances.
  • Zero comparisons made.
  • The current time spent in session.

All those features are highlighted in red, i.e., they increase the prediction, the churn likelihood.

For the sake of simplicity, let’s assume that such features should move monotonically and inversely to the churning likelihood, i.e., if such feature values were increased, then the likelihood of churn should decrease.

To increase the number of similar products checked and the number of comparisons made, we just have to suggest similar products to the customer and let them compare the products with one another.

We can do so with a small sliding window popping up from the right of the checkout page or the main product page while the customer is still in session.

And if the customer checks the additional products presented in the pop-up window, inevitably their time spent in session will increase.

So the three most prominent feature insights are actionable with a simple change in the website layout.

Regarding the other, less important feature values, nothing is really actionable.

We cannot change the time period, and we can neither push the customer towards other product categories nor ask them to come back to our website later, for instance once they’re in front of a laptop.

By extrapolating along these feature directions, we can reverse the tendency to churn and take the right actions.
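The insight-to-action step can be sketched as a simple mapping; the feature names, contribution values, and actions below are invented for the weight-balance example:

```python
# Hypothetical sketch: turn the top churn-increasing feature contributions
# into in-session actions. Everything here is illustrative.
ACTIONS = {
    "n_similar_products_viewed": "show a pop-up with similar products",
    "n_comparisons_made": "show a side-by-side comparison widget",
    "time_in_session": "surface more content to extend the session",
}

def pick_actions(contributions, top_k=3):
    """Rank churn-increasing, actionable features; return their actions."""
    actionable = [(f, v) for f, v in contributions.items()
                  if v > 0 and f in ACTIONS]
    actionable.sort(key=lambda t: -t[1])
    return [ACTIONS[f] for f, _ in actionable[:top_k]]

# Customer B's (made-up) per-feature contributions to the 0.71 prediction.
contribs_b = {
    "black_friday_period": 0.12,     # not actionable: we cannot change it
    "n_similar_products_viewed": 0.18,
    "n_comparisons_made": 0.15,
    "time_in_session": 0.10,
}
print(pick_actions(contribs_b))
```

Non-actionable contributions like the time period are simply skipped, mirroring the reasoning above: only features we can influence in session make it into the action list.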

Such actions address the last question: the how.

Key takeaways

Product recommendations are widespread. They quantitatively improve the customer experience and diversify what customers are exposed to.

But recommending specific products relevant to a customer is not enough to personalize the customer experience.

Targeting the customer at the right time is mandatory so as not to confuse them. Doing so qualitatively improves their experience.

Applying a custom strategy to retain our customer is one step further toward a qualitative improvement of their experience.

Sometimes recommending new or similar products is not the right move to convert customers into a purchase, especially for expensive goods.

It may be better to leverage actionable insights behind the scenes to convince the customer to buy a product while they are in session.

Adding this additional layer of personalization is a must-have.

In summary, a fully personalized recommendation engine should handle all the aspects described:

  • Stricto sensu product recommendations.
  • Retrieval of churning customers.
  • Analysis of churning customers and extraction of actionable insights.
  • Actions derived from those insights, embedded into various channels (from email campaigns to websites or apps).

Such tasks are generally complex, but they can be customized to the specifics of the company’s industry and its customers.

That’s why fully personalized recommendation engines cannot be easily standardized, and automated Machine Learning may never be able to handle them entirely.