
Which type of customer lifetime value model should you use?



As you learn more about customer lifetime value and choose to invest in developing your own model, you'll inevitably reach a fork in the road. There are three main types of CLV model, each with its own pros and cons.


Historical CLV Models

Historical models are the simplest, and look only at your customers' past transactional data. These models answer simple questions such as "How much has my average customer historically spent in their lifetime?" and "What's the average order value of my customers?"
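
Both of those questions reduce to simple aggregations over a transaction log. The following is a minimal Python sketch, assuming a hypothetical in-memory list of (customer_id, order_amount) pairs. In practice the data would come from your order system, and you might restrict the lifetime-spend average to customers who have already left.

```python
from collections import defaultdict

# Hypothetical transaction log: (customer_id, order_amount) pairs.
transactions = [
    ("c1", 50.0), ("c1", 30.0), ("c2", 120.0),
    ("c3", 20.0), ("c3", 25.0), ("c3", 15.0),
]

def historical_metrics(txns):
    """Return (average lifetime spend per customer, average order value)."""
    spend_by_customer = defaultdict(float)
    for customer_id, amount in txns:
        spend_by_customer[customer_id] += amount
    total_spend = sum(spend_by_customer.values())
    # Historical CLV: total spend divided by the number of distinct customers.
    avg_lifetime_spend = total_spend / len(spend_by_customer)
    # Average order value: total spend divided by the number of orders.
    avg_order_value = total_spend / len(txns)
    return avg_lifetime_spend, avg_order_value

avg_clv, aov = historical_metrics(transactions)
print(f"Average historical CLV: {avg_clv:.2f}")
print(f"Average order value: {aov:.2f}")
```

Note that nothing here looks forward: the numbers describe what already happened, which is exactly the limitation discussed next.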


The downside of historical models is that they tell you how things were in the past but offer no guidance on where you stand today or how things will look tomorrow. This means historical models are primarily helpful for descriptive analytics, but weak when it comes to prescribing future actions.


Historical models are far better than nothing, and give you the first opportunity to flex your customer-focused muscles.


Probabilistic CLV Models

Probabilistic models are the second type of CLV model, and are a monumental advancement over historical models. Instead of focusing on past customers who have already left your business, these models take a leap forward, allowing you to understand the lifetime value of your current customers based on predictions of their future behavior.


Probabilistic models open the door to predictive and prescriptive analytics, since you now know which customers you need to act on today before they churn, the trajectory of your cohort-level metrics, the present value of your business, and many other useful insights.


These models have been around for a long time, but saw a renaissance in the early 2000s thanks to academics like Professor Fader of Wharton, who worked to bring these models out of academic halls and into everyday corporate conversations.


Probabilistic models are the most reliable and stable of the three classes, and they most accurately trace the narrative of your customers' lifetimes with you, from acquisition to churn. These models rely on probability distributions that describe customer purchase frequency and spend. Many are Bayesian, meaning that when a customer's purchase history is limited they lean on the behavior of similar customers, then bring more of that customer's own data to bear as it becomes available. This makes them extremely valuable for estimating the value of low-frequency and high-frequency purchasers alike.
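
The "borrow from similar customers, then let the data take over" behavior is easiest to see in a spend model. One example is the Gamma-Gamma model, which is commonly paired with BG/NBD or Pareto/NBD. Its expected average order value for a customer is a weighted blend of the population mean and that customer's own observed mean. The following is a minimal Python sketch, assuming the population parameters p, q and gamma have already been estimated from your data. The values below are made up for illustration.

```python
def gamma_gamma_expected_spend(frequency, monetary_value, p, q, gamma):
    """Expected average transaction value under the Gamma-Gamma spend model.

    frequency      -- number of repeat purchases observed for the customer
    monetary_value -- the customer's observed mean spend on those purchases
    p, q, gamma    -- population-level parameters (hypothetical values below);
                      q must be greater than 1 for the population mean to exist
    """
    population_mean = p * gamma / (q - 1)
    # The weight on the customer's own data grows with their purchase count.
    weight = p * frequency / (p * frequency + q - 1)
    return (1 - weight) * population_mean + weight * monetary_value

# Hypothetical parameters giving a population mean spend of 30.
for n_purchases in (1, 5, 20):
    estimate = gamma_gamma_expected_spend(n_purchases, 100.0, p=2.0, q=5.0, gamma=60.0)
    print(f"{n_purchases:>2} repeat purchases averaging 100 -> expected spend {estimate:.2f}")
```

With these made-up parameters, a customer whose repeat orders average 100 gets an expected spend of about 53 after one repeat purchase, 80 after five and 94 after twenty. The estimate starts near the population mean and moves toward the customer's own average as evidence accumulates.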


Examples of the most popular probabilistic models include Pareto/NBD and BG/NBD, with many variations stemming from these.
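
To make "which customers you need to act on today" concrete, consider BG/NBD. For a customer with x > 0 repeat purchases, the last one at age t_x, and current age T, the model gives a closed-form probability that the customer is still active: P(alive) = 1 / (1 + [a / (b + x - 1)] * ((alpha + T) / (alpha + t_x))^(r + x)). The following is a minimal Python sketch of that formula, assuming the four population parameters r, alpha, a and b have already been fitted. The values below are hypothetical and not estimated from any data.

```python
import math

def bgnbd_prob_alive(frequency, recency, T, r, alpha, a, b):
    """Probability that a customer is still active under the BG/NBD model.

    frequency -- number of repeat purchases (x)
    recency   -- customer's age at their last purchase (t_x)
    T         -- customer's current age; same time unit as recency and alpha
    r, alpha, a, b -- population parameters, normally estimated from data
    """
    if frequency == 0:
        # In BG/NBD a customer can only drop out right after a repeat
        # purchase, so someone with no repeat purchases counts as alive.
        return 1.0
    log_ratio = (math.log(a / (b + frequency - 1))
                 + (r + frequency) * math.log((alpha + T) / (alpha + recency)))
    # P(alive) = 1 / (1 + exp(log_ratio)), evaluated without overflow.
    if log_ratio > 0:
        z = math.exp(-log_ratio)
        return z / (1.0 + z)
    return 1.0 / (1.0 + math.exp(log_ratio))

params = dict(r=0.25, alpha=4.0, a=0.8, b=2.5)  # hypothetical fitted values
recent = bgnbd_prob_alive(frequency=5, recency=30, T=32, **params)
lapsed = bgnbd_prob_alive(frequency=5, recency=5, T=32, **params)
print(f"recent buyer: {recent:.3f}, lapsed buyer: {lapsed:.3f}")
```

Both customers made five repeat purchases. The one whose last purchase was 27 time units ago gets a far lower probability of being alive than the one who bought 2 units ago. That difference is the signal a retention team would act on.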


The weakness of probabilistic models is also one of their strengths: by design, these models require very little data to work, needing only three things (customer ID, transaction date, and transaction amount). If you need to start controlling for other variables like seasonality, marketing efforts or other external impacts, adjusting these models can be painful and computationally expensive.
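
Customer ID, transaction date, and transaction amount are enough because these models work from a compact per-customer summary. The following is a minimal Python sketch of that preprocessing step. It assumes a hypothetical in-memory transaction log and the convention common in BG/NBD and Pareto/NBD implementations:

- Purchases on the same day are merged.
- Frequency counts repeat purchase days.
- Recency is the customer's age at their last purchase.
- T is their age at the end of the observation window.
- Monetary value averages only the repeat purchases.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction log with only the three required fields:
# (customer_id, transaction_date, transaction_amount).
transactions = [
    ("c1", date(2024, 1, 5), 40.0),
    ("c1", date(2024, 2, 10), 60.0),
    ("c1", date(2024, 2, 10), 20.0),   # same-day orders are merged
    ("c1", date(2024, 4, 1), 35.0),
    ("c2", date(2024, 3, 15), 80.0),
]

def rfm_summary(txns, observation_end):
    """Summarize each customer as frequency, recency, T and monetary value.

    frequency      -- number of repeat purchase days (distinct days minus one)
    recency        -- days from first to last purchase
    T              -- days from first purchase to the end of observation
    monetary_value -- mean spend per repeat purchase day (0.0 if no repeats)
    """
    daily = defaultdict(lambda: defaultdict(float))
    for customer_id, day, amount in txns:
        daily[customer_id][day] += amount

    summary = {}
    for customer_id, spend_by_day in daily.items():
        days = sorted(spend_by_day)
        first, last = days[0], days[-1]
        repeat_spend = [spend_by_day[d] for d in days[1:]]
        summary[customer_id] = {
            "frequency": len(days) - 1,
            "recency": (last - first).days,
            "T": (observation_end - first).days,
            "monetary_value": (sum(repeat_spend) / len(repeat_spend)
                               if repeat_spend else 0.0),
        }
    return summary

for cid, row in rfm_summary(transactions, date(2024, 6, 30)).items():
    print(cid, row)
```

Anything outside these fields, such as seasonality flags or marketing touches, has no natural place in this summary. That is why extending probabilistic models beyond RFM data is awkward.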


Machine Learning CLV models

Machine learning models save the day when probabilistic models become too unwieldy to handle data outside of the RFM (recency, frequency, monetary) framework.


Machine learning CLV models can incorporate just about any data that you think may be useful for predicting CLV, ranging from top-of-funnel metrics like site behavior to unstructured data from customer reviews.


The downside of these models is their sheer complexity and customizability. They require a mature internal data environment and talented data scientists to properly engineer features and to train and validate the resulting models. Machine learning models are excellent for snapshots of customers' predicted CLV, but they can quickly lose predictive power if they are not properly maintained and retrained. They also don't offer as cohesive a "narrative" of the customer lifecycle as probabilistic models do.


Many companies offer "off-the-shelf" machine learning CLV models - but buyer beware! These often require significant upfront work to develop properly, and anyone telling you otherwise is probably a salesperson.


Also be aware that the people who consume CLV model output are often non-technical, whether they sit in your marketing or finance departments or in executive leadership. The harder a model is to explain, and the more its calculations resemble a black box, the harder it will be to win buy-in and use the model to become a customer-focused organization.


So, which type of model should I use?

Now that we've touched on the three main types of CLV models - historical, probabilistic and machine learning - which should you use?


Nobody with experience working with customer lifetime value models would confidently tell you to use only one of the three model types. As we've discussed, each type has its own strengths and weaknesses, and each may be more useful at different times for different organizations.

Instead of thinking of this decision as a winner-take-all, think about it as a spectrum of complexity.


For a client who has never touched CLV and has no idea of the value of its customers, I recommend starting with historical models. Simply seeing the average lifetime value of past customers may be a valuable insight that catalyzes better decisions about customer acquisition, development and retention.


However, once a company has identified its best source of customer transactional data, learned to work with it to derive insight, and feels comfortable thinking through a customer lens, I recommend moving to using probabilistic models.


Probabilistic models provide significantly more insight and actionability while requiring no more data collection than historical models. The effort comes from the modeling itself. Fortunately, there are already best-of-breed models you can rely on for reliable estimates of customer lifetime value. A model can be built rapidly and put into production in only a few weeks by a talented technical resource or consulting partner (as long as you pick the right one!).


In practice, historical and probabilistic models cover the analytical needs of the majority of firms. Most companies simply do not need the incremental value the machine learning models present. By leveraging probabilistic models, companies in most industries will enjoy significant differentiation from their competitors.


However, in highly competitive, data-driven industries, or industries that are being disrupted by data-first companies (think Capital One for banking, or Starbucks for coffee houses), machine learning models might be required to sufficiently optimize and personalize the customer experience to keep valuable customers from going to competitors. Similarly, companies with advanced data-driven marketing organizations may want the added flexibility of a machine learning model that can incorporate (or even estimate) the impacts of marketing efforts.


Pick one and get going!

The only wrong choice here is to have no model and to ignore the benefits of customer-driven analytical tools like CLV. No matter where you're starting, the upside of advancing your customer analytics practice by incorporating CLV is too big to ignore.


If you acknowledge from the start that some growing pains will happen, and that challenges endured during the CLV adoption process will only help you and your team grow stronger, then there's nothing in the way of developing your first (but not last) CLV model!

