Churn-at-Risk: Applying Survival Analysis to Control Telecom Subscription Churn
2017 Jan 15

Introduction
A recurring topic in any subscription service is how to reduce churn (customer attrition), since acquiring new customers is far more difficult (and expensive) than retaining existing ones.
Surveys suggest that around 70% of companies agree it is cheaper to retain a customer than to go after a new one.
To use a simple analogy, subscription revenue is like the blood in a company's bloodstream: any interruption harms the entire business, because this revenue model is based on recurring billing rather than on the development or sale of other products.
In business models that depend on the volume of people willing to accept a recurring charge, the business becomes much more fragile: unlike products with greater price elasticity, the revenue stream is extremely sensitive to market conditions and customer preferences.
In this scenario, for any company whose revenue flow depends on this model, knowing when a customer is about to churn (cancel) is fundamental for building more effective retention mechanisms, or for creating customer communication rules that prevent or minimize the chance of a customer leaving the base.
Therefore, any mechanism or effort to minimize this effect is of great value. We base our approach on statistical theory to seek answers to the following questions:
- How to reduce Churn?
- How to identify a potential customer who will churn? What strategies should be followed to minimize this Churn?
- What communication rules should we have with customers to understand the reasons for subscription cancellation and what are the possible customer winback strategies in this scenario?
To answer these questions, we turned to survival analysis, the branch of statistics best suited to modeling time-to-event (lifespan) data in the presence of censoring, whether for materials (e.g., time to failure of a mechanical system) or for people (e.g., the estimated survival time of a cancer patient given a certain dosage). In our case, the event is how long a subscriber stays before canceling their subscription.
Survival Analysis
Survival analysis is a statistical technique developed in medicine whose primary purpose is to estimate the survival time or time to death of a given patient within a time horizon.
The Kaplan-Meier estimator (1958) builds the survival function from the ratio, at each time t, between the number of observations that did not fail at t and the number still under study; in each time interval there is a distinct number of failures/deaths/churns, and the risk is recalculated from the number of individuals remaining in the subsequent interval.
The Nelson-Aalen estimator (1978) has the same nonparametric character as Kaplan-Meier, with the difference that it estimates the cumulative hazard function rather than the survival function directly.
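For reference, both estimators can be written compactly (these are the standard textbook formulas, not anything specific to this post): let t_i denote the distinct event times, d_i the number of churns at t_i, and n_i the number of subscribers still at risk just before t_i. Then, in LaTeX notation:

\hat{S}(t) = \prod_{t_i \le t} \left( 1 - \frac{d_i}{n_i} \right)
\qquad
\hat{H}(t) = \sum_{t_i \le t} \frac{d_i}{n_i}

Here \hat{S}(t) is the Kaplan-Meier survival estimate and \hat{H}(t) the Nelson-Aalen cumulative hazard; the two are linked by the approximation S(t) \approx \exp(-H(t)).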
The fundamental elements for characterizing a study involving survival analysis are: (a) initial time, (b) time interval measurement scale, and (c) whether the churn event occurred.
The main papers are by Aalen (1978), Kaplan-Meier (1958), and Cox (1972).
This post does not aim to provide an introduction to survival analysis, as there are many references on the internet about the subject, and there is nothing to add in this regard from this humble blogger.
Like cohort analysis, survival analysis is primarily a longitudinal study, meaning its results have a temporal characteristic, both in retrospective and prospective aspects. It provides a typically temporal answer to a specific event of interest.
For sample comparison, we will use the longitudinal behavior of different samples over time, according to their specific characteristics, together with the factors influencing churn.
Due to obvious NDA issues, we will not post characteristics here that could indicate any business strategy or characterize any information of any nature.
We can say that survival analysis applied to a telecom case helps estimate, over time, the probability that a subscription will last until the churn event (cancellation). This allows the development of strategies to prevent this event, since acquiring a new customer is more expensive than keeping an existing one, and it fits squarely within a Customer Winback strategy. (Note: the book Customer Winback, by Jill Griffin and Michael Lowenstein, is mandatory reading for everyone working with subscription services or businesses that depend heavily on recurrence, such as e-commerce.)
In our case, since we are talking about subscription services, the analogue of the time to failure or time to death is the time until churn, i.e., subscription cancellation. In other words, we have something like Time-to-Churn, or Churn-at-Risk. Keep this term in mind.
Methodology
We used anonymized data from two older products, where a uniform shuffling hash (obeying a specific distribution) was applied to the attributes (for privacy reasons). The attributes are:
- id: Record identifier;
- product: Product;
- channel: Channel through which the customer entered the database;
- free_user: Flag indicating if the customer entered for free or not;
- user_plan: If the user is prepaid or postpaid;
- t: Time (in days) the subscriber has been in the base; and
- c: Indicates whether the event of interest (churn/cancellation) occurred. (A toy sketch of this layout follows right after this list.)
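To make this encoding concrete, here is a minimal toy sketch of the expected layout (rows fabricated for illustration only; just the column structure mirrors the real dataset). t holds the observed duration in days, and c = 1 marks an observed churn, while c = 0 marks a right-censored subscriber who was still active at extraction time.

import pandas as pd

toy = pd.DataFrame({
    'id':        [1, 2, 3],
    'product':   ['A', 'B', 'A'],
    'channel':   ['FF', 'HH', 'CC'],
    'free_user': [1, 0, 1],    # 1 = entered via a free trial
    'user_plan': [0, 1, 1],    # prepaid/postpaid flag
    't':         [16, 4, 29],  # days in the base
    'c':         [0, 1, 0],    # 1 = churned, 0 = censored (still active)
})
print(toy)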
We eliminated the effect of left censoring by removing reactivation cases, as we wanted to understand the subscriber's journey as a whole, without any bias related to customer winback. Regarding right censoring, we do have some specific cases, since several months have passed since this dataset was extracted.
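Just to illustrate the idea (the real extraction pipeline is not public), assuming a hypothetical reactivations counter in the raw extraction, the left-censoring cleanup could look like the sketch below:

# 'df_raw' and 'reactivations' are hypothetical names, for illustration only:
# keep only subscribers on their first journey, with no winback history.
df = df_raw[df_raw['reactivations'] == 0].copy()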
An important technical aspect to consider is that these two products must be comparable; without this comparability, any characterization would be meaningless.
At the end of this implementation, we will have a life table for these products.
Implementation
First, let’s import the libraries: Pandas (for data manipulation), matplotlib (for plot generation), and lifelines for applying survival analysis:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import lifelines
After importing the libraries, let’s adjust the image size for better visualization:
%pylab inline
pylab.rcParams['figure.figsize'] = (14, 9)
Let's load our dataset, creating an object called df with the read_csv function from pandas:
df = pd.read_csv('https://raw.githubusercontent.com/fclesio/learning-space/master/Datasets/07%20-%20Survival/survival_data.csv')
Let’s check our dataset:
df.head()
     id product channel  free_user  user_plan   t  c
0  3315       B      HH          1          0  22  0
1  2372       A      FF          1          1  16  0
2  1098       B      HH          1          1  22  0
3  2758       B      HH          1          1   4  1
4  2377       A      FF          1          1  29  0
So, as we can see, we have the 7 variables in our dataset.
Next, let's import the Kaplan-Meier estimator from the lifelines library and instantiate it:
from lifelines import KaplanMeierFitter
kmf = KaplanMeierFitter()
With the Kaplan-Meier estimator class instantiated in the kmf object, let's assign our time (T) and event-of-interest (C) variables.
T = df["t"]
C = df["c"]
What we did above: we took the column t from the dataframe df and assigned it to the object T, and did the same with column c, assigning it to the object C. Now, let's call the fit method on these two objects in the snippet below:
kmf.fit(T, event_observed=C)
Out[7]:
<lifelines.KaplanMeierFitter: fitted with 10000 observations, 6000 censored>
Object fitted, let's now plot the survival function estimated by Kaplan-Meier.
kmf.survival_function_.plot()
plt.title('Survival function of Service Valued Add Products')
plt.ylabel('Probability of Living (%)')
plt.xlabel('Lifespan of the subscription (in days)')
As we can see in the plot, there are some pertinent observations when we treat the survival probability of these two products in aggregate:
- Right on the first day, there is a substantial drop of approximately 22% in the survival probability of a subscription;
- There is an almost linear decay after the fifth day of subscription; and
- After day 30, the survival probability of a subscription is approximately 50%. In other words, after 30 days, half of the new subscribers will already be out of the subscriber base. (These figures can be read straight off the fitted estimator, as sketched below.)
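A minimal sketch of how to read these figures off the fitted object, using lifelines' predict method; the values in the comments are expectations based on the observations above, not guarantees:

# Survival probabilities at specific durations, interpolated from the fitted curve
print(kmf.predict(1))    # expected near 0.78, given the ~22% first-day drop
print(kmf.predict(30))   # expected near 0.50, per the last observation above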
Next, let's plot the same survival function, now considering its confidence intervals.
kmf.plot()
plt.title('Survival function of Service Valued Add Products - Confidence Interval in 85-95%')
plt.ylabel('Probability of Living (%)')
plt.xlabel('Lifespan of the subscription')
However, this initial model has three clear limitations:
- The aggregated data doesn’t tell us much regarding dynamics that might exist in the specificity of certain attributes/dimensions;
- The dimensions (or breakdowns) according to the attributes in the dataset are not explored; and
- There is no division by product.
To address this, we will start detailing each dimension and see how each influences the survival function. Let’s begin by breaking down by the dimension that determines if the customer entered via a free trial or not (free_user).
ax = plt.subplot(111)
free = (df["free_user"] == 1)
kmf.fit(T[free], event_observed=C[free], label="Free Users")
kmf.plot(ax=ax, ci_force_lines=True)
kmf.fit(T[~free], event_observed=C[~free], label="Non-Free Users")
kmf.plot(ax=ax, ci_force_lines=True)
plt.ylim(0, 1)
plt.title("Lifespans of different subscription types")
plt.ylabel('Probability of Living (%)')
plt.xlabel('Lifespan')
This plot presents some important information for initial insights regarding each survival curve in relation to the type of free trial offered as an influencing factor for churn:
- Subscribers who start as non-free (i.e., without any initial free trial) show a substantial decay of over 40% in survival probability by the 15th day (considering the confidence interval);
- After the 15th day, the non-free subscribers' survival curve shows relative stability, at around 60% survival probability, until the censored period;
- For non-free users, given the degree of variability in the confidence interval, we can conclude that many cancellations happen very rapidly, which should be investigated more closely by the product team; and
- Users who enter via a free trial (i.e., get some free days before being charged) show a greater decay in survival, both in the initial period and over time, but the curve stabilizes throughout the series without major surprises. (A formal test of whether the two curves really differ is sketched right after this list.)
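To complement the visual comparison between the two groups, lifelines also provides a log-rank test, which checks whether the difference between the two survival curves is statistically significant. A minimal sketch, reusing the free mask defined above:

from lifelines.statistics import logrank_test

results = logrank_test(T[free], T[~free],
                       event_observed_A=C[free],
                       event_observed_B=C[~free])
results.print_summary()  # a small p-value suggests the curves genuinely differ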
Given this initial analysis of the survival curves, let’s now evaluate the survival probabilities according to the product.
ax = plt.subplot(111)
product = (df["product"] == "A")
kmf.fit(T[product], event_observed=C[product], label="Product A")
kmf.plot(ax=ax, ci_force_lines=True)
kmf.fit(T[~product], event_observed=C[~product], label="Product B")
kmf.plot(ax=ax, ci_force_lines=True)
plt.ylim(0, 1)
plt.title("Survival Curves of different Products")
plt.ylabel('Probability of Living (%)')
plt.xlabel('Lifespan')
This plot presents the first clear distinction between the two products. Even with confidence intervals varying by 5%, we can see that Product A (blue line) has a higher survival probability, with a difference of over 15 percentage points, and this difference is amplified after the twentieth day. In other words: given a certain cohort of users, a user who enters through Product A has approximately 15% higher retention probability than a user who enters through Product B; put differently, Product A has a longer retention tail than Product B.

Empirically, it is known that one of the main influencing factors for SVA products is the media channels through which these products are offered. The media channel is the thermometer that tells us whether we are offering our products to the correct target audience. For a better understanding, let's analyze the channels through which subscriptions originate. First, let's convert the channel variable to a categorical type so we can segment by channel.
df['channel'] = df['channel'].astype('category')
channels = df['channel'].unique()
After converting the variable to the categorical type, let's inspect the resulting array.
channels
Out[13]:
['HH', 'FF', 'CC', 'AA', 'GG', 'II', 'JJ', 'BB', 'DD', 'ZZ', 'EE']
Length: 11
Categories (11, object): ['HH', 'FF', 'CC', 'AA', ..., 'EE', 'DD', 'JJ', 'ZZ']
Here we have the representation of 11 media channels through which customers entered the service. With these channels, we will identify the survival probability by channel.
for i, channel_type in enumerate(channels):
    ax = plt.subplot(3, 4, i + 1)
    ix = df['channel'] == channel_type
    kmf.fit(T[ix], C[ix], label=channel_type)
    kmf.plot(ax=ax, legend=True)
    plt.title(channel_type)
    plt.xlim(0, 40)
    if i == 0:
        plt.ylabel('Probability of Survival by Channel (%)')
plt.tight_layout()
Analyzing each of these plots, we have some considerations about each of the channels:
- HH, DD: High mortality rate (churn) within the first 5 days, indicating that the product's appeal for the audience of these media channels is ephemeral;
- FF: Shows less than a 10% mortality rate in the first 20 days and a very particular pattern after the 25th day, with practically no further mortality, although its confidence interval oscillates very strongly;
- CC: Along with HH, despite having a high mortality rate before the 10th day, it shows a good degree of predictability, which can be used in media incentive strategies that require greater certainty about medium-term retention;
- GG, BB: Show a good survival rate at the beginning of the period but severe oscillations in their respective confidence intervals; this variability should be considered when developing an investment strategy for these channels;
- JJ: If there were a definition of uncertainty in terms of survival, this channel would be its best representative. With confidence intervals oscillating by more than 40% between the lower and upper limits, this media channel looks extremely risky for investment, as the data show no regularity/predictability;
- II: Despite a good degree of predictability in the survival rate over the first 10 days, after this period the hazard curve becomes very severe, indicating that this channel is better suited to a short-term strategy; and
- AA, EE, ZZ: Due to some form of data censoring, these require further analysis at this initial stage (dig into the data details and check whether it is right censoring or some form of truncation).
Now that we have some understanding of the dynamics of each channel, let's build a life table for this data. A life table is simply a tabular representation of the survival function against survival days. For this, we will use the utils module from lifelines to obtain these values.
from lifelines.utils import survival_table_from_events
Function imported, let's now use our T and C variables again to build the life table.
lifetable = survival_table_from_events(T, C)
Table built, let's take a look at it.
print (lifetable)
          removed  observed  censored  entrance  at_risk
event_at
0            2250      2247         3     10000    10000
1             676       531       145         0     7750
2             482       337       145         0     7074
3             185       129        56         0     6592
4             232        94       138         0     6407
5             299        85       214         0     6175
6             191        73       118         0     5876
7             127        76        51         0     5685
8             211        75       136         0     5558
9            2924        21      2903         0     5347
10            121        27        94         0     2423
11             46        27        19         0     2302
12             78        26        52         0     2256
13            111        16        95         0     2178
14             55        35        20         0     2067
15            107        29        78         0     2012
16            286        30       256         0     1905
17            156        23       133         0     1619
18            108        18        90         0     1463
19             49        11        38         0     1355
20             50        17        33         0     1306
21             61        13        48         0     1256
22            236        23       213         0     1195
23             99         6        93         0      959
24            168         9       159         0      860
25            171         7       164         0      692
26             58         6        52         0      521
27             77         2        75         0      463
28             29         6        23         0      386
29            105         1       104         0      357
30             69         0        69         0      252
31            183         0       183         0      183
Unlike R, whose life table already includes the survival probability as a percentage, here we need a small adjustment to derive it from the entrance and at_risk attributes: we divide at_risk by the initial cohort size (e.g., at day 1, 7750 / 10000 = 0.775). The adjustment is as follows:
survivaltable = lifetable.at_risk / np.amax(lifetable.entrance)
Adjustments made, let’s look at our life table.
survivaltable
event_at
0     1.0000
1     0.7750
2     0.7074
3     0.6592
4     0.6407
5     0.6175
6     0.5876
7     0.5685
8     0.5558
9     0.5347
10    0.2423
11    0.2302
12    0.2256
13    0.2178
14    0.2067
15    0.2012
16    0.1905
17    0.1619
18    0.1463
19    0.1355
20    0.1306
21    0.1256
22    0.1195
23    0.0959
24    0.0860
25    0.0692
26    0.0521
27    0.0463
28    0.0386
29    0.0357
30    0.0252
31    0.0183
Name: at_risk, dtype: float64
Let’s transform our life table into a Pandas object for easier data manipulation.
survtable = pd.DataFrame(survivaltable)
To query Churn-at-Risk cases, we can define a function that holds the life table and returns the survival probability for a given number of survival days. For this, we will write a simple function in pure Python.
def survival_probability(days):
    # Look up the 'at_risk' column of the life table by its event_at label
    # (.loc performs label-based indexing; the label here is the day itself).
    # Multiply by 100 to express the probability as a percentage.
    survival_percentage = survtable.loc[days, "at_risk"] * 100
    print(f"The probability of Survival after {days} days is {survival_percentage:.2f} %")
In this case, let's look at the survival chance, using our already-fitted Kaplan-Meier life table, for a subscription that is 22 days old.
survival_probability(22)
The probability of Survival after 22 days is 11.95 %
In other words, this subscription has only an 11.95% probability of being active, which means it may be canceled very soon.
Conclusion
As we can see above, using survival analysis, we can extract interesting insights from our dataset, especially for discovering the duration of subscriptions in our database and estimating the time to churn.
The data used reflect the behavior of two real products; however, they were anonymized for obvious NDA reasons. Nevertheless, this does not prevent the use and adaptation of this code for other experiments. An important point about this dataset is that, as can be observed, we have significant right censoring, which somewhat limits the long-term view of the data, especially if there is a long tail in the churn event.
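As a natural next step beyond this post's scope, the same covariates explored above could feed a Cox proportional hazards regression (Cox, 1972, cited earlier) to quantify how much each factor raises or lowers the churn hazard. A minimal sketch with lifelines' CoxPHFitter, assuming we one-hot encode the categorical columns first:

from lifelines import CoxPHFitter

# One-hot encode the categorical covariates; the id column carries no signal.
df_cox = pd.get_dummies(df.drop('id', axis=1),
                        columns=['product', 'channel'], drop_first=True)

cph = CoxPHFitter()
cph.fit(df_cox, duration_col='t', event_col='c')
cph.print_summary()  # hazard ratios for each covariate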
As I mentioned at the São Paulo Big Data Meetup in March, there are several architectures that can be combined with this type of analysis, especially Deep Learning methods, which can serve as an endpoint for a prediction pipeline.
I hope you enjoyed it, and for any questions, please send a message to flavioclesio at gmail.com.
PS: Special thanks to my colleagues and reviewers Eiti Kimura, Gabriel Franco, and Fernanda Eleuterio.