Western Power Plc is a mid-range
energy supply company providing gas and electricity to an estimated one
million households. The company has faced consumer churn, which has affected the
profits of the business. The project aims to produce predictive models of customer
attrition for single-product (electricity or gas) customers cancelling and for
dual-product (electricity and gas) customers cancelling one product or both. The
project provides a theoretical analysis using scenario-based data and
predictions, and highlights the importance of the data modelling
tools chosen for Western Power Plc.
Over the last decade the energy market has faced
continuous change across the country. Until recent changes in policy and
ownership, the energy industry was not concerned with customer retention because
of the monopolies held by power, gas and water suppliers. Ofgem, the government
regulator for the electricity and downstream natural gas markets in Great
Britain, confirmed in a December 2017 study an estimated 15%
increase in the number of consumers switching energy suppliers. With these
changes in the market, companies like Western Power Plc, a mid-range energy
supply company, face increased competition from the bigger service providers
and from new entrants.
The literature surrounding customer attrition consistently defines it as a measure of
consumers leaving the company or choosing alternative providers. Churn
rate can be formulated as:
Customer Churn Rate = (Customers at beginning of period – Customers at end of
period) / Customers at beginning of period
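The formula above can be sketched as a small function. This is a minimal illustration in Python (the report itself names R as its tool); the function name and the customer counts are hypothetical and are not Western Power Plc data.

```python
def churn_rate(customers_at_start: int, customers_at_end: int) -> float:
    """Return the churn rate for a period as a fraction of the starting base."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return (customers_at_start - customers_at_end) / customers_at_start


# Hypothetical figures, for illustration only.
rate = churn_rate(customers_at_start=1000, customers_at_end=900)
print(f"Churn rate: {rate:.1%}")
```

Note that this simple form assumes no new customers join during the period; if they do, the end-of-period count would need to exclude them.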
Robert C. Blattberg (2008) highlights the two major types of customer churn as
voluntary and involuntary. Voluntary churn relates back to customer satisfaction and
competition, while involuntary churn is the company's choice to terminate the
contract. Data mining techniques including cluster analysis and decision trees
can be used to analyse and predict churn. Hung, Yen, and Wang (2006), whilst
studying customer churn, found neural networks gave more reliable results than
predictive models such as decision trees. However, decision trees and
linear regression have been found to be the most common forms of predictive
modelling in consumer engagement and retention studies. Further studies looked at
ways of resolving and preventing customer churn. The neural
network model was more effective in identifying the behaviours of consumers
within the market. Wei and Chiu (2002) highlighted the importance
of consumer data and used this to estimate customer attrition.
The chosen methodology within the assignment is the Cross-Industry
Standard Process for Data Mining (CRISP-DM). CRISP-DM is described by co-founder
Tom Khabaza in CRISP-DM Overview (October 2016) as a
hierarchical process model, set out in phases. These phases are broken down
into tasks.
Within a commodity-based industry, Western Plc has to deal with a variety of
policy changes and agreements whilst remaining competitively attractive.
Industry pricing is largely driven by supply and demand. When demand for the
product increases (usually during the colder periods of the year), the
company is assumed to supply at a higher price. This leads to continuous
changes in the supply and demand of the commodity. As mentioned earlier, the
increase in customer churn is predominantly due to changes in regulation
and an increase in smaller suppliers within the market. Competing against the big six,
Western Plc would fall within this category, attracting consumers seeking a
“better deal”. Many customers are using online price comparison services to make
In order to find the predictive model which best suits
the company's needs, we need to understand what sort of churn the company is
facing. Based on the literature review and the industry background, the biggest cause
is usually income or region. However, a variety of test hypotheses
can be used in the evaluation and testing phase of the model. Examples:
Does income impact the churn?
Are there regional differences in churn rate?
What is causing the consumers to leave?
Which area of the business is leading to churn?
Is it a particular type of customer leaving?
Which customers are most at risk of cancelling?
Is the increase in churn seasonal?
The Critical Success Factor (CSF) on this occasion will be a solution to reduce or
reverse this trend in consumer attrition. This can be measured against the values
from the last three years. As the company is in the early stages of
predictive analytics, this will be used in future to continue improving the
In the energy industry, segmentation of the consumer is often difficult. This is due to
the variety of consumer bases available. If the predictive model were
based on gender, for example, it might not reflect the core reasons
for the churn. Historically the industry has used demographic customer segmentation,
moving on to income-based and usage/needs-based segmentation. Whether cyclical
changes in churn are experienced should also be noted and investigated later.
For the project I chose to use CRISP-DM as my main process
to structure the project. Each phase outlines specific deliverables and
documentation requirements, which will aid in the data mining process for
Business understanding is the core of CRISP-DM. The phase aims to gain insight into what
the company’s goals are and how the data mining process can be used within
these. It has four main tasks defined:
the Business Goals
In the Western Plc project, the business goals
have been set out as producing or suggesting a predictive model for reducing
customer attrition for single-product (electricity or gas) customers cancelling
and for dual-product (electricity and gas) customers cancelling one product or
both. Based on the baseline numbers and data from the last 3 years, churn should
reduce by 10% within a 6-month period. This should fundamentally improve the
supplier's overall revenue and margin. In order to meet the business goals, test and
sample data will be used for the decision tree model. Figure 2, Funnel of Business
Objectives, highlights the key areas and data which will be used for modelling.
Figure 2: Funnel of Business Objectives
Data understanding and preparation
The data understanding phase of CRISP-DM aims to gather, describe, explore and verify
the quality of the data. Western Plc mentioned 5 databases which hold different
information. Figure 3 provides a brief example of the information which would
be stored in each data set.
There would be key similarities in the Customer, Billing, and Maintenance databases. This
could be an advantage in this instance as it will allow for a comparison in
regression modelling, especially if we are investigating any linear
relationships between the assumed variables. If churn shows within
the maintenance area of the business in comparison to the marketing area, the
directors are left with a choice of which area to target.
There are over 30
variables in the data sets; however, the key areas of interest
for Western Plc are highlighted below:
Consumers' demographic location: Cluster
modelling can be used to see if there is a better provider within a specific
region or area. This will allow the marketing department to target this area
of business. This could also depend on how rurally consumers live and the service
being provided. This is also found in the marketing database.
Age: The age distribution of the data will allow
for cluster modelling. It was previously assumed that older age groups tend to be
more loyal to utility providers. Comparison sites have over the last few
years changed target market and provided services including door to door
Contract length: Consumers with a longer-running
contract are more likely to stay with Western Plc.
Salary: Full-time or part-time employment;
categorically splitting the incomes can also give us a straightforward
Payment plan: Within the billing database the
payment plan and average billing amount can provide insight into churn.
Usage: Demand and supply analysis suggests that
as demand for the service increases, supply will reduce, increasing the
price. At this point Western Plc's churning consumers can be profiled.
Western Plc data will be sampled at an estimated 20% of the consumers, with 20,000 records
selected at random. This will provide 20,000 consumers, including those who churned
between January 2015 and 2018. A general summary of the data can be produced in R
with summary(mydata).
Over the last 3 years we will find a mean average and immediately see
that the highest churn occurred in Year 2 (quarter 2).
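The sampling step described above can be sketched as follows. This is an illustration in Python rather than R, using only the standard library. The population is synthetic, and its size (100,000, so that 20,000 records is a 20% sample), the field names and the 15% churn probability are all assumptions made for the example.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Synthetic stand-in for the consumer base; every field here is hypothetical.
population = [
    {
        "customer_id": i,
        "churned": random.random() < 0.15,
        "avg_bill": round(random.uniform(30, 150), 2),
    }
    for i in range(100_000)
]

sample_size = 20_000
# random.sample draws without replacement, so no consumer is picked twice.
sample = random.sample(population, sample_size)

churned = sum(1 for row in sample if row["churned"])
bills = [row["avg_bill"] for row in sample]

# A rough counterpart to a summary of the sampled data.
print("records:", len(sample))
print("churned share:", round(churned / sample_size, 3))
print("mean bill:", round(statistics.mean(bills), 2))
print("median bill:", round(statistics.median(bills), 2))
```

Fixing the seed keeps the same 20,000 records across runs, which makes later modelling results comparable.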
The selection of the sample data is
the first step in data gathering and preparation. It allows us to look for correlations
among consumers who have churned. This will give us a rough guide of where to
start with the data mining process. Variables evident in the data for Western
Power Plc include the demographic structure of the consumer base
and salary/household incomes. Data preparation is summarised in Figure 4.
Figure 4: Data preparation tasks.
Data preparation takes the biggest percentage of time during the project process. Despite
taking the longest time, it is often referred back to during the modelling phase,
as it impacts the results and the overall analytics achieved. The data preparation
phase covers all the activities needed to construct the final data set that
will be fed into the models from the original selected data set. With Western
Plc we have selected a sample data set of 20,000 records, including churned consumers.
Once the selected data set is identified, it
will need to be cleaned. Cleaning of the data refers back to a key element of
data mining, the quality of the data.
Data quality investigates the
core foundations of the data used within our model. This is considered one of
the roots of good analytics and data mining. To ensure the data used is fit for
purpose, in this process we verify the following elements of the data set:
Completeness – Verifying that the majority of records have values in the required
fields. This is critical for our regional model as the address and location
fields need to be completed.
Uniqueness – Ensuring that each record appears only once when compared with the
rest of the data. This can be an issue when consumers are based in the same
flat/apartment complex. It is important to differentiate consumers.
Timeliness – Ensuring we have up-to-date contact details for the marketing
team in Western Plc to offer new products and keep consumers.
Validity and Accuracy – Checking for human error.
Data cleaning is also included in the preparation phase. This involves the validity, accuracy
and completeness elements of data quality. However, we then have to decide what
to do if the data is inaccurate or missing. Missing data can have a significant
impact on predictive modelling and decision tree modelling in software such as
R. Some researchers suggest removal of null values; however, this is
dependent on the size of the data set and the amount of data that is null. We will also
need to ensure all duplicates are removed from the data set and decide how to deal
with the consumers who have moved to new addresses during the 3-year period.
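The completeness and uniqueness checks described above could look like the following minimal Python sketch. The records, field names and the choice of customer_id as the unique key are assumptions for illustration; note that two consumers in the same building share a postcode but remain distinct records.

```python
# Hypothetical customer records; None marks a missing value.
records = [
    {"customer_id": 1, "postcode": "B1 1AA", "flat": "1"},
    {"customer_id": 2, "postcode": "B1 1AA", "flat": "2"},  # same building, different consumer
    {"customer_id": 2, "postcode": "B1 1AA", "flat": "2"},  # exact duplicate of the row above
    {"customer_id": 3, "postcode": None, "flat": None},     # location missing
]


def completeness(rows, field):
    """Share of rows in which `field` holds a non-empty value."""
    filled = sum(1 for row in rows if row.get(field) not in (None, ""))
    return filled / len(rows)


def drop_duplicates(rows, key="customer_id"):
    """Keep the first row seen for each key value and discard later repeats."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


print("postcode completeness:", completeness(records, "postcode"))
print("rows after de-duplication:", len(drop_duplicates(records)))
```

Whether incomplete rows are dropped or imputed remains a judgement call that depends on how much of the sample is affected.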
At the end
of this phase we should be confident in the data we are testing and using in
the model. Normalisation of the data should be completed, and data construction
may be relevant in the case of Western Plc when deciding whether to
verify the model based on region, area code, city, town, post code etc.
Once all the data is consistent and arranged accordingly in the different databases
we can integrate the data.
When we have the same data across different databases, the opportunity is ripe for
errors and duplicates. The first step toward successful integration is seeing
where the data is and then combining that data in a way that’s consistent. Here
it can be extremely worthwhile to invest in proven data quality and accuracy
tools to help coordinate and sync information across databases.
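As a rough illustration of the integration step, the Python sketch below joins two of the databases on a shared customer identifier and reports records that cannot be matched. The identifier, the field names and the values are hypothetical; the report does not state how the Western Plc databases are keyed.

```python
# Hypothetical extracts from the Customer and Billing databases, keyed by customer id.
customers = {
    101: {"region": "North", "age": 42},
    102: {"region": "South", "age": 67},
    103: {"region": "North", "age": 29},
}
billing = {
    101: {"payment_plan": "direct debit", "avg_bill": 62.5},
    102: {"payment_plan": "prepayment", "avg_bill": 48.0},
    104: {"payment_plan": "direct debit", "avg_bill": 71.2},
}


def integrate(customer_rows, billing_rows):
    """Inner-join the two sources on customer id and list unmatched customers."""
    merged = {}
    unmatched = []
    for customer_id, details in customer_rows.items():
        bill = billing_rows.get(customer_id)
        if bill is None:
            unmatched.append(customer_id)
            continue
        merged[customer_id] = {**details, **bill}
    return merged, unmatched


merged, unmatched = integrate(customers, billing)
print("integrated records:", sorted(merged))
print("customers without billing data:", unmatched)
```

Listing the unmatched identifiers makes inconsistencies between databases visible before modelling starts, instead of silently dropping them.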
The data exploration stage will be key for determining relationships between pairs or
small numbers of attributes. In the project this can show whether there is a direct
correlation between the area/postcode and the income within the area. This analysis may
directly address the data mining goals. It may also contribute to or refine
the data description and quality reports, and feed into the transformation and
other data preparation steps needed for further analysis.
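One way to explore a pairwise relationship is the Pearson correlation coefficient. The Python sketch below computes it from first principles. Because area is categorical, the example assumes the data has first been aggregated to one row per area; the income and churn figures are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical area-level figures: average household income (thousands of pounds)
# and churn rate for six postcode areas.
avg_income = [18, 22, 27, 31, 38, 45]
area_churn = [0.24, 0.21, 0.17, 0.15, 0.11, 0.08]

r = pearson(avg_income, area_churn)
print(f"correlation between income and churn: {r:.2f}")
```

A coefficient near -1 or 1 indicates a strong linear relationship; it does not by itself show that income causes churn.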
In the project
I will be using classification and regression models.
The modelling phase
Modelling is the part of the Cross-Industry Standard Process
for Data Mining (CRISP-DM) process model that most data miners like best. Your
data is already in good shape, and now you can search for useful patterns in
your data.
A decision tree builds classification or regression models in the form of a tree
structure. In a classification problem, you typically have historical data (labelled
examples) and unlabelled examples. Each labelled example consists of multiple
predictor attributes and one target attribute (dependent variable). The value
of the target attribute is a class label. The unlabelled examples consist of
the predictor attributes only. The goal of classification is to construct a
model using the historical data that accurately predicts the label (class) of
the unlabelled examples.
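To make the labelled/unlabelled distinction concrete, the Python sketch below fits a one-level decision tree (a "stump") on labelled examples and then predicts the class of unlabelled ones. It is a simplified stand-in for a full decision tree, using Gini impurity to choose the split. The attribute contract_months, the target churned and all values are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1 - p * p - (1 - p) * (1 - p)


def fit_stump(rows, feature, target):
    """Pick the threshold on `feature` that minimises the weighted Gini impurity."""
    values = sorted({row[feature] for row in rows})
    if len(values) < 2:
        raise ValueError("need at least two distinct feature values to split")
    best_threshold, best_score = None, float("inf")
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2  # candidate split between neighbouring values
        left = [row[target] for row in rows if row[feature] <= threshold]
        right = [row[target] for row in rows if row[feature] > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if score < best_score:
            best_threshold, best_score = threshold, score
    left = [row[target] for row in rows if row[feature] <= best_threshold]
    right = [row[target] for row in rows if row[feature] > best_threshold]
    return {
        "feature": feature,
        "threshold": best_threshold,
        "left": 2 * sum(left) >= len(left),     # majority class on each side
        "right": 2 * sum(right) >= len(right),
    }


def predict(stump, row):
    """Return True if the stump predicts churn for this example."""
    side = "left" if row[stump["feature"]] <= stump["threshold"] else "right"
    return stump[side]


# Labelled historical examples: one predictor attribute and the target (1 = churned).
labelled = [
    {"contract_months": 3, "churned": 1},
    {"contract_months": 6, "churned": 1},
    {"contract_months": 9, "churned": 1},
    {"contract_months": 18, "churned": 0},
    {"contract_months": 24, "churned": 0},
    {"contract_months": 36, "churned": 0},
]
stump = fit_stump(labelled, "contract_months", "churned")

# Unlabelled examples carry the predictor attributes only.
for example in ({"contract_months": 4}, {"contract_months": 30}):
    print(example, "-> predicted churn:", predict(stump, example))
```

A full decision tree repeats this split search recursively on each side and across several attributes, which is what dedicated modelling tools do.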
– do we take the categorical area vs the payment plans?
Regression modelling aims to estimate and identify a pattern between two or more
variables. In this instance my dependent variable is the number of consumers
who churned. This will be done by investigating
the relationship between churn and the different variables. The variable with the
strongest linear relationship to churn will be further investigated through the
classification predictive model.
Regression creates predictive models. The
difference between regression and classification is that regression deals with
numerical/continuous target attributes, whereas classification deals with
discrete/categorical target attributes. In other words, if the target attribute
contains continuous (floating-point) values, a regression technique is
required. If the target attribute contains categorical (string or discrete
integer) values, a classification technique is called for.
The most common form
of regression is linear regression, in which a line that best fits the data is
calculated, that is, the line that minimises the sum of the squared vertical
distances of the points from the line.
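The best-fit line can be computed directly from the least-squares formulas for slope and intercept. The Python sketch below is a minimal single-predictor version; the data points (monthly bill against number of churned consumers) are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: average monthly bill (pounds) and churned consumers per segment.
avg_bill = [40, 55, 70, 85, 100]
churned_count = [110, 150, 205, 240, 300]

slope, intercept = fit_line(avg_bill, churned_count)
print(f"churned = {slope:.2f} * bill + {intercept:.2f}")
print("predicted churned consumers at a bill of 90:", round(slope * 90 + intercept))
```

With several predictors the same idea extends to multiple linear regression, for which a statistics package such as R would normally be used.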
It is assumed that the increase in churn is being caused by the regional difference in
income. The payment plans are met by a few long-term consumers who are in
the more affluent rural areas. With options from competitors, consumers in the
lower-income postcodes are faced with an easy transition to a lower-priced
service provider. The desired results from the modelling are identified in
Figure 5. These are the target results from the test data once the model has been
Churn Rate = (Customers at beginning of period – Customers at end of period) /
Customers at beginning of period
Customer Churn Rate = (15,000 – 5,000) / 15,000 = 66.7%
Modelling Results = (10,000 – 8,000) / 10,000 = 20%
Figure 5: Training data versus test data.
The model is flexible enough to be amended and
adjusted to suit the marketing department's target area. This can be filtered
into different departments investigating the different variables mentioned in
the data exploration element of the project. This is also to match the company's
growth plan and to ensure it allows for the numbers to increase.
Once the models have been provided to Western
Power Plc, the testing period starts. It normally
takes up to a few months. The model is fine-tuned according to the results.
Such custom-built models have a solid advantage compared to automatically
generated models: they stay very flexible and can be developed according to
each company’s growing demands.
Western Power Plc's key concerns and objectives
lay in the reduction of churn, not just the discovery of where retention
is low. In order for the business objectives to be met, the model can be used to
identify which variable is causing either the single-product or the dual-product
consumers to leave. A key
objective is to determine if there is some important business issue that has
not been sufficiently considered.
For the telecommunication industry, most studies have proposed customer churn
prediction using data mining techniques, as mentioned in the previous section.
However, these methodologies have some disadvantages. Heuristic-based approaches
and analytical methods are inconvenient for complicated problems, and it
is hard to collect pure data for statistical methods. Moreover, a mathematical
model is required for simulation, correlated variables are unsuitable for
decision trees, and a clear confidential data set is needed for neural networks
(Lee et al., 2009).
networks gave more reliable results
will run for 6 months, and that both campaign response rate and churn rates
will be monitored regularly during this test period.
Depending on the results of
the assessment and the process review, you now decide how to proceed. Do you
finish this project and move on to deployment, initiate further iterations, or
set up new data mining projects? You should also take stock of your remaining
resources and budget as this may influence your decisions.
of possible actions
might have a data problem
Practitioners who concern themselves with churn focus on a handful of “usual
suspect” reasons for churn: a customer progressed through on-boarding but
failed to fully realize value from the product, the sponsor who purchased the
product and evangelized it to the rest of the organization left, etc. Some
savvy practitioners look a bit more broadly than that.
There are myriad things that influence churn that aren’t being recognized, and
remedied, by companies who maintain a narrow focus on the most obvious
influences and the narrow data set that describes them. Did your company put
out an ad campaign that angered customers? Perhaps your organization made the news… in a negative way.
It is important to keep in mind that your customers’ interaction with your brand is
frequently bigger and broader than what you can find in your CRM or ERP. To
truly understand the reasons for churn, you must widen your lens to incorporate
all of the data from across the enterprise. Data from different departments
(HR, Ops, Finance, Marketing, etc.), and even public data, may hold the keys to your
Soeini, K. V. Rodpysh. 2012. Applying Data Mining to Insurance Customer Churn Management. IACSIT Hong Kong Conference. 30: 82–92.
3 J. Hadden. 2008. A Customer Profiling Methodology for Churn Prediction. Cranfield University.
4 L. Bin, S. Peiji, L. Juan. 2007. Customer Churn Prediction Based on the Decision Tree in Personal Handyphone System Service. International Conference on Service Systems and Service Management.
5 Y. Zhang, M. Berry. 2011. Behavior-Based Telecommunication Churn Prediction with Neural Network Approach. International Symposium on Computer Science and Society. pp. 307–310.
6 C. Kirui, L. Hong, W. Cheruiyot, H. Kirui. 2013. Predicting Customer Churn in Mobile Telephony Industry Using Probabilistic Classifiers in Data Mining. International Journal of Computer Science. 10(2): 165–172.
7 V. Effendy. 2014. Handling Imbalanced Data in Customer Churn Prediction Using Combined Sampling and Weighted Random Forest. Information and Communication Technology (ICoICT), 2014 2nd International Conference. pp. 325–330.
Management: The Foundation of Contemporary Marketing …
By Roger J. Baran, Robert J. Galka
Handbook of Marketing and
edited by Shankar Ganesan
Fundamentals of Machine
Learning for Predictive Data Analytics: Algorithms …
John D. Kelleher, Brian Mac Namee, Aoife D’Arcy
Abbott, D., Applied Predictive Analytics, Wiley, 2004.
Brown, M., Data Mining for Dummies, Wiley, 2014.
Helberg, C., Data Mining with Confidence, SPSS, 2002, 2nd edition.
Linoff, G. S. & Berry, M. J., Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management, Wiley, 2011 (3rd edition).
McCue, C., Data Mining and Predictive Analysis: Intelligence Gathering and Crime Analysis, Elsevier Butterworth-Heinemann, 2007.
Siegel, E., Analytics, Wiley, 2014.
Taylor, J., Decision Management Systems: A Practical Guide to Using Business Rules and Predictive Analytics, IBM Press, 2012.
ch 1&2: chatty intro, gives a realistic picture
Abbott ch 1: a bit deeper, good intro, from a statistical background
Linoff & Berry ch 1&2: standard text, a lot more detail
Helberg ch 1: historical perspective, statistical background, short, rare
Siegel ch 1&2: good light intro, what an informed business person should know…
Taylor ch 1&2: different angle, based on decision management
Bocij, P., Greasley, A. & Hickie, S., Business Information Systems, 4th edition, Prentice Hall, 2008. Especially sections on CSF analysis, pp 324-325 & 522-523. UoB classmark: 658.403 8 BOC
Campbell, D., Stonehouse, G. & Houston, B., Business Strategy: An Introduction, 1999. UoB classmark: 658.4012 CAM, Aldrich. Good introduction in early chapters.
Ference, T. & Thurman, P. W., MBA Fundamentals: Strategy, Kaplan, 2009. UoB classmark 658.401 2 FER
Grant, Robert M., Contemporary Strategy Analysis, 9th edition, Wiley, 2016.
Overview Tom Khabaza, October 2016
SEMMA Applied in Industry – Slides