Roughly 4 to 5% of their customers were not renewing their contracts. As each contract carries substantial value, this attrition amounts to a large revenue loss. They wanted to predict in advance who was likely to churn, and try to prevent it.
What was tricky
The business needed precision above 40% at a given recall. Their internal data science team could not get beyond negligible precision (<2%). Despite highly qualified data scientists using powerful models (variants of Gradient Boosting Machines), the improvement was minimal.
What did we do?
In most champion-challenger scenarios like this (where we need to beat an existing model hands down), we have a rule of thumb.
If the data is structured, don’t worry much about improving the algorithm. Focus on other aspects. Here is what we did:
Align the training and validation data sets: We realized they were defined incorrectly and their distributions differed greatly. We corrected the split.
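One common way training and validation sets end up misaligned in churn problems is a random split that mixes time periods, so the two sets no longer come from comparable distributions. The sketch below (on synthetic data, with hypothetical column names) shows a time-based split and a quick sanity check that the label distribution is similar on both sides:

```python
import numpy as np
import pandas as pd

# Hypothetical monthly contract snapshots; columns are illustrative only.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "snapshot_month": pd.period_range("2019-01", periods=24, freq="M").repeat(50),
    "churned": rng.integers(0, 2, 24 * 50),
})

# A random split can leak future months into training and skew the label
# distribution; splitting on time keeps train and validation comparable.
cutoff = pd.Period("2020-07", freq="M")
train = df[df["snapshot_month"] < cutoff]
valid = df[df["snapshot_month"] >= cutoff]

# Sanity check: churn rates in the two sets should be close.
print(round(train["churned"].mean(), 2), round(valid["churned"].mean(), 2))
```

A quick comparison like this, done before any modelling, surfaces the kind of split problem described above.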
Try various lengths of history in the time series: With time-series data, one always wonders how much of the past is useful. 3 months or 3 years? There is a sweet spot, and you need to find it empirically.
Use technology where you can: Embeddings are a better way to represent most categorical attributes than sparse one-hot encodings. Hence, if enough data is available, embed them all.
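To illustrate the representation (not the training), here is a minimal sketch of an embedding lookup. The category names are hypothetical, and the embedding matrix is randomly initialized as a stand-in; in practice it is learned end-to-end, for example with a neural network's embedding layer:

```python
import numpy as np

# Hypothetical high-cardinality categorical attribute (e.g., "industry").
categories = ["retail", "telecom", "banking", "energy"]
index = {c: i for i, c in enumerate(categories)}

dim = 3  # dense vector size, typically much smaller than the category count
rng = np.random.default_rng(42)
# Stand-in for learned weights; a real model trains this matrix jointly
# with the churn objective so similar categories end up close together.
embedding = rng.normal(size=(len(categories), dim))

def embed(value):
    """Map a categorical value to its dense vector."""
    return embedding[index[value]]

# Each category becomes a small dense vector instead of a sparse one-hot row.
print(embed("telecom").shape)
```

The payoff over one-hot encoding is that the model can learn that two categories behave similarly, rather than treating every level as unrelated.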
In 3 months, these systematic experiments moved the precision from 2% to 38%, with the same algorithm they had been using.
To have INSOFE faculty and data scientists solve your business problems, prep your engineering teams to face the real world complexities, visit here
Moral of the story
Even today, the basics of machine learning matter. Thinking carefully about the data, splitting it correctly, making visual observations, and experimenting systematically never go out of fashion!