Why you need enforce reproducibility as habit

2017 Aug 26

In the few months that I arrived in Movile, I saw some strange pattern about several “data analysis”. The pattern was: once the analysis is delivered by someone without any background of data science this kind of insight suddenly become a kind of dogma. In other words, no one will check the information, and in most of the cases there is no code, no commit at github, no .sql file or .R/.py file with the scripts used. The practical problem is: What if this information was deadly wrong? And worse: How to discover if this information was harmful to the business? Seeing this, my first mission statement as Data Intelligence Tech Lead at that time was to enforce to every BI Developer, Revenue Assurance Data Analyst, and Data Scientist that every code must be reproducible no matter what conditions. Every insight must be delivered with some code in github. Someone could say: “Wow… We have a little dictator here!” With this simple rule, we are having this not exhaustive list of positive effects:

We’re collecting until today a huge dividend about the reproductive science: Any opinion have a code behind, and this code can be tested for anyone with access in github. This avoids the “excel kid” to drive any decision making without one hand on their shoulder, BEFORE the decision making;
We unmasked several “BS artists” that exploit the lack of data literacy of our internal clients (e.g. analysts, managers, et cetera) showing unnecessary complexities or delusional estimates without any kind of method behind; and
We developed a culture to be very skeptic about our estimates, especially to what we do not know about the data (a.k.a. exogenous factors about the market, brazilian economy, and so on). In another words: We stop to guessing about what we don’t know at that time and MADE IT CLEAR for our internal clients.

To know a little bit more how we operate, this article was the key reference for us to built our culture of compliance and deployment.

No matter what time crunch you are facing, it’s not worth putting a flaky implementation of an analysis into production. As data scientists, we are working to create a culture of data-driven decision-making. If your application breaks without an explanation (likely because you are unable to reproduce the results), people will lose confidence in your application and stop making decisions based on the results of your application. Even if you eventually fix it, that confidence is very, very hard to win back.

Data science teams should require reproducibility in the same way they require unit testing, linting, code versioning, and review. Without consistently producing results as good or better than known results for known data, analyses should never be passed on to deployment. This performance can be measured via techniques similar to integration testing. Further, if possible, models can be run in parallel on current data running through your systems for a side-by-side comparison with current production models.

Don’t get me wrong: Without any kind of compliance about your analysis, your organization will be a house of BS artists and any benefit to extract insights of the data, will be contaminated with BS hidden bias and can lead to several disasters in decision making, as we already experienced.