Why you need to enforce reproducibility as a habit
2017 Aug 26

In my first few months at Movile, I noticed a strange pattern in several “data analyses”. The pattern was: once an analysis was delivered by someone without any data science background, the insight suddenly became a kind of dogma. In other words, no one would check the information, and in most cases there was no code, no commit on GitHub, no .sql file or .R/.py file with the scripts used.

The practical problem is: what if this information was dead wrong? And worse: how would we discover whether this information was harmful to the business?

Seeing this, my first mission statement as Data Intelligence Tech Lead at that time was to enforce on every BI Developer, Revenue Assurance Data Analyst, and Data Scientist that every piece of code must be reproducible, no matter the conditions. Every insight must be delivered with code on GitHub.

Someone could say: “Wow… We have a little dictator here!”

With this simple rule, we have seen the following non-exhaustive list of positive effects:
- To this day we are collecting a huge dividend from reproducible science: every opinion has code behind it, and this code can be tested by anyone with access to GitHub. This prevents the “excel kid” from driving any decision without a hand on their shoulder, BEFORE the decision is made;
- We unmasked several “BS artists” who exploited the lack of data literacy of our internal clients (e.g. analysts, managers, et cetera) by presenting unnecessary complexity or delusional estimates without any kind of method behind them; and
- We developed a culture of being very skeptical about our estimates, especially regarding what we do not know about the data (a.k.a. exogenous factors such as the market, the Brazilian economy, and so on). In other words: we stopped guessing about what we didn’t know at the time and MADE IT CLEAR to our internal clients.
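To make “every opinion has code behind it” a bit more concrete, here is a minimal sketch of what a reproducible analysis script could look like. It is only an illustration, not the code we used at Movile: the function names, the one-column CSV layout, and the revenue numbers are all made up. The idea is simply that the script fingerprints its input, fixes its random seed, and emits both alongside the result, so anyone who pulls the commit can rerun it and compare.

```python
import hashlib
import json
import random
import statistics


def sha256_of_bytes(data: bytes) -> str:
    """Fingerprint the exact input so a reviewer can confirm they hold the same data."""
    return hashlib.sha256(data).hexdigest()


def run_analysis(raw_csv: bytes, seed: int = 42) -> dict:
    """Hypothetical analysis: mean of a seeded 3-row sample from a one-column CSV."""
    # Skip the header line and ignore blank lines.
    lines = raw_csv.decode("utf-8").splitlines()[1:]
    values = [float(line) for line in lines if line.strip()]

    # A local, seeded RNG: the same seed and data always give the same sample.
    rng = random.Random(seed)
    sample = rng.sample(values, k=min(3, len(values)))

    # Return the result together with everything needed to reproduce it.
    return {
        "input_sha256": sha256_of_bytes(raw_csv),
        "seed": seed,
        "n_rows": len(values),
        "sample_mean": statistics.fmean(sample),
    }


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    data = b"revenue\n10\n20\n30\n40\n50\n"
    print(json.dumps(run_analysis(data), indent=2, sort_keys=True))
```

Running the script twice on the same bytes should print the same JSON. If the input hash differs between two runs, the two analysts were not looking at the same data, and that conversation happens before the decision instead of after it.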
To learn a little more about how we operate: this article was the key reference for us in building our culture of compliance and deployment.
Don’t get me wrong: without any kind of compliance around your analyses, your organization will be a house of BS artists, and any benefit from extracting insights from the data will be contaminated with hidden BS bias and can lead to several disasters in decision making, as we have already experienced.