Why I Believe We Are Entering a Data Science Bubble in Brazil
2020 Feb 12
TL;DR: Make a rational and pragmatic assessment based on facts and market information before choosing a career change or even investment alternatives in Data Science & Machine Learning qualifications.
I know that what I am saying might seem counterintuitive today, at a time when companies announce that they will hire 100 data scientists in a single year (without any solid justification as to why, and especially without stating the expected return on these hires), when the search volume for “data science” has increased by more than 50% in less than 3 years, and when a simple search turns up more than 1,500 open jobs on LinkedIn.
Of course, my opinion is not the most popular currently, but what I see today are the same patterns of past technology bubbles in Brazil.
Who doesn’t remember the Microsoft Certifications bubble? (Does anyone remember the ill-famed SQL Server Maestro program?)
Or the Oracle certifications bubble?
The courses on the winning duo of PHP and MySQL, which promised the highest salaries in the market?
Or the promises of the El Dorado of salaries and employability as a Web Designer with Corel Draw, Dreamweaver, and Flash courses?
And the ITIL, COBIT, TOGAF, or BABOK framework courses that promised us wonderful positions just by managing business processes? (When many universities and equally shady short courses, instead of teaching the basics of code, threw an entire generation of college students into “management” frameworks, while some of them cannot manage even their own financial lives).
Everything I am going to put here concerns the supply side of data scientists, and not the demand itself (that deserves a post of its own, but in practice very few companies know what they are hiring for, and some hire only for signaling).
Thus, this post will be much more focused on career management than on a market portrait.
Far from being any kind of coach or something similar, my goal is to invite the reader to reflect on some aspects that I judge important, based on empirical observations I have made of the market as a whole.
Some aspects that signal that perhaps we are in a Data Science and Machine Learning bubble in my view are:
1) The Return on Investment (ROI) of a DS education relative to salaries does not pay off: I will leave out of this analysis MOOCs and sensational courses like Fast.ai by Jeremy Howard for a very simple reason: we in Brazil love a diploma, and our educational system was shaped in a way that does not promote self-teaching but rather a tutoring-based model in which the teacher is not a facilitator but the owner and guardian of knowledge itself. What I mean is that this ROI analysis applies exclusively to post-graduate and extension courses. I did a little research and found programs that cost more than R$ 30,000. Nothing against the price itself; however, let’s say the expected payback period for this investment is 4 years. That’s R$ 625/month for 4 years (R$ 30,000 spread over 48 months). In other words: from the day the course ends, our recent graduate already needs a raise of just over R$ 600 a month just to break even on their education. Ah, and that is before counting the opportunity cost of time (e.g. class time, commuting, meals, etc.) and the opportunity cost of money (e.g. putting that money into some investment, stock fund, etc.). Given the brutal recession we have had in recent years (with income practically stagnant), I would consider very carefully an investment decision (or debt) of this size without a return perspective of at least 2 years.
2) The market is increasingly competitive and the entry barrier is almost nil, and while this is good, it can also be a problem. If I had to describe the market, the phrase that most resembles how I see it is Bellum omnium contra omnes, or war of all against all. If you are a Computer Scientist, you will compete with Econometricians who know more about modeling than you; if you are a Statistician, you will compete with Computer Scientists who code better than you; if you are an Econometrician, you will compete with Statisticians who master a better mathematical/statistical toolset than yours; and all of them will compete with people who hold master’s degrees and PhDs. The point here is that competition will get increasingly brutal, and in the long run this is not scalable in career terms, given that these are disciplines that take time to learn.
3) As soon as the market starts to become disillusioned with wrong hires and corporate frustration with data scientists sets in, selection processes will get more competitive and there won’t be good places for everyone. One thing I learned as a hiring manager at one point in my career was this: with each disappointment caused by a wrong hire, two brutal measures came into play: a) the requirements, tests, and demands increased considerably and b) the salary increased in proportion, given that the bar would be higher for any candidate who was approved. What I mean here is that the best positions will start to suffer an insane escalation of required skills, and the market will be divided into “good positions” and “bad positions”. In other words: entering as a Data Scientist earning below market, just writing SQL queries and messing with Excel macros, is easy; the difficult part is getting to the high salaries and using modern tools to solve difficult problems.
4) The flood of people entering the area without the slightest idea of what Science or Data is: Think carefully: at this exact moment thousands of people are entering the field without any idea of the market, because of hype, influencers, media, and other sources that promise the El Dorado of high salaries or describe Data Science as the sexiest profession of the 21st century. This is not scalable, and at some point a large part of these people who are venturing in will be deeply disillusioned, either for the reasons in item 3) or when companies realize that the Excel nut who has been at the company for thousands of years knows more about the business and the numbers than all the people who keep writing scripts copied from Towards Data Science on a MacBook Pro with a retina display and a foreign conference sticker.
5) Do you really think that suddenly all companies in the solar system have never heard of basic data analysis or basic statistics? That only now they woke up from a stupor of the last 25 years and they decided they need unicorns sent from the heavens to save companies from bankruptcy through insights generated from data? With or without data scientists every day in the solar system businesses are closed, sales are made, people buy things, and money circulates from one hand to another.
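The break-even arithmetic from item 1) can be sketched in a few lines of Python. This is only an illustration of the reasoning, not a financial model: the function name `required_monthly_raise` is hypothetical, the R$ 30,000 price and the 4-year and 2-year horizons are the figures quoted above, and interest, taxes, and the opportunity costs of time and money are deliberately ignored.

```python
def required_monthly_raise(course_cost: float, payback_months: int) -> float:
    """Monthly raise needed to recover course_cost within payback_months.

    Simplifying assumptions (not from any real salary data):
    no interest, no taxes, no opportunity cost of time or money.
    """
    if payback_months <= 0:
        raise ValueError("payback_months must be positive")
    return course_cost / payback_months


# Figures quoted in item 1): a R$ 30,000 course.
four_year_raise = required_monthly_raise(30_000, 48)  # 4-year horizon
two_year_raise = required_monthly_raise(30_000, 24)   # 2-year horizon

print(f"4 years: R$ {four_year_raise:.2f}/month")  # R$ 625.00/month
print(f"2 years: R$ {two_year_raise:.2f}/month")   # R$ 1250.00/month
```

Halving the horizon doubles the raise required, which is why the time frame deserves as much scrutiny as the sticker price.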
But let’s suppose you see me as a sell-out or a person with hidden interests because of what I said. In that case, do not believe me; believe instead the people and institutions below, since they do know what is best for your career:
a) Universities and some extension and post-graduate course professors: If your main source of income depended on the largest possible volume of students, or if your institution served as the preferred proxy for hiring, would you announce that we are in an educational bubble (where we already have the incredible advent of useless diplomas in Brazil), or would you surf the wave and offer money-grabbing courses that are at least 3 years behind the market? One thing I see a lot is institutions that have been asleep for more than 5 years with regard to data science, or data in general, suddenly opening a Data Science and Machine Learning course with a faculty that has never worked in the market and with syllabuses that neither match the reality of what is being done in companies nor cover the basics that every scientist should know. Again, you don’t need to believe me: look at the extension and post-graduate courses and see which of them teach the fundamentals of basic statistics, causal inference, calculus, algebra, etc.
b) Media: Do not be fooled: behind every announcement of a company hiring numerous data scientists there is an underworld of paid articles meant to generate something called Brand Awareness. Numerous companies use the media to publish articles and “news” that favor them (e.g. the magazine runs a positive article about company A, and months later a subsidiary of company A pays for a 300 thousand reais ad in the same magazine) to give the perception that the company is cool and does incredible things, when in fact it is just casting its brand in a positive light while the real jobs are already filled (that is, when these jobs exist at all).
c) Conference talks: As a good conference rat, I have to admit that I fell for this a lot in the past. I would go to a conference and, dazzled by the technology, would immediately draw up a study plan for the new Data Science or Machine Learning tool. You go back to your company energized by the conference, expecting to implement those wonderful cases, but in reality you end up closing meaningless task tickets in JIRA.
d) Influencers, technologists, and the like: They love it when you change technologies as often as you change clothes, for a simple reason: this brings more views on YouTube, more comments about what is “trending”, and more articles on their Medium blogs. More than that, they gain credibility merely by showing what to do, without having done anything in a corporate or academic setting, and, worse, without bearing any risk if what they recommend goes wrong. Science and Engineering are things that walk slowly but with firm steps. A technology switch or even the adoption of new tools (at least in serious companies) is a process that can take years, or must have a very good reason to happen. I worked at a company where the queue software was only changed after many instability problems, and even with the new system our “failback” still remained the old and always reliable text files. The point I want to leave here is: when you see influencers, technologists, and the like talking about new tools and frameworks that you need to learn, be suspicious, and understand that companies are not going to abandon established technologies in which they have a body of knowledge and know-how in order to embark on your adventure as a Data Scientist who is “up to date” with the market.
e) Friends and colleagues who are already in the market: A small part of these people will never admit they are in a bubble for a simple reason: these people really believe they are excellent and deserving of their positions and nabobish salaries. However, some of these colleagues forget the unidentified causal factors that led them to have the position they have today. For example, it could be tenure, time in the DS position before the market hype, track record at the company unrelated to DS, or even the lack of someone who has technical knowledge but who also knows the business. And logically let’s not forget that we human beings are great at underestimating the role of chance and luck in our lives. At the end of the day, we are human beings and we love a romanticized narrative of reality. Now falling for the narrative is a matter of choice.
Finally, I want to put some lies that people tell about Data Science and Machine Learning:
The courses are not expensive: Stop and think: the cost of an education is not just the amount paid, but also the opportunity cost and the time invested, as I stated previously. Data Science today is a very dynamic area that is changing very fast. In less than 1 year, tools can take totally different directions. Is it worth spending 18 or 36 months paying more than R$ 1,500 a month for a course when, by the end, practically all the frameworks taught have already been discontinued or moved on to other versions? All this for what? To get a junior position, or at most a R$ 350 raise, after spending R$ 40 thousand on an education? It doesn’t seem like an ideal ROI to me.
There is a deficit of data scientists in the market: This is partially true. Numerous companies do need to modernize the way they do analysis, given that some are still in the descriptive or diagnostic paradigm and want to move to the predictive and prescriptive part. However, unless you work at companies that really use data in their products, like Nubank, Quinto Andar, Pipefy, Movile, etc., most companies still need to get out of Excel and the combo “mean-standard-deviation-correlation-pie-chart”. And there is absolutely nothing wrong with that. The point is that the market expects a data scientist to have the skills of a Data Engineer, Data Scientist, Software Engineer, DBA, and requirements analyst all in the same role. So if you think you are going to land your Data Scientist position and do analyses on dashboards like in Minority Report, sipping a delicious Italian wine and listening to Bach’s Toccata in D minor as the insights come to you, I have bad news: this is not going to happen. In the best case, you will be fighting with the DBA for access to the database, you will meet a lot of defensiveness from analysts whose jobs are not as sexy as yours, and without business knowledge you will be lost.
I need to be a Data Scientist to succeed in life: There are many cool things in engineering outside of Data Science that are just as important, such as DevOps, Site Reliability Engineering, Front/Backend Engineering, Data Engineering, Automation, Incident Response, Mobile Development, Security, Infrastructure, etc. And believe me, these are positions that pay very well, require knowledge that is difficult to acquire, will always be in demand, and directly impact the business.
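To put a number on the first lie above, here is a rough sketch of how long a R$ 350 monthly raise would take to repay a R$ 40 thousand education. The function name `payback_months` is hypothetical, and the calculation assumes the whole raise goes toward recovering the cost, with no interest, taxes, or further raises.

```python
import math


def payback_months(total_cost: float, monthly_raise: float) -> int:
    """Whole months needed for monthly_raise to cover total_cost.

    Assumes a flat raise, no interest, and no taxes (illustrative only).
    """
    if monthly_raise <= 0:
        raise ValueError("monthly_raise must be positive")
    return math.ceil(total_cost / monthly_raise)


# Figures quoted in the text: R$ 40 thousand spent, R$ 350 raise.
months = payback_months(40_000, 350)
years, remaining_months = divmod(months, 12)

print(f"{months} months (about {years} years and {remaining_months} months)")
# 115 months (about 9 years and 7 months)
```

Under these simplified assumptions the payback takes more than nine years, well beyond the 4-year horizon used in the ROI discussion earlier.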
Final Considerations
Reading this account that borders on extreme pessimism, some might say: “But gee, so I shouldn’t go into the Data Science area?” My answer is: “Go, but understand the main problems of the area and know that this is not the only way. Keep in mind that frustrated expectations, both professional and financial, can be a reality, and few people are talking about it.” I think there will always be room for good professionals, regardless of what they do, and good companies always hire good people even if they don’t have the budget, the opening, or even the position itself. I hope this small account has shed some rationality and light on the Data Science career and serves at least as food for reflection.
Notes
[1] This post was blatantly inspired by the 2010 account of the Brasília Real Estate Bubble which saw the main problem in the heart of the Brazilian economy regarding the credit bubble.
[2] This scribe refuses (at least here on the blog) to adopt the new internet writing standard of sentences with at most 7 very simple words. As I believe that the readers of this blog are people of advanced cognitive capacity, the sentences will remain complex. Reducing complexity to a language that promotes the dumbing down of readers has never been and never will be the focus here.