Some quick comments about Genevera Allen's statements regarding Machine Learning

Start note: Favio Vazquez did a great job in his article on this topic, with a lot of charts, showing that the modern Machine Learning approach, with the tools we currently have, is tackling the problems of replication and methodology.

It’s becoming quite a trend: some researcher has some criticism of Machine Learning and starts cherry picking (the fallacy of incomplete evidence) among potential issues, opening with statements like “We have a problem in Machine Learning and the results are not reproducible”, “Machine Learning doesn’t work”, “Artificial intelligence faces reproducibility crisis”, or “AI researchers allege that machine learning is alchemy”, and boom: we have click bait, rants, bashing and a never-ending spiral of non-constructive criticism. Afterwards this researcher gets some spotlight in the public debate about Machine Learning, goes on CNN to give some interviews and becomes a “reference on issues in Machine Learning”.

Now it is Ms. Allen’s turn, with the question/statement “Can we trust scientific discoveries made using machine learning?” She brings good arguments to the debate, but I think she misses the point in two ways: 1) she does not bring any solution or proposal, and 2) the statement itself is so broad and obvious that it can be applied to any field of science.

My main intention here is just to make some very short comments to show that these issues are well known to the Machine Learning community and that we have several tools and methods to tackle them.

My second intention is to demonstrate that this kind of very broad, obvious argument brings more friction than light to the debate. I’ll include each statement and a short response below:

“The question is, ‘Can we really trust the discoveries that are currently being made using machine-learning techniques applied to large data sets?’” Allen said. “The answer in many situations is probably, ‘Not without checking,’ but work is underway on next-generation machine-learning systems that will assess the uncertainty and reproducibility of their predictions.”

Comment: More data does not imply more insights, and harder than getting more data is finding the right combination of hyperparameters, feature engineering, and ensembling/stacking of models. And every scientific statement must be checked (this is a basic assumption of the scientific method). But that checking may not be the norm in modern research, as we are celebrating (overselling) scientific statements while researchers intentionally hide their methods and findings. It’s like Hans Bethe hiding his discoveries about stellar nucleosynthesis because at some point in the future someone could potentially use them to make atomic bombs.

“A lot of these techniques are designed to always make a prediction,” she said. “They never come back with ‘I don’t know,’ or ‘I didn’t discover anything,’ because they aren’t made to.”

Comment: This is simply not true. A very quick check of Scikit-Learn, XGBoost and Keras (3 of the most popular ML libraries) shatters this argument.
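To make that point concrete, here is a minimal sketch of a model that is allowed to answer “I don’t know”. It is my own illustration, not the exact check mentioned above: it assumes a Scikit-Learn classifier that exposes predict_proba, and the ABSTAIN label, the predict_or_abstain helper and the 0.8 threshold are hypothetical choices made only for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

ABSTAIN = -1  # hypothetical label meaning "I don't know"


def predict_or_abstain(model, X, threshold=0.8):
    """Return the predicted class, or ABSTAIN when the model is not confident.

    The model must expose predict_proba; the threshold is an arbitrary
    illustrative value, not a recommended setting.
    """
    proba = model.predict_proba(X)                 # shape: (n_samples, n_classes)
    top = proba.max(axis=1)                        # confidence of the best class
    labels = model.classes_[proba.argmax(axis=1)]  # best class per sample
    return np.where(top >= threshold, labels, ABSTAIN)


# Synthetic data, used only so the sketch is self-contained.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
preds = predict_or_abstain(model, X_test, threshold=0.8)

n_abstained = int((preds == ABSTAIN).sum())
print(f"Abstained on {n_abstained} of {len(preds)} test samples")
```

Raising the threshold makes the model abstain more often; setting it to 0 recovers the usual always-predict behaviour. The same idea works with any library that returns class probabilities.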

“In precision medicine, it’s important to find groups of patients that have genomically similar profiles so you can develop drug therapies that are targeted to the specific genome for their disease,” Allen said. “People have applied machine learning to genomic data from clinical cohorts to find groups, or clusters, of patients with similar genomic profiles. But there are cases where discoveries aren’t reproducible; the clusters discovered in one study are completely different than the clusters found in another.”

Comment: Here we have the classic use of misleading experience, with clear confirmation bias, caused by a failure to distinguish between tools and methodology. The ‘logic’ of this argument is: a person wants to cut some vegetables to make a salad. This person uses a salad knife (the tool), but instead of using it properly (in the kitchen, with a proper cutting board), this person cuts the vegetables at the top of a staircase after drinking 2 bottles of vodka (the wrong method) and ends up getting cut; afterwards, this person concludes that the knife is dangerous and doesn’t work.

There’s a bunch of guidelines being proposed, and there are several good resources, like Machine Learning Mastery, that have already tackled this issue; this excellent post from Determined ML makes a good argument, and this repo has tons of reproducible papers, even ones using Deep Learning. The main point is: any junior Machine Learning Engineer knows that hashing the dataset and fixing a seed at the beginning of the experiment can solve at least 90% of these problems.
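As a rough sketch of what “hashing the dataset and fixing a seed” can look like in practice, the snippet below fingerprints the raw data with SHA-256 and seeds the random number generators before doing anything random. The helper names (dataset_fingerprint, fix_seeds, run_experiment), the seed value and the toy byte string standing in for a dataset file are all assumptions made for illustration.

```python
import hashlib
import random

import numpy as np

SEED = 42  # arbitrary illustrative value; any fixed integer works


def dataset_fingerprint(data: bytes) -> str:
    """SHA-256 hex digest of the raw dataset bytes."""
    return hashlib.sha256(data).hexdigest()


def fix_seeds(seed: int) -> np.random.Generator:
    """Seed Python's RNG and return a seeded NumPy generator."""
    random.seed(seed)
    return np.random.default_rng(seed)


def run_experiment(data: bytes, seed: int = SEED):
    """Toy 'experiment': record the data hash, then shuffle the data."""
    rng = fix_seeds(seed)
    values = np.frombuffer(data, dtype=np.uint8)
    shuffled = rng.permutation(values)  # stand-in for a train/test shuffle
    return dataset_fingerprint(data), shuffled[:5].tolist()


raw = bytes(range(100))  # stand-in for the contents of a dataset file

first = run_experiment(raw)
second = run_experiment(raw)
print("Same fingerprint and same shuffle:", first == second)
```

Logging the fingerprint next to the results shows whether two runs really used the same data, and the fixed seed makes the shuffle repeatable. Deep Learning frameworks have their own seeding calls (for example torch.manual_seed or tf.random.set_seed) that would be added in the same place.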


There are a lot of researchers and journalists who cannot (or do not want to) understand that replication of studies is a huge problem not only in Machine Learning but in all of science (this is not the case for Ms. Allen, because she has a very interesting track record of publications in ML). In psychology, half of the studies cannot be replicated, and even medical findings are in some instances false, which shows that there is a very long road ahead to minimize this kind of problem.