At some point, every data scientist or machine learning engineer has come across polls and blog posts asking: “For production environments, which is better, R or Python?”
Anyone who follows the technology debate in academia, industry, conferences, and the media has already noticed that Artificial Intelligence (AI) and its subfields are the hottest topics at the moment.
Some weeks ago, during a security training for developers given by Marcus from Hackmanit (by the way, it is a very good course that covers topics ranging from web development to NoSQL vulnerabilities and defensive coding), we discussed some white-box attacks on web applications (i.e., attacks where the attacker has internal access to the object). I got curious to check whether similar vulnerabilities exist in ML models. After running a simple Scikit-Learn script based on a few existing references, I noticed some latent vulnerabilities, not only in the model objects themselves but also in the lack of a proper security mindset when we develop ML models. But first, let's look at a simple example.
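The original Scikit-Learn script is not reproduced here. As a minimal sketch, assuming the model is persisted with Python's standard pickle module (a common way to save fitted models), the code below shows two white-box problems: anyone with access to the artifact can silently change learned parameters, and a crafted pickle can run code while it is being loaded (the Python pickle documentation warns against unpickling untrusted data for this reason). TinyModel, side_effect and Payload are hypothetical names used only for illustration, not Scikit-Learn code.

```python
import pickle


# Hypothetical stand-in for a fitted estimator (not Scikit-Learn code):
# a linear scorer with learned weights and a bias term.
class TinyModel:
    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def predict(self, x):
        score = sum(w * xi for w, xi in zip(self.weights, x)) + self.bias
        return 1 if score > 0 else 0


# Records every call, so we can see that unpickling ran our function.
EXECUTED = []


def side_effect(message):
    """Harmless stand-in for attacker code: it only records that it ran."""
    EXECUTED.append(message)
    return message


class Payload:
    # pickle stores what __reduce__ returns; on loading, it calls the
    # returned callable with the returned arguments.
    def __reduce__(self):
        return (side_effect, ("ran inside pickle.loads",))


# 1) The "legitimate" model is trained and shipped as a pickle blob.
original = TinyModel(weights=[0.5, -1.0], bias=0.0)
blob = pickle.dumps(original)

# 2) Tampering: anyone with write access to the artifact can load it,
#    change a learned parameter and save it again; pickle has no integrity check.
tampered = pickle.loads(blob)
tampered.bias = 100.0
tampered_blob = pickle.dumps(tampered)

# 3) Code execution: loading a blob built from Payload calls side_effect().
malicious_blob = pickle.dumps(Payload())
result = pickle.loads(malicious_blob)

sample = [1.0, 1.0]
print(pickle.loads(blob).predict(sample))           # 0
print(pickle.loads(tampered_blob).predict(sample))  # 1
print(result, EXECUTED)
```

Note that nothing in the tampered blob looks suspicious to the code that loads it: it is still a valid TinyModel, it just answers differently.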
From the MIT Tech Review article “Google shows how AI might detect lung cancer faster and more reliably”, we have the following: “Early warning: Daniel Tse, a researcher at Google, developed an algorithm that beat a number of trained radiologists in testing.”
In a very insightful article, David Talby discusses the fact that the moment a Machine Learning model goes to production, it starts to degrade, because the model comes into contact with reality. As the author puts it: “The key is that, in contrast to a calculator, your ML system does interact with the real world.”
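Talby's article is not reproduced here, and the code below is not taken from it. As one hypothetical way to notice that kind of degradation, a service can compare its rolling accuracy on recently labeled production data with the accuracy measured before deployment. AccuracyMonitor, the window of 10 and the 0.05 tolerance are illustrative assumptions.

```python
from collections import deque


class AccuracyMonitor:
    """Tracks accuracy over the most recent `window` labeled predictions.

    Hypothetical helper: flags degradation when rolling accuracy drops more
    than `tolerance` below the accuracy measured before deployment.
    """

    def __init__(self, baseline, window=100, tolerance=0.05):
        self.baseline = baseline
        self.tolerance = tolerance
        self.hits = deque(maxlen=window)  # oldest outcomes fall off automatically

    def record(self, predicted, actual):
        self.hits.append(predicted == actual)

    def rolling_accuracy(self):
        if not self.hits:
            return None
        return sum(self.hits) / len(self.hits)

    def degraded(self):
        accuracy = self.rolling_accuracy()
        return accuracy is not None and accuracy < self.baseline - self.tolerance


# Usage: offline accuracy was 0.90; in production, only 7 of the last 10
# labeled predictions were right.
monitor = AccuracyMonitor(baseline=0.90, window=10)
outcomes = [(1, 1)] * 7 + [(1, 0)] * 3
for predicted, actual in outcomes:
    monitor.record(predicted, actual)

print(monitor.rolling_accuracy())  # 0.7
print(monitor.degraded())          # True
```

In real systems the true labels often arrive late (or never), so a monitor like this usually complements checks on the input distribution rather than replacing them.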
Most of the time we rely completely on the default parameters of Machine Learning algorithms, and this can hide the fact that we sometimes make wrong statements about the ‘efficiency’ of an algorithm.
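To make the point concrete, here is a minimal sketch under stated assumptions: a hand-rolled 1-D k-nearest-neighbours classifier on a tiny hypothetical dataset with one mislabeled training point. The “default” k=1 belongs to this sketch only (Scikit-Learn's KNeighborsClassifier, for example, defaults to n_neighbors=5). Judged only at its default, the algorithm looks much worse than it actually is.

```python
from collections import Counter

# Toy 1-D data (hypothetical): class 0 lives around 0-4, class 1 around 6-10,
# and one noisy training point at x=2 carries the wrong label.
TRAIN = [(0, 0), (1, 0), (2, 1), (3, 0), (4, 0),
         (6, 1), (7, 1), (8, 1), (9, 1), (10, 1)]
TEST = [(1.9, 0), (2.1, 0), (2.4, 0), (7.5, 1), (8.5, 1)]


def knn_predict(x, train, k=1):
    """Majority vote among the k training points closest to x.

    k=1 is this sketch's own default, chosen for illustration.
    """
    neighbors = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def accuracy(k):
    hits = sum(knn_predict(x, TRAIN, k) == y for x, y in TEST)
    return hits / len(TEST)


for k in (1, 3, 5):
    print(k, accuracy(k))
# 1 0.4
# 3 1.0
# 5 1.0
```

With k=1 the single noisy point decides every nearby prediction; with k=3 or k=5 it is outvoted. In a real comparison, k should be chosen on a validation split (or by cross-validation) rather than on the test set used here for brevity.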
In one experiment with a very large text corpus, training with train_supervised() in FastText produced a serialized model of more than 1 GB.
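The experiment's settings are not given, so the following is only a back-of-the-envelope sketch. It assumes the file size is dominated by a dense input embedding matrix of 32-bit floats with one row per vocabulary word plus one row per hash bucket (fastText hashes word n-grams and character n-grams into buckets; its documented defaults are dim=100 and bucket=2000000, and whether the buckets are allocated depends on the n-gram settings). The 500,000-word vocabulary is hypothetical.

```python
BYTES_PER_FLOAT32 = 4


def input_matrix_bytes(n_words, bucket, dim):
    """Approximate size of a dense fastText-style input matrix.

    One row per vocabulary word plus one row per hash bucket,
    each row holding `dim` 32-bit floats.
    """
    return (n_words + bucket) * dim * BYTES_PER_FLOAT32


# Hypothetical vocabulary of 500k words, 2M hash buckets, 100-dim vectors.
size = input_matrix_bytes(n_words=500_000, bucket=2_000_000, dim=100)
print(size)               # 1000000000
print(size / 1024 ** 3)   # same size expressed in GiB
```

Under these assumptions the hash buckets alone account for 800 MB, which is why bucket, dim and the vocabulary cut-off are the first knobs to look at; fastText's quantize step, which writes a compressed .ftz model, is another route to a smaller file.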