A few weeks ago, during a security training for developers given by Marcus from Hackmanit (by the way, it's a very good course that covers topics ranging from web development to NoSQL vulnerabilities and defensive coding), we discussed some white box attacks on web applications (i.e., attacks where the attacker has internal access to the object). This made me curious whether similar vulnerabilities exist in ML models. After running a simple script based on ,, using Scikit-Learn, I noticed there are some latent vulnerabilities, not only in the model objects themselves but also in the lack of a proper security mindset when we develop ML models. But first, let's check a simple example.
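To make the idea of a white box attack on a model object concrete, here is a minimal sketch. It is not the author's original script (whose sources are not given here); it assumes a scikit-learn `LogisticRegression` shipped as a pickle, and the dataset, the tampering values, and the SHA-256 integrity check are illustrative assumptions. The point is that anyone with internal access to the serialized artifact can silently rewrite what the model learned.

```python
import hashlib
import pickle

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

# Developer side: train an ordinary model.
X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=10000).fit(X, y)

# Serialize the model object to ship it, and record a digest of the bytes.
artifact = pickle.dumps(model)
expected_digest = hashlib.sha256(artifact).hexdigest()

# Attacker side (white box access to the artifact): load, alter, re-serialize.
# Nothing in a plain pickle prevents this.
tampered = pickle.loads(artifact)
tampered.coef_[:] = 0.0         # wipe all learned weights
tampered.intercept_[:] = -1e6   # push every decision score below zero
tampered_artifact = pickle.dumps(tampered)

# Production side: loads what it believes is the original model.
served = pickle.loads(tampered_artifact)
original_preds = model.predict(X)
served_preds = served.predict(X)
print("original positive predictions:", int(original_preds.sum()))
print("served positive predictions:  ", int(served_preds.sum()))  # every prediction is now class 0

# A simple defensive check: compare the digest before trusting the artifact.
integrity_ok = hashlib.sha256(tampered_artifact).hexdigest() == expected_digest
print("integrity check passed:", integrity_ok)
```

Note that a digest only detects tampering if it is stored somewhere the attacker cannot also rewrite. Also, the scikit-learn documentation warns that unpickling untrusted data can execute arbitrary code, so a tampered pickle can do far worse than change predictions.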
In a very insightful article, David Talby discusses how, the moment a Machine Learning model goes into production, it starts to degrade, because the model comes into contact with reality. As the author puts it: "The key is that, in contrast to a calculator, your ML system does interact with the real world."
Most of the time we rely completely on the default parameters of Machine Learning algorithms, and this can hide the fact that we sometimes make wrong statements about the 'efficiency' of an algorithm.
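A minimal sketch of how defaults can mislead a comparison, assuming scikit-learn with a `DecisionTreeClassifier` on the bundled breast cancer dataset (both are illustrative choices, not from the original). The grid deliberately includes the default values (`max_depth=None`, `min_samples_leaf=1`) and uses the same 5 folds, so the tuned score can never be lower than the default one.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Score the algorithm with its default hyperparameters (5-fold stratified CV).
default_score = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5).mean()

# Search a small grid that includes the defaults, on the same deterministic folds.
param_grid = {
    "max_depth": [None, 2, 3, 4, 5, 8],
    "min_samples_leaf": [1, 5, 10, 20],
}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)

print(f"default params CV accuracy: {default_score:.3f}")
print(f"best params CV accuracy:    {search.best_score_:.3f} with {search.best_params_}")
```

One caveat in the same spirit: `best_score_` is measured on the folds used to pick the parameters, so it is optimistically biased. An honest estimate of the tuned model needs a separate hold-out set or nested cross-validation.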
W. Edwards Deming said: "In God we trust, all others must bring data." (Source: Wikipedia). Inspired by a very insightful thread by Cecile Janssens on Twitter, I'm making this new statement for every ML Engineer, Data Analyst, and Data Scientist from now on: “IN GOD WE TRUST, OTHERS MUST BRING THE RAW DATA WITH THE SOURCE CODE OF THE EXTRACTION IN THE GITHUB” (CLESIO, Flavio).
In one experiment using a very large text database, training with train_supervised() in FastText left me with a serialized model of more than 1 GB.
Ben Lorica discusses security from a Software Engineering perspective, and he also notes that "Model explainability has become an important area of research in machine learning." At least for me, model explainability will be the most important aspect of security in Machine Learning in the future.
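As one concrete example of model explainability, here is a minimal sketch of permutation importance with scikit-learn's `permutation_importance`. The choice of technique, the `RandomForestClassifier`, and the breast cancer dataset are assumptions for illustration; this is only one of many explainability methods. Each feature is shuffled on held-out data, and the drop in accuracy indicates how much the model relies on it.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=0, stratify=data.target
)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Shuffle each feature 10 times on the test set and record the accuracy drop.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)

# Rank features by mean importance, largest drop first, and show the top five.
ranking = result.importances_mean.argsort()[::-1]
for idx in ranking[:5]:
    print(
        f"{data.feature_names[idx]:<25} "
        f"{result.importances_mean[idx]:.4f} +/- {result.importances_std[idx]:.4f}"
    )
```

Computing importances on held-out data (rather than the training set) keeps the explanation tied to how the model generalizes, not to what it memorized.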
Start note: Favio Vazquez did a great job in his article on this topic, with plenty of charts showing that, with the tools we currently have, modern Machine Learning approaches are tackling the problems of replication and methodology.
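A small illustration of one of those tools: fixing and recording random seeds so an experiment can be replicated exactly. This is a minimal sketch assuming scikit-learn; the `run_experiment` helper, the seed value, and the model choice are hypothetical and not taken from the article mentioned above.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

SEED = 42  # hypothetical seed; any integer works, as long as it is recorded


def run_experiment(seed):
    """Hypothetical helper: split, train, and evaluate with every source of randomness seeded."""
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    return model.predict_proba(X_test), model.score(X_test, y_test)


# Two independent runs with the same seed should give identical results.
proba_a, acc_a = run_experiment(SEED)
proba_b, acc_b = run_experiment(SEED)
print("identical probabilities:", np.array_equal(proba_a, proba_b))
print("accuracy of both runs:", acc_a, acc_b)
```

Seeding covers only part of replication: the raw data, the extraction code, and library versions also have to be shared for someone else to reach the same numbers.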
Ensemble machine learning and forecasting can achieve 99% uptime for rural handpumps
Abstract: Broken water pumps continue to impede efforts to deliver clean and economically-viable water to the global poor.