Logging and Structured Events in Machine Learning APIs
2022 Sep 21
Working on the operationalization of Machine Learning APIs, one of the biggest issues I encounter, regardless of whether it is a greenfield or brownfield project, is the lack of comprehensive logging or of some kind of structured events that can be persisted for later analysis.
One consequence of missing those mechanisms is that these projects have no way to debug or inspect abnormal internal behavior.
In the cases I have worked on or seen where such mechanisms do exist, most rely on poor implementations that generate a lot of noise and force Data Scientists and/or Analytics Engineers into what I call “Conformity Hell”: a {fetch > parse > conform > join} routine over barely structured logs and database tables from other platforms.
Despite all the effort put into making it reliable, the reality is that it takes a lot of manual work and still produces quite brittle, error-prone results.
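To make the point concrete, here is a minimal sketch of what that {fetch > parse > conform > join} routine tends to look like in practice; the file names, log layout, and column names are assumptions for illustration, not taken from any particular project:

```python
import json
import pandas as pd

def parse_log_line(line: str) -> dict:
    # The logs are only "barely structured": a timestamp, a request id and a
    # free-form message that sometimes carries a JSON payload. Lines that do
    # not match this shape break the whole run, which is part of the problem.
    timestamp, request_id, message = line.split(" ", 2)
    try:
        payload = json.loads(message)
    except json.JSONDecodeError:
        payload = {"raw_message": message}
    return {"timestamp": timestamp, "request_id": request_id, **payload}

# fetch + parse: pull the raw service log and turn each line into a record
with open("api-service.log") as handle:
    records = [parse_log_line(line.rstrip("\n")) for line in handle]

# conform: flatten into a table with consistent types and column names
logs = pd.DataFrame(records)
logs["timestamp"] = pd.to_datetime(logs["timestamp"], errors="coerce")

# join: match against a predictions table exported from another platform,
# keyed by a request id that both sides hopefully share
predictions = pd.read_csv("predictions_export.csv")
analysis = logs.merge(predictions, on="request_id", how="left")
```

Every step in this kind of script encodes an implicit convention; a small change in the log format or in the exported table quietly breaks the whole chain.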
Problem: Poor response information
One of the biggest problems I see right away when I join an existing project is that it is impossible to do error or diagnostic analysis of the model and/or the API service, which blocks any attempt to retrain the model and makes those services quite opaque from an operations perspective. This is a universal complaint from Data Scientists and Machine Learning Engineers.
However, the first thing I investigate when I hear one of those complaints is how the API response is built. Imagining a Text Classification API, what I find most of the time is a response more or less like this:
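As a hedged illustration (the field names below are assumptions on my part, not taken from any specific codebase), the entire body often amounts to little more than a label and a score, with nothing that ties the prediction back to the request, the input, or the model version:

```json
{
  "label": "positive",
  "confidence": 0.87
}
```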