Model Serving with R and Plumber
2020 Apr 30

At some point, every data scientist or machine learning engineer has come across surveys and blog posts with the following question: “For production environments, which is better: R or Python?”
Most of the time, Python takes the lead in this regard, whether because of its ease of learning or (I speculate) the fact that many users don’t understand the difference between a general-purpose language and a programming language designed for scientific computing and scripting.
There are numerous resources that criticize the use of R in production for several reasons, some of which are very fair:
- Many users lack a background in software development;
- Due to the previous point, there is no strong culture in the community for practices such as dependency management, testing, error handling, and logging (even with good tools available for all of these);
- Hidden default arguments in the language, such as the unbelievable [stringsAsFactors = TRUE](https://stat.ethz.ch/pipermail/r-announce/2020/000653.html), which was only corrected in version 4.0.0 (a backward-incompatible change!). In other words, a bug became a feature, and a major version update was required to fix a language behavior caused by a design error (a good explanation for this can be found here);
- Users’ lack of familiarity with R packages/software that could bring greater robustness to data products/inference in production, such as packrat for dependency snapshotting and Docker for environment isolation.
However, in practical terms, it is not always possible for all data scientists, analysts, and other users to migrate to Python for various reasons (e.g., migration costs, training costs, business risks of removing something from production, etc.).
Consequently, some R users end up without a way to put their models into production and, most importantly, to serve those models, i.e., to receive requests and return predictions through something that works, roughly speaking, like a web service, with an API handling the communication between two applications.
The idea of this post is to help these people gain the power to launch RESTful APIs in production for model servicing. With the help of an infrastructure team, this code can be placed on a server or in a Docker image, making it available to other applications.
To add a bit more realism to our example, let’s use the case of a bank called Layman Brothers [N1], which aims to deliver a machine learning service that informs whether a customer will enter a payment delinquency situation or not. For this, we will use AutoML to train this model (for simplicity’s sake).

Source: Steel factory by Gadjo_Niglo
AutoML (Automatic Machine Learning)
For those unfamiliar, the concept of AutoML (Automatic Machine Learning) is the process of automating the entire machine learning model training pipeline: numerous models are trained within a time limit or until a stopping condition on some metric (e.g., AUC, RMSE, recall, precision) is met.
This allows even people who are not experts in Data Science and Machine Learning to simply pass data to the AutoML tool, which performs multiple training sessions with various algorithm combinations within a specific time frame.
The idea is to simplify the end-to-end training process, training numerous simple models or a combination of various algorithms (XGBoost, Deep Learning, GLM, GBM, etc.) with multiple hyperparameter combinations.
In other words, instead of a human manually testing various combinations, AutoML does it all.
In some cases, AutoML models even beat data scientists on Kaggle leaderboards, as in the example below, where Erin LeDell, with only 100 minutes of AutoML training, managed to place 8th in a Kaggle hackathon:
Source: Erin LeDell Twitter
AutoML in R with H2O.ai
To train our default prediction model for Layman Brothers, we will use the R language and the AutoML implementation in H2O (I have already posted some tutorials and considerations about this tool here on the blog; it's worth checking out).
In our case, we will use H2O’s AutoML because, in addition to using common algorithms, H2O’s implementation also includes the option of Stacked Ensembles of all previously trained models and gives us a leaderboard of the best models.
For training our model, we will use data from Layman Brothers in AutoML.
The project structure will have 5 folders with self-explanatory names: (1) api, (2) data, (3) logs, (4) models, and (5) src (where the source code will reside). Paths can and should be adapted to your environment, and since we are using a dataset hosted on GitHub rather than in the project, the data folder is optional.
First of all, let’s load the logging library and use standard paths as constants to store our objects:
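A minimal sketch of this setup, assuming the logging package from CRAN; ROOT_PATH is a placeholder for your project root:

```r
library(logging)

# Folder layout described above; ROOT_PATH is a placeholder
ROOT_PATH   <- "~/r-api-data-hackers"
DATA_PATH   <- file.path(ROOT_PATH, "data")
MODELS_PATH <- file.path(ROOT_PATH, "models")
LOGS_PATH   <- file.path(ROOT_PATH, "logs")

# Console logging plus a dedicated file for the training pipeline
basicConfig()
addHandler(writeToFile,
           file = file.path(LOGS_PATH, "training_pipeline_auto_ml.log"))
loginfo("Training pipeline started")
```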
With the log created, let’s now install H2O directly from CRAN.
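A sketch of the installation step:

```r
# Install H2O from CRAN on first use, then load it
if (!requireNamespace("h2o", quietly = TRUE)) {
  install.packages("h2o")
}
library(h2o)
loginfo("h2o package loaded")
```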
One point to consider here is that I am installing packages directly from CRAN because, at least for me, R’s dependency management tools do not have good usability compared to Homebrew, npm, and even pip.
Dependencies installed, let’s start our H2O cluster:
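A sketch of the startup call (note that the h2o package names the CPU argument nthreads):

```r
# All available CPUs (nthreads = -1), capped at 7 GB of memory
h2o.init(nthreads = -1, max_mem_size = "7g")
loginfo("H2O cluster started")
```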
In our case, we will use all available CPUs on the machine (nthreads = -1). Since I am running on a single machine, I will limit the memory size to 7 GB.
Cluster started, let’s now load our data into H2O, split the training and test datasets, and determine the variables [N2] we will use for training the models:
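A sketch of this step; DATA_URL is a placeholder for the CSV hosted on GitHub, and the column names DEFAULT (response) and ID are assumptions:

```r
# Placeholder: point this at the Layman Brothers CSV on GitHub
DATA_URL <- "https://raw.githubusercontent.com/<user>/<repo>/master/layman_brothers.csv"

data <- h2o.importFile(DATA_URL)

# The response must be a factor for H2O to treat this as classification
data$DEFAULT <- as.factor(data$DEFAULT)

# 80/20 train/test split, seeded for reproducibility
splits <- h2o.splitFrame(data, ratios = 0.8, seed = 42)
train  <- splits[[1]]
test   <- splits[[2]]

# Every column except the ID and the response is a predictor
response   <- "DEFAULT"
predictors <- setdiff(colnames(data), c("ID", response))
loginfo("Data loaded and split")
```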
Now that our data is loaded, let’s perform the training using AutoML:
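A sketch of the AutoML call with the settings described below:

```r
automl_model <- h2o.automl(
  x = predictors,
  y = response,
  training_frame    = train,
  leaderboard_frame = test,
  max_models  = 20,    # train at most 20 models
  nfolds      = 5,     # 5-fold cross-validation
  seed        = 42,    # lock the random seed
  sort_metric = "AUC"  # rank the leaderboard by AUC
)
loginfo("AutoML training finished")
```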
In our case, we will use a maximum of 20 models (max_models = 20), with AutoML performing Cross Validation with 5 partitions (nfolds = 5), locking the random seed at 42 (seed = 42), and using AUC as the metric to determine the best model (sort_metric = c("AUC")).
There are numerous other options that can be configured, but we will use these for the sake of simplicity.
After training, we can store the leaderboard information in the log or check it in the console:
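A sketch of the leaderboard inspection and model serialization:

```r
# Print the leaderboard and keep a copy of it in the log
lb <- automl_model@leaderboard
print(lb)
loginfo(paste(capture.output(print(lb)), collapse = "\n"))

# Serialize the leader (best model) to the models folder
model_path <- h2o.saveModel(object = automl_model@leader,
                            path   = MODELS_PATH,
                            force  = TRUE)
loginfo(paste("Winning model saved at:", model_path))
```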
If everything went well, we will have the serialized winning model in the models folder, ready to be used by our RESTful API [N4].
To read the log information during model training, simply open the training_pipeline_auto_ml.log file in the operating system or run the command:
$ tail -F training_pipeline_auto_ml.log
during execution.
This can help, for example, to keep track of how long each phase is taking. If desired, error handling can be applied to the code, with subsequent logging of these errors to facilitate debugging.
With our model trained and serialized, let’s now set up our endpoint [N3].
Configuring the RESTful API Endpoint in Plumber
For serving our models, we will use Plumber, a package that converts R code into a web API [N4]. We will use it to launch our API and serve the model [N3].
First, let's configure our endpoint. Briefly, an endpoint is the URL path through which an API receives calls [N4]. This file will be named endpoint.r.
This endpoint will be responsible for linking our file containing the prediction function (which we will discuss later) to the HTTP requests our API will receive.
We will also include a log file, in this case named automl_predictions.log, where we will record all calls to this endpoint.
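A sketch of endpoint.r, modeled on Plumber's request-logging recipe; the logger and tictoc packages and the relative paths are assumptions:

```r
# endpoint.r
library(plumber)
library(logger)

# Every request will be appended to this access log
log_appender(appender_file(file.path("logs", "automl_predictions.log")))

# Insert a dash when a request field comes in empty, keeping the log parseable
convert_empty <- function(string) {
  if (string == "") "-" else string
}

# Load the routes defined in api.R (the prediction function)
r <- plumber::plumb("api/api.R")

# Record requester information and response time for every call
r$registerHooks(list(
  preroute = function() {
    tictoc::tic()
  },
  postroute = function(req, res) {
    end <- tictoc::toc(quiet = TRUE)
    log_info('{convert_empty(req$REMOTE_ADDR)} "{convert_empty(req$HTTP_USER_AGENT)}" {convert_empty(req$REQUEST_METHOD)} {convert_empty(req$PATH_INFO)} {convert_empty(res$status)} {round(end$toc - end$tic, digits = 5)}')
  }
))

# Listen on localhost:8000 with the Swagger UI enabled
r$run(host = "127.0.0.1", port = 8000, swagger = TRUE)
```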
Attentive readers will notice there are 3 functions in this endpoint. The first is convert_empty, which simply inserts a dash if any part of the request information is empty.
The second is the r$registerHooks function, which comes from a Plumber object and records all HTTP request information such as the IP calling the API, the user, and the response time.
The third and final function is r$run, which determines the IP where the API will receive calls (host="127.0.0.1"), the port (port=8000), and whether the API will have Swagger active (swagger=TRUE). In our case, we will use Swagger to test our API and see if the service is working.
This will be the last script to be executed, and later we will see how it can be run without entering RStudio or other IDEs.
However, let’s now configure our prediction function within Plumber.
Configuring the Prediction Function in Plumber
In our case, we will create a file named api.R. This file will be used to (a) get the request data, (b) perform light processing on this data, (c) pass it to the model, (d) get the result, and return it to the endpoint.
This file is referenced in endpoint.r, in the plumb() call.
Now, let's understand each part of the api.R file.
Following the pattern used previously, we start the file by locating the path where our model is saved, so we can later load it into memory, and by starting our logging (this time to api_predictions.log).
In this example, the serialized model is "StackedEnsemble_AllModels_AutoML_20200428_181354", which was the best on the AutoML leaderboard.
We then load the model into memory, and from that point it is ready to receive data and perform predictions.
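A sketch of the top of api.R; the paths are assumptions, and the model name is the winner from the AutoML run above:

```r
# api.R
library(h2o)
library(logger)
library(data.table)

MODELS_PATH <- "models"
MODEL_NAME  <- "StackedEnsemble_AllModels_AutoML_20200428_181354"

# Dedicated log file for predictions served by the API
log_appender(appender_file(file.path("logs", "api_predictions.log")))

# Connect to the running H2O cluster and load the serialized winner into memory
h2o.init(nthreads = -1)
model <- h2o.loadModel(file.path(MODELS_PATH, MODEL_NAME))
log_info("Model loaded: {MODEL_NAME}")
```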
Logging and model loaded, now comes the part where we configure the variables the model will receive. In our case, we have the following code:
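A sketch of the annotation block; only six of the 23 inputs are declared here, and the remaining ones (PAY_2 to PAY_6, BILL_AMT1 to BILL_AMT6, PAY_AMT1 to PAY_AMT6) follow the same pattern:

```r
#* Layman Brothers default prediction endpoint. Only six of the 23
#* inputs are declared in this sketch; the remaining @param lines
#* follow the same pattern.
#* @param LIMIT_BAL Credit limit
#* @param SEX Gender
#* @param EDUCATION Education level
#* @param MARRIAGE Marital status
#* @param AGE Age in years
#* @param PAY_0 Most recent repayment status
#* @post /prediction
```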
The characters #* mark Plumber annotations; here they declare the parameters that will be passed to the function.
At the end of the block, the annotation #* @post /prediction defines, roughly speaking, the path that will receive the POST requests. [N4]
Now that we have the variables the model will receive properly declared for Plumber (meaning our API is capable of receiving data through requests), let’s create the function that will receive the data and perform the prediction:
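A sketch of the function, abbreviated to the same six variables declared above; the real file repeats the conversion and the column assignment for all 23 inputs:

```r
# In api.R this function sits directly below the annotation block above
function(LIMIT_BAL, SEX, EDUCATION, MARRIAGE, AGE, PAY_0) {
  # Plumber hands every query parameter over as a string,
  # so convert explicitly before scoring
  LIMIT_BAL <- as.numeric(LIMIT_BAL)
  SEX       <- as.numeric(SEX)
  EDUCATION <- as.numeric(EDUCATION)
  MARRIAGE  <- as.numeric(MARRIAGE)
  AGE       <- as.numeric(AGE)
  PAY_0     <- as.numeric(PAY_0)

  # One-row data.table with one column per model feature
  scoring_dt <- data.table(
    LIMIT_BAL = LIMIT_BAL,
    SEX       = SEX,
    EDUCATION = EDUCATION,
    MARRIAGE  = MARRIAGE,
    AGE       = AGE,
    PAY_0     = PAY_0
  )

  # H2O models only score H2OFrame objects, so convert before predicting
  h2o_frame  <- as.h2o(scoring_dt)
  prediction <- h2o.predict(model, h2o_frame)

  log_info("Prediction served")
  as.data.frame(prediction)
}
```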
This is a simple R function that takes the variables declared previously for Plumber as arguments.
Inside the function, I convert all variables to numeric for a simple reason: Plumber delivers request parameters as strings and does not verify variable types before passing them to the model.
Because of this issue, I spent a few hours trying to see if there was a way to do this directly in Plumber, and there is; but in my case, I preferred to keep it inside the function and control the conversion there. In my mind, I can keep error handling inside the function itself and at least attempt some conversions if necessary. But that’s a personal choice.
After the conversions, I build the data.table and then convert it into an H2O object.
This conversion is necessary because H2O models, as of the current version, only accept data in their own H2OFrame format.
Finally, we perform the actual prediction and return it from the function.
Next, there is a second function that takes the request body and displays the values in the console (these values can also be recorded in the log).
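A sketch, written here as a Plumber filter so that it runs before the route:

```r
#* Print each incoming request body to the console before routing;
#* a log_info() call could record it in the log file instead
#* @filter echo_body
function(req) {
  cat("Request body:", req$postBody, "\n")
  plumber::forward()
}
```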
And thus we have our endpoint.r and api.R files created, which serve the following purposes:
- api.R: loads the model into memory, receives data, processes these inputs, runs them through the model, and returns a prediction; it is also responsible for specifying which parameters the model will receive.
- endpoint.r: launches the API, records requester information such as IP and user, and references api.R to handle the prediction work.
In your case, if you already have your model, simply work on the api.R and endpoint.r files, adapt the inputs to your data, and load your machine learning model into memory.
Now that we have our files, let’s launch our API.
Initializing the RESTful API
With our API and endpoint files properly configured, we can initialize our API by running the endpoint.r file within RStudio.
However, since we are talking about a production environment, doing this manually is not practical, especially in an environment where changes are made constantly.
Thus, we can initialize this API by running the following command in the command line (terminal for Linux/MacOS users):
$ R < /<<YOUR-PATH>>/r-api-data-hackers/api/endpoint.r --no-save
Running this command starts Plumber, and the API initialization messages appear in the terminal.
With just the project files and this single command, our RESTful API is up at the address http://127.0.0.1:8000.
However, accessing this URL in a browser won’t show anything, so we will use Swagger for testing. Access the following address in your browser: http://127.0.0.1:8000/__swagger__/
In the browser, you will see the Swagger interface listing the API's routes.
To perform a prediction via the Swagger interface, click on the green box labeled POST to expand the /prediction route.

Next, click the “Try it out” button and fill in the fields corresponding to the parameters we declared in the api.R file:
Finally, after all the information is filled in, click the blue Execute button:
Clicking this button allows us to see the result of our prediction in the response body:

The body of the response to our request contains the following information:

```json
[
  {
    "predict": "1",
    "p0": 0.5791,
    "p1": 0.4209
  }
]
```
In other words, given the values passed in the request, the Layman Brothers model predicted class 1, i.e., that the customer will enter a default situation. If we want to work with probabilities, the model provides them in the response: p1 = 0.4209 is the estimated probability of defaulting and p0 = 0.5791 the probability of not defaulting (H2O still labels the customer as class 1 because its classification threshold, chosen to maximize F1, sits below 0.42).
Some readers who have made it this far might ask: “Flavio, customers won't enter our Swagger page and make requests. How will a production application use this model?”
Remember when I said this RESTful API would work like a web service? The main application, i.e., our Layman Brothers bank platform that receives credit information, will pass that information via HTTP requests to the RESTful API serving the models, and the API will return the values just as we saw in the response body above.
In more concrete terms: Once your RESTful API is running, your model is ready to be requested by the main application.
This HTTP call can be made by copying the curl command provided by Swagger.

In this case, to simulate the call the main Layman Brothers application must make, let’s copy the following curl command:
curl -X POST "http://127.0.0.1:8000/prediction?PAY_AMT6=1000&PAY_AMT5=2000&PAY_AMT4=300&PAY_AMT3=200&PAY_AMT2=450&PAY_AMT1=10000&BILL_AMT6=300&BILL_AMT5=23000&BILL_AMT4=24000&BILL_AMT3=1000&BILL_AMT2=1000&BILL_AMT1=1000&PAY_6=200&PAY_5=200&PAY_4=200&PAY_3=200&PAY_2=200&PAY_0=2000&AGE=35&MARRIAGE=1&EDUCATION=1&SEX=1&LIMIT_BAL=1000000" -H "accept: application/json"
After copying this command, paste it into the terminal and press Enter.

We received the same result as when we executed it in Swagger. Success.
To read our logs later, simply run the command:
$ tail -F api_predictions.log
from within the logs folder.

This log holds all the information we chose to record. Thus, if the process is automated, debugging or auditing of results can be performed when necessary.
There are two versions of this code on GitHub. This light version is in the r-api-data-hackers repository, and the more complete version is in the r-h2o-prediction-rest-api repository.
FINAL CONSIDERATIONS
The goal of this post was to show a step-by-step guide on how data scientists, statisticians, and other interested parties can set up a RESTful API entirely using R code.
The project itself, from a production coding perspective, has many limitations: it lacks proper error handling, security, fuller logging of requests and responses, and isolation in an environment such as Docker.
However, I believe that after this tutorial, many practical problems regarding putting machine learning models into production in R can at least be addressed, giving more power to R developers and other interested parties.
NOTES
- [N1] - Name with no connection to reality.
- [N2] - The variables SEX (gender), MARRIAGE (whether the customer is married or not), and AGE (age) are for demonstration purposes only, like any other variable. In the real world, ideally, these variables would be eliminated to avoid bringing discriminatory biases into models and other ethical issues.
- [N3] - There are numerous options for launching an API in production through hosting services, i.e., paid services that handle part of the infrastructure and take care of some security and authentication issues, such as DigitalOcean and RStudio Connect; there are also resources for hosting Plumber in Docker images. In our case, we will assume this API will be placed into production on a networked machine where an infrastructure analyst or a data scientist can deploy it.
- [N4] - Although the goal of this post is “make it work first, then understand,” it is extremely important to understand the aspects related to nomenclature and what each part of the REST architecture does. There are great resources for this here, here, here, and here.