Machine Learning Tetrad = Business Knowledge + Statistical Understanding + ML Algos + Data

In the post Learning Market Dynamics for Optimal Pricing, Sharan Srinivasan describes how Airbnb combines ML with Structural Modeling (mathematical + statistical modeling) to offer guests optimal pricing based on market dynamics. The key variable is how far in advance a booking is made, that is, the time between the booking date and the check-in date (also known as Lead Time).

This part of the post summarizes why they chose that approach:

Machine Learning vs Structural Modeling or Both?

Modern ML models fare very well in terms of predictive performance, but seldom model the underlying data generation mechanism. In contrast, structural models provide interpretability by allowing us to explicitly specify the relationships between the variables (features and responses) to reflect the process that gives rise to the data, but often fall short on predictive performance. Combining the two schools of thought allows us to exploit the strengths of each approach to better model the data generating process as well as achieve good model performance.

When we have good intuition for a modeling task, we can use our insights to reinforce an ML model with structural context. Imagine we are looking to predict a response Y based on features (X₀,…,Xn). Ordinarily, we would train our favorite ML model to predict. However, suppose we also know that Y is distributed over an input feature X₀ with a distribution F parameterized by θ, i.e. Y ~ F(X₀; θ), we could leverage this information and decompose the task to learning θ using features (X₀,…,Xn), and then simply plug our estimate of θ back into F to arrive at Y in the final step.

By employing this hybrid approach, we can leverage both the algorithmic powerhouse that ML provides and the informed intuition of statistical modeling. This is the approach we took to model lead time dynamics.
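
The two-step decomposition in the quoted passage (learn θ from the features, then plug it into F) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and not taken from the Airbnb post: F is taken to be an exponential CDF over lead time, the data is simulated, there is a single hypothetical feature, and a one-variable least-squares fit stands in for "our favorite ML model".

```python
import math
import random


def structural_cdf(x0: float, theta: float) -> float:
    """Assumed structural form F(x0; theta): exponential CDF with rate theta."""
    return 1.0 - math.exp(-theta * x0)


def fit_log_linear(features, thetas):
    """Least-squares fit of log(theta) = a + b * feature.

    This is only a stand-in for a real ML model that maps features to theta.
    """
    logs = [math.log(t) for t in thetas]
    n = len(features)
    mean_x = sum(features) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in features)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(features, logs))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_theta(model, feature: float) -> float:
    """Map a feature value to an estimated theta using the fitted model."""
    a, b = model
    return math.exp(a + b * feature)


def simulate(n_markets: int = 200, n_obs: int = 500, seed: int = 0):
    """Simulate hypothetical markets, each with one feature and observed lead times."""
    rng = random.Random(seed)
    features, thetas_hat = [], []
    for _ in range(n_markets):
        x = rng.uniform(0.0, 1.0)
        true_theta = math.exp(-3.0 + 1.0 * x)  # hypothetical ground truth
        lead_times = [rng.expovariate(true_theta) for _ in range(n_obs)]
        # Maximum-likelihood estimate of an exponential rate: 1 / mean lead time.
        thetas_hat.append(n_obs / sum(lead_times))
        features.append(x)
    return features, thetas_hat


# Step 1: learn theta as a function of the features.
features, thetas_hat = simulate()
model = fit_log_linear(features, thetas_hat)

# Step 2: plug the estimated theta back into F to arrive at Y.
theta_new = predict_theta(model, 0.5)
y_hat = structural_cdf(30.0, theta_new)  # estimated share of bookings within 30 days
print(model, theta_new, y_hat)
```

In this sketch the model never predicts Y directly. It only predicts θ, and the shape of the lead-time curve comes entirely from the assumed structural form, which is what keeps the result interpretable.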

This post is a good technical compass. For every modelling problem in core Machine Learning, the best combination will always be the tetrad: Business Knowledge + Statistical Understanding of the data + ML Algos + Data.