How to choose between RMSE and RMSLE?
2020 Mar 22
There are numerous articles describing RMSE and RMSLE, but here I will try to be as direct as possible regarding my choice between these two metrics.
When I just want an error measure that reflects model bias and variance in the target's own units, without taking into account the magnitude of what was predicted (y_hat) and what was expected in the validation set (y), I use RMSE.
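As a reference point, RMSE is the square root of the mean of the squared differences between y_hat and y. Below is a minimal NumPy sketch, assuming the targets and predictions are plain numeric arrays; the function name `rmse` is only illustrative.

```python
import numpy as np


def rmse(y, y_hat):
    """Root mean squared error: sqrt(mean((y_hat - y)^2)), in the target's units."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    # Every difference counts the same, whatever the scale of y
    return float(np.sqrt(np.mean((y_hat - y) ** 2)))
```

Because only the difference y_hat - y enters the formula, an error of 1 weighs the same whether the target is 1 or 1,000,000.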
Example: an erroneous prediction of {y=1, y_hat=2} enters the quadratic average in the same way as {y=1000000, y_hat=1000500}: only the absolute difference between y_hat and y counts, not the scale of y. The second prediction is off by only 0.05% of its target, yet I accept that it will dominate the quadratic average, since its absolute error is 500x larger than the first error (and its squared error is 250,000x larger).
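To make the example concrete, the sketch below computes the absolute, squared and relative error of the two pairs from the text and their combined RMSE. It only uses NumPy; the variable names are illustrative.

```python
import numpy as np

# The two example pairs from the text: (y, y_hat)
y = np.array([1.0, 1_000_000.0])
y_hat = np.array([2.0, 1_000_500.0])

abs_err = np.abs(y_hat - y)   # absolute errors: 1 and 500
sq_err = (y_hat - y) ** 2     # squared errors: 1 and 250,000
rel_err = abs_err / y         # relative errors: 100% and 0.05%

# About 353.6, driven almost entirely by the second pair
rmse = float(np.sqrt(np.mean(sq_err)))

for yi, yhi, a, s, r in zip(y, y_hat, abs_err, sq_err, rel_err):
    print(f"y={yi:>9.0f} y_hat={yhi:>9.0f} abs={a:>4.0f} sq={s:>7.0f} rel={r:.2%}")
print(f"RMSE = {rmse:.1f}")
```

Even though the first prediction is off by 100% and the second by 0.05%, the second pair contributes 250,000 of the 250,001 units in the sum of squares.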
When I want to measure bias and variance but don't want errors on large values to be penalized more heavily than errors on small values simply because of their scale, I use RMSLE. Since log(y_hat + 1) - log(y + 1) = log((y_hat + 1) / (y + 1)), each error is measured relative to the magnitude of y rather than in absolute units.
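A minimal NumPy sketch of RMSLE follows, assuming non-negative targets and predictions (the log of y + 1 is undefined when y is -1 or below). It uses `np.log1p`, which computes log(1 + x); the function name `rmsle` is only illustrative.

```python
import numpy as np


def rmsle(y, y_hat):
    """Root mean squared logarithmic error.

    Each term is log1p(y_hat) - log1p(y) = log((y_hat + 1) / (y + 1)),
    i.e. a ratio between prediction and target, not an absolute difference.
    Assumes y and y_hat are non-negative.
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    log_err = np.log1p(y_hat) - np.log1p(y)
    return float(np.sqrt(np.mean(log_err ** 2)))
```

For instance, predicting 19 for a target of 9 and predicting 199 for a target of 99 both give an RMSLE of log(2), about 0.693, because in both cases (y_hat + 1) is twice (y + 1), even though the absolute errors are 10 and 100.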
Using the previous example, for {y=1000000, y_hat=1000500} the logarithmic term of RMSLE compares y_hat and y on a relative scale before the quadratic average is taken: log(1000501) - log(1000001) is roughly 0.0005, whereas for {y=1, y_hat=2} the term is log(3) - log(2), roughly 0.405. This means that, even though the second error is 500x larger in absolute terms, the logarithm smooths out errors on these "large numbers" by removing the effect of their magnitude from the quadratic average.
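Applying the logarithmic term to the same two pairs shows the reversal: under RMSLE the small-scale pair now carries almost all of the error. This is a minimal NumPy sketch; the variable names are illustrative.

```python
import numpy as np

# Same two example pairs as in the text: (y, y_hat)
y = np.array([1.0, 1_000_000.0])
y_hat = np.array([2.0, 1_000_500.0])

# Per-pair logarithmic error: log1p(y_hat) - log1p(y) = log((y_hat + 1) / (y + 1))
log_err = np.log1p(y_hat) - np.log1p(y)  # roughly 0.405 and 0.0005

# Square root of the mean squared log error, now dominated by the first pair
rmsle = float(np.sqrt(np.mean(log_err ** 2)))

print("log errors:", log_err)
print(f"RMSLE = {rmsle:.4f}")
```

With RMSE the second pair dominated; with RMSLE its contribution is hundreds of times smaller than that of the first pair, because 1000500 is very close to 1000000 in relative terms.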
As usual, the code is below: