Fruit in the Hallway and Availability Bias in Recommendation Systems
2020 Feb 11

As some of my three readers know, I have been taking some courses completely outside the Data Science/Machine Learning field, such as Air Accident Investigation, Evidence-Based Medicine, Econometrics, and Causal Inference.
I will write in the future about why I am taking these courses (in short: because I think we are in a DS/ML bubble). The point I want to make here is that one of the fundamental principles of Evidence-Based Medicine is closely related to understanding the level of uncertainty behind a treatment decision (and especially to not treating anyone when the evidence is weak).
When we think of recommendation systems, we always imagine success stories like Netflix, Spotify, and Amazon, where recommendation systems helped propel these companies to success.
However, the causal aspects and/or the role of random factors are generally not discussed in depth. This gap also shows up in papers from important conferences like WWW, ACM RecSys, and ICML, to name a few: there is plenty of talk about results, but little about the uncertainty factors that may have influenced those same results.
As a consequence, we can be led to adopt a results-focused posture, ignoring the fact that these results may be the fruit of chance alone rather than of any intrinsic merit in our work.
In this post, I will focus on one specific uncertainty factor in recommendation systems: availability bias. This is a real problem that can occur in production systems, especially in model/system evaluation protocols. At the end, I will discuss an alternative that minimizes this problem.
First, though, let's walk through a hypothetical situation that illustrates the point in a more accessible way.
Ideas and fruit in the hallway…
Imagine the following scene inside an office: There is a hallway in the office that people use to move between two points. For simplicity, let’s say this hallway is a common passage like any other.
One day, the Human Resources team comes across an academic paper reporting a correlation between fruit consumption and workplace well-being.
Immediately after reading this article, the HR team thinks about performing the following intervention: “Why not distribute fruit here in the office to increase job satisfaction?”
Next, the HR team thinks of a way to implement this intervention: "Why not leave some fruit available in a place everyone can access? That way, anyone can pass by and take whatever fruit they want."
So the HR team puts a tray of assorted fruit in the hallway, freely accessible to everyone.
The idea is as follows: The fruits will be available for three weeks in the hallway. After this period, a satisfaction survey will be conducted to measure the level of well-being at work and the effect of this new HR policy within the company.
Measurement and discovery of great results
Three weeks later, the HR team conducts the satisfaction survey, and the result is a 70% increase in employee satisfaction; after all, who doesn't like free fruit at work?
With this great result in hand, the HR team makes free fruit for all employees a permanent policy.
It is at this point that stories like "Our company implemented free fruit as a policy and increased satisfaction by 70%" start to appear: talks at HR conferences, numerous success stories and fanfics on LinkedIn, and in some cases the person responsible for the project becomes Chief People Officer, with a round of promotions for the team riding the wake of this success.
However, let’s look at what was not said.
Chance being measured as competence
After six months, the HR team notices that the satisfaction indicators have stagnated, even with the fruit-in-the-hallway policy in place.
With this in view, the HR team and the People Analytics team launch an investigation and decide to run some experiments, which here we will call the "A/B/C Fruit Hallway Experiment".
In this experiment, three fruit baskets will be placed out on alternating days of the week, with the following configurations:
· Basket 1: Bananas only
· Basket 2: Apples only
· Basket 3: Assorted fruits
After one month of experiments, the following results were obtained:
· Basket 1: Bananas only – +1% improvement
· Basket 2: Apples only – +2% improvement
· Basket 3: Assorted fruits – +1% improvement
Performing a simple comparison, we have the following table:
| Policy (intervention) | Result (Satisfaction) |
|---|---|
| Implementation of the fruit policy | +70% |
| Optimization and experiments | +1% |
We can see in this example that even with far more effort in analysis, experimentation, implementation, and optimization, the gain was marginal at best.
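As an aside, the small differences between the baskets are also easily explained by noise. Under purely hypothetical sample sizes (the story specifies none), a quick two-proportion z-test sketch shows how indistinguishable a +1% vs. +2% gap can be:

```python
import math

def two_proportion_z(p1: float, n1: int, p2: float, n2: int) -> float:
    """Two-proportion z-statistic for the difference p1 - p2,
    using the pooled standard error."""
    p_pool = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical: 200 employees surveyed per basket, "improvement" read as
# the share of extra positive responses over a common reference point.
z = two_proportion_z(0.02, 200, 0.01, 200)
print(f"z = {z:.2f}")  # z ≈ 0.82, well below the 1.96 threshold at 95% confidence
```

With these assumed numbers, the apples-vs.-bananas difference is nowhere near statistical significance, which is exactly the kind of uncertainty the result tables above leave unspoken.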
What may have happened is that people did not take the fruit because of any particular arrangement HR tested in the A/B/C experiment; they consumed it simply because it was available.
In other words: People opportunistically took the fruits just because they were available.
What do you mean by availability?
No recommendation system in production is free from some kind of availability bias. Human action still carries a certain degree of non-determinism, which means that no matter how good the recommendation is, some people will interact with the system just because it is available.
In any recommendation system with passive consumption (i.e. one that does not actively push recommendations via notifications, email, etc.), there is some potential for people to use it opportunistically.
Practical examples of opportunistic aspects unrelated to the recommendation itself:
- How many times have we stopped at a water fountain and drunk a little water just to stay hydrated, even when we weren't thirsty?
- How many times have we tossed something into the nearest trash can, even when we had little to throw away?
- How many times, when passing by a tourist spot, have we grabbed some street food just because we were there?
These are cases of opportunistic utilization due to a potential availability bias.
A simple alternative
To measure availability bias from the start, I always like to use a random baseline at the beginning of every project. This way, I (a) account for the role of randomness in the recommendation (or of other unidentified causal elements and/or confounding variables), and (b) with this information in hand, I can better measure the real impact of other variables during experimentation. Below is an example:
| Policy (implementation) | Result |
|---|---|
| Random recommendations | +10% |
| Policy #1 (A) | +14% (Adjusted +4%) |
| Policy #2 (B) | +12% (Adjusted +2%) |
That is, right from the start, a new system gets a 10% lift from availability alone, no matter what recommendation is placed on the screen. I subtract those 10 percentage points from each policy's result, since they would be achieved merely by the system being available.
In this way, with just a random baseline, I can already have comparison parameters to know how much of the result is due to availability.
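To make the adjustment concrete, here is a minimal Python sketch that subtracts the random-baseline lift from each policy's observed lift. The figures are the illustrative numbers from the table above, not real data:

```python
def adjusted_lift(observed_lift: float, random_baseline_lift: float) -> float:
    """Subtract the lift achieved by random recommendations (the
    availability-bias floor) from the lift observed for a policy."""
    return observed_lift - random_baseline_lift

# Illustrative numbers from the table above (not real data).
random_baseline = 0.10  # +10% from random recommendations alone
policies = {"Policy #1 (A)": 0.14, "Policy #2 (B)": 0.12}

for name, lift in policies.items():
    adj = adjusted_lift(lift, random_baseline)
    print(f"{name}: observed {lift:+.0%}, adjusted {adj:+.0%}")
```

The point of the sketch is the comparison structure, not the arithmetic: without the random arm, the +14% for Policy #1 would be read as pure algorithmic merit, when +10 percentage points of it come from availability alone.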
Final Considerations
Having availability bias in a recommendation platform is not a problem in itself; it is sometimes even expected in new systems. The real problem arises when the effects of availability, and all the uncertainty this bias carries, are confused with the effects of the algorithm itself. The final consideration I leave here is that before implementing any recommendation system, one should have an a priori understanding of the uncertainty factors and counterfactuals that may influence the result.