A Better Way to Make the Recommendations That Power Popular Platforms

When Uber Eats embraced a more data-driven approach in 2019, Yuyan Wang joined the company as an applied scientist and founding member of its data science team. To help keep customers, couriers, and restaurants happy and maximize long-term profits, the team set out to build better recommendations into the food delivery platform.

Recommender systems are the driving force behind the suggestions you see when you open an app like Uber Eats. They’re powered by algorithms that have been trained to understand what consumers want based on their previous decisions as well as other customers’ behavior. For example, when an Uber Eats customer who likes Tex-Mex opens the app, they might see a selection of restaurants that make fajitas and enchiladas.

These recommendation systems have become key growth drivers for multi-sided platforms — apps that connect multiple customer groups, like buyers and sellers (Amazon, eBay) or drivers and passengers (Uber, Lyft). YouTube has attributed 70% of its watch time to recommendations, while Netflix has reported that personalized suggestions now contribute to 80% of content consumption.

When Wang joined Uber Eats, its recommendations were geared toward getting customers to keep using the app. But they weren’t taking into account restaurants’ or couriers’ goals. Recommending popular restaurants, in theory, may increase the likelihood of consumers placing an order, says Wang, who recently joined Stanford Graduate School of Business as an assistant professor of marketing. But there are unintended consequences that could quickly cascade.

Well-liked restaurants might get over-recommended and then overwhelmed with orders. If those restaurants have a bad experience, they may not recommend Uber Eats to other restaurant owners. Customers would be annoyed by late deliveries and may not come back to the platform. Furthermore, delayed orders could make couriers late, putting a dent in their tips. More importantly, new or low-volume restaurants might not get the exposure they’re expecting and opt to leave the platform. In the long term, this would result in fewer restaurants using the platform, which would mean a worse experience for consumers due to lack of selection.

“When you optimize for only one side, it hurts other sides and ultimately the business,” Wang says. “For a platform to be successful in the long term, you need to model and take into account all sides of the business. It’s more profitable that way.”

In a new paper, Wang, along with Long Tao and Xian Xing Zhang of Uber Technologies, shows how they developed a new recommender system for Uber Eats that considers the often competing goals of multi-sided platform participants. The multi-objective hierarchical recommender, or MOHR, is a system that companies across industries, from Netflix to Etsy, can use to improve customer recommendations.

Upgrading the Carousel

Wang and her colleagues’ recommender system is the first to mathematically and holistically make customer recommendations in ways that benefit multiple stakeholders. The system also addresses the challenge of ranking and arranging suggestions on a page.

When the researchers began the project, Uber Eats didn’t have a dedicated mathematical framework for organizing carousels of recommendations where customers could scroll through categories like “Healthy Eats” or “Can Be Delivered in 25 Minutes.” “Carousels help to alleviate the cold-start issue, or the problem of not knowing what to recommend to new customers,” Wang says. “But platforms also have trouble ranking and arranging these carousels together with single restaurants on the same page” — much less in real time and in a personalized way.

Many platforms’ solution had been to use a mishmash of disjointed rules or expensive machine learning systems to make ranking decisions. “Our system gives platforms a holistic and mathematically principled way to do it,” Wang says.

The researchers conducted a field experiment, applying their recommender system to 2% of Uber Eats’ global consumers. The results showed significant improvements in consumer conversion, retention, and gross bookings. If the system had been applied to all consumers, they estimated Uber Eats would have seen a $1.5 million weekly increase in revenue. The company has since deployed the recommender system globally on its app homepage.

Wang and her colleagues designed their system with multiple modules so developers at different companies could use it piecemeal based on their unique needs. “Maybe you don’t have a hierarchical presentation on your page, but you do care about competing objectives,” she says. “You can use the system in a modularized fashion.” The system can also optimize one-sided platforms like news sites or clothing sellers.

Beyond the Black Box

The system syncs well with Wang’s overarching research interests, which lie at the intersection of marketing, machine learning, and statistics. After working at Uber, she was a senior research engineer at Google DeepMind, a job she held for four years. “I loved my jobs,” she says. “You can see the immediate, tangible impact of your work. When you order Uber Eats today, the recommendations are still powered by this framework, so it was a really fantastic experience.”

Wang moved into academia because she wanted to better understand and improve the long-term value of personalized products and services. “Over the years, more people have realized that recommender systems focused on short-term engagement goals can lead to more clickbaity, poor content,” she says. “I want to optimize long-term metrics, such as gaining repeat customers and getting customers to have a more fulfilling and meaningful long-term experience on the platform.”

More broadly, Wang is interested in using theory and behavioral insights to help design more transparent machine learning systems. She sees flaws in current design methods such as the black box model, in which developers cannot see the factors algorithms use to generate a given output. In another new paper, she details a collaboration with Google researchers where they tested a recommendation framework on YouTube that considers consumers’ intent when making predictions instead of relying on pure black-box approaches as most platforms do.

“More data and more computing power may make new AI models more powerful,” Wang says. “But you don’t really know why a consumer behaves in a certain way on the platform or why certain model architectures work better than the others. This is not the most sustainable way of doing AI research.”

“It’s great that more and more people are excited to leverage AI and machine learning to solve real-world business problems,” she says. “I’m excited to bridge that gap, and I see great potential for these two communities to be closer to each other and to leverage each other’s strengths.”