Browse or search publications from faculty affiliated with the lab.
On Synthetic Difference-in-Differences and Related Estimation Methods in Stata
In this article, we describe a computational implementation of the synthetic difference-in-differences (SDID) estimator of Arkhangelsky et al. (2021, American Economic Review 111: 4088-4118) for Stata. SDID can be used in many…
Choosing the “Right” Default Donation Amounts for Each Donor to Balance Multiple Fundraising Objectives
This report describes insights gleaned from the Data Fellows collaboration between PayPal and the Golub Capital Social Impact Lab at Stanford University’s Graduate School of Business. By embedding researchers in PayPal’s charitable giving team,…
Evaluating Treatment Prioritization Rules via Rank-Weighted Average Treatment Effects
There are a number of available methods for selecting whom to prioritize for treatment, including ones based on treatment effect estimation, risk scoring, and hand-crafted rules. We propose rank-weighted average treatment effect (RATE) metrics as…
Federated Offline Policy Learning
We consider the problem of learning personalized decision policies from observational bandit feedback data across multiple heterogeneous data sources. In our approach, we introduce a novel regret analysis that establishes finite-sample upper…
Qini Curves for Multi-Armed Treatment Rules
Qini curves have emerged as an attractive and popular approach for evaluating the benefit of data-driven targeting rules for treatment allocation. We propose a generalization of the Qini curve to multiple costly treatment arms that quantifies the…
Service Quality on Online Platforms: Empirical Evidence about Driving Quality at Uber
Forthcoming in Management Science
Online marketplaces have adopted new quality control mechanisms that can accommodate a flexible pool of providers. In the context of ride-hailing, we measure the effectiveness of these mechanisms…
Estimating Wage Disparities Using Foundation Models
One thread of empirical work in social science focuses on decomposing group differences in outcomes into unexplained components and components explained by observable factors. In this paper, we study gender wage decompositions, which require…
Service Quality in the Gig Economy: Empirical Evidence about Driving Quality at Uber
The rise of marketplaces for goods and services has led to changes in the mechanisms used to ensure high quality. We analyze this phenomenon in the Uber market, where the system of pre-screening that prevailed in the taxi industry has been…
Policy Learning with Adaptively Collected Data
In a wide variety of applications, including healthcare, bidding in first price auctions, digital recommendations, and online education, it can be beneficial to learn a policy that assigns treatments to individuals based on their characteristics…
LABOR-LLM: Language-Based Occupational Representations with Large Language Models
Many empirical studies of labor market questions rely on estimating relatively simple predictive models using small, carefully constructed longitudinal survey datasets based on hand-engineered features. Large Language Models (LLMs), trained on…
Data-driven Error Estimation: Upper Bounding Multiple Errors with No Technical Debt
We formulate the problem of constructing multiple simultaneously valid confidence intervals (CIs) as estimating a high probability upper bound on the maximum error for a class/set of estimate-estimand-error tuples, and refer to this as the error…
Towards Costless Model Selection in Contextual Bandits: A Bias-Variance Perspective
Model selection in supervised learning provides costless guarantees as if the model that best balances bias and variance was known a priori. We study the feasibility of similar guarantees for cumulative regret minimization in the stochastic…
The Heterogeneous Impact of Changes in Default Gift Amounts on Fundraising
When choosing whether and how much to donate, potential donors often observe a set of default donation amounts known as an “ask string.” In an experiment with more than 400,000 PayPal users, we replace a relatively unused donation amount ($75) on…
The Value of Non-traditional Credentials in the Labor Market
This study investigates the labor market value of credentials obtained from Massive Open Online Courses (MOOCs) and shared on business networking platforms. We conducted a randomized experiment involving more than 800,000 learners, primarily from…
Using Wasserstein Generative Adversarial Networks for the Design of Monte Carlo Simulations
When researchers develop new econometric methods, it is common practice to compare the performance of the new methods to that of existing methods in Monte Carlo studies. The credibility of such Monte Carlo studies is often limited because of the…
CAREER: A Foundation Model for Labor Sequence Data
Labor economists regularly analyze employment data by fitting predictive models to small, carefully constructed longitudinal survey datasets. Although machine learning methods offer promise for such problems, these survey datasets are too small…
Digital Interventions and Habit Formation in Educational Technology
We evaluate a contest-based intervention intended to increase the usage of an educational app that helps children in India learn to read English. The evaluation included approximately 10,000 children, of whom about half were randomly selected to…
Impact Matters for Giving at Checkout
We conducted two experiments on PayPal’s Give at Checkout feature to learn about the effect of 1) information about charity outcomes on donations, and 2) exposure to these point-of-sale microgiving requests on subsequent giving. In this “…
Optimal Experimental Design for Staggered Rollouts
In this paper, we study the design and analysis of experiments conducted on a set of units over multiple time periods where the starting time of the treatment may vary by unit. The design problem involves selecting an initial treatment time for…
Evaluating Treatment Prioritization Rules via Rank-Weighted Average Treatment Effects
There are a number of available methods for selecting whom to prioritize for treatment, including ones based on treatment effect estimation, risk scoring, and handcrafted rules. We propose rank-weighted average treatment effect (RATE) metrics as…
Low-Intensity Fires Mitigate the Risk of High-Intensity Wildfires in California’s Forests
The increasing frequency of severe wildfires demands a shift in landscape management to mitigate their consequences. The role of managed, low-intensity fire as a driver of beneficial fuel treatment in fire-adapted ecosystems has drawn interest in…
Proportional Response: Contextual Bandits for Simple and Cumulative Regret Minimization
In many applications, e.g., in healthcare and e-commerce, the goal of a contextual bandit may be to learn an optimal treatment assignment policy at the end of the experiment, that is, to minimize simple regret. However, this objective remains…