More than 15 years ago, when Mohsen Bayati began working with machine learning in the healthcare industry as a postdoctoral research scholar at Microsoft, he was sure artificial intelligence systems would be ubiquitous in the medical field within five years.
While at Microsoft, he worked on an impactful study using AI techniques to analyze electronic health records in an effort to reduce hospital readmission rates. But even after exponential progress in systems such as OpenAI’s ChatGPT, Anthropic’s Claude, and Google’s Gemini, his prediction has not come to fruition.
“I was always frustrated — why is adoption so slow?” says Bayati, The Carl and Marilynn Thoma Professor of Operations, Information & Technology at Stanford Graduate School of Business. “But I learned over the years that it’s not purely a technology problem.”
Healthcare is a complicated business — there are many competing incentives, including patient health, cost, and legal and regulatory requirements — but the problem was more basic, Bayati learned. More human.
“The one thing that was missing from AI solutions was that clinicians didn’t trust them,” he says. “Even if a system can make the best predictions, if it makes one very bizarre prediction” — an output that is entirely fabricated and known in the field as a hallucination — “people lose all trust.”
In recent years, Bayati has set out to confront that problem, to build trust between people and the machines that are increasingly part of our lives. And he’s making progress, including with two studies that are focused on reducing errors in pharmacy medication directions and predicting prostate cancer recurrence. The papers are part of a body of work that seeks to improve healthcare and experimental design using data-driven learning, decision models, and theories from mathematics and statistical physics (the study of the collective behavior of atoms and other particles).
Bayati’s research has changed healthcare practices and results: His work on online pharmacy errors led to a 33% reduction in so-called near-miss events in which an error is caught before it reaches a patient; his research on hospital readmission rates projected an 18% reduction in rehospitalizations.
“It’s not very common in academia to see your research have a direct impact in practice,” Bayati says. “When it happens, it’s really amazing.”
But all that impact almost didn’t happen.
Bayati came to Stanford in 2000 as a PhD student in math, which he had studied in his native Iran. His focus was the topology of multi-dimensional surfaces.
“The type of math I was working on was the furthest thing from applied math that you can think of — completely abstract, operating in hyperbolic worlds where the notion of distance is not like in our world,” he says, laughing.
The work was interesting, but Bayati found himself more attracted to the research his friends in the engineering department were doing. “They were working on math problems, too” but in an applied sense, he explains. “The idea that you could make an impact with mathematics became fascinating to me.”
Bayati switched to the electrical engineering department and, while earning his PhD in the field, interned with the theory group at Microsoft Research. Seeing the work of commercial researchers and their potential for impact was eye-opening, he says. He returned to the company as a postdoctoral researcher in 2007 and spent two years there, including time in a group of computer science theorists and mathematicians. That was where he first learned about machine learning and its predictive capabilities — when you “have some past data of an event and want to predict future occurrences of the event.” His team approached a hospital in the Washington, D.C., area and asked, “Are there problems you would like to predict?” The hospital readmission study emerged from that collaboration.
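That kind of task is easy to picture in miniature. The sketch below, in Python with scikit-learn, trains a simple classifier on synthetic patient records to flag likely 30-day readmissions; the features, coefficients, and data are invented for illustration and are not from the Microsoft study.

```python
# A toy version of the prediction task described above: learn from past
# hospital records to flag likely 30-day readmissions. All features,
# coefficients, and data are synthetic placeholders, not the study's.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.normal(65, 12, n),   # age
    rng.exponential(4, n),   # length of stay, in days
    rng.poisson(1.5, n),     # number of prior admissions
])
# Synthetic label: readmitted within 30 days (1) or not (0).
logits = 0.03 * (X[:, 0] - 65) + 0.2 * X[:, 1] + 0.5 * X[:, 2] - 2
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("held-out AUC:", round(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]), 3))
```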
The skills Bayati was learning would have been relevant in many other contexts — online advertising also relies on predictions made by machine learning, for instance — but in the technology space, “much more sophisticated solutions [were already being] applied” while in healthcare, “you would immediately notice a gap” in methodology. That gap was appealing to Bayati; it gave him room to grow.
“You could call it an arbitrage opportunity,” he concedes. “If you could take the more sophisticated solutions and apply them to healthcare, you could have more impact.”
People-Powered AI
For the past several years, in addition to his teaching and advising at Stanford, where he holds courtesy appointments in the electrical engineering and radiation oncology departments, Bayati has been an Amazon scholar, consulting and collaborating with scientists at the company.
In that role, he worked on creating a system, called MEDIC, that uses language models to generate medication directions such as dosage and frequency. The system is built on the expert guidance of, among others, pharmacists, making it a so-called human-in-the-loop AI solution. Compared with more traditional benchmark large language models, whose training involves no pharmacy experts, MEDIC was more accurate. It also has guardrails, shaped by human input, that are designed to stop it from operating out of bounds.
Bayati and his colleagues also tested their system against the then-latest versions of ChatGPT, Claude, and Gemini. In some cases, those AI systems and the benchmark LLMs inaccurately implied a tablet or capsule form. They also erroneously added, misinterpreted, or completely fabricated instructions — the kind of hallucinations that shake people’s faith in AI technologies. In addition to being more reliable, MEDIC was also faster and cheaper, requiring far less computing capacity.
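The guardrail idea can be sketched in a few lines. The Python below is a hypothetical illustration of the human-in-the-loop pattern, not MEDIC’s actual code or rules: a generated direction is auto-approved only when it falls inside pharmacist-defined limits, and everything else is escalated to a person.

```python
# A simplified illustration of a human-in-the-loop guardrail: a generated
# medication direction is checked against pharmacist-approved limits, and
# anything out of bounds is routed to a human reviewer. The drug table and
# limits are hypothetical; this is not MEDIC's actual implementation.
from dataclasses import dataclass

# Hypothetical pharmacist-curated limits: (max units per dose, max doses per day).
APPROVED_LIMITS = {
    "amoxicillin_500mg_capsule": (2, 3),
    "lisinopril_10mg_tablet": (1, 1),
}

@dataclass
class Direction:
    drug: str
    dose_units: int      # units taken at a time
    times_per_day: int

def review(direction: Direction) -> str:
    """Auto-approve only directions inside the pharmacist-approved
    envelope; escalate everything else to a human."""
    limits = APPROVED_LIMITS.get(direction.drug)
    if limits is None:
        return "escalate: unknown drug"
    max_dose, max_freq = limits
    if direction.dose_units > max_dose or direction.times_per_day > max_freq:
        return "escalate: outside approved limits"
    return "auto-approve"

# A model suggesting 4 capsules 3x/day is caught before reaching a patient.
print(review(Direction("amoxicillin_500mg_capsule", 4, 3)))  # escalate
print(review(Direction("amoxicillin_500mg_capsule", 1, 3)))  # auto-approve
```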
“Implementing AI solutions such as MEDIC in real-world environments, especially those involving human interaction, presents challenges,” the authors wrote in their resulting paper, “Large language models for preventing medication direction errors in online pharmacies,” which was published by Nature Medicine in April. “It demands a lengthy period of evaluation and optimization, necessitating strong collaboration with [data entry] technicians and pharmacists. Building trust in AI-generated suggestions and incorporating ongoing feedback are essential.”
In other ongoing work, Bayati has elevated the role of human expertise in creating more accurate and trustworthy AI systems. He collaborated with doctors at Stanford to use patient-level clinical data to build an AI system that can predict prostate cancer recurrence. Using a sample of 147 patients, the team reviewed physician progress notes, including biopsy reports and clinical notes, to create a machine-learning outcome prediction model. Compared to a more traditional machine-learning model, the team’s model was able to “extract more signal” — basically, pull more meaning — from the progress notes and, ultimately, predict prostate cancer recurrence more accurately.
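To give a rough sense of what “extracting signal” from notes can look like, the sketch below vectorizes a few invented progress notes and fits a classifier for recurrence. The notes, labels, and automated pipeline are all hypothetical; the Stanford team relied on expert manual review, not this kind of shortcut.

```python
# A minimal sketch of pulling predictive signal out of free-text clinical
# notes: vectorize the text and fit a classifier for recurrence. The notes
# and labels below are synthetic placeholders, not real patient data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

notes = [
    "gleason 4+4, positive margins, rising psa at follow-up",
    "gleason 3+3, negative margins, psa undetectable",
    "extracapsular extension noted, psa rising at follow-up",
    "organ-confined disease, psa undetectable at 12 months",
]
recurred = [1, 0, 1, 0]  # synthetic labels: biochemical recurrence

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(notes, recurred)
print(model.predict(["rising psa at follow-up"]))  # likely flags recurrence
```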
Because the model required a time-intensive, manual review of clinician notes, “it would be impossible to use in practice,” Bayati admits.
Nevertheless, the research “basically gives a North Star” for the use of AI in healthcare settings.
“This is what we should be able to do,” he says. “We should fix our AI to achieve that.”
Improving Experiments
Some of Bayati’s most recent work focuses not on predicting problems but on designing better experiments to test interventions in medicine and other fields. One paper, “Causal Message Passing for Experiments with Unknown and General Network Interference,” was inspired by a statistical physics technique Bayati learned as a PhD student. The work tries to address the problem of interference in experiments. Say, for instance, that you are testing a COVID prevention policy in one city and comparing its performance to a policy in place in a nearby city. People from the cities might interact with each other — commuting to other locations for work or traveling back and forth to see friends or family. If one city is “sicker,” the other city might get sicker too, and the interaction among the treatment groups would skew the results of the experiment. Bayati’s research, with lead author and PhD student Sadegh Shirani, provides a proof of concept for a technique they call causal message-passing, which can estimate the total treatment effect in the presence of network interference.
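A toy simulation shows why interference matters. In the sketch below, each unit’s outcome depends partly on its neighbors’ treatments, so a naive treated-versus-control comparison misses the spillover and understates the total treatment effect. The network and outcome model are invented, and the code illustrates only the problem, not the causal message-passing estimator itself.

```python
# A toy simulation of network interference. Each unit's outcome depends on
# its own treatment plus spillover from the fraction of treated neighbors,
# so the naive treated-minus-control difference is biased relative to the
# total treatment effect (everyone treated vs. no one treated). The network
# and outcome model are made up; this is not the paper's estimator.
import numpy as np

rng = np.random.default_rng(1)
n = 2000
# Random network in which each unit has roughly 10 neighbors.
A = (rng.random((n, n)) < 10 / n).astype(float)
np.fill_diagonal(A, 0)
deg = A.sum(axis=1).clip(min=1)

def outcomes(z):
    spill = A @ z / deg  # fraction of treated neighbors
    return 1.0 * z + 0.8 * spill + rng.normal(0, 0.5, n)

# Ground-truth total treatment effect: all treated vs. all control.
tte = (outcomes(np.ones(n)) - outcomes(np.zeros(n))).mean()

# A Bernoulli(1/2) experiment and the naive difference in means: both groups
# see ~50% treated neighbors, so the spillover term cancels and is missed.
z = rng.integers(0, 2, n).astype(float)
y = outcomes(z)
naive = y[z == 1].mean() - y[z == 0].mean()

print(f"true total effect ~ {tte:.2f}, naive estimate ~ {naive:.2f}")
```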
“We’re in the R&D phase,” Bayati says. “But the R&D phase is super promising.”
Shirani says the work is significant because it connects two fields: statistical physics and causal inference. “It’s exciting to use real-world intuition from physics to solve a problem in a totally different domain,” he explains.
Another recent work, “Optimal Experimental Design for Staggered Rollout,” tries to solve a different problem in controlled trials. Imagine that, to reduce the risk of interference, you decide to treat large metropolitan areas as treatment groups: all of the San Francisco Bay Area receives one treatment, and all of New York receives another, as opposed to individual cities or counties receiving different treatments. The chances of interference go down, but so too does the sample size of your experiment — there are fewer metropolitan areas in the country than there are cities and counties.
Bayati and colleagues at Stanford GSB, including professors Susan Athey and Guido Imbens, together with lead author Ruoxuan Xiong, then a PhD student, try to solve this problem by using staggered rollouts for the treatments.
“The paper uses the element of time,” Bayati explains. “We can generate more information by looking at each sample over time. The whole focus of the paper is how to choose start times to get the most precise information.”
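A small simulation conveys the idea of squeezing information out of time. In the sketch below, units adopt the treatment at evenly staggered start times, and a two-way fixed-effects regression recovers the effect from the panel. The data are synthetic, and the even spacing is a stand-in for the optimized start-time schedule the paper derives.

```python
# A toy staggered rollout: each unit adopts the treatment at a different
# start time, and repeated observations per unit identify the effect. The
# even spacing below is a placeholder, not the paper's optimal schedule.
import numpy as np

rng = np.random.default_rng(2)
n_units, n_periods, true_effect = 20, 10, 2.0

# Evenly staggered start times across the experiment horizon.
start = np.repeat(np.linspace(1, n_periods - 1, 5).astype(int), n_units // 5)

unit_fe = rng.normal(0, 1, n_units)
time_fe = np.linspace(0, 1, n_periods)
treated = (np.arange(n_periods)[None, :] >= start[:, None]).astype(float)
y = (unit_fe[:, None] + time_fe[None, :] + true_effect * treated
     + rng.normal(0, 0.5, (n_units, n_periods)))

# Two-way fixed-effects regression via dummies (unbiased here because the
# treatment effect is constant across units and time).
u = np.repeat(np.arange(n_units), n_periods)
t = np.tile(np.arange(n_periods), n_units)
X = np.column_stack([
    treated.ravel(),
    np.eye(n_units)[u],            # unit dummies
    np.eye(n_periods)[t][:, 1:],   # time dummies (drop one)
])
beta = np.linalg.lstsq(X, y.ravel(), rcond=None)[0]
print(f"estimated effect ~ {beta[0]:.2f} (true {true_effect})")
```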
Ultimately, the team’s proposed methods reduce the opportunity cost of experiments — in this case, time — by 50%.
“We could run the experiment in three months versus six months,” he says. “We can get answers faster.”
Xiong says Bayati “thinks very deeply about problems.”
“I learned a lot from collaborating with him — not just how to shape the problem and how to approach the problem but how to solve the problem and then write and position the paper,” she says.
Long-Term Focus
Bayati is a respected teacher, mentor, and advisor. Earlier this year, the Stanford GSB PhD Student Association awarded him its PhD Faculty Distinguished Service Award, calling him an “innovator,” a “remarkable applied mathematician,” and someone who “instills a joy for discovery.”
Shirani says Bayati is a hands-on mentor who takes the time to listen to students’ professional and personal concerns; the professor once walked for two hours with him around campus while Shirani shared the pressures he was facing. Another time, Bayati met Shirani in his office on a Sunday to work on the causal message-passing paper, bringing along his young son, whom he was helping with homework.
“At the same time that he was helping him with elementary school math, we were discussing a proof in our statistical physics paper,” Shirani says.
Shirani says he nearly gave up on the causal message-passing paper a few months into the research. “I was almost convinced it was not going to work,” he remembers, “but he told me, ‘No, please keep working, and I promise you, you’re going to have some great results.’”
Shirani had a breakthrough a couple of weeks later. “I was so pleased,” he says. “He pushed me to follow the path.”
In addition to the chance to work with future scholars, one reason Bayati chose academia over industry was the “freedom” his job as a professor gives him to explore complicated and persistent problems in his chosen fields. “As a professor, you can take long-term bets in your research,” he says.
He enjoys the struggle of working — sometimes for years — on questions and the rare moments when “you suddenly figure out something completely new.”
“It’s not like a big unknown becomes 100% known,” he says. “It’s more like you’re in this dark room and you get a flashlight suddenly.”