Can We Trust the Recommendations of AI?

Spend a few minutes scrolling through posts about AI, and you’ll see a divide. On one side are warnings about hallucinations, errors, and the risks of relying on a system that can sound confident while being wrong. On the other is a growing comfort with using it for just about everything — drafting, planning, even making decisions — sometimes with surprisingly little scrutiny.

That divide makes it seem like the main question is how much weight we should give AI’s recommendations. But that may not be the right question. Across two recent studies, my colleagues and I looked at how people actually use AI when making decisions. The results don’t fit neatly with the idea that the problem is simply using it too much or not enough.

What Happens When People Use AI

In both studies, we asked people to complete a relatively simple decision task: selecting a small set of items from a larger pool of options[1]. In the first study, they could decide whether to view ChatGPT’s recommendations before making their choices. In the second, they completed the task first and were then shown ChatGPT’s recommendations, with half receiving lower-quality suggestions, before being given the chance to revise their answers.

This setup lets us look at something that often gets overlooked in discussions about AI: not just whether people use it, but what they do with what it produces once they see it. Some participants ignored the recommendations entirely, some incorporated them selectively, and others leaned on them more heavily. That variation turned out to matter.

In many cases, using AI helped. Participants who incorporated more of the recommendations into their decisions tended to perform better than those who ignored them. That part isn’t especially surprising. If a source of input is accurate and relevant, using it should improve outcomes.

But that’s only part of the story. The benefit wasn’t coming from using AI in any general sense. It depended on the quality of the recommendations and how they were used. When participants incorporated higher-quality AI recommendations, their performance improved. And when they incorporated poorer recommendations, their performance suffered. The same basic behavior, giving weight to AI’s recommendations, could lead to very different outcomes.

This seems straightforward, but there’s a catch. People didn’t consistently respond to those differences. Some incorporated lower-quality recommendations when they shouldn’t have. Others failed to make use of higher-quality recommendations when they could have. Only about half of those who received higher-quality recommendations chose to revise their answers, while more than a third of those who received lower-quality recommendations did the same.

Among those who incorporated higher-quality recommendations, performance improved by about 24 percent. Among those who incorporated lower-quality recommendations, performance dropped by about 18 percent (see Figure 1).
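To get a rough sense of what those numbers could mean for a whole group rather than only for the people who acted on the advice, here is a back-of-the-envelope sketch. It is not our analysis. It assumes that revising an answer means incorporating the recommendation, that people who did not revise saw no change in performance, and it plugs in placeholder rates (0.50 for "about half" and 0.35 for "more than a third") that are illustrative, not exact figures from the studies. The function name is hypothetical.

```python
def average_change(adoption_rate: float, change_if_adopted: float) -> float:
    """Group-average performance change in percent.

    Assumes (for illustration only) that people who do not adopt the
    recommendations see no change in their performance.
    """
    if not 0.0 <= adoption_rate <= 1.0:
        raise ValueError("adoption_rate must be between 0 and 1")
    return adoption_rate * change_if_adopted


# Placeholder inputs: "about half" is taken as 0.50 and "more than a third"
# as 0.35. Neither is an exact figure reported in the studies.
higher_quality = average_change(0.50, 24.0)
lower_quality = average_change(0.35, -18.0)

print(f"Higher-quality group, average change: {higher_quality:+.1f} percent")
print(f"Lower-quality group, average change: {lower_quality:+.1f} percent")
```

Under these assumptions, the payoff from good advice is diluted by the people who pass it up, while the cost of poor advice is spread across everyone who was exposed to it and accepted it.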

Judging AI Is Harder Than It Looks

One explanation for the patterns we observed is that people aren’t just deciding whether to use AI. They’re making a judgment about how useful or reliable its recommendations seem and then acting on that judgment. In our studies, those perceptions played a significant role. Participants who saw the AI as more useful or reliable were more likely to incorporate its recommendations into their decisions.

But those judgments weren’t always aligned with the actual quality of the recommendations. Some participants gave weight to suggestions that ended up hurting their performance. Others ignored recommendations that would have helped. It wasn’t a matter of people simply overusing or underusing AI. They were responding to its recommendations, but not always in ways that matched their quality.

Part of the challenge is that evaluating AI output isn’t a single decision. It’s a series of small ones: whether to look at it, whether to take it seriously, how much to incorporate, and what to ignore. Each of those steps creates an opportunity for things to go right or wrong, and when the output sounds plausible, each one can be harder than it seems.

Those judgments don’t necessarily balance out. When an AI recommendation is clearly wrong, it’s often easy to dismiss. But when it’s close enough to be plausible — well-structured, confident, and aligned with what we expect — it becomes much harder to detect where it falls short. Weak recommendations can still be accepted, while stronger ones may be discounted if they conflict with prior beliefs or intuitions. The result is a pattern of misalignment between what people use and what actually improves their decisions.

Why People Struggle to Evaluate AI Output

Two individual differences played a significant role in the patterns we observed. These factors shaped how participants interpreted and responded to AI’s recommendations.

The first was perceived trustworthiness. Participants who viewed the AI as more trustworthy were more likely to incorporate its recommendations into their decisions — regardless of whether those recommendations were helpful or harmful.

The second was participants’ perception of their own expertise. Although participants did not have any meaningful experience relevant to the task, many reported high confidence in their initial judgments. That confidence made them less likely to revise their answers, even when the AI’s recommendations would have improved their performance.

Taken together, these factors help explain why higher-quality recommendations were not consistently used, and lower-quality ones were not consistently rejected. People were not simply reacting to the quality of the input; they had limited ability to evaluate that quality in the first place. They were weighing that input against their perceptions of the source and their own judgment — and those perceptions did not always align with what would have improved their decisions.

This brings us back to the broader question. The issue is not just how much weight people give to AI’s recommendations. It is how they decide when those recommendations are worth using. As AI becomes more integrated into decision-making, that judgment becomes increasingly important.
