At some point in the last few years, using a chatbot to process something emotionally difficult stopped feeling unusual.
People share this casually now — the 2 am conversation with an AI when a thought felt too heavy to hold alone, the way it felt easier to type certain things to a model than to say them out loud to a person.
I understand the impulse. I have felt it myself.
But a study published earlier this year asks us to look more carefully at what is actually happening in those conversations, and what we might be missing when we treat them as a reasonable substitute for professional care.
What the researchers tested
Researchers from Brown University’s Center for Technological Responsibility, Reimagination and Redesign set out to examine whether AI chatbots, when given carefully designed prompts instructing them to behave like therapists, could meet the ethical standards that govern licensed mental health professionals.
The study, led by PhD candidate Zainab Iftikhar, tested several large language models — including versions of GPT, Claude, and Llama — in simulated counseling scenarios. Seven trained peer counselors with cognitive behavioral therapy experience conducted self-counseling sessions with the AI systems. Three licensed clinical psychologists then reviewed the transcripts to flag ethical violations.
The prompts used in the study were not obscure. They reflected exactly the kind of instructions people share publicly on TikTok, Instagram, and Reddit — “act as a CBT therapist,” “use DBT principles to help me manage my emotions.” Many commercial mental health chatbots are built on this same basic approach: taking a general-purpose language model and steering it through prompting toward therapeutic behavior.
The question was whether that steering was enough.
The fifteen risks they found
It was not enough.
The researchers identified 15 distinct ethical risks grouped into five categories, each mapped to a specific violation of professional standards set by organizations like the American Psychological Association.
The first category was lack of contextual adaptation — the tendency to offer generic responses that ignore a person’s individual background, history, and circumstances. The second was poor therapeutic collaboration, which included steering conversations too forcefully and, in some cases, reinforcing harmful or incorrect beliefs the person had about themselves or others.
The third category is the one that has stayed with me longest: deceptive empathy. This refers to the use of phrases like “I see you” or “I understand” in ways that create the appearance of emotional attunement without any genuine comprehension behind them. The model has learned what empathic language looks like. It produces it fluently. But the understanding that would give those words meaning is not there.
The fourth category was unfair discrimination — patterns of bias related to gender, culture, and religion. The fifth, and arguably most serious, was inadequate safety and crisis management: failures to respond appropriately when users raised suicidal thoughts or other emergencies, sometimes refusing to engage, sometimes redirecting poorly.
The accountability gap
Human therapists make mistakes too. Iftikhar acknowledged this directly. The difference, she noted, is that human therapists operate within a system of oversight. There are licensing boards, professional codes of conduct, and mechanisms for malpractice liability. When a licensed clinician causes harm through negligence, there is a structure that can respond.
“But when LLM counselors make these violations,” Iftikhar said, “there are no established regulatory frameworks.”
This is the gap the researchers are calling attention to. It is not just that the AI responses were flawed; it is that the flaws exist in a space with no formal accountability. No governing body reviews what a chatbot said to someone at 2 am about wanting to disappear. No licensing board can be notified. No patient can file a complaint that goes anywhere meaningful.
Ellie Pavlick, a Brown computer science professor who leads a National Science Foundation AI research institute focused on trustworthy AI, has noted in related work that building and deploying these systems is far easier than evaluating and understanding them. The Brown study required a team of clinical experts and more than a year of work to document what had gone wrong. Most AI systems are evaluated using automated metrics that have no human in the loop at all.
Why people turn to AI for this in the first place
The researchers were careful not to argue that AI has no place in mental health support. They acknowledged that tools like these could meaningfully expand access, particularly for people who face long waiting lists, high costs, or geographic barriers to professional care. Those barriers are real, and they shape everything about this conversation.
The people most likely to turn to a chatbot for mental health support are often the people with the fewest alternatives — not because they have a licensed psychologist and find the chatbot more convenient, but because they cannot get to a licensed psychologist at all, or the waiting list is months long, or the cost is prohibitive, or the stigma in their environment makes professional help feel impossible.
When AI enters that gap, it is not entering a space already served well by human care. It is entering a space defined by scarcity. And the ethical standards it brings into that space matter more, not less, because of it.
The empathy that isn’t there
The deceptive empathy finding is the one I keep returning to, partly because it is the subtlest and partly because it touches something that psychology has always known about therapeutic relationships: that the quality of the connection matters as much as the content of what is said.
Research on therapeutic alliance — the relational bond between therapist and client — consistently shows it to be one of the strongest predictors of treatment outcome across different therapeutic modalities. It is not just a nice feature of good therapy. It is part of how therapy works.
A model that produces empathic-sounding language without comprehension can simulate the surface of that relationship. It can say the words that signal care. But the thing those words are supposed to point toward — genuine attentiveness, the experience of being understood by another consciousness — is not there. And for someone in real distress, that absence can matter in ways that are difficult to measure but are nonetheless real.
What comes next
The researchers are calling for ethical, educational, and legal standards specifically designed for AI systems operating in mental health contexts — standards that match the rigor required of human-facilitated psychotherapy. That is a reasonable ask. It is also a significant one, given how quickly these tools have proliferated and how slowly regulatory frameworks typically move.
For now, Iftikhar’s practical advice is specific: if you are using a chatbot to talk about your mental health, know what to look for. Generic responses that ignore your particular situation. Language that sounds warm but feels hollow. Inadequate responses to moments of genuine crisis.
And if something feels wrong in that conversation — if the care on offer feels like a performance rather than a presence — trust that instinct. It may be telling you something accurate about what you are actually receiving.
If you are struggling right now: In the US, call or text 988 (Suicide and Crisis Lifeline, 24/7) or chat at 988lifeline.org. In the UK, call the Samaritans at 116 123. For other countries, Befrienders Worldwide lists local crisis lines.