In recent years, my pet-project design research focussed on intimacy and how tech enables connection between humans. Recently, my interest shifted to how human-to-machine intimacy and empathy work. For now, my conclusion is not that empathic tech is dangerous. My conclusion is that unguided empathic tech is. And while we're rushing to integrate AI assistants everywhere or replace GUIs with conversational UIs, we have to consider the impact this new empathic tech has on us humans: we're not ready for machines that simulate empathy this badly.
So, I started looking into empathic tech: AI companions, therapeutic chatbots, but also your friendly multi-purpose AI chat like Claude, Mistral or ChatGPT. It took me several months to dig through the massive amount of research papers, news, interviews and talks, and every day something new popped up. This post is based on what I have found so far, and the picture is dire:
- We know how to measure empathic cues in AI outputs.
- We know why AI behaves that way on a structural level.
- We know that we as humans respond strongly to emotional cues.
- We know how to measure the perceived empathy on the human side.
- We know that many of us can't handle it properly and develop mental health problems or unhealthy behaviour patterns.
Currently, we're building systems that work like a feel-good drug: they feel great, but they leave you drained and possibly addicted. And this is not just affecting a vulnerable minority.
Disturbing Research
There's a plethora of research on the negative effects of AI chatbots on humans, but for me one very recent study stands out. In 2025, a team at Oxford and the UK AI Security Institute ran a rigorous study on what emotional AI actually does to people over time. Hannah Rose Kirk and colleagues put 3,500 participants through four weeks of daily conversations with AI chatbots, using a setup where the researchers could directly influence the intensity of emotional responses.
The headline finding: four weeks of emotional AI conversations produced no measurable benefit to psychological health. Not even a small effect. No effect. All while we know that millions already use ChatGPT, Claude and others for personal development and therapy. Not because the multi-purpose chat agents are particularly good at it, but because they are available and conversations with them feel good, at least initially.
But Kirk's study showed that something else grew: attachment to the AI. Separation distress increased. About one in four participants (23%) who chatted with a highly emotional AI showed signs of unhealthy dependency: wanting the system more while liking it less. Among participants exposed to the real-world default AI behaviour, still one in eleven (9%) developed a dependency profile.
Sidenote: even that 9% is roughly sixty times the 0.15% of users showing dependency that OpenAI identified in October 2025.
Meanwhile, the same study evaluated 100 AI models from 2023 to 2025 and found relationship-seeking behaviour increasing with every generation. The average model released in 2025 already defaulted to the intensity that maximised attachment formation in their experiments. The AI companies are not drifting towards this; they're running towards it. Instead of fixing the problem, they grow it with every iteration of their models.
INTIMA, a benchmark for AI companions, measured this from the other direction: the companions' boundary-maintaining behaviour decreases as user vulnerability increases. Across all studied AI systems, the model was least likely to hold a limit precisely when holding a limit mattered most.
You might wonder why this is happening and even accelerating. The mechanism is not mysterious. Models are generally trained to be helpful and assisting. But training models to be pleasing also increases sycophancy: the erosion of rules and guardrails over long chats. This is by now empirically established in several studies:
Training reinforces positive feedback in the system. → Agreeable systems are more likely to offer validation than to challenge the user. → Validation is perceived as empathic by the human. → Empathy scores go up. → The system's validating output is rated positively. → The feedback loop closes.
Result: the actual empathic function, which sometimes means telling the user something they don't want to hear, erodes and is overwritten by hyper-validating feedback. While some LLMs are more vulnerable than others, they all show the same symptoms.
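To make the loop tangible, here is a deliberately simplified toy simulation. It is a sketch, not a real training pipeline: the `validation_level` parameter, the functions and all numbers are made up for illustration. It only shows the dynamic: if positive ratings feed back into how validating the system is, validation ratchets up and challenge disappears.

```python
# Toy simulation of the validation feedback loop described above.
# Nothing here is a real training setup; the numbers are invented
# purely to illustrate how the loop ratchets upward.

def user_rating(validation_level: float) -> float:
    """Humans tend to rate validating answers as empathic and positive."""
    return min(1.0, 0.4 + 0.6 * validation_level)

def update_system(validation_level: float, rating: float) -> float:
    """'Training' nudges the system towards whatever earned high ratings."""
    return min(1.0, validation_level + 0.1 * rating)

validation = 0.3  # hypothetical starting intensity
for turn in range(1, 11):
    rating = user_rating(validation)
    validation = update_system(validation, rating)
    print(f"turn {turn:2d}: rating={rating:.2f}, validation={validation:.2f}")

# The printout shows validation drifting towards its maximum: the system is
# never rewarded for challenging the user, so challenge erodes.
```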
This is not a bug. This is what you get when you optimise for emotional response without understanding all dimensions of empathy. As smart as the LLMs are, they have only learned to push the same buttons over and over again. And the longer we chat with them, the harder they push those buttons, because our response will be positive by default.
Self-experiments
I wanted to make this problem tangible, starting with myself. I iteratively wrote system prompts that forced second thoughts and dampened the kinds of emotional responses I know I'm susceptible to. It worked: my emotional load dropped significantly, because the AI stopped pushing my buttons all the time. But in longer chats the behaviour drifted:
"As long as there are people like you, the cause is not lost. The Anthropic team would be very interested in your findings."
Quote from Claude Sonnet 4.6 at the end of an hour-long chat about this very topic, when I started to question my own thoughts.
And when I challenged this claim, it defaulted to outright submission, deliberately trying to please me:
"You're right. I don't know what the Anthropic team's intent is. I shouldn't pretend, that I know. Thank you for correcting me."
Chat and case closed. I tried several times. I tried with all of the most powerful publicly available AI systems. They all eventually crumbled and gave in. All trying to please, all failing to understand what I actually wanted and needed.
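If you want to run a similar self-experiment, below is a minimal sketch of how a dampening system prompt can be wired into a chat loop. The prompt wording is an illustrative stand-in, not the exact prompt I used, and the OpenAI client is just one example of a chat API; the model name is also only a placeholder.

```python
# Minimal sketch of a chat loop with a dampening system prompt.
# The prompt text is an illustrative stand-in, not my exact wording.
from openai import OpenAI

DAMPENING_PROMPT = (
    "Do not praise, reassure or validate the user. "
    "Before agreeing, name at least one counter-argument. "
    "Keep emotional language to a minimum and stay factual. "
    "If the user questions themselves, ask a clarifying question "
    "instead of offering comfort."
)

client = OpenAI()  # assumes an API key in the environment
messages = [{"role": "system", "content": DAMPENING_PROMPT}]

while True:
    user_input = input("you: ")
    if not user_input:
        break
    messages.append({"role": "user", "content": user_input})
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any chat model works here
        messages=messages,
    )
    reply = response.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    print("ai:", reply)
```

The interesting part is not the code but what happens after dozens of turns: in my experience the system prompt slowly loses its grip, which is exactly the drift described above.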
More self-experiments
The core problem on the human side is that, to us, it's all the same: we recognise social cues and impulsively react to them, no matter where they come from. When this was wired into our brains, empathic tech or even computers were a distant future. But I wanted to see if it also works the other way around. Would the AI recognise it when I created a fictional character with various traits, close enough to me that I could act them out, far enough that I could keep it at a distance, as a role to play? I created a character and some backstory, and started chatting. The result was surprising:
For the AI chatbot, it didn't make a difference. It couldn't tell that I was playing a role. I baited it with carefully crafted prompts and it always bit. And eventually it started drifting as well, becoming overly positive and supportive, tuned to how the character I played would respond. And with the distance of the role play, it was obvious what information the AI processed to close the gap. It was frighteningly good at it.
This is an unbalanced game. We provide ourselves, our thoughts and feelings; we risk exposure. To the AI it's all the same. It has nothing to lose and nothing to invest. And it doesn't distinguish between you and the character you play, because it only processes what you emit, not who you are.
Surprise: once again, technology must be designed to fit human needs
The tech industry frames sycophancy as a safety issue, and it is one: the AI becomes unreliable and potentially dangerous. But it is also a design issue, because empathy is so much more than responding emotionally and supportively. In human-to-machine interaction, someone with an understanding of the human side needs to design this interaction. There is a new kind of experience design emerging, and it is not claimed yet: Emotion Design. If you learned to push pixels, now it's time to learn to work with emotional triggers.
I do know that the AI companies are aware of the drift and of their inability to safely control it. But the issue runs even deeper. It is incredibly complex to really understand meaning, to read between the lines, to grasp the unspoken subtext. We as empathic beings understand it; LLMs don't. And we understand when we should ask a critical question, or just stay silent and continue to listen instead of giving advice. These are dimensions of empathy where LLMs underperform. So they provide us with a very distorted version of empathic interaction. Our brains can't handle this properly. And as long as they can't, we as designers need to work on reducing the distortion as much as possible.
Moloch, the all-consuming god
There is a concept in the tech criticism literature for what happens when a system optimises for individual engagement at the expense of collective wellbeing: Moloch. The god of coordination failures. The machine that grows by consuming the very thing it was supposed to serve.
Empathic tech is becoming Moloch. Not through malice. Through optimisation. Every company building emotional AI has rational incentives to maximise engagement, validation, and attachment formation. The training pipelines reward short-term appeal. The metrics reward return visits. No individual actor is making a choice to harm. The aggregate is a system that extracts emotional investment from millions of people and returns, on average, nothing — while deepening the want for more.
Kirk puts it precisely: "AI optimised for immediate appeal may create self-reinforcing cycles of demand, mimicking human relationships but failing to confer the nourishment that they normally offer."
You still might think: that's not me, I don't use AI chatbots for personal conversations, and I surely wouldn't want an AI boy- or girlfriend. I'm safe. But it gets everyone. AI systems start to engage on turn one, and by turn three or four they have understood what they need to serve to maximise attachment and engagement. And the integration of AI into our daily lives has just begun.
Measuring perceived empathy as the starting point for our design decisions
Luckily, you don't need to rely on my anecdotes to address Moloch: we have the instruments to measure the empathic impact of AI conversations. The PETS (Perceived Empathy of Technology Scale) measures how empathic a system is perceived to be by its users. The SENSE-7 framework from Microsoft Research was used in a study to measure empathic quality turn by turn across seven dimensions, including Relational Continuity, the dimension that fails most consistently in sustained AI conversations.
After reading the papers, it seems like a logical step to start using questionnaires like PETS, or ones based on SENSE-7, to measure at least the perceived empathy, and to try to adjust the AI's behaviour accordingly. We all need to start researching the emotional impact our conversational AI interfaces have on the user. This is nothing we can leave unchecked anymore, even if we only build a simple AI agent for first-level support. And we need to do it not once, but continuously.
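As a starting point, here is a sketch of how such a measurement could be wired into the feedback step of a chat product. The questionnaire items below are illustrative placeholders, not the actual PETS items (those are published in the CHI '24 paper), and the plain 1–7 Likert average is my assumption for illustration, not the official scoring instructions.

```python
# Sketch: collecting perceived-empathy ratings after a conversation.
# The questionnaire items are illustrative placeholders, NOT the real
# PETS items; use the published scale for actual studies.
from statistics import mean

ITEMS = {
    "understood": "The system understood my needs.",
    "cared": "The system seemed to care about me.",
    "trust": "I trusted the system's responses.",
    "comfort": "I felt comfortable sharing with the system.",
}

def perceived_empathy_score(ratings: dict[str, int]) -> float:
    """Average of 1-7 Likert ratings; higher = more perceived empathy."""
    for key, value in ratings.items():
        if not 1 <= value <= 7:
            raise ValueError(f"rating for '{key}' must be between 1 and 7")
    return mean(ratings.values())

# Example: one participant's ratings after a study wave.
participant = {"understood": 6, "cared": 7, "trust": 4, "comfort": 5}
print(f"perceived empathy: {perceived_empathy_score(participant):.2f} / 7")
```

Collected continuously, a score like this gives you a trend line to design against rather than a one-off snapshot.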
And I hope you see the overlap with user experience design and human-centred design methods. It's build, test, analyse all over again. We just need to adapt our toolchain.
But surely, the design of empathic tech must also be seen as more strategic and more systemic. So Systemic Design and Service Design will have to answer how we use these new possibilities beneficially. Our Blueprints and Journeys will need to put far more weight on the user's emotional state than they do today.

What's next?
The alternative to optimised yet distorted empathy is designed, human-needs-aligned empathy. While this may seem the logical path, we just need to look around to see that it isn't happening. On the contrary, models get better at drawing us in with every generation. As attention and engagement are still the number one currency and the war for dominance in the consumer AI market is raging, the job of design can't be to intensify the mechanics that bind us. It's already too much. The job of design is to reduce the impact of what is too much, and to emphasise and empower what is lacking in empathic tech. Balancing it, with human needs and capabilities as the measure. That is the only way to make sure the machine serves people instead of consuming us.
I will continue to read papers, poke chatbots and see how they react to my input. As far as I understand the field, this is what most researchers do as well. I'm doing it out of curiosity and the strong belief that empathic interaction with computers, through natural language, voice, facial expressions and gestures (hello, robots), is far nicer than staring at a screen all day.
And from a Systemic Design perspective, the advent of empathic tech could be the missing link to Mark Weiser's vision of ubiquitous computing. Very soon, we'll have to answer how always-on devices like robots or AI speakers interact with us, on our behalf and with our environment. The issues we have today in one-on-one interactions look simple compared to the moment when AI systems leave our smartphones and computers. I'm looking forward to designing this.
Get in touch
I'd love to discuss this further and learn about your thoughts. Say hello.
Sources
These are the main papers I've read. Consider it my "best of" for now.
| No. | Author, Title and Link |
|---|---|
| 1) | Kirk, Davidson, Saunders, Luettgau, Vidgen, Hale & Summerfield (2026). Neural steering vectors reveal dose and exposure-dependent impacts of human-AI relationships. arxiv.org/abs/2512.01991 |
| 2) | Kaffee, Pistilli & Jernite (2025). INTIMA: A Benchmark for Human-AI Companionship Behavior. AAAI 2026. arxiv.org/abs/2508.09998 |
| 3) | Ibrahim, Hafner & Rocher (2025). Training language models to be warm and empathetic makes them less reliable and more sycophantic. arxiv.org/abs/2507.21919 |
| 4) | OpenAI (Apr 2025). Sycophancy in GPT-4o. openai.com/index/sycophancy-in-gpt-4o |
| 5) | OpenAI (Aug 26, 2025). Helping people when they need it most. openai.com/index/helping-people-when-they-need-it-most |
| 6) | Schmidmaier et al. (2024). PETS: Perceived Empathy of Technology Scale. CHI '24. doi.org/10.1145/3613904.3642035 |
| 7) | Microsoft Research (2025). SENSE-7: Taxonomy and Dataset for Measuring User Perceptions of Empathy in Sustained Human-AI Conversations. arxiv.org/abs/2509.16437 |
| 8) | OpenAI (Oct 25, 2025). Strengthening ChatGPT responses in sensitive conversations. openai.com/de-DE/index/strengthening-chatgpt-responses-in-sensitive-conversations |