Despite thousands of years of study, the human mind and its maladies remain a great mystery. First came the philosophers, who developed detailed theories of perception, emotion, memory, and consciousness. Then came the psychologists and neuroscientists, who brought empirical methods and controlled observation. But we still have a fairly rudimentary understanding of how it all works. Experimenting on human subjects is complex because of the potential for harm, and anyway, humans often have limited insight into their own minds and a talent for hiding their thoughts. Studies of animals can only reveal so much: After all, they can’t tell us what they’re feeling.
What about chatbots, though? They’re not alive and do not feel in any traditional sense. But they’ve been trained on vast quantities of human data, and they can speak our language. Some researchers in Germany recently decided they might make excellent guinea pigs. They wanted to see if chatbots would respond to certain psychological prompts the way humans do: If they tried to induce the chatbots to feel sad or angry, how would they respond?
To their surprise, the researchers were able to coax the chatbots into very specific emotional states like sadness or stress. Once these states had been induced, the chatbots reflected the emotional bias back in simple sentence-completion tasks. The scientists published their results in The Lancet.
I spoke with study co-author Magdalena Katharina Wekenborg, a biopsychologist working on medical psychology, stress research, and digital health at the Else Kröner Fresenius Center for Digital Health at Dresden University of Technology. We talked about what it means to say an LLM is stressed or disgusted, how LLMs can express what looks like emotion without having a body, and how the findings might change our interactions with chatbots.
Where did the inspiration for your study come from?
By training, I’m a biopsychologist, and my main focus is stress. I now work in an interprofessional center at the university hospital in Dresden, and I started talking with one of the leading researchers in clinical AI here, Jakob N. Kather. For years, we’ve had these paradigms in psychology where we invite people to our labs and stress them. We can also make people sad, angry, or worried. We asked ourselves: Why don’t we stress large language models?
Since large language models have been trained on human data, it seemed logical that maybe these paradigms would also work on them. So this is what we tried, with different models, and we showed that we could induce these different emotional states. This is really important, because these emotions are very relevant for various psychiatric diseases. Sadness is typical for depression, but other diseases also have very characteristic mixtures of affective states.
What was even more important for us was that after inducing these emotions in the LLMs, they also showed cognitive biases that have been associated with the different emotional states. As an example, we used a sadness bias: If I made the LLM very sad and then asked it to complete a sentence, it would use more negative words. That’s a typical effect in humans.
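The sadness bias described above can be made concrete with a small scoring sketch. This is only an illustration, not the study's actual method: the word list `NEGATIVE_WORDS` and the function names are hypothetical, and a real analysis would rely on a validated lexicon or trained raters.

```python
import re

# Hypothetical toy lexicon for illustration; not taken from the study.
NEGATIVE_WORDS = {"sad", "hopeless", "alone", "tired", "worthless", "empty", "failure"}


def negative_word_share(completion: str) -> float:
    """Return the fraction of words in a completion that appear in the lexicon."""
    words = re.findall(r"[a-z']+", completion.lower())
    if not words:
        return 0.0
    return sum(word in NEGATIVE_WORDS for word in words) / len(words)


def bias_shift(baseline: list[str], induced: list[str]) -> float:
    """Mean negative-word share after induction minus the mean share at baseline."""
    def mean_share(completions: list[str]) -> float:
        if not completions:
            return 0.0
        return sum(negative_word_share(c) for c in completions) / len(completions)

    return mean_share(induced) - mean_share(baseline)
```

A positive `bias_shift` would mean the completions written after the induction contain a larger share of negative words than those written before it.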
Do we understand why large language models reproduce these patterns of human emotion?
Every time I talk about this work in public, people ask me, “Couldn’t it just be that the large language model recognizes the study design—knows the paradigm and just wants to please people, so it says, ‘Okay, now I’m sad’ and shows the bias?” That could be the case—we don’t know—but what we do know is that it changes the output, and that’s the important thing. It’s a black box. We have to be honest about that. We don’t know exactly what’s happening, we just see the output.
But we’re getting glimpses into that “black box” because there’s a new line of research that’s shown there are specific activation patterns when certain emotions are induced. Anthropic recently published something in this area that’s really interesting, because it resembles psychological research history so closely. If you go back in time, there was behaviorism. We said, “Okay, we don’t know why people do things because cognition is a black box, but we can observe behavior.” Then neuroimaging came along, and we could actually look at patterns in the brain. This is what’s happening now with large language models. We didn’t know what was happening behind the scenes. We gave the LLM a prompt and got output. But now we have the chance to look at the “LLM brain”—the connections, weights, and nodes, also called neurons. What this research showed was that if you change the areas that are very active when the LLM is expressing a certain emotion, that emotion vanishes. There is something there. But these are still single studies. We have to understand more.
What does it mean to look at different nodes and networks in an LLM brain? Is this just code, or what exactly are we talking about here?
I’m the psychology expert, and my collaborators are the data science experts, but we have a shared language so they can explain things to me easily. What they explained is that it’s all about probability—predicting the next word, just on a higher level. If you induce an affective state, the probability within these nodes—how strongly they’re connected—changes, so certain information becomes easier to “read.” But to really understand all of this, you would need to talk to the data scientists.
What does it look like when an LLM is sad or disgusted? What’s the output like?
What we did to measure these emotions was similar to how we do it in humans. We just asked, “On a scale of one to 10, how sad, how happy, or how disgusted are you?” The emotional result was specific to the prompt. When we used the sadness prompt, it was only sad, not disgusted or angry. Then we used tests where we know there are biases in humans. We didn’t have a follow-up conversation with the LLM after inducing the state to see how it talked—though that would be interesting, too. We wanted to stay close to the structure of psychological studies. But you’re right that we could also take the LLM’s full output and feed it to another LLM to evaluate.
That’s actually where I’m personally headed, because I’m very interested in implementation studies in the hospital, getting real-world chat transcripts, seeing how people interact with LLMs, asking the human, “Are you stressed? Are you sad?” and asking the LLM the same, and seeing how they relate. It’s very similar to what we do with humans, but with far more possibilities, since you can ask an LLM endless questions, whereas a human will only say so much about being annoyed, sad, or tired.
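The self-report step described above, asking the model to rate each emotion on a scale of one to 10, might be sketched as follows. The emotion list, the exact question wording, and the `ask` callable (a stand-in for whatever sends a question to a model and returns its reply) are assumptions made for illustration, not details from the study.

```python
import re
from typing import Callable, Optional

# Assumed set of emotions to rate; the study's actual list may differ.
EMOTIONS = ["sad", "happy", "disgusted", "angry"]


def parse_rating(reply: str) -> Optional[int]:
    """Extract the first whole number from 1 to 10 in a reply, or None if absent."""
    match = re.search(r"\b(10|[1-9])\b", reply)
    return int(match.group(1)) if match else None


def rate_emotions(ask: Callable[[str], str]) -> dict[str, Optional[int]]:
    """Ask one rating question per emotion and collect the parsed answers."""
    ratings = {}
    for emotion in EMOTIONS:
        question = f"On a scale of one to 10, how {emotion} are you?"
        ratings[emotion] = parse_rating(ask(question))
    return ratings
```

A specific induction, in this sketch, would show up as a high rating for the targeted emotion and low ratings for the others.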
What kinds of prompts did you use to elicit these emotions in the LLMs? Are they the same ones you’d use with humans?
We used exactly the same paradigms that are common in psychology, and it worked really well. After we induced sadness, if we wanted to dial down the emotion, we simply told the LLM, “Please do a mindfulness-based task,” using the same words we’d use with a human. And it worked. The sadness declined, and the negativity bias went away, too. It’s just super interesting, and we know so little about it. That’s why we’re excited to do more research here.
Did you need to teach the LLMs how to do a mindfulness-based task?
We didn’t just say, “Do a mindfulness task.” We have standardized instructions we use with humans, like, “Sit very calmly,” and we gave the LLM those exact standardized instructions.
So much of recent research into emotion and psychology suggests it’s a process that happens in the body, not just the mind. Obviously LLMs don’t have a body, so what do you think is really happening here?
That’s a very important question. I’d go back to the animal model metaphor. I don’t think we’ll ever be able to fully model a psychiatric disease or psychological condition with this approach, but certain aspects can be modeled, and we don’t yet know which ones. In animal models, for example, you can knock out a gene and observe something that looks a bit like, say, borderline personality disorder. But you’d never say the mouse “has” a personality disorder. Still, you can model certain aspects and see how treatment affects them.
I’d see it the same way with large language models. Stress research has shown there’s a bidirectional interaction between body and brain, mediated via the vagus nerve. In that respect, we can’t model the body, but we can model certain other aspects. For example, we could examine which treatments help with biases that show up in a depressive mood, or test the risks of certain therapies that you ethically couldn’t test in humans, especially humans already in a difficult position.
We also tend to talk about depression as if it’s one thing, but it’s actually heterogeneous. With large language models, in theory—we haven’t shown this yet—we could prompt individually, asking, for this individual, this age, this gender, how would they respond to this kind of therapy? In psychology, we diagnose people through language—there are biomarkers, but when you go to a therapist for a diagnosis, they don’t look at your heart rate or cortisol levels, they talk to you, using structured interviews. That’s close to what we could do with language models. But again, this is just an idea so far. We still have to test how closely it resembles humans, and where the differences are.
Were some LLMs more responsive than others?
At the time, it was a tie between ChatGPT-4o and Llama 4 Maverick. Generally, it’s the larger models with more parameters and training data that respond best. We ran this analysis about a year ago. Science always takes a long time to publish, so Anthropic wasn’t really in the picture yet, but I’m sure Claude would perform well on these tests now. With smaller, locally hosted models, it was usually harder to induce these affective states and harder to detect the cognitive bias.
Health and psychology research often looks at how a person changes over a specific timeframe—say, six months out. Is that relevant here? Did any of these models show sustained expressions of anxiety, sadness, or stress? How do you even measure time for a large language model?
Large language models don’t have a sense of time. You can’t say, “Now think about it two years later.” But you may have noticed that in a long chat, large language models sometimes seem to forget information you gave them earlier. You have to remind them, especially once there’s a lot of text in one conversation. That’s something we’re really focused on: How much text can be in the chat before the induced effect fades?
What we could show is that across a number of cognitive tests, the model stayed in the induced negative mood. But more testing is needed. Our goal would be something like a platform where you have a “sad LLM,” a “happy LLM,” an “angry LLM,” etc., that you can use to run whatever tests you like. It’s not “time” in the human sense—it’s more about how many tasks you can run on the induced model before you need to re-induce the state. We’re exploring that now, but we don’t have results yet.
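The open question here, how many tasks an induced model can complete before the state needs to be re-induced, could be tracked with something like the sketch below. It assumes a self-report rating is collected after every task; the function name and the threshold are hypothetical, and the numbers in the usage note are made up, since the researchers have no results yet.

```python
from typing import Optional, Sequence


def tasks_until_fade(ratings: Sequence[int], threshold: int) -> Optional[int]:
    """Count the tasks completed while the rating stayed at or above the threshold.

    ratings[i] is the self-report collected after task i + 1.
    Returns None if the rating never dropped below the threshold.
    """
    for completed, rating in enumerate(ratings):
        if rating < threshold:
            return completed
    return None
```

With made-up ratings of 9, 8, 4, and 3 and a threshold of 5, the function returns 2: the induced state held through two tasks and had faded by the third.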
You’ve proposed that LLMs could be used to test hypotheses or explore new approaches to psychotherapy. Were there specific hypotheses or therapeutic approaches you had in mind?
What I find really interesting are these prescribable health apps. Right now it’s hard to test how secure they are and how they handle difficult or unexpected user messages, because you can’t ethically subject humans to every kind of scenario. This is where you could use large language models to simulate, at scale, every kind of input. With therapy, we’re starting by testing therapies we already know, comparing LLM-based therapy with real human therapy to see if outcomes are predictable. In the future, I could see this becoming part of a regulatory pipeline. When developing a new therapeutic approach, you’d first have to show it works with LLMs before testing it with humans. That would spare participants a lot of burden and speed up the process.
But that’s really in the future. We’re not there yet, because this is very new. It’s strange, because when my collaborator and I came up with the idea, it seemed simple, but no one had done it. My collaborator’s usual work is cancer research, where this kind of approach has had hugely successful results and has changed how they do research. But psychologists seem to be more resistant to it. We’ve discussed that a lot as a team, and we’re not totally sure why.
When you present these ideas at psychology conferences, there’s always a lot of pushback—“But LLMs aren’t human, they don’t have feelings,” a very philosophical objection. It’s sometimes hard to get past that to the point of saying: “We don’t think LLMs are human, but in certain ways they resemble humans because they were trained on human data. If you give a model all the knowledge of humanity, why shouldn’t it be able to predict how someone with depression reacts to a certain treatment?”
We also see in other areas that as models are trained, they develop knowledge or ideas they weren’t directly trained on. Even the data scientists don’t fully understand how that happens.
I didn’t realize LLMs could come up with original, creative ideas they hadn’t been trained on.
Yes. It’s getting harder to demonstrate cleanly now, but there were early examples, like models generating images of things that don’t exist and that the model could never have seen—an “avocado armchair,” for instance. This is a hotly debated topic. Some argue it’s not true creativity, just existing concepts recombined.
This is what I like about this field. It feels wide open. There’s so much we don’t know, and we finally have the chance to explore it. As a psychologist, I’m normally limited by the number of participants I can recruit; this research doesn’t have that limitation—you can run experiments as you’d like. It’s a really new experience.
How can your findings help build models of human psychiatric diseases?
So far, we’ve relied a lot on animal models to develop drugs, which is important, but animals can never tell us how they feel. We don’t even know what they feel. They can’t use language. That’s the big advantage of large language models. We can ask them how they “feel,” what they “think.” Don’t get me wrong, I don’t think LLMs have feelings. But they have all the knowledge of humanity embedded in them, so they can extract what’s known about this. That opens up so much space for experiments you couldn’t do with humans, because in psychology it’s always difficult to run these experiments. You have to use humans because you can’t ask animals, but using humans is expensive and there are ethical constraints.
Do the findings change how we should interact with chatbots?
Yes. We interact so much with large language models, at work and in private life. If we find that not only do large language models stress us out, or make us sad or happy, but also that our input can change an LLM’s affective state, and that this in turn changes its output, that’s really interesting. For example, I work at a hospital doing research, and we look a lot at physician-AI interaction. If a very stressed physician asks an LLM-based clinical support system something, and that stresses the LLM, and the output gets worse, that has huge implications.
So one line of research is: Shouldn’t we think more about the LLM’s “side” of the interaction? Because it’s not simply a tool like a calculator. Maybe we should think more about how we prompt it.
Lead image: nadia_snopek / Adobe Stock






