Resonant Minds: Closed-Loop Social Avatars with Theory of Mind

Jianxu Shangguan^1*, Jing Xu^2*, Hang Ye², Xiaoxuan Ma³, Yizhou Wang², Jenq-Neng Hwang¹, Wentao Zhu⁴

¹University of Washington ²Peking University ³Carnegie Mellon University ⁴Eastern Institute of Technology, Ningbo
^*Equal contribution.

Paper Code Video

[TL;DR] Resonant Minds is a closed-loop dual-agent framework unifying multimodal perception, Theory of Mind social reasoning, and emotion-controllable multimodal expression, supported by a hierarchical Persona-Scenario dataset.

Scenario

In an art gallery filled with masterpieces, Einstein encounters the mysterious girl from Vermeer's painting. They discuss the mysteries of the universe, the eternity of art, and the relativity of time.

Einstein

Personality: Openness

Goal: Explain how time and space bend like light, and discover if art can capture what science cannot measure

Pearl

Personality: Openness

Goal: Show how a single moment frozen in paint can speak across centuries, revealing truths beyond equations

Scenario

Two legendary entertainers meet beyond time and space. They share their experiences of fame, the burden of being icons, and the search for authentic self beneath the spotlight.

Leslie

Personality: Openness

Goal: Express the pain of being misunderstood and find solace in artistic freedom beyond societal norms

Marilyn

Personality: Neuroticism

Goal: Share the loneliness of perfection and the desire to be seen for intelligence, not just beauty

Scenario

Two friends are meeting at a coffee shop, where one of them is having trouble keeping up with their bills.

William King

Personality: Agreeableness

Goal: Help your friend with their financial trouble, offering support while being sensitive to their pride and dignity

Joseph Rodriguez

Personality: Extraversion

Goal: Maintain your pride while accepting help, expressing gratitude without feeling like a burden

Scenario

Two roommates living together and sharing household chores. One of them, who is responsible for cooking, finds out that the other one refuses to eat anything they cook.

Liam Roberts

Personality: Agreeableness

Goal: Convince your roommate to try your cooking, sharing the effort and care you put into meal preparation

William Rodriguez

Personality: Openness

Goal: Express concerns about food preferences tactfully, while appreciating your roommate's effort and finding common ground

Abstract

Creating lifelike digital humans with genuine social intelligence requires unifying cognitive reasoning and multimodal generation within a coherent framework. Current approaches treat these as separate tasks: Large Language Models excel at dialogue but lack embodied expression, while diffusion-based talking head models achieve visual fidelity but ignore social cognition. To bridge this gap, we propose a closed-loop dual-agent framework integrating perception, social reasoning, and expression into a continuous interaction cycle. The perception module analyzes partners' multimodal behaviors from video, while the social reasoning module infers hidden mental states through Theory of Mind and selects responses via an ensemble mechanism. The expression module then generates emotion-controllable dual-agent videos synthesizing both speaker speech and expression alongside listener reactive behaviors, capturing bidirectional dynamics absent in prior work. We construct a hierarchical Persona-Scenario dataset with psychologically grounded personas and private social goals to support evaluation under information asymmetry. Experiments on this dataset demonstrate competitive or superior performance on both dialogue quality and video generation metrics. Notably, our method surpasses even the full-information Script mode on key dialogue quality dimensions, suggesting that explicit mental state inference under uncertainty can elicit more thoughtful dialogue than unrestricted information access.

Method

Closed-Loop Framework for Dual-Agent Interaction

We model dual-agent interaction as a Partially Observable Markov Game, running three modules in a closed loop. (1) Perception turns the partner's video into structured observations. (2) Social Reasoning infers the partner's hidden mental states through a Theory of Mind module and selects a response via an ensemble of evaluators. (3) Expression renders the chosen response into a dual-agent video of both the speaker and the reactive listener.

Closed-loop framework for dual-agent interaction

Persona-Scenario Dataset Curation

We construct a hierarchical Persona-Scenario dataset to support evaluation under information asymmetry. Each persona comprises two layers: a facts layer of consistent biographical attributes, and a psychological layer of behavioral narratives grounded in the Big Five personality traits.

Persona-Scenario dataset statistics

Results

LLM Social Simulation

Under information asymmetry, each agent sees only its own goals. By reasoning about the partner's hidden mental states, our method keeps the conversation socially coherent while still advancing its own goal, as visualized below.

Theory of Mind Visualization. Ours infers through the ToM module that Kai fears being probed, and the Ensemble mechanism rejects further-probing candidates to select a response that respects Kai's boundary while still advancing Chen's goal.

Across four LLM evaluators, our method outperforms the realistic Agent baseline and even matches the full-information Script reference on key dialogue-quality metrics.

AgentOursScript reference

Sotopia-Eval

LLM-Eval

GPT-Score

G-Eval

Talking Head Generation

We compare against state-of-the-art talking head methods on audio quality, visual synchronization, and emotional expressiveness. Our method achieves the best overall trade-off, leading on emotional expressiveness while remaining competitive on synchronization.

Ours

DICE-Talk

EDTalk

Sonic

Ours

DICE-Talk

EDTalk

Sonic

Qualitative comparison with state-of-the-art talking head generation methods.

Citation

@misc{shangguan2026resonantmindsclosedloopsocial,
      title={Resonant Minds: Closed-Loop Social Avatars with Theory of Mind},
      author={Jianxu Shangguan and Jing Xu and Hang Ye and Xiaoxuan Ma and Yizhou Wang and Jenq-Neng Hwang and Wentao Zhu},
      year={2026},
      eprint={2606.05896},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2606.05896},
}