Research

Multimodal Communication with Social Robots and Virtual Agents

My research asks a practical question:

How should social robots and virtual agents communicate with people?

Human communication is multimodal: we don't rely on speech alone. We coordinate interaction through gaze, facial expressions, pointing, nods, blinks, pauses, gestures, timing, feedback, and shared visual context. These signals help us manage attention, take turns, repair misunderstandings, express engagement, and decide whether an interaction should continue.

I study how these signals work in human communication, how they can be modelled in artificial agents, and how people perceive and respond to them. My work combines human-robot interaction, psycholinguistics, artificial intelligence, virtual reality, multimodal data analysis, and empirical user research.

A central idea in my work is that artificial agents do not need to copy humans perfectly. Human-like behaviour is not always useful behaviour. Instead, I ask which signals matter, what function they serve, when an agent should use them, and whether they make interaction clearer, easier, and more reliable.

Postdoctoral Research

Referential communication with artificial agents

In this work, we study how speech, gaze, and pointing are integrated during referential communication, and how such behaviour can be captured, analysed, and adapted for artificial agents.

This study presents a complete workflow for capturing, processing, and aligning multimodal data from human participants performing a referential task, including speech, eye gaze, and pointing behaviour. It also shows how the resulting temporal and movement-based features can inform models of referential behaviour in an artificial agent. Information about speech timing, gaze patterns, pointing actions, and gesture movement dynamics is obtained using motion-capture and eye-tracking technologies.
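The following rough sketch illustrates the alignment step. For each word onset, it attaches the gaze target and the pointing target that are nearest in time. The sketch makes three assumptions: all streams already share a common clock, gaze and pointing have already been reduced to labelled samples, and the toy input values are made up. The `Sample` type, the helper names, and the 50 ms tolerance are hypothetical choices for this example. They are not the workflow used in the study.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Sample:
    t: float      # timestamp in seconds on a shared clock (assumed)
    label: str    # e.g. the object a gaze or pointing sample is directed at


def nearest_sample(samples: List[Sample], t: float, max_gap: float) -> Optional[Sample]:
    """Return the sample closest in time to t, or None if none lies within max_gap seconds.

    Assumes samples are sorted by timestamp.
    """
    times = [s.t for s in samples]
    i = bisect_left(times, t)
    # The closest sample is either the one just before t or the one at/after t.
    candidates = [samples[j] for j in (i - 1, i) if 0 <= j < len(samples)]
    if not candidates:
        return None
    best = min(candidates, key=lambda s: abs(s.t - t))
    return best if abs(best.t - t) <= max_gap else None


def align_to_words(
    words: List[Tuple[str, float]],
    gaze: List[Sample],
    pointing: List[Sample],
    max_gap: float = 0.05,
) -> List[Tuple[str, float, Optional[str], Optional[str]]]:
    """For each (word, onset) pair, attach the nearest gaze and pointing labels."""
    rows = []
    for word, onset in words:
        g = nearest_sample(gaze, onset, max_gap)
        p = nearest_sample(pointing, onset, max_gap)
        rows.append((word, onset, g.label if g else None, p.label if p else None))
    return rows


# Toy input, invented for illustration only.
words = [("the", 1.00), ("red", 1.20), ("cup", 1.45)]
gaze = [Sample(0.98, "cup"), Sample(1.21, "cup"), Sample(1.44, "plate")]
pointing = [Sample(1.43, "cup")]
rows = align_to_words(words, gaze, pointing)
for row in rows:
    print(row)
```

Nearest-sample matching with a tolerance keeps the sketch simple. A real pipeline would more likely work with intervals, such as gaze fixations or pointing strokes, rather than single samples.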

Automated data annotation pipelines

Multimodal data is difficult to analyse because it involves synchronising multiple data streams and annotating them, which is time-consuming and labour-intensive. In my research, I have focused on developing pipelines that make multimodal data easier to synchronise, annotate, and analyse.

In this study, we introduce a human-in-the-loop pipeline for automatic gaze data annotation. It uses state-of-the-art deep learning models to map gaze data onto regions of interest (ROIs) in videos. The pipeline enables researchers to define target objects and refine ROIs through intuitive channels such as text or visual prompts.
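The core mapping step can be sketched as a point-in-box test: for each gaze sample, find the regions of interest in that video frame that contain the gaze point. This is a hypothetical sketch built on several assumptions:

- ROIs arrive as per-frame bounding boxes, for example from an object detector or a segmentation model.
- When boxes overlap, the smallest box wins.
- Human corrections come as a simple override keyed by sample index.

The names `ROI`, `label_gaze`, and `overrides` are illustrative. They are not part of the published pipeline.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ROI:
    label: str
    x0: float  # left edge in pixels
    y0: float  # top edge in pixels
    x1: float  # right edge in pixels
    y1: float  # bottom edge in pixels

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1

    def area(self) -> float:
        return (self.x1 - self.x0) * (self.y1 - self.y0)


def label_gaze(
    gaze: List[Tuple[int, float, float]],
    rois_per_frame: Dict[int, List[ROI]],
    overrides: Optional[Dict[int, str]] = None,
    default: str = "none",
) -> List[str]:
    """Map each (frame, x, y) gaze sample to an ROI label.

    When several ROIs contain the point, the smallest one wins (an assumed
    heuristic), so a cup lying on a table is preferred over the table.
    `overrides` holds corrections from a human reviewer, keyed by sample index.
    """
    overrides = overrides or {}
    labels = []
    for i, (frame, x, y) in enumerate(gaze):
        if i in overrides:
            labels.append(overrides[i])
            continue
        hits = [r for r in rois_per_frame.get(frame, []) if r.contains(x, y)]
        labels.append(min(hits, key=ROI.area).label if hits else default)
    return labels


rois = {0: [ROI("table", 0, 0, 100, 100), ROI("cup", 40, 40, 60, 60)]}
gaze = [(0, 50, 50), (0, 10, 10), (0, 200, 200), (1, 50, 50)]
print(label_gaze(gaze, rois))
print(label_gaze(gaze, rois, overrides={2: "table"}))
```

In the full pipeline, the deep learning models would supply the per-frame boxes or masks, presumably guided by the text or visual prompts. This sketch covers only the final assignment of gaze samples to labels.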

Doctoral Research

Real-time robot emotions using LLMs

In this work, I investigate how large language models can be used to generate context-appropriate emotional expressions for social robots during live human–robot dialogue.

The system uses the ongoing conversation as context, predicts the robot’s emotional state in real time, and maps this prediction to facial expressions on the robot. Through a collaborative interaction study, we compared model-driven emotional expressions with no-emotion and mismatched-emotion conditions. The results showed that congruent LLM-generated expressions made the robot appear more human-like, emotionally appropriate, and engaging.

Paper

Gaze control for social robots

In this work, I explore how social robots can decide where and when to look during human–robot interaction. Instead of relying only on reactive gaze shifts, we developed a planning-based gaze control system that predicts gaze targets over a short future time window.

The system coordinates gaze behaviour across conversational functions such as turn-taking, gaze aversion, referential gaze, and joint attention, while also improving eye–head coordination. A user study showed that this planning-based approach was preferred over a purely reactive system and was perceived as more interpretable and better at regulating interpersonal intimacy.
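The following sketch makes the difference between reactive and planned gaze concrete. It uses dynamic programming to choose a gaze target for each step of a short future window. The choice trades the utility of each target against a penalty for every gaze shift. The per-step utilities, the `switch_cost` parameter, and the function name are assumptions for illustration. This is a generic receding-horizon planner, not the system described above.

```python
from typing import Dict, List, Tuple


def plan_gaze(utilities: List[Dict[str, float]], switch_cost: float = 0.5) -> List[str]:
    """Choose one gaze target per future time step.

    utilities[k][target] is the (assumed, precomputed) value of looking at
    `target` at step k, e.g. combining turn-taking, referential, and
    joint-attention cues. Each change of target costs `switch_cost`.
    Returns the sequence with the highest total utility minus switch costs.
    """
    targets = sorted({t for step in utilities for t in step})
    # best[t] holds (score, path) for the best plan that ends on target t.
    best: Dict[str, Tuple[float, List[str]]] = {
        t: (utilities[0].get(t, 0.0), [t]) for t in targets
    }
    for step in utilities[1:]:
        new_best = {}
        for t in targets:
            # Best predecessor plan, paying switch_cost if the target changes.
            score, path = max(
                ((s - (switch_cost if p[-1] != t else 0.0), p) for s, p in best.values()),
                key=lambda sp: sp[0],
            )
            new_best[t] = (score + step.get(t, 0.0), path + [t])
        best = new_best
    return max(best.values(), key=lambda sp: sp[0])[1]


# Hypothetical utilities over a three-step window.
window = [
    {"user": 1.0, "object": 0.2},
    {"user": 0.4, "object": 0.6},
    {"user": 1.0, "object": 0.1},
]
print(plan_gaze(window, switch_cost=0.0))
print(plan_gaze(window, switch_cost=0.5))
```

With `switch_cost=0.0`, the planner reduces to picking the best target at each step independently, which resembles a purely reactive policy. A positive cost lets it hold gaze through a brief dip in utility instead of shifting away and back.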

Paper

Robot emotion recognition: Are eyes enough?

In this work, I study how people perceive emotional expressions in social robots, and whether the eye region alone is sufficient for recognizing emotions.

We conducted a user study comparing human faces with robot faces that varied in appearance and visible facial region. The results show that fully animated robot faces can communicate emotions effectively, but recognition becomes less accurate when only the eyes are visible. Under this constraint, more human-like robot faces support better emotion recognition, highlighting the importance of facial design for expressive and socially intuitive robots.

Paper
