Social AI Workshop: Social AI for Speech and Conversation

Friday, March 28, 2025
Advanced Research Centre, Glasgow, United Kingdom

What you need to know

Social AI is an AI domain aimed at endowing artificial agents with social intelligence: the ability to deal appropriately with users’ attitudes, intentions, feelings, personality, and expectations. Understanding speech and conversation is a key component. This full-day workshop will host a series of invited talks by renowned experts, followed by a roundtable discussion with the audience. Our goal is to bring together academic experts, students, and industry professionals to encourage dialog about the progress, challenges, and opportunities in these areas as AI continues to permeate all aspects of our social presence.

This event is open to all PhD students and faculty.

Invited Speakers

Jean-Francois Bonastre

Senior Researcher (Directeur de Recherche), Inria, France | Professor, Avignon University, France

Explainability in speaker recognition (and more generally in speech processing)
Explainability has become a mandatory topic in AI in general. This is largely due to the need for greater trust on the part of experts and the general public, in the face of AI's limitations, manipulations, biases, errors, and hallucinations. New AI regulations, such as those of the EU, also play an important role, as explainability is now required for certain applications. Speech processing applications are particularly concerned, as they are often linked to critical human matters, such as HR, healthcare, or forensics. This talk will briefly present some of the main approaches in AI explainability (XAI), as well as their limitations. Using speaker recognition as an example, a new explainable-by-design approach will be presented. By representing speech in terms of the presence or absence of speech attributes drawn from a small, bounded set, it enables simple explanations that can be interpreted by anyone. Some potential extensions will be discussed, such as a more general scheme capable of mixing knowledge-based and automatically discovered attributes, and the application of this principle to pre-trained encoders.

Heysem Kaya

Assistant Professor, Utrecht University, The Netherlands

Towards Fair and Interpretable Speech-based Depression Severity Modeling
Recently, many state-of-the-art deep learning models have been shown to detect depression successfully from multimodal cues. However, such models remain unusable in clinical applications for both legal reasons (such as the new EU AI Act) and practical ones. We therefore aim to make critical machine learning tasks employed in high-risk applications responsible and trustworthy. By responsibility in ML, we mean transparency/interpretability, algorithmic fairness, and privacy. Since speech is relatively less prone to automatic subject identification via public tools and search engines than vision (i.e., the face), and hence more privacy-preserving, we work with the speech modality for critical tasks such as depression detection. This talk will focus on our recent and ongoing efforts in speech-based depression prediction with responsible AI considerations.

Khiet Truong

Associate professor, University of Twente, The Netherlands

From speech technology to spoken conversational interaction technology
Nowadays, speech technology is at our fingertips. Automatic speech recognition (ASR) and speech synthesis have evolved drastically, to the point where ASR performance has reached human parity and an artificial voice is no longer discernible from a natural human voice. However, as soon as you start talking to machines, you will notice that speech technology still faces many challenges. To move toward technology that really understands you, it also needs to be able to process non-speech, or paralinguistic, information. In this talk, I will highlight some of the research we are carrying out on spoken conversational interaction technology. I will discuss how current open-source ASR systems deal with non-speech elements and different speaker groups, and I will present some of our work on designing robot communication (which does not always need to involve speech).

Location

Advanced Research Centre
11 Chapel Lane, Glasgow, G11 6EW United Kingdom

When

  • Friday, March 28, 2025 9:00 AM
  • Timezone: United Kingdom Time