Paralinguistic Speech Attribute Recognition and Multimodal Behavior Signal Analysis

Prof. Ming Li

Sun Yat-sen University, China


Speech signals not only carry lexical information but also convey various kinds of paralinguistic speech attribute information, such as speaker, language, gender, age, emotion, channel, voicing, and psychological state. The core technical problem behind these tasks is utterance-level supervised learning on text-independent speech signals of flexible duration. I will introduce our recent work, from features to representations to modeling. Furthermore, we try to combine these paralinguistic speech attribute recognition tasks into a single problem and solve it with end-to-end deep learning methods, in order to reduce the need for prior domain knowledge. Finally, I will introduce our work on multimodal behavior signal analysis and interpretation. We apply signal processing and machine learning technologies to human behavior signals, such as audio, visual, physiological, and eye-tracking data, to provide objective and quantitative behavior measurements or codes for assistive autism diagnosis.

Research keywords: Speech signal processing, Human behavior signal processing, Multimodal biometrics, Structural health monitoring