Oregon Health & Science University
Samples of everyday conversations are being collected and analyzed in a growing number of applications, ranging from studying behavior in social psychology to clinical assessment of voice pathology and even cognitive function. Aside from the spoken words, the acoustic properties of speech samples can provide important cues in these applications. The goal of this study is to develop robust and accurate algorithms for estimating speech features. Researchers have employed a number of techniques in the time and frequency domains to estimate, for example, fundamental frequency and harmonic-to-noise ratio (HNR). However, the limitations of these techniques hinder their application in clinical assessment. Time-domain methods often ignore the frequency and amplitude variations of speech over the analysis frame, while the short-time Fourier transform does not provide the time-frequency resolution needed to capture the small perturbations observed in, for example, Parkinson's disease (PD).

The purpose of this study is to achieve accurate and reliable estimation of fundamental frequency, HNR, jitter, and shimmer for clinical speech analysis. Adopting a time-varying harmonic model (TVHM) for representing speech, we quantify hoarseness, a salient feature of PD, as well as jitter and shimmer. We verify our implementation of TVHM and pitch estimation on the Keele data set. Results show that pitch detection using TVHM outperforms get-f0, an algorithm employed in many popular tools (WaveSurfer, Praat, etc.). Further, we demonstrate the utility of our measures of hoarseness, jitter, and shimmer in predicting clinical ratings of the severity of Parkinson's disease.
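For readers unfamiliar with the perturbation measures named in the abstract, the sketch below computes jitter and shimmer using their conventional "local" definitions: the mean absolute difference between consecutive pitch periods (or cycle peak amplitudes), divided by the mean period (or amplitude). This is only an illustration of what the two quantities measure. It is not the model-based TVHM estimator developed in the thesis, and the function names and sample values are hypothetical.

```python
import numpy as np


def local_jitter(periods):
    """Cycle-to-cycle period perturbation, relative to the mean period.

    `periods` is a sequence of consecutive pitch-period durations
    (e.g. in seconds). Conventional "local" definition, assumed here
    for illustration only.
    """
    periods = np.asarray(periods, dtype=float)
    if periods.size < 2:
        raise ValueError("need at least two pitch periods")
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)


def local_shimmer(amplitudes):
    """Cycle-to-cycle amplitude perturbation, relative to the mean amplitude.

    `amplitudes` is a sequence of peak amplitudes, one per pitch cycle.
    """
    amplitudes = np.asarray(amplitudes, dtype=float)
    if amplitudes.size < 2:
        raise ValueError("need at least two cycle amplitudes")
    return np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)


if __name__ == "__main__":
    # Hypothetical values for a voice near 100 Hz; not taken from the thesis.
    periods_s = [0.0100, 0.0102, 0.0099, 0.0101, 0.0100]
    peak_amps = [0.80, 0.78, 0.82, 0.79, 0.81]
    print(f"jitter:  {100 * local_jitter(periods_s):.2f} %")
    print(f"shimmer: {100 * local_shimmer(peak_amps):.2f} %")
```

Both measures are zero for a perfectly periodic signal and grow as the cycle-to-cycle variation increases, which is why they depend so heavily on accurate period and amplitude estimates.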
Center for Spoken Language Understanding
School of Medicine
Asgari, Meysam, "Accurate and Robust Models for Clinical Speech Processing" (2014). Scholar Archive. 3499.