March 2012


Dept. of Biomedical Engineering


Oregon Health & Science University


When engaged in a conversation, speakers use both verbal and non-verbal mechanisms to help coordinate the dialogue, ensuring that, at each point, the other party is engaged in the dialogue and is capable of hearing, understanding, and responding to the speaker. The problem is that current Spoken Dialog Systems (SDSs) do not take full advantage of dialogue coordination mechanisms, which can lead to interactions that are unnatural and inefficient. However, we posit that an SDS should anticipate, recognize, and potentially emulate the full richness of dialogue coordination mechanisms. In this dissertation research, we aim to further understand dialogue coordination mechanisms and to assess how they might be used to ease human-computer interaction. We start by investigating what cues a human speaker uses to differentiate computer-directed speech from self-directed speech and from human-directed speech, finding that in both cases speech directed to the computer is much louder. We next conduct a perceptual study to determine what cues people attend to when judging whether a speaker is addressing a computer or a nearby human. Here we found that people tended to rely on the direction of the speaker's gaze, although this led to systematic errors in their judgments of addressee. We then investigate whether 'um' and 'uh' result from the same or different cognitive processes, using human-human interaction data collected while clinicians interacted with children with typical development, autism, or developmental language disorder. Here we found that 'um' appears to be listener-oriented, and 'uh' speaker-oriented. Next, again using these data, we investigated what factors affect the length of inter-turn gaps, and whether there is an interaction between gaps, disfluencies, and social pressure to respond.
We found that, after a question, speakers tend to respond more quickly and are more likely to begin their speech with a disfluency, and that the likelihood of a disfluency increases with the length of the gap. Finally, we conduct a simulation study, using Reinforcement Learning, to demonstrate that dialogue policies can be created that take advantage of dialogue coordination mechanisms.




Center for Spoken Language Understanding


School of Medicine


