Oregon Health & Science University
WordsEye is a text-to-scene conversion system that receives a text description of a picture from the user via its online interface and converts it into a 3D scene. The core of WordsEye is VigNet, a unified knowledge base and representational system for expressing the lexical and real-world knowledge needed to depict scenes from text. In particular, VigNet contains the knowledge needed to map the objects and locations specified in a text onto actual 3D objects. Individual objects typically correspond to single 3D models, but locations (e.g. a living room) typically correspond to groups of objects. Prototypical mappings from locations to objects and their relations are called location vignettes. This thesis explores our proposed methodology of using Amazon Mechanical Turk (AMT) to populate portions of VigNet. In the first part, we use AMT to fill out contextual information about VigNet objects, including their typical locations and nearby objects, and we filter the Turkers' inputs using WordNet similarity and corpus association measures. Manual evaluation of the Turkers' results shows that this is a promising approach.
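The corpus association filtering mentioned above can be illustrated with a pointwise mutual information (PMI) measure over document co-occurrence counts. This is a minimal sketch under stated assumptions: the function names and the PMI-with-threshold formulation are my own illustration, not the thesis's actual filtering pipeline, which also uses WordNet similarity.

```python
import math
from collections import Counter
from itertools import combinations

def pmi_scores(documents):
    """PMI(x, y) = log( P(x, y) / (P(x) * P(y)) ), estimated from
    per-document co-occurrence counts over a list of tokenized documents."""
    n = len(documents)
    word_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        words = set(doc)  # count each word once per document
        word_counts.update(words)
        pair_counts.update(frozenset(p) for p in combinations(sorted(words), 2))
    scores = {}
    for pair, c_xy in pair_counts.items():
        x, y = tuple(pair)
        scores[pair] = math.log(
            (c_xy / n) / ((word_counts[x] / n) * (word_counts[y] / n))
        )
    return scores

def filter_pairs(candidates, scores, threshold=0.0):
    """Keep candidate (object, location) pairs whose association score
    exceeds the threshold; unseen pairs are rejected."""
    return [
        (obj, loc) for obj, loc in candidates
        if scores.get(frozenset((obj, loc)), float("-inf")) > threshold
    ]
```

A pair like ("sink", "kitchen") that co-occurs more often than chance scores positively and survives the filter, while an unattested pair like ("sofa", "kitchen") is dropped.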
In the second part, we discuss three strategies for using AMT to collect semantic information for location vignettes. In the first strategy, Turkers describe pictures of different rooms and we then use the WordsEye NLP module to extract the objects in the rooms from their descriptions. In the second strategy, Turkers list the objects that are functionally important for a particular room (such as a sink for a kitchen), and in the third strategy, Turkers name the objects that are visually important, including large objects and furniture. For evaluation, we manually built a set of location vignettes and compared the results of each strategy against it. Our experiments achieved up to 90.62% precision and 87.88% recall.
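The evaluation above scores each strategy's object list against a manually built gold vignette. A minimal sketch of set-based precision and recall, assuming each vignette is reduced to a set of object labels (the function name is my own, not part of WordsEye):

```python
def precision_recall(predicted, gold):
    """Precision and recall of a predicted object set against a gold
    location vignette, both treated as sets of object labels."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

For example, if a kitchen vignette's gold set is {sink, stove, fridge, table} and a strategy proposes {sink, stove, fridge, sofa}, both precision and recall are 0.75.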
Center for Spoken Language Understanding
School of Medicine
Rouhizadeh, Masoud, "Collecting Semantic Information for Locations in the Knowledge Resource of a Text-to-Scene Conversion System" (2013). Scholar Archive. 3475.