Han, Ting: Learning to Interpret and Apply Multimodal Descriptions. 2018
Contents
- Introduction
- Related work
- Speech and gestures in natural communications
- Typologies of hand gestures
- Relations between speech and co-verbal hand gestures
- Semantic coordinations between co-verbal gestures and verbal content
- Temporal alignment between gestures and speech
- Multimodal human-computer interfaces
- Representation of multimodal content
- Existing multimodal datasets
- Summary
- Multimodal corpora
- Multimodal spatial scene description corpus
- Multimodal object description corpus
- The SAGA corpus
- Summary
- A system of understanding multimodal spatial descriptions
- Modelling the interpretation of multimodal spatial descriptions
- System overview
- Learning knowledge from prior experience
- Applying the represented knowledge
- Experiment
- Summary
- Towards real-time understanding of multimodal spatial descriptions
- Real-time understanding of spatial scene descriptions
- System overview
- Gesture detection
- Gesture interpretation
- Utterance segmentation
- Natural language understanding
- Multimodal fusion & application
- System evaluation
- Gesture detector evaluation
- Gesture interpretation evaluation
- Utterance segmentation evaluation
- Whole system evaluation
- Incremental evaluation
- Human understanding
- Summary
- Investigate symbolic and iconic modes in object descriptions
- Draw and Tell: iconic and symbolic modes in object descriptions
- Model the meaning of multimodal object descriptions
- Experiments
- The image retrieving task
- Metrics
- Experiment 1: Mono-modal models
- Experiment 2: Multimodal models
- Experiment 3: reduced sketch details
- Discussion
- Summary
- Learning semantic categories of multimodal descriptions
- Represent multimodal utterances with semantic concepts
- Task formulation
- Modelling the learning of multimodal semantics
- Experiments
- Summary
- Conclusion and future work
- References
