It is widely believed that future automatic services will come with interfaces that support conversational interaction. Interaction with devices and services will be as easy and natural as talking to a friend or an assistant. In face-to-face communication we use all our senses: we speak to each other, we see facial expressions, hand gestures, sketches and words scribbled with a pen, and so on. Face-to-face interaction is multimodal. To offer conversational interaction, future automatic services will therefore also be multimodal: computers will be able to understand speech and typed text, recognize the gestures, facial expressions and body posture of the human interlocutor, and use those same communication channels, in addition to graphics, to render their responses.
COMIC starts from the assumption that multimodal interaction with computers should be firmly based on generic cognitive models for multimodal interaction. Much fundamental research is still needed before this type of interaction can be grounded in an understanding of the generic cognitive principles that underlie it. COMIC will build a number of demonstrators to evaluate the applicability of the cognitive models in the domains of eWork and eCommerce.
Since COMIC aims to establish general cognitive principles for multimodal interaction, the results of this project may be used in many applications, such as eBusiness, eLearning, eCommerce, eHealth, and eCulture.
The focus of the COMIC research is on cognitive science research related to multimodality. The demonstrators built in the project are regarded as tools to evaluate the basic research.
Multimodal interaction involves more processes and modules than meet the eye of the user. In addition to multiple input and output channels, there are modules that combine the inputs (Fusion), interpret the inputs in the context of the dialogue, and split the output information over the available channels (Fission). Cognitive modeling is performed for all of these modules.
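The pipeline of modules described above can be sketched as follows. This is an illustrative toy sketch, not the COMIC architecture: all function names (`fuse`, `interpret`, `fission`) and channel names are invented for the example.

```python
def fuse(speech: str, gesture: str) -> dict:
    """Fusion: combine inputs arriving on separate channels into one message."""
    return {"speech": speech, "gesture": gesture}

def interpret(fused: dict, context: list) -> str:
    """Dialogue management: interpret the fused input in the dialogue context."""
    context.append(fused)  # the running context is simply a history here
    return f"User said '{fused['speech']}' while indicating {fused['gesture']}"

def fission(response: str) -> dict:
    """Fission: split the response over the available output channels."""
    return {"speech_out": response,
            "graphics_out": f"[highlight: {response}]"}

# One turn through the pipeline: two input channels in, two output channels out.
context: list = []
fused = fuse("put the sink here", "the left wall")
outputs = fission(interpret(fused, context))
```

In a real system each stage would of course involve recognition, reasoning, and generation components; the sketch only shows how the fusion and fission stages bracket the dialogue-level interpretation.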
One important tool in the project is called SLOT, which stands for spatial logistics task. It is an experimental paradigm developed for investigating human vocal and gestural behavior in different settings. Two persons perform a route-planning task: essentially, finding the cheapest route that satisfies two sets of independent and potentially conflicting constraints (represented by the individual goals of the players). The paradigm is designed so that it is easy to shift from collaboration to competition, and to selectively reduce the bandwidth of the communication channels.
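The underlying optimization problem in such a task can be illustrated with a small sketch. The graph, edge costs, and the way the two players' constraints are combined below are invented for the example and are not taken from the actual SLOT materials; the point is only that each player assigns their own cost to every edge, and a jointly cheapest route minimizes the combined cost.

```python
import heapq

def cheapest_route(edges, start, goal):
    """Dijkstra's algorithm over a directed graph given as {(u, v): cost}."""
    graph = {}
    for (u, v), c in edges.items():
        graph.setdefault(u, []).append((v, c))
    heap = [(0, start, [start])]  # (cost so far, current node, path)
    seen = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, c in graph.get(node, []):
            if nxt not in seen:
                heapq.heappush(heap, (cost + c, nxt, path + [nxt]))
    return None

# Each player's (potentially conflicting) costs for the same edges:
player_a = {("A", "B"): 1, ("B", "C"): 5, ("A", "C"): 4}
player_b = {("A", "B"): 3, ("B", "C"): 1, ("A", "C"): 4}
combined = {e: player_a[e] + player_b[e] for e in player_a}
cost, route = cheapest_route(combined, "A", "C")
```

Note that the jointly cheapest route (here the direct edge A to C, at combined cost 8) need not be the cheapest route for either player alone, which is exactly what makes the negotiation between the players interesting to observe.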
Detailed analysis of the verbal and nonverbal actions of the players will be performed in order to construct descriptive models of the meaning of verbal expressions in the context of nonverbal gestures, and especially of the meaning of nonverbal gestures in the context of verbal messages and the overall status of the interaction. Special attention will be paid to signals that carry information about the degree of "commitment" that subjects attach to their moves in the dialogue.
Selective reduction of the channels and their bandwidth in human-human interaction should provide insight into the lower bounds on the information flow needed for a channel to make a significant contribution to the effectiveness of the interaction.
The models that emerge from the human-human data will be implemented in the form of an 'artificial subject' and put to the test in interaction with human subjects.
Another important tool for testing the basic research is the bathroom design application. With this tool, experts and laypersons can interactively design a bathroom by changing the selection and placement of tiles and sanitary ware. The present version of the application only allows mouse-based interaction. COMIC will extend the application towards multimodal interaction. This extension will also introduce the possibility for the system to help the user make the best possible choices.
Last updated 17 July 2003. Please contact Mary Ellen Foster with any comments, complaints, or reports of broken links.