Computer Vision News - March 2022

Karen Tatarian Overview: By merely observing humans, one can infer that no social interaction takes place without cues, whether verbal or nonverbal, that allow others to interpret behaviors and reasonably estimate intentions. These powerful social signals and nonverbal behaviors are complex and multi-modal, meaning they are made up of different combinations of modalities and cues such as gestures, gaze behavior, and proxemics (i.e., the management of space and environment). Thus, for a robot to be perceived by humans as a socially intelligent agent, it is expected to hold a successful social interaction, adapt to the social environment, and exhibit appropriate multi-modal behavior. In my thesis, I first investigate how one of these modalities can help adapt another, then explore the effects of these modalities, when performed multi-modally, on behavioral interaction outcomes and on the perception of the robot's social intelligence, and finally present an architecture using reinforcement learning with which the robot learns to combine its multi-modal behaviors, driven by a reward function based on the multi-modal social signals of the human in the interaction.
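To make the final contribution more concrete, here is a minimal, bandit-style sketch of the idea of learning multi-modal behavior combinations from a reward built out of the human's social signals. Everything in it is an assumption for illustration: the modality options, the reward weights, and the sense_human stub are invented here and are not taken from the thesis architecture.

```python
import random
from itertools import product

# Hypothetical discrete options per modality (illustrative only,
# not the modality sets used in the thesis).
GESTURES = ["none", "beat", "deictic"]
GAZE = ["averted", "mutual"]
PROXEMICS = ["far", "social", "close"]

# Each action is one multi-modal combination of the three modalities.
ACTIONS = list(product(GESTURES, GAZE, PROXEMICS))


def social_reward(engagement, mutual_gaze, distance_comfort):
    """Toy reward aggregating multi-modal human social signals.
    The weights are assumptions, not values from the thesis."""
    return 0.5 * engagement + 0.3 * mutual_gaze + 0.2 * distance_comfort


def sense_human():
    """Stand-in for perception; a real system would estimate these
    signals from vision and audio, not sample them at random."""
    return random.random(), random.random(), random.random()


# Tabular values for a single interaction state, kept stateless
# (bandit-style) for brevity; the thesis architecture is richer.
q = {a: 0.0 for a in ACTIONS}
alpha, epsilon = 0.1, 0.2

for step in range(1000):
    # Epsilon-greedy choice over multi-modal behavior combinations.
    if random.random() < epsilon:
        action = random.choice(ACTIONS)
    else:
        action = max(q, key=q.get)
    reward = social_reward(*sense_human())
    # Incremental update toward the observed social reward.
    q[action] += alpha * (reward - q[action])

print("Preferred combination:", max(q, key=q.get))
```

The point of the sketch is only the signal flow: the robot's action space is the cross product of its modalities, and the learning signal comes from the human's multi-modal responses rather than from a task-completion score.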
