ECCV 2020 Daily - Tuesday

Oral Presentation

Fashionpedia: Ontology, Segmentation, and an Attribute Localization Dataset

Menglin Jia is a Computer and Information Science PhD student at Cornell University, under the supervision of Serge Belongie and Claire Cardie. She speaks to us ahead of her oral today.

This work introduces Fashionpedia, a unified fashion ontology and new clothing dataset. It proposes a novel task that combines instance segmentation with attribute recognition and presents a strong baseline model for it.

“Computer vision in the deep learning era is booming, particularly object recognition,” Menglin tells us. “By object recognition I mean you describe or localize objects within an image. In our work, we combine these two tasks together. That is the novel part. Not only do you need to localize where an item of clothing is in an image, but you need to describe it. What shape is it? What texture is it? What kind of manufacturing technique has been used?”

The team have acquired a dataset of around 48,000 clothing images covering daily life, street style, celebrity events, runways, and online shopping. The images originate from Flickr and stock photo websites. They propose a new ontology on top of this dataset. Fashion experts first go over the images manually to verify the quality, ensure clothing items are visible, and remove any that are unsuitable. Crowd workers then annotate the images with segmentation masks, and fashion experts are recruited to annotate the localized attributes.

In terms of computer vision, they use Mask R-CNN as the base model and add another branch on top of it for attribute recognition.

What challenges have the team encountered so far? Menglin explains: “It’s a multi-task problem where you need to get the boundaries and you need to identify the category of this