Computer Vision News - July 2023

Women in Computer Vision

home yet because my parents are still in China. It's funny because every time I cross the border to come into the US through immigration, the customs officer always asks, "Where's your home?" And I always reply, "My home in the US or my home in China?" And he says, "You tell me!" [both laugh] It's a little awkward.

What about a citizen of the world?

Yeah! [laughs] … a citizen of the world.

There must be things in the US that you like very much, since you have been here for many years.

Well, I guess it's mostly in my career, I think.

That means your career is important enough to determine where you are going to live.

Yes, although China has been evolving very fast in this AI field.

True.

I feel like the bigger news always comes from the US first, like GPT and stuff.

What is your drive?

I feel like it's very similar to other people. Especially in research, when you write a paper, then in Google Scholar, you see your paper there; that's some kind of certificate for me. Like, you accomplished a project, and then this project is well formatted and well summarized in this paper. And then, seeing my paper count and also seeing the citation numbers grow means people are recognizing my work. That's very satisfying for me. Especially one paper, it's called UNITER. It's my highest-cited paper right now. This work has been recognized by many colleagues in this field. I hope in the future I will do more work like that.

All my readers want to know what the recipe is for highly cited work. How did you make it? What is the recipe for success?

I only have one highly cited work, so I can only speak from that perspective. UNITER was proposed at a very early stage of pretraining in vision language research. We saw the success of BERT in NLP. And then we quickly realized you can do similar things in vision language too. Just by taking the transformer architecture and taking the image feature as input, and then when you train with large image text pairs, the model can learn vision capabilities too. It was not only us who thought of that. We had seen similar works around the same time. Actually, that was September of 2019, and there were a few similar works popping up while we were still working on the project.
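To make the idea in that answer concrete, here is a toy sketch of feeding image features and text tokens into one attention layer as a single sequence. It is not UNITER's actual implementation: the function names, the dimensions, and the random inputs are all assumptions made for illustration, the attention uses identity projections, and the pretraining objectives are left out entirely.

```python
import math
import random

def linear(vec, weights):
    """Multiply a vector by a weight matrix given as a list of output rows."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def joint_sequence(image_feats, text_embs, img_proj):
    """Project image features to the text dimension, then concatenate both modalities."""
    projected = [linear(f, img_proj) for f in image_feats]
    return projected + text_embs

def self_attention(seq):
    """Single-head self-attention with identity projections (illustration only)."""
    dim = len(seq[0])
    out = []
    for query in seq:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in seq]
        weights = softmax(scores)
        # Each output vector mixes image and text positions alike.
        out.append([sum(w * value[i] for w, value in zip(weights, seq)) for i in range(dim)])
    return out

random.seed(0)
HIDDEN, IMG_DIM = 4, 6  # toy sizes, chosen arbitrarily for this sketch
image_feats = [[random.random() for _ in range(IMG_DIM)] for _ in range(3)]  # 3 image regions
text_embs = [[random.random() for _ in range(HIDDEN)] for _ in range(5)]     # 5 text tokens
img_proj = [[random.gauss(0, 0.1) for _ in range(IMG_DIM)] for _ in range(HIDDEN)]

seq = joint_sequence(image_feats, text_embs, img_proj)
contextual = self_attention(seq)
print(len(seq), len(contextual[0]))
```

Because every position attends over the whole joint sequence, a text token's output can draw on image regions and vice versa, which is the property the answer describes.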
