Computer Vision News - January 2024

GEM - Grounding Everything

He hopes that in the future, the model could be extended even further to include proper reasoning capabilities. However, its non-disruptive and user-friendly approach already makes GEM an accessible and easy-to-use tool.

“I released a Hugging Face demo where you can just upload the link of an image, or an image itself, and query it with text to see if it’s working well for the type of image you uploaded,” he adds. “I’ve seen on Twitter people are using this model to query on the output of diffusion models or on animes, and it’s working pretty well because it’s CLIP and was trained with these kinds of images. Also, there’s the code on GitHub, and if you want to install the model, you pip install gem_torch, and then you’re good to go!”

As we wrap up the interview, Walid reveals that while he was born in France, his parents are from Morocco. His connection to Morocco was especially poignant during a recent visit when he witnessed the country’s resilience in the wake of the severe September earthquake.

“I was there in October, right after the earthquake,” he tells us. “When I was there, it was almost a month, and I traveled throughout Morocco. It was doing way better.”

2 NeurIPS PAPERS in this issue! Find them on page 2 (by Denys Rozumnyi) and on page 38 (with Nina Montaña Brown).
