WACV 2026 Daily - Sunday

a Graph Visual Question Answering (GVQA) task. Instead of an end-to-end black box, DriveLM mimics human reasoning step by step, linking perception, prediction, and planning through a logical graph. By leveraging vision-language models (VLMs), the car can explicitly answer questions like "What are the relevant objects in the scene?" before executing an action.

In her final work, completed during her internship at Wayve, she builds a full Vision-Language-Action (VLA) model named SimLingo. SimLingo achieves state-of-the-art closed-loop driving performance while seamlessly unifying multiple capabilities within a single framework: autonomous vehicle control, visual question answering (VQA), instruction following, and explanations of its driving decisions. To tackle the crucial challenge of language-action alignment, ensuring the model actually bases its driving decisions on language rather than relying solely on visual cues, she introduces "Action Dreaming," a novel training task that uses diverse instruction-action pairs. Thanks to these innovations, SimLingo became the winning entry in the CARLA Autonomous Driving Challenge at CVPR 2024.

By successfully uniting vision, language, and action, Katrin's research paves the way for generalist autonomous agents that we can actually talk to and understand. For more information, see her website (katrinrenz.de).
