Claire Vernade

Claire, can you tell us about your work?

I work on bandit algorithms and, more generally, sequential learning. It's a niche part of the broader field of reinforcement learning and machine learning, in which we are concerned with theoretical guarantees and theoretical approaches to reinforcement learning. With bandit algorithms in particular, we are interested in the exploration-exploitation trade-off: how to explore options while still trying to maximize rewards. This is a central question in reinforcement learning, and it is best addressed when you can clean the problem up and tackle simpler models first.

Some say that maximizing rewards in reinforcement learning may lead to problematic consequences. Think of an autonomous vehicle maximizing a reward that seeks the safest route: the safest route for this vehicle will not necessarily lead to the safest outcome for other vehicles. Maximizing reward could be challenging. Would you agree?

Yes, absolutely, in the case of autonomous driving. In many real-world problems there are constraints. You can't just maximize rewards; you have to take other constraints, other aspects of your rewards, into account. In my case, for instance, I looked into non-stationary environments. I still maximize rewards, but I focus on the situation where the environment is not always giving us the same reward. Sometimes one action does best, and two hours later it will be another action. I want to track this non-stationarity, this volatility of the environment. In the case of autonomous driving there is a safety constraint. Sometimes there are also diversity constraints: you don't want to always take the same actions, and you want to maximize diversity in your recommendations or propositions. Integrating new constraints modifies the machine learning problem.
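The idea she describes, trusting only recent rewards so the learner can re-explore when the best action changes over time, can be sketched with a sliding-window variant of the classic upper-confidence-bound rule. This is a generic illustration of that idea, not Vernade's own algorithm; the function name, arm setup, and window size are invented for the example.

```python
import math
from collections import deque

def sliding_window_ucb(arms, horizon, window=100):
    """Hypothetical sketch of a sliding-window UCB bandit.

    `arms` is a list of zero-argument functions, each returning a reward
    when pulled.  Only the last `window` pulls are trusted, so the
    algorithm can re-explore when the best arm drifts over time.
    """
    history = deque(maxlen=window)  # (arm_index, reward) pairs, oldest evicted
    total = 0.0
    for _ in range(horizon):
        # Statistics restricted to the sliding window.
        counts = [0] * len(arms)
        sums = [0.0] * len(arms)
        for i, r in history:
            counts[i] += 1
            sums[i] += r
        untried = [i for i, c in enumerate(counts) if c == 0]
        if untried:
            # Forced exploration: any arm with no recent data gets pulled.
            choice = untried[0]
        else:
            # Optimism: empirical mean plus a confidence bonus that shrinks
            # as an arm accumulates recent observations.
            n = len(history)
            choice = max(
                range(len(arms)),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(n) / counts[i]),
            )
        reward = arms[choice]()
        history.append((choice, reward))
        total += reward
    return total
```

With two deterministic arms paying 0 and 1, the rule quickly concentrates on the better arm while occasionally re-checking the other; shrinking `window` makes it forget faster, which is what lets it track a volatile environment.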
My job, or that of my team here at DeepMind, is to try to boil it down to the simplest problem where we can actually say something that is theoretically valid, and perhaps shed light on problems in more complex environments. We cannot solve autonomous driving altogether by looking at the entire problem. Sometimes it's good to make building blocks: to solve autonomous driving, you first need to split it into many different problems. Our job is to create the building blocks of that tower.

Tell us about practical applications of non-stationary situations in real life.

The practical applications you might think of are typically recommending content on platforms. That could be books on Google or audiobooks on Audible. It can also be routes, if you are trying to find the best route to work. The best route is

RkJQdWJsaXNoZXIy NTc3NzU=