Computer Vision News - January 2018

Every month, Computer Vision News reviews a challenge related to our field. If you do not take part in challenges, but are interested to know the new methods proposed by the scientific community to solve them, this section is for you. This month we have chosen to review the NIPS'17: Learning to Run Challenge, organized around NIPS 2017. The website of the challenge, with all its related resources, is here. Read below what Sharada Prasanna Mohanty, a Doctoral Assistant in the team of Marcel Salathé at the Digital Epidemiology Lab of EPFL in Geneva, Switzerland, and one of the organizers, tells us about this Learning to Run challenge. We would also like to mention co-organizer Lukasz Kidzinski from the lab of Scott Delp, who was a major driving force from the Stanford team.

NIPS'17: Learning to Run Challenge by S.P. Mohanty

The Learning to Run challenge came together because we wanted to explore the feasibility of using deep reinforcement learning to teach high-dimensional biomechanical systems complex tasks like walking and running. Participants were provided with a human musculoskeletal model in a physics-based simulation environment (OpenSim), and the task was to design a real-time controller to navigate an obstacle course as quickly as possible. Our initial baseline solutions, based on DDPG and TRPO, showed that it is indeed feasible to learn policies that produce feasible gaits and good cumulative rewards (distance covered by the pelvis).

The challenge attracted a total of 2154 submissions from 471 participants all over the world. Some of the top submissions were capable of running at roughly 4.5 m/s (an interesting side note: a speed of 4.5 m/s qualifies you for the Boston Marathon).

In the initial phases of the competition, participants at the top of the leaderboard mostly used TRPO, which generated promising policies; in many cases, however, the learnt policies got stuck in local minima, and the model ended up merely hopping or adopting unusual gaits (unusual compared to the way humans walk). While TRPO does guarantee monotonic improvement, the presence of a large number of local minima hindered the learning of more efficient policies beyond a certain threshold of cumulative reward.

Another major issue in the challenge was that the simulations were approximately 1600 times slower than commonly used Humanoid reinforcement learning environments based on the MuJoCo physics engine. The root of this problem lay in the fact that OpenSim was designed for accuracy and high precision. Some participants came up with interesting hacks to trade a little bit of that precision for faster simulation times, and showed that the models still learn efficient policies when trained against this alternate build of OpenSim. In future versions of the challenge, we might consider providing an alternate build of OpenSim that is optimized for simulation speed rather than for the precision of the contact-force calculations. In any case, this constraint did incentivize participants to rely on more sample-efficient approaches like DDPG, which ended up dominating the leaderboard.
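
To make the setup above concrete, below is a minimal sketch of how a controller interacts with the challenge environment. It assumes the challenge's osim-rl Python package and its gym-style RunEnv class as released for the 2017 competition; the class name and the visualize / difficulty arguments are taken from that release and may differ in later versions. A random policy stands in for the DDPG/TRPO actors mentioned above.

```python
# Minimal random-policy rollout against the Learning to Run environment.
# Assumes the challenge's osim-rl package is installed (pip install osim-rl);
# RunEnv and reset(difficulty=...) follow the 2017 release and may have
# changed in later versions of the package.
from osim.env import RunEnv

env = RunEnv(visualize=False)          # set True to render the musculoskeletal model
observation = env.reset(difficulty=0)  # difficulty controls obstacle placement

total_reward = 0.0
for step in range(500):
    # A trained DDPG/TRPO actor would map `observation` to muscle
    # activations in [0, 1]; here we simply sample random activations.
    action = env.action_space.sample()
    observation, reward, done, info = env.step(action)
    total_reward += reward             # reward tracks forward distance covered by the pelvis
    if done:                           # episode ends on a fall or at the step limit
        break

print("cumulative reward:", total_reward)
```

Because each call to env.step() runs a comparatively expensive OpenSim simulation step, off-policy methods such as DDPG, which reuse stored transitions from a replay buffer, get more learning signal per simulated step than on-policy TRPO, which is exactly the sample-efficiency effect described above.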
