Computer Vision News - October 2018
How to Exploit Weaknesses in Biomedical Challenge Design and Organization
A Position Paper
By Annika Reinke

During MICCAI, we were particularly impressed by this position paper authored by Annika Reinke and co-signed by Matthias Eisenmann, Sinan Onogur, Marko Stankovic, Patrick Scholz, Peter M. Full, Hrvoje Bogunovic, Bennett A. Landman, Oskar Maier, Bjoern Menze, Gregory C. Sharp, Korsuk Sirinukunwattana, Stefanie Speidel, Fons van der Sommen, Guoyan Zheng, Henning Müller, Michal Kozubek, Tal Arbel, Andrew P. Bradley, Pierre Jannin, Annette Kopp-Schneider and Lena Maier-Hein.

In many research fields, organizing challenges for international benchmarking has become increasingly common. Since the first MICCAI grand challenge was organized in 2007, the impact of challenges on both the research field and on individual careers has grown steadily. For example, the acceptance of a journal article today often depends on how a new algorithm performs against the state of the art on publicly available challenge datasets. Yet, while the publication of papers in scientific journals and prestigious conferences such as MICCAI undergoes strict quality control, the design and organization of challenges do not.

Given this discrepancy between challenge impact and quality control, the contribution of our work can be summarized as follows. Based on an analysis of past MICCAI challenges, we show that current practice relies heavily on trust in challenge organizers and participants. We demonstrate experimentally how “security holes” in current challenge design and organization can be used to potentially manipulate rankings. This builds on the findings of a further study [1], in which we show that rankings are highly sensitive to different design choices. To overcome these problems, we propose best practice recommendations that remove opportunities for cheating. In [1], we provide a more extensive list of best practice recommendations and open research questions, covering further aspects of challenge design.

Figure: (left) robustness of rankings with respect to a single design choice; Kendall's tau, which quantifies differences between rankings, is shown for all 2015 segmentation challenges (each challenge is represented by a circle). (right) A key example illustrating the effect of different design choices (A_i: algorithm i).

[1] Maier-Hein, L. et al. "Is the winner really the best? A critical analysis of common research practice in biomedical image analysis competitions." arXiv preprint arXiv:1806.02051 (2018).
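The ranking sensitivity illustrated in the figure can be made concrete with a small sketch. The Python example below is not taken from the paper: the per-case Dice scores, the algorithm names and the mean-versus-median aggregation choice are invented purely for illustration. It shows how a single design decision (how per-case scores are aggregated before ranking) can reshuffle a leaderboard, and how Kendall's tau quantifies the disagreement between the resulting rankings.

# Illustrative sketch (not from the paper): one design choice -- aggregating
# per-case scores by mean vs. median before ranking -- changes the leaderboard.
# All scores below are made up for demonstration purposes.
import numpy as np
from scipy.stats import kendalltau, rankdata

# Hypothetical per-case Dice scores for three algorithms on five test cases.
scores = {
    "A_1": [0.92, 0.91, 0.90, 0.40, 0.89],  # strong overall, one failed case
    "A_2": [0.85, 0.86, 0.84, 0.85, 0.86],  # consistently moderate
    "A_3": [0.88, 0.87, 0.89, 0.86, 0.88],  # consistently good
}

algorithms = list(scores)
mean_scores = [np.mean(scores[a]) for a in algorithms]
median_scores = [np.median(scores[a]) for a in algorithms]

# Rank 1 = best, so rank the negated aggregated scores.
rank_by_mean = rankdata([-s for s in mean_scores])
rank_by_median = rankdata([-s for s in median_scores])

for a, rm, rmed in zip(algorithms, rank_by_mean, rank_by_median):
    print(f"{a}: rank {int(rm)} (mean aggregation), rank {int(rmed)} (median aggregation)")

# Kendall's tau = 1 means identical rankings; lower values mean the
# aggregation choice alone reshuffled the leaderboard.
tau, _ = kendalltau(rank_by_mean, rank_by_median)
print(f"Kendall's tau between the two rankings: {tau:.2f}")

With these made-up numbers, mean aggregation penalizes the single failed case of A_1 while median aggregation hides it, so the top-ranked algorithm changes and Kendall's tau drops well below 1, which is exactly the kind of design-choice sensitivity the figure summarizes across the 2015 segmentation challenges.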