Computer Vision News - September 2016

CVN: I understand that coresets are less useful when you have too many layers or too many parameters. Feldman: Right - If you have n points and you’re looking for n parameters, like in the Traveling Salesman Problem for example, you probably can’t use coresets again unless you assume something about the input. Every polynomial time algorithm is a coreset in the sense that you have an exponential number of solutions, but you only search a small number. You still guarantee that the solution will be there. You compress the solution space. In some way, every efficient algorithm uses this kind of approach for compressing data. The other related fields which we are trying to connect with coresets are compress sensing, sketching, all the "sufficient statistics" and property testing. Unlike with coresets, with property testing you’re not allowed to look at all of the data; the challenge is the same: to solve the problem with small data . However, they have a much harder constraint because they cannot “ We don’t expect anyone to find one single technique to optimize all the problems in the world ” 22 Computer Vision News Guest read all of the data, but only some of it. In coreset, we usually assume that we can scan all of the data. That’s how we can solve more problems than property testing, for example. CVN: If our readers want to try to experiment with coresets themselves, is there any place where they can find tutorials, courses or examples? Feldman: We upload new coresets on my lab’s website . We hope to publish a library within the next few weeks, but you can also get it online or send me an email. We’ll try to have one library for everything. CVN: What surprised you from working on the coresets? Feldman: Recently we had a big surprise. We now examine coresets for two fields using differential privacy. The idea is to do machine learning, while preserving the privacy and anonymity of the users, so that we extract statistics from the data without revealing information about the individuals. Surprisingly, our STOC'11 paper that we now implement shows a formal connection that says that if you can compress the data, it also means that you can have a private version; we refer to it as private coresets of the data, or a sanitized database. It means that you can add small noise to data so that the statistics will be preserved and the k-means will still Guest