ICCV Daily 2021 - Thursday

we shoot just happens to miss it, we don’t see it at all, then the next pixel happens to hit it, and you get these bad aliasing artifacts.”

Their solution is to rethink the image formation process. Instead of shooting rays, they shoot cones. The idea is that a pixel is a little box, so you define a circle in that box and shoot out a cone that gets wider the further it travels. Instead of drawing points along a ray, they draw conical sub-frustums. They take the cone and slice it along its axis, which gives them thick volumes of space. They want to featurize each entire volume, instead of just featurizing its center.

“You can work through the geometry for how to cast cones, and how to cast rays, and it’s not too hard,” Jon tells us. “This is all stuff we understand well from geometry. But then there’s this new problem, which is I now have this conical sub-frustum in 3D space, and I need to featurize it as input to a neural network. We didn’t know how to do that. We wanted to give it information about the size of this location, but we didn’t want to do it in a naive way because what we want is to featurize this area of space so that the feature is invariant to small changes within that space. We want to reflect the fact that if the area is very large, then we don’t know about the high frequencies of the location, and if the area is very small, then we confidently know where it is in terms of location.”

Their solution here is to build on the positional encoding used in NeRF. In NeRF, coordinates are encoded with what’s called positional encoding. This is terminology from the transformer literature, but it is a straightforward idea. You take the coordinate, and you take its sine and cosine, then the sine and cosine of twice that coordinate, then of four times that coordinate, and so on. You scale the coordinate up and pass it through sinusoids, so you end up with a feature representation that has sinusoids at lower and higher frequencies.
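To make the doubling pattern concrete, here is a minimal sketch of positional encoding for a single scalar coordinate. It is an illustration of the idea as described here, not the authors' code: the function name positional_encoding and the parameter num_freqs are hypothetical, and a real NeRF applies the same recipe to each of the 3D coordinates.

```python
import math


def positional_encoding(x, num_freqs=4):
    """Encode a scalar coordinate as sines and cosines at doubling frequencies.

    Returns [sin(x), cos(x), sin(2x), cos(2x), sin(4x), cos(4x), ...].
    num_freqs is a hypothetical parameter: the number of frequency levels.
    """
    features = []
    for level in range(num_freqs):
        scale = 2 ** level  # 1, 2, 4, 8, ...
        features.append(math.sin(scale * x))
        features.append(math.cos(scale * x))
    return features


if __name__ == "__main__":
    # Two features (sine and cosine) per frequency level.
    print(len(positional_encoding(0.3, num_freqs=4)))
```

Each extra frequency level adds two features, so the network sees the same coordinate at progressively finer scales.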
Those frequencies tell you where it is. You can think of it like an addressing scheme in a quadtree or an octree. The first sinusoids tell you if you are on the left or the right, and then the next sinusoid tells you if you’re on the left side of the right part or the right side of the left part and you descend into a tree that way.

Jon Barron
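The tree-addressing analogy can be sketched in one dimension. The version below is only an illustration under stated assumptions: the coordinate is taken to lie in [0, 1), and the sinusoids are scaled by 2π so that the sign of each sine lines up exactly with a left/right split. That scaling and the function name tree_address are choices made for this sketch, not details from the interview.

```python
import math


def tree_address(x, depth=4):
    """Read a left/right tree path for x in [0, 1) off the signs of sinusoids.

    Assumption for this sketch: frequencies are 2*pi * 2**level, so the sine
    is positive on the left half of the current interval and negative on the
    right half. Points exactly on a split boundary are ambiguous in floating
    point and are not handled specially here.
    """
    path = []
    for level in range(depth):
        value = math.sin(2.0 * math.pi * (2 ** level) * x)
        path.append("L" if value > 0 else "R")
    return path


if __name__ == "__main__":
    # 0.3 lies in [0, 0.5), then [0.25, 0.5), then [0.25, 0.375), ...
    print(tree_address(0.3))
```

Reading the path from left to right narrows the interval containing the coordinate by half at every level, which is the same descent a binary tree would make; a quadtree or octree does this along two or three axes at once.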