a module that I call an element transformer, but it's an encoder-decoder transformer. And the key feature of the decoder in this element transformer is that there are these learnable queries, these learnable element queries. And these are the same length as the number of elements that there are in a short performance, so seven, seven element queries.” This allowed Arushi to take in a variable-length performance, which might consist of many clips, around 126 in the video. The output of this encoder-decoder transformer would be a fixed set of element embeddings. In that way we can get these element-wise embeddings, which can then later be used with the rubric scoring. The next thing would be the regularization that's applied to basically make sure that the element queries in the transformer decoder focus on the elements in
DAILY WACV Monday Poster Presentation