Computer Vision News - August 2022

CNN+LSTM Neural Networks

VGG16 model flowchart

The following chart shows how the data flows when using the VGG16 model for Transfer Learning. First, we input and process 20 video frames in a batch with the VGG16 model. Just before the final classification layer of the VGG16 model, we save the so-called transfer values to a cache file. The reason for using a cache file is that it takes a long time to process an image with the VGG16 model; if each image is processed more than once, caching the transfer values saves a lot of time. Once all the videos have been processed through the VGG16 model and the resulting transfer values saved to a cache file, we can use those transfer values as the input to the LSTM neural network. We then train this second neural network on the classes from the violence dataset (Violence, No-Violence), so the network learns how to classify videos based on the transfer values from the VGG16 model.

# We will use the output of the layer before the final
# classification layer, which is named fc2. This is a
# fully-connected (or dense) layer.
transfer_layer = image_model.get_layer('fc2')

image_model_transfer = Model(inputs=image_model.input,
                             outputs=transfer_layer.output)

transfer_values_size = K.int_shape(transfer_layer.output)[1]

print("The input of the VGG16 net has dimensions:",
      K.int_shape(image_model.input)[1:3])
print("The output of the selected layer of the VGG16 net has dimensions:",
      transfer_values_size)

The input of the VGG16 net has dimensions: (224, 224)
The output of the selected layer of the VGG16 net has dimensions: 4096

Function to process 20 video frames through VGG16 and get transfer values

def get_transfer_values(current_dir, file_name):
    # Pre-allocate the input batch array for the images.
    shape = (_images_per_file,) + img_size_touple + (3,)
    image_batch = np.zeros(shape=shape, dtype=np.float16)

    # Load the 20 frames of this video into the batch.
    image_batch = get_frames(current_dir, file_name)

    # Pre-allocate the output array for the transfer values.
    # Note that we use 16-bit floating points to save memory.
    shape = (_images_per_file, transfer_values_size)
    transfer_values = np.zeros(shape=shape, dtype=np.float16)

    # Run the batch through the truncated VGG16 model.
    transfer_values = image_model_transfer.predict(image_batch)

    return transfer_values

A generator that processes one video through VGG16 on each function call

def proces_transfer(vid_names, in_dir, labels):
    count = 0
    tam = len(vid_names)

    # Pre-allocate the input batch array for the images.
    shape = (_images_per_file,) + img_size_touple + (3,)
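The code above assumes that image_model (the full VGG16 network) has already been created and that the transfer values are written to a cache file elsewhere in the notebook. The following is a minimal, self-contained sketch of how that setup might look with Keras; the cache_transfer_values helper and the .npy cache file are illustrative assumptions, not the article's exact code.

import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Model

# Full VGG16 with its dense head, so that the 'fc2' layer (4096 units) exists.
image_model = VGG16(include_top=True, weights='imagenet')
transfer_layer = image_model.get_layer('fc2')
image_model_transfer = Model(inputs=image_model.input,
                             outputs=transfer_layer.output)

def cache_transfer_values(image_batch, cache_path):
    # image_batch: (num_frames, 224, 224, 3), already preprocessed for VGG16.
    # Run the expensive VGG16 pass once and store the result on disk.
    transfer_values = image_model_transfer.predict(image_batch)
    np.save(cache_path, transfer_values.astype(np.float16))
    return transfer_values

# On later runs the cached values can be loaded instead of recomputed:
# transfer_values = np.load(cache_path)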

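For the second network, the idea is to feed the sequence of 20 transfer-value vectors (each of length 4096) into an LSTM that predicts Violence or No-Violence. The sketch below shows one possible Keras model of this kind; the layer sizes and optimizer are assumptions for illustration, not necessarily the architecture used later in the article.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

frames_per_video = 20        # _images_per_file in the code above
transfer_values_size = 4096  # size of VGG16's fc2 output

lstm_model = Sequential([
    # Summarize the sequence of 20 transfer-value vectors into one vector.
    LSTM(512, input_shape=(frames_per_video, transfer_values_size)),
    Dense(256, activation='relu'),
    # Two output classes: Violence / No-Violence.
    Dense(2, activation='softmax'),
])

lstm_model.compile(optimizer='adam',
                   loss='categorical_crossentropy',
                   metrics=['accuracy'])
lstm_model.summary()

Because the LSTM sees the transfer values of all 20 frames in order, it can pick up temporal patterns across the video rather than judging each frame in isolation, which is the whole point of combining the CNN with an LSTM.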