Video captioning using Semantically Contextual Generative Adversarial Network
Date Issued
01-08-2022
Author(s)
Munusamy, Hemalatha
C., Chandra Sekhar
Abstract
In this work, we propose a Semantically Contextual Generative Adversarial Network (SC-GAN) for video captioning. The semantic features extracted from a video are used in the discriminator to weight the word embedding vectors. The weighted word embedding vectors, along with the visual features, are used to discriminate the ground-truth descriptions from the descriptions generated by the generator. The manager in the generator uses the features from the discriminator to generate a goal vector for the worker. The worker is trained using a goal-based reward and a semantics-based reward. The semantics-based reward ensures that the worker generates descriptions that incorporate the semantic features, while the goal-based reward, calculated from the discriminator features, encourages the generation of descriptions similar to the ground-truth descriptions. We demonstrate the effectiveness of the proposed approach to video captioning on the MSVD and MSR-VTT datasets.
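The weighting of word embeddings by video semantic features in the discriminator could be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's exact formulation: the projection matrix `w_proj`, the dot-product similarity, and the softmax weighting are all assumptions introduced for illustration.

```python
import numpy as np

def semantic_weighting(word_embeddings, semantic_features, w_proj):
    """Weight word embedding vectors using a video's semantic features.

    Hypothetical sketch: project the semantic feature vector into the
    embedding space, score each word embedding against it, and scale
    the embeddings by softmax-normalized scores before passing them
    (with the visual features) to the discriminator.
    """
    # Project semantic features (dim s) into the embedding space (dim d).
    sem = semantic_features @ w_proj              # shape (d,)
    # Dot-product similarity of each word embedding to the semantic context.
    scores = word_embeddings @ sem                # shape (T,)
    # Softmax to obtain per-word weights that sum to 1.
    exp = np.exp(scores - scores.max())
    weights = exp / exp.sum()                     # shape (T,)
    # Each word embedding is scaled by its semantic weight.
    return word_embeddings * weights[:, None]     # shape (T, d)

# Toy example: a 5-word caption, embedding dim 8, semantic feature dim 4.
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((5, 8))
sem_feat = rng.standard_normal(4)
proj = rng.standard_normal((4, 8))
weighted = semantic_weighting(embeddings, sem_feat, proj)
```

The softmax here stands in for whatever normalization the model actually uses; the key idea is that words relevant to the video's semantic content contribute more strongly to the discriminator's real/generated decision.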
Volume
221