SACIC: A semantics-aware convolutional image captioner using multi-level pervasive attention
Date Issued
01-01-2019
Author(s)
Parameswaran, Sandeep Narayan
Indian Institute of Technology, Madras
Abstract
Attention mechanisms, alongside encoder-decoder architectures, have become integral components for solving the image captioning problem. The attention mechanism recombines an encoding of the image depending on the state of the decoder in order to generate the caption sequence. The decoder is predominantly recurrent in nature. In contrast, we propose a novel network with attention-like properties that are pervasive through its layers: it uses a convolutional neural network (CNN) to refine and combine representations at multiple levels of the architecture for captioning images. We also enable the model to use explicit higher-level semantic information obtained by performing panoptic segmentation on the image. The attention capability of the model is demonstrated visually, and an experimental evaluation is presented on the MS-COCO dataset. We show that the approach is more robust and efficient, and yields better performance, than state-of-the-art architectures for image captioning.
Volume
11955 LNCS