  • Publication
    Distortion Disentanglement and Knowledge Distillation for Satellite Image Restoration
    (01-01-2022)
    Kandula, Praveen;
    Satellite images are typically subject to multiple distortions. Different factors affect the quality of satellite images, including changes in atmosphere, surface reflectance, sun illumination, and viewing geometries, limiting their application to downstream tasks. In supervised networks, the availability of paired datasets is a strong assumption. Consequently, many unsupervised algorithms have been proposed to address this problem. These methods synthetically generate a large dataset of degraded images using image formation models. A neural network is then trained with an adversarial loss to discriminate between images from distorted and clean domains. However, these methods yield suboptimal performance when tested on real images that do not necessarily conform to the generation mechanism. Also, they require a large amount of training data and are rendered unsuitable when only a few images are available. We propose a distortion disentanglement and knowledge distillation (KD) framework for satellite image restoration to address these important issues. Our algorithm requires only two images: the distorted satellite image to be restored and a reference image with similar semantics. Specifically, we first propose a mechanism to disentangle distortion. This enables us to generate images with varying degrees of distortion using the disentangled distortion and the reference image. We then propose the use of KD to train a restoration network using the generated image pairs. As a final step, the distorted image is passed through the restoration network to get the final output. Ablation studies show that our proposed mechanism successfully disentangles distortion. Exhaustive experiments on different time stamps of Google-Earth images and publicly available datasets, LEVIR-CD and SZTAKI, show that our proposed mechanism can tackle a variety of distortions and outperforms existing state-of-the-art restoration methods visually as well as on quantitative metrics.
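    The pipeline above is described only at a high level, so the following is a minimal PyTorch sketch of just the knowledge-distillation step: a restoration student supervised both by the reconstruction target and by intermediate features of a teacher that sees the clean reference. Every name here (RestorationNet, kd_step, the 64-channel width, the loss weight alpha) is hypothetical, not the paper's implementation.

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RestorationNet(nn.Module):
        """Hypothetical encoder-decoder restoration network."""
        def __init__(self, ch=64):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True))
            self.decoder = nn.Conv2d(ch, 3, 3, padding=1)

        def forward(self, x):
            feat = self.encoder(x)          # intermediate features for distillation
            return self.decoder(feat), feat

    def kd_step(student, teacher, distorted, clean, alpha=0.1):
        # Teacher processes the clean image; its features act as soft targets.
        with torch.no_grad():
            _, t_feat = teacher(clean)
        restored, s_feat = student(distorted)
        return F.l1_loss(restored, clean) + alpha * F.mse_loss(s_feat, t_feat)
    ```

    In the paper's setting the training pairs would come from re-applying the disentangled distortion to the reference image at varying strengths; here they are simply assumed to exist.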
  • Publication
    NTIRE 2021 depth guided image relighting challenge
    (01-06-2021)
    El Helou, Majed; Zhou, Ruofan; Süsstrunk, Sabine; Timofte, Radu; Suin, Maitreya; ; Wang, Yuanzhi; Lu, Tao; Zhang, Yanduo; Wu, Yuntao; Yang, Hao Hsiang; Chen, Wei Ting; Kuo, Sy Yen; Luo, Hao Lun; Zhang, Zhiguang; Luo, Zhipeng; He, Jianye; Zhu, Zuo Liang; Li, Zhen; Qiu, Jia Xiong; Kuang, Zeng Sheng; Lu, Cheng Ze; Cheng, Ming Ming; Shao, Xiu Li; Li, Chenghua; Ding, Bosong; Qian, Wanli; Li, Fangya; Li, Fu; Deng, Ruifeng; Lin, Tianwei; Liu, Songhua; Li, Xin; He, Dongliang; Yazdani, Amirsaeed; Guo, Tiantong; Monga, Vishal; Nsampi, Ntumba Elie; Hu, Zhongyun; Wang, Qing; Nathan, Sabari; Kansal, Priya; Zhao, Tongtong; Zhao, Shanshan
    Image relighting is attracting increasing interest due to its various applications. From a research perspective, image relighting can be exploited both for image normalization for domain adaptation and for data augmentation. It also has multiple direct uses for photo montage and aesthetic enhancement. In this paper, we review the NTIRE 2021 depth guided image relighting challenge. We rely on the VIDIT dataset for each of our two challenge tracks, including depth information. The first track is on one-to-one relighting, where the goal is to transform the illumination setup of an input image (color temperature and light source position) to the target illumination setup. In the second track, the any-to-any relighting challenge, the objective is to transform the illumination settings of the input image to match those of another guide image, similar to style transfer. In both tracks, participants were given depth information about the captured scenes. We had nearly 250 registered participants, leading to 18 confirmed team submissions in the final competition stage. The competitions, methods, and final results are presented in this paper.
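    As the paper is a challenge review, no single architecture is prescribed; purely to illustrate the track-1 interface (RGB plus depth in, relit RGB out, with the target illumination fixed by the challenge), a hypothetical baseline might look like the sketch below. The class name, channel width, and layer count are all made up.

    ```python
    import torch
    import torch.nn as nn

    class DepthGuidedRelighter(nn.Module):
        """Hypothetical one-to-one relighting net: RGB + depth in, relit RGB out.
        The track fixes the target color temperature and light position, so no
        conditioning input is needed here."""
        def __init__(self, ch=32):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(4, ch, 3, padding=1), nn.ReLU(inplace=True),  # 3 RGB + 1 depth channel
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(ch, 3, 3, padding=1))

        def forward(self, rgb, depth):
            return self.net(torch.cat([rgb, depth], dim=1))

    relit = DepthGuidedRelighter()(torch.rand(1, 3, 256, 256), torch.rand(1, 1, 256, 256))
    ```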
  • Publication
    Mixed-dense connection networks for image and video super-resolution
    (20-07-2020)
    Purohit, Kuldeep; Mandal, Srimanta;
    Efficiency of gradient propagation in intermediate layers of convolutional neural networks is of key importance for the super-resolution task. To this end, we propose a deep architecture for single image super-resolution (SISR), which is built using efficient convolutional units we refer to as mixed-dense connection blocks (MDCB). The design of MDCB combines the strengths of both residual and dense connection strategies, while overcoming their limitations. To enable super-resolution for multiple factors, we propose a scale-recurrent framework which reutilizes the filters learnt for lower scale factors recursively for higher factors. This leads to improved performance and promotes parametric efficiency for higher factors. We train two versions of our network to enhance complementary image qualities using different loss configurations. We further employ our network for the video super-resolution task, where it learns to aggregate information from multiple frames and maintain spatio-temporal consistency. The proposed networks lead to qualitative and quantitative improvements over state-of-the-art techniques on image and video super-resolution benchmarks.
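    The abstract does not spell out the block layout, but a plausible minimal reading of a mixed-dense connection block (dense concatenation inside the block plus a residual skip around it) can be sketched in PyTorch as follows. The growth rate, depth, and 1x1 fusion layer are illustrative assumptions, not the published MDCB configuration.

    ```python
    import torch
    import torch.nn as nn

    class MixedDenseBlock(nn.Module):
        """Sketch: dense connections inside the block, residual skip around it."""
        def __init__(self, ch=64, growth=32, layers=4):
            super().__init__()
            self.layers = nn.ModuleList()
            c = ch
            for _ in range(layers):
                self.layers.append(nn.Sequential(
                    nn.Conv2d(c, growth, 3, padding=1), nn.ReLU(inplace=True)))
                c += growth                     # dense: each layer sees all earlier outputs
            self.fuse = nn.Conv2d(c, ch, 1)     # compress back to the block width

        def forward(self, x):
            feats = [x]
            for layer in self.layers:
                feats.append(layer(torch.cat(feats, dim=1)))
            return x + self.fuse(torch.cat(feats, dim=1))   # residual skip
    ```

    Stacking such blocks gives the dense feature reuse of DenseNet-style designs while the residual path keeps gradients flowing, which is the combination the abstract refers to.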
  • Publication
    Gated Spatio-Temporal Attention-Guided Video Deblurring
    (01-01-2021)
    Suin, Maitreya;
    Video deblurring remains a challenging task due to the complexity of spatially and temporally varying blur. Most of the existing works depend on implicit or explicit alignment for temporal information fusion, which either increases the computational cost or results in suboptimal performance due to misalignment. In this work, we investigate two key factors responsible for deblurring quality: how to fuse spatio-temporal information and from where to collect it. We propose a factorized gated spatio-temporal attention module to perform non-local operations across space and time to fully utilize the available information without depending on alignment. First, we perform spatial aggregation followed by a temporal aggregation step. Next, we adaptively distribute the global spatio-temporal information to each pixel. It shows superior performance compared to existing non-local fusion techniques while being considerably more efficient. To complement the attention module, we propose a reinforcement learning-based framework for selecting keyframes from the neighborhood with the most complementary and useful information. Moreover, our adaptive approach can increase or decrease the frame usage at inference time, depending on the user's need. Extensive experiments on multiple datasets demonstrate the superiority of our method.
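    A rough sketch of the factorized idea, spatial aggregation followed by temporal aggregation and a per-pixel gate, is given below. It is a plausible reading of the abstract rather than the authors' module; the pooling choices, gating layer, and reference-frame selection are assumptions.

    ```python
    import torch
    import torch.nn as nn

    class FactorizedSTAttention(nn.Module):
        """Sketch: pool over space, attend over time, then gate the per-pixel
        distribution of the aggregated context back onto the reference frame."""
        def __init__(self, ch=64):
            super().__init__()
            self.to_score = nn.Linear(ch, 1)       # temporal attention scores
            self.gate = nn.Conv2d(2 * ch, ch, 1)   # per-pixel gating

        def forward(self, feats):                  # feats: (B, T, C, H, W)
            b, t, c, h, w = feats.shape
            pooled = feats.mean(dim=(3, 4))               # spatial aggregation -> (B, T, C)
            attn = self.to_score(pooled).softmax(dim=1)   # (B, T, 1)
            global_ctx = (pooled * attn).sum(dim=1)       # temporal aggregation -> (B, C)
            ctx = global_ctx[:, :, None, None].expand(b, c, h, w)
            center = feats[:, t // 2]                     # reference-frame features
            g = torch.sigmoid(self.gate(torch.cat([center, ctx], dim=1)))
            return center + g * ctx                       # adaptive distribution per pixel
    ```

    Factorizing the non-local operation this way is what keeps the cost linear in the number of frames instead of quadratic, which is the efficiency argument the abstract makes.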
  • Publication
    Exploring the Effectiveness of Mask-Guided Feature Modulation as a Mechanism for Localized Style Editing of Real Images
    (27-06-2023)
    Tomar, Snehal Singh; Suin, Maitreya;
    The success of Deep Generative Models at high-resolution image generation has led to their extensive utilization for style editing of real images. Most existing methods work on the principle of inverting real images onto their latent space, followed by determining controllable directions. Both inversion of real images and determination of controllable latent directions are computationally expensive operations. Moreover, the determination of controllable latent directions requires additional human supervision. This work explores the efficacy of mask-guided feature modulation in the latent space of a Deep Generative Model as a solution to these bottlenecks. To this end, we present the SemanticStyle Autoencoder (SSAE), a deep Generative Autoencoder model that leverages semantic mask-guided latent space manipulation for highly localized photorealistic style editing of real images. We present qualitative and quantitative results along with their analysis. This work shall serve as a guiding primer for future work.
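    A minimal sketch of what mask-guided feature modulation could look like in a generator's latent feature space follows; the per-channel scale/shift parameterization and all names are assumptions rather than SSAE's actual design.

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MaskGuidedModulation(nn.Module):
        """Sketch: predict per-channel scale/shift from a style code and apply
        them only where the semantic mask is active, leaving the rest untouched."""
        def __init__(self, ch=512, style_dim=64):
            super().__init__()
            self.to_scale = nn.Linear(style_dim, ch)
            self.to_shift = nn.Linear(style_dim, ch)

        def forward(self, feat, style, mask):   # feat: (B,C,H,W), mask: (B,1,h,w) in [0,1]
            mask = F.interpolate(mask, size=feat.shape[-2:], mode='nearest')
            scale = self.to_scale(style)[:, :, None, None]
            shift = self.to_shift(style)[:, :, None, None]
            edited = feat * (1 + scale) + shift
            return mask * edited + (1 - mask) * feat   # localized edit
    ```

    Because the edit happens directly on autoencoder features, no GAN inversion or latent-direction search is needed, which is the bottleneck the abstract targets.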
  • Publication
    Distillation-guided Image Inpainting
    (01-01-2021)
    Suin, Maitreya; Purohit, Kuldeep;
    Image inpainting methods have shown significant improvements in recent years by using deep neural networks. However, many of these techniques often create distorted structures or blurry inconsistent textures. The problem is rooted in the encoder layers' ineffectiveness in building a complete and faithful embedding of the missing regions from scratch. Existing solutions like coarse-to-fine, progressive refinement, structural guidance, etc. suffer from huge computational overheads owing to multiple generator networks, limited ability of handcrafted features, and sub-optimal utilization of the information present in the ground truth. We propose a distillation-based approach for inpainting, where we provide direct feature-level supervision while training. We deploy cross- and self-distillation techniques and design a dedicated completion block in the encoder to produce a more accurate encoding of the holes. Next, we demonstrate how an inpainting network's attention module can be improved by leveraging a distillation-based attention transfer technique, and further enhance coherence by using a pixel-adaptive global-local feature fusion. We conduct extensive evaluations on multiple datasets to validate our method. Along with achieving significant improvements over previous SOTA methods, the proposed approach's effectiveness is also demonstrated through its ability to improve existing inpainting works.
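    The abstract's "direct feature-level supervision" can be illustrated with a hypothetical distillation loss in which encoder features of the masked input are matched to features that a frozen encoder extracts from the ground-truth image. The hole weighting and names below are illustrative only, not the paper's API.

    ```python
    import torch.nn.functional as F

    def inpainting_distill_loss(student_feats, teacher_feats, mask, beta=0.5):
        """Sketch of feature-level supervision for inpainting.
        student_feats: features from the inpainting encoder (masked input).
        teacher_feats: features from a frozen encoder run on the ground truth.
        mask: 1 inside holes, 0 elsewhere; resized to each feature map."""
        loss = 0.0
        for s, t in zip(student_feats, teacher_feats):
            m = F.interpolate(mask, size=s.shape[-2:], mode='nearest')
            loss = loss + F.l1_loss(s * m, t.detach() * m) \
                        + beta * F.l1_loss(s * (1 - m), t.detach() * (1 - m))
        return loss
    ```

    Weighting hole regions more strongly (beta < 1) reflects the paper's motivation: the encoder's weakest point is embedding the missing regions, so that is where the ground-truth features help most.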
  • Publication
    AIM 2020 Challenge on Efficient Super-Resolution: Methods and Results
    (01-01-2020)
    Zhang, Kai; Danelljan, Martin; Li, Yawei; Timofte, Radu; Liu, Jie; Tang, Jie; Wu, Gangshan; Zhu, Yu; He, Xiangyu; Xu, Wenjie; Li, Chenghua; Leng, Cong; Cheng, Jian; Wu, Guangyang; Wang, Wenyi; Liu, Xiaohong; Zhao, Hengyuan; Kong, Xiangtao; He, Jingwen; Qiao, Yu; Dong, Chao; Luo, Xiaotong; Chen, Liang; Zhang, Jiangtao; Suin, Maitreya; Purohit, Kuldeep; ; Li, Xiaochuan; Lang, Zhiqiang; Nie, Jiangtao; Wei, Wei; Zhang, Lei; Muqeet, Abdul; Hwang, Jiwon; Yang, Subin; Kang, Jung Heum; Bae, Sung Ho; Kim, Yongwoo; Qu, Yanyun; Jeon, Geun Woo; Choi, Jun Ho; Kim, Jun Hyuk; Lee, Jong Seok; Marty, Steven; Marty, Eric; Xiong, Dongliang; Chen, Siang; Zha, Lin; Jiang, Jiande; Gao, Xinbo; Lu, Wen; Wang, Haicheng; Bhaskara, Vineeth; Levinshtein, Alex; Tsogkas, Stavros; Jepson, Allan; Kong, Xiangzhen; Zhao, Tongtong; Zhao, Shanshan; Hrishikesh, P. S.; Puthussery, Densen; Jiji, C. V.; Nan, Nan; Liu, Shuai; Cai, Jie; Meng, Zibo; Ding, Jiaming; Ho, Chiu Man; Wang, Xuehui; Yan, Qiong; Zhao, Yuzhi; Chen, Long; Sun, Long; Wang, Wenhao; Liu, Zhenbing; Lan, Rushi; Umer, Rao Muhammad; Micheloni, Christian
    This paper reviews the AIM 2020 challenge on efficient single image super-resolution with a focus on the proposed solutions and results. The challenge task was to super-resolve an input image with a magnification factor of ×4 based on a set of prior examples of low and corresponding high resolution images. The goal is to devise a network that reduces one or several aspects such as runtime, parameter count, FLOPs, activations, and memory consumption while at least maintaining the PSNR of MSRResNet. The track had 150 registered participants, and 25 teams submitted final results, which gauge the state-of-the-art in efficient single image super-resolution.
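    The challenge criteria (runtime, parameter count, FLOPs, activations, memory, PSNR) suggest a simple way to sanity-check a candidate model; the sketch below measures the two axes that need no extra tooling. It is a generic PyTorch snippet, not the challenge's official scoring code; FLOPs, activations, and memory would need a profiler such as fvcore or ptflops.

    ```python
    import time
    import torch

    def profile(model, size=(1, 3, 256, 256), runs=20):
        """Return (parameter count, mean forward time in seconds)."""
        params = sum(p.numel() for p in model.parameters())
        x = torch.rand(size)
        model.eval()
        with torch.no_grad():
            model(x)                       # warm-up pass
            start = time.perf_counter()
            for _ in range(runs):
                model(x)
        return params, (time.perf_counter() - start) / runs
    ```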
  • Publication
    Region-Adaptive dense network for efficient motion deblurring
    (01-01-2020)
    Purohit, Kuldeep;
    In this paper, we address the problem of dynamic scene deblurring in the presence of motion blur. Restoration of images affected by severe blur necessitates a network design with a large receptive field, which existing networks attempt to achieve by simply increasing the number of generic convolution layers, the kernel size, or the scales at which the image is processed. However, these techniques ignore the non-uniform nature of blur and come at the expense of an increase in model size and inference time. We present a new architecture composed of region-adaptive dense deformable modules that implicitly discover the spatially varying shifts responsible for non-uniform blur in the input image and learn to modulate the filters. This capability is complemented by a self-attentive module which captures non-local spatial relationships among the intermediate features and enhances the spatially varying processing capability. We incorporate these modules into a densely connected encoder-decoder design which utilizes pre-trained DenseNet filters to further improve the performance. Our network facilitates interpretable modeling of the spatially varying deblurring process while dispensing with multi-scale processing and large filters entirely. Extensive comparisons with prior art on benchmark dynamic scene deblurring datasets clearly demonstrate the superiority of the proposed networks via significant improvements in accuracy and speed, enabling almost real-time deblurring.
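    One plausible way to realize "deformable modules that discover spatially varying shifts" is to predict per-pixel offsets from the features themselves and feed them to torchvision's deformable convolution, as sketched below; the surrounding module structure is an assumption, not the paper's published design.

    ```python
    import torch
    import torch.nn as nn
    from torchvision.ops import DeformConv2d

    class RegionAdaptiveModule(nn.Module):
        """Sketch: predict (dx, dy) offsets per pixel and kernel tap so the
        effective receptive field adapts to the local blur."""
        def __init__(self, ch=64):
            super().__init__()
            self.offset = nn.Conv2d(ch, 2 * 3 * 3, 3, padding=1)  # 2 offsets per 3x3 tap
            self.deform = DeformConv2d(ch, ch, 3, padding=1)

        def forward(self, x):
            return self.deform(x, self.offset(x))

    y = RegionAdaptiveModule()(torch.rand(1, 64, 32, 32))
    ```

    Because the offsets are predicted per location, a single layer can cover large shifts in heavily blurred regions while staying compact elsewhere, which is how such a design avoids multi-scale processing and large filters.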
  • Publication
    Spatially-attentive patch-hierarchical network for adaptive motion deblurring
    (01-01-2020)
    Suin, Maitreya; Purohit, Kuldeep;
    This paper tackles the problem of motion deblurring of dynamic scenes. Although end-to-end fully convolutional designs have recently advanced the state-of-the-art in non-uniform motion deblurring, their performance-complexity trade-off is still sub-optimal. Existing approaches achieve a large receptive field by increasing the number of generic convolution layers and the kernel size, but this comes at the expense of an increase in model size and inference time. In this work, we propose an efficient pixel-adaptive and feature-attentive design for handling large blur variations across different spatial locations, processing each test image adaptively. We also propose an effective content-aware global-local filtering module that significantly improves performance by considering not only global dependencies but also by dynamically exploiting neighboring pixel information. We use a patch-hierarchical attentive architecture composed of the above modules, which implicitly discovers the spatial variations in the blur present in the input image and, in turn, performs local and global modulation of intermediate features. Extensive qualitative and quantitative comparisons with prior art on deblurring benchmarks demonstrate that our design offers significant improvements over the state-of-the-art in accuracy as well as speed.
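    The content-aware global-local filtering idea can be sketched as a learned per-pixel gate between a local convolution branch and a global context branch, as below. This is an interpretation of the abstract; the actual module is more elaborate, and all names are illustrative.

    ```python
    import torch
    import torch.nn as nn

    class GlobalLocalFusion(nn.Module):
        """Sketch: a local branch keeps neighborhood detail, a global branch
        summarizes the whole image, and a learned per-pixel gate mixes them."""
        def __init__(self, ch=64):
            super().__init__()
            self.local = nn.Conv2d(ch, ch, 3, padding=1)
            self.global_fc = nn.Linear(ch, ch)
            self.gate = nn.Conv2d(2 * ch, 1, 1)

        def forward(self, x):                        # x: (B, C, H, W)
            local = self.local(x)
            g = self.global_fc(x.mean(dim=(2, 3)))   # global context vector
            g = g[:, :, None, None].expand_as(x)
            a = torch.sigmoid(self.gate(torch.cat([local, g], dim=1)))
            return a * g + (1 - a) * local           # pixel-adaptive mix
    ```

    The gate is computed from the content at each pixel, so heavily blurred regions can lean on global context while sharp regions keep their local detail, which is the adaptivity the abstract argues for.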