  • Publication
    Face Age Progression with Attribute Manipulation
    (01-01-2022)
    Tatikonda, Sinzith; Nambiar, Athira
    The human face is one of the predominant means of person recognition. Human faces are affected by many factors, such as time, attributes, weather, and other subject-specific variations. Although face aging has been studied in the past, the impact of these factors, especially the effect of attributes on the aging process, has remained unexplored. In this paper, we propose a novel holistic “Face Age progression With Attribute Manipulation” (FAWAM) model that generates face images at different ages while simultaneously varying attributes and other subject-specific characteristics. We address the task in a bottom-up manner, considering both age and attribute submodules. For face aging, we use an attribute-conscious face aging model with a pyramidal generative adversarial network that can model age-specific facial changes while maintaining intrinsic subject-specific characteristics. For facial attribute manipulation, the age-processed facial image is manipulated with the desired attributes while leaving other details unchanged, leveraging an attribute generative adversarial network architecture. Our proposed model achieves significant qualitative and quantitative results.
  • Publication
    Adaptive locally affine-invariant shape matching
    (01-05-2018)
    Marvaniya, Smit; Gupta, Raj
    Matching deformable objects using their shapes is an important problem in computer vision since shape is perhaps the most distinguishable characteristic of an object. The problem is difficult due to many factors such as intra-class variations, local deformations, articulations, viewpoint changes, and missed and extraneous contour portions due to errors in shape extraction. While small local deformations have been handled in the literature by allowing some leeway in the matching of individual contour points via methods such as Chamfer distance and Hausdorff distance, more severe deformations and articulations have been handled by applying local geometric corrections such as similarity or affine transformations. However, determining which portions of the shape should be used for the geometric corrections is very hard, although some methods have been tried. In this paper, we address this problem with an efficient dynamic-programming search for the group of contour segments to be clustered together for a geometric correction, essentially searching for the segmentations of two shapes that lead to the best matching between them. At the same time, we allow portions of the contours to remain unmatched to handle missing and extraneous contour portions. Experiments indicate that our method outperforms other algorithms, especially when the shapes to be matched are more complex.
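    The Chamfer and Hausdorff distances mentioned in this abstract as tolerant point-matching measures can be sketched as follows. This is only an illustration on small 2D point sets, not the paper's matching method; the function names and the sample contours are assumptions made for the example.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def directed_chamfer(a: List[Point], b: List[Point]) -> float:
    """Mean distance from each point of `a` to its nearest point in `b`."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)


def directed_hausdorff(a: List[Point], b: List[Point]) -> float:
    """Largest distance from a point of `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a: List[Point], b: List[Point]) -> float:
    """Symmetric Hausdorff distance: the worse of the two directions."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical contours: the last point of `contour_b` is displaced.
contour_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
contour_b = [(0.0, 0.0), (1.0, 0.0), (2.0, 3.0)]

chamfer_ab = directed_chamfer(contour_a, contour_b)
hausdorff_ab = hausdorff(contour_a, contour_b)
```

    The averaged Chamfer score is only mildly affected by the single displaced point, whereas the Hausdorff score is set entirely by it, which is one reason such measures tolerate small deformations but not severe ones.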
  • Publication
    SMD: A locally stable monotonic change invariant feature descriptor
    (01-01-2008)
    Gupta, Raj
    Extraction and matching of discriminative feature points in images is an important problem in computer vision with applications in image classification, object recognition, mosaicing, automatic 3D reconstruction and stereo. Features are represented and matched via descriptors that must be invariant to small errors in the localization and scale of the extracted feature point, viewpoint changes, and other kinds of changes such as illumination, image compression and blur. While currently used feature descriptors are able to deal with many such changes, they are not invariant to a generic monotonic change in the intensities, which occurs in many cases. Furthermore, their performance degrades rapidly under image degradations such as blur and compression, where the intensity transformation is non-linear. In this paper, we present a new feature descriptor that obtains invariance to a monotonic change in the intensity of the patch by looking at orders between certain pixels in the patch. An order change between pixels indicates a difference between the patches, which is penalized. Summation of such penalties over carefully chosen pixel pairs that are stable to small errors in their localization and are independent of each other leads to a robust measure of change between two features. Promising results were obtained using this approach that show significant improvement over existing methods, especially in the case of illumination change, blur and JPEG compression, where the intensity of the points changes from one image to the next. © 2008 Springer Berlin Heidelberg.
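    The idea of penalizing order changes between pixel pairs can be sketched as below. This is a minimal illustration of why comparing orders is invariant to a monotonically increasing intensity change, not the SMD descriptor itself; the pixel pairs, the flattened patch layout and the function names are hypothetical, and the abstract's careful selection of stable, independent pairs is not modeled.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]


def order_signature(patch: Sequence[float], pairs: Sequence[Pair]) -> List[bool]:
    """For each pixel pair (i, j), record whether patch[i] < patch[j]."""
    return [patch[i] < patch[j] for i, j in pairs]


def order_change_penalty(
    patch_a: Sequence[float], patch_b: Sequence[float], pairs: Sequence[Pair]
) -> int:
    """Count the pairs whose intensity order differs between the two patches."""
    sig_a = order_signature(patch_a, pairs)
    sig_b = order_signature(patch_b, pairs)
    return sum(1 for a, b in zip(sig_a, sig_b) if a != b)


# Hypothetical flattened patch and hand-picked pixel pairs.
patch = [10.0, 40.0, 25.0, 90.0, 5.0, 60.0]
pairs = [(0, 1), (2, 3), (4, 5), (1, 4)]

# A non-linear but strictly increasing intensity change keeps every order.
brightened = [20.0 * (v ** 0.5) + 3.0 for v in patch]
# A genuinely different patch swaps the intensities within some pairs.
different = [40.0, 10.0, 25.0, 90.0, 60.0, 5.0]

penalty_same = order_change_penalty(patch, brightened, pairs)
penalty_diff = order_change_penalty(patch, different, pairs)
```

    Here the penalty stays at zero under the monotonic change and grows only when the relative order of paired pixels actually flips.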
  • Publication
    Refining high-frequencies for sharper super-resolution and deblurring
    (01-10-2020)
    Singh, Vikram; Ramnath, Keerthan
    A sub-problem of paramount importance in super-resolution is the generation of an upsampled image (or frame) that is ‘sharp’. In deblurring, the core problem is itself the removal of blur, which is equivalent to generating a ‘sharper’ version of the given image. This sharpness in the generated image comes from accurately predicting the high-frequency details (commonly referred to as fine details) such as object edges. Thus high-frequency prediction is a vital sub-problem in super-resolution and a core problem in deblurring. To generate a sharp upsampled or deblurred image, this paper proposes a multi-stage neural network architecture, ‘HFR-Net’, that works on the principle of ‘explicit refinement and fusion of high-frequency details’. To implement this principle, HFR-Net is trained with a novel 2-phase progressive–retrogressive training method. In addition to the training method, this paper also introduces dual motion warping with attention, a technique specifically designed to handle videos that have different rates of motion. Results obtained from extensive experiments on multiple super-resolution and deblurring datasets reveal that the proposed approach gives better results than the current state-of-the-art techniques.
  • Publication
    Feature ensemble networks with re-ranking for recognizing disguised faces in the wild
    (01-10-2019)
    Subramaniam, Arulkumar; Sridhar, Ajay Narayanan
    Recognizing a person's face images with intentional or unintentional disguising effects such as make-up, plastic surgery and artificial wearables (hats, eye-glasses) is a challenging task. We propose a Feature EnsemBle Network (FEBNet) for recognizing Disguised Faces in the Wild (DFW). FEBNet encompasses multiple base networks (SE-ResNet50, Inception-ResNet-V1) pretrained on large-scale face recognition datasets (MS-Celeb-1M, VGGFace2) and fine-tuned on the DFW training dataset. During the fine-tuning phase, we propose to use two novel objective functions, namely 1) Category loss and 2) Impersonator Triplet loss, along with two prevalent objective functions: Identity loss and Inter-person Triplet loss. To further improve the performance, we apply a state-of-the-art re-ranking strategy as a post-processing step. Extensive ablation studies and evaluation results show that FEBNet significantly outperforms the baseline models.
  • Publication
    Non-linear Motion Estimation for Video Frame Interpolation using Space-time Convolutions
    (01-01-2022)
    Dutta, Saikat; Subramaniam, Arulkumar
    Video frame interpolation aims to synthesize one or multiple frames between two consecutive frames in a video. It has a wide range of applications including slow-motion video generation, video compression and developing video codecs. Some older works tackled this problem by assuming per-pixel linear motion between video frames. However, objects often follow a non-linear motion pattern in the real domain, and some recent methods attempt to model per-pixel motion by non-linear models (e.g., quadratic). A quadratic model can also be inaccurate, especially in the case of motion discontinuities over time (i.e., sudden jerks) and occlusions, where some of the flow information may be invalid or inaccurate. In our paper, we propose to approximate the per-pixel motion using a space-time convolution network that is able to adaptively select the motion model to be used. Specifically, we are able to softly switch between a linear and a quadratic model. Towards this end, we use an end-to-end 3D CNN encoder-decoder architecture over bidirectional optical flows and occlusion maps to estimate the non-linear motion model of each pixel. Further, a motion refinement module is employed to refine the non-linear motion, and the interpolated frames are estimated by a simple warping of the neighboring frames with the estimated per-pixel motion. We show that our method outperforms state-of-the-art algorithms on four datasets.
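    One way to picture a soft switch between a linear and a quadratic motion model is as a per-pixel blend of the two predictions. The sketch below assumes a constant-velocity linear model, a constant-acceleration quadratic model built from the forward and backward flows, and a blend weight `alpha` in [0, 1]; in the paper such a choice would come from the network, whereas here `alpha`, the function names and the sample flow values are all hypothetical.

```python
def linear_flow(f_fwd: float, t: float) -> float:
    """Constant-velocity model: displacement from frame 0 to time t."""
    return t * f_fwd


def quadratic_flow(f_fwd: float, f_bwd: float, t: float) -> float:
    """Constant-acceleration model from flows 0->1 (f_fwd) and 0->-1 (f_bwd)."""
    velocity = 0.5 * (f_fwd - f_bwd)
    acceleration = f_fwd + f_bwd
    return velocity * t + 0.5 * acceleration * t * t


def blended_flow(f_fwd: float, f_bwd: float, t: float, alpha: float) -> float:
    """Soft switch: alpha=0 gives the linear model, alpha=1 the quadratic one."""
    lin = linear_flow(f_fwd, t)
    quad = quadratic_flow(f_fwd, f_bwd, t)
    return (1.0 - alpha) * lin + alpha * quad


# Hypothetical flow component for one pixel: it moved 2 units over the
# previous interval and 4 units over the next one, so it is accelerating.
f_fwd, f_bwd = 4.0, -2.0

flow_linear = blended_flow(f_fwd, f_bwd, t=0.5, alpha=0.0)
flow_quadratic = blended_flow(f_fwd, f_bwd, t=0.5, alpha=1.0)
flow_mixed = blended_flow(f_fwd, f_bwd, t=0.5, alpha=0.5)
```

    The quadratic model reproduces both observed flows at t = 1 and t = -1, while a smaller `alpha` falls back towards the linear prediction where the backward flow is unreliable, for example near occlusions.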
  • Publication
    Data augmentation using part analysis for shape classification
    (04-03-2019)
    Patel, Vismay; Mujumdar, Niranjan; Balasubramanian, Prashanth; Marvaniya, Smit
    Deep Convolutional Neural Networks have shown drastic improvements in the performance of various Computer Vision tasks. However, shape classification is a problem that has not seen state-of-the-art results using CNNs. This is due to the lack of large amounts of data from which to learn to handle the multiple variations present in shapes, such as noise, pose variations, part articulations and affine deformations. In this paper, we introduce a new technique for augmenting 2D shape data that uses part articulations. It utilizes a novel articulation cut detection method to determine putative shape parts. Standard off-the-shelf CNN models trained with our novel data augmentation technique on standard 2D shape datasets yielded significant improvements over the state-of-the-art in most experiments, and our data augmentation approach has the potential to be extended to other problems such as Image Classification and Object Detection.
  • Publication
    Real-time upper-body human pose estimation using a depth camera
    (28-10-2011)
    Jain, Himanshu Prakash; Subramanian, Anbumani
    Automatic detection and pose estimation of humans is an important task in Human-Computer Interaction (HCI), user interaction and event analysis. This paper presents a model-based approach for detecting and estimating human pose by fusing depth and RGB color data from a monocular view. The proposed system uses Haar cascade based detection and template matching to track the most reliably detectable parts, namely the head and torso. A stick figure model is used to represent the detected body parts. The fitting is then performed independently for each limb, using the weighted distance transform map. Fitting each limb independently speeds up the fitting process and makes it robust, avoiding the combinatorial complexity problems that are common with these types of methods. The output is a stick figure model consistent with the pose of the person in the given input image. The algorithm works in real-time, is fully automatic and can detect multiple non-intersecting people. © 2011 Springer-Verlag Berlin Heidelberg.
  • Publication
    Transfer Learning and Few-Shot Learning Based Deep Neural Network Models for Underwater Sonar Image Classification With a Few Samples
    (01-01-2023)
    Chungath, Tincy Thomas; Nambiar, Athira M.
    Acoustic imaging sonar systems are widely used for long-range underwater surveillance in various civilian and military applications. They provide 2-D images of underwater objects, even in turbid water conditions where optical underwater imaging systems fail. Achieving high accuracy in automatic deep-learning-based underwater image classification remains an open problem due to insufficient data availability, poor image resolution, low signal-to-noise ratio surroundings, etc. In this study, we conduct a comparative analysis of different advanced deep learning approaches, i.e., transfer learning and few-shot learning, to address the problem of automatic object classification in sonar images using a few samples of data. Specifically, two metric learning-based approaches, i.e., the Siamese network and the triplet network, as well as library-based approaches, are studied under the few-shot learning paradigm. Extensive experiments are conducted on a novel custom-made dataset developed in-house, along with the publicly available SeabedObjectsKLSG dataset. In addition, the effectiveness of the sampling technique in handling class imbalance during model training is also investigated in this work. Our experimental results highlight that the few-shot learning based approach is a promising direction for future research on underwater image classification with a few samples.
  • Publication
    Domain Adaptive Knowledge Distillation for Driving Scene Semantic Segmentation
    (01-01-2021)
    Kothandaraman, Divya; Nambiar, Athira
    Practical autonomous driving systems face two crucial challenges: memory constraints and domain gap issues. In this paper, we present a novel approach to learn domain adaptive knowledge in models with limited memory, thus giving the model the ability to deal with these issues in a comprehensive manner. We term this 'Domain Adaptive Knowledge Distillation' and address it in the context of unsupervised domain-adaptive semantic segmentation by proposing a multi-level distillation strategy to effectively distil knowledge at different levels. Further, we introduce a novel cross entropy loss that leverages pseudo labels from the teacher. These pseudo teacher labels play a multifaceted role: (i) knowledge distillation from the teacher network to the student network, and (ii) serving as a proxy for the ground truth for target domain images, where the problem is completely unsupervised. We introduce four paradigms for distilling domain adaptive knowledge and carry out extensive experiments and ablation studies on real-to-real as well as synthetic-to-real scenarios. Our experiments demonstrate the profound success of our proposed method.
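    A cross entropy loss driven by teacher pseudo labels can be pictured roughly as follows. This is a generic sketch, assuming hard pseudo labels taken as the argmax of the teacher's per-pixel logits and an unweighted mean over pixels; it is not the loss defined in the paper, and the function names and logit values are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one pixel's class logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label(teacher_logits: Sequence[float]) -> int:
    """Hard pseudo label: the class the teacher scores highest."""
    return max(range(len(teacher_logits)), key=lambda k: teacher_logits[k])


def pseudo_label_cross_entropy(
    student_logits: Sequence[Sequence[float]],
    teacher_logits: Sequence[Sequence[float]],
) -> float:
    """Mean cross entropy of student predictions against teacher pseudo labels."""
    total = 0.0
    for s, t in zip(student_logits, teacher_logits):
        total -= math.log(softmax(s)[pseudo_label(t)])
    return total / len(student_logits)


# Hypothetical logits for two target-domain pixels and three classes.
teacher = [[2.0, 0.1, -1.0], [0.0, 3.0, 0.5]]
student_agrees = [[3.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
student_disagrees = [[0.0, 3.0, 0.0], [3.0, 0.0, 0.0]]
student_uniform = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

loss_agree = pseudo_label_cross_entropy(student_agrees, teacher)
loss_disagree = pseudo_label_cross_entropy(student_disagrees, teacher)
loss_uniform = pseudo_label_cross_entropy(student_uniform, teacher)
```

    Because no ground truth is needed, a loss of this shape can be evaluated on unlabeled target-domain images, with the teacher's predictions standing in for the missing annotations.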