Mansi Sharma
Preferred name: Mansi Sharma
Official Name: Mansi Sharma
Alternative Name: Sharma, Mansi
Publications (14 results; showing 1-10)
- Publication: A Hierarchical Approach for Lossy Light Field Compression With Multiple Bit Rates Based on Tucker Decomposition via Random Sketching (01-01-2022)
Co-authors: Ravishankar, Joshitha
Recently, there has been extensive progress in developing autostereoscopic platforms for displaying real-world 3D scenes. Light fields are emerging as the preferred choice for computational multi-view autostereoscopic displays because they support direction-dependent outputs simultaneously without sacrificing resolution. We present a novel light field representation, coding, and streaming scheme that efficiently handles large tensor data. Intrinsic redundancies in light field subsets are eliminated through a low-rank representation using Tucker decomposition with tensor sketching for various ranks and sketch dimensions, which makes the representation well suited to streaming and transmission. Beyond removing spatial redundancies, the approximated light field is used to construct a Fourier disparity layers (FDL) representation that further exploits the remaining non-linear, temporal, intra-view, and inter-view correlations among the approximated sub-aperture images. Four scanning (view prediction) patterns are used, and the subsets in each pattern hierarchically construct the FDL representation and synthesize subsequent views. Iterative refinement and HEVC encoding are followed by the final light field reconstruction. The complete end-to-end processing pipeline works flexibly at multiple bitrates and is adaptable to a variety of multi-view autostereoscopic platforms. The compression performance of the proposed scheme is analyzed on real light fields. We achieve substantial bitrate savings compared with state-of-the-art codecs while maintaining good reconstruction quality.
- Publication: A Hybrid Tucker-VQ Tensor Sketch decomposition model for coding and streaming real world light fields using stack of differently focused images (01-07-2022)
Co-authors: Ravishankar, Joshitha; Khaidem, Sally
Computational multi-view displays involving light fields are a fast-emerging choice for the 3D presentation of real-world scenes. Glasses-free tensor autostereoscopic displays use just a few light-attenuating layers in front of a backlight to output a high-quality light field. We propose three novel schemes, Focal Stack - Hybrid Tucker-TensorSketch Vector Quantization (FS-HTTSVQ), Focal Stack - Tucker-TensorSketch (FS-TTS), and Focal Stack - Tucker Alternating Least-Squares (FS-TALS), for efficient representation, streaming, and coding of light fields using a stack of differently focused images. Working with a focal stack instead of the entire light field greatly reduces data acquisition cost as well as computation and processing cost. Extensive experiments with real-world light field focal stacks demonstrate that the proposed one-pass Tucker decomposition using TensorSketch with hybrid vector quantization in FS-HTTSVQ compactly represents the approximated focal stack in codebook form for better transmission and streaming. Encoding with High Efficiency Video Coding (HEVC) then removes the intrinsic redundancies remaining in the approximated focal stack. The resulting low-rank approximated and coded focal stack is used to analytically optimize the layer patterns for the tensor display. The complete end-to-end light field processing pipelines work flexibly at multiple bitrates and are adaptable to a variety of multi-view autostereoscopic platforms. Our schemes perform notably well on focal stacks compared with directly encoding the entire light field using a standard codec such as HEVC.
- Publication: A Novel 3D-Unet Deep Learning Framework Based on High-Dimensional Bilateral Grid for Edge Consistent Single Image Depth Estimation (15-12-2020)
Co-authors: Sharma, Abheesht; Tushar, Kadvekar Rohit; Panneer, Avinash
Predicting smooth, edge-consistent depth maps is notoriously difficult in single image depth estimation. This paper proposes a novel bilateral-grid-based 3D convolutional neural network, dubbed 3DBG-UNet, that parameterizes a high-dimensional feature space by encoding compact 3D bilateral grids with UNets and infers the sharp geometric layout of the scene. Further, another novel model, 3DBGES-UNet, is introduced that integrates 3DBG-UNet to infer an accurate depth map from a single color view. 3DBGES-UNet concatenates the 3DBG-UNet geometry map with an Inception-network edge accentuation map and a spatial object boundary map obtained through semantic segmentation, and trains a UNet model with a ResNet backbone. Both models are designed to explicitly account for edges and fine details. Preserving sharp discontinuities at depth edges is critical for many applications, such as realistic integration of virtual objects in AR video or occlusion-aware view synthesis for 3D displays. The proposed depth prediction network achieves state-of-the-art performance in both qualitative and quantitative evaluations on the challenging NYUv2-Depth dataset. The code and corresponding pre-trained weights will be made publicly available.
- Publication: Virtual reality, robotics, and artificial intelligence: Technological interventions in stroke rehabilitation (23-09-2022)
Co-authors: Kanade, Aditya
Stroke is a leading cause of death in humans. In the US, someone has a stroke every 40 seconds. More than half of stroke patients over the age of 65 have reduced mobility. The prevalence of stroke in society is increasing, yet because stroke requires extensive post-hospitalization care, much of the infrastructure needed to meet the demands of the growing patient population is lacking. In this chapter, the authors look at three technological interventions: machine learning, virtual reality, and robotics. They examine how research in these fields is evolving and pushing toward easier and more reliable methods of rehabilitation. They also highlight methods that show promise for home-based rehabilitation.
- Publication: Latent Factor Modeling of Perceived Quality for Stereoscopic 3D Video Recommendation (01-01-2021)
Co-authors: Appina, Balasubramanyam; Kumar, Santosh; Kara, Peter A.; Simon, Aniko; Guindy, Mary
Numerous stereoscopic 3D movies are released to movie theaters every year, and they generate large revenues. Despite notable improvements in stereo capture and 3D video post-production technologies, stereoscopic artefacts continue to appear even in high-budget films. Existing automatic 3D video quality measurement tools can detect distortions in stereoscopic images and videos, but they fail to determine the viewer's subjective perception of those artefacts and how these distortions affect viewers' choices and overall visual experience. In this paper, we introduce a novel recommendation system for stereoscopic 3D movies based on a latent factor model that analyzes viewers' subjective ratings and the influence of 3D video distortions on their personal preferences. To the best of the authors' knowledge, this is the first model that recommends 3D movies based on quality ratings. It takes into account the correlation between the viewer's visual discomfort and the perception of stereoscopic artefacts. The proposed model is trained and tested on the benchmark Nama3ds1-cospad1 and LFOVIAS3DPh2 S3D video quality assessment datasets. The experiments highlight the practical efficiency and performance of the resulting matrix-factorization-based recommendation system.
- Publication: Tele-EvalNet: A Low-Cost, Teleconsultation System for Home Based Rehabilitation of Stroke Survivors Using Multiscale CNN-ConvLSTM Architecture (01-01-2023)
Co-authors: Kanade, Aditya; Muniyandi, Manivannan
Home-based physical rehabilitation programmes make up a significant portion of all physical rehabilitation programmes. Because clinical supervision is absent during home-based sessions, corrective feedback and movement quality evaluation are of utmost importance. We propose a complete home-based rehabilitation suite consisting of 1) a live-feedback module and 2) a deep-learning-based movement quality assessment model. The live-feedback module provides real-time feedback on a patient’s exercise performance with easy-to-understand color cues. The deep-learning model evaluates overall exercise performance and outputs real-valued movement quality assessment scores. In this paper, we investigate the role of the following components in designing the deep-learning model: 1) clinically guided features, 2) special activation functions, 3) a multi-scale convolutional architecture, and 4) context windows. Compared with current state-of-the-art deep-learning methods for assessing movement quality, we report improved performance on KIMORE, a standard physical rehabilitation dataset with 78 subjects. This improvement comes with a reduction of at least an order of magnitude in the model's number of parameters and inference time, which makes real-time feedback to subjects possible. Finally, an extensive ablation study assesses the effectiveness of each building block in the network.
- Publication: A novel hierarchical light field coding scheme based on hybrid stacked multiplicative layers and Fourier disparity layers for glasses-free 3D displays (01-11-2022)
Co-authors: Ravishankar, Joshitha
We present a novel hierarchical coding scheme for light fields based on transmittance patterns of low-rank multiplicative layers and Fourier disparity layers. The proposed scheme identifies multiplicative layers of light field view subsets, optimized using convolutional neural networks, for different scanning orders. Our approach exploits the hidden low-rank structure in the multiplicative layers obtained from the subsets of different scanning patterns. Spatial redundancies in the multiplicative layers are efficiently removed by low-rank approximation at different ranks on the Krylov subspace. Intra-view and inter-view redundancies between the approximated layers are further removed by HEVC encoding. Next, a Fourier disparity layer representation is constructed from the first subset of the approximated light field according to the chosen hierarchical order. Subsequent view subsets are synthesized by modeling the Fourier disparity layers, which iteratively refine the representation with improved accuracy. The critical advantage of the proposed hybrid layered representation and coding scheme is that it exploits not only spatial and temporal redundancies in light fields but also the intrinsic similarities among neighboring sub-aperture images in both horizontal and vertical directions, as specified by different prediction orders. In addition, the scheme can flexibly realize a range of bitrates at the decoder within a single integrated system. Comparison with state-of-the-art light field coders shows superior compression performance of the proposed scheme for real-world light fields. We achieve substantial bitrate savings while maintaining good light field reconstruction quality.
- Publication: SDE-DualENet: A Novel Dual Efficient Convolutional Neural Network for Robust Stereo Depth Estimation (01-01-2021)
Co-authors: Anil, Rithvik; Choudhary, Rohit
Stereo depth estimation relies on optimal correspondence matching between the pixels of a stereo image pair to infer depth. In this paper, we revisit the stereo depth estimation problem with a simple dual convolutional neural network (CNN) based on EfficientNet that avoids constructing a cost volume for stereo matching. This is achieved by using different weights in otherwise identical towers of the CNN. The proposed algorithm is dubbed SDE-DualENet. Its architecture eliminates cost-volume construction by learning to match correspondences between pixels with a different set of weights in each of the dual towers. Results are demonstrated on complex scenes with fine details and large depth variations. The SDE-DualENet depth prediction network outperforms state-of-the-art monocular and stereo depth estimation methods, both qualitatively and quantitatively, on the challenging Scene Flow dataset. The code and pre-trained models will be made publicly available.
- Publication: A Hierarchical Coding Scheme for Glasses-free 3D Displays Based on Scalable Hybrid Layered Representation of Real-World Light Fields (01-01-2021)
Co-authors: Ravishankar, Joshitha
This paper presents a novel hierarchical coding scheme for light fields based on transmittance patterns of low-rank multiplicative layers and Fourier disparity layers. The proposed scheme learns stacked multiplicative layers from subsets of light field views determined by different scanning orders. The multiplicative layers are optimized using a fast data-driven convolutional neural network (CNN). An essential factor for the multiplicative layer representation, which previous compression approaches have not considered, is the origin of redundancy, i.e., the low-rank structure of light field data. Spatial correlation in layer patterns is exploited at varying low ranks through a factorization derived from singular value decomposition on a Krylov subspace. Encoding with HEVC then efficiently removes intra-view and inter-view correlation in the low-rank approximated layers. The initial subset of approximated decoded views from the multiplicative representation is used to construct a Fourier disparity layer (FDL) representation. The FDL model synthesizes the second subset of views identified by a pre-defined hierarchical prediction order. Correlations in the prediction residue of the synthesized views are further eliminated by encoding the residual signal. The set of views obtained by decoding the residual is used to refine the FDL model and predict the next view subsets with improved accuracy. This hierarchical procedure is repeated until all light field views are encoded. The critical advantage of the proposed hybrid layered representation and coding scheme is that it exploits not only spatial and temporal redundancies but also the strong intrinsic similarities among neighboring sub-aperture images in both horizontal and vertical directions, as specified by different prediction orders. In addition, the scheme can flexibly realize a range of bitrates at the decoder within a single integrated system. Compression performance analyzed on real light fields shows substantial bitrate savings while maintaining good reconstruction quality.
- Publication: MEStereo-Du2CNN: a dual-channel CNN for learning robust depth estimates from multi-exposure stereo images for HDR 3D applications (01-01-2023)
Co-authors: Choudhary, Rohit; Uma, T. V.; Anil, Rithvik
Display technologies have evolved over the years, and practical HDR capture, processing, and display solutions are critical to bringing 3D technologies to the next level. Depth estimation from multi-exposure stereo image sequences is an essential task in developing cost-effective 3D HDR video content. In this paper, we develop a deep architecture for multi-exposure stereo depth estimation. The proposed architecture has two novel components. First, the stereo matching technique used in traditional stereo depth estimation is revamped: for the stereo depth estimation component, a mono-to-stereo transfer learning approach is deployed. The proposed formulation avoids cost volume construction, replacing it with a dual-encoder single-decoder CNN with different weights for feature fusion. EfficientNet-based blocks are used to learn the disparity. Second, disparity maps obtained from the stereo images at different exposure levels are combined using a robust disparity feature fusion approach, in which the maps are merged using weight maps calculated for different quality measures. The final predicted disparity map is more robust and retains the best features, preserving depth discontinuities. The proposed CNN can be trained on either standard dynamic range stereo data or multi-exposure low dynamic range stereo sequences. The proposed model surpasses state-of-the-art monocular and stereo depth estimation methods, both quantitatively and qualitatively, on the challenging Scene Flow and differently exposed Middlebury stereo datasets. The architecture performs well on complex natural scenes, demonstrating its usefulness for diverse 3D HDR applications.
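The Tucker-decomposition light field entries above describe removing redundancy by approximating a light field (or focal stack) tensor with a low-rank Tucker model: a small core tensor plus one factor matrix per mode. As a rough illustration of that general idea only, the NumPy sketch below uses the textbook truncated higher-order SVD (HOSVD). This is a swapped-in technique, not the one-pass TensorSketch method described in the abstracts, and the tensor shape, ranks, and helper names (`unfold`, `mode_n_product`, `truncated_hosvd`, `reconstruct`) are hypothetical choices made for this example.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: move axis `mode` to the front, flatten the remaining axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_n_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` (shape new_dim x old_dim) along axis `mode`."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)


def truncated_hosvd(tensor, ranks):
    """Truncated HOSVD: an orthonormal factor per mode plus a small core tensor.

    Illustrative stand-in only: this is the classic multi-pass HOSVD, not the
    one-pass TensorSketch-based Tucker decomposition described in the abstracts.
    """
    factors = []
    for mode, rank in enumerate(ranks):
        # Leading left singular vectors of the mode-n unfolding span that mode's subspace.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        # Project onto each factor subspace: G = X x_1 U_1^T x_2 U_2^T ... x_N U_N^T.
        core = mode_n_product(core, u.T, mode)
    return core, factors


def reconstruct(core, factors):
    """Rebuild the approximation X_hat = G x_1 U_1 x_2 U_2 ... x_N U_N."""
    approx = core
    for mode, u in enumerate(factors):
        approx = mode_n_product(approx, u, mode)
    return approx


# Hypothetical stand-in for a light field tensor: a synthetic 20 x 30 x 25 array
# with exact multilinear rank (2, 3, 2), built from a random core and random factors.
rng = np.random.default_rng(0)
true_core = rng.standard_normal((2, 3, 2))
true_factors = [rng.standard_normal((20, 2)),
                rng.standard_normal((30, 3)),
                rng.standard_normal((25, 2))]
x = reconstruct(true_core, true_factors)

core, factors = truncated_hosvd(x, ranks=(2, 3, 2))
x_hat = reconstruct(core, factors)

rel_err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
stored = core.size + sum(u.size for u in factors)
print(f"relative error: {rel_err:.2e}; values stored: {stored} of {x.size}")
```

Because the synthetic tensor has exact multilinear rank (2, 3, 2), the relative error stays at floating-point precision while only 192 values are stored instead of 15,000. On real light field data, the chosen ranks would instead trade reconstruction quality against the number of coefficients that must be coded.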