Mansi Sharma
A Novel Image Fusion Scheme for FTV View Synthesis Based on Layered Depth Scene Representation & Scale Periodic Transform
01-12-2019, Sharma, Mansi, Ragavan, Gowtham
This paper presents a novel image fusion scheme for view synthesis based on a layered depth profile of the scene and the scale periodic transform. To create the layered depth profile, we exploit the unique properties of the scale transform, treating depth map computation from reference images as a shift-variant problem. Depth is computed without deterministic stereo correspondences and without representing image signals in terms of shifts. Instead, we model image signals as scale periodic functions and compute depth estimates by determining the scalings of a basis function. The rendering process is formulated as a novel image fusion in which the textures of all probable matching points are adaptively determined, implicitly leveraging the geometric information. The results demonstrate the superiority of the proposed approach in suppressing geometric, blurring, and flicker artifacts in rendered wide-baseline virtual videos.
A Hybrid Tucker-VQ Tensor Sketch decomposition model for coding and streaming real world light fields using stack of differently focused images
01-07-2022, Ravishankar, Joshitha, Mansi Sharma, Khaidem, Sally
Computational multi-view displays involving light fields are a fast emerging choice for 3D presentation of real-world scenes. Tensor autostereoscopic glasses-free displays use just a few light-attenuating layers in front of a backlight to output a high-quality light field. We propose three novel schemes, Focal Stack - Hybrid Tucker-TensorSketch Vector Quantization (FS-HTTSVQ), Focal Stack - Tucker-TensorSketch (FS-TTS), and Focal Stack - Tucker Alternating Least-Squares (FS-TALS), for efficient representation, streaming and coding of light fields using a stack of differently focused images. Working with a focal stack instead of the entire light field substantially reduces the data acquisition cost as well as the computation and processing cost. Extensive experiments with real-world light field focal stacks demonstrate that the proposed one-pass Tucker decomposition using TensorSketch with hybrid vector quantization in FS-HTTSVQ compactly represents the approximated focal stack in codebook form for better transmission and streaming. Encoding with High Efficiency Video Coding (HEVC) eliminates intrinsic redundancies present in the approximated focal stack. The resulting low-rank approximated and coded focal stack is then employed to analytically optimize layer patterns for the tensor display. The complete end-to-end light field processing pipelines work flexibly for multiple bitrates and are adaptable to a variety of multi-view autostereoscopic platforms. Our schemes exhibit noteworthy performance on focal stacks compared to direct encoding of an entire light field using a standard codec like HEVC.
Virtual reality, robotics, and artificial intelligence: Technological interventions in stroke rehabilitation
23-09-2022, Kanade, Aditya, Mansi Sharma, Muniyandi Manivannan
Stroke is a leading cause of death in humans. In the US, someone has a stroke every 40 seconds. More than half of the stroke-affected patients over the age of 65 have reduced mobility. The prevalence of stroke in our society is increasing; however, since stroke comes with a lot of post-hospitalization care, a lot of infrastructure is lacking to cater to the demands of the increasing population of patients. In this chapter, the authors look at three technological interventions in the form of machine learning, virtual reality, and robotics. They look at how the research is evolving in these fields and pushing for easier and more reliable ways for rehabilitation. They also highlight methods that show promise in the area of home-based rehabilitation.
Predicting Forward & Backward Facial Depth Maps from a Single RGB Image for Mobile 3d AR Application
01-12-2019, Avinash, P., Sharma, Mansi
Cheap and fast 3D asset creation to enable AR/VR applications is a fast growing domain. This paper addresses the significant problem of reconstructing complete 3D information of a face at near real-time speed on a mobile phone. We propose a novel deep learning based solution to predict robust depth maps of a face, one forward facing and the other backward facing, from a single image from the wild. A critical contribution is that the proposed network is capable of learning the depths of the occluded part of the face too. This is achieved by training a fully convolutional neural network to learn the dual (forward and backward) depth maps, with a common encoder and two separate decoders. The 300W-LP, a point cloud dataset, is used to compute the required dual depth maps from the training data. The code and results will be made available on the project page.
A Hierarchical Approach for Lossy Light Field Compression With Multiple Bit Rates Based on Tucker Decomposition via Random Sketching
01-01-2022, Ravishankar, Joshitha, Mansi Sharma
Recently, there has been extensive progress in developing autostereoscopic platforms for display purposes to present real-world 3D scenes. Light fields are the best emerging choice for computational multi-view autostereoscopic displays since they provide an optimized solution to support direction-dependent outputs simultaneously without sacrificing the resolution. We present a novel light field representation, coding and streaming scheme that efficiently handles large tensor data. Intrinsic redundancies in light field subsets are eliminated through low-rank representation using Tucker decomposition with tensor sketching for various ranks and sketch dimension parameters, making it ideal for streaming and transmission. Apart from removing spatial redundancies, the approximated light field is used to construct a Fourier disparity layers representation to further exploit other non-linear, temporal, intra and inter-view correlations present among the approximated sub-aperture images. Four scanning or view prediction patterns are utilized and the subsets in each pattern hierarchically construct the FDL representation and synthesize subsequent views. Iterative refinement and encoding with HEVC are followed by the final light field reconstruction. The complete end-to-end processing pipeline can flexibly work for multiple bitrates and is adaptable for a variety of multi-view autostereoscopic platforms. The compression performance of the proposed scheme is analyzed on real light fields. We achieved substantial bitrate savings compared to state-of-the-art codecs, while maintaining good reconstruction quality.
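As a rough illustration of the low-rank Tucker representation mentioned in this abstract, the sketch below computes a truncated higher-order SVD (HOSVD) with NumPy. This is a swapped-in, simplified technique: it does not use tensor sketching, and it is not the authors' implementation. The function names `unfold`, `mode_product`, `truncated_hosvd`, and `reconstruct` are hypothetical helpers introduced only for this example.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: bring `mode` to the front and flatten the other axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` (shape J x I_mode) along axis `mode`."""
    moved = np.moveaxis(tensor, mode, -1)      # (..., I_mode)
    result = moved @ matrix.T                  # (..., J)
    return np.moveaxis(result, -1, mode)


def truncated_hosvd(tensor, ranks):
    """Plain truncated HOSVD (assumption: no sketching, one SVD per mode)."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])            # leading left singular vectors
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)   # project onto each factor basis
    return core, factors


def reconstruct(core, factors):
    """Expand a Tucker core back to the full tensor."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.standard_normal((8, 8, 6))      # stand-in for a small image stack
    core, factors = truncated_hosvd(data, (4, 4, 3))
    approx = reconstruct(core, factors)
    print("core shape:", core.shape)
    print("relative error:", np.linalg.norm(data - approx) / np.linalg.norm(data))
```

Smaller ranks give a more compact core and factor matrices at the cost of a larger approximation error, which is the trade-off the abstract refers to when it varies ranks for multiple bitrates.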
A Novel 3D-Unet Deep Learning Framework Based on High-Dimensional Bilateral Grid for Edge Consistent Single Image Depth Estimation
15-12-2020, Sharma, Mansi, Sharma, Abheesht, Tushar, Kadvekar Rohit, Panneer, Avinash
The task of predicting smooth and edge-consistent depth maps is notoriously difficult for single image depth estimation. This paper proposes a novel bilateral-grid-based 3D convolutional neural network, dubbed 3DBG-UNet, that parameterizes a high-dimensional feature space by encoding compact 3D bilateral grids with UNets and infers a sharp geometric layout of the scene. Further, another novel model, 3DBGES-UNet, is introduced that integrates 3DBG-UNet to infer an accurate depth map from a single color view. The 3DBGES-UNet concatenates the 3DBG-UNet geometry map with the inception network edge accentuation map and a spatial object boundary map obtained by leveraging semantic segmentation, and trains the UNet model with a ResNet backbone. Both models are designed with particular attention to explicitly accounting for edges and minute details. Preserving sharp discontinuities at depth edges is critical for many applications such as realistic integration of virtual objects in AR video or occlusion-aware view synthesis for 3D display applications. The proposed depth prediction network achieves state-of-the-art performance in both qualitative and quantitative evaluations on the challenging NYUv2-Depth data. The code and corresponding pre-trained weights will be made publicly available.
Latent Factor Modeling of Perceived Quality for Stereoscopic 3D Video Recommendation
01-01-2021, Appina, Balasubramanyam, Sharma, Mansi, Kumar, Santosh, Kara, Peter A., Simon, Aniko, Guindy, Mary
Numerous stereoscopic 3D movies are released to movie theaters every year, and they generate large revenues. Despite the notable improvements in stereo capturing and 3D video post-production technologies, stereoscopic artefacts continue to appear even in high-budget films. Existing automatic 3D video quality measurement tools can detect distortions in stereoscopic images and videos, but they fail to determine the viewer's subjective perception of those artefacts, and how these distortions affect their choices and the overall visual experience. In this paper, we introduce a novel recommendation system for stereoscopic 3D movies based on a latent factor model that analyzes the viewer's subjective ratings and the influence of 3D video distortions on their personal preferences. To the best knowledge of the authors, this is a first-of-its-kind model that recommends 3D movies based on quality ratings. It takes the correlation between the viewer's visual discomfort and the perception of stereoscopic artefacts into account. The proposed model is trained and tested on the benchmark Nama3ds1-cospad1 and LFOVIAS3DPh2 S3D video quality assessment datasets. The experiments highlight the practical efficiency and considerable performance of the resulting matrix-factorization-based recommendation system.
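To make the idea of a matrix-factorization-based latent factor model concrete, here is a minimal sketch in plain Python. It is a generic formulation trained with stochastic gradient descent on (user, item, rating) triples, not the model from the paper: the distortion and visual discomfort terms described in the abstract are omitted, the toy ratings are invented for illustration, and `train_latent_factors` and `predict` are hypothetical names.

```python
import random


def train_latent_factors(ratings, n_users, n_items, n_factors=2,
                         lr=0.02, reg=0.01, epochs=2000, seed=0):
    """Fit user and item factor vectors so that their dot product matches
    the observed ratings (generic matrix factorization, assumed setup)."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, user, item):
    """Predicted rating is the dot product of the two latent vectors."""
    return sum(pu * qi for pu, qi in zip(P[user], Q[item]))


if __name__ == "__main__":
    # Toy (user, video, quality rating) triples, invented for illustration.
    toy_ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
                   (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
    P, Q = train_latent_factors(toy_ratings, n_users=3, n_items=3)
    # Estimate a rating that was never observed: user 0 on video 2.
    print(predict(P, Q, 0, 2))
```

A recommender built this way would rank unseen videos for a viewer by their predicted rating; any extra signals, such as measured distortion levels, would enter as additional terms in the prediction.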
An Integrated Optimization Approach for Depth Map Enhancement on Special Riemannian Manifold
18-12-2018, Sharma, Mansi
Depth images captured by consumer depth sensors like ToF cameras or Microsoft Kinect are often noisy and incomplete. Most existing methods recover missing depth values from low-quality measurements using information in the corresponding color images. However, the performance of such methods degrades when the color image is noisy or the correlation between RGB-D data is weak. This paper presents a depth map enhancement algorithm based on Riemannian geometry that performs depth map de-noising and completion simultaneously. The algorithm is based on the observation that similar RGB-D patches lie in a very low-dimensional subspace over the Riemannian quotient manifold of varying-rank matrices. The similar RGB-D patches are assembled into a matrix and optimization is performed on the search space of this quotient manifold with a Kronecker product trace norm penalty. The proposed convex optimization problem on a special quotient manifold essentially captures the underlying structure in the color and depth patches. This enables robust depth refinement against noise or weak correlation between RGB-D data. This non-Euclidean approach, with Kronecker product trace-norm constraints and cones in the non-linear matrix spaces, provides a proper geometric framework to perform optimization. It formulates depth map enhancement as a matrix completion problem in the product space of Riemannian manifolds. The Riemannian submersion automatically handles ranks that change over matrices and guarantees convergence over the constructed manifold. Experiments on public benchmark RGB-D images show that the proposed method can effectively enhance depth maps.
A Novel Randomize Hierarchical Extension of MV-HEVC for Improved Light Field Compression
01-12-2019, Sharma, Mansi, Ragavan, Gowtham
This paper presents a novel scheme for light field compression based on a randomize hierarchical multi-view extension of high efficiency video coding (dubbed RH-MVHEVC). Specifically, light field data are arranged as multiple pseudo-temporal video sequences, which are efficiently compressed with an MV-HEVC encoder following an integrated random coding technique and hierarchical prediction scheme. The critical advantage of the proposed RH-MVHEVC scheme is that it utilizes not just temporal and inter-view prediction, but also efficiently exploits the strong intrinsic similarities within each sub-aperture image and among neighboring sub-aperture images in both horizontal and vertical directions. In experiments, the proposed scheme consistently outperforms state-of-the-art compression methods on the benchmark ICME 2016 and ICIP 2017 grand challenge data sets. It achieves an average BD-rate reduction of up to 33.803% and a BD-PSNR improvement of 1.7978 dB compared with an advanced JEM video encoder, and an average BD-rate reduction of 20.4156% and a BD-PSNR improvement of 2.0644 dB compared with the latest image-based JEM-anchor coding scheme.
A Rich Stereoscopic 3D High Dynamic Range Image & Video Database of Natural Scenes
01-12-2019, Wadaskar, Aditya, Sharma, Mansi, Lal, Rohan
The consumer market of High Dynamic Range (HDR) displays and cameras is blooming rapidly with the advent of 3D video and display technologies. Specialised agencies like Moving Picture Experts Group and International Telecommunication Union are demanding the standardization of latest display advancements. Lack of sufficient experimental data is a major bottleneck for the development of preliminary research efforts in 3D HDR video technology. We propose to make publicly available to the research community, a diversified database of Stereoscopic 3D HDR images and videos, captured within the beautiful campus of Indian Institute of Technology, Madras, which is blessed with rich flora and fauna, and is home to several rare wildlife species. Further, we have described the procedure of capturing, aligning, calibrating and post-processing of 3D images and videos. We have discussed research opportunities and challenges, and the potential use cases of HDR stereo 3D applications and depth-from-HDR aspects.