  • Publication
    A Hierarchical Approach for Lossy Light Field Compression With Multiple Bit Rates Based on Tucker Decomposition via Random Sketching
    (01-01-2022)
    Ravishankar, Joshitha
    ;
    Recently, there has been extensive progress in developing autostereoscopic platforms for display purposes to present real-world 3D scenes. Light fields are the best emerging choice for computational multi-view autostereoscopic displays since they provide an optimized solution to support direction-dependent outputs simultaneously without sacrificing the resolution. We present a novel light field representation, coding and streaming scheme that efficiently handles large tensor data. Intrinsic redundancies in light field subsets are eliminated through low-rank representation using Tucker decomposition with tensor sketching for various ranks and sketch dimension parameters, making it ideal for streaming and transmission. Apart from removing spatial redundancies, the approximated light field is used to construct a Fourier disparity layers representation to further exploit other non-linear, temporal, intra and inter-view correlations present among the approximated sub-aperture images. Four scanning or view prediction patterns are utilized and the subsets in each pattern hierarchically construct the FDL representation and synthesize subsequent views. Iterative refinement and encoding with HEVC are followed by the final light field reconstruction. The complete end-to-end processing pipeline can flexibly work for multiple bitrates and is adaptable for a variety of multi-view autostereoscopic platforms. The compression performance of the proposed scheme is analyzed on real light fields. We achieved substantial bitrate savings compared to state-of-the-art codecs, while maintaining good reconstruction quality.
  • Publication
    A Hybrid Tucker-VQ Tensor Sketch decomposition model for coding and streaming real world light fields using stack of differently focused images
    (01-07-2022)
    Ravishankar, Joshitha
    ;
    ;
    Khaidem, Sally
    Computational multi-view displays involving light fields are a fast emerging choice for 3D presentation of real-world scenes. Tensor autostereoscopic glasses-free displays use just a few light-attenuating layers in front of a backlight to output a high-quality light field. We propose three novel schemes, Focal Stack - Hybrid Tucker-TensorSketch Vector Quantization (FS-HTTSVQ), Focal Stack - Tucker-TensorSketch (FS-TTS), and Focal Stack - Tucker Alternating Least-Squares (FS-TALS), for efficient representation, streaming and coding of light fields using a stack of differently focused images. Working with a focal stack instead of the entire light field substantially reduces the data acquisition cost as well as the computation and processing cost. Extensive experiments with real-world light field focal stacks demonstrate that the proposed novel one-pass Tucker decomposition using TensorSketch with hybrid vector quantization in FS-HTTSVQ compactly represents the approximated focal stack in codebook form for better transmission and streaming. Encoding with High Efficiency Video Coding (HEVC) eliminates all intrinsic redundancies present in the approximated focal stack. The resultant low-rank approximated and coded focal stack is then employed to analytically optimize layer patterns for the tensor display. The complete end-to-end light field processing pipelines flexibly work for multiple bitrates and are adaptable for a variety of multi-view autostereoscopic platforms. Our schemes exhibit noteworthy performance on focal stacks compared to direct encoding of an entire light field using a standard codec like HEVC.
  • Publication
    A novel hierarchical light field coding scheme based on hybrid stacked multiplicative layers and Fourier disparity layers for glasses-free 3D displays
    (01-11-2022)
    Ravishankar, Joshitha
    ;
    We present a novel hierarchical coding scheme for light fields based on transmittance patterns of low-rank multiplicative layers and Fourier disparity layers. The proposed scheme identifies multiplicative layers of light field view subsets optimized using convolutional neural networks for different scanning orders. Our approach exploits the hidden low-rank structure in the multiplicative layers obtained from the subsets of different scanning patterns. The spatial redundancies in the multiplicative layers can be efficiently removed by performing low-rank approximation at different ranks on the Krylov subspace. The intra-view and inter-view redundancies between approximated layers are further removed by HEVC encoding. Next, a Fourier disparity layer representation is constructed from the first subset of the approximated light field based on the chosen hierarchical order. Subsequent view subsets are synthesized by modeling the Fourier disparity layers that iteratively refine the representation with improved accuracy. The critical advantage of the proposed hybrid layered representation and coding scheme is that it utilizes not just spatial and temporal redundancies in light fields, but also efficiently exploits intrinsic similarities among neighboring sub-aperture images in both horizontal and vertical directions as specified by different prediction orders. In addition, the scheme is flexible to realize a range of multiple bitrates at the decoder within a single integrated system. Comparison with state-of-the-art light field coders exhibits superior compression performance of the proposed scheme for real-world light fields. We achieve substantial bitrate savings and also maintain good light field reconstruction quality.
  • Publication
    MEStereo-Du2CNN: a dual-channel CNN for learning robust depth estimates from multi-exposure stereo images for HDR 3D applications
    (01-01-2023)
    Choudhary, Rohit
    ;
    ;
    Uma, T. V.
    ;
    Anil, Rithvik
    Display technologies have evolved over the years. It is critical to develop practical HDR capturing, processing, and display solutions to bring 3D technologies to the next level. Depth estimation of multi-exposure stereo image sequences is an essential task in the development of cost-effective 3D HDR video content. In this paper, we develop a deep architecture for multi-exposure stereo depth estimation. The proposed architecture has two novel components. First, the stereo matching technique used in traditional stereo depth estimation is revamped. For the stereo depth estimation component of our architecture, a mono-to-stereo transfer learning approach is deployed. The proposed formulation circumvents the cost volume construction requirement, which is replaced by a dual-encoder single-decoder CNN with different weights for feature fusion. EfficientNet-based blocks are used to learn the disparity. Second, we combine disparity maps obtained from the stereo images at different exposure levels using a robust disparity feature fusion approach. The disparity maps obtained at different exposures are merged using weight maps calculated for different quality measures. The final predicted disparity map is more robust and retains the best features that preserve the depth discontinuities. The proposed CNN offers flexibility to train using standard dynamic range stereo data or with multi-exposure low dynamic range stereo sequences. In terms of performance, the proposed model surpasses state-of-the-art monocular and stereo depth estimation methods, both quantitatively and qualitatively, on challenging Scene flow and differently exposed Middlebury stereo datasets. The architecture performs exceedingly well on complex natural scenes, demonstrating its usefulness for diverse 3D HDR applications.
  • Publication
    A flexible coding scheme based on block Krylov subspace approximation for light field displays with stacked multiplicative layers
    (01-07-2021)
    Ravishankar, Joshitha
    ;
    ;
    Gopalakrishnan, Pradeep
    To create a realistic 3D perception on glasses-free displays, it is critical to support continuous motion parallax, greater depths of field, and wider fields of view. A new type of Layered or Tensor light field 3D display has attracted greater attention these days. Using only a few light-attenuating pixelized layers (e.g., LCD panels), it supports many views from different viewing directions that can be displayed simultaneously with a high resolution. This paper presents a novel flexible scheme for efficient layer-based representation and lossy compression of light fields on layered displays. The proposed scheme learns stacked multiplicative layers optimized using a convolutional neural network (CNN). The intrinsic redundancy in light field data is efficiently removed by analyzing the hidden low-rank structure of multiplicative layers on a Krylov subspace. Factorization derived from Block Krylov singular value decomposition (BK-SVD) exploits the spatial correlation in layer patterns for multiplicative layers with varying low ranks. Further, encoding with HEVC eliminates inter-frame and intra-frame redundancies in the low-rank approximated representation of layers and improves the compression efficiency. The scheme is flexible to realize multiple bitrates at the decoder by adjusting the ranks of BK-SVD representation and HEVC quantization. Thus, it would complement the generality and flexibility of a data-driven CNN-based method for coding with multiple bitrates within a single training framework for practical display applications. Extensive experiments demonstrate that the proposed coding scheme achieves substantial bitrate savings compared with pseudo-sequence-based light field compression approaches and state-of-the-art JPEG and HEVC coders.
  • Publication
    An integrated learning and approximation scheme for coding of static or dynamic light fields based on hybrid Tucker–Karhunen–Loève transform-singular value decomposition via tensor double sketching
    (01-08-2022)
    Ravishankar, Joshitha
    ;
    This study presents a scheme for efficient representation, coding and streaming of static or dynamic light fields using the authors’ novel hybrid Tucker-TensorSketch Karhunen–Loève transform-singular value decomposition via double sketching (HTTS-KLTSVD-DS) algorithm. A deep learning model is employed to obtain acquired images from the light fields by simulating coded aperture patterns. These acquired images can represent the entire light field and are low-rank approximated using HTTS-KLTSVD-DS. Incorporation of double sketching using TensorSketch allows the authors’ algorithm to run faster, in a single pass, without storing the large Kronecker products of the Tucker decomposition in memory. This enables efficient transmission and streaming of the light field, making it suitable for 3D display applications. Besides, the compact representation of factor matrices by KLT-SVD in the authors’ proposed model acts as an optimal transform with a good energy compaction property. Encoding of low-rank approximated acquired images using HEVC eliminates intra-frame, inter-frame and other intrinsic redundancies in the light field. The authors’ complete light field processing pipeline flexibly works for multiple bitrates and is adaptable for a variety of multi-view autostereoscopic platforms. Comparison with state-of-the-art codecs shows reasonable savings and PSNR gains for low and high bitrates, while maintaining good reconstruction quality.
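
Several of the abstracts above rely on a low-rank Tucker approximation of a light field or focal stack tensor, with the chosen ranks controlling the trade-off between size and quality. As a rough illustration of that idea only, the following NumPy sketch computes a plain truncated higher-order SVD. It is not the authors' method: it uses exact SVDs rather than TensorSketch, double sketching, alternating least squares, or KLT-SVD, and the function names (`unfold`, `mode_product`, `truncated_hosvd`, `reconstruct`) and the tensor sizes are assumptions made for this example.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` (shape J x I_mode) along axis `mode`."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)


def truncated_hosvd(tensor, ranks):
    """Return a Tucker core and factor matrices for the given per-mode ranks.

    Illustrative stand-in only: exact SVD of each unfolding, no sketching.
    """
    factors = []
    for mode, rank in enumerate(ranks):
        # Leading left singular vectors of the mode-n unfolding.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    # Project the tensor onto the factor subspaces to obtain the core.
    core = tensor
    for mode, factor in enumerate(factors):
        core = mode_product(core, factor.T, mode)
    return core, factors


def reconstruct(core, factors):
    """Expand a Tucker core back to a full-size tensor."""
    approx = core
    for mode, factor in enumerate(factors):
        approx = mode_product(approx, factor, mode)
    return approx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-in for a small stack of views: height x width x views.
    data = rng.standard_normal((32, 32, 9))
    for ranks in [(8, 8, 3), (16, 16, 5), (32, 32, 9)]:
        core, factors = truncated_hosvd(data, ranks)
        stored = core.size + sum(f.size for f in factors)
        error = np.linalg.norm(data - reconstruct(core, factors))
        print(ranks, "stored values:", stored, "relative error:",
              error / np.linalg.norm(data))
```

Lower ranks store fewer values at the cost of a larger reconstruction error, which is the general mechanism behind the multiple-bitrate behaviour the abstracts describe. Random data is close to full rank, so the demo errors are large; real light field views are far more correlated.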