Madhu Mutyam
Preferred name: Madhu Mutyam
Official Name: Madhu Mutyam
Alternative Name: Mutyam, Madhu
Publications (4 results)
- Publication: Formal modeling and verification of controllers for a family of DRAM caches (01-11-2018)
  Authors: Sahoo, Debiprasanna; Sha, Swaraj; Satpathy, Manoranjan; Mutyam, Madhu; Ramesh, S.; Roop, Partha
  Die-stacking technology enables the use of high-density DRAM as a cache. Major processor vendors have recently started using these stacked DRAM modules as the last-level cache of their products. Stacked DRAM modules provide high bandwidth with relatively low latency compared to off-package DRAM modules. Recent studies on DRAM caches propose several variants to optimize system performance and power. However, none of the existing works discusses the design and verification aspects. A DRAM cache controller (DCC) is significantly more complex than a conventional DRAM-based main-memory controller, because it must handle both the timing behavior of the DRAM system and the functional behavior of the cache. Without rigorous modeling and verification of such designs, it is therefore difficult to ensure their correctness. In this work, we focus on the design and verification issues of DCCs. We select a common variant of DRAM cache and build a formal model of its controller in terms of interacting state machines; we term the common variant the baseline and its model the base model. We then verify safety, liveness, and timing properties of this variant using model checking. Next, we demonstrate how the formal models and associated properties of other DCC variants can be derived from the base model in a systematic way. Analyzing the individual DRAM cache variations, we observe that most variants exhibit product-line characteristics.
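  As a rough illustration of the modeling style this abstract describes (interacting state machines whose composed state space is exhaustively explored, with a safety property asserted in every reachable state), here is a minimal Python sketch. The agents, events, and the property are invented for illustration; the paper's actual DCC model is far more detailed and also covers timing.

  ```python
  # Minimal sketch: compose two interacting state machines and check a
  # safety property by exhaustive reachability. All names are hypothetical,
  # not the paper's actual DCC model.

  # Each agent maps state -> {event: next_state}.
  CACHE_AGENT = {
      "idle":      {"miss": "probe"},
      "probe":     {"hit": "idle", "fill": "wait_fill"},
      "wait_fill": {"fill_done": "idle"},
  }
  DRAM_AGENT = {
      "ready": {"fill": "busy"},
      "busy":  {"fill_done": "ready"},
  }

  SHARED = {e for trans in DRAM_AGENT.values() for e in trans}  # sync events

  def successors(state):
      cache, dram = state
      for event, cache_next in CACHE_AGENT[cache].items():
          if event in SHARED:
              if event in DRAM_AGENT[dram]:       # both agents move together
                  yield (cache_next, DRAM_AGENT[dram][event])
          else:                                   # cache-local move
              yield (cache_next, dram)

  def safe(state):
      # Safety: the cache never waits for a fill while the DRAM side is
      # idle, i.e., a fill request is never silently dropped.
      return state != ("wait_fill", "ready")

  def check(initial=("idle", "ready")):
      frontier, seen = [initial], set()
      while frontier:
          state = frontier.pop()
          if state in seen:
              continue
          seen.add(state)
          assert safe(state), f"safety violation in {state}"
          frontier.extend(successors(state))
      print(f"explored {len(seen)} states; safety property holds")

  check()
  ```

  A real model checker (the deductive or symbolic tools such work typically uses) would also handle liveness and timing properties, which this toy reachability loop cannot express.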
- Publication: ReDRAM: A Reconfigurable DRAM Cache for GPGPUs (01-07-2018)
  Authors: Sahoo, Debiprasanna; Sha, Swaraj; Satpathy, Manoranjan; Mutyam, Madhu
  Hardware-based DRAM cache techniques for GPGPUs propose to use GPU DRAM as a cache of the host (system) memory. However, these approaches miss the opportunity to allocate store-before-load data (data that is written before being read by GPU cores) on GPU DRAM, which would save multiple CPU-GPU transactions. In this context, we propose ReDRAM, a novel memory allocation strategy for GPGPUs that reconfigures the GPU DRAM cache as a heterogeneous unit: store-before-load data is allocated directly on GPU DRAM, while GPU DRAM also continues to serve as a cache of the host memory. Our simulation results using a modified version of GPGPU-Sim show that, for applications that use store-before-load data, ReDRAM improves performance by 57.6 percent on average and up to 4.85x compared to existing proposals on state-of-the-art GPU DRAM caches.
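  A minimal sketch of the allocation decision this abstract describes: data first written by the GPU (store-before-load) is placed directly in GPU DRAM, skipping the initial copy-in from the host, while other data uses GPU DRAM as a cache of host memory. The classifier and all names below are assumptions for illustration, not ReDRAM's actual mechanism.

  ```python
  # Hypothetical placement policy in the spirit of the abstract: data whose
  # first GPU access is a store has no useful host copy to fetch, so caching
  # it would only add CPU-GPU transactions.

  GPU_DIRECT = "allocate directly in GPU DRAM"
  GPU_CACHED = "use GPU DRAM as cache of host memory"

  def place(first_access_by_gpu):
      return GPU_DIRECT if first_access_by_gpu == "store" else GPU_CACHED

  buffers = {
      "output_tile":  place("store"),  # produced on the GPU before any read
      "input_matrix": place("load"),   # consumed first, so cache the host copy
  }
  for name, decision in buffers.items():
      print(f"{name}: {decision}")
  ```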
- Publication: Formal modeling and verification of a victim DRAM cache (01-02-2019)
  Authors: Sahoo, Debiprasanna; Sha, Swaraj; Satpathy, Manoranjan; Mutyam, Madhu; Ramesh, S.; Roop, Partha
  The emerging die-stacking technology enables DRAM to be used as a cache to alleviate the "memory wall" problem. Recent studies have proposed using DRAM as a victim cache in both CPU and GPU memory hierarchies to improve performance. DRAM caches are large, and hence a non-inclusive design is preferred when they are realized as victim caches. The non-inclusive design differs significantly from a conventional DRAM cache design in its probe, fill, and writeback policies, so designing and verifying a victim DRAM cache can be much more complex than a conventional DRAM cache. Hence, without rigorous modeling and formal verification, ensuring the correctness of such a system is difficult. The major focus of this work is to show how formal modeling is applied to design and verify a victim DRAM cache. In this approach, we identify the agents in the victim DRAM cache design and model them as interacting state machines. We derive a set of properties from the specification of a victim cache and encode them in Linear Temporal Logic. The properties are then proven using symbolic and bounded model checking. Finally, we discuss how these properties relate to the dataflow paths in a victim DRAM cache.
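  As a hedged illustration of the kinds of Linear Temporal Logic properties such a verification effort encodes, a safety property and a liveness property might take shapes like the following. The atomic propositions here are hypothetical; the paper's actual propositions are derived from its victim-cache specification.

  ```latex
  % Safety: a block is never simultaneously dirty in the victim cache (L4)
  % and marked clean in its metadata.
  \[ \square\, \neg\bigl(\mathit{dirty\_L4} \wedge \mathit{tag\_clean}\bigr) \]
  % Liveness: every fill request into the victim cache eventually completes.
  \[ \square\bigl(\mathit{fill\_req} \rightarrow \lozenge\, \mathit{fill\_done}\bigr) \]
  ```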
- Publication: Optimization of Intercache Traffic Entanglement in Tagless Caches with Tiling Opportunities (01-11-2020)
  Authors: Swamy Saranam Chongala, S. R.; George, Sumitha; Govindarajan, Hariram Thirucherai; Kotra, Jagadish; Mutyam, Madhu; Sampson, John; Kandemir, Mahmut T.; Narayanan, Vijaykrishnan
  So-called 'tagless' caches have become common as a means to manage the vast L4 last-level caches (LLCs) enabled by increasing device density, emerging memory technologies, and advanced integration capabilities (e.g., 3-D). Tagless schemes often result in intercache entanglement between the tagless cache (L4) and the cache (L3) stewarding its metadata. We explore new cache organization policies that mitigate the overheads stemming from this intercache-level replacement entanglement. We incorporate support for explicit tiling shapes that better match software access patterns, improving the spatial and temporal locality of large block allocations in many essential computational kernels. To address entanglement overheads and pathologies, we propose new replacement policies and energy-friendly mechanisms for tagless LLCs, such as restricted block caching (RBC) and victim tag buffer caching (VBC), which efficiently incorporate L4 eviction costs into L3 replacement decisions. We evaluate our schemes on a range of software-tiled linear algebra kernels. RBC and VBC reduce memory traffic by 83/4.4/67% and 69/35.5/76% for 8/32/64 MB L4s, respectively. In addition, RBC and VBC provide speedups of 16/0.3/0.6% and 15.7/1.8/0.8%, respectively, for systems with 8/32/64 MB L4s, over a tagless cache with an LRU policy in the L3. We also show that matching the shape of the hardware allocation for each tagless-region superblock to the access order of the software tile improves latency by 13.4% over the baseline tagless cache, with a 51% reduction in memory traffic relative to linear superblocks.
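  A minimal sketch, under assumed names and costs, of the idea behind folding L4 eviction costs into L3 replacement decisions: instead of evicting purely by LRU age, the victim choice penalizes entries whose metadata entangles a dirty L4 superblock. This illustrates the stated goal, not the paper's actual RBC or VBC mechanism.

  ```python
  # Hypothetical cost-aware L3 victim selection: bias against evicting
  # metadata entries whose eviction would force an expensive L4 writeback.
  from dataclasses import dataclass

  @dataclass
  class L3Entry:
      tag: int
      lru_age: int           # higher = older, i.e., the plain-LRU victim
      maps_dirty_l4: bool    # evicting this metadata forces an L4 writeback

  L4_EVICTION_PENALTY = 8    # assumed relative cost of an L4 superblock eviction

  def pick_victim(entries):
      """Pick the entry maximizing (LRU age - eviction penalty), so an old
      entry entangled with a dirty L4 superblock can lose to a slightly
      younger entry that is cheap to evict."""
      return max(entries, key=lambda e:
                 e.lru_age - (L4_EVICTION_PENALTY if e.maps_dirty_l4 else 0))

  victim = pick_victim([
      L3Entry(tag=0xA, lru_age=7, maps_dirty_l4=True),   # LRU victim, but costly
      L3Entry(tag=0xB, lru_age=5, maps_dirty_l4=False),  # cheaper overall
  ])
  print(hex(victim.tag))  # -> 0xb
  ```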