  • Publication
    Poster: Towards identifying early indicators of a malware infection
    (02-07-2019)
    Sareena, K. P.; Parekh, Unnati
    Malware goes through multiple stages in its life-cycle at the target machine before mounting its expected attack. The entire life-cycle can span anywhere from a few weeks to several months. The network communications during the initial phase could be the earliest indicators of a malware infection. While prior works have leveraged network traffic, none have focused on a temporal analysis of how early the malware can be detected. The main challenge here is the difficulty of differentiating benign-looking malware communications in the early stages of the malware life-cycle. In our quest to build an early warning system, we analyze malware communications to identify such early indicators.
  • Publication
    Depending on HTTP/2 for Privacy? Good Luck!
    (01-06-2020)
    Mitra, Gargi; Vairam, Prasanna Karthik; Patanjali, S. L.P.S.K.
    HTTP/2 introduced multi-threaded server operation for performance improvement over HTTP/1.1. Recent works have discovered that multi-threaded operation results in multiplexed object transmission, which can also have an unanticipated positive effect on TLS/SSL privacy. In fact, these works go on to design privacy schemes that rely heavily on multiplexing to obfuscate the sizes of the objects from which attackers inferred sensitive information. Orthogonal to these works, we examine whether the privacy offered by such schemes holds in practice. In this work, we show that it is possible for a network adversary with modest capabilities to completely break the privacy offered by the schemes that leverage HTTP/2 multiplexing. Our adversary works based on the following intuition: restricting the server queue to only one HTTP/2 object at any point in time will eliminate multiplexing of that object and any privacy benefit thereof. In our scheme, we begin by studying whether (1) packet delays, (2) network jitter, (3) bandwidth limitation, and (4) targeted packet drops have an impact on the number of HTTP/2 objects processed by the server at a given instant. Based on these insights, we design our adversary, which forces the server to serialize object transmissions, thereby completing the attack. Our adversary was able to break the privacy of a real-world HTTP/2 website 90% of the time; the code for the attack will be released. To the best of our knowledge, this is the first privacy attack on HTTP/2.
  • Publication
    Shakti-MS: A RISC-V processor for memory safety in C
    (23-06-2019)
    Das, Sourav; Harikrishnan Unnithan, R.; Menon, Arjun
    In this era of IoT devices, security is very often traded off for smaller device footprint and low power consumption. Considering the exponentially growing security threats to IoT and cyber-physical systems, it is important that these devices have built-in features that enhance security. In this paper, we present Shakti-MS, a lightweight RISC-V processor with built-in support for both temporal and spatial memory protection. At run time, Shakti-MS can detect and stymie memory misuse in C and C++ programs with minimal runtime overhead. The solution uses a novel implementation of fat-pointers to efficiently detect misuse of pointers at runtime. Our proposal is to use stack-based cookies for crafting fat-pointers instead of having object-based identifiers. We store the fat-pointer on the stack, which eliminates the use of shadow memory space, or any table to store the pointer metadata. This reduces the storage overheads to a great extent. The cookie also helps to preserve the control flow of the program by ensuring that the return address never gets modified by vulnerabilities like buffer overflows. Shakti-MS introduces new instructions in the microprocessor hardware, and also a modified compiler that automatically inserts these new instructions to enable memory protection. This co-design approach is intended to reduce runtime and area overheads, and also provides an end-to-end solution. The hardware has an area overhead of 700 LUTs on a Xilinx Virtex Ultrascale FPGA and 4100 cells on an open 55nm technology node. The clock frequency of the processor is not affected by the security extensions, while the code size increases marginally, by 11%, with an average runtime overhead of 13%.
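The general fat-pointer idea mentioned in the abstract above can be illustrated with a small software model. This is only a sketch under stated assumptions: the names `FatPointer`, `Memory` and `MemorySafetyError`, the bump allocator, and the choice to keep one cookie cell directly in front of each object are all hypothetical and are not Shakti-MS's instructions, pointer encoding, or stack layout. The sketch shows just the two checks a fat pointer enables: a bounds check (spatial safety) and a cookie comparison (temporal safety), with the cookie kept in memory next to the object rather than in a separate table.

```python
from dataclasses import dataclass


class MemorySafetyError(Exception):
    """Raised when a checked access fails a bounds or cookie check."""


@dataclass(frozen=True)
class FatPointer:
    base: int         # first valid address of the object
    bound: int        # one past the last valid address
    cookie_addr: int  # where the object's cookie lives in memory
    cookie: int       # cookie value expected at cookie_addr


class Memory:
    """Toy word-addressed memory with a bump allocator (illustrative only)."""

    def __init__(self, size):
        self.cells = [0] * size
        self.next_free = 0
        self.next_cookie = 1

    def alloc(self, length):
        # Reserve one extra cell in front of the object to hold its cookie
        # (a hypothetical layout chosen for this sketch).
        cookie_addr = self.next_free
        base = cookie_addr + 1
        if base + length > len(self.cells):
            raise MemoryError("toy memory exhausted")
        self.next_free = base + length
        cookie = self.next_cookie
        self.next_cookie += 1
        self.cells[cookie_addr] = cookie
        return FatPointer(base, base + length, cookie_addr, cookie)

    def free(self, ptr):
        # Clearing the cookie makes every stale copy of ptr fail its check.
        self.cells[ptr.cookie_addr] = 0

    def _checked_address(self, ptr, offset):
        addr = ptr.base + offset
        if not ptr.base <= addr < ptr.bound:
            raise MemorySafetyError("spatial violation: access outside object bounds")
        if self.cells[ptr.cookie_addr] != ptr.cookie:
            raise MemorySafetyError("temporal violation: object is no longer live")
        return addr

    def load(self, ptr, offset):
        return self.cells[self._checked_address(ptr, offset)]

    def store(self, ptr, offset, value):
        self.cells[self._checked_address(ptr, offset)] = value
```

In this model, `mem.store(p, 8, 1)` on an 8-cell object raises `MemorySafetyError` (an overflow by one), and any access through `p` after `mem.free(p)` raises it as well (a use after free).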
  • Publication
    Sparsity-Aware Caches to Accelerate Deep Neural Networks
    (01-03-2020)
    Ganesan, Vinod; Sen, Sanchari; Kumar, Pratyush; Gala, Neel; Raghunathan, Anand
    Deep Neural Networks (DNNs) have transformed the field of artificial intelligence and represent the state-of-the-art in many machine learning tasks. There is considerable interest in using DNNs to realize edge intelligence in highly resource-constrained devices such as wearables and IoT sensors. Unfortunately, the high computational requirements of DNNs pose a serious challenge to their deployment in these systems. Moreover, due to tight cost (and hence, area) constraints, these devices are often unable to accommodate hardware accelerators, requiring DNNs to execute on the General Purpose Processor (GPP) cores that they contain. We address this challenge through lightweight micro-architectural extensions to the memory hierarchy of GPPs that exploit a key attribute of DNNs, viz. sparsity, or the prevalence of zero values. We propose SparseCache, an enhanced cache architecture that utilizes a null cache based on a Ternary Content Addressable Memory (TCAM) to compactly store zero-valued cache lines, while storing non-zero lines in a conventional data cache. By storing addresses rather than values for zero-valued cache lines, SparseCache increases the effective cache capacity, thereby reducing the overall miss rate and execution time. SparseCache utilizes a Zero Detector and Approximator (ZDA) and Address Merger (AM) to perform reads and writes to the null cache. We evaluate SparseCache on four state-of-the-art DNNs programmed with the Caffe framework. SparseCache achieves a 5-28% reduction in miss rate, which translates to a 5-21% reduction in execution time, with only 0.1% area and 3.8% power overhead in comparison to a low-end Intel Atom Z-series processor.
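To make the capacity argument in the abstract above concrete, the following toy model counts misses for a data cache alone and for a data cache paired with an address-only null cache. It is a sketch under several assumptions that do not come from the paper: both structures are fully associative with LRU replacement, each line's zero or non-zero status is fixed for the whole trace, and the TCAM, ZDA and AM hardware are not modeled. The names `LRUSet` and `count_misses` are hypothetical.

```python
from collections import OrderedDict


class LRUSet:
    """Fixed-capacity set of line addresses with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def touch(self, addr):
        """Return True on a hit; on a miss, insert addr and evict if full."""
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return True
        if self.capacity == 0:
            return False
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # drop the least recently used line
        self.lines[addr] = True
        return False


def count_misses(trace, data_lines, null_entries=0):
    """Count misses over a trace of (line_address, line_is_all_zero) pairs.

    With null_entries == 0 every line competes for the data cache. Otherwise
    zero-valued lines are tracked by address only in the null cache, leaving
    the data cache to non-zero lines.
    """
    data_cache = LRUSet(data_lines)
    null_cache = LRUSet(null_entries)
    misses = 0
    for addr, is_zero in trace:
        target = null_cache if (is_zero and null_entries > 0) else data_cache
        if not target.touch(addr):
            misses += 1
    return misses


# Hypothetical trace: six lines accessed cyclically, half of them all-zero.
trace = [(addr, addr < 3) for _ in range(3) for addr in range(6)]
baseline = count_misses(trace, data_lines=4)
with_null = count_misses(trace, data_lines=4, null_entries=4)
```

On this trace the four-line data cache thrashes when it must hold all six lines, whereas routing the zero-valued lines to the null cache leaves only compulsory misses.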
  • Publication
    The implications of shared data synchronization techniques on multi-core energy efficiency
    (01-01-2012)
    Gautham, Ashok; Korgaonkar, Kunal; Slpsk, Patanjali
    Shared data synchronization is at the heart of the multicore revolution since it is essential for writing concurrent programs. Ideally, a synchronization technique should be able to fully exploit the available cores, leading to improved performance. However, with the growing demand for energy-efficient systems, it also needs to work within the energy and power budget of the system. In this paper, we perform a detailed study of the performance as well as energy efficiency of popular shared-data synchronization techniques on a commodity multicore processor. We show that Software Transactional Memory (STM) systems can perform better than locks for workloads where a significant portion of the running time is spent in the critical sections. We also show how power-conserving techniques available on modern processors like C-states and clock frequency scaling impact energy consumption and performance. Finally, we compare the performance of STMs and locks under similar power budgets.
  • Publication
    A universal random test generator for functional verification of microprocessors and system-on-chip
    (01-12-2005)
    Uday Bhaskar, K.; Prasanth, M.; Chandramouli, G.
    This paper presents a Universal Random Test Generator template for the Design Verification of Microprocessors and Systems-on-Chip (SoCs). The tool enables verification of the product in one continuous, integrated environment, from C model to behavioral RTL and gate to system-level integration, all in one self-contained chassis. Because of the complexity of large designs, it has been common practice to rely on randomization to expose corner cases that arise in reality but that a human would not conceive of. Random tools used for testing products with diverse functionalities share many common features. This paper proposes a template that captures the commonalities among the different random testing tools and enables the user to quickly design a random test generator by adding product-specific details and using most of the methods available in the template. This leads to a high degree of code reuse, less debugging of the random tool, and a large reduction in design-cycle time. In addition, the template provides enough flexibility and interfaces to enable the execution of the generated tests on targets which may be a C model, RTL, or the final chip. In this way, one may test a software component, say boot-up code for the System-on-Chip or Microprocessor, at all stages of its design, namely, the software prototype, the RTL at the pre-silicon level, and finally the chip at the post-silicon level. This satisfies the expectations of a verification platform for a Hardware-Software Codesign environment. The Random Test Generator template was employed for testing an x86-compatible Microprocessor at both the RTL and post-silicon stages, and a software model of an 802.11 MAC. The results are presented in the paper. © 2005 IEEE.
  • Publication
    Controllability-driven power virus generation for digital circuits
    (01-12-2007)
    Najeeb, K.; Gururaj, Karthik; Vedula, Vivekanand M.
    Peak power estimation in CMOS circuits is essential for analyzing the reliability and performance of circuits at extreme conditions. The Power Virus problem involves finding input vectors that cause maximum dynamic power dissipation (maximum toggles) in circuits. In this paper, an approach for power virus generation for both combinational and sequential circuits is presented. The basic intuition behind the approach is to use the 0- and 1-controllability measures of the gate outputs in the circuit to guide the D-Algorithm. The proposed technique was employed on the ISCAS'85 and ISCAS'89 circuits. The results show a significant increase in power dissipation when compared to the best known existing techniques reported in the literature. © 2007 IEEE.
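The objective stated in the abstract above, finding input vectors that maximize toggles, can be illustrated on a tiny circuit by exhaustive search. This sketch is not the paper's method: it replaces the controllability-guided D-Algorithm with brute force, uses a zero-delay model (no glitches), counts only gate-output toggles, and the four-gate circuit in `GATES` is a hypothetical example rather than an ISCAS benchmark.

```python
from itertools import product

GATE_FUNCS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "NAND": lambda a, b: 1 - (a & b),
}

# Hypothetical 3-input combinational circuit; gates are in topological order.
INPUTS = ["a", "b", "c"]
GATES = [
    ("n1", "AND", "a", "b"),
    ("n2", "XOR", "b", "c"),
    ("n3", "OR", "n1", "n2"),
    ("n4", "NAND", "n3", "c"),
]


def evaluate(vector):
    """Return the value of every net for one input vector."""
    values = dict(zip(INPUTS, vector))
    for out, op, x, y in GATES:
        values[out] = GATE_FUNCS[op](values[x], values[y])
    return values


def toggles(v1, v2):
    """Count gate outputs that change when the inputs switch from v1 to v2."""
    before, after = evaluate(v1), evaluate(v2)
    return sum(before[out] != after[out] for out, *_ in GATES)


def worst_case_pair():
    """Exhaustively search all vector pairs for the maximum toggle count."""
    vectors = list(product((0, 1), repeat=len(INPUTS)))
    candidates = ((toggles(v1, v2), v1, v2) for v1 in vectors for v2 in vectors)
    return max(candidates, key=lambda item: item[0])
```

Exhaustive search grows as 4^n in the number of inputs n, which is why guided search is needed for circuits of realistic size.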
  • Publication
    SER mitigation technique through selective flip-flop replacement
    (21-09-2015)
    Torvi, Pavan Vithal; Devanathan, V. R.; Vanjari, Ashish
    Advances in the semiconductor manufacturing process have reduced device dimensions, which in turn has reduced the design and manufacturing costs of Integrated Chips (ICs). This has accelerated IC penetration in automobiles, health care, and safety-critical systems. However, the smaller device dimensions have made ICs vulnerable to soft errors. The sequential cells in a given design contribute significantly to its soft-error rate (SER). Some soft errors get masked and do not cause any adverse impact. The masking can occur due to logic or timing reasons. This paper presents a flow that uses the Timing Vulnerability Factor (TVF) and Architecture Vulnerability Factor (AVF) of the sequential instances in a given design to reduce its SER. The paper proposes a novel method to efficiently compute the TVF and AVF parameters, followed by a linear programming technique that uses these parameters to reduce the SER of the given design. Using the proposed technique, we have reduced the sequential cell contribution to the SER of an in-house IP design by 36% at the cost of a 9% increase in sequential cell area.
  • Publication
    Impact of temperature on test quality
    (31-03-2010)
    Jagan, Lavanya; Hora, Camelia; Kruseman, Bram; Eichenberger, Stefan; Majhi, Ananta K.
    The use of more advanced, less mature processes in the manufacturing of semiconductor devices has increased the need for unconventional types of testing, such as temperature testing, in order to maintain the same high quality levels. However, temperature testing is costly. This paper proposes a viable low-cost alternative to temperature testing that quantifies the impact of temperature variations on test quality and also determines optimal test conditions. The proposed test flow is empirically validated on an industrial-standard die. The results show that the majority of the defects originally detected by temperature testing are also detected by the proposed test flow, thereby reducing the dependence on temperature testing to achieve zero-defect quality. Details of an interesting defect behavior at cold test conditions are also presented. © 2010 IEEE.
  • Publication
    Placement and routing for 3D-FPGAs using reinforcement learning and support vector machines
    (01-12-2005)
    Manimegalai, R.; Soumya, E. Siva; Muralidharan, V.; Ravindran, B.; Bhatia, D.
    The primary advantage of a 3D-FPGA over a 2D-FPGA is that the vertical stacking of active layers reduces the Manhattan distance between components compared with placing them on a 2D-FPGA. This results in a considerable reduction in total interconnect length. Reduced wire length in turn leads to lower delay and hence improved performance and speed. The design of an efficient placement and routing algorithm for 3D-FPGAs that fully exploits this advantage is a problem of deep research and commercial interest. In this paper, an efficient placement and routing algorithm is proposed for 3D-FPGAs that yields better results in terms of total interconnect length and channel width. The proposed algorithm employs two important techniques, namely, Reinforcement Learning (RL) and Support Vector Machines (SVMs), to perform the placement. The proposed algorithm is implemented and tested on standard benchmark circuits, and the results obtained are encouraging. This is one of the very few instances where reinforcement learning is used for solving a problem in the area of VLSI. © 2005 IEEE.
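The Manhattan-distance claim in the abstract above can be checked with a short calculation. The sketch below compares the same number of sites arranged as a flat 8×8 grid and as a 4×4×4 stack. The grid sizes, the function names, and the assumption that a vertical hop between layers costs the same as a horizontal hop are all illustrative choices and do not come from the paper.

```python
from itertools import combinations, product


def grid_sites(*dims):
    """Enumerate integer coordinates of every site in a grid of the given size."""
    return list(product(*(range(d) for d in dims)))


def manhattan(p, q):
    """Manhattan distance between two sites of equal dimensionality."""
    return sum(abs(a - b) for a, b in zip(p, q))


def distance_stats(sites):
    """Return (maximum, mean) Manhattan distance over all unordered site pairs."""
    dists = [manhattan(p, q) for p, q in combinations(sites, 2)]
    return max(dists), sum(dists) / len(dists)


# 64 sites each: one flat layer versus four stacked layers (unit-cost vias assumed).
max_2d, mean_2d = distance_stats(grid_sites(8, 8))
max_3d, mean_3d = distance_stats(grid_sites(4, 4, 4))
```

Under these assumptions the worst-case corner-to-corner distance falls from 14 hops in the flat grid to 9 in the stack, and the mean pairwise distance falls as well, which is the geometric effect that a 3D placement and routing algorithm tries to exploit.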