TTLG - An efficient tensor transposition library for GPUs
Date Issued
03-08-2018
Author(s)
Vedurada, Jyothi
Suresh, Arjun
Rajam, Aravind Sukumaran
Kim, Jinsung
Hong, Changwan
Panyala, Ajay
Krishnamoorthy, Sriram
Indian Institute of Technology, Madras
Srivastava, Rohit Kumar
Sadayappan, P.
Abstract
This paper presents TTLG, a Tensor Transposition Library for GPUs. A distinguishing feature of TTLG is that it includes a performance prediction model, which can be used by higher-level optimizers that rely on tensor transposition. For example, tensor contractions are often implemented using the TTGT (Transpose-Transpose-GEMM-Transpose) approach: transpose the input tensors to a suitable layout, perform a high-performance matrix multiplication, and then transpose the result. The performance model is also used internally by TTLG to choose among alternative kernels and slicing/blocking parameters for the transposition. TTLG is compared with current state-of-the-art alternatives for GPUs, showing comparable or better transposition times in the "repeated-use" scenario and considerably better "single-use" performance.
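As a minimal illustration of the TTGT approach mentioned above, the sketch below performs a tensor contraction in plain NumPy via transpose, GEMM, and a final transpose. This is only an illustrative example of the general technique; it does not use the TTLG API, and the tensor shapes and index order are assumptions chosen for clarity.

```python
import numpy as np

# Illustrative TTGT (Transpose-Transpose-GEMM-Transpose) contraction.
# Example contraction: C[i,j] = sum_k A[k,i] * B[j,k]
# (equivalent to np.einsum('ki,jk->ij', A, B)).

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))   # A indexed as [k, i]
B = rng.standard_normal((5, 4))   # B indexed as [j, k]

# Step 1: transpose the inputs into a GEMM-friendly layout.
At = A.T                          # now [i, k]
Bt = B.T                          # now [k, j]

# Step 2: high-performance matrix multiplication (GEMM).
Ct = At @ Bt                      # [i, j]

# Step 3: transpose the result into the desired output layout
# (the identity here, since [i, j] is already the target layout).
C = Ct

# Verify against a direct einsum contraction.
assert np.allclose(C, np.einsum('ki,jk->ij', A, B))
```

In a GPU setting, the cost of the three transposition steps is exactly what a library like TTLG aims to minimize and predict, so that a higher-level optimizer can decide whether the TTGT path is worthwhile for a given contraction.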