Efficient knowledge distillation of teacher model to multiple student models

Date Issued
27-07-2021
Author(s)
Gl, Thrivikram
Ganesh, Vidya
Sethuraman, T. V.
Perepu, Satheesh K.
DOI
10.1109/IAICT52856.2021.9532543
Abstract
Deep learning models have proven capable of learning complex non-linear relationships between a set of input features and different task outputs. However, they are memory intensive and require substantial computational power for both training and inference. The literature offers various model compression techniques that enable easy deployment on edge devices. Knowledge distillation is one such approach, in which the knowledge of a complex teacher model is transferred to a student model with fewer parameters. However, a limitation is that the student architecture must be comparable to that of the complex teacher for effective knowledge transfer. Because of this, a student that learns from a very large teacher may still be too large to deploy on edge devices. In this work, we propose a combined-student approach in which several student models learn from a common teacher model. Further, we propose a unique loss function that trains multiple student models simultaneously. An advantage of this approach is that the student models can be much simpler than both a traditional single student model and the complex teacher model. Finally, we provide an extensive evaluation showing that our approach can significantly improve overall accuracy and allow a further 10% compression compared with a generic model.
Subjects
  • Knowledge distillation
  • Model compression
  • Multiple student models
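The abstract describes training several student models from one teacher with a single combined loss. The sketch below illustrates one plausible reading of that scheme; it assumes PyTorch and standard soft-target distillation (KL divergence at a temperature T plus hard-label cross-entropy), and it simply sums the per-student losses so all students are trained at once. The paper's actual loss function and training loop are not specified in this record, so names like `distillation_loss`, `combined_multi_student_loss`, and `train_step` are hypothetical.

```python
# Minimal sketch (not the authors' code) of distilling one teacher into
# several student models with a combined loss, assuming PyTorch.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.7):
    """Per-student loss: KL to the teacher's softened outputs + hard-label CE."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard

def combined_multi_student_loss(students_logits, teacher_logits, targets):
    """Sum the per-student losses so every student is trained simultaneously."""
    return sum(
        distillation_loss(logits, teacher_logits, targets)
        for logits in students_logits
    )

def train_step(teacher, students, optimizer, x, y):
    """One training step: `students` is a list of nn.Modules and `optimizer`
    covers the parameters of all of them (assumed to exist)."""
    teacher.eval()
    with torch.no_grad():
        t_logits = teacher(x)                  # teacher outputs, no gradient
    s_logits = [s(x) for s in students]        # forward pass for every student
    loss = combined_multi_student_loss(s_logits, t_logits, y)
    optimizer.zero_grad()
    loss.backward()                            # one backward pass updates all students
    optimizer.step()
    return loss.item()
```

In this reading, each student can use a different, much smaller architecture, since the shared teacher targets and the summed loss are the only coupling between them.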
