
Improving the Performance of Transformer Based Low Resource Speech Recognition for Indian Languages

Date Issued
01-05-2020
Author(s)
Shetty, Vishwas M.
Sagaya Mary N J, Metilda
Umesh, S. 
Indian Institute of Technology, Madras
DOI
10.1109/ICASSP40776.2020.9053808
Abstract
The recent success of the Transformer-based sequence-to-sequence framework for various Natural Language Processing tasks has motivated its application to Automatic Speech Recognition. In this work, we explore the application of Transformers to low resource Indian languages in a multilingual framework. We explore various methods to incorporate language information into a multilingual Transformer: (i) at the decoder, and (ii) at the encoder. These methods include using language identity tokens or providing language information to the acoustic vectors. Language information can be given to the acoustic vectors in the form of a one-hot vector or by learning a language embedding. From our experiments, we observed that providing language identity always improved performance. The language embedding learned from our proposed approach, when added to the acoustic feature vector, gave the best result. The proposed approach with retraining gave 6%-11% relative improvements in character error rates over the monolingual baseline.
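The two encoder-side conditioning schemes mentioned in the abstract can be sketched as simple array operations. This is a minimal illustration, not the paper's implementation: the function names, dimensions, and the use of a random embedding table (standing in for a learned one) are all assumptions for the sake of the example.

```python
import numpy as np

def add_language_onehot(features, lang_id, num_langs):
    """Append a one-hot language vector to every acoustic frame.

    features: (T, d) array of acoustic feature vectors.
    Returns a (T, d + num_langs) array.
    """
    T = features.shape[0]
    onehot = np.zeros((T, num_langs))
    onehot[:, lang_id] = 1.0
    return np.concatenate([features, onehot], axis=1)

def add_language_embedding(features, lang_id, embedding_table):
    """Add a language embedding to every acoustic frame.

    embedding_table: (num_langs, d) array. Row lang_id is broadcast-added
    to each frame, so the feature dimensionality is unchanged. In the
    paper this table would be learned jointly with the model; here it is
    just a placeholder array.
    """
    return features + embedding_table[lang_id]

# Toy example: 4 frames of 5-dim features, 3 languages.
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 5))
table = rng.normal(size=(3, 5))

onehot_feats = add_language_onehot(feats, lang_id=1, num_langs=3)
embed_feats = add_language_embedding(feats, lang_id=1, embedding_table=table)
```

Note the design difference: the one-hot scheme widens the input vectors, while the additive embedding keeps the input dimensionality fixed, which is what allows it to be dropped into an existing acoustic front end without architectural changes.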
Volume
2020-May
Subjects
  • Automatic Speech Recognition
  • Low Resource
  • Multilingual
  • Transformer
Indian Institute of Technology Madras Knowledge Repository developed and maintained by the Library

Built with DSpace-CRIS software - Extension maintained and optimized by 4Science
