Domain Adaptation of Low-Resource Target-Domain Models Using Well-Trained ASR Conformer Models
Date Issued
01-01-2023
Author(s)
Sukhadia, Vrunda N.
Indian Institute of Technology, Madras
Abstract
In the encoder-decoder framework for Automatic Speech Recognition (ASR) systems, the decoder of a well-trained ASR model is largely tuned towards the source domain, which hurts the performance of target-domain models under vanilla transfer learning. The encoder layers of the well-trained ASR model, on the other hand, mainly capture acoustic characteristics. In this paper, embeddings tapped from the encoder layers of a well-trained ASR model are used as features for domain adaptation of a downstream low-resource Conformer target-domain model. We conduct ablation studies on the optimal encoder layers for tapping embeddings and on the effect of freezing or updating the well-trained ASR model's encoder layers. Finally, applying Spectral Augmentation (SpecAug) to the proposed features further improves target-domain performance. The proposed method reports an average relative improvement of 40% over the baseline across different combinations of source-domain model and target-domain Conformer model.
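To illustrate the last step of the abstract, the following is a minimal sketch of SpecAug-style masking applied to a matrix of tapped encoder embeddings (frames by embedding dimensions). It is not the paper's implementation: the function name `spec_augment`, the mask counts and widths, and the choice to mask with zeros are all assumptions made for illustration, and only the masking part of SpecAug is shown (no time warping).

```python
import random


def spec_augment(features, num_time_masks=2, max_time_width=4,
                 num_dim_masks=2, max_dim_width=8, rng=None):
    """Return a masked copy of a T x D feature matrix (list of lists).

    All default parameter values are illustrative assumptions, not the
    settings used in the paper.
    """
    rng = rng or random.Random()
    num_frames = len(features)
    num_dims = len(features[0]) if num_frames else 0
    # Copy row by row so the caller's embeddings are left untouched.
    masked = [list(row) for row in features]

    # Time masking: zero out blocks of consecutive frames.
    for _ in range(num_time_masks):
        width = rng.randint(0, min(max_time_width, num_frames))
        start = rng.randint(0, num_frames - width)
        for t in range(start, start + width):
            masked[t] = [0.0] * num_dims

    # Feature masking: zero out blocks of consecutive embedding dimensions.
    for _ in range(num_dim_masks):
        width = rng.randint(0, min(max_dim_width, num_dims))
        start = rng.randint(0, num_dims - width)
        for row in masked:
            for d in range(start, start + width):
                row[d] = 0.0

    return masked


# Hypothetical input: 20 frames of 16-dimensional embeddings.
embeddings = [[1.0] * 16 for _ in range(20)]
augmented = spec_augment(embeddings, rng=random.Random(0))
```

In a training pipeline, masking of this kind would typically be applied on the fly to each training utterance and disabled at evaluation time, so that the target-domain model sees a differently masked view of the same embeddings in every epoch.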