Span Classification with Structured Information for Disfluency Detection in Spoken Utterances

Ghosh, Sreyan; Kumar, Sonal; Singla, Yaman Kumar; Shah, Rajiv Ratn; Umesh Srinivasan

doi:10.21437/Interspeech.2022-11242

Date Issued

01-01-2022

Author(s)

Ghosh, Sreyan

Kumar, Sonal

Singla, Yaman Kumar

Shah, Rajiv Ratn

Umesh Srinivasan

Indian Institute of Technology, Madras

DOI

10.21437/Interspeech.2022-11242

Abstract

Existing approaches to disfluency detection treat it as a token-level classification task for identifying and removing disfluencies in text. Moreover, most works leverage only the contextual information captured by linear sequences in text, ignoring the structured information that dependency trees capture efficiently. In this paper, building on the span classification paradigm of entity recognition, we propose a novel architecture for detecting disfluencies in transcripts of spoken utterances. It incorporates both contextual information, through transformers, and long-distance structured information captured by dependency trees, through graph convolutional networks (GCNs). Experimental results show that our proposed model achieves state-of-the-art results on the widely used English Switchboard dataset for disfluency detection and outperforms prior art by a significant margin. We make all our code publicly available on GitHub.
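
The two ideas named in the abstract, classifying candidate spans rather than individual tokens and passing information along dependency-tree edges, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the example head indices, the mean-aggregation step without learned weights, and the start/end concatenation used as a span representation are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def enumerate_spans(n_tokens: int, max_len: int) -> List[Tuple[int, int]]:
    """Return all candidate spans (start, end), end inclusive, up to max_len tokens."""
    return [
        (i, j)
        for i in range(n_tokens)
        for j in range(i, min(i + max_len, n_tokens))
    ]


def dependency_adjacency(heads: List[int]) -> List[List[float]]:
    """Build a symmetric adjacency matrix with self-loops from dependency heads.

    heads[i] is the index of token i's head; -1 marks the root (assumed encoding).
    """
    n = len(heads)
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child][head] = 1.0
            adj[head][child] = 1.0
    return adj


def gcn_layer(adj: List[List[float]], feats: List[List[float]]) -> List[List[float]]:
    """One simplified graph-convolution step.

    Each token's new feature is the mean over itself and its tree neighbours
    (row-normalised adjacency), followed by ReLU. A real GCN layer would also
    apply a learned weight matrix, which is omitted here.
    """
    dim = len(feats[0])
    out = []
    for row in adj:
        degree = sum(row)
        agg = [
            sum(row[j] * feats[j][d] for j in range(len(feats))) / degree
            for d in range(dim)
        ]
        out.append([max(0.0, x) for x in agg])
    return out


def span_representation(feats: List[List[float]], span: Tuple[int, int]) -> List[float]:
    """Represent a span by concatenating its start and end token features (an assumption)."""
    start, end = span
    return feats[start] + feats[end]


# Hypothetical three-token utterance whose middle token is the root.
heads = [1, -1, 1]
token_feats = [[1.0], [2.0], [3.0]]  # stand-in for transformer outputs

adj = dependency_adjacency(heads)
structured = gcn_layer(adj, token_feats)
spans = enumerate_spans(len(heads), max_len=2)
span_vectors = {s: span_representation(structured, s) for s in spans}
```

In a full model, each span vector would be passed to a classifier that labels the span as disfluent or fluent; the sketch stops at building the representations.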

Volume

2022-September
