INVESTIGATION OF SPEECH DISFLUENCIES CLASSIFICATION ON DIFFERENT THRESHOLD SELECTION TECHNIQUES USING ENERGY FEATURE EXTRACTION

R. Hamzah¹ and N. Jamil²
¹,²Faculty of Computer and Mathematical Sciences, UiTM Shah Alam, Selangor, Malaysia

ABSTRACT

Filled pause and elongation are two types of speech disfluency that are frequently misclassified and therefore need more suitable acoustic features. This work concentrates on developing an accurate and robust energy feature extraction for modelling filled pause and elongation by investigating different energy features based on the local maxima points of the speech energy. Method: We extracted peak values from each frame of a voiced signal using different thresholding techniques to classify filled pause and elongation. Samples of sustained syllables and filled pauses in spontaneous speech were extracted from the Malaysian Parliamentary Debate Database of the year 2008. The energy features were evaluated with a statistical naïve Bayes classifier to assess their contribution to the classification process. We performed an F-measure evaluation to investigate significant differences in the means of the filled pause and elongation samples. Results: Our proposed LM-E feature (local maxima of the speech energy) improved classification, reaching F-measures of up to 71% for elongation and 75% for filled pause. Conclusion: The best accuracies achieved for filled pause and elongation classification varied with the thresholding technique applied when extracting the local maxima of the speech energy. The technique that contributed most was our proposed one, which uses an adaptive height as the threshold for extracting the local maxima of the speech energy (LM-E).
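
The peak-extraction idea described above can be illustrated with a minimal sketch. The abstract does not state how the adaptive height is computed, so the mean-based threshold below is only an assumption made for illustration, and the function names (frame_energies, adaptive_height, local_maxima) are hypothetical rather than taken from the paper.

```python
def frame_energies(signal, frame_len, hop):
    """Short-time energy: sum of squared samples in each frame."""
    return [
        sum(s * s for s in signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def adaptive_height(values, factor=1.0):
    """Assumed adaptive threshold: factor times the mean of the contour.

    This rule is an illustration only; the paper's actual rule is not
    given in the abstract.
    """
    return factor * sum(values) / len(values)


def local_maxima(values, height):
    """Indices of points above both neighbours and at least `height`."""
    return [
        i
        for i in range(1, len(values) - 1)
        if values[i] > values[i - 1]
        and values[i] > values[i + 1]
        and values[i] >= height
    ]


# Toy energy contour (made-up numbers, not data from the paper).
energies = [1, 5, 2, 8, 3, 2.5, 3, 1]
threshold = adaptive_height(energies)
peaks = local_maxima(energies, threshold)
```

In this toy contour the small bump at index 6 is a local maximum but falls below the mean-based threshold, so only the two larger peaks are kept. Features derived from the retained peaks could then be passed to a classifier such as naïve Bayes.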

Keywords: Filled Pause and Elongation, Naïve Bayes, Energy Feature Extraction, Automatic Speech Recognition.

Published On: 19 June 2019
