Effect of time-domain windowing on isolated speech recognition system performance

Ananthakrishna Thalengala; Anitha Hoblidar; Girisha S Tumkur

Authors

Ananthakrishna Thalengala, Manipal Academy of Higher Education (MAHE), http://orcid.org/0000-0003-1916-493X
Anitha Hoblidar, Manipal Academy of Higher Education (MAHE), http://orcid.org/0000-0001-5898-4442
Girisha S Tumkur, Manipal Academy of Higher Education (MAHE), http://orcid.org/0000-0003-1563-0040

Abstract

A speech recognition system extracts textual data from the speech signal. Research in the speech recognition domain is challenging because of the large variability inherent in the speech signal. A variety of signal processing and machine learning techniques have been explored to achieve better recognition accuracy. Speech is highly non-stationary, and analysis is therefore carried out over a short time-domain window, or frame. In the speech recognition task, cepstral features, namely Mel frequency cepstral coefficients (MFCC), are commonly used and are extracted for each short time frame. The effectiveness of the features depends on the duration of the time window chosen. The present study investigates the optimal time-window duration for extracting cepstral features in the context of the speech recognition task. A speaker-independent speech recognition system for the Kannada language has been considered for the analysis. In the current work, speech utterances from a Kannada news corpus recorded from different speakers have been used to create the speech database. The Hidden Markov Model Toolkit (HTK) has been used to implement the speech recognition system. The MFCCs, along with their first and second derivative coefficients, are used as feature vectors. The pronunciation dictionary required for the study has been built manually for a monophone system. Experiments have been carried out and the results analyzed for different time-window lengths. An overlapping Hamming window has been used in this study. The best average word recognition accuracy of 61.58% has been obtained for a window length of 110 ms. This recognition accuracy is comparable with that of similar work found in the literature. The experiments have shown that the best word recognition performance can be achieved by tuning the window length to its optimum value.
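
The short-time analysis described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' HTK configuration: it only shows how a signal might be cut into overlapping Hamming-windowed frames of 110 ms and how first and second derivative coefficients could be appended to a matrix of static features. The 16 kHz sample rate, the 10 ms frame shift, the regression half-width `theta=2`, and the function names are all assumptions made for illustration; the abstract does not state them.

```python
import numpy as np


def frame_signal(signal, sample_rate, win_ms=110.0, hop_ms=10.0):
    """Split a 1-D signal into overlapping Hamming-windowed frames.

    win_ms=110.0 matches the best window length reported in the abstract;
    hop_ms=10.0 is an assumed frame shift, not taken from the paper.
    """
    win_len = int(round(sample_rate * win_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    signal = np.asarray(signal, dtype=float)
    if len(signal) < win_len:
        # Zero-pad short signals so that at least one frame is produced.
        signal = np.pad(signal, (0, win_len - len(signal)))
    n_frames = 1 + (len(signal) - win_len) // hop_len
    window = np.hamming(win_len)
    return np.stack([
        signal[i * hop_len: i * hop_len + win_len] * window
        for i in range(n_frames)
    ])


def deltas(features, theta=2):
    """Regression-based time derivative of a (frames x dims) feature matrix.

    Uses d_t = sum_k k * (c_{t+k} - c_{t-k}) / (2 * sum_k k^2), k = 1..theta,
    with the first and last frames replicated at the edges.
    theta=2 is an assumed half-width.
    """
    features = np.asarray(features, dtype=float)
    n = len(features)
    padded = np.pad(features, ((theta, theta), (0, 0)), mode="edge")
    denom = 2.0 * sum(k * k for k in range(1, theta + 1))
    out = np.zeros_like(features)
    for k in range(1, theta + 1):
        ahead = padded[theta + k: theta + k + n]
        behind = padded[theta - k: theta - k + n]
        out += k * (ahead - behind)
    return out / denom


def add_derivatives(static):
    """Stack static features with their first and second derivatives."""
    d1 = deltas(static)
    d2 = deltas(d1)
    return np.hstack([static, d1, d2])
```

With these assumed settings, one second of audio at 16 kHz gives frames of 1760 samples taken every 160 samples. Longer windows improve frequency resolution but average over more of the non-stationary signal, which is the trade-off the study explores by varying the window length.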

Author Biographies

Ananthakrishna Thalengala, Manipal Academy of Higher Education (MAHE)

Department of Electronics and Communication Engineering, Manipal Institute of Technology (MIT)

Anitha Hoblidar, Manipal Academy of Higher Education (MAHE)

Department of Electronics and Communication Engineering, Manipal Institute of Technology (MIT)

Girisha S Tumkur, Manipal Academy of Higher Education (MAHE)

Department of Electronics and Communication Engineering, Manipal Institute of Technology (MIT)

Published

2024-04-19

Issue

Vol. 68 No. 1 (2022)

Section

Digital Signal Processing

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

1. License

The non-commercial use of the article will be governed by the Creative Commons Attribution license as currently displayed on https://creativecommons.org/licenses/by/4.0/.

2. Author’s Warranties

The author warrants that the article is original, written by stated author/s, has not been published before, contains no unlawful statements, does not infringe the rights of others, is subject to copyright that is vested exclusively in the author and free of any third party rights, and that any necessary written permissions to quote from other sources have been obtained by the author/s. The undersigned also warrants that the manuscript (or its essential substance) has not been published other than as an abstract or doctorate thesis and has not been submitted for consideration elsewhere, for print, electronic or digital publication.

3. User Rights

Under the Creative Commons Attribution license, the author(s) and users are free to share (copy, distribute and transmit the contribution) under the following conditions: 1. they must attribute the contribution in the manner specified by the author or licensor, 2. they may alter, transform, or build upon this work, 3. they may use this contribution for commercial purposes.

4. Rights of Authors

Authors retain the following rights:

- copyright, and other proprietary rights relating to the article, such as patent rights,

- the right to use the substance of the article in own future works, including lectures and books,

- the right to reproduce the article for own purposes, provided the copies are not offered for sale,

- the right to self-archive the article

- the right to supervision over the integrity of the content of the work and its fair use.

5. Co-Authorship

If the article was prepared jointly with other authors, the signatory of this form warrants that he/she has been authorized by all co-authors to sign this agreement on their behalf, and agrees to inform his/her co-authors of the terms of this agreement.

6. Termination

This agreement can be terminated by the author or the Journal Owner upon two months’ notice where the other party has materially breached this agreement and failed to remedy such breach within a month of being given the terminating party’s notice requesting such breach to be remedied. No breach or violation of this agreement will cause this agreement or any license granted in it to terminate automatically or affect the definition of the Journal Owner. The author and the Journal Owner may agree to terminate this agreement at any time. This agreement or any license granted in it cannot be terminated otherwise than in accordance with this section 6. This License shall remain in effect throughout the term of copyright in the Work and may not be revoked without the express written consent of both parties.

7. Royalties

This agreement entitles the author to no royalties or other fees. To such extent as legally permissible, the author waives his or her right to collect royalties relative to the article in respect of any use of the article by the Journal Owner or its sublicensee.

8. Miscellaneous

The Journal Owner will publish the article (or have it published) in the Journal if the article’s editorial process is successfully completed and the Journal Owner or its sublicensee has become obligated to have the article published. Where such obligation depends on the payment of a fee, it shall not be deemed to exist until such time as that fee is paid. The Journal Owner may conform the article to a style of punctuation, spelling, capitalization and usage that it deems appropriate. The Journal Owner will be allowed to sublicense the rights that are licensed to it under this agreement. This agreement will be governed by the laws of Poland.

By signing this License, Author(s) warrant(s) that they have the full power to enter into this agreement. This License shall remain in effect throughout the term of copyright in the Work and may not be revoked without the express written consent of both parties.
