Comparative Analysis of Natural and Synthesized Polish Speech


  • Michał Daniluk, Warsaw University of Technology, Institute of Radiocommunication and Multimedia Technology
  • Agnieszka Paula Pietrzak, Warsaw University of Technology, Institute of Radiocommunication and Multimedia Technology


In the evolving field of speech synthesis, both intelligibility and naturalness remain important factors. This paper presents a comparative analysis of natural and synthesized Polish speech. Four speech synthesizers were examined: Ivona, Mekatron, Notevibes, and ttsmp3. Four methods were applied to assess the quality of synthesized speech and compare it with natural speech: the AB test, the mean opinion score (MOS), the logatom articulation test, and MUSHRA. Sentence databases and a logatom database were generated with each synthesizer and recorded for natural speech. The results indicate that natural speech was consistently better than synthesized speech. Among the synthesizers, Notevibes performed best in all comparisons, while Mekatron ranked lowest.
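Two of the measures named above reduce to simple arithmetic, which the following minimal Python sketch illustrates. It is not the authors' analysis code. It assumes a 1-to-5 MOS rating scale with a normal-approximation 95% confidence interval, and it assumes logatom articulation is scored as the percentage of logatoms a listener writes down correctly. The function names and all sample data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def mos(ratings):
    """Return (mean opinion score, 95% CI half-width) for 1-5 ratings.

    The interval uses a normal approximation (1.96 * standard error),
    which is an assumption of this sketch, not a detail from the paper.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    score = mean(ratings)
    # A single rating has no sample standard deviation, so report 0.0.
    half_width = 1.96 * stdev(ratings) / sqrt(len(ratings)) if len(ratings) > 1 else 0.0
    return score, half_width


def logatom_articulation(presented, reported):
    """Return the percentage of logatoms reported exactly as presented."""
    if not presented or len(presented) != len(reported):
        raise ValueError("lists must be non-empty and of equal length")
    correct = sum(p == r for p, r in zip(presented, reported))
    return 100.0 * correct / len(presented)


# Hypothetical listener data, for illustration only.
example_ratings = [4, 5, 3, 4, 4]
example_presented = ["bak", "tum", "sep", "lof"]
example_reported = ["bak", "tun", "sep", "lof"]

score, half_width = mos(example_ratings)
articulation = logatom_articulation(example_presented, example_reported)
print(f"MOS = {score:.2f} +/- {half_width:.2f}")
print(f"Logatom articulation = {articulation:.1f}%")
```

In practice, scores of this kind would be computed separately for each synthesizer and for the natural-speech recordings, so that the systems can be compared on a common scale.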


