PYTHON SPEECH RECOGNITION API COMPARISON

Keywords: speech recognition, API, Speech-to-Text, Speech Service, language models

Abstract

As computer systems develop, it is becoming increasingly clear that speech recognition will be used far more widely once human speech can be used to work directly with a computer: controlling a machine by ordinary voice in real time, and entering and receiving information as natural human speech. A voice interface is an essential component of a comfortable living environment. Such systems are already part of everyday life, and they can also be used in production as part of actuator control systems.
A developer creating a voice command recognition system faces several problems: the lack of a mathematical model of speech signal semantics; the need to account for individual speaker characteristics (specific pronunciation, accent, etc.) when determining the semantics of the speech signal; handling spontaneous speech and detecting the presence of a keyword; and differences in the acoustic environment, noise, etc. Parameterization of the analog speech signal is the first step in the speech recognition process. Parameterization algorithms produce a parametric representation of the speech signal, using parameters that describe the behavior of the human auditory system. These algorithms are designed specifically to improve the performance of the speech recognition system. The preferred parameters are sets of spectral energies of the sound rather than details of a particular speaker’s voice.
This article compares the leading speech recognition APIs by examining their features, use cases, and performance metrics. The analysis aims to give developers a complete understanding of these technologies, emphasizing their advantages and limitations. Python was used to test these APIs with microphone input, offering insight into their latency, accuracy, and practical applications.
This study serves as a guide to selecting the best API for specific project requirements, with a visual representation of the results for clarity.
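
The kind of measurement described above can be illustrated with a minimal sketch. It assumes accuracy is scored as word error rate (WER), which the abstract does not specify, and it replaces the real API clients with a hypothetical `fake_transcribe` function so that it runs without network access or a microphone. The names `word_error_rate` and `benchmark` are likewise illustrative and are not taken from the article.

```python
import time
from typing import Callable, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def benchmark(transcribe: Callable[[bytes], str],
              audio: bytes,
              reference: str) -> Tuple[float, float]:
    """Return (latency in seconds, WER) for one transcription call."""
    start = time.perf_counter()
    hypothesis = transcribe(audio)
    latency = time.perf_counter() - start
    return latency, word_error_rate(reference, hypothesis)


def fake_transcribe(audio: bytes) -> str:
    # Hypothetical stand-in for a real API client; it ignores the audio.
    return "turn on the light"


if __name__ == "__main__":
    latency, wer = benchmark(fake_transcribe, b"", "turn on the lights")
    print(f"latency={latency:.4f}s WER={wer:.2f}")
```

To compare real services, each API client would be wrapped in a function with the same `bytes -> str` shape and passed to `benchmark` with the same recordings and reference transcripts, so that latency and WER are measured under identical conditions.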

Published
2025-06-09
How to Cite
Bezverkhyi, O. I., & Luts, V. E. (2025). PYTHON SPEECH RECOGNITION API COMPARISON. Systems and Technologies, 69(1), 51-57. https://doi.org/10.32782/2521-6643-2025-1-69.6
Section
COMPUTER SCIENCES