IMPROVING STRUCTURED TEXT RECOGNITION WITH YOLO NEURAL NETWORK

Keywords: YOLO, OCR, text recognition, neural networks, image processing.

Abstract

This paper explores how integrating the YOLO (You Only Look Once) neural network with OCR (Optical Character Recognition) technology can improve the efficiency of structured text recognition. It also describes an automated information system that first detects text objects and then recognizes them. The developed system is a web application that lets users upload invoices, store their data, and receive insights into expenses and client interactions. It comprises a YOLOv10 neural network trained on a dataset of 500 invoice images, a REST API for user interaction, a user interface for invoice uploading, and a MySQL database that stores information about users and their invoices. The authors propose a multithreaded architecture model that uses recurrent and two- and three-dimensional convolutional neural networks. The software, which implements algorithms for optical flow calculation and frequency analysis of characters, is written in Python using the Ultralytics, Pytesseract, Python Imaging Library, and Flask libraries. HTML, CSS, and JavaScript are used for the user interface, and MySQL serves as the database management system.

The system's main feature is the integration of the YOLO and OCR models to ensure accurate and fast recognition of text objects in images. The system architecture follows the MVC (Model-View-Controller) pattern: the model handles data and logic, the view displays data to users, and the controller mediates between the view and the model. This separation of roles keeps the system well-structured and easy to modify. Additional service layers handle business logic and routing, and Flask's Blueprint tool organizes the application into smaller components and URL structures.
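The detect-then-recognize integration described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Detection`, `RecognizedField`, `recognize_fields`, the class labels, and the `min_confidence` threshold are assumptions made for illustration. The detector, the cropping step, and the OCR step are passed in as plain callables, so a toy text "image" can stand in for a real invoice; in a real system they might wrap an Ultralytics YOLO model, `PIL.Image.Image.crop`, and `pytesseract.image_to_string`, respectively.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), end-exclusive


@dataclass
class Detection:
    label: str         # hypothetical class name, e.g. "total"
    box: Box           # region proposed by the detector
    confidence: float  # detector score in [0, 1]


@dataclass
class RecognizedField:
    label: str
    box: Box
    text: str


def recognize_fields(
    image: Any,
    detect: Callable[[Any], Iterable[Detection]],
    crop: Callable[[Any, Box], Any],
    read_text: Callable[[Any], str],
    min_confidence: float = 0.5,  # assumed threshold, not from the paper
) -> List[RecognizedField]:
    """Run OCR only on the regions that the detector found."""
    fields = []
    for det in detect(image):
        if det.confidence < min_confidence:
            continue  # skip weak detections instead of feeding them to OCR
        region = crop(image, det.box)
        text = read_text(region).strip()
        fields.append(RecognizedField(det.label, det.box, text))
    return fields


# Toy stand-ins: the "image" is a list of text rows, so the sketch runs
# without a trained model or an OCR engine.
toy_image = [
    "No: 0042   ",
    "Total 19.90",
]


def toy_detect(image: Any) -> List[Detection]:
    return [
        Detection("invoice_number", (4, 0, 8, 1), 0.9),
        Detection("total", (6, 1, 11, 2), 0.8),
        Detection("noise", (0, 0, 2, 1), 0.2),
    ]


def toy_crop(image: List[str], box: Box) -> List[str]:
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def toy_read_text(region: List[str]) -> str:
    return "\n".join(region)


fields = recognize_fields(toy_image, toy_detect, toy_crop, toy_read_text)
for field in fields:
    print(field.label, "->", field.text)
```

Passing the three steps in as callables keeps the pipeline independent of any particular model or OCR engine, which also makes it easy to test in isolation.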
Overall, the system is well-structured, ensuring efficient data handling and a user-friendly interface. The analysis of text recognition results showed high OCR accuracy, particularly on structured text, though some limitations were observed, such as disruption of the original text structure. These drawbacks can be mitigated by combining the YOLO network with OCR technology: YOLO detects text objects more precisely, and each detected object is then recognized separately. There is still room for improvement, specifically in refining the object detection and text recognition algorithms to achieve greater accuracy and processing speed.
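As an illustration of how detected boxes could help preserve the original text structure, the sketch below groups recognized fragments into lines by vertical position and orders each line left to right. This is an assumed post-processing step, not a method taken from the paper; the function name `group_into_lines`, the `tolerance` parameter, and the sample coordinates are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Fragment = Tuple[Box, str]               # a detected box and its OCR text


def _center_y(box: Box) -> float:
    return (box[1] + box[3]) / 2


def group_into_lines(fragments: List[Fragment], tolerance: float = 0.5) -> List[str]:
    """Rebuild text lines from boxes: top to bottom, then left to right.

    A fragment joins the current line when its vertical center lies within
    `tolerance` times its own height of the center of that line's first box.
    """
    lines: List[Tuple[float, List[Fragment]]] = []
    for box, text in sorted(fragments, key=lambda f: _center_y(f[0])):
        height = box[3] - box[1]
        if lines and abs(_center_y(box) - lines[-1][0]) <= tolerance * height:
            lines[-1][1].append((box, text))
        else:
            lines.append((_center_y(box), [(box, text)]))
    # Within each line, order fragments by their left edge.
    return [
        " ".join(text for _, text in sorted(items, key=lambda it: it[0][0]))
        for _, items in lines
    ]


# Fragments arrive in arbitrary order, as they might from a detector.
sample = [
    ((200, 12, 260, 30), "19.90"),
    ((10, 10, 80, 30), "Total"),
    ((10, 50, 60, 70), "VAT"),
    ((200, 51, 250, 69), "3.98"),
]

for line in group_into_lines(sample):
    print(line)
```

The fixed tolerance works for roughly horizontal, single-column layouts; skewed scans or multi-column invoices would likely need a more careful grouping rule.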

Published
2024-12-17
How to Cite
Zinchenko, A. Y., & Khaidurov, V. V. (2024). IMPROVING STRUCTURED TEXT RECOGNITION WITH YOLO NEURAL NETWORK. Systems and Technologies, 68(2), 23-31. Retrieved from https://st.umsf.in.ua/index.php/journal/article/view/155
Section
COMPUTER SCIENCES