METHOD FOR IDENTIFYING PROMISING RESEARCH TOPICS BASED ON HYBRID TRANSFORMER ARCHITECTURES USING SCIENTOMETRIC DATABASES

O. V. Moiseienko, Ivano-Frankivsk National Technical University of Oil and Gas, https://orcid.org/0000-0002-7995-2949

DOI: https://doi.org/10.32782/2521-6643-2026-2-72.19

Keywords: Artificial Intelligence, Deep Learning, Transformer, Cybersecurity, Scopus, Topic Detection, Natural Language Processing, Hybrid Models, Trend Forecasting

Abstract

The article addresses the challenge of automating the identification of promising research topics in large-scale scientometric databases. The exponential growth of scientific publications creates information overload, which complicates the discovery of latent trends. To address this, the study proposes a universal method based on a hybrid deep learning architecture that combines Convolutional Neural Networks (CNN) with Transformer layers. A CNN layer serves as a local feature extractor that identifies stable terminological patterns (n-grams) in article abstracts. It is followed by a cascade of three Transformer layers with a MultiHeadAttention mechanism, which models global semantic dependencies, ensuring high relevance in specialized domains. To handle the class imbalance typical of scientometric data (where "High Impact" topics account for approximately 28%), the Focal Loss function was used. It lets the model concentrate on "hard" examples, improving the detection of emerging breakthrough directions. The method was validated on a Scopus metadata dataset from the IoT Security domain comprising 4833 publications from 2020–2025. Experimental results show that the model achieves a Recall of 0.80 and an F1-score of 0.61, outperforming previous LSTM-based approaches. This balance of metrics is methodologically justified for research discovery tasks, where minimizing the risk of missing a "breakthrough" topic (a false negative, or Type II error) is the primary priority. The approach also includes a trend forecasting module based on linear regression over citation time series, which predicts a topic's future popularity. The developed solution is universal and can be adapted to any scientific or applied field to accelerate research discovery and support decision-making in grant allocation.
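
The role of Focal Loss in the abstract can be illustrated with a minimal sketch of its binary form, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). This is not the article's implementation: the function name `binary_focal_loss` is hypothetical, and the values gamma = 2.0 and alpha = 0.25 are commonly used defaults assumed here for illustration, since the abstract does not state the settings used.

```python
import math


def binary_focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss over a batch (illustrative sketch).

    y_true: sequence of 0/1 labels (1 = "High Impact", the minority class).
    p_pred: sequence of predicted probabilities for class 1.
    gamma, alpha: ASSUMED values, not taken from the article.
    """
    if len(y_true) == 0 or len(y_true) != len(p_pred):
        raise ValueError("y_true and p_pred must be non-empty and equal in length")

    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)            # clip to avoid log(0)
        p_t = p if y == 1 else 1.0 - p             # probability of the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha  # class-dependent weight
        # (1 - p_t)^gamma shrinks the loss of well-classified examples,
        # so "hard" examples dominate the average.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(y_true)


if __name__ == "__main__":
    easy = binary_focal_loss([1], [0.9])  # confident, correct prediction
    hard = binary_focal_loss([1], [0.1])  # confident, wrong prediction
    print(f"easy example loss: {easy:.6f}")
    print(f"hard example loss: {hard:.6f}")
```

With gamma = 0 the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which is one way to check the sketch against a known baseline.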

