STOCHASTIC OPERATORS OF ACTION IMPACT: FORMALISATION AND MULTI-STEP REGULARISED LEARNING

B. Yu. Zaika; S. V. Yershov

B. Yu. Zaika (V. M. Glushkov Institute of Cybernetics of the NAS of Ukraine), https://orcid.org/0009-0001-9567-8361
S. V. Yershov (V. M. Glushkov Institute of Cybernetics of the NAS of Ukraine), https://orcid.org/0000-0002-9895-777X

Keywords: stochastic dynamical systems; action impact operator; recursive composition; multi-step regularisation; uncertainty calibration; long-horizon stability; machine learning

Abstract

This paper proposes a formalisation of action impact in stochastic dynamical systems as a dedicated stochastic operator acting on system states. Accurate modelling of action impact is an important problem in sequential decision-making under uncertainty, since in many real-world systems actions are applied repeatedly and their consequences propagate through system dynamics over time. While modern machine learning approaches, including reinforcement learning and conditional density estimation, can approximate short-term transitions, the behaviour of learned models under recursive multi-step application remains insufficiently studied. In most existing frameworks, transition dynamics are embedded within policy optimisation or trajectory prediction objectives and are rarely treated as independent modelling entities. In the proposed approach, the action impact operator maps the current system state and applied action to a conditional distribution of future states and is defined with explicit compositional structure. This enables the analysis of recursive operator application across multiple time steps. A learning objective is introduced that combines one-step negative log-likelihood with a multi-step consistency term derived from operator composition. The central hypothesis of the study is that one-step maximum likelihood training does not guarantee stable long-horizon behaviour when the learned operator is recursively applied. To investigate this hypothesis, empirical evaluation is conducted in a fully observable stochastic dynamical system using a minimal realisable linear Gaussian model. The empirical results show that purely one-step training leads to long-horizon degradation, including accumulation of trajectory error and systematic underestimation of predictive uncertainty. Introducing explicit multi-step regularisation significantly improves long-horizon stability and uncertainty calibration, and the improvement persists beyond the training horizon. 
The proposed formulation establishes a basis for modelling action impact in stochastic dynamical systems and offers a machine-learning framework for robust modelling of recursively applied transitions. It also lays a foundation for further research in partially observable environments, nonlinear architectures, and decision-support systems.
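
The ideas summarised in the abstract can be illustrated with a minimal sketch, assuming a scalar linear Gaussian system s' = a*s + b*u + w with w ~ N(0, q). This is not the authors' implementation: the function names (`apply_operator`, `compose`, `multi_step_loss`), the parameter values, and the choice of a k-step negative log-likelihood as the consistency term are all assumptions made for illustration, and the paper's actual objective may differ. The sketch shows the operator acting on a Gaussian state distribution, its recursive composition over a sequence of actions, and a loss that adds a weighted multi-step term to the one-step negative log-likelihood.

```python
import numpy as np


def apply_operator(mean, var, action, a, b, q):
    """One application of a (hypothetical) action impact operator.

    Maps a Gaussian N(mean, var) over the current state and an action to the
    Gaussian over the next state under s' = a*s + b*action + w, w ~ N(0, q).
    """
    return a * mean + b * action, a * a * var + q


def compose(state, actions, a, b, q):
    """Recursive composition: apply the operator once per action,
    starting from an exactly known state (zero variance)."""
    mean, var = state, 0.0
    for u in actions:
        mean, var = apply_operator(mean, var, u, a, b, q)
    return mean, var


def gaussian_nll(x, mean, var):
    """Negative log-likelihood of x under N(mean, var)."""
    return 0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def multi_step_loss(states, actions, a, b, q, horizon, lam):
    """One-step NLL plus lam times the mean NLL of composed k-step
    predictions for k = 2..horizon (an assumed form of the consistency term)."""
    T = len(actions)
    one_step = np.mean([
        gaussian_nll(states[t + 1], *compose(states[t], actions[t:t + 1], a, b, q))
        for t in range(T)
    ])
    multi = []
    for k in range(2, horizon + 1):
        for t in range(T - k + 1):
            m, v = compose(states[t], actions[t:t + k], a, b, q)
            multi.append(gaussian_nll(states[t + k], m, v))
    return one_step + lam * (np.mean(multi) if multi else 0.0)


def simulate(a, b, q, T, rng):
    """Sample one fully observed trajectory with random Gaussian actions."""
    states = np.zeros(T + 1)
    actions = rng.normal(size=T)
    for t in range(T):
        noise = rng.normal(scale=np.sqrt(q))
        states[t + 1] = a * states[t] + b * actions[t] + noise
    return states, actions


rng = np.random.default_rng(0)
states, actions = simulate(a=0.9, b=0.5, q=0.1, T=2000, rng=rng)

# Compare the true parameters with a model that underestimates the noise.
loss_true = multi_step_loss(states, actions, 0.9, 0.5, 0.1, horizon=5, lam=1.0)
loss_overconfident = multi_step_loss(states, actions, 0.9, 0.5, 0.02, horizon=5, lam=1.0)
print(loss_true, loss_overconfident)
```

In this toy setting the composed k-step variance grows as q(1 + a^2 + ... + a^(2(k-1))), so a model that underestimates q becomes increasingly overconfident with every recursive application, and the multi-step term penalises that directly.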

