EVALUATION OF DISTRIBUTION DENSITY: THREE MAIN APPROACHES
Abstract
This study analyzes three main approaches to density estimation: non-parametric, parametric, and semi-parametric. The comparison indicates that the effectiveness of each method depends on the context and the characteristics of the input data. The research covered both the methods themselves and the software environment used for density estimation. An important step was defining the input data for the comparison, which may involve selecting a specific dataset and setting the parameters that influence the results. For the comparative analysis, a density estimation model was trained with each of the chosen approaches. The implementation and visualization relied on seaborn, numpy, pandas, matplotlib.pyplot, sklearn.datasets, sklearn.model_selection, and scipy.stats. The analysis calculated the average density and the squared error for each iris species in the selected data, which made it possible to assess how effective each method is for a given class of data and to select the most suitable approach. The study also considered the statistical significance of the results and the robustness of the methods to random anomalies and outliers in the data. The approaches were tested in several scenarios, including non-uniformly distributed data, asymmetric distributions, and data with a large number of anomalies. The context and purpose of the research matter when choosing a method. For example, if the characteristics of the distribution must be reproduced accurately for later use in complex analytical tasks, parametric methods may be preferred.
On the other hand, non-parametric methods can be useful when the data distribution is difficult to approximate with known functions. The research compared model quality metrics, such as mean squared error, to determine how accurately each method reproduces the real data distribution and to assess its adequacy for a specific application. The main conclusion is that the estimated density depends significantly on the dataset and its characteristics, the estimation approach, and the data processing methods used. Recommendations for choosing a density estimation method should therefore be adapted to the specific task and application context.
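The comparison described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the study's actual code: the feature used (sepal length), the 70/30 split, the two-component Gaussian mixture standing in for the semi-parametric approach, and the held-out histogram used as the reference density are all choices made here for illustration. The names `fit_estimators` and `compare` are hypothetical, and `sklearn.mixture` is not among the libraries listed in the abstract.

```python
import numpy as np
from scipy import stats
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split


def fit_estimators(train):
    """Fit three density estimators on 1-D data; return name -> pdf callable."""
    # Parametric: a single normal distribution fitted by maximum likelihood.
    mu, sigma = stats.norm.fit(train)
    # Non-parametric: Gaussian kernel density estimate (default bandwidth rule).
    kde = stats.gaussian_kde(train)
    # Semi-parametric stand-in (assumption): two-component Gaussian mixture.
    # reg_covar keeps a component from collapsing onto tied measurements.
    gmm = GaussianMixture(n_components=2, reg_covar=1e-3, random_state=0)
    gmm.fit(train.reshape(-1, 1))
    return {
        "parametric": lambda x: stats.norm.pdf(x, mu, sigma),
        "non-parametric": lambda x: kde(x),
        "semi-parametric": lambda x: np.exp(
            gmm.score_samples(np.asarray(x).reshape(-1, 1))
        ),
    }


def compare(feature=0, bins=8, seed=0):
    """Return average density and squared error per species and method."""
    iris = load_iris()
    results = {}
    for label, species in enumerate(iris.target_names):
        values = iris.data[iris.target == label, feature]
        train, test = train_test_split(values, test_size=0.3, random_state=seed)
        # Reference density (assumption): normalized histogram of held-out data.
        hist, edges = np.histogram(test, bins=bins, density=True)
        centers = 0.5 * (edges[:-1] + edges[1:])
        for name, pdf in fit_estimators(train).items():
            results[(str(species), name)] = {
                "mean_density": float(np.mean(pdf(test))),
                "mse": float(np.mean((pdf(centers) - hist) ** 2)),
            }
    return results


if __name__ == "__main__":
    for (species, name), m in compare().items():
        print(f"{species:<11} {name:<16} "
              f"mean density={m['mean_density']:.3f}  MSE={m['mse']:.3f}")
```

With only 15 held-out points per species the histogram reference is noisy, so the resulting ranking of methods should be read as indicative at best; changing `bins`, `seed`, or the feature may change which method scores lowest.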