GEOGRAPHY, ENVIRONMENT, SUSTAINABILITY

Statistical Method For Reducing The Number Of Climatic Predictors In Species Distribution Modeling

https://doi.org/10.24057/2071-9388-2025-3734

Abstract

The nineteen bioclimatic parameters from BIOCLIM are widely used in Species Distribution Modeling (SDM). To improve modeling quality, the number of parameters must be reduced. Several approaches to this problem have been proposed, but each has its limitations. In this study, we aimed to develop an effective statistical method based on identifying correlation groups of parameters and selecting the least correlated ones. Several statistical techniques were used to ensure reliable parameter selection: simple correlation matrix analysis, cluster analysis (HDBSCAN), and factor analysis (with varimax and quartimax rotation). As an example, global bioclimatic parameter values for the period 1991–2020 were analyzed. The results obtained with the different methods are in good agreement. Four or five correlation groups were identified, depending on how the negative correlations are interpreted. An additional group of two parameters, BIO14 and BIO17, can be identified from the results of the varimax factor analysis, although it was not identified by the other methods. Finally, six bioclimatic parameters were selected (BIO2, BIO5, BIO7, BIO14, BIO15, and BIO18), one from each group: the parameter with the lowest average correlation coefficient with parameters from the other groups. The average correlation among the selected parameters was significantly lower than that obtained with previously applied methods selecting the same number of parameters.
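
The final selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_least_correlated`, the toy variable names, the toy correlation matrix, and the group assignments are all hypothetical, and the use of absolute correlation values is an assumption (the abstract only notes that the treatment of negative correlations affects the grouping).

```python
import numpy as np


def select_least_correlated(corr, names, groups):
    """Pick one variable per group: the member whose mean absolute
    correlation with all variables outside its group is lowest.

    corr   -- square correlation matrix, ordered like `names`
    names  -- list of variable names
    groups -- dict mapping a group label to a list of member names
    """
    index = {name: i for i, name in enumerate(names)}
    # Assumption: the sign of the correlation is ignored.
    abs_corr = np.abs(np.asarray(corr, dtype=float))
    selected = {}
    for label, members in groups.items():
        inside = [index[m] for m in members]
        outside = [i for i in range(len(names)) if i not in inside]
        # Rows: group members; columns: variables from the other groups.
        scores = abs_corr[np.ix_(inside, outside)].mean(axis=1)
        selected[label] = members[int(np.argmin(scores))]
    return selected


# Hypothetical toy data: four variables in two correlation groups.
names = ["A1", "A2", "B1", "B2"]
corr = np.array([
    [1.0, 0.9, 0.5, 0.4],
    [0.9, 1.0, 0.2, 0.1],
    [0.5, 0.2, 1.0, 0.8],
    [0.4, 0.1, 0.8, 1.0],
])
groups = {"A": ["A1", "A2"], "B": ["B1", "B2"]}

print(select_least_correlated(corr, names, groups))
```

In this toy case `A2` and `B2` are chosen, because each has the lowest mean absolute correlation with the variables of the other group. With real data, `corr` would be the correlation matrix of the nineteen bioclimatic parameters and `groups` would come from the correlation, cluster, and factor analyses.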

About the Authors

Igor O. Popov
Yu. A. Israel Institute of Global Climate and Ecology; Institute of Geography, Russian Academy of Sciences
Russian Federation

Glebovskaya str., 20B, Moscow, 107258

Staromonetniy pereulok, 29/4, Moscow, 119017



Elena N. Popova
Institute of Geography, Russian Academy of Sciences
Russian Federation

Staromonetniy pereulok, 29/4, Moscow, 119017



For citations:


Popov I.O., Popova E.N. Statistical Method For Reducing The Number Of Climatic Predictors In Species Distribution Modeling. GEOGRAPHY, ENVIRONMENT, SUSTAINABILITY. 2025;18(3):19-31. https://doi.org/10.24057/2071-9388-2025-3734

This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2071-9388 (Print)
ISSN 2542-1565 (Online)