Application of random forest machine learning and big geospatial data management systems applied to reconstruct the vegetation index data series—Proceedings of the International Conference “InterCarto. InterGIS”

DOI: 10.35595/2414-9179-2024-1-30-295-305

About the Author

Anna A. Vorobyeva

Saint-Petersburg State University, Institute of Earth Sciences, Department of Cartography and Geoinformatics,
2, Mendeleevskaya line, Saint Petersburg, 199034, Russia,
E-mail: st096985@student.spbu.ru

Abstract

This article describes the development of a machine learning model that recovers incomplete data using cloud computing. The problem is examined through a case study in which missing vegetation index values are modeled from the open data catalogs of cloud computing platforms. The proposed methodology relies on multi-year periodic sampling of vegetation index values and on training the model on large volumes of data to improve the quality of series reconstruction. The approach achieves higher accuracy than classical interpolation methods for data recovery, which makes the modeled values suitable for practical applications. The method is demonstrated by restoring values of the Normalized Difference Vegetation Index (NDVI), which is used to monitor and evaluate the state of vegetation cover. The input data were arrays of values for the central part of the Novgorod Region, obtained from the catalogs of Google Earth Engine, a cloud environment for processing and analyzing Earth remote sensing data. To speed up model training and improve efficiency, the Google Colaboratory platform was used, so the study required neither local computing capacity nor specialized software. The approach can be adapted to reconstruct other indices or to address data incompleteness in other subject areas, which underlines its versatility and practical potential.
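
The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the author's pipeline: it assumes scikit-learn's `RandomForestRegressor`, a synthetic NDVI-like series in place of Google Earth Engine data, and a cyclical day-of-year encoding as one possible reading of "multi-year periodic sampling". The 16-day step, the gap position, and all variable names are hypothetical choices made only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for a multi-year NDVI series (not real Earth Engine data):
# 6 years, one composite every 16 days, seasonal curve plus noise.
n_years, step = 6, 16
days = np.arange(1, 366, step)                 # day of year of each composite
doy = np.tile(days, n_years)
year = np.repeat(np.arange(n_years), days.size)
t = year * 365 + doy                           # continuous time axis in days
ndvi = 0.45 + 0.35 * np.sin(2 * np.pi * (doy - 110) / 365)
ndvi = ndvi + rng.normal(0.0, 0.03, ndvi.size)

# Simulate a long summer gap (e.g. persistent cloud cover) in the last year.
# The true values are withheld from training and used only for scoring.
missing = (year == n_years - 1) & (doy >= 145) & (doy <= 257)
observed = ~missing

def features(doy):
    """Cyclical day-of-year encoding, so late December sits next to early January."""
    angle = 2 * np.pi * doy / 365
    return np.column_stack([np.sin(angle), np.cos(angle)])

# Random forest regression trained on every observed value from all years.
model = RandomForestRegressor(n_estimators=200, min_samples_leaf=2, random_state=0)
model.fit(features(doy[observed]), ndvi[observed])
rf_pred = model.predict(features(doy[missing]))

# Baseline: linear interpolation between the nearest observed neighbours.
lin_pred = np.interp(t[missing], t[observed], ndvi[observed])

def rmse(pred, truth):
    """Root mean squared error between predictions and withheld values."""
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

rf_rmse = rmse(rf_pred, ndvi[missing])
lin_rmse = rmse(lin_pred, ndvi[missing])

filled = ndvi.copy()
filled[missing] = rf_pred                      # reconstructed series

print(f"gap length: {int(missing.sum())} composites")
print(f"RMSE random forest: {rf_rmse:.3f}")
print(f"RMSE linear interp: {lin_rmse:.3f}")
```

In this toy setting the gap spans the seasonal peak, so a straight line between its edges misses the summer maximum, while the forest can draw on the same season in the other years. Real series would likely need additional predictors and validation on withheld observations.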

Keywords

Google Earth Engine, regression, NDVI, Python

For citation: Vorobyeva A.A. Application of random forest machine learning and big geospatial data management systems applied to reconstruct the vegetation index data series. InterCarto. InterGIS. Moscow: MSU, Faculty of Geography, 2024. V. 30. Part 1. P. 295–305. DOI: 10.35595/2414-9179-2024-1-30-295-305 (in Russian)

ISSN 2414-9179 (Print)
ISSN 2414-9209 (Online)