I am working on a project where the explanatory variables include soil attributes, land-use and land-cover properties, stream flow, and climate measurements (precipitation, temperature, etc.) recorded at multiple locations across a study area. I am proposing to use a random forest regression model to predict the response (an eco-indicator) at these locations.
- Soil attributes and land-use/land-cover data are available as single measurements at each location.
- Stream flow and climate data are available as time series. From these I was hoping to extract long-term representative values such as mean daily stream flow, average annual precipitation total, and average annual temperature.
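
To make the aggregation concrete, here is a minimal sketch of how the long-term values could be derived with pandas. The column names (`site`, `date`, `flow`, `precip`, `temp`) and the function name `long_term_summaries` are hypothetical, chosen only for illustration; the real data layout may differ.

```python
import pandas as pd


def long_term_summaries(daily: pd.DataFrame) -> pd.DataFrame:
    """Collapse daily records into one row of long-term values per site.

    Assumes hypothetical columns: site, date, flow, precip, temp.
    """
    daily = daily.assign(year=pd.to_datetime(daily["date"]).dt.year)

    # Annual values per site: precipitation total and mean temperature.
    annual = daily.groupby(["site", "year"]).agg(
        precip_total=("precip", "sum"),
        temp_mean=("temp", "mean"),
    )

    # Average the annual values across years to get one value per site.
    out = annual.groupby(level="site").mean().rename(
        columns={
            "precip_total": "avg_annual_precip",
            "temp_mean": "avg_annual_temp",
        }
    )

    # Mean daily stream flow over the whole record, per site.
    out["mean_daily_flow"] = daily.groupby("site")["flow"].mean()
    return out
```

The resulting table has one row per location, so it can be joined with the soil and land-use/land-cover attributes before fitting the model.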
However, during exploratory data analysis I noticed that at several locations in the study area the annual time series of stream flow and climate variables exhibit non-stationarity. I am aware that it is possible to stationarize these series using techniques such as detrending or differencing. However, I would like to know whether random forests can handle non-stationary inputs without stationarizing them first. Are there any best practices for dealing with such data sets?
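
For reference, by detrending and differencing I mean something along these lines. This is only a sketch with NumPy, assuming a linear trend in an annual series at a single location; the function names `detrend_linear` and `first_difference` are hypothetical, and a linear fit is just one of several possible trend models.

```python
import numpy as np


def detrend_linear(years, values):
    """Remove a least-squares linear trend from an annual series.

    Returns the residuals and the fitted slope (units per year).
    """
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    # Centre the years so the fit is numerically well conditioned.
    x = years - years.mean()
    slope, intercept = np.polyfit(x, values, deg=1)
    residuals = values - (slope * x + intercept)
    return residuals, slope


def first_difference(values):
    """Year-to-year change; the result is one element shorter than the input."""
    return np.diff(np.asarray(values, dtype=float))
```

The fitted slope from `detrend_linear` could itself be kept as a per-location summary of how strongly a series is trending.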
Thanks!