lundi 13 juin 2016

Can Random Forest regression handle non-stationary input variables?


I am working on a project where the explanatory variables include soil attributes, land use and land cover properties, stream flow and climate (precipitation, temperature etc) measurements recorded at multiple locations across a study area. I am proposing to use a random forest regression model to predict the response (an eco-indicator) at these locations.

  1. Soil attributes and land-use/land-cover data are available as single measurements at each location.
  2. Stream flow and climate data are available as a time series. From these time series data I was hoping to extract long-term representative values such as mean daily stream flow, average annual precipitation total, average annual temperature and so on.

However during exploratory data analysis I noticed that at several locations in the study area the annual time series of stream flow and climate variables are exhibiting non-stationarity. I am aware that it is possible stationarize these series using techniques such as detrending or differencing. However, I would like to know if random forests can handle non stationary inputs (without stationarizing)? Are there any best practices when dealing with such data sets?

Thanks!


Aucun commentaire:

Enregistrer un commentaire