Principal goals: to use data mining techniques to understand how variables drive ecosystem functioning, and to carry out a qualitative study to determine which of a variety of data mining techniques best replicates observed ecosystem processes.
The functioning of an ecosystem depends on a variety of drivers. One of the best places to examine these drivers is the Andes-to-Amazon slope [3], where a gradient of ecosystems can be found, from high-elevation grassland and montane rain forest to lowland rain forest, distinguished by differences in soil and in climate (the latter related to altitude). This project involves a numerical analysis of real data from databases describing ecosystem properties. The data available include Digital Elevation Models (DEMs), slope, biomass, soil CO2 efflux, tree growth and tree species diversity. The forest ecosystem data are derived from specific sites (plots) at 10 different elevations, ranging from 3000 m down to 220 m above sea level. Weather data are available for 4 of these sites. Before any analysis can take place, these data need to be integrated into one coherent set.
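As a rough illustration of the integration step, the sketch below joins plot-level records to weather records by elevation and keeps plots that have no weather station, since weather data exist for only some sites. The field names (plot_id, elevation_m, biomass, mean_temp_c) and all values are hypothetical placeholders, not taken from the project databases.

```python
from typing import Any, Dict, List


def integrate(plots: List[Dict[str, Any]],
              weather: Dict[int, Dict[str, float]]) -> List[Dict[str, Any]]:
    """Left-join weather records onto plot records, keyed by elevation."""
    merged = []
    for plot in plots:
        row = dict(plot)  # copy so the input records are not modified
        record = weather.get(plot["elevation_m"])
        # Plots without a weather station keep an explicit missing value.
        row["mean_temp_c"] = record["mean_temp_c"] if record else None
        row["has_weather"] = record is not None
        merged.append(row)
    return merged


# Placeholder values for illustration only (not real measurements).
plots = [
    {"plot_id": "A", "elevation_m": 3000, "biomass": 1.0},
    {"plot_id": "B", "elevation_m": 1500, "biomass": 2.0},
    {"plot_id": "C", "elevation_m": 220, "biomass": 3.0},
]
weather = {3000: {"mean_temp_c": 10.0}, 220: {"mean_temp_c": 25.0}}

table = integrate(plots, weather)
```

Keeping an explicit has_weather flag, rather than dropping incomplete rows, leaves the choice of how to handle missing weather data (for example interpolation along the elevation gradient) to the later analysis.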
In this project, the student will use data mining to try to determine how all of the variables interact to drive ecosystem functioning (e.g. growth). For example, genetic programming may be used to evolve equations that predict an ecosystem function from the drivers listed above. Ensuring that these equations are semantically correct would help to explain the underlying relationships. In addition, maps of the strongest correlations will be produced, for instance using a Self-Organising Map algorithm [1,2]. Finally, a quality assessment may be carried out to determine the advantages and disadvantages of the methods used in analysing the data, in terms of the useful knowledge extracted and how well the results relate to observed ecosystem processes.
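To make the Self-Organising Map idea concrete, here is a minimal sketch of a one-dimensional map trained with the standard online update rule (find the best matching unit, then pull it and its grid neighbours towards the sample). It is a generic textbook-style variant, not necessarily the algorithm of [1,2]; the function names, the parameter defaults and the toy input vectors are assumptions made for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is closest to x."""
    return min(
        range(len(weights)),
        key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], x)),
    )


def train_som(data, n_units=4, epochs=200, lr0=0.5, sigma0=1.5, seed=0):
    """Train a 1-D self-organising map; returns the unit weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays to zero
        sigma = max(sigma0 * (1.0 - frac), 0.1)  # neighbourhood radius shrinks
        for x in rng.sample(data, len(data)):    # shuffled pass over the data
            bmu = best_matching_unit(weights, x)
            for j, w in enumerate(weights):
                # Gaussian neighbourhood measured along the 1-D grid of units.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * sigma ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return weights


# Toy input: two groups of rescaled placeholder vectors (not real plot data).
data = [[0.10, 0.10], [0.15, 0.12], [0.90, 0.88], [0.85, 0.90]]
som = train_som(data)
unit_low = best_matching_unit(som, data[0])
unit_high = best_matching_unit(som, data[2])
```

In practice each input vector would hold the rescaled drivers for one plot, and plots mapped to the same or neighbouring units would be read as having similar driver profiles.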