Spatial Data Mining in Precision Agriculture

This site by the researcher Georg Ruß is a broad repository of data and interesting research on applying clustering techniques to agriculture. It shows how these applications can help make better use of space in the field, both by improving productivity in the agricultural sector and by maintaining environmental/ecological balance.

Below is the abstract of the doctoral thesis titled Spatial Data Mining in Precision Agriculture:

Technological advances are nowadays often based on improvements in information and data processing capabilities. Even modern agriculture is to a large extent based on adequate data processing, since the usage of novel information devices, GPS-based georeferenced data collection and high-resolution spatial data sets have become standard modes of operation, turning the once uniform site management into site-specific management as one of the most important sub-fields in precision agriculture. On the one hand, the resulting data sets clearly provide the foundations for economic and ecologic improvements. On the other hand, these data sets pose novel challenges for spatial data mining. Two specific tasks are explored in this study: spatial variable importance and management zone delineation.

The foundations of this thesis are data originating in site-specific management operations. They typically include electrical conductivity readings, fertilizer applications, soil sampling results, vegetation indicators and yield measurements. These variables are georeferenced, i.e. for a particular point of the site under study the variables and their values are known at a certain spatial resolution. These spatial data sets are furthermore augmented with digital elevation models from which terrain attributes such as slope, wetness index and curvatures are derived.
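To make the idea of deriving terrain attributes from an elevation model concrete, here is a minimal sketch that estimates slope on a gridded digital elevation model using central differences. It is an illustration only, not the procedure used in the thesis: the function name `slope_degrees`, the list-of-lists grid layout and the `cell_size` parameter are all assumptions made for this example.

```python
import math

def slope_degrees(dem, cell_size):
    """Estimate slope (in degrees) for the interior cells of a DEM grid.

    dem is a list of rows of elevations; cell_size is the grid spacing in
    the same unit as the elevations. Border cells are left as None because
    central differences need a neighbour on both sides.
    """
    rows, cols = len(dem), len(dem[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Central differences along the column (x) and row (y) directions.
            dzdx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size)
            dzdy = (dem[r + 1][c] - dem[r - 1][c]) / (2 * cell_size)
            out[r][c] = math.degrees(math.atan(math.hypot(dzdx, dzdy)))
    return out

# Hypothetical toy surface: elevation rises one unit per cell along x.
dem = [[float(c) for c in range(3)] for _ in range(3)]
slopes = slope_degrees(dem, cell_size=1.0)
```

On this toy plane the single interior cell has a gradient of 1 along x and 0 along y, which corresponds to a 45 degree slope.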

The first of the tasks is concerned with yield prediction and based on an existing dissertation in this area. Yield prediction is handled as a multivariate regression task using spatial data sets. However, taking the spatial relationships of the data sets into account requires some changes in the standard cross-validation to make it aware of spatial relationships in the data sets. Based on this addition, the question can be answered which of a variety of regression models are best suited for yield prediction. Eventually the regression models help to estimate which of the variables are important for yield prediction using permutation-based variable importance measures.
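The two ideas in this paragraph, a cross-validation that respects spatial relationships and a permutation-based importance measure, can be sketched in a few lines. This is a simplified illustration and not the thesis's actual procedure: the grid-block fold assignment, the function names, the toy data and the stand-in `model` are all assumptions. In a real study the importance would be computed on the held-out folds of a fitted regression model.

```python
import random

def spatial_block_folds(coords, block_size, n_folds):
    """Assign each point to a fold according to the grid block it falls in.

    Points in the same block always share a fold, so close neighbours
    are not split between training and test sets.
    """
    def block(x, y):
        return (int(x // block_size), int(y // block_size))

    blocks = sorted({block(x, y) for x, y in coords})
    fold_of_block = {b: i % n_folds for i, b in enumerate(blocks)}
    return [fold_of_block[block(x, y)] for x, y in coords]

def permutation_importance(predict, X, y, column, rng):
    """Increase in mean squared error after shuffling one feature column."""
    def mse(rows):
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)

    shuffled = [row[column] for row in X]
    rng.shuffle(shuffled)
    X_perm = [row[:column] + [v] + row[column + 1:]
              for row, v in zip(X, shuffled)]
    return mse(X_perm) - mse(X)

# Hypothetical toy data: a 4x4 grid where the target depends only on feature 0.
coords = [(x, y) for x in range(4) for y in range(4)]
folds = spatial_block_folds(coords, block_size=2, n_folds=2)

X = [[float(x), float(y)] for x, y in coords]
y = [2.0 * row[0] for row in X]
model = lambda row: 2.0 * row[0]  # stand-in for a fitted regression model

rng = random.Random(0)
imp_used = permutation_importance(model, X, y, 0, rng)
imp_unused = permutation_importance(model, X, y, 1, rng)
```

Shuffling a feature the model ignores leaves the error unchanged, so `imp_unused` is zero, while shuffling the feature the model relies on can only make the error worse.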

The second task is concerned with management zone delineation. Based on a literature review of existing approaches, a lack of exploratory algorithms for this task is concluded, in both the precision agriculture and the computer science domains. Hence, a novel algorithm (HACC-spatial) is developed, fulfilling the requirements posed in the literature. It is based on hierarchical agglomerative clustering incorporating a spatial constraint. The spatial contiguity of the management zones is the key parameter in this approach. Furthermore, hierarchical clustering offers a simple and appealing way to explore the data sets under study, which is one of the main goals of data mining.
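The general idea of hierarchical agglomerative clustering under a spatial contiguity constraint can be illustrated with a small sketch. This is not the HACC-spatial algorithm itself, whose details are not given in the abstract. It is a generic variant, written under the assumption that the data lie on a regular grid, that only clusters sharing a grid edge may merge, and that similarity is measured by the difference between cluster means. All names are hypothetical.

```python
def constrained_agglomerative(values, n_zones):
    """Greedy agglomerative clustering restricted to adjacent clusters.

    values maps (row, col) grid cells to a measurement. At each step the
    two spatially adjacent clusters with the closest means are merged,
    until n_zones clusters remain. Returns a dict mapping cell to zone id.
    """
    cluster_of = {cell: i for i, cell in enumerate(values)}
    members = {i: [cell] for cell, i in cluster_of.items()}

    def mean(cid):
        return sum(values[c] for c in members[cid]) / len(members[cid])

    def adjacent_pairs():
        pairs = set()
        for (r, c), a in cluster_of.items():
            for neighbour in ((r + 1, c), (r, c + 1)):
                b = cluster_of.get(neighbour)
                if b is not None and b != a:
                    pairs.add((min(a, b), max(a, b)))
        return pairs

    while len(members) > n_zones:
        pairs = adjacent_pairs()
        if not pairs:
            break  # the remaining clusters are not spatially connected
        # Closest means first; the pair itself breaks ties deterministically.
        a, b = min(pairs, key=lambda p: (abs(mean(p[0]) - mean(p[1])), p))
        for cell in members[b]:
            cluster_of[cell] = a
        members[a].extend(members.pop(b))
    return cluster_of

# Hypothetical 2x4 field: low readings on the left, high readings on the right.
field = {
    (0, 0): 1.0, (0, 1): 1.1, (0, 2): 10.0, (0, 3): 10.2,
    (1, 0): 0.9, (1, 1): 1.2, (1, 2): 9.8,  (1, 3): 10.1,
}
zones = constrained_agglomerative(field, n_zones=2)
```

Because merges are only allowed between neighbouring clusters, every resulting zone is spatially contiguous by construction. Stopping the merging at different values of `n_zones` gives the exploratory, multi-level view that makes hierarchical clustering attractive for this task.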

The author also maintains a research group with a range of work on regression, clustering, and other techniques.