# Material Detail

## Redefining Class Definitions using Constraint-Based Clustering: An Application to Remote Sensing of the Earth's Surface

This video was recorded at 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Washington 2010. Two aspects are crucial when constructing any real world supervised classification task: the set of classes whose distinction might be useful for the domain expert, and the set of classifications that can actually be distinguished by the data. Often a set of labels is defined with some initial intuition but these are not the best match for the task. For example, labels have been assigned for land cover classification of the Earth but it has been suspected that these labels are not ideal and some classes may be best split into subclasses whereas others should be merged. This paper formalizes this problem using three ingredients: the existing class labels, the underlying separability in the data, and a special type of input from the domain expert. We require a domain expert to specify an $L \times L$ matrix of pairwise probabilistic constraints expressing their beliefs as to whether the $L$ classes should be kept separate, merged, or split. This type of input is intuitive and easy for experts to supply. We then show that the problem can be solved by casting it as an instance of penalized probabilistic clustering (PPC). Our method, Class-Level PPC (CPPC) extends PPC showing how its time complexity can be reduced from $O(N^2)$ to $O(NL)$ for the problem of class re-definition. We further extend the algorithm by presenting a heuristic to measure adherence to constraints, and providing a criterion for determining the model complexity (number of classes) for constraint-based clustering. We demonstrate and evaluate CPPC on artificial data and on our motivating domain of land cover classification. For the latter, an evaluation by domain experts shows that the algorithm discovers novel class definitions that are better suited to land cover classification than the original set of labels.

#### Quality

• User Rating
• Learning Exercises
• Bookmark Collections
• Course ePortfolios
• Accessibility Info