
What is the Optimal Number of Features? A learning theoretic perspective

This video was recorded at the Workshop on Subspace, Latent Structure and Feature Selection Techniques: Statistical and Optimisation Perspectives, Bohinj 2005.

In this paper we discuss the problem of feature selection for supervised learning from the standpoint of statistical machine learning. We ask which subset of features leads to the best classification accuracy. Clearly, if the statistical model is known, or if the number of training samples is unlimited, any additional feature can only improve accuracy. However, we show explicitly that when the training set is finite, using all the features may be suboptimal, even when all the features are independent and carry information about the label. We analyze one setting analytically and show how feature selection can increase accuracy. We also find the optimal number of features as a function of the training set size for a few specific examples. This perspective on feature selection differs from the common approach, which focuses on the probability that a specific algorithm will pick a completely irrelevant or redundant feature.
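
The peaking behaviour described in the abstract can be reproduced with a short simulation. The sketch below is illustrative, not taken from the paper: the Gaussian class-conditional model, the decay rate mu_i = 0.5/sqrt(i), the plug-in nearest-mean classifier, and the sample sizes are all assumptions chosen to make the effect visible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative model (an assumption, not from the paper): d independent
# Gaussian features with unit variance; feature i has mean +/- mu[i]
# depending on the binary label, so every feature carries some
# information about the label, but informativeness decays with i.
d = 50
mu = 0.5 / np.sqrt(np.arange(1, d + 1))

def sample(n):
    y = rng.choice([-1.0, 1.0], size=n)
    x = y[:, None] * mu[None, :] + rng.standard_normal((n, d))
    return x, y

def mean_accuracy(n_train, k, n_test=5000, n_trials=200):
    # Plug-in nearest-mean (linear) classifier restricted to the
    # first k features: classify by the sign of <x, m_plus - m_minus>.
    accs = []
    for _ in range(n_trials):
        xtr, ytr = sample(n_train)
        xte, yte = sample(n_test)
        w = xtr[ytr == 1, :k].mean(axis=0) - xtr[ytr == -1, :k].mean(axis=0)
        accs.append((np.sign(xte[:, :k] @ w) == yte).mean())
    return float(np.mean(accs))

# With only 20 training samples, test accuracy peaks at an
# intermediate k rather than at k = d.
for k in (1, 2, 5, 10, 20, 50):
    print(f"k = {k:2d}: accuracy = {mean_accuracy(20, k):.3f}")
```

Under these assumptions the first few features add more signal than estimation noise, but past some k the noise in the estimated weight vector dominates, so accuracy falls even though every feature is individually informative; with more training samples the optimal k shifts upward, consistent with the dependence on training set size discussed in the abstract.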
