Material Detail

Support vector machines loss with l1 penalty

Support vector machines loss with l1 penalty

This video was recorded at Notions of Complexity: Information-theoretic, Computational and Statistical Approaches Workshop, Eindhoven 2004. We consider an i.i.d. sample from (X,Y), where X is a feature and Y a binary label, say with values +1 or -1. We use a high-dimensional linear approximation of the regression of Y on X and support vector machine loss with l1 penalty on the regression coefficients. This procedure does not depend on the (unknown) noise level or on the (unknown) sparseness of approximations of Bayes rule, but nevertheless its prediction error is smaller for smaller noise levels and/or sparser approximations. Thus, it adapts to unknown properties of the underlying distribution. In an example, we show that up to terms logarithmic in the sample size, the procedure yields minimax rates for the excess risk.


  • User Rating
  • Comments
  • Learning Exercises
  • Bookmark Collections
  • Course ePortfolios
  • Accessibility Info

More about this material


Log in to participate in the discussions or sign up if you are not already a MERLOT member.