Name-Ethnicity Classification from Open Sources

This video was recorded at the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Paris 2009. The problem of ethnicity identification from names has a variety of important applications, including biomedical research, demographic studies, and marketing. Here we report on the development of an ethnicity classifier where all training data is extracted from public, non-confidential (and hence somewhat unreliable) sources. Our classifier uses hidden Markov models (HMMs) and decision trees to classify names into 13 cultural/ethnic groups, with individual group accuracy comparable to that of earlier binary (e.g., Spanish/non-Spanish) classifiers. We have applied this classifier to over 20 million names from a large-scale news corpus, identifying interesting temporal and spatial trends in the representation of particular cultural/ethnic groups.




