The DANTE database: what it is, how it was created, and what it can contribute to the dictionaries and lexicons of the future

This video was recorded at Electronic lexicography in the 21st century: new applications for new users (eLex2011). Dante is a lexical database which provides a fine-grained, corpus-based description of the core vocabulary of English. Every fact recorded in the database is derived from, and explicitly supported by, evidence from a 1.7 billion-word corpus of current English. Almost all of these facts are machine-retrievable. Dante (the Database of ANalysed Texts of English) was designed and created for Foras na Gaeilge by the Lexicography Master Class and an 18-strong team of skilled lexicographers, using the Sketch Engine for corpus-querying and IDM's Dictionary Production System (DPS) for entry-building. The resulting database records the semantic, grammatical, combinatorial, and text-type characteristics of over 42,000 single-word lemmas and 23,000 compounds and phrasal verbs, and includes over 27,000 idioms and phrases, underpinned by over 600,000 sentence examples from the corpus. The project pioneered new approaches in project management, software customisation, text origination, and quality control. Collectively, these initiatives enabled us to achieve significant levels of automation (hence cost saving) in the lexicographic process, as well as greater systematicity. Most of these innovations are transferable, so our experience on the Dante project has implications for lexicographic methodology as a whole. Though Dante began life as an 'English framework' destined for the development of a new English-Irish dictionary, it was designed to be a linguistic resource beyond this primary function. It offers publishers a launchpad for the development or updating of monolingual or bilingual dictionaries, and provides rich data for researchers, software developers, and materials writers.
In this talk we will discuss the project's methodological innovations, demonstrate the wealth and range of data in Dante, and reflect on the long-term potential of this unique database.

