Material Detail

A Statistical NLG Framework for Aggregated Planning and Realization

This video was recorded at Association for Computational Linguistics (ACL), Sofia 2013. We present a hybrid natural language generation (NLG) system that consolidates macro and micro planning and surface realization tasks into one statistical learning process. Our novel approach is based on deriving a template bank automatically from a corpus of texts from a target domain. First, we identify domain specific entity tags and Discourse Representation Structures on a per sentence basis. Each sentence is then organized into semantically similar groups (representing a domain specific concept) by k-means clustering. After this semi-automatic processing (human review of cluster assignments), a number of corpus-level statistics are compiled and used as features by a ranking SVM to develop model weights from a training corpus. At generation time, a set of input data, the collection of semantically organized templates, and the model weights are used to select optimal templates. Our system is evaluated with automatic, non-expert crowdsourced and expert evaluation metrics. We also introduce a novel automatic metric - syntactic variability - that represents linguistic variation as a measure of unique template sequences across a collection of automatically generated documents. The metrics for generated weather and biography texts fall within acceptable ranges. In sum, we argue that our statistical approach to NLG reduces the need for complicated knowledge-based architectures and readily adapts to different domains with reduced development time.

Keywords:: videolectures, ocwc, oec

Disciplines:

Science and Technology / Computer Science

Go to Material

Bookmark / Add to Course ePortfolio

Create a Learning Exercise

Add Accessibility Information

Rate

Add a Comment

Quality

User Rating
Comments
Learning Exercises
Bookmark Collections
Course ePortfolios
Accessibility Info

Report Broken Link
Report as Inappropriate

More about this material

Material Type:: Presentation
Date Added to MERLOT:: February 8, 2015
Date Modified in MERLOT:: February 8, 2015
Author:: Frank Schilder, Thomson Reuters
Submitter:: The Open Education Consortium
Primary Audience:: College General Ed, College Lower Division, College Upper Division
Technical Format:: Video

Mobile Compatibility:: Not specified at this time
Language:: English
Cost Involved:: No
Source Code Available:: No
Creative Commons:: This work is licensed under a Attribution-NonCommercial-NoDerivs 3.0 United States