Cross-language headline generation for Hindi

Title: Cross-language headline generation for Hindi
Publication Type: Journal Articles
Year of Publication: 2003
Authors: Dorr BJ, Zajic D, Schwartz R
Journal: ACM Transactions on Asian Language Information Processing (TALIP)
Pagination: 270-289
Date Published: 2003/09
ISBN Number: 1530-0226

This paper presents new approaches to headline generation for English newspaper texts, with an eye toward the production of document surrogates for document selection in cross-language information retrieval. This task is difficult because the user must make decisions about relevance based on (often poor) translations of retrieved documents. To facilitate the decision-making process, we need translations that can be assessed rapidly and accurately; our approach is to provide an English headline for the non-English document. We describe two approaches to headline generation and their application to the recent DARPA TIDES-2003 Surprise Language Exercise for Hindi. For comparison, we also implemented an alternative method for surrogate generation: a system that produces topic lists for (Hindi) articles. We present the results of a series of experiments comparing each of these approaches. We demonstrate in both automatic and human evaluations that our linguistically motivated approach outperforms two other surrogate-generation methods: a statistical system and a topic discovery system.