Acquisition of bilingual MT lexicons from OCRed dictionaries

Publication Type: Journal Articles
Year of Publication: 2003
Authors: Karagol-Ayan B, Doermann D, Dorr BJ
Journal: Proceedings of the 9th MT Summit
Pagination: 208 - 215
Date Published: 2003

This paper describes an approach to analyzing the lexical structure of OCRed bilingual dictionaries to construct resources suited for machine translation of low-density languages, where online resources are limited. A rule-based, an HMM-based, and a post-processed HMM-based method are used for rapid construction of MT lexicons based on systematic structural clues provided in the original dictionary. We evaluate the effectiveness of our techniques, concluding that: (1) the rule-based method performs better with dictionaries where the font is not an important distinguishing feature for determining information types; (2) the post-processed stochastic method improves the results of the stochastic method for phrasal entries; and (3) our resulting bilingual lexicons are comprehensive enough to provide the basis for reasonable translation results when compared to human translations.