Geotagging: using proximity, sibling, and prominence clues to understand comma groups

Publication Type: Conference Papers
Year of Publication: 2010
Authors: Lieberman MD, Samet H, Sankaranarayanan J
Conference Name: Proceedings of the 6th Workshop on Geographic Information Retrieval
Date Published: 2010
Conference Location: New York, NY, USA
ISBN Number: 978-1-60558-826-1
Keywords: comma groups, geotagging, toponyms

Geotagging is the process of recognizing textual references to geographic locations, known as toponyms, and resolving these references by assigning each a latitude/longitude value. Typical geotagging algorithms use a variety of heuristic evidence to select the correct interpretation for each toponym. A study is presented of one such heuristic, which aids in recognizing and resolving lists of toponyms, referred to as comma groups. Comma groups of toponyms are recognized and resolved by inferring the common threads that bind them together, based on the toponyms' shared geographic attributes. Three such common threads are proposed and studied (population-based prominence, distance-based proximity, and sibling relationships in a geographic hierarchy), and examples of each are noted. In addition, measurements are made of these comma groups' usage and variety in a large dataset of news articles, indicating that the proposed heuristics, and in particular the proximity and sibling heuristics, are useful for resolving comma group toponyms.
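The three common threads named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy gazetteer, the `Candidate` fields, the function names, and the 500 km proximity cutoff are all assumptions made for illustration, and the coordinates and populations are rough values rather than authoritative data.

```python
from dataclasses import dataclass
from itertools import combinations, product
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Candidate:
    """One possible interpretation of a toponym (hypothetical schema)."""
    name: str
    lat: float
    lon: float
    population: int
    parent: str  # containing region in the geographic hierarchy


# Toy gazetteer: rough, illustrative values only.
GAZETTEER = {
    "Paris": [Candidate("Paris", 48.86, 2.35, 2_100_000, "France"),
              Candidate("Paris", 33.66, -95.56, 25_000, "Texas")],
    "Athens": [Candidate("Athens", 37.98, 23.73, 650_000, "Greece"),
               Candidate("Athens", 33.95, -83.38, 125_000, "Georgia"),
               Candidate("Athens", 32.20, -95.85, 13_000, "Texas")],
    "Rome": [Candidate("Rome", 41.90, 12.50, 2_800_000, "Italy"),
             Candidate("Rome", 34.26, -85.16, 37_000, "Georgia")],
    "Tyler": [Candidate("Tyler", 32.35, -95.30, 105_000, "Texas")],
}


def haversine_km(a, b):
    """Great-circle distance between two candidates, in kilometres."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = (sin(dlat / 2) ** 2
         + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(h))


def interpretations(toponyms):
    """Every way of picking one candidate per toponym in the comma group."""
    return product(*(GAZETTEER[t] for t in toponyms))


def by_prominence(toponyms):
    """Prominence: take the most populous candidate for each toponym."""
    return tuple(max(GAZETTEER[t], key=lambda c: c.population)
                 for t in toponyms)


def by_sibling(toponyms):
    """Sibling: first interpretation whose members share one parent, else None."""
    for combo in interpretations(toponyms):
        if len({c.parent for c in combo}) == 1:
            return combo
    return None


def by_proximity(toponyms, max_km=500.0):
    """Proximity: interpretation with the smallest maximum pairwise distance,
    accepted only if that distance is within max_km (an assumed cutoff)."""
    def spread(combo):
        return max((haversine_km(a, b) for a, b in combinations(combo, 2)),
                   default=0.0)

    best = min(interpretations(toponyms), key=spread)
    return best if spread(best) <= max_km else None
```

With this toy data, `by_sibling(["Paris", "Athens", "Tyler"])` selects the three Texas interpretations, `by_proximity(["Rome", "Athens"])` selects the two Georgia cities, and `by_prominence(["Paris", "Athens"])` falls back to the European capitals. A fuller system would need a rule for choosing among the heuristics when they disagree, which this sketch leaves open.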