Similarity Criteria Issues in Similarity Retrieval

The wide use of the Internet, coupled with the steadily decreasing cost of computing and storage, has expanded the kinds of data that users expect to retrieve from simple numeric and alphanumeric values to images, audio, and video, where the retrieval criterion is one of similarity. An inherent difficulty with similarity retrieval is deciding on the criterion for similarity.

Intellectual Merit

This proposal explores issues involved in retrieval that is based on several criteria of similarity:

(1) In the spatial data context, similarity is usually defined in terms of proximity in spatial position. Instead, this research will examine similarity in relative spatial orientation. The focus will therefore be on groups of objects; in particular, attention will be paid to position-independent indexes, which find use in applications involving pictorially specified queries to symbolic image databases.

(2) Issues involved in finding the k nearest types of objects rather than the k nearest objects will be explored. Here, it is the type of the objects that is of interest rather than their identity. This differentiation is similar to the classic distinction made in spatial databases between location-based and feature-based queries.

(3) Also of interest will be finding similarity between sets of objects where the similarity measure is the maximum, over the objects in one set, of the minimum distance to an object in the other set (the Hausdorff distance). Of particular interest will be methods that are incremental, so that results are obtained as quickly as possible without waiting for the algorithms to complete.
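
As a rough illustration of criteria (2) and (3), the following minimal Python sketch computes both by brute force. It is not the proposed method: it assumes that objects are points in the plane, that distance is Euclidean, and that the "k nearest types" are ranked by the distance from the query to the closest object of each type. The function names and sample data are hypothetical, and the sketch is not incremental.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def k_nearest_types(query, objects, k):
    """Return up to k distinct types, ordered by the distance from the
    query to the closest object of each type (assumed interpretation)."""
    best = {}  # type -> distance to the closest object of that type
    for obj_type, position in objects:
        d = dist(query, position)
        if d < best.get(obj_type, float("inf")):
            best[obj_type] = d
    return sorted(best, key=best.get)[:k]


def directed_hausdorff(a, b):
    """Maximum, over points in a, of the distance to the nearest point in b."""
    return max(min(dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical sample data: (type, position) pairs.
objects = [
    ("cafe", (1.0, 0.0)),
    ("bank", (2.0, 0.0)),
    ("cafe", (5.0, 0.0)),
    ("park", (3.0, 0.0)),
]
nearest_types = k_nearest_types((0.0, 0.0), objects, 2)

set_a = [(0.0, 0.0), (1.0, 0.0)]
set_b = [(0.0, 0.0), (4.0, 0.0)]
h_ab = directed_hausdorff(set_a, set_b)
h = hausdorff(set_a, set_b)

print(nearest_types)  # ['cafe', 'bank']
print(h_ab, h)        # 1.0 3.0
```

The brute-force versions examine every object before returning anything. An incremental method, by contrast, would report each result as soon as it is known to be correct.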

Broad Impacts

Position-independent indexing has application to pictorially specified image databases as well as to databases of moving objects, which can be used for searches that include traffic data. The Hausdorff distance is applicable to multimedia databases, as is the focus on retrieval by type rather than by individual object. Via a collaboration with the National Cancer Institute, the research team will investigate what makes a good similarity measure for bioinformatics applications.

Principal Investigators