Title: Audio visual scene analysis using spherical arrays and cameras.
Publication Type: Journal Articles
Year of Publication: 2010
Authors: O'Donovan A, Duraiswami R, Zotkin DN, Gumerov NA
Journal: The Journal of the Acoustical Society of America
Pagination: 1979 - 1979
Date Published: 2010

While living beings use audition and vision together to make sense of the world, the observation of the world by machines in applications such as surveillance and robotics has proceeded largely separately for the two senses. We describe the use of spherical microphone arrays as “audio cameras” and a spherical array of video cameras as a tool to perform multi‐modal scene analysis that attempts to answer questions such as “Who?,” “What?,” “Where?,” and “Why?” Signal processing algorithms that identify the number of people and their identities, and that isolate and dereverberate their speech using multi‐modal processing, will be described. The use of graphics-processor-based signal processing allows real‐time implementation of these algorithms. [Work supported by ONR.]