%0 Conference Paper
%B Computer Vision and Pattern Recognition Workshops (CVPRW), 2011 IEEE Computer Society Conference on
%D 2011
%T Kernel PLS regression for robust monocular pose estimation
%A Dondera, R.
%A Davis, Larry S.
%K 3D pose estimation
%K Gaussian processes
%K GP regression
%K human detection
%K Kernel PLS regression
%K monocular images
%K nonlinear correlations
%K object detection
%K pose estimation
%K projection to latent structures
%K realistic images
%K regression analysis
%K rendering (computer graphics)
%K rendering software
%K robust estimation
%X We evaluate the robustness of five regression techniques for monocular 3D pose estimation. While most of the discriminative pose estimation methods focus on overcoming the fundamental problem of insufficient training data, we are interested in characterizing performance improvement for increasingly large training sets. Commercially available rendering software allows us to efficiently generate large numbers of realistic images of poses from diverse actions. Inspired by recent work in human detection, we apply PLS and kPLS regression to pose estimation. We observe that kPLS regression incrementally approximates GP regression using the strongest nonlinear correlations between image features and pose. This provides robustness, and our experiments show kPLS regression is more robust than two GP-based state-of-the-art methods for pose estimation.
%B Computer Vision and Pattern Recognition Workshops (CVPRW), 2011 IEEE Computer Society Conference on
%P 24 - 30
%8 2011/06//
%G eng
%R 10.1109/CVPRW.2011.5981750
%0 Journal Article
%J Computer Graphics and Applications, IEEE
%D 2011
%T Social Snapshot: A System for Temporally Coupled Social Photography
%A Patro, R.
%A Ip, Cheuk Yiu
%A Bista, S.
%A Varshney, Amitabh
%K 3D reconstruction
%K data acquisition
%K photography
%K social photography
%K social sciences computing
%K social snapshot
%K spatiotemporal data acquisition
%K temporally coupled social photography
%X Social Snapshot actively acquires and reconstructs temporally dynamic data. The system enables spatiotemporal 3D photography using commodity devices, assisted by their auxiliary sensors and network functionality. It engages users, making them active rather than passive participants in data acquisition.
%B Computer Graphics and Applications, IEEE
%V 31
%P 74 - 84
%8 2011/02//jan
%@ 0272-1716
%G eng
%N 1
%R 10.1109/MCG.2010.107
%0 Conference Paper
%B Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on
%D 2011
%T Trainable 3D recognition using stereo matching
%A Castillo, C. D.
%A Jacobs, David W.
%K 2D image
%K 3D object class dataset
%K 3D object classification
%K CMU PIE dataset
%K face recognition
%K image classification
%K image descriptor
%K image matching
%K image processing
%K occlusion
%K pose estimation
%K pose variation
%K solid modelling
%K stereo matching
%K trainable 3D recognition
%X Stereo matching has been used for face recognition in the presence of pose variation. In this approach, stereo matching compares two 2D images using correspondences that reflect the effects of viewpoint variation and allow for occlusion. We show how to use stereo matching to derive image descriptors that can be used to train a classifier. This improves face recognition performance, producing the best published results on the CMU PIE dataset. We also demonstrate that classification based on stereo matching can be used for general object classification in the presence of pose variation. In preliminary experiments, we show promising results on the 3D object class dataset, a standard and challenging 3D classification benchmark.
%B Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on
%P 625 - 631
%8 2011///
%G eng
%R 10.1109/ICCVW.2011.6130301
%0 Conference Paper
%B Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on
%D 2010
%T Fast directional chamfer matching
%A Liu, Ming-Yu
%A Tuzel, O.
%A Veeraraghavan, A.
%A Chellappa, Rama
%K 3D distance transforms
%K computational time
%K cost distribution
%K cost function
%K cost variation
%K directional chamfer matching score
%K directional integral images
%K edge detection
%K edge orientation information
%K fast directional chamfer matching
%K gallery of shapes
%K object detection
%K object localization problem
%K object model
%K piecewise smooth cost function
%K shape matching algorithms
%K single hand-drawn example
%K sublinear time algorithm
%X We study the object localization problem in images given a single hand-drawn example or a gallery of shapes as the object model. Although many shape matching algorithms have been proposed for this problem over the decades, chamfer matching remains the preferred method when speed and robustness are considered. In this paper, we significantly improve the accuracy of chamfer matching while reducing the computational time from linear to sublinear (shown empirically). Specifically, we incorporate edge orientation information into the matching algorithm such that the resulting cost function is piecewise smooth and the cost variation is tightly bounded. Moreover, we present a sublinear time algorithm for exact computation of the directional chamfer matching score using techniques from 3D distance transforms and directional integral images. In addition, the smooth cost function allows us to bound the cost distribution of large neighborhoods and skip the bad hypotheses within them. Experiments show that the proposed approach improves the speed of the original chamfer matching by up to 45×, and it is much faster than many state-of-the-art techniques while the accuracy is comparable.
%B Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on
%P 1696 - 1703
%8 2010/06//
%G eng
%R 10.1109/CVPR.2010.5539837
%0 Conference Paper
%B Robotics and Automation (ICRA), 2010 IEEE International Conference on
%D 2010
%T Pose estimation in heavy clutter using a multi-flash camera
%A Liu, Ming-Yu
%A Tuzel, O.
%A Veeraraghavan, A.
%A Chellappa, Rama
%A Agrawal, A.
%A Okuda, H.
%K 3D distance transforms
%K angular estimation
%K binary depth edge maps
%K cost function
%K depth edges
%K image matching
%K integral images
%K location estimation
%K multiflash camera
%K multiview based pose-refinement algorithm
%K object detection
%K object localization
%K pose estimation
%K robot vision
%K texture edges
%X We propose a novel solution to object detection, localization and pose estimation with applications in robot vision. The proposed method is especially applicable when the objects of interest are not richly textured and are immersed in heavy clutter. We show that a multi-flash camera (MFC) provides accurate separation of depth edges and texture edges in such scenes. We then reformulate the problem as one of finding matches between the depth edges obtained in one or more MFC images and the rendered depth edges computed offline from a 3D CAD model of the objects. To facilitate accurate matching of these binary depth edge maps, we introduce a novel cost function that respects both the position and the local orientation of each edge pixel. This cost function is significantly superior to the traditional Chamfer cost and leads to accurate matching even in heavily cluttered scenes where traditional methods are unreliable. We present a sub-linear time algorithm to compute the cost function using techniques from 3D distance transforms and integral images. Finally, we also propose a multi-view based pose-refinement algorithm to improve the estimated pose. We implemented the algorithm on an industrial robot arm and obtained location and angular estimation accuracies of the order of 1 mm and 2°, respectively, for a variety of parts with minimal texture.
%B Robotics and Automation (ICRA), 2010 IEEE International Conference on
%P 2028 - 2035
%8 2010/05//
%G eng
%R 10.1109/ROBOT.2010.5509897
%0 Conference Paper
%B Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on
%D 2010
%T Pose-robust albedo estimation from a single image
%A Biswas, S.
%A Chellappa, Rama
%K albedo estimation
%K class-specific statistics
%K computer vision
%K face recognition
%K filtering theory
%K illumination-insensitive matching
%K pose estimation
%K pose information
%K pose-robust albedo estimation
%K shape recovery
%K single nonfrontal face image
%K stochastic filtering
%X We present a stochastic filtering approach to albedo estimation from a single non-frontal face image. Albedo estimation has far-reaching applications in various computer vision tasks such as illumination-insensitive matching and shape recovery. We extend a previously proposed formulation that assumes the face is in a known pose, and present an algorithm that can perform albedo estimation from a single image even when pose information is inaccurate. The 3D pose of the input face image is obtained as a byproduct of the algorithm. The proposed approach utilizes class-specific statistics of faces to iteratively improve albedo and pose estimates. Illustrations and experimental results show the effectiveness of the approach. We highlight the usefulness of the method for matching faces across variations in pose and illumination. The facial pose estimates obtained are also compared against ground truth.
%B Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on
%P 2683 - 2690
%8 2010/06//
%G eng
%R 10.1109/CVPR.2010.5539987
%0 Conference Paper
%B Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on
%D 2010
%T Robust RVM regression using sparse outlier model
%A Mitra, K.
%A Veeraraghavan, A.
%A Chellappa, Rama
%K 3D human pose
%K Bayesian approach
%K basis pursuit denoising
%K computer vision
%K Gaussian noise
%K image denoising
%K lighting estimation
%K regression analysis
%K relevance vector machine
%K robust RVM regression
%K sparse outlier model
%X Kernel regression techniques such as Relevance Vector Machine (RVM) regression, Support Vector Regression and Gaussian processes are widely used for solving many computer vision problems such as age, head pose, 3D human pose and lighting estimation. However, the presence of outliers in the training dataset makes the estimates from these regression techniques unreliable. In this paper, we propose robust versions of RVM regression that can handle outliers in the training dataset. We decompose the noise term in the RVM formulation into a (sparse) outlier noise term and a Gaussian noise term, and then estimate the outlier noise along with the model parameters. We present two approaches for solving this estimation problem: (1) a Bayesian approach, which essentially follows the RVM framework, and (2) an optimization approach based on Basis Pursuit Denoising. In the Bayesian approach, the robust RVM problem essentially becomes a bigger RVM problem, with the advantage that it can be solved efficiently by a fast algorithm. Empirical evaluations and real experiments on image denoising and age estimation demonstrate the better performance of the robust RVM algorithms over standard RVM regression.
%B Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on
%P 1887 - 1894
%8 2010/06//
%G eng
%R 10.1109/CVPR.2010.5539861
%0 Conference Paper
%B Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on
%D 2009
%T Visibility constraints on features of 3D objects
%A Basri, R.
%A Felzenszwalb, P. F.
%A Girshick, R. B.
%A Jacobs, David W.
%A Klivans, C. J.
%K COIL dataset
%K computational complexity
%K image-based framework
%K iterative algorithms
%K iterative methods
%K NP-hard
%K object recognition
%K synthetic data
%K synthetic images
%K three-dimensional object features
%K viewing sphere
%K visibility constraints
%X To recognize three-dimensional objects it is important to model how their appearances can change due to changes in viewpoint. A key aspect of this involves understanding which object features can be simultaneously visible under different viewpoints. We address this problem in an image-based framework, in which we use a limited number of images of an object taken from unknown viewpoints to determine which subsets of features might be simultaneously visible in other views. This leads to the problem of determining whether a set of images, each containing a set of features, is consistent with a single 3D object. We assume that each feature is visible from a disk of viewpoints on the viewing sphere. In this case we show the problem is NP-hard in general, but can be solved efficiently when all views come from a circle on the viewing sphere. We also give iterative algorithms that can handle noisy data and converge to locally optimal solutions in the general case. Our techniques can also be used to recover viewpoint information from the set of features that are visible in different images. We show that these algorithms perform well both on synthetic data and images from the COIL dataset.
%B Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on
%P 1231 - 1238
%8 2009/06//
%G eng
%R 10.1109/CVPR.2009.5206726
%0 Conference Paper
%B Image Processing, 2008. ICIP 2008. 15th IEEE International Conference on
%D 2008
%T Compressed sensing for multi-view tracking and 3-D voxel reconstruction
%A Reddy, D.
%A Sankaranarayanan, A. C.
%A Cevher, V.
%A Chellappa, Rama
%K 3D voxel reconstruction
%K background-subtracted silhouettes
%K compressed sensing
%K CS theory
%K estimation theory
%K image reconstruction
%K image sparsity
%K multi-view estimation problems
%K multi-view tracking
%K random projections
%K video coding
%X Compressed sensing (CS) suggests that a signal, sparse in some basis, can be recovered from a small number of random projections. In this paper, we apply CS theory to sparse background-subtracted silhouettes and show the usefulness of such an approach in various multi-view estimation problems. The sparsity of the silhouette images corresponds to sparsity of object parameters (location, volume, etc.) in the scene. We use random projections (compressed measurements) of the silhouette images to directly recover object parameters in scene coordinates. To keep the computational requirements of this recovery procedure reasonable, we tessellate the scene into a set of non-overlapping lines and perform estimation on each of these lines. Our method is scalable in the number of cameras and utilizes very few measurements for transmission among cameras. We illustrate the usefulness of our approach for multi-view tracking and 3-D voxel reconstruction problems.
%B Image Processing, 2008. ICIP 2008. 15th IEEE International Conference on
%P 221 - 224
%8 2008/10//
%G eng
%R 10.1109/ICIP.2008.4711731
%0 Conference Paper
%B Computational Sciences and Its Applications, 2008. ICCSA '08. International Conference on
%D 2008
%T Multi-Scale 3D Morse Complexes
%A Comic, L.
%A De Floriani, Leila
%K 3D Morse complexes
%K critical points
%K data analysis
%K duality (mathematics)
%K expansion operations
%K integral lines
%K inverse problems
%K mathematics computing
%K morphology
%K multi-scale Morse complexes
%K topology
%X Morse theory studies the relationship between the topology of a manifold M and the critical points of a scalar function f defined over M. Morse and Morse-Smale complexes, defined by critical points and integral lines of f, induce a subdivision of M into regions of uniform gradient flow, representing the morphology of M in a compact way. Function f can be simplified by canceling its critical points in pairs, thus simplifying the morphological representation of M given by the Morse and Morse-Smale complexes of f. Here, we propose a compact representation for the two Morse complexes in 3D, which is based on encoding the incidence relations of their cells and on exploiting the duality among the complexes. We define cancellation operations, and their inverse expansion operations, on the Morse complexes and on their dual representation. We propose a multi-scale representation of the Morse complexes which provides a description of such complexes, and thus of the morphology of a 3D scalar field, at different levels of abstraction. This representation also allows us to perform selective refinement operations to extract descriptions of the complexes that vary in different parts of the domain, improving efficiency on large data sets and eliminating noise in the data through topology simplification.
%B Computational Sciences and Its Applications, 2008. ICCSA '08. International Conference on
%P 441 - 451
%8 2008/07/30/3
%G eng
%R 10.1109/ICCSA.2008.10
%0 Conference Paper
%B Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on
%D 2007
%T Multimodal Tracking for Smart Videoconferencing and Video Surveillance
%A Zotkin, Dmitry N.
%A Raykar, V. C.
%A Duraiswami, Ramani
%A Davis, Larry S.
%K 3D motion
%K image motion analysis
%K least squares approximations
%K maximum likelihood estimator
%K microphone arrays
%K Monte Carlo methods (numerical methods)
%K Monte-Carlo simulations
%K multimodal tracking
%K multiple cameras
%K multiple microphone arrays
%K nonlinear least squares problem
%K particle filtering
%K self-calibration algorithm
%K smart videoconferencing
%K teleconferencing
%K video signal processing
%K video surveillance
%X Many applications require the ability to track the 3-D motion of subjects. We build a particle filter based framework for multimodal tracking using multiple cameras and multiple microphone arrays. To calibrate the resulting system, we propose a method to determine the locations of all microphones using at least five loudspeakers, under the assumption that for each loudspeaker there exists a microphone very close to it. We derive the maximum likelihood (ML) estimator, which reduces to the solution of a non-linear least squares problem. We verify the correctness and robustness of the multimodal tracker and of the self-calibration algorithm both with Monte-Carlo simulations and on real data from three experimental setups.
%B Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on
%P 1 - 2
%8 2007/06//
%G eng
%R 10.1109/CVPR.2007.383525
%0 Conference Paper
%B Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on
%D 2007
%T Simulation and Analysis of Human Walking Motion
%A Nandy, K.
%A Chellappa, Rama
%K 3D geometry
%K angle sequences
%K dynamic time warping
%K feature extraction
%K healthcare
%K human walking motion
%K image motion analysis
%K image sequences
%K inverse dynamics
%K inverse problems
%K kinematic chain
%K mechanical model
%K recursive Newton Euler algorithm
%K revolute joints
%K rigid links
%K surveillance
%K time series models
%K torque sequences
%K walking patterns
%X Simulation and analysis of human walking motion has applications in surveillance and healthcare. In this paper we discuss an approach for modeling human walking motion using a mechanical model in the form of a kinematic chain consisting of rigid links and revolute joints. Our goal is to discriminate different types of walking motions using information such as joint torque and angle sequences extracted from the model. The angle sequences are initially extracted using 3D geometry. From these angle sequences we extract the torque sequences using a recursive Newton Euler inverse dynamics algorithm. Time series models and dynamic time warping of the torque and angle sequences are used to characterize and discriminate different walking patterns. A forward dynamics algorithm is also presented for synthesizing different walking sequences, such as limping, from a normal walking torque sequence.
%B Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on
%V 1
%P I-797 - I-800
%8 2007/04//
%G eng
%R 10.1109/ICASSP.2007.366028
%0 Conference Paper
%B Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on
%D 2006
%T Headphone-Based Reproduction of 3D Auditory Scenes Captured by Spherical/Hemispherical Microphone Arrays
%A Li, Zhiyun
%A Duraiswami, Ramani
%K 3D auditory scenes
%K array signal processing
%K audio signal processing
%K harmonic analysis
%K head related transfer function
%K headphone-based reproduction
%K headphones
%K hemispherical microphone arrays
%K orthogonal beam-space
%K spatial filters
%K spherical microphone arrays
%X We propose a method to reproduce 3D auditory scenes captured by spherical microphone arrays over headphones. This algorithm employs expansions of the captured sound and the head related transfer function over the sphere and uses the orthonormality of the spherical harmonics. Using a spherical microphone array, we first record the 3D auditory scene; the recordings are then spatially filtered and reproduced through headphones in the orthogonal beam-space of the head related transfer functions (HRTFs). We use the KEMAR HRTF measurements to verify our algorithm.
%B Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on
%V 5
%P V - V
%8 2006/05//
%G eng
%R 10.1109/ICASSP.2006.1661281
%0 Conference Paper
%B Image Processing, 2006 IEEE International Conference on
%D 2006
%T Invariant Geometric Representation of 3D Point Clouds for Registration and Matching
%A Biswas, S.
%A Aggarwal, G.
%A Chellappa, Rama
%K 3D point cloud
%K computer graphics
%K geophysical signal processing
%K image matching
%K image reconstruction
%K image registration
%K implicit function value
%K interpolation
%K invariant geometric representation
%K variational interpolation technique
%X Though implicit representations of surfaces have often been used for various computer graphics tasks like modeling and morphing of objects, they have rarely been used for registration and matching of 3D point clouds. Unlike in graphics, where the goal is precise reconstruction, we use isosurfaces to derive a smooth and approximate representation of the underlying point cloud, which helps in generalization. Implicit surfaces are generated using a variational interpolation technique. Implicit function values on a set of concentric spheres around the 3D point cloud of an object are used as features for matching. Geometric invariance is achieved by decomposing the implicit-value feature set into various spherical harmonics. The decomposition provides a compact representation of 3D point clouds while achieving rotation invariance.
%B Image Processing, 2006 IEEE International Conference on
%P 1209 - 1212
%8 2006/10//
%G eng
%R 10.1109/ICIP.2006.312542
%0 Conference Paper
%B Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on
%D 2006
%T Motion Based Correspondence for 3D Tracking of Multiple Dim Objects
%A Veeraraghavan, A.
%A Srinivasan, M.
%A Chellappa, Rama
%A Baird, E.
%A Lamont, R.
%K 3D tracking
%K feature extraction
%K image motion analysis
%K motion based correspondence
%K motion features
%K multiple dim objects
%K video cameras
%K video signal processing
%X Tracking multiple objects in a video is a demanding task that is frequently encountered in systems such as surveillance and motion analysis. The ability to track objects in 3D requires the use of multiple cameras. While tracking multiple objects using multiple video cameras, establishing correspondence between objects in the various cameras is a nontrivial task. Specifically, when the targets are dim or very far from the camera, appearance cannot be used to establish this correspondence. Here, we propose a technique to establish correspondence across cameras using the motion features extracted from the targets, even when the relative position of the cameras is unknown. Experimental results are provided for the problem of tracking multiple bees in natural flight using two cameras. The reconstructed 3D flight paths of the bees show some interesting flight patterns.
%B Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on
%V 2
%P II - II
%8 2006/05//
%G eng
%R 10.1109/ICASSP.2006.1660431
%0 Journal Article
%J Magnetics, IEEE Transactions on
%D 2006
%T Numerical analysis of plasmon resonances in nanoparticles
%A Mayergoyz, Isaak D.
%A Zhang, Zhenyu
%K 3D nanoparticles
%K boundary integral equation
%K eigenvalue problem
%K eigenvalues and eigenfunctions
%K electrostatics
%K nanoparticles
%K numerical analysis
%K permittivity
%K plasmon resonances
%K surface plasmon resonance
%X Plasmon (electrostatic) resonances in nanoparticles are treated as an eigenvalue problem for a specific boundary integral equation. This leads to direct calculation of resonance values of permittivity and resonance frequency. The numerical technique is illustrated by examples of calculation of resonance frequencies for three-dimensional nanoparticles.
%B Magnetics, IEEE Transactions on
%V 42
%P 759 - 762
%8 2006/04//
%@ 0018-9464
%G eng
%N 4
%R 10.1109/TMAG.2006.870976
%0 Conference Paper
%B Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on
%D 2005
%T On the equivalence of common approaches to lighting insensitive recognition
%A Osadchy, M.
%A Jacobs, David W.
%A Lindenbaum, M.
%K 3D scenes
%K Gaussian filters
%K gradient direction difference
%K image intensity
%K image recognition
%K image segmentation
%K lighting conditions
%K lighting insensitive recognition
%K lighting variation
%K monotonic cosine function
%X Lighting variation is commonly handled by methods invariant to additive and multiplicative changes in image intensity. It has been demonstrated that comparing images using the direction of the gradient can produce broader insensitivity to changes in lighting conditions, even for 3D scenes. We analyze two common invariant approaches to image comparison: normalized correlation using small correlation windows, and comparison based on a large set of oriented difference-of-Gaussian filters. We show analytically that these methods compute a monotonic (cosine) function of the gradient direction difference and hence are equivalent to the direction-of-gradient method. Our analysis is supported with experiments on both synthetic and real scenes.
%B Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on
%V 2
%P 1721 - 1726 Vol. 2
%8 2005/10//
%G eng
%R 10.1109/ICCV.2005.179
%0 Conference Paper
%B Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP '05). IEEE International Conference on
%D 2005
%T Moving Object Segmentation and Dynamic Scene Reconstruction Using Two Frames
%A Agrawala, Ashok K.
%A Chellappa, Rama
%K 3D structure
%K dynamic scene reconstruction
%K ego-motion estimation
%K image flow
%K independent motion
%K intensity images
%K least median of squares
%K motion parallax
%K moving object segmentation
%K parametric surface model
%K static structure
%K subspace constraints
%K translational parallax
%K two-frame method
%K unconstrained video
%K video signal processing
%B Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP '05). IEEE International Conference on
%V 2
%P 705 - 708
%8 2005/03//
%G eng
%R 10.1109/ICASSP.2005.1415502
%0 Conference Paper
%B Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP '05). IEEE International Conference on
%D 2005
%T A robust and self-reconfigurable design of spherical microphone array for multi-resolution beamforming
%A Li, Zhiyun
%A Duraiswami, Ramani
%K anti-terrorism
%K array signal processing
%K audio signal processing
%K beam steering
%K beamforming
%K beampattern optimization
%K directivity
%K frequency response
%K microphone array reorganization
%K microphone arrays
%K multiresolution beamforming
%K omnidirectional microphones
%K robustness
%K self-reconfigurable spherical microphone array
%K soundfield sampling
%X We describe a robust and self-reconfigurable design of a spherical microphone array for beamforming. Our approach achieves a multi-resolution spherical beamformer with performance that is either optimal in the approximation of desired beampattern or is optimal in the directivity achieved, both robustly. Our implementation converges to the optimal performance quickly while exactly satisfying the specified frequency response and robustness constraint in each iteration step without accumulated round-off errors. The advantage of this design lies in its robustness and self-reconfiguration in microphone array reorganization, such as microphone failure, which is highly desirable in online maintenance and anti-terrorism. Design examples and simulation results are presented.
%B Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP '05). IEEE International Conference on
%V 4
%P iv/1137 - iv/1140 Vol. 4
%8 2005/03//
%G eng
%R 10.1109/ICASSP.2005.1416214
%0 Conference Paper
%B Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
%D 2004
%T 3D model refinement using surface-parallax
%A Agrawala, Ashok K.
%A Chellappa, Rama
%K 3D model refinement
%K adaptive windowing
%K arbitrary surfaces
%K camera motion estimation
%K coarse depth map
%K computer vision
%K digital elevation map
%K epipolar field
%K image sequences
%K incomplete depth map
%K intensity images
%K motion compensation
%K plane-parallax recovery
%K surface parallax
%K urban environments
%X We present an approach to update and refine coarse 3D models of urban environments from a sequence of intensity images using surface parallax. This generalizes the plane-parallax recovery methods to surface-parallax using arbitrary surfaces. A coarse and potentially incomplete depth map of the scene obtained from a digital elevation map (DEM) is used as a reference surface which is refined and updated using this approach. The reference depth map is used to estimate the camera motion and the motion of the 3D points on the reference surface is compensated. The resulting parallax, which is an epipolar field, is estimated using an adaptive windowing technique and used to obtain the refined depth map.
%B Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
%V 3
%P iii - 285-8 vol.3
%8 2004/05//
%G eng
%R 10.1109/ICASSP.2004.1326537
%0 Conference Paper
%B Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
%D 2004
%T Appearance-based tracking and recognition using the 3D trilinear tensor
%A Shao, Jie
%A Zhou, S. K.
%A Chellappa, Rama
%K 3D trilinear tensor
%K adaptive algorithm
%K affine-transformation based algorithm
%K airborne video
%K appearance template updating
%K appearance-based tracking
%K geometrical structure
%K image representation
%K mathematical operators
%K novel view synthesis
%K object recognition
%K perspective transformation
%K tensor estimation
%K video signal processing
%K video-based recognition
%X The paper presents an appearance-based adaptive algorithm for simultaneous tracking and recognition by generalizing the transformation model to 3D perspective transformation. A trilinear tensor operator is used to represent the 3D geometrical structure. The tensor is estimated by predicting the corresponding points using the existing affine-transformation based algorithm. The estimated tensor is used to synthesize novel views to update the appearance templates. Some experimental results using airborne video are presented.
%B Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
%V 3
%P iii - 613-16 vol.3
%8 2004/05//
%G eng
%R 10.1109/ICASSP.2004.1326619
%0 Conference Paper
%B Global Telecommunications Conference, 2004. GLOBECOM '04. IEEE
%D 2004
%T Dynamic distortion control for 3-D embedded wavelet video over multiuser OFDM networks
%A Su,Guan-Ming
%A Han,Zhu
%A M. Wu
%A Liu,K. J.R
%K 3D embedded wavelet video codec
%K distortion minimization
%K downlink multiuser OFDM
%K dynamic rate control
%K fairness
%K frequency diversity
%K IEEE 802.11a
%K minimax techniques
%K multimedia communication
%K multiuser channels
%K OFDM modulation
%K PSNR deviation
%K time diversity
%K video streaming
%K wavelet transforms
%K wireless LAN
%X In this paper, we propose a system to transmit multiple 3D embedded wavelet video programs over downlink multiuser OFDM. We consider fairness among users and formulate the problem as minimizing the users' maximal distortion subject to power, rate, and subcarrier constraints. By exploiting the frequency, time, and multiuser diversity of OFDM and the flexibility of the 3D embedded wavelet video codec, the proposed algorithm achieves fair video quality among all users. The proposed scheme outperforms a scheme similar to the current multiuser OFDM standard (IEEE 802.11a) by 1-5 dB in the worst received PSNR among all users and exhibits much smaller PSNR deviation.
%B Global Telecommunications Conference, 2004. GLOBECOM '04. IEEE
%V 2
%P 650 - 654 Vol.2
%8 2004/11//
%G eng
%R 10.1109/GLOCOM.2004.1378042
%0 Conference Paper
%B Image Processing, 2004. ICIP '04. 2004 International Conference on
%D 2004
%T Multiple view tracking of humans modelled by kinematic chains
%A Sundaresan, A.
%A Chellapa, Rama
%A RoyChowdhury, R.
%K 3D motion parameters
%K calibrated cameras
%K error analysis
%K human body motion
%K image sequences
%K iterative algorithm
%K kinematic chain model
%K motion estimation
%K multiple view tracking
%K perspective projection
%K pixel displacement
%K video signal processing
%X We use a kinematic chain to model human body motion. We estimate the kinematic chain motion parameters using pixel displacements calculated from video sequences obtained from multiple calibrated cameras to perform tracking. We derive a linear relation between the 2D motion of pixels in terms of the 3D motion parameters of various body parts using a perspective projection model for the cameras, a rigid body motion model for the base body and the kinematic chain model for the body parts. An error analysis of the estimator is provided, leading to an iterative algorithm for calculating the motion parameters from the pixel displacements. We provide experimental results to demonstrate the accuracy of our formulation. We also compare our iterative algorithm to the noniterative algorithm and discuss its robustness in the presence of noise.
%B Image Processing, 2004. ICIP '04. 2004 International Conference on
%V 2
%P 1009 - 1012 Vol.2
%8 2004/10//
%G eng
%R 10.1109/ICIP.2004.1419472
%0 Conference Paper
%B Image Processing, 2004. ICIP '04. 2004 International Conference on
%D 2004
%T Robust Bayesian cameras motion estimation using random sampling
%A Qian, G.
%A Chellapa, Rama
%A Qinfen Zheng
%K 3D motion estimation
%K Bayesian estimation
%K coarse-to-fine hierarchy strategy
%K feature matching
%K importance sampling
%K posterior probability density function
%K random sample consensus (RANSAC)
%K stereo image processing
%K synthetic and real image sequences
%K wide baseline cameras
%X In this paper, we propose an algorithm for robust 3D motion estimation of wide baseline cameras from noisy feature correspondences. The posterior probability density function of the camera motion parameters is represented by weighted samples. The algorithm employs a hierarchy coarse-to-fine strategy. First, a coarse prior distribution of camera motion parameters is estimated using the random sample consensus scheme (RANSAC). Based on this estimate, a refined posterior distribution of camera motion parameters can then be obtained through importance sampling. Experimental results using both synthetic and real image sequences indicate the efficacy of the proposed algorithm.
%B Image Processing, 2004. ICIP '04. 2004 International Conference on
%V 2
%P 1361 - 1364 Vol.2
%8 2004/10//
%G eng
%R 10.1109/ICIP.2004.1419754
%0 Conference Paper
%B Image Processing, 2004. ICIP '04. 2004 International Conference on
%D 2004
%T Robust ego-motion estimation and 3D model refinement using depth based parallax model
%A Agrawala, Ashok K.
%A Chellapa, Rama
%K 3D model refinement
%K coarse depth map
%K depth based parallax model
%K digital elevation map (DEM)
%K ego-motion estimation
%K eigenvalue analysis
%K epipolar field
%K feature extraction
%K iterative methods
%K motion compensation
%K partial depth map
%K range-finding
%K surface parallax
%X We present an iterative algorithm for robustly estimating the ego-motion and refining and updating a coarse, noisy and partial depth map using a depth based parallax model and brightness derivatives extracted from an image pair. Given a coarse, noisy and partial depth map acquired by a range-finder or obtained from a digital elevation map (DEM), we first estimate the ego-motion by combining a global ego-motion constraint and a local brightness constancy constraint. Using the estimated camera motion and the available depth map estimate, the motion of the 3D points is compensated. We utilize the fact that the resulting surface parallax field is an epipolar field; knowing its direction from the previous motion estimates, we estimate its magnitude and use it to refine the depth map estimate. Instead of assuming a smooth parallax field or locally smooth depth models, we locally model the parallax magnitude using the depth map, formulate the problem as a generalized eigenvalue analysis, and obtain better results. In addition, confidence measures for depth estimates are provided, which can be used to remove regions with potentially incorrect depth estimates (and outliers) before robustly estimating the ego-motion in the next iteration. Results on both synthetic and real examples are presented.
%B Image Processing, 2004. ICIP '04. 2004 International Conference on
%V 4
%P 2483 - 2486 Vol. 4
%8 2004/10//
%G eng
%R 10.1109/ICIP.2004.1421606
%0 Conference Paper
%B Intelligent Transportation Systems, 2004. Proceedings. The 7th International IEEE Conference on
%D 2004
%T A spherical microphone array system for traffic scene analysis
%A Zhiyun Li
%A Duraiswami, Ramani
%A Grassi,E.
%A Davis, Larry S.
%K 3D audio
%K auditory scene capture
%K microphone arrays
%K real-world traffic environment
%K robust spherical beamformer
%K signal processing
%K traffic scene analysis
%K virtual environment
%K white noise gain (-6 dB)
%X This paper describes a practical spherical microphone array system for traffic auditory scene capture and analysis. Our system uses 60 microphones positioned on the rigid surface of a sphere. We then propose an optimal design of a robust spherical beamformer with a minimum white noise gain (WNG) of -6 dB. We test this system in a real-world traffic environment, and preliminary simulation and experimental results are presented to demonstrate its performance. The system may also find applications in broader areas such as 3D audio and virtual environments.
%B Intelligent Transportation Systems, 2004. Proceedings. The 7th International IEEE Conference on
%P 338 - 342
%8 2004/10//
%G eng
%R 10.1109/ITSC.2004.1398921
%0 Conference Paper
%B Image Processing, 2004. ICIP '04. 2004 International Conference on
%D 2004
%T Uncalibrated stereo rectification for automatic 3D surveillance
%A Lim,S.-N.
%A Mittal,A.
%A Davis, Larry S.
%A Paragios,N.
%K automatic 3D surveillance
%K conjugate epipolar lines
%K image processing
%K stereo matching
%K uncalibrated stereo rectification
%K urban scene
%X We describe a stereo rectification method suitable for automatic 3D surveillance. We take advantage of the fact that in a typical urban scene, there is ordinarily a small number of dominant planes. Given two views of the scene, we align a dominant plane in one view with the other. Conjugate epipolar lines between the reference view and plane-aligned image become geometrically identical and can be added to the rectified image pair line by line. Selecting conjugate epipolar lines to cover the whole image is simplified since they are geometrically identical. In addition, the polarities of conjugate epipolar lines are automatically preserved by plane alignment, which simplifies stereo matching.
%B Image Processing, 2004. ICIP '04. 2004 International Conference on
%V 2
%P 1357 - 1360 Vol.2
%8 2004/10//
%G eng
%R 10.1109/ICIP.2004.1419753
%0 Conference Paper
%B Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on
%D 2004
%T View independent human body pose estimation from a single perspective image
%A Parameswaran, V.
%A Chellapa, Rama
%K 3D coordinates
%K biomechanics
%K body-centric coordinate system
%K epipolar geometry
%K human body pose estimation
%K human motion analysis
%K model-based tracking
%K object detection
%K optical motion capture systems
%K perspective uncalibrated camera
%K physiological models
%K polynomial equation system
%K single perspective image
%K synthetic and real images
%K torso twist
%X Recovering the 3D coordinates of various joints of the human body from an image is a critical first step for several model-based human tracking and optical motion capture systems. Unlike previous approaches that have used a restrictive camera model or assumed a calibrated camera, our work deals with the general case of a perspective uncalibrated camera and is thus well suited for archived video. The input to the system is an image of the human body and correspondences of several body landmarks, while the output is the set of 3D coordinates of the landmarks in a body-centric coordinate system. Using ideas from 3D model-based invariants, we set up a polynomial system of equations in the unknown head pitch, yaw and roll angles. If we make the often-valid assumption that the torso twist is small, there is a finite number of solutions for the head orientation, which can be computed readily. Once the head orientation is computed, the epipolar geometry of the camera is recovered, leading to solutions for the 3D joint positions. Results are presented on synthetic and real images.
%B Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on
%V 2
%P II-16 - II-22 Vol.2
%8 2004/06//
%G eng
%R 10.1109/CVPR.2004.1315139
%0 Journal Article
%J Multimedia, IEEE Transactions on
%D 2004
%T Wide baseline image registration with application to 3-D face modeling
%A Roy-Chowdhury, A.K.
%A Chellapa, Rama
%A Keaton, T.
%K 2D shape matching
%K 3D face modeling
%K biometrics
%K computer vision
%K correspondence algorithm
%K doubly stochastic matrix
%K error probability minimization
%K feature extraction
%K holistic representation
%K image registration
%K Sinkhorn normalization procedure
%K spatial configuration of features
%K stochastic processes
%K video sequences
%K wide baseline image registration
%X Establishing correspondence between features in two images of the same scene taken from different viewing angles is a challenging problem in image processing and computer vision. However, its solution is an important step in many applications like wide baseline stereo, three-dimensional (3-D) model alignment, creation of panoramic views, etc. In this paper, we propose a technique for registration of two images of a face obtained from different viewing angles. We show that prior information about the general characteristics of a face obtained from video sequences of different faces can be used to design a robust correspondence algorithm. The method works by matching two-dimensional (2-D) shapes of the different features of the face (e.g., eyes, nose etc.). A doubly stochastic matrix, representing the probability of match between the features, is derived using the Sinkhorn normalization procedure. The final correspondence is obtained by minimizing the probability of error of a match between the entire constellation of features in the two sets, thus taking into account the global spatial configuration of the features. The method is applied for creating holistic 3-D models of a face from partial representations. Although this paper focuses primarily on faces, the algorithm can also be used for other objects with small modifications.
%B Multimedia, IEEE Transactions on
%V 6
%P 423 - 434
%8 2004/06//
%@ 1520-9210
%G eng
%N 3
%R 10.1109/TMM.2004.827511
%0 Journal Article
%J Image Processing, IEEE Transactions on
%D 2003
%T Accurate dense optical flow estimation using adaptive structure tensors and a parametric model
%A Liu,Haiying
%A Chellapa, Rama
%A Rosenfeld, A.
%K adaptive structure tensors
%K coherent motion region
%K confidence measure
%K dense optical flow estimation
%K generalized eigenvalue problem
%K ground truth
%K image sequences
%K parameter estimation
%K parametric flow model
%K synthetic and real sequences
%K three-dimensional (3D) structure tensor
%X An accurate optical flow estimation algorithm is proposed in this paper. By combining the three-dimensional (3D) structure tensor with a parametric flow model, the optical flow estimation problem is converted to a generalized eigenvalue problem. The optical flow can be accurately estimated from the generalized eigenvectors. The confidence measure derived from the generalized eigenvalues is used to adaptively adjust the coherent motion region to further improve the accuracy. Experiments using both synthetic sequences with ground truth and real sequences illustrate our method. Comparisons with classical and recently published methods are also given to demonstrate the accuracy of our algorithm.
%B Image Processing, IEEE Transactions on
%V 12
%P 1170 - 1180
%8 2003/10//
%@ 1057-7149
%G eng
%N 10
%R 10.1109/TIP.2003.815296
%0 Conference Paper
%B Computer Vision, 2003. Proceedings. Ninth IEEE International Conference on
%D 2003
%T Camera calibration using spheres: a semi-definite programming approach
%A Agrawal,M.
%A Davis, Larry S.
%K 3D target
%K camera calibration
%K camera networks
%K common field of view
%K computer vision
%K ellipse
%K feature extraction
%K intrinsic parameters
%K occluding contours
%K semidefinite programming
%K sphere location
%K vision algorithms
%X Vision algorithms utilizing camera networks with a common field of view are becoming increasingly feasible and important. Calibration of such camera networks is a challenging and cumbersome task. The current approaches for calibration using planes or a known 3D target may not be feasible as these objects may not be simultaneously visible in all the cameras. In this paper, we present a new algorithm to calibrate cameras using occluding contours of spheres. In general, an occluding contour of a sphere projects to an ellipse in the image. Our algorithm uses the projection of the occluding contours of three spheres and solves for the intrinsic parameters and the locations of the spheres. The problem is formulated in the dual space and the parameters are solved for optimally and efficiently using semidefinite programming. The technique is flexible, accurate and easy to use. In addition, since the contour of a sphere is simultaneously visible in all the cameras, our approach can greatly simplify calibration of multiple cameras with a common field of view. Experimental results from computer simulated data and real world data, both for a single camera and multiple cameras, are presented.
%B Computer Vision, 2003. Proceedings. Ninth IEEE International Conference on
%P 782 - 789 vol.2
%8 2003/10//
%G eng
%R 10.1109/ICCV.2003.1238428
%0 Conference Paper
%B Proceedings. IEEE Conference on Advanced Video and Signal Based Surveillance, 2003.
%D 2003
%T Human body pose estimation using silhouette shape analysis
%A Mittal,A.
%A Liang Zhao
%A Davis, Larry S.
%K 3D structure
%K clutter
%K feature extraction
%K human body pose estimation
%K image segmentation
%K likelihood function
%K multiple views
%K object detection
%K parameter estimation
%K pixel classification
%K silhouette shape analysis
%K surveillance
%X We describe a system for human body pose estimation from multiple views that is fast and completely automatic. The algorithm works in the presence of multiple people by decoupling the problems of pose estimation of different people. The pose is estimated based on a likelihood function that integrates information from multiple views and thus obtains a globally optimal solution. Other characteristics that make our method more general than previous work include: (1) no manual initialization; (2) no specification of the dimensions of the 3D structure; (3) no reliance on some learned poses or patterns of activity; (4) insensitivity to edges and clutter in the background and within the foreground. The algorithm has applications in surveillance and promising results have been obtained.
%B Proceedings. IEEE Conference on Advanced Video and Signal Based Surveillance, 2003.
%P 263 - 270
%8 2003/07//
%G eng
%R 10.1109/AVSS.2003.1217930
%0 Conference Paper
%B Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on
%D 2003
%T Shape and motion driven particle filtering for human body tracking
%A Yamamoto, T.
%A Chellapa, Rama
%K 3D human body motion
%K filtering theory
%K human body tracking
%K image sequences
%K motion estimation
%K particle filtering framework
%K rotational motion
%K single static camera
%K TV broadcast video
%K video signal processing
%X In this paper, we propose a method to recover 3D human body motion from a video acquired by a single static camera. In order to estimate the complex state distribution of a human body, we adopt the particle filtering framework. We represent the human body using several layers of representation and compose the whole body step by step. In this way, more effective particles are generated and ineffective particles are removed as we process each layer. To deal with rotational motion, a preprocessing step computes the variance of the motion field in each image and estimates the frequency of rotation, which is then used for the state update in the algorithm. We successfully track the movement of figure skaters in a TV broadcast image sequence and recover the 3D shape and motion of the skater.
%B Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on
%V 3
%P III - 61-4 vol.3
%8 2003/07//
%G eng
%R 10.1109/ICME.2003.1221248
%0 Conference Paper
%B Computer Vision, 2003. Proceedings. Ninth IEEE International Conference on
%D 2003
%T Using specularities for recognition
%A Osadchy,M.
%A Jacobs, David W.
%A Ramamoorthi,R.
%K 3D shape
%K compact light source
%K computer vision
%K highlight formation
%K Lambertian reflectance property
%K lighting
%K object recognition
%K pottery
%K recognition systems
%K shiny objects
%K specular reflection
%K stereo image processing
%K transparent objects
%K wine glass
%X Recognition systems have generally treated specular highlights as noise. We show how to use these highlights as a positive source of information that improves recognition of shiny objects. This also enables us to recognize very challenging shiny transparent objects, such as wine glasses. Specifically, we show how to find highlights that are consistent with a hypothesized pose of an object of known 3D shape. We do this using only a qualitative description of highlight formation that is consistent with most models of specular reflection, so no specific knowledge of an object's reflectance properties is needed. We first present a method that finds highlights produced by a dominant compact light source, whose position is roughly known. We then show how to estimate the lighting automatically for objects whose reflection is part specular and part Lambertian. We demonstrate this method for two classes of objects. First, we show that specular information alone can suffice to identify objects with no Lambertian reflectance, such as transparent wine glasses. Second, we use our complete system to recognize shiny objects, such as pottery.
%B Computer Vision, 2003. Proceedings. Ninth IEEE International Conference on
%P 1512 - 1519 vol.2
%8 2003/10//
%G eng
%R 10.1109/ICCV.2003.1238669
%0 Conference Paper
%B Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on
%D 2003
%T Video based rendering of planar dynamic scenes
%A Kale, A.
%A Chowdhury, A.K.R.
%A Chellapa, Rama
%K 3D motion direction
%K dynamic scenes
%K image sequence analysis
%K monocular video sequence
%K planar scenes
%K rendering (computer graphics)
%K video based rendering
%K video signal processing
%K weak perspective approximation
%X In this paper, we propose a method to synthesize arbitrary views of a planar scene from a monocular video sequence of it. The 3-D direction of motion of the object is robustly estimated from the video sequence. Given this direction, any other view of the object can be synthesized through a perspective projection approach, under assumptions of planarity. If the distance of the object from the camera is large, a planar approximation is reasonable even for non-planar scenes. Such a method has many important applications, one of them being gait recognition, where a side view of the person is required. Our method can be used to synthesize the side view of the person in case he/she does not present a side view to the camera. Since the planarity assumption is often an approximation, the effects of non-planarity can lead to inaccuracies in rendering and need to be corrected for. Regions where this happens are examined, and a simple technique based on a weak perspective approximation is proposed to offset rendering inaccuracies. Examples of synthesized views using our method and a performance evaluation are presented.
%B Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on
%V 1
%P I - 477-80 vol.1
%8 2003/07//
%G eng
%R 10.1109/ICME.2003.1220958
%0 Conference Paper
%B Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
%D 2003
%T Video synthesis of arbitrary views for approximately planar scenes
%A Chowdhury, A.K.R.
%A Kale, A.
%A Chellapa, Rama
%K 3D structure
%K approximately planar scenes
%K arbitrary view synthesis
%K biometrics (access control)
%K gait recognition
%K monocular video sequence
%K motion direction recovery
%K performance evaluation
%K perspective projection approach
%K side view synthesis
%K surveillance applications
%K video signal processing
%X In this paper, we propose a method to synthesize arbitrary views of a planar scene, given a monocular video sequence. The method is based on the availability of knowledge of the angle between the original and synthesized views. Such a method has many important applications, one of them being gait recognition. Gait recognition algorithms rely on the availability of an approximate side-view of the person. From a realistic viewpoint, such an assumption is impractical in surveillance applications and it is of interest to develop methods to synthesize a side view of the person, given an arbitrary view. For large distances from the camera, a planar approximation for the individual can be assumed. In this paper, we propose a perspective projection approach for recovering the direction of motion of the person purely from the video data, followed by synthesis of a new video sequence at a different angle. The algorithm works purely in the image and video domain, though 3D structure plays an implicit role in its theoretical justification. Examples of synthesized views using our method and performance evaluation are presented.
%B Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
%V 3
%P III - 497-500 vol.3
%8 2003/04//
%G eng
%R 10.1109/ICASSP.2003.1199520
%0 Conference Paper
%B Multimedia and Expo, 2002. ICME '02. Proceedings. 2002 IEEE International Conference on
%D 2002
%T 3D face reconstruction from video using a generic model
%A Chowdhury, A.R.
%A Chellapa, Rama
%A Krishnamurthy, S.
%A Vo, T.
%K 3D face reconstruction
%K computer vision
%K generic model
%K human face recognition
%K image sequence analysis
%K Markov chain Monte Carlo (MCMC) sampling
%K optimization methods
%K structure from motion (SfM) algorithms
%K surveillance
%K video signal processing
%X Reconstructing a 3D model of a human face from a video sequence is an important problem in computer vision, with applications to recognition, surveillance, multimedia etc. However, the quality of 3D reconstructions using structure from motion (SfM) algorithms is often not satisfactory. One common method of overcoming this problem is to use a generic model of a face. Existing work using this approach initializes the reconstruction algorithm with this generic model. The problem with this approach is that the algorithm can converge to a solution very close to this initial value, resulting in a reconstruction which resembles the generic model rather than the particular face in the video which needs to be modeled. We propose a method of 3D reconstruction of a human face from video in which the 3D reconstruction algorithm and the generic model are handled separately. A 3D estimate is obtained purely from the video sequence using SfM algorithms without use of the generic model. The final 3D model is obtained after combining the SfM estimate and the generic model using an energy function that corrects for the errors in the estimate by comparing local regions in the two models. The optimization is done using a Markov chain Monte Carlo (MCMC) sampling strategy. The main advantage of our algorithm over others is that it is able to retain the specific features of the face in the video sequence even when these features are different from those of the generic model. The evolution of the 3D model through the various stages of the algorithm is presented.
%B Multimedia and Expo, 2002. ICME '02. Proceedings. 2002 IEEE International Conference on
%V 1
%P 449 - 452 vol.1
%8 2002///
%G eng
%R 10.1109/ICME.2002.1035815
%0 Conference Paper
%B Image Processing. 2002. Proceedings. 2002 International Conference on
%D 2002
%T Bayesian structure from motion using inertial information
%A Qian,Gang
%A Chellapa, Rama
%A Qinfen Zheng
%K 3D scene reconstruction
%K Bayes methods
%K camera motion estimation
%K image sequence analysis
%K inertial sensors
%K parameter estimation
%K sequential importance sampling
%K structure-from-motion
%K synthetic and real images
%K video signal processing
%X A novel approach to Bayesian structure from motion (SfM) using inertial information and sequential importance sampling (SIS) is presented. The inertial information is obtained from camera-mounted inertial sensors and is used in the Bayesian SfM approach as prior knowledge of the camera motion in the sampling algorithm. Experimental results using both synthetic and real images show that, when inertial information is used, more accurate results can be obtained or the same estimation accuracy can be obtained at a lower cost.
%B Image Processing. 2002. Proceedings. 2002 International Conference on
%V 3
%P III-425 - III-428 vol.3
%8 2002///
%G eng
%R 10.1109/ICIP.2002.1038996
%0 Conference Paper
%B Multimedia Signal Processing, 2002 IEEE Workshop on
%D 2002
%T Wide baseline image registration using prior information
%A Chowdhury, AM
%A Chellapa, Rama
%A Keaton, T.
%K 2D shape matching
%K 3D model alignment
%K computer vision
%K doubly stochastic matrix
%K error probability
%K feature constellation
%K feature extraction
%K global spatial configuration
%K holistic 3D face models
%K image registration
%K panoramic view creation
%K robust correspondence algorithm
%K Sinkhorn normalization procedure
%K stochastic processes
%K video sequences
%K viewing angles
%K wide baseline stereo
%X Establishing correspondence between features in two images of the same scene taken from different viewing angles is a challenging problem in image processing and computer vision. However, its solution is an important step in many applications like wide baseline stereo, 3D model alignment, creation of panoramic views etc. In this paper, we propose a technique for registration of two images of a face obtained from different viewing angles. We show that prior information about the general characteristics of a face obtained from video sequences of different faces can be used to design a robust correspondence algorithm. The method works by matching 2D shapes of the different features of the face. A doubly stochastic matrix, representing the probability of match between the features, is derived using the Sinkhorn normalization procedure. The final correspondence is obtained by minimizing the probability of error of a match between the entire constellation of features in the two sets, thus taking into account the global spatial configuration of the features. The method is applied for creating holistic 3D models of a face from partial representations. Although this paper focuses primarily on faces, the algorithm can also be used for other objects with small modifications.
%B Multimedia Signal Processing, 2002 IEEE Workshop on
%P 37 - 40
%8 2002/12//
%G eng
%R 10.1109/MMSP.2002.1203242
%0 Conference Paper
%B Computer Vision and Pattern Recognition, 1998. Proceedings. 1998 IEEE Computer Society Conference on
%D 1998
%T Clustering appearances of 3D objects
%A Basri,R.
%A Roth,D.
%A Jacobs, David W.
%K 3D objects
%K unsupervised clustering
%K image sequences
%K sequences of images
%K local properties
%K reliable clustering
%K object recognition
%X We introduce a method for unsupervised clustering of images of 3D objects. Our method examines the space of all images and partitions the images into sets that form smooth and parallel surfaces in this space. It further uses sequences of images to obtain more reliable clustering. Finally, since our method relies on a non-Euclidean similarity measure, we introduce algebraic techniques for estimating local properties of these surfaces without first embedding the images in a Euclidean space. We demonstrate our method by applying it to a large database of images.
%B Computer Vision and Pattern Recognition, 1998. Proceedings. 1998 IEEE Computer Society Conference on
%P 414 - 420
%8 1998/06//
%G eng
%R 10.1109/CVPR.1998.698639
%0 Journal Article
%J Computer Graphics and Applications, IEEE
%D 1994
%T Computing smooth molecular surfaces
%A Varshney, Amitabh
%A Brooks, F.P.,Jr.
%A Wright,W. V
%K smooth molecular surfaces
%K parallel analytical molecular surface algorithm
%K three dimensional regular triangulation
%K atoms
%K computational geometry
%K computational complexity
%K parallel computing
%K physics computing
%K parallel algorithms
%K interactive computation rates
%K linear time
%K algorithmic improvements
%K surface triangulation
%X We consider how we set out to formulate a parallel analytical molecular surface algorithm that has expected linear complexity with respect to the total number of atoms in a molecule. To achieve this goal, we avoided computing the complete 3D regular triangulation over the entire set of atoms, a process that takes time O(n log n), where n is the number of atoms in the molecule. We aim to compute and display these surfaces at interactive rates by taking advantage of advances in computational geometry, making further algorithmic improvements, and parallelizing the computations.
%B Computer Graphics and Applications, IEEE
%V 14
%P 19 - 25
%8 1994/09//
%@ 0272-1716
%G eng
%N 5
%R 10.1109/38.310720
%0 Conference Paper
%B Pattern Recognition, 1994. Vol. 1 - Conference A: Computer Vision Image Processing., Proceedings of the 12th IAPR International Conference on
%D 1994
%T Finding structurally consistent motion correspondences
%A Jacobs, David W.
%A Chennubhotla,C.
%K structurally consistent motion correspondences
%K common 3D motion
%K tracked image features
%K independent motion segmentation
%K occlusion boundaries
%K specularities
%K linear programming
%K scene structure
%K motion estimation
%X Much work on deriving scene structure and motion from features assumes as input a set of tracked image features that share a common 3D motion. Producing this input requires segmenting independent motions and detecting image features that do not correspond to 3D features, originating instead in, for example, occlusion boundaries or specularities. We derive a linear program that tells when a set of tracked points might have come from 3D points that share a single motion, assuming affine motion and bounded error. We can also use linear programming to place conservative bounds on the structure of the scene that corresponds to the tracked points. We implement and test this algorithm on real images.
%B Pattern Recognition, 1994. Vol. 1 - Conference A: Computer Vision Image Processing., Proceedings of the 12th IAPR International Conference on
%V 1
%P 650 - 653
%8 1994/10//
%G eng
%R 10.1109/ICPR.1994.576388
%0 Conference Paper
%B Motion of Non-Rigid and Articulated Objects, 1994., Proceedings of the 1994 IEEE Workshop on
%D 1994
%T Segmenting independently moving, noisy points
%A Jacobs, David W.
%A Chennubhotla,C.
%K independently moving noisy points
%K common 3D motion
%K consistent motion
%K point features
%K motion segmentation
%K motion estimation
%K linear programming
%K real image sequence
%K video sequences
%X There has been much work on using point features tracked through a video sequence to determine structure and motion. In many situations, to use this work we must first isolate subsets of points that share a common motion. This is hard because we must distinguish between independent motions and apparent deviations from a single motion due to noise. We propose several methods of searching for point-sets with consistent 3D motions. We analyze the potential sensitivity of each method for detecting independent motions, and experiment with each method on a real image sequence.
%B Motion of Non-Rigid and Articulated Objects, 1994., Proceedings of the 1994 IEEE Workshop on
%P 96 - 103
%8 1994/11//
%G eng
%R 10.1109/MNRAO.1994.346249
%0 Journal Article
%J Magnetics, IEEE Transactions on
%D 1993
%T RF scattering and radiation by using a decoupled Helmholtz equation approach
%A D'Angelo,J.
%A Mayergoyz, Issak D
%K 3D RF scattering
%K radiation problems
%K decoupled Helmholtz equation approach
%K finite-element formulation
%K finite element method
%K frequency-domain analysis
%K computer-efficient method
%K electrical engineering computing
%K physics computing
%K electromagnetic wave propagation
%K radiowave propagation
%X A novel finite-element formulation for the solution of 3-D RF scattering and radiation problems is presented. This formulation is based on the solution of a set of decoupled Helmholtz equations for the Cartesian components of the field vectors. This results in a robust, computer-efficient method that eliminates previous difficulties associated with `curl-curl' type partial differential equations. Although it is presented in the frequency domain, the method is easily extendible to the time domain.
%B Magnetics, IEEE Transactions on
%V 29
%P 2040 - 2042
%8 1993/03//
%@ 0018-9464
%G eng
%N 2
%R 10.1109/20.250811
%0 Conference Paper
%B Computer Vision and Pattern Recognition, 1991. Proceedings CVPR '91., IEEE Computer Society Conference on
%D 1991
%T Optimal matching of planar models in 3D scenes
%A Jacobs, David W.
%K planar models
%K 3D scenes
%K optimal matching
%K point features
%K model features
%K flat object
%K image
%K bounded sensing error
%K maximum sensing error
%K close approximation
%K computerised pattern recognition
%K computerised picture processing
%X The problem of matching a model consisting of the point features of a flat object to point features found in an image that contains the object in an arbitrary three-dimensional pose is addressed. Once three points are matched, it is possible to determine the pose of the object. Assuming bounded sensing error, the author presents a solution to the problem of determining the range of possible locations in the image at which any additional model points may appear. This solution leads to an algorithm for determining the largest possible matching between image and model features that includes this initial hypothesis. The author implements a close approximation to this algorithm, which is O(nmε^6), where n is the number of image points, m is the number of model points, and ε is the maximum sensing error. This algorithm is compared to existing methods, and it is shown that it produces more accurate results.
%B Computer Vision and Pattern Recognition, 1991. Proceedings CVPR '91., IEEE Computer Society Conference on
%P 269 - 274
%8 1991/06//
%G eng
%R 10.1109/CVPR.1991.139700