%0 Journal Article
%J IEEE Transactions on Pattern Analysis and Machine Intelligence
%D 2012
%T Recognizing Human Actions by Learning and Matching Shape-Motion Prototype Trees
%A Jiang, Zhuolin
%A Lin, Zhe
%A Davis, Larry S.
%K action prototype
%K actor location
%K brute-force computation
%K CMU action data set
%K distance measures
%K dynamic backgrounds
%K dynamic prototype sequence matching
%K flexible action matching
%K frame-to-frame distances
%K frame-to-prototype correspondence
%K hierarchical k-means clustering
%K human action recognition
%K Image matching
%K image recognition
%K Image sequences
%K joint probability model
%K joint shape
%K KTH action data set
%K large gesture data set
%K learning
%K learning (artificial intelligence)
%K look-up table indexing
%K motion space
%K moving cameras
%K pattern clustering
%K prototype-to-prototype distances
%K shape-motion prototype-based approach
%K table lookup
%K training sequence
%K UCF sports data set
%K Video sequences
%K video signal processing
%K Weizmann action data set
%X A shape-motion prototype-based approach is introduced for action recognition. The approach represents an action as a sequence of prototypes for efficient and flexible action matching in long video sequences. During training, an action prototype tree is learned in a joint shape and motion space via hierarchical K-means clustering, and each training sequence is represented as a labeled prototype sequence; a look-up table of prototype-to-prototype distances is then generated. During testing, based on a joint probability model of the actor location and action prototype, the actor is tracked while a frame-to-prototype correspondence is established by maximizing the joint probability, which is performed efficiently by searching the learned prototype tree; actions are then recognized using dynamic prototype sequence matching. Distance measures used for sequence matching are rapidly obtained by look-up table indexing, which is an order of magnitude faster than brute-force computation of frame-to-frame distances. Our approach enables robust action matching in challenging situations (such as moving cameras and dynamic backgrounds) and allows automatic alignment of action sequences. Experimental results demonstrate that our approach achieves recognition rates of 92.86 percent on a large gesture data set (with dynamic backgrounds), 100 percent on the Weizmann action data set, 95.77 percent on the KTH action data set, 88 percent on the UCF sports data set, and 87.27 percent on the CMU action data set.
%B IEEE Transactions on Pattern Analysis and Machine Intelligence
%V 34
%P 533 - 547
%8 2012/03//
%@ 0162-8828
%G eng
%N 3
%R 10.1109/TPAMI.2011.147
%0 Conference Paper
%B 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops)
%D 2011
%T The evolution of stochastic grammars for representation and recognition of activities in videos
%A Chellappa, Rama
%K activity recognition
%K activity representation
%K Conferences
%K Educational institutions
%K Grammar
%K Image analysis
%K image recognition
%K image representation
%K image understanding
%K pattern recognition
%K stochastic image grammars
%K syntactic pattern recognition methods
%K video signal processing
%K video understanding
%K Videos
%X The speaker is one of the privileged many to have been taught syntactic pattern recognition methods by the late Prof. K. S. Fu. In this talk, I will discuss the evolution of stochastic image grammars from the early seventies to now, with a focus on image and video understanding applications.
%B 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops)
%I IEEE
%P 688 - 688
%8 2011/11/06/13
%@ 978-1-4673-0062-9
%G eng
%R 10.1109/ICCVW.2011.6130309
%0 Conference Paper
%B 2011 18th IEEE International Conference on Image Processing (ICIP)
%D 2011
%T Face tracking in low resolution videos under illumination variations
%A Zou, W.W.W.
%A Chellappa, Rama
%A Yuen, P.C.
%K Adaptation models
%K Computational modeling
%K Face
%K face recognition
%K face tracking
%K GLF-based tracker
%K gradient methods
%K gradient-logarithmic field feature
%K illumination variations
%K lighting
%K low resolution videos
%K low-resolution
%K particle filter
%K particle filter framework
%K particle filtering (numerical methods)
%K Robustness
%K tracking
%K video signal processing
%K Videos
%K Visual face tracking
%X In practical face tracking applications, the face region is often small and affected by illumination variations. We address this problem by using a new feature, the Gradient-Logarithmic Field (GLF) feature, in the particle filter framework. The GLF feature is robust under illumination variations, and the GLF-based tracker does not assume any model for the face being tracked and is effective in low-resolution video. Experimental results show that the proposed GLF-based tracker works well under significant illumination changes and outperforms some of the state-of-the-art algorithms.
%B 2011 18th IEEE International Conference on Image Processing (ICIP)
%I IEEE
%P 781 - 784
%8 2011/09/11/14
%@ 978-1-4577-1304-0
%G eng
%R 10.1109/ICIP.2011.6116672
%0 Journal Article
%J IEEE Transactions on Pattern Analysis and Machine Intelligence
%D 2011
%T A Fast Bilinear Structure from Motion Algorithm Using a Video Sequence and Inertial Sensors
%A Ramachandran, M.
%A Veeraraghavan, A.
%A Chellappa, Rama
%K 3D urban modeling
%K algorithms
%K Artificial intelligence
%K CAMERAS
%K Computer vision
%K Convergence
%K fast bilinear structure
%K Google StreetView research data set
%K Image Interpretation, Computer-Assisted
%K Image reconstruction
%K Image sensors
%K Image sequences
%K Imaging, Three-Dimensional
%K inertial sensors
%K Information Storage and Retrieval
%K Linear systems
%K minimization
%K MOTION
%K motion algorithm
%K Motion estimation
%K multiple view geometry
%K Pattern Recognition, Automated
%K Sensors
%K SfM equations
%K sparse bundle adjustment algorithm
%K structure from motion
%K Three dimensional displays
%K vertical direction
%K Video Recording
%K video sequence
%K video signal processing
%X In this paper, we study the benefits of a specific form of additional information: the vertical direction (gravity) and the height of the camera, both of which can be conveniently measured using inertial sensors, together with a monocular video sequence, for 3D urban modeling. We show that in the presence of this information, the SfM equations can be rewritten in a bilinear form. This allows us to derive a fast, robust, and scalable SfM algorithm for large-scale applications. The SfM algorithm developed in this paper is experimentally demonstrated to have favorable properties compared to the sparse bundle adjustment algorithm. We provide experimental evidence indicating that the proposed algorithm converges in many cases to solutions with lower error than state-of-the-art implementations of bundle adjustment. We also demonstrate that, for large reconstruction problems, the proposed algorithm takes less time to reach its solution than bundle adjustment. We also present SfM results using our algorithm on the Google StreetView research data set.
%B IEEE Transactions on Pattern Analysis and Machine Intelligence
%V 33
%P 186 - 193
%8 2011/01//
%@ 0162-8828
%G eng
%N 1
%R 10.1109/TPAMI.2010.163
%0 Conference Paper
%B 2011 IEEE International Conference on Automatic Face & Gesture Recognition and Workshops (FG 2011)
%D 2011
%T Recent advances in age and height estimation from still images and video
%A Chellappa, Rama
%A Turaga, P.
%K age estimation
%K biometrics (access control)
%K Calibration
%K Estimation
%K Geometry
%K height estimation
%K HUMANS
%K image fusion
%K image-formation model fusion
%K Legged locomotion
%K multiview-geometry
%K Robustness
%K SHAPE
%K shape-space geometry
%K soft-biometrics
%K statistical analysis
%K statistical methods
%K video signal processing
%X Soft biometrics such as gender, age, and race have been found to be useful characterizations that enable fast pre-filtering and organization of data for biometric applications. In this paper, we focus on two useful soft biometrics: age and height. We discuss their utility and the factors involved in their estimation from images and videos. In this context, we highlight the role that geometric constraints, such as multiview geometry and shape-space geometry, play. We then present methods based on these geometric constraints for age and height estimation. These methods provide a principled means of fusing image-formation models, multi-view geometric constraints, and robust statistical methods for inference.
%B 2011 IEEE International Conference on Automatic Face & Gesture Recognition and Workshops (FG 2011)
%I IEEE
%P 91 - 96
%8 2011/03/21/25
%@ 978-1-4244-9140-7
%G eng
%R 10.1109/FG.2011.5771367
%0 Journal Article
%J IEEE Transactions on Pattern Analysis and Machine Intelligence
%D 2011
%T Statistical Computations on Grassmann and Stiefel Manifolds for Image and Video-Based Recognition
%A Turaga, P.
%A Veeraraghavan, A.
%A Srivastava, A.
%A Chellappa, Rama
%K activity based video clustering
%K activity recognition
%K computational geometry
%K Computational modeling
%K Data models
%K face recognition
%K feature representation
%K finite dimensional linear subspaces
%K geometric properties
%K Geometry
%K Grassmann Manifolds
%K Grassmann
%K HUMANS
%K Image and video models
%K image recognition
%K linear dynamic models
%K linear subspace structure
%K Manifolds
%K maximum likelihood classification
%K maximum likelihood estimation
%K Object recognition
%K Riemannian geometry
%K Riemannian metrics
%K SHAPE
%K statistical computations
%K statistical models
%K Stiefel
%K Stiefel Manifolds
%K unsupervised clustering
%K video based face recognition
%K video based recognition
%K video signal processing
%X In this paper, we examine image and video-based recognition applications where the underlying models have a special structure: the linear subspace structure. We discuss how commonly used parametric models for videos and image sets can be described using the unified framework of Grassmann and Stiefel manifolds. We first show that the parameters of linear dynamic models are finite-dimensional linear subspaces of appropriate dimensions. Unordered image sets, as samples from a finite-dimensional linear subspace, naturally fall under this framework. We show that an inference over subspaces can be naturally cast as an inference problem on the Grassmann manifold. To perform recognition using subspace-based models, we need tools from the Riemannian geometry of the Grassmann manifold. This involves a study of the geometric properties of the space, appropriate definitions of Riemannian metrics, and definitions of geodesics. Further, we derive statistical models of inter- and intraclass variations that respect the geometry of the space. We apply techniques such as intrinsic and extrinsic statistics to enable maximum-likelihood classification. We also provide algorithms for unsupervised clustering derived from the geometry of the manifold. Finally, we demonstrate the improved performance of these methods in a wide variety of vision applications such as activity recognition, video-based face recognition, object recognition from image sets, and activity-based video clustering.
%B IEEE Transactions on Pattern Analysis and Machine Intelligence
%V 33
%P 2273 - 2286
%8 2011/11//
%@ 0162-8828
%G eng
%N 11
%R 10.1109/TPAMI.2011.52
%0 Journal Article
%J IEEE Transactions on Image Processing
%D 2010
%T Robust Height Estimation of Moving Objects From Uncalibrated Videos
%A Shao, Jie
%A Zhou, S. K.
%A Chellappa, Rama
%K algorithms
%K Biometry
%K Calibration
%K EM algorithm
%K geometric properties
%K Geometry
%K Image Enhancement
%K Image Interpretation, Computer-Assisted
%K Imaging, Three-Dimensional
%K least median of squares
%K least squares approximations
%K MOTION
%K motion information
%K multiframe measurements
%K Pattern Recognition, Automated
%K Reproducibility of results
%K Robbins-Monro stochastic approximation
%K robust height estimation
%K Sensitivity and Specificity
%K Signal Processing, Computer-Assisted
%K stochastic approximation
%K Subtraction Technique
%K tracking data
%K uncalibrated stationary camera
%K uncalibrated videos
%K uncertainty analysis
%K vanishing point
%K video metrology
%K Video Recording
%K video signal processing
%X This paper presents an approach for video metrology. From videos acquired by an uncalibrated stationary camera, we first recover the vanishing line and the vertical point of the scene based upon tracking moving objects that primarily lie on a ground plane. Using geometric properties of the moving objects, a probabilistic model is constructed for simultaneously grouping trajectories and estimating vanishing points. We then apply a single-view mensuration algorithm to each of the frames to obtain height measurements. We finally fuse the multiframe measurements using the least median of squares (LMedS) as a robust cost function and the Robbins-Monro stochastic approximation (RMSA) technique. This method requires less human supervision and offers more flexibility and improved robustness. From the uncertainty analysis, we conclude that the method with auto-calibration is robust in practice. Results are shown based upon realistic tracking data from a variety of scenes.
%B IEEE Transactions on Image Processing
%V 19
%P 2221 - 2232
%8 2010/08//
%@ 1057-7149
%G eng
%N 8
%R 10.1109/TIP.2010.2046368
%0 Journal Article
%J IEEE Transactions on Multimedia
%D 2010
%T Video Précis: Highlighting Diverse Aspects of Videos
%A Shroff, N.
%A Turaga, P.
%A Chellappa, Rama
%K K-means
%K CAMERAS
%K combinatorial mathematics
%K combinatorial optimization
%K Cost function
%K data compression
%K Exemplar selection
%K Image segmentation
%K Internet
%K Iron
%K Length measurement
%K multimedia systems
%K Ncut
%K optimisation
%K Optimization methods
%K original video
%K Permission
%K shot segmentation
%K Surveillance
%K user specified summary length
%K video précis
%K Video sharing
%K video signal processing
%K Video summarization
%X Summarizing long unconstrained videos is gaining importance in surveillance, web-based video browsing, and video-archival applications. Summarizing a video requires one to identify key aspects that contain the essence of the video. In this paper, we propose an approach that optimizes two criteria that a video summary should embody. The first criterion, “coverage,” requires that the summary be able to represent the original video well. The second criterion, “diversity,” requires that the elements of the summary be as distinct from each other as possible. Given a user-specified summary length, we propose a cost function to measure the quality of a summary. The problem of generating a précis is then reduced to a combinatorial optimization problem of minimizing the proposed cost function. We propose an efficient method to solve the optimization problem. We demonstrate through experiments (on KTH data, unconstrained skating video, a surveillance video, and a YouTube home video) that optimizing the proposed criterion results in meaningful video summaries over a wide range of scenarios. Summaries thus generated are then evaluated using both quantitative measures and user studies.
%B IEEE Transactions on Multimedia
%V 12
%P 853 - 868
%8 2010/12//
%@ 1520-9210
%G eng
%N 8
%R 10.1109/TMM.2010.2058795
%0 Journal Article
%J IEEE Transactions on Pattern Analysis and Machine Intelligence
%D 2009
%T Robust Wavelet-Based Super-Resolution Reconstruction: Theory and Algorithm
%A Ji, Hui
%A Fermüller, Cornelia
%K batch algorithm
%K better-conditioned iterative back projection scheme
%K Enhancement
%K homography estimation
%K image denoising
%K image denoising scheme
%K image frame alignment
%K Image processing software
%K Image reconstruction
%K image resolution
%K image sequence
%K Image sequences
%K iterative methods
%K regularization criteria
%K robust wavelet-based iterative super-resolution reconstruction
%K surface normal vector
%K video formation analysis
%K video sequence
%K video signal processing
%K Wavelet transforms
%X We present an analysis and algorithm for the problem of super-resolution imaging, that is, the reconstruction of HR (high-resolution) images from a sequence of LR (low-resolution) images. Super-resolution reconstruction entails solutions to two problems. One is the alignment of image frames. The other is the reconstruction of an HR image from multiple aligned LR images. Both are important for the performance of super-resolution imaging. Image alignment is addressed with a new batch algorithm, which simultaneously estimates the homographies between multiple image frames by enforcing the surface normal vectors to be the same. This approach can handle longer video sequences quite well. Reconstruction is addressed with a wavelet-based iterative reconstruction algorithm with an efficient denoising scheme. The technique is based on a new analysis of video formation. At a high level, our method can be described as a better-conditioned iterative back projection scheme with an efficient regularization criterion in each iteration step. Experiments with both simulated and real data demonstrate that our approach has better performance than existing super-resolution methods. It can remove even large amounts of mixed noise without creating artifacts.
%B IEEE Transactions on Pattern Analysis and Machine Intelligence
%V 31
%P 649 - 660
%8 2009/04//
%@ 0162-8828
%G eng
%N 4
%R 10.1109/TPAMI.2008.103
%0 Conference Paper
%B 2009 IEEE 3rd International Conference on Biometrics: Theory, Applications, and Systems (BTAS '09)
%D 2009
%T Sparsity inspired selection and recognition of iris images
%A Pillai, J. K.
%A Patel, Vishal M.
%A Chellappa, Rama
%K biometrics (access control)
%K image recognition
%K Image segmentation
%K iris image recognition
%K iris image selection
%K iris video stream
%K sparsity inspired selection
%K video signal processing
%K video streaming
%X Iris images acquired from a partially cooperating subject often suffer from blur, occlusion due to eyelids, and specular reflections. The performance of existing iris recognition systems degrades significantly on these images. Hence, it is essential to select good images from the incoming iris video stream before they are input to the recognition algorithm. In this paper, we propose a sparsity-based algorithm for the selection of good iris images and their subsequent recognition. Unlike most existing algorithms for iris image selection, our method can handle segmentation errors and a wider range of acquisition artifacts common in iris image capture. We perform selection and recognition in a single step, which is more efficient than devising separate specialized algorithms for the two. Recognition from partially cooperating users is a significant step towards deploying iris systems in a wide variety of applications.
%B 2009 IEEE 3rd International Conference on Biometrics: Theory, Applications, and Systems (BTAS '09)
%P 1 - 6
%8 2009/09//
%G eng
%R 10.1109/BTAS.2009.5339067
%0 Conference Paper
%B IEEE/RSJ International Conference on Intelligent Robots and Systems, 2007. IROS 2007
%D 2007
%T Combining motion from texture and lines for visual navigation
%A Bitsakos, K.
%A Li Yi
%A Fermüller, Cornelia
%K 3D structure information
%K CAMERAS
%K Computer vision
%K extended Kalman filter
%K Frequency
%K image frequencies
%K Image motion analysis
%K Image texture
%K Kalman filters
%K Layout
%K motion control
%K Motion estimation
%K Navigation
%K Optical computing
%K phase correlation
%K piecewise planar scene
%K Robustness
%K Simultaneous localization and mapping
%K Speech processing
%K textured plane
%K video signal processing
%K visual navigation
%X Two novel methods for computing 3D structure information from video for a piecewise planar scene are presented. The first method is based on a new line constraint, which clearly separates the estimation of distance from the estimation of slant. The second method exploits the concepts of phase correlation to compute distance and slant information from the change of image frequencies of a textured plane. The two estimates, together with structure estimates from classical image motion, are combined and integrated over time using an extended Kalman filter. The estimation of the scene structure is demonstrated experimentally in a motion control algorithm that allows the robot to move along a corridor. We demonstrate the efficacy of each method individually and in combination, and show that the approach allows for visual navigation in textured as well as untextured environments.
%B IEEE/RSJ International Conference on Intelligent Robots and Systems, 2007. IROS 2007
%I IEEE
%P 232 - 239
%8 2007/10/29/Nov. 2
%@ 978-1-4244-0912-9
%G eng
%R 10.1109/IROS.2007.4399568
%0 Journal Article
%J IEEE Transactions on Pattern Analysis and Machine Intelligence
%D 2006
%T A 3D shape constraint on video
%A Ji, Hui
%A Fermüller, Cornelia
%K 3D motion estimation
%K algorithms
%K Artificial intelligence
%K CAMERAS
%K decoupling translation from rotation
%K Estimation error
%K Fluid flow measurement
%K Image Enhancement
%K Image Interpretation, Computer-Assisted
%K Image reconstruction
%K Imaging, Three-Dimensional
%K Information Storage and Retrieval
%K integration of motion fields
%K Layout
%K minimisation
%K Minimization methods
%K Motion estimation
%K multiple motion fields
%K parameter estimation
%K Pattern Recognition, Automated
%K Photography
%K practical constrained minimization
%K SHAPE
%K shape and rotation
%K shape vectors
%K stability
%K structure estimation
%K surface normals
%K Three-dimensional motion estimation
%K video 3D shape constraint
%K Video Recording
%K video signal processing
%X We propose to combine the information from multiple motion fields by enforcing a constraint on the surface normals (3D shape) of the scene in view. The fact that the shape vectors in the different views are related only by rotation can be formulated as a rank-3 constraint. This constraint is implemented in an algorithm that solves 3D motion and structure estimation as a practical constrained minimization. Experiments demonstrate its usefulness as a tool in structure from motion, providing very accurate estimates of 3D motion.
%B IEEE Transactions on Pattern Analysis and Machine Intelligence
%V 28
%P 1018 - 1023
%8 2006/06//
%@ 0162-8828
%G eng
%N 6
%R 10.1109/TPAMI.2006.109
%0 Journal Article
%J IEEE Transactions on Image Processing
%D 2005
%T Statistical bias in 3-D reconstruction from a monocular video
%A Roy-Chowdhury, A. K.
%A Chellappa, Rama
%K 3D face models
%K 3D video reconstruction
%K algorithms
%K artifacts
%K Artificial intelligence
%K bias compensation
%K bundle adjustment
%K camera motion estimation
%K Computer simulation
%K depth estimate
%K error covariance estimation
%K error statistics
%K generalized Cramer-Rao lower bound
%K Image Enhancement
%K Image Interpretation, Computer-Assisted
%K Image reconstruction
%K Imaging, Three-Dimensional
%K Information Storage and Retrieval
%K initialization procedures
%K least squares approximations
%K linear least-squares framework
%K Models, Statistical
%K monocular video
%K motion compensation
%K Motion estimation
%K Pattern Recognition, Automated
%K Signal Processing, Computer-Assisted
%K statistical bias
%K structure from motion algorithms
%K Subtraction Technique
%K Video Recording
%K video signal processing
%X The present state-of-the-art in computing the error statistics in three-dimensional (3-D) reconstruction from video concentrates on estimating the error covariance. A different source of error which has not received much attention is the fact that the reconstruction estimates are often significantly statistically biased. In this paper, we derive a precise expression for the bias in the depth estimate, based on the continuous (differentiable) version of structure from motion (SfM). Many SfM algorithms, or certain portions of them, can be posed in a linear least-squares (LS) framework Ax=b. Examples include initialization procedures for bundle adjustment or algorithms that alternately estimate depth and camera motion. It is a well-known fact that the LS estimate is biased if the system matrix A is noisy. In SfM, the matrix A contains point correspondences, which are always difficult to obtain precisely; thus, it is expected that the structure and motion estimates in such a formulation of the problem would be biased. Existing results on the minimum achievable variance of the SfM estimator are extended by deriving a generalized Cramer-Rao lower bound. A detailed analysis of the effect of various camera motion parameters on the bias is presented. We conclude by presenting the effect of bias compensation on reconstructing 3-D face models from rendered images.
%B IEEE Transactions on Image Processing
%V 14
%P 1057 - 1062
%8 2005/08//
%@ 1057-7149
%G eng
%N 8
%R 10.1109/TIP.2005.849775
%0 Conference Paper
%B 2004 IEEE International Conference on Communications
%D 2004
%T Automatic video summarization for wireless and mobile environments
%A Rao, Yong
%A Mundur, Padma
%A Yesha, Y.
%K automatic video summarization
%K batch processing
%K batch processing (computers)
%K Clustering algorithms
%K Clustering methods
%K clustering scheme
%K Computer science
%K Delaunay diagram
%K graph theory
%K Gunshot detection systems
%K Image sequences
%K mesh generation
%K Mobile computing
%K mobile radio
%K multidimensional point data cluster
%K Multidimensional systems
%K Multimedia communication
%K video clip
%K video frame content
%K Video sequences
%K video signal processing
%K wireless mobile environment
%X In this paper, we propose a novel video summarization technique that automatically generates high-quality video summaries suitable for wireless and mobile environments. The significant contribution of this paper lies in the proposed clustering scheme. We use Delaunay diagrams to cluster multidimensional point data corresponding to the frame contents of the video. In contrast to the existing clustering techniques used for summarization, our clustering algorithm is fully automatic and well suited for batch processing. We illustrate the quality of our clustering and summarization scheme in an experiment using several video clips.
%B 2004 IEEE International Conference on Communications
%I IEEE
%V 3
%P 1532 - 1536
%8 2004/06/20/24
%@ 0-7803-8533-0
%G eng
%R 10.1109/ICC.2004.1312767
%0 Journal Article
%J IEEE Transactions on Image Processing
%D 2004
%T An information theoretic criterion for evaluating the quality of 3-D reconstructions from video
%A Roy-Chowdhury, A. K.
%A Chellappa, Rama
%K 3D reconstruction
%K algorithms
%K Artificial intelligence
%K Computer Graphics
%K Image Enhancement
%K Image Interpretation, Computer-Assisted
%K Image reconstruction
%K Image sequences
%K Imaging, Three-Dimensional
%K Information Storage and Retrieval
%K Information Theory
%K information theoretic criterion
%K Movement
%K Mutual information
%K NOISE
%K noise distribution
%K optical flow equations
%K Pattern Recognition, Automated
%K Reproducibility of Results
%K second order moments
%K Sensitivity and Specificity
%K Signal Processing, Computer-Assisted
%K Software Validation
%K statistical analysis
%K Subtraction Technique
%K Video Recording
%K Video sequences
%K video signal processing
%X Even though numerous algorithms exist for estimating the three-dimensional (3-D) structure of a scene from its video, the solutions obtained are often of unacceptable quality. To overcome some of the deficiencies, many application systems rely on processing more data than necessary, thus raising the question: how is the accuracy of the solution related to the amount of data processed by the algorithm? Can we automatically recognize situations where the quality of the data is so bad that even a large number of additional observations will not yield the desired solution? Previous efforts to answer this question have used statistical measures like second-order moments. They are useful if the estimate of the structure is unbiased and the higher order statistical effects are negligible, which is often not the case. This paper introduces an alternative information-theoretic criterion for evaluating the quality of a 3-D reconstruction. The accuracy of the reconstruction is judged by considering the change in mutual information (MI) (termed the incremental MI) between a scene and its reconstructions. An example of 3-D reconstruction from a video sequence using optical flow equations and a known noise distribution is considered, and it is shown how the MI can be computed from first principles. We present simulations on both synthetic and real data to demonstrate the effectiveness of the proposed criterion.
%B IEEE Transactions on Image Processing
%V 13
%P 960 - 973
%8 2004/07//
%@ 1057-7149
%G eng
%N 7
%R 10.1109/TIP.2004.827240
%0 Conference Paper
%B IEEE Workshop on Detection and Recognition of Events in Video, 2001. Proceedings
%D 2001
%T Multimodal 3-D tracking and event detection via the particle filter
%A Zotkin, Dmitry N.
%A Duraiswami, Ramani
%A Davis, Larry S.
%K algorithms
%K APPROACH
%K audio data collection
%K audio signal processing
%K Bayesian inference
%K Bayesian methods
%K belief networks
%K CAMERAS
%K capture
%K conversation
%K echo
%K Educational institutions
%K Event detection
%K event occurrence
%K filtering theory
%K flying echo locating bat behaviour
%K Image motion analysis
%K inference mechanisms
%K Laboratories
%K microphone arrays
%K moving object tracking
%K moving participants
%K moving prey
%K multimodal 3D tracking
%K multiple cameras
%K Object detection
%K particle filter
%K Particle filters
%K Particle tracking
%K Robustness
%K search
%K smart video conferencing setup
%K target tracking
%K Teleconferencing
%K tracking filters
%K turn-taking detection
%K video data collection
%K video signal processing
%X Determining the occurrence of an event is fundamental to developing systems that can observe and react to events. Often, this determination is based on collecting video and/or audio data and determining the state or location of a tracked object. We use Bayesian inference and the particle filter for tracking moving objects, using both video data obtained from multiple cameras and audio data obtained using arrays of microphones. The algorithms developed are applied to detecting events arising in two fields of application. In the first, the behavior of a flying echolocating bat as it approaches a moving prey is studied, and the events of search, approach, and capture are detected. In a second application, we describe the detection of turn-taking in a conversation between possibly moving participants recorded using a smart video conferencing setup.
%B IEEE Workshop on Detection and Recognition of Events in Video, 2001. Proceedings
%I IEEE
%P 20 - 27
%8 2001///
%@ 0-7695-1293-3
%G eng
%R 10.1109/EVENT.2001.938862
%0 Conference Paper
%B Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2001. CVPR 2001
%D 2001
%T A spherical eye from multiple cameras (makes better models of the world)
%A Baker, P.
%A Fermüller, Cornelia
%A Aloimonos, J.
%A Pless, R.
%K 3D motion estimation
%K Calibration
%K camera network
%K CAMERAS
%K Computer vision
%K egomotion recovery
%K geometric configuration
%K geometric constraint
%K image gradients
%K image sampling
%K imaging system
%K Laboratories
%K Layout
%K Motion estimation
%K multiple cameras
%K Pixel
%K Robot vision systems
%K SHAPE
%K shape models
%K Space technology
%K spherical eye
%K system calibration
%K video
%K video cameras
%K video signal processing
%K visual sphere sampling
%X The paper describes an imaging system that has been designed specifically for the purpose of recovering egomotion and structure from video. The system consists of six cameras in a network arranged so that they sample different parts of the visual sphere. This geometric configuration has provable advantages compared to small-field-of-view cameras for the estimation of the system's own motion and, consequently, the estimation of shape models from the individual cameras. The reason is that the inherent ambiguity between translation and rotation disappears. We provide algorithms for the calibration of the system and 3D motion estimation. The calibration is based on a new geometric constraint that relates the images of lines parallel in space to the rotation between the cameras. The 3D motion estimation uses a constraint relating structure directly to image gradients.
%B Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2001. CVPR 2001
%I IEEE
%V 1
%P I-576 - I-583
%8 2001///
%@ 0-7695-1272-0
%G eng
%R 10.1109/CVPR.2001.990525