%0 Journal Article
%J Image Processing, IEEE Transactions on
%D 2011
%T Example-Driven Manifold Priors for Image Deconvolution
%A Ni, Jie
%A Turaga, P.
%A Patel, Vishal M.
%A Chellappa, Rama
%K Bayes methods
%K deconvolution
%K image restoration
%K iterative methods
%K natural scenes
%K parameter estimation
%K Bayesian framework
%K generalized cross-validation
%K GCV function
%K automatic parameter determination
%K deblurring problem
%K regularization
%K example-driven manifold prior
%K patch-manifold prior
%K unlabeled data
%K image deconvolution
%K iteration method
%K natural image
%X Image restoration methods that exploit prior information about images to be estimated have been extensively studied, typically using the Bayesian framework. In this paper, we consider the role of prior knowledge of the object class in the form of a patch manifold to address the deconvolution problem. Specifically, we incorporate unlabeled image data of the object class, say natural images, in the form of a patch-manifold prior for the object class. The manifold prior is implicitly estimated from the given unlabeled data. We show how the patch-manifold prior effectively exploits the available sample class data for regularizing the deblurring problem. Furthermore, we derive a generalized cross-validation (GCV) function to automatically determine the regularization parameter at each iteration without explicitly knowing the noise variance. Extensive experiments show that this method performs better than many competitive image deconvolution methods.
%B Image Processing, IEEE Transactions on
%V 20
%P 3086 - 3096
%8 2011/11//
%@ 1057-7149
%G eng
%N 11
%R 10.1109/TIP.2011.2145386
%0 Journal Article
%J Image Processing, IEEE Transactions on
%D 2011
%T Illumination Recovery From Image With Cast Shadows Via Sparse Representation
%A Mei, Xue
%A Ling, Haibin
%A Jacobs, David W.
%K illumination recovery
%K cast shadows
%K sparse representation
%K ℓ1-regularized least-squares formulation
%K nonnegativity constraints
%K Lambertian scene
%K directional light sources
%K low-dimensional linear subspaces
%K compressive sensing
%K image reconstruction
%K image representation
%K image coding
%K data compression
%K least squares approximations
%X In this paper, we propose using sparse representation for recovering the illumination of a scene from a single image with cast shadows, given the geometry of the scene. The images with cast shadows can be quite complex and, therefore, cannot be well approximated by low-dimensional linear subspaces. However, it can be shown that the set of images produced by a Lambertian scene with cast shadows can be efficiently represented by a sparse set of images generated by directional light sources. We first model an image with cast shadows composed of a diffusive part (without cast shadows) and a residual part that captures cast shadows. Then, we express the problem in an ℓ1-regularized least-squares formulation, with nonnegativity constraints (as light has to be non-negative at any point in space). This sparse representation enjoys an effective and fast solution thanks to recent advances in compressive sensing. In experiments on synthetic and real data, our approach performs favorably in comparison with several previously proposed methods.
%B Image Processing, IEEE Transactions on
%V 20
%P 2366 - 2377
%8 2011/08//
%@ 1057-7149
%G eng
%N 8
%R 10.1109/TIP.2011.2118222
%0 Conference Paper
%B Image Processing (ICIP), 2011 18th IEEE International Conference on
%D 2011
%T No-reference image quality assessment based on visual codebook
%A Ye, Peng
%A Doermann, David
%K no-reference image quality assessment
%K quality estimation
%K visual codebook
%K Gabor filters
%K appearance descriptors
%K complex statistics
%K local image patches
%K natural image
%K feature extraction
%K image texture
%X In this paper, we propose a new learning-based No-Reference Image Quality Assessment (NR-IQA) algorithm that uses a visual codebook of robust appearance descriptors, extracted from local image patches, to capture the complex statistics of natural images for quality estimation. We use Gabor-filter-based local features as appearance descriptors, and the codebook encodes the statistics of natural image classes by vector-quantizing the feature space and accumulating histograms of patch appearances based on this coding. The method does not assume any specific type of distortion. Experimental results on the LIVE image quality assessment database show that it provides consistent and reliable quality estimates, exceeding other state-of-the-art NR-IQA approaches and competitive with the full-reference measure PSNR.
%B Image Processing (ICIP), 2011 18th IEEE International Conference on
%P 3089 - 3092
%8 2011/09//
%G eng
%R 10.1109/ICIP.2011.6116318
%0 Conference Paper
%B Privacy, Security, Risk and Trust (PASSAT), 2011 IEEE Third International Conference on and 2011 IEEE Third International Conference on Social Computing (SocialCom)
%D 2011
%T Odd Leaf Out: Improving Visual Recognition with Games
%A Hansen, D. L.
%A Jacobs, David W.
%A Lewis, D.
%A Biswas, A.
%A Preece, J.
%A Rotman, D.
%A Stevens, E.
%K Odd Leaf Out
%K computer games
%K educational game
%K visual recognition
%K image classification
%K object recognition
%K misclassification errors
%K labeled image datasets
%K human feedback
%K textual tags
%K complex computational tasks
%K scientific tasks
%K computer vision algorithm
%K biology computing
%K botany
%X A growing number of projects are solving complex computational and scientific tasks by soliciting human feedback through games. Many games with a purpose focus on generating textual tags for images. In contrast, we introduce a new game, Odd Leaf Out, which provides players with an enjoyable and educational game that serves the purpose of identifying misclassification errors in a large database of labeled leaf images. The game uses a novel mechanism to solicit useful information from players' incorrect answers. A study of 165 players showed that game data can be used to identify mislabeled leaves much more quickly than would have been possible using a computer vision algorithm alone. Domain novices and experts were equally good at identifying mislabeled images, although domain experts enjoyed the game more. We discuss the successes and challenges of this new game, which can be applied to other domains with labeled image datasets.
%B Privacy, Security, Risk and Trust (PASSAT), 2011 IEEE Third International Conference on and 2011 IEEE Third International Conference on Social Computing (SocialCom)
%P 87 - 94
%8 2011/10//
%G eng
%R 10.1109/PASSAT/SocialCom.2011.225
%0 Journal Article
%J Visualization and Computer Graphics, IEEE Transactions on
%D 2011
%T Saliency-Assisted Navigation of Very Large Landscape Images
%A Ip, Cheuk Yiu
%A Varshney, Amitabh
%K saliency-assisted navigation
%K landscape images
%K interactive visualization
%K image resolution
%K image sensors
%K camera sensors
%K robotic image acquisition
%K data acquisition
%K data visualisation
%K geophysical image processing
%K statistical analysis
%K statistical signatures
%K Internet
%X The field of visualization has addressed navigation of very large datasets, usually meshes and volumes. Significantly less attention has been devoted to the issues surrounding navigation of very large images. In the last few years the explosive growth in the resolution of camera sensors and robotic image acquisition techniques has widened the gap between the display and image resolutions to three orders of magnitude or more. This paper presents the first steps towards navigation of very large images, particularly landscape images, from an interactive visualization perspective. The grand challenge in navigation of very large images is identifying regions of potential interest. In this paper we outline a three-step approach. In the first step we use multi-scale saliency to narrow down the potential areas of interest. In the second step we outline a method based on statistical signatures to further cull out regions of high conformity. In the final step we allow a user to interactively identify the exceptional regions of high interest that merit further attention. We show that our approach of progressive elicitation is fast and allows rapid identification of regions of interest. Unlike previous work in this area, our approach is scalable and computationally reasonable on very large images. We validate the results of our approach by comparing them to user-tagged regions of interest on several very large landscape images from the Internet.
%B Visualization and Computer Graphics, IEEE Transactions on
%V 17
%P 1737 - 1746
%8 2011/12//
%@ 1077-2626
%G eng
%N 12
%R 10.1109/TVCG.2011.231
%0 Conference Paper
%B Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on
%D 2011
%T Trainable 3D recognition using stereo matching
%A Castillo, C. D.
%A Jacobs, David W.
%K stereo matching
%K face recognition
%K pose estimation
%K pose variation
%K occlusion
%K image descriptor
%K trainable 3D recognition
%K object classification
%K image classification
%K 3D object class dataset
%K 3D classification data set
%K CMU PIE dataset
%K solid modelling
%K stereo image processing
%K 2D image
%X Stereo matching has been used for face recognition in the presence of pose variation. In this approach, stereo matching is used to compare two 2-D images based on correspondences that reflect the effects of viewpoint variation and allow for occlusion. We show how to use stereo matching to derive image descriptors that can be used to train a classifier. This improves face recognition performance, producing the best published results on the CMU PIE dataset. We also demonstrate that classification based on stereo matching can be used for general object classification in the presence of pose variation. In preliminary experiments, we show promising results on the 3D object class dataset, a standard and challenging benchmark for 3D classification.
%B Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on
%P 625 - 631
%8 2011///
%G eng
%R 10.1109/ICCVW.2011.6130301
%0 Conference Paper
%B Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on
%D 2011
%T Wide-baseline stereo for face recognition with large pose variation
%A Castillo, C. D.
%A Jacobs, David W.
%K 2D face recognition
%K face recognition
%K wide-baseline stereo
%K stereo matching
%K window-based matching method
%K dynamic programming stereo algorithm
%K surface slant
%K pose variation
%K pose estimation
%K frontal image
%K near profile image
%K recognition performance
%K CMU PIE dataset
%K image processing
%X 2-D face recognition in the presence of large pose variations presents a significant challenge. When comparing a frontal image of a face to a near profile image, one must cope with large occlusions, non-linear correspondences, and significant changes in appearance due to viewpoint. Stereo matching has been used to handle these problems, but performance of this approach degrades with large pose changes. We show that some of this difficulty is due to the effect that foreshortening of slanted surfaces has on window-based matching methods, which are needed to provide robustness to lighting change. We address this problem by designing a new, dynamic programming stereo algorithm that accounts for surface slant. We show that on the CMU PIE dataset this method results in significant improvements in recognition performance.
%B Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on
%P 537 - 544
%8 2011/06//
%G eng
%R 10.1109/CVPR.2011.5995559
%0 Conference Paper
%B Information Forensics and Security (WIFS), 2010 IEEE International Workshop on
%D 2010
%T Semi non-intrusive training for cell-phone camera model linkage
%A Chuang, Wei-Hong
%A Wu, M.
%K cell phone camera model linkage
%K component forensics
%K computer forensics
%K digital image forensics
%K color interpolation feature
%K image colour analysis
%K image matching
%K interpolation
%K semi nonintrusive training
%K testing accuracy
%K training complexity
%K training content dependency
%K variance analysis
%K cameras
%K cellular radio
%X This paper presents a study of cell-phone camera model linkage that matches digital images against potential makes/models of cell-phone camera sources using camera color interpolation features. The matching performance is examined, and its dependency on the content of the training image collection is evaluated via variance analysis. Training content dependency can be addressed within the framework of component forensics, where cell-phone camera model linkage is viewed as a combination of semi non-intrusive training and completely non-intrusive testing. This viewpoint explicitly suggests testing accuracy as the goodness criterion for training data selection. It also motivates alternative training procedures based on other criteria, such as training complexity, for which preliminary but promising experimental designs and results have been obtained.
%B Information Forensics and Security (WIFS), 2010 IEEE International Workshop on
%P 1 - 6
%8 2010/12//
%G eng
%R 10.1109/WIFS.2010.5711468
%0 Journal Article
%J Signal Processing Magazine, IEEE
%D 2009
%T Component forensics
%A Swaminathan, A.
%A Wu, M.
%A Liu, K. J. R.
%K component forensics
%K digital image processing
%K image acquisition technique
%K visual sensor technology
%K image sensors
%K security of data
%X Visual sensor technologies have experienced tremendous growth in recent decades, and digital devices are becoming ubiquitous. Digital images taken by various imaging devices have been used in a growing number of applications, from military and reconnaissance to medical diagnosis and consumer photography. Consequently, a series of new forensic issues arise amidst such rapid advancement and widespread adoption of imaging technologies. For example, one can readily ask what kinds of hardware and software components as well as their parameters have been employed inside these devices? Given a digital image, which imaging sensor or which brand of sensor was used to acquire the image? How was the image acquired? Was it captured using a digital camera, cell phone camera, or image scanner, or was it created artificially using image-editing software? Has the image undergone any manipulation after capture? Is it authentic, or has it been tampered with in any way? Does it contain any hidden information or steganographic data? Many of these forensic questions are related to tracing the origin of the digital image to its creation process. Evidence obtained from such analysis would provide useful forensic information to law enforcement, security, and intelligence agencies. Knowledge of image acquisition techniques can also help answer further forensic questions regarding the nature of additional processing that the image might have undergone after capture.
%B Signal Processing Magazine, IEEE
%V 26
%P 38 - 48
%8 2009/03//
%@ 1053-5888
%G eng
%N 2
%R 10.1109/MSP.2008.931076
%0 Conference Paper
%B Image Processing (ICIP), 2009 16th IEEE International Conference on
%D 2009
%T How would you look as you age?
%A Ramanathan, N.
%A Chellappa, Rama
%K facial growth model
%K facial appearances
%K facial shape
%K facial texture
%K transformation models
%K face recognition
%K face verification
%K age-separated face image database
%X Facial appearances change with increase in age. While generic growth patterns that are characteristic of different age groups can be identified, facial growth is also influenced by individual-specific attributes such as one's gender, ethnicity, and life-style. In this paper, we propose a facial growth model that comprises transformation models for facial shape and texture. We collected empirical data on facial growth from a database of age-separated face images of adults and used it to develop the aforementioned transformation models. The proposed model finds applications in predicting one's appearance across ages and in performing face verification across ages.
%B Image Processing (ICIP), 2009 16th IEEE International Conference on
%P 53 - 56
%8 2009/11//
%G eng
%R 10.1109/ICIP.2009.5413998
%0 Conference Paper
%B Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on
%D 2009
%T Tampering identification using Empirical Frequency Response
%A Chuang, Wei-Hong
%A Swaminathan, A.
%A Wu, M.
%K empirical frequency response
%K EFR approach
%K tampering-type identification
%K multimedia forensics
%K linear shift invariant
%K LSI system
%K digital image editing software
%K JPEG compression
%K frequency response
%K data compression
%K image coding
%K multimedia computing
%K security of data
%X With the widespread popularity of digital images and the availability of easy-to-use image editing software, content integrity can no longer be taken for granted, and there is a strong need for techniques that not only detect the presence of tampering but also identify its type. This paper focuses on tampering-type identification and introduces a new approach based on the empirical frequency response (EFR) to address this problem. We show that several types of tampering operations, both linear shift invariant (LSI) and non-LSI, can be characterized consistently and distinctly by their EFRs. We then extend the approach to estimate the EFR for scenarios where only the final image is available. Theoretical reasoning supported by experimental results verifies the effectiveness of this method for identifying the type of a tampering operation.
%B Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on
%P 1517 - 1520
%8 2009/04//
%G eng
%R 10.1109/ICASSP.2009.4959884
%0 Conference Paper
%B Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on
%D 2008
%T Action recognition using ballistic dynamics
%A Vitaladevuni, S. N.
%A Kellokumpu, V.
%A Davis, Larry S.
%K action recognition
%K ballistic dynamics
%K Bayesian framework
%K gesture recognition task
%K human movement planning
%K interactive movements
%K motion history image feature
%K person-centric morphological labels
%K psycho-kinesiological studies
%K image motion analysis
%K image recognition
%K image segmentation
%K video signal processing
%X We present a Bayesian framework for action recognition through ballistic dynamics. Psycho-kinesiological studies indicate that ballistic movements form the natural units for human movement planning. The framework leads to an efficient and robust algorithm for temporally segmenting videos into atomic movements. Individual movements are annotated with person-centric morphological labels called ballistic verbs. This is tested on a dataset of interactive movements, achieving high recognition rates. The approach is also applied on a gesture recognition task, improving a previously reported recognition rate from 84% to 92%. Consideration of ballistic dynamics enhances the performance of the popular Motion History Image feature. We also illustrate the approach's general utility on real-world videos. Experiments indicate that the method is robust to view, style and appearance variations.
%B Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on
%P 1 - 8
%8 2008/06//
%G eng
%R 10.1109/CVPR.2008.4587806
%0 Conference Paper
%B Image Processing, 2008. ICIP 2008. 15th IEEE International Conference on
%D 2008
%T Compressed sensing for multi-view tracking and 3-D voxel reconstruction
%A Reddy, D.
%A Sankaranarayanan, A. C.
%A Cevher, V.
%A Chellappa, Rama
%K compressed sensing
%K CS theory
%K multi-view tracking
%K 3D voxel reconstruction
%K multi-view estimation problems
%K random projections
%K silhouette image sparsity
%K sparse background-subtracted silhouettes
%K image reconstruction
%K video coding
%X Compressed sensing (CS) suggests that a signal, sparse in some basis, can be recovered from a small number of random projections. In this paper, we apply the CS theory on sparse background-subtracted silhouettes and show the usefulness of such an approach in various multi-view estimation problems. The sparsity of the silhouette images corresponds to sparsity of object parameters (location, volume, etc.) in the scene. We use random projections (compressed measurements) of the silhouette images for directly recovering object parameters in the scene coordinates. To keep the computational requirements of this recovery procedure reasonable, we tessellate the scene into a set of non-overlapping lines and perform estimation on each of these lines. Our method is scalable in the number of cameras and utilizes very few measurements for transmission among cameras. We illustrate the usefulness of our approach for multi-view tracking and 3-D voxel reconstruction problems.
%B Image Processing, 2008. ICIP 2008. 15th IEEE International Conference on
%P 221 - 224
%8 2008/10//
%G eng
%R 10.1109/ICIP.2008.4711731
%0 Journal Article
%J Information Forensics and Security, IEEE Transactions on
%D 2008
%T Digital image forensics via intrinsic fingerprints
%A Swaminathan, A.
%A Wu, M.
%A Liu, K. J. R.
%K digital image forensics
%K intrinsic fingerprints
%K forensic analysis
%K blind deconvolution
%K image acquisition devices
%K digital camera images
%K digital photography
%K fingerprint identification
%K time invariant approximation
%K cameras
%K image sensors
%X Digital imaging has experienced tremendous growth in recent decades, and digital camera images have been used in a growing number of applications. With such increasing popularity and the availability of low-cost image editing software, the integrity of digital image content can no longer be taken for granted. This paper introduces a new methodology for the forensic analysis of digital camera images. The proposed method is based on the observation that many processing operations, both inside and outside acquisition devices, leave distinct intrinsic traces on digital images, and these intrinsic fingerprints can be identified and employed to verify the integrity of digital data. The intrinsic fingerprints of the various in-camera processing operations can be estimated through a detailed imaging model and its component analysis. Further processing applied to the camera captured image is modelled as a manipulation filter, for which a blind deconvolution technique is applied to obtain a linear time-invariant approximation and to estimate the intrinsic fingerprints associated with these postcamera operations. The absence of camera-imposed fingerprints from a test image indicates that the test image is not a camera output and is possibly generated by other image production processes. Any change or inconsistencies among the estimated camera-imposed fingerprints, or the presence of new types of fingerprints suggest that the image has undergone some kind of processing after the initial capture, such as tampering or steganographic embedding. Through analysis and extensive experimental studies, this paper demonstrates the effectiveness of the proposed framework for nonintrusive digital image forensics.
%B Information Forensics and Security, IEEE Transactions on
%V 3
%P 101 - 117
%8 2008/03//
%@ 1556-6013
%G eng
%N 1
%R 10.1109/TIFS.2007.916010
%0 Conference Paper
%B Biometrics: Theory, Applications and Systems, 2008. BTAS 2008. 2nd IEEE International Conference on
%D 2008
%T A Non-generative Approach for Face Recognition Across Aging
%A Biswas, S.
%A Aggarwal, G.
%A Ramanathan, N.
%A Chellappa, Rama
%K face recognition
%K facial appearance
%K nongenerative approach
%K face image synthesis
%K image matching
%X Human faces undergo a lot of change in appearance as they age. Though facial aging has been studied for decades, it is only recently that attempts have been made to address the problem from a computational point of view. Most of these early efforts follow a simulation approach in which matching is performed by synthesizing face images at the target age. Given the innumerable different ways in which a face can potentially age, the synthesized aged image may not be similar to the actual aged image. In this paper, we bypass the synthesis step and directly analyze the drifts of facial features with aging from a purely matching perspective. Our analysis is based on the observation that facial appearance changes in a coherent manner as people age. We provide measures to capture this coherency in feature drifts. Illustrations and experimental results show the efficacy of such an approach for matching faces across age progression.
%B Biometrics: Theory, Applications and Systems, 2008. BTAS 2008. 2nd IEEE International Conference on
%P 1 - 6
%8 2008/10//
%G eng
%R 10.1109/BTAS.2008.4699331
%0 Journal Article
%J Multimedia, IEEE Transactions on
%D 2008
%T Synthesis of Silhouettes and Visual Hull Reconstruction for Articulated Humans
%A Yue, Zhanfeng
%A Chellappa, Rama
%K visual hull reconstruction
%K articulated human body pose
%K silhouette synthesis
%K turntable image collection
%K virtual camera
%K circular trajectory computation
%K turning function distance
%K silhouette similarity measurement
%K contour-based body part segmentation
%K human body part localization
%K inner distance shape context measurement
%K approximate algorithm
%K approximation theory
%K cameras
%K edge detection
%K image reconstruction
%K image segmentation
%K image synthesis
%K pose estimation
%K shape recognition
%K virtual reality
%X In this paper, we propose a complete framework for improved synthesis and understanding of the human pose from a limited number of silhouette images. It combines the active image-based visual hull (IBVH) algorithm and a contour-based body part segmentation technique. We derive a simple, approximate algorithm to decide the extrinsic parameters of a virtual camera, and synthesize the turntable image collection of the person using the IBVH algorithm by actively moving the virtual camera on a properly computed circular trajectory around the person. Using the turning function distance as the silhouette similarity measurement, this approach can be used to generate the desired pose-normalized images for recognition applications. In order to overcome the inability of the visual hull (VH) method to reconstruct concave regions, we propose a contour-based human body part localization algorithm to segment the silhouette images into convex body parts. The body parts observed from the virtual view are generated separately from the corresponding body parts observed from the input views and then assembled together for a more accurate VH reconstruction. Furthermore, the obtained turntable image collection helps to improve the body part segmentation and identification process. By using the inner distance shape context (IDSC) measurement, we are able to estimate the body part locations more accurately from a synthesized view where we can localize the body part more precisely. Experiments show that the proposed algorithm can greatly improve body part segmentation and hence shape reconstruction results.
%B Multimedia, IEEE Transactions on
%V 10
%P 1565 - 1577
%8 2008/12//
%@ 1520-9210
%G eng
%N 8
%R 10.1109/TMM.2008.2007321
%0 Conference Paper
%B Image Processing, 2007. ICIP 2007. IEEE International Conference on
%D 2007
%T Improving Embedding Payload in Binary Images with "Super-Pixels"
%A Gou, Hongmei
%A Wu, M.
%K binary image watermarking
%K wet paper coding
%K data hiding
%K embedding payload
%K steganography
%K document authentication
%K text authentication
%K document image processing
%K image analysis
%K text analysis
%K watermarking
%K cryptography
%K data encapsulation
%X Hiding data in binary images can facilitate authentication of important documents in the digital domain, which generally requires a high embedding payload. Recently, a steganography framework known as the wet paper coding has been employed in binary image watermarking to achieve high embedding payload. In this paper, we introduce a new concept of super-pixels, and study how to incorporate them in the framework of wet paper coding to further improve the embedding payload in binary images. Using binary text documents as an example, we demonstrate the effectiveness of the proposed super-pixel technique.
%B Image Processing, 2007. ICIP 2007. IEEE International Conference on
%V 3
%P III-277 - III-280
%8 2007/10/16/19
%G eng
%R 10.1109/ICIP.2007.4379300
%0 Conference Paper
%B Image Processing, 2007. ICIP 2007. IEEE International Conference on
%D 2007
%T Noise Features for Image Tampering Detection and Steganalysis
%A Gou, Hongmei
%A Swaminathan, A.
%A Wu, M.
%K image tampering detection
%K steganalysis
%K multimedia forensics
%K statistical noise features
%K image denoising
%K denoising operations
%K wavelet analysis
%K wavelet transforms
%K neighborhood prediction
%K hidden data
%K image authenticity
%K low-cost image editing software
%K digital image
%K feature extraction
%K statistical analysis
%K cryptography
%K data encapsulation
%K multimedia computing
%X With the increasing availability of low-cost image editing software, the authenticity of digital images can no longer be taken for granted. Digital images have also been used as cover data for transmitting secret information in the field of steganography. In this paper, we introduce a new set of features for multimedia forensics to determine whether a digital image is an authentic camera output or whether it has been tampered with or embedded with hidden data. We perform such image forensic analysis employing three sets of statistical noise features, including those from denoising operations, wavelet analysis, and neighborhood prediction. Our experimental results demonstrate that the proposed method can effectively distinguish digital images from their tampered or stego versions.
%B Image Processing, 2007. ICIP 2007. IEEE International Conference on
%V 6
%P VI-97 - VI-100
%8 2007/10/16/19
%G eng
%R 10.1109/ICIP.2007.4379530
%0 Conference Paper
%B Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on
%D 2007
%T Segmentation using Meta-texture Saliency
%A Yacoob, Yaser
%A Davis, Larry S.
%K image segmentation
%K meta-texture image
%K salient surface-roughness
%K image colour analysis
%K image enhancement
%K image patches
%K image texture
%X We address segmentation of an image into patches that have an underlying salient surface-roughness. Three intrinsic images are derived: reflectance, shading, and meta-texture images. A constructive approach is proposed for computing a meta-texture image by preserving, equalizing and enhancing the underlying surface-roughness across color, brightness and illumination variations. We evaluate the performance on sample images and illustrate quantitatively that different patches of the same material in an image are normalized in their statistics despite variations in color, brightness and illumination. Finally, segmentation by line-based boundary-detection is proposed, and results are provided and compared to known algorithms.
%B Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on
%P 1 - 8
%8 2007/10//
%G eng
%R 10.1109/ICCV.2007.4408930
%0 Journal Article
%J Multimedia, IEEE Transactions on
%D 2007
%T Super-Resolution of Face Images Using Kernel PCA-Based Prior
%A Chakrabarti,Ayan
%A Rajagopalan, AN
%A Chellappa, Rama
%K analysis;learning-based
%K analysis;probability;
%K component
%K Face
%K image
%K method;prior
%K model;face
%K principal
%K probability
%K recognition;image
%K reconstruction;image
%K reconstruction;kernel
%K resolution;principal
%K super-resolution;high-resolution
%X We present a learning-based method to super-resolve face images using a kernel principal component analysis-based prior model. A prior probability is formulated based on the energy lying outside the span of principal components identified in a higher-dimensional feature space. This is used to regularize the reconstruction of the high-resolution image. We demonstrate with experiments that including higher-order correlations results in significant improvements.
%B Multimedia, IEEE Transactions on
%V 9
%P 888 - 892
%8 2007/06//
%@ 1520-9210
%G eng
%N 4
%R 10.1109/TMM.2007.893346
%0 Journal Article
%J Information Forensics and Security, IEEE Transactions on
%D 2007
%T Unicity Distance of Robust Image Hashing
%A Mao,Yinian
%A M. Wu
%K authentication;image
%K authentication;randomized
%K coding;image
%K compact
%K comparison;multimedia
%K content;image
%K data;
%K distance;watermarking;image
%K fingerprinting;image
%K hashing;unicity
%K image
%K initialization;robust
%K key
%K of
%K representation;randomized
%K representation;security
%K similarity
%X An image hash is a randomized compact representation of image content and finds applications in image authentication, image and video watermarking, and image similarity comparison. Usually, an image-hashing scheme is required to be robust and secure, and the security issue is particularly important in applications, such as multimedia authentication, watermarking, and fingerprinting. In this paper, we investigate the security of image hashing from the perspective of unicity distance, a concept pioneered by Shannon in one of his seminal papers. Using two recently proposed image-hashing schemes as representatives, we show that the concept of unicity distance can be adapted to evaluate the security of image hashing. Our analysis shows that the secret hashing key, or its equivalent form, can be estimated with high accuracy when the key is reused several dozen times. The estimated unicity distance determines the maximum number of key reuses in the investigated hashing schemes. A countermeasure of randomized key initialization is discussed to avoid key reuse and strengthen the security of robust image hashing.
%B Information Forensics and Security, IEEE Transactions on
%V 2
%P 462 - 467
%8 2007/09//
%@ 1556-6013
%G eng
%N 3
%R 10.1109/TIFS.2007.902260
%0 Conference Paper
%B Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on
%D 2007
%T Using Stereo Matching for 2-D Face Recognition Across Pose
%A Castillo,C. D
%A Jacobs, David W.
%K 2D
%K estimation;stereo
%K Face
%K gallery
%K image
%K image;2D
%K image;dynamic
%K matching;dynamic
%K matching;pose
%K processing;
%K programming;face
%K programming;pose
%K query
%K recognition;2D
%K recognition;image
%K variation;stereo
%X We propose using stereo matching for 2-D face recognition across pose. We match one 2-D query image to one 2-D gallery image without performing 3-D reconstruction. Then the cost of this matching is used to evaluate the similarity of the two images. We show that this cost is robust to pose variations. To illustrate this idea we built a face recognition system on top of a dynamic programming stereo matching algorithm. The method works well even when the epipolar lines we use do not exactly fit the viewpoints. We have tested our approach on the PIE dataset. In all the experiments, our method demonstrates effective performance compared with other algorithms.
%B Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on
%P 1 - 8
%8 2007/06//
%G eng
%R 10.1109/CVPR.2007.383111
%0 Journal Article
%J Image Processing, IEEE Transactions on
%D 2006
%T Face Verification Across Age Progression
%A Ramanathan,N.
%A Chellappa, Rama
%K age
%K Aging
%K Bayesian
%K classification;
%K classification;face
%K classifier;age
%K difference
%K effects;preprocessing
%K Face
%K image
%K images;error
%K methods;Bayes
%K methods;error
%K progression;age
%K rate;face
%K recognition
%K recognition;image
%K separated
%K statistics;face
%K systems;face
%K verification;facial
%X Human faces undergo considerable variation with aging. While face recognition systems have been proven to be sensitive to factors such as illumination and pose, their sensitivity to facial aging effects is yet to be studied. How does age progression affect the similarity between a pair of face images of an individual? What is the confidence associated with establishing the identity between a pair of age-separated face images? In this paper, we develop a Bayesian age difference classifier that classifies face images of individuals based on age differences and performs face verification across age progression. Further, we study the similarity of faces across age progression. Since age-separated face images invariably differ in illumination and pose, we propose preprocessing methods for minimizing such variations. Experimental results using a database comprising pairs of face images that were retrieved from the passports of 465 individuals are presented. For faces separated by as many as nine years, the verification system attains an equal error rate of 8.5%.
%B Image Processing, IEEE Transactions on
%V 15
%P 3349 - 3361
%8 2006/11//
%@ 1057-7149
%G eng
%N 11
%R 10.1109/TIP.2006.881993
%0 Journal Article
%J Information Forensics and Security, IEEE Transactions on
%D 2006
%T Joint coding and embedding techniques for MultimediaFingerprinting
%A He,Shan
%A M. Wu
%K coding-embedding
%K coding;
%K collusion-resistant
%K complexity;
%K computational
%K data;
%K DETECTION
%K embedded
%K EMBEDDING
%K fingerprinting;
%K group-based
%K image
%K joint
%K multimedia
%K of
%K Security
%K subsegment
%K systems;
%K technique;
%K techniques;
%X Digital fingerprinting protects multimedia content from illegal redistribution by uniquely marking every copy of the content distributed to each user. The collusion attack is a powerful attack where several different fingerprinted copies of the same content are combined together to attenuate or even remove the fingerprints. One major category of collusion-resistant fingerprinting employs an explicit step of coding. Most existing works on coded fingerprinting mainly focus on the code-level issues and treat the embedding issues through abstract assumptions without examining the overall performance. In this paper, we jointly consider the coding and embedding issues for coded fingerprinting systems and examine their performance in terms of collusion resistance, detection computational complexity, and distribution efficiency. Our studies show that coded fingerprinting has efficient detection but rather low collusion resistance. Taking advantage of joint coding and embedding, we propose a permuted subsegment embedding technique and a group-based joint coding and embedding technique to improve the collusion resistance of coded fingerprinting while maintaining its efficient detection. Experimental results show that the number of colluders that the proposed methods can resist is more than three times as many as that of the conventional coded fingerprinting approaches.
%B Information Forensics and Security, IEEE Transactions on
%V 1
%P 231 - 247
%8 2006/06//
%@ 1556-6013
%G eng
%N 2
%R 10.1109/TIFS.2006.873597
%0 Journal Article
%J Information Forensics and Security, IEEE Transactions on
%D 2006
%T Robust and secure image hashing
%A Swaminathan,A.
%A Mao,Yinian
%A M. Wu
%K content-preserving
%K cryptography;
%K differential
%K distortions;
%K entropy;
%K Filtering
%K Fourier
%K functions;
%K hash
%K hashing;
%K image
%K modifications;
%K processing;
%K secure
%K theory;
%K transform;
%K transforms;
%X Image hash functions find extensive applications in content authentication, database search, and watermarking. This paper develops a novel algorithm for generating an image hash based on Fourier transform features and controlled randomization. We formulate the robustness of image hashing as a hypothesis testing problem and evaluate the performance under various image processing operations. We show that the proposed hash function is resilient to content-preserving modifications, such as moderate geometric and filtering distortions. We introduce a general framework to study and evaluate the security of image hashing systems. Under this new framework, we model the hash values as random variables and quantify its uncertainty in terms of differential entropy. Using this security framework, we analyze the security of the proposed schemes and several existing representative methods for image hashing. We then examine the security versus robustness tradeoff and show that the proposed hashing methods can provide excellent security and robustness.
%B Information Forensics and Security, IEEE Transactions on
%V 1
%P 215 - 230
%8 2006/06//
%@ 1556-6013
%G eng
%N 2
%R 10.1109/TIFS.2006.873601
%0 Conference Paper
%B Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on
%D 2005
%T An algebraic approach to surface reconstruction from gradient fields
%A Agrawal,A.
%A Chellappa, Rama
%A Raskar, R.
%K algebra;
%K algebraic
%K approach;
%K Computer
%K confinement;
%K discrete
%K domain
%K error
%K field;
%K from
%K gradient
%K graph
%K image
%K integrability;
%K linear
%K local
%K methods;
%K photometric
%K reconstruction;
%K shading;
%K SHAPE
%K stereo;
%K surface
%K system;
%K theory;
%K vision;
%X Several important problems in computer vision such as shape from shading (SFS) and photometric stereo (PS) require reconstructing a surface from an estimated gradient field, which is usually nonintegrable, i.e., has nonzero curl. We propose a purely algebraic approach to enforce integrability in the discrete domain. We first show that enforcing integrability can be formulated as solving a single linear system Ax = b over the image. In general, this system is under-determined. We show conditions under which the system can be solved and a method, based on graph theory, to get to those conditions. The proposed approach is non-iterative, has the important property of local error confinement, and can be applied to several problems. Results on SFS and PS demonstrate the applicability of our method.
%B Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on
%V 1
%P 174 - 181 Vol. 1
%8 2005/10//
%G eng
%R 10.1109/ICCV.2005.31
%0 Journal Article
%J Signal Processing, IEEE Transactions on
%D 2005
%T Data hiding in curves with application to fingerprinting maps
%A Gou,Hongmei
%A M. Wu
%K (mathematics);
%K algorithm;
%K alignment-minimization
%K B-spline
%K CONTROL
%K curve
%K data
%K detection;
%K edge
%K embedding;
%K encapsulation;
%K fingerprint
%K geospatial
%K hiding
%K identification;
%K image
%K iterative
%K method;
%K methods;
%K minimisation;
%K point;
%K protection;
%K registration;
%K sequence;
%K spectrum
%K splines
%K spread
%K watermarking;
%X This paper presents a new data hiding method for curves. The proposed algorithm parameterizes a curve using the B-spline model and adds a spread spectrum sequence to the coordinates of the B-spline control points. In order to achieve robust fingerprint detection, an iterative alignment-minimization algorithm is proposed to perform curve registration and to deal with the nonuniqueness of B-spline control points. Through experiments, we demonstrate the robustness of the proposed data-hiding algorithm against various attacks, such as collusion, cropping, geometric transformations, vector/raster-raster/vector conversions, printing-and-scanning, and some of their combinations. We also show the feasibility of our method for fingerprinting topographic maps as well as writings and drawings.
%B Signal Processing, IEEE Transactions on
%V 53
%P 3988 - 4005
%8 2005/10//
%@ 1053-587X
%G eng
%N 10
%R 10.1109/TSP.2005.855411
%0 Conference Paper
%B Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on
%D 2005
%T Deformation invariant image matching
%A Ling,Haibin
%A Jacobs, David W.
%K deformation
%K deformations;nonaffine
%K deformations;point
%K descriptor;geodesic
%K distances;geodesic
%K geometry;differential
%K geometry;image
%K histogram;image
%K image
%K invariant
%K local
%K matching;computational
%K matching;deformation
%K matching;image
%K morphing;
%K sampling;geodesic-intensity
%X We propose a novel framework to build descriptors of local intensity that are invariant to general deformations. In this framework, an image is embedded as a 2D surface in 3D space, with intensity weighted relative to distance in x-y. We show that as this weight increases, geodesic distances on the embedded surface are less affected by image deformations. In the limit, distances are deformation invariant. We use geodesic sampling to get neighborhood samples for interest points, and then use a geodesic-intensity histogram (GIH) as a deformation invariant local descriptor. In addition to its invariance, the new descriptor automatically finds its support region. This means it can safely gather information from a large neighborhood to improve discriminability. Furthermore, we propose a matching method for this descriptor that is invariant to affine lighting changes. We have tested this new descriptor on interest point matching for two data sets, one with synthetic deformation and lighting change, and another with real non-affine deformations. Our method shows promising matching results compared to several other approaches.
%B Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on
%V 2
%P 1466 - 1473 Vol. 2
%8 2005/10//
%G eng
%R 10.1109/ICCV.2005.67
%0 Conference Paper
%B Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on
%D 2005
%T Detecting rotational symmetries
%A Shiv Naga Prasad,V.
%A Davis, Larry S.
%K axial
%K computational
%K detection;
%K field;
%K flow;
%K geometry;
%K gradient
%K graph
%K graph;
%K image
%K image;
%K magnitude
%K methods;
%K multiple
%K n-sided
%K object
%K polygons;
%K recognition;
%K rotational
%K symmetries;
%K symmetry;
%K theory;
%K tire
%K tyres;
%K vector
%X We present an algorithm for detecting multiple rotational symmetries in natural images. Given an image, its gradient magnitude field is computed, and information from the gradients is spread using a diffusion process in the form of a gradient vector flow (GVF) field. We construct a graph whose nodes correspond to pixels in the image, connecting points that are likely to be rotated versions of one another. The n-cycles present in the graph are made to vote for C_{n} symmetries, their votes being weighted by the errors in transformation between GVF in the neighborhood of the voting points, and the irregularity of the n-sided polygons formed by the voters. The votes are accumulated at the centroids of possible rotational symmetries, generating a confidence map for each order of symmetry. We tested the method with several natural images.
%B Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on
%V 2
%P 954 - 961 Vol. 2
%8 2005/10//
%G eng
%R 10.1109/ICCV.2005.71
%0 Conference Paper
%B Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on
%D 2005
%T Efficient mean-shift tracking via a new similarity measure
%A Yang,Changjiang
%A Duraiswami, Ramani
%A Davis, Larry S.
%K algorithm;
%K analysis;
%K Bhattacharyya
%K coefficient;
%K Color
%K colour
%K density
%K divergence;
%K estimates;
%K extraction;
%K fast
%K feature
%K frame-rate
%K Gauss
%K Gaussian
%K histograms;
%K image
%K Kernel
%K Kullback-Leibler
%K matching;
%K Mean-shift
%K measures;
%K nonparametric
%K processes;
%K sample-based
%K sequences;
%K similarity
%K spaces;
%K spatial-feature
%K tracking
%K tracking;
%K transform;
%X The mean shift algorithm has achieved considerable success in object tracking due to its simplicity and robustness. It finds local minima of a similarity measure between the color histograms or kernel density estimates of the model and target image. The most typically used similarity measures are the Bhattacharyya coefficient or the Kullback-Leibler divergence. In practice, these approaches face three difficulties. First, the spatial information of the target is lost when the color histogram is employed, which precludes the application of more elaborate motion models. Second, the classical similarity measures are not very discriminative. Third, the sample-based classical similarity measures require a calculation that is quadratic in the number of samples, making real-time performance difficult. To deal with these difficulties we propose a new, simple-to-compute and more discriminative similarity measure in spatial-feature spaces. The new similarity measure allows the mean shift algorithm to track more general motion models in an integrated way. To reduce the complexity of the computation to linear order we employ the recently proposed improved fast Gauss transform. This leads to a very efficient and robust nonparametric spatial-feature tracking algorithm. The algorithm is tested on several image sequences and shown to achieve robust and reliable frame-rate tracking.
%B Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on
%V 1
%P 176 - 183 vol. 1
%8 2005/06//
%G eng
%R 10.1109/CVPR.2005.139
%0 Conference Paper
%B Image Processing, 2005. ICIP 2005. IEEE International Conference on
%D 2005
%T Extracting regions of symmetry
%A Gupta,A.
%A Prasad,V. S.N
%A Davis, Larry S.
%K algorithm;
%K coherence;
%K DETECTION
%K detection;
%K elimination;
%K extraction;
%K feature
%K image
%K images;
%K natural
%K normalized-cut
%K object
%K region
%K segmentation
%K segmentation;
%K spatial
%K spurious
%K symmetry
%X This paper presents an approach for extending the normalized-cut (n-cut) segmentation algorithm to find symmetric regions present in natural images. We use an existing algorithm to quickly detect possible symmetries present in an image. The detected symmetries are then individually verified using the modified n-cut algorithm to eliminate spurious detections. The weights of the n-cut algorithm are modified so as to include both symmetric and spatial affinities. A global parameter is defined to model the tradeoff between spatial coherence and symmetry. Experimental results indicate that symmetric quality measure for a region segmented by our algorithm is a good indicator for the significance of the principal axis of symmetry.
%B Image Processing, 2005. ICIP 2005. IEEE International Conference on
%V 3
%P III - 133-6
%8 2005/09//
%G eng
%R 10.1109/ICIP.2005.1530346
%0 Conference Paper
%B Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on
%D 2005
%T Fast illumination-invariant background subtraction using two views: error analysis, sensor placement and applications
%A Lim,Ser-Nam
%A Mittal,A.
%A Davis, Larry S.
%A Paragios,N.
%K analysis;
%K application;
%K background
%K cameras;
%K configuration;
%K DETECTION
%K detection;
%K error
%K error;
%K extraction;
%K false
%K feature
%K handling;
%K illumination-invariance;
%K image
%K intelligent
%K matching;
%K modeling;
%K object
%K placement;
%K processing;
%K sensor
%K sensors;
%K shadow
%K stereo
%K subtraction;
%K video
%X Background modeling and subtraction to detect new or moving objects in a scene is an important component of many intelligent video applications. Compared to a single camera, the use of multiple cameras leads to better handling of shadows, specularities and illumination changes due to the utilization of geometric information. Although the result of stereo matching can be used as the feature for detection, it has been shown that the detection process can be made much faster by a simple subtraction of the intensities observed at stereo-generated conjugate pairs in the two views. The methodology, however, suffers from false and missed detections due to some geometric considerations. In this paper, we perform a detailed analysis of such errors. Then, we propose a sensor configuration that eliminates false detections. Algorithms are also proposed that effectively eliminate most detection errors due to missed detections, specular reflections and objects being geometrically close to the background. Experiments on several scenes illustrate the utility and enhanced performance of the proposed approach compared to existing techniques.
%B Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on
%V 1
%P 1071 - 1078 vol. 1
%8 2005/06//
%G eng
%R 10.1109/CVPR.2005.155
%0 Conference Paper
%B Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on
%D 2005
%T Fast multiple object tracking via a hierarchical particle filter
%A Yang,Changjiang
%A Duraiswami, Ramani
%A Davis, Larry S.
%K (numerical
%K algorithm;
%K analysis;
%K Color
%K colour
%K Computer
%K Convergence
%K detection;
%K edge
%K fast
%K filter;
%K Filtering
%K hierarchical
%K histogram;
%K image
%K images;
%K integral
%K likelihood;
%K methods);
%K methods;
%K multiple
%K numerical
%K object
%K observation
%K of
%K orientation
%K particle
%K processes;
%K quasirandom
%K random
%K sampling;
%K tracking
%K tracking;
%K vision;
%K visual
%X A very efficient and robust visual object tracking algorithm based on the particle filter is presented. The method characterizes the tracked objects using color and edge orientation histogram features. While the use of more features and samples can improve the robustness, the computational load required by the particle filter increases. To accelerate the algorithm while retaining robustness we adopt several enhancements in the algorithm. The first is the use of integral images for efficiently computing the color features and edge orientation histograms, which allows a large amount of particles and a better description of the targets. Next, the observation likelihood based on multiple features is computed in a coarse-to-fine manner, which allows the computation to quickly focus on the more promising regions. Quasi-random sampling of the particles allows the filter to achieve a higher convergence rate. The resulting tracking algorithm maintains multiple hypotheses and offers robustness against clutter or short period occlusions. Experimental results demonstrate the efficiency and effectiveness of the algorithm for single and multiple object tracking.
%B Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on
%V 1
%P 212 - 219 Vol. 1
%8 2005/10//
%G eng
%R 10.1109/ICCV.2005.95
%0 Conference Paper
%B Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on
%D 2005
%T Flattening curved documents in images
%A Liang,Jian
%A DeMenthon,D.
%A David Doermann
%K calibration;
%K camera
%K character
%K content;
%K curved
%K distortion;
%K document
%K document;
%K image
%K images;
%K OCR
%K optical
%K page
%K pictures;
%K printed
%K processing;
%K recognition;
%K restoration;
%K scanned
%K techniques;
%K textual
%K warping;
%X Compared to scanned images, document pictures captured by camera can suffer from distortions due to perspective and page warping. It is necessary to restore a frontal planar view of the page before other OCR techniques can be applied. In this paper we describe a novel approach for flattening a curved document in a single picture captured by an uncalibrated camera. To our knowledge this is the first reported method able to process general curved documents in images without camera calibration. We propose to model the page surface by a developable surface, and exploit the properties (parallelism and equal line spacing) of the printed textual content on the page to recover the surface shape. Experiments show that the output images are much more OCR friendly than the original ones. While our method is designed to work with any general developable surfaces, it can be adapted for typical special cases including planar pages, scans of thick books, and opened books.
%B Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on
%V 2
%P 338 - 345 vol. 2
%8 2005/06//
%G eng
%R 10.1109/CVPR.2005.163
%0 Conference Paper
%B Document Analysis and Recognition, 2005. Proceedings. Eighth International Conference on
%D 2005
%T Handwriting matching and its application to handwriting synthesis
%A Yefeng Zheng
%A David Doermann
%K (artificial
%K deformation
%K deformation;
%K handwriting
%K image
%K intelligence);
%K learning
%K learning;
%K matching;
%K point
%K recognition;
%K sampling;
%K SHAPE
%K synthesis;
%X Since it is extremely expensive to collect a large volume of handwriting samples, synthesized data are often used to enlarge the training set. We argue that, in order to generate good handwriting samples, a synthesis algorithm should learn the shape deformation characteristics of handwriting from real samples. In this paper, we present a point matching algorithm to learn the deformation, and apply it to handwriting synthesis. Preliminary experiments show the advantages of our approach.
%B Document Analysis and Recognition, 2005. Proceedings. Eighth International Conference on
%P 861 - 865 Vol. 2
%8 2005/08//
%G eng
%R 10.1109/ICDAR.2005.122
%0 Conference Paper
%B Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP '05). IEEE International Conference on
%D 2005
%T Moving Object Segmentation and Dynamic Scene Reconstruction Using Two Frames
%A Agrawal, Amit K.
%A Chellappa, Rama
%K 3D
%K analysis;
%K constraints;
%K dynamic
%K ego-motion
%K estimation;
%K flow
%K image
%K images;
%K independent
%K INTENSITY
%K least
%K mean
%K median
%K method;
%K methods;
%K model;
%K MOTION
%K motion;
%K moving
%K object
%K of
%K parallax
%K parallax;
%K parametric
%K processing;
%K reconstruction;
%K scene
%K segmentation;
%K signal
%K squares
%K squares;
%K static
%K structure;
%K subspace
%K surface
%K translational
%K two-frame
%K unconstrained
%K video
%B Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP '05). IEEE International Conference on
%V 2
%P 705 - 708
%8 2005/03/18/23
%G eng
%R 10.1109/ICASSP.2005.1415502
%0 Conference Paper
%B Information Fusion, 2005 8th International Conference on
%D 2005
%T A new approach to image fusion based on cokriging
%A Memarsadeghi,N.
%A Le Moigne,J.
%A Mount, Dave
%A Morisette,J.
%K ALI;
%K analysis;
%K based
%K cokriging;
%K component
%K data;
%K forecasting
%K fusion
%K fusion;
%K geophysical
%K geostatistical
%K Hyperion
%K image
%K Interpolation
%K interpolation;
%K invasive
%K ISFS
%K method;
%K metrics;
%K PCA;
%K principal
%K processing;
%K project;
%K QUALITY
%K quantitative
%K remote
%K remotely
%K sensed
%K sensing;
%K sensor
%K sensors;
%K signal
%K species
%K system;
%K techniques;
%K transforms;
%K wavelet
%K wavelet-based
%X We consider the image fusion problem involving remotely sensed data. We introduce cokriging as a method to perform fusion. We investigate the advantages of fusing Hyperion with ALI. This evaluation is performed by comparing the classification of the fused data with that of input images and by calculating well-chosen quantitative fusion quality metrics. We consider the invasive species forecasting system (ISFS) project as our fusion application. The fusion of ALI with Hyperion data is studied using PCA and wavelet-based fusion. We then propose utilizing a geostatistics-based interpolation method called cokriging as a new approach for image fusion.
%B Information Fusion, 2005 8th International Conference on
%V 1
%P 8 pp.
%8 2005/07//
%G eng
%R 10.1109/ICIF.2005.1591912
%0 Conference Paper
%B Image Processing, 2005. ICIP 2005. IEEE International Conference on
%D 2005
%T Pedestrian classification from moving platforms using cyclic motion pattern
%A Yang Ran
%A Qinfen Zheng
%A Weiss, I.
%A Davis, Larry S.
%A Abd-Almageed, Wael
%A Liang Zhao
%K analysis;
%K angle;
%K body
%K classification;
%K compact
%K cyclic
%K DETECTION
%K detection;
%K digital
%K Feedback
%K Gait
%K human
%K image
%K information;
%K locked
%K loop
%K loop;
%K loops;
%K module;
%K MOTION
%K object
%K oscillations;
%K pattern;
%K pedestrian
%K phase
%K Pixel
%K principle
%K representation;
%K sequence;
%K sequences;
%K SHAPE
%K system;
%X This paper describes an efficient pedestrian detection system for videos acquired from moving platforms. Given a detected and tracked object as a sequence of images within a bounding box, we describe the periodic signature of its motion pattern using a twin-pendulum model. Then a principal gait angle is extracted in every frame providing gait phase information. By estimating the periodicity from the phase data using a digital phase locked loop (dPLL), we quantify the cyclic pattern of the object, which helps us to continuously classify it as a pedestrian. Past approaches have used shape detectors applied to a single image or classifiers based on human body pixel oscillations, but ours is the first to integrate a global cyclic motion model and periodicity analysis. Novel contributions of this paper include: i) development of a compact shape representation of cyclic motion as a signature for a pedestrian, ii) estimation of gait period via a feedback loop module, and iii) implementation of a fast online pedestrian classification system which operates on videos acquired from moving platforms.
%B Image Processing, 2005. ICIP 2005. IEEE International Conference on
%V 2
%P II - 854-7
%8 2005/09//
%G eng
%R 10.1109/ICIP.2005.1530190
%0 Conference Paper
%B Image Processing, 2005. ICIP 2005. IEEE International Conference on
%D 2005
%T Robust observations for object tracking
%A Han,Bohyung
%A Davis, Larry S.
%K (numerical
%K adaptive
%K analysis;
%K component
%K enhancement;
%K filter
%K Filtering
%K framework;
%K image
%K images;
%K likelihood
%K methods);
%K object
%K observation
%K particle
%K PCA;
%K principal
%K tracking;
%X It is a difficult task to find an observation model that will perform well for long-term visual tracking. In this paper, we propose an adaptive observation enhancement technique based on likelihood images, which are derived from multiple visual features. The most discriminative likelihood image is extracted by principal component analysis (PCA) and incrementally updated frame by frame to reduce temporal tracking error. In the particle filter framework, the feasibility of each sample is computed using this most discriminative likelihood image before the observation process. Integral image is employed for efficient computation of the feasibility of each sample. We illustrate how our enhancement technique contributes to more robust observations through demonstrations.
%B Image Processing, 2005. ICIP 2005. IEEE International Conference on
%V 2
%P II - 442-5
%8 2005/09//
%G eng
%R 10.1109/ICIP.2005.1530087
%0 Conference Paper
%B Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP '05). IEEE International Conference on
%D 2005
%T Security of feature extraction in image hashing
%A Swaminathan,A.
%A Mao,Yinian
%A M. Wu
%K cryptography;
%K differential
%K digital
%K entropy;
%K extraction;
%K feature
%K functions;
%K hash
%K hashing;
%K image
%K metric;
%K processing;
%K randomness;
%K robustness;
%K Security
%K signature;
%K signatures;
%X Security and robustness are two important requirements for image hash functions. We introduce "differential entropy" as a metric to quantify the amount of randomness in image hash functions and to study their security. We present a mathematical framework and derive expressions for the proposed security metric for various common image hashing schemes. Using the proposed security metric, we discuss the trade-offs between security and robustness in image hashing.
%B Acoustics, Speech, and Signal Processing, 2005. Proceedings. (ICASSP '05). IEEE International Conference on
%V 2
%P ii/1041 - ii/1044 Vol. 2
%8 2005/03//
%G eng
%R 10.1109/ICASSP.2005.1415586
%0 Conference Paper
%B Image Processing, 2005. ICIP 2005. IEEE International Conference on
%D 2005
%T Segmentation and appearance model building from an image sequence
%A Liang Zhao
%A Davis, Larry S.
%K algorithm;
%K color-path-length
%K expectation-sampling
%K image
%K iterative
%K joint
%K kernel-based
%K methods;
%K PDF;
%K person
%K segmentation;
%K sequence;
%K sequences;
%K space;
%X In this paper we explore the problem of accurately segmenting a person from a video given only the approximate location of that person. Unlike previous work, which assumes that the appearance model is known in advance, we developed an iterative expectation-sampling (ES) algorithm for solving segmentation and appearance modeling simultaneously. The appearance model is encoded with a kernel-based PDF defined in a joint color/path-length space. This appearance model remains unchanged during a short time period, although the object can articulate. Thus, we can perform the ES iteration not only for a single frame but also for an image sequence. The algorithm is iterative, but simple, efficient and gives visually good results.
%B Image Processing, 2005. ICIP 2005. IEEE International Conference on
%V 1
%P I - 321-4
%8 2005/09//
%G eng
%R 10.1109/ICIP.2005.1529752
%0 Conference Paper
%B Image Processing, 2005. ICIP 2005. IEEE International Conference on
%D 2005
%T Tracking objects in video using motion and appearance models
%A Sankaranarayanan,A. C
%A Chellapa, Rama
%A Qinfen Zheng
%K algorithm;
%K analysis;
%K appearance
%K background
%K estimation;
%K image
%K likelihood
%K maximum
%K model;
%K models;
%K MOTION
%K object
%K processing;
%K signal
%K target
%K tracking
%K tracking;
%K video
%K visual
%X This paper proposes a visual tracking algorithm that combines motion and appearance in a statistical framework. It is assumed that image observations are generated simultaneously from a background model and a target appearance model. This is different from conventional appearance-based tracking, which does not use motion information. The proposed algorithm attempts to maximize the likelihood ratio of the tracked region, derived from appearance and background models. Incorporation of motion in appearance-based tracking provides robust tracking, even when the target violates the appearance model. We show that the proposed algorithm performs well in tracking targets efficiently over long time intervals.
%B Image Processing, 2005. ICIP 2005. IEEE International Conference on
%V 2
%P II - 394-7
%8 2005/09//
%G eng
%R 10.1109/ICIP.2005.1530075
%0 Conference Paper
%B Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on
%D 2005
%T Using the inner-distance for classification of articulated shapes
%A Ling,H.
%A Jacobs, David W.
%K articulated
%K CE-Shape-1
%K classification;
%K database;
%K databases;
%K dataset;
%K descriptor;
%K dynamic
%K human
%K image
%K inner-distance;
%K Kimia
%K landmark
%K leaf
%K matching;
%K MOTION
%K MPEG7
%K points;
%K programming;
%K SHAPE
%K silhouette
%K silhouette;
%K Swedish
%K visual
%X We propose using the inner-distance between landmark points to build shape descriptors. The inner-distance is defined as the length of the shortest path between landmark points within the shape silhouette. We show that the inner-distance is articulation insensitive and more effective at capturing complex shapes with part structures than Euclidean distance. To demonstrate this idea, it is used to build a new shape descriptor based on shape contexts. After that, we design a dynamic programming based method for shape matching and comparison. We have tested our approach on a variety of shape databases including an articulated shape dataset, MPEG7 CE-Shape-1, Kimia silhouettes, a Swedish leaf database and a human motion silhouette dataset. In all the experiments, our method demonstrates effective performance compared with other algorithms.
%B Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on
%V 2
%P 719 - 726 vol. 2
%8 2005/06//
%G eng
%R 10.1109/CVPR.2005.362
%0 Conference Paper
%B Advanced Video and Signal Based Surveillance, 2005. AVSS 2005. IEEE Conference on
%D 2005
%T VidMAP: video monitoring of activity with Prolog
%A Shet,V. D
%A Harwood,D.
%A Davis, Larry S.
%K activities
%K algorithms;
%K based
%K Computer
%K computerised
%K engine;
%K higher
%K image
%K level
%K Logic
%K monitoring;
%K multicamera
%K processing;
%K programming;
%K Prolog
%K PROLOG;
%K reasoning
%K recognition;
%K scenario;
%K signal
%K streaming;
%K streams;
%K Surveillance
%K surveillance;
%K system;
%K video
%K VISION
%K vision;
%K visual
%X This paper describes the architecture of a visual surveillance system that combines real time computer vision algorithms with logic programming to represent and recognize activities involving interactions amongst people, packages and the environments through which they move. The low level computer vision algorithms log primitive events of interest as observed facts, while the higher level Prolog based reasoning engine uses these facts in conjunction with predefined rules to recognize various activities in the input video streams. The system is illustrated in action on a multi-camera surveillance scenario that includes both security and safety violations.
%B Advanced Video and Signal Based Surveillance, 2005. AVSS 2005. IEEE Conference on
%P 224 - 229
%8 2005/09//
%G eng
%R 10.1109/AVSS.2005.1577271
%0 Conference Paper
%B Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
%D 2004
%T 3D model refinement using surface-parallax
%A Agrawala, Ashok K.
%A Chellapa, Rama
%K 3D
%K adaptive
%K arbitrary
%K camera
%K coarse
%K compensation;
%K Computer
%K DEM;
%K depth
%K digital
%K ELEVATION
%K environments;
%K epipolar
%K estimation;
%K field;
%K image
%K incomplete
%K INTENSITY
%K map;
%K model
%K MOTION
%K parallax;
%K plane-parallax
%K reconstruction;
%K recovery;
%K refinement;
%K sequence;
%K sequences;
%K surface
%K surfaces;
%K urban
%K vision;
%K windowing;
%X We present an approach to update and refine coarse 3D models of urban environments from a sequence of intensity images using surface parallax. This generalizes the plane-parallax recovery methods to surface-parallax using arbitrary surfaces. A coarse and potentially incomplete depth map of the scene obtained from a digital elevation map (DEM) is used as a reference surface which is refined and updated using this approach. The reference depth map is used to estimate the camera motion and the motion of the 3D points on the reference surface is compensated. The resulting parallax, which is an epipolar field, is estimated using an adaptive windowing technique and used to obtain the refined depth map.
%B Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
%V 3
%P iii - 285-8 vol.3
%8 2004/05//
%G eng
%R 10.1109/ICASSP.2004.1326537
%0 Conference Paper
%B Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
%D 2004
%T Appearance-based tracking and recognition using the 3D trilinear tensor
%A Jie Shao
%A Zhou,S. K
%A Chellapa, Rama
%K 3D
%K adaptive
%K affine-transformation
%K airborne
%K algorithm;
%K appearance
%K appearance-based
%K based
%K estimation;
%K geometrical
%K image
%K mathematical
%K novel
%K object
%K operator;
%K operators;
%K perspective
%K prediction;
%K processing;
%K recognition;
%K representation;
%K signal
%K structure
%K synthesis;
%K template
%K tensor
%K tensor;
%K tensors;
%K tracking;
%K transformation;
%K trilinear
%K updating;
%K video
%K video-based
%K video;
%K view
%X The paper presents an appearance-based adaptive algorithm for simultaneous tracking and recognition by generalizing the transformation model to 3D perspective transformation. A trilinear tensor operator is used to represent the 3D geometrical structure. The tensor is estimated by predicting the corresponding points using the existing affine-transformation based algorithm. The estimated tensor is used to synthesize novel views to update the appearance templates. Some experimental results using airborne video are presented.
%B Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
%V 3
%P iii - 613-16 vol.3
%8 2004/05//
%G eng
%R 10.1109/ICASSP.2004.1326619
%0 Conference Paper
%B Image Processing, 2004. ICIP '04. 2004 International Conference on
%D 2004
%T Background modeling and subtraction by codebook construction
%A Kim,Kyungnam
%A Chalidabhongse,T.H.
%A Harwood,D.
%A Davis, Larry S.
%K (signal);
%K analysis;
%K background
%K codebook
%K coding;
%K colour
%K compression;
%K construction;
%K data
%K image
%K modeling
%K modeling;
%K MOTION
%K multimode
%K quantisation
%K representation;
%K sequence;
%K sequences;
%K subtraction;
%K technique;
%K video
%X We present a new fast algorithm for background modeling and subtraction. Sample background values at each pixel are quantized into codebooks which represent a compressed form of background model for a long image sequence. This allows us to capture structural background variation due to periodic-like motion over a long period of time under limited memory. Our method can handle scenes containing moving backgrounds or illumination variations (shadows and highlights), and it achieves robust detection for compressed videos. We compared our method with other multimode modeling techniques.
%B Image Processing, 2004. ICIP '04. 2004 International Conference on
%V 5
%P 3061 - 3064 Vol. 5
%8 2004/10//
%G eng
%R 10.1109/ICIP.2004.1421759
%0 Journal Article
%J Multimedia, IEEE Transactions on
%D 2004
%T Data hiding in binary image for authentication and annotation
%A M. Wu
%A Liu,Bede
%K annotation;
%K authentication;
%K binary
%K coding;
%K data
%K digital
%K digitized
%K document
%K EMBEDDING
%K encapsulation;
%K extraction;
%K feature
%K hiding;
%K image
%K image;
%K method;
%K signature;
%K unauthorized
%K user;
%K watermarking;
%X This paper proposes a new method to embed data in binary images, including scanned text, figures, and signatures. The method manipulates "flippable" pixels to enforce specific block-based relationship in order to embed a significant amount of data without causing noticeable artifacts. Shuffling is applied before embedding to equalize the uneven embedding capacity from region to region. The hidden data can be extracted without using the original image, and can also be accurately extracted after high quality printing and scanning with the help of a few registration marks. The proposed data embedding method can be used to detect unauthorized use of a digitized signature, and annotate or authenticate binary documents. The paper also presents analysis and discussions on robustness and security issues.
%B Multimedia, IEEE Transactions on
%V 6
%P 528 - 538
%8 2004/08//
%@ 1520-9210
%G eng
%N 4
%R 10.1109/TMM.2004.830814
%0 Conference Paper
%B Image Processing, 2004. ICIP '04. 2004 International Conference on
%D 2004
%T Data hiding in curves for collusion-resistant digital fingerprinting
%A Gou,Hongmei
%A M. Wu
%K (mathematics);
%K B-spline
%K coding;
%K collusion-resistant
%K CONTROL
%K data
%K devices;
%K digital
%K document
%K encapsulation;
%K extraction;
%K feature
%K fingerprinting;
%K hiding;
%K image
%K INPUT
%K maps;
%K model;
%K pen-based
%K points;
%K printing-and-scanning
%K processing;
%K robustness;
%K sequence;
%K spectrum
%K splines
%K spread
%K topographic
%K watermarking;
%X This paper presents a new data hiding method for curves. The proposed algorithm parameterizes a curve using the B-spline model and adds a spread spectrum sequence in the coordinates of the B-spline control points. We demonstrate through experiments the robustness of the proposed data hiding algorithm against printing-and-scanning and collusions, and show its feasibility for collusion-resistant fingerprinting of topographic maps as well as writings/drawings from pen-based input devices.
%B Image Processing, 2004. ICIP '04. 2004 International Conference on
%V 1
%P 51 - 54 Vol. 1
%8 2004/10//
%G eng
%R 10.1109/ICIP.2004.1418687
%0 Conference Paper
%B Image Processing, 2004. ICIP '04. 2004 International Conference on
%D 2004
%T Facial similarity across age, disguise, illumination and pose
%A Ramanathan,N.
%A Chellapa, Rama
%A Roy Chowdhury, A.K.
%K Aging
%K database
%K databases;
%K disguise;
%K effect;
%K Expression
%K Face
%K half-face;
%K illumination;
%K image
%K lighting;
%K pose
%K recognition
%K recognition;
%K retrieval;
%K system;
%K variation;
%K visual
%X Illumination, pose variations, disguises, aging effects and expression variations are some of the key factors that affect the performance of face recognition systems. Face recognition systems have always been studied from a recognition perspective. Our emphasis is on deriving a measure of similarity between faces. The similarity measure provides insights into the role each of the above-mentioned variations plays in affecting the performance of face recognition systems. In the process of computing the similarity measure between faces, we suggest a framework to compensate for pose variations and introduce the notion of 'half-faces' to circumvent the problem of non-uniform illumination. We used the similarity measure to retrieve similar faces from a database containing multiple images of individuals. Moreover, we devised experiments to study the effect age has on facial similarity. In conclusion, the similarity measure helps in studying the significance of facial features in the performance of face recognition systems.
%B Image Processing, 2004. ICIP '04. 2004 International Conference on
%V 3
%P 1999 - 2002 Vol. 3
%8 2004/10//
%G eng
%R 10.1109/ICIP.2004.1421474
%0 Conference Paper
%B Image Processing, 2004. ICIP '04. 2004 International Conference on
%D 2004
%T A fine-structure image/video quality measure using local statistics
%A Kim,Kyungnam
%A Davis, Larry S.
%K algorithm;
%K background
%K degradation;
%K detection;
%K foreground
%K image
%K line-structure
%K local
%K measure;
%K modeling;
%K no-reference
%K object
%K objective
%K processing;
%K QUALITY
%K signal
%K statistics;
%K subtraction
%K surveillance;
%K video
%X An objective no-reference measure is presented to assess line-structure image/video quality. It was designed to measure image/video quality for video surveillance applications, especially for background modeling and foreground object detection. The proposed measure using local statistics reflects image degradation well in terms of noise and blur. The experimental results on a background subtraction algorithm validate the usefulness of the proposed measure, by showing its correlation with the algorithm's performance.
%B Image Processing, 2004. ICIP '04. 2004 International Conference on
%V 5
%P 3535 - 3538 Vol. 5
%8 2004/10//
%G eng
%R 10.1109/ICIP.2004.1421879
%0 Conference Paper
%B Multimedia Signal Processing, 2004 IEEE 6th Workshop on
%D 2004
%T Image hashing resilient to geometric and filtering operations
%A Swaminathan,A.
%A Mao,Yinian
%A M. Wu
%K compact
%K cryptographic
%K cryptography;
%K discrete
%K distortion;
%K Filtering
%K Fourier
%K function;
%K geometric
%K hash
%K image
%K key
%K key;
%K operation;
%K polar
%K PROCESSING
%K public
%K representation;
%K theory;
%K transform;
%K transforms;
%X Image hash functions provide compact representations of images, which is useful for search and authentication applications. In this work, we have identified a general three step framework and proposed a new image hashing scheme that achieves a better overall performance than the existing approaches under various kinds of image processing distortions. By exploiting the properties of discrete polar Fourier transform and incorporating cryptographic keys, the proposed image hash is resilient to geometric and filtering operations, and is secure against guessing and forgery attacks.
%B Multimedia Signal Processing, 2004 IEEE 6th Workshop on
%P 355 - 358
%8 2004/10/01/sept
%G eng
%R 10.1109/MMSP.2004.1436566
%0 Conference Paper
%B Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on
%D 2004
%T Iterative figure-ground discrimination
%A Zhao, L.
%A Davis, Larry S.
%K algorithm;
%K analysis;
%K Bandwidth
%K calculation;
%K Color
%K colour
%K Computer
%K density
%K dimensional
%K discrimination;
%K distribution;
%K distributions;
%K Estimation
%K estimation;
%K expectation
%K figure
%K Gaussian
%K ground
%K image
%K initialization;
%K iterative
%K Kernel
%K low
%K methods;
%K mixture;
%K model
%K model;
%K nonparametric
%K parameter
%K parametric
%K processes;
%K sampling
%K sampling;
%K segmentation
%K segmentation;
%K statistics;
%K theory;
%K vision;
%X Figure-ground discrimination is an important problem in computer vision. Previous work usually assumes that the color distribution of the figure can be described by a low-dimensional parametric model such as a mixture of Gaussians. However, such an approach has difficulty selecting the number of mixture components and is sensitive to the initialization of the model parameters. In this paper, we employ non-parametric kernel estimation for color distributions of both the figure and background. We derive an iterative sampling-expectation (SE) algorithm for estimating the color distribution and segmentation. There are several advantages of kernel-density estimation. First, it enables automatic selection of weights of different cues based on the bandwidth calculation from the image itself. Second, it does not require model parameter initialization and estimation. The experimental results on images of cluttered scenes demonstrate the effectiveness of the proposed algorithm.
%B Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on
%V 1
%P 67 - 70 Vol.1
%8 2004/08//
%G eng
%R 10.1109/ICPR.2004.1334006
%0 Conference Paper
%B Image Processing, 2004. ICIP '04. 2004 International Conference on
%D 2004
%T Multi-level fast multipole method for thin plate spline evaluation
%A Zandifar,A.
%A Lim,S.
%A Duraiswami, Ramani
%A Gumerov, Nail A.
%A Davis, Larry S.
%K (mathematics);
%K Computer
%K deformation;
%K evaluation;
%K fast
%K image
%K MATCHING
%K matching;
%K metal
%K method;
%K multilevel
%K multipole
%K nonrigid
%K pixel;
%K plate
%K plate;
%K processing;
%K registration;
%K resolution;
%K spline
%K splines
%K thin
%K vision;
%X Image registration is an important problem in image processing and computer vision. Much recent work in image registration is on matching non-rigid deformations. Thin plate splines are an effective image registration method when the deformation between two images can be modeled as the bending of a thin metal plate on point constraints such that the topology is preserved (non-rigid deformation). However, because evaluating the computed TPS model at all the image pixels is computationally expensive, it must be accelerated. We introduce the use of the multi-level fast multipole method (MLFMM) for this purpose. Our contribution lies in the presentation of a clear and concise MLFMM framework for TPS, which will be useful for future application developments. The achieved speedup using MLFMM is an improvement from O(N^{2}) to O(N log N). We show that the fast evaluation outperforms the brute-force method while maintaining an acceptable error bound.
%B Image Processing, 2004. ICIP '04. 2004 International Conference on
%V 3
%P 1683 - 1686 Vol. 3
%8 2004/10//
%G eng
%R 10.1109/ICIP.2004.1421395
%0 Conference Paper
%B Image Processing, 2004. ICIP '04. 2004 International Conference on
%D 2004
%T Multiple view tracking of humans modelled by kinematic chains
%A Sundaresan, A.
%A Chellapa, Rama
%A RoyChowdhury, R.
%K 3D
%K algorithm;
%K analysis;
%K body
%K calibrated
%K cameras;
%K chain
%K displacement;
%K error
%K estimation;
%K human
%K image
%K iterative
%K kinematic
%K kinematics;
%K methods;
%K model;
%K MOTION
%K motion;
%K multiple
%K parameters;
%K perspective
%K Pixel
%K processing;
%K projection
%K sequences;
%K signal
%K tracking;
%K video
%K view
%X We use a kinematic chain to model human body motion. We estimate the kinematic chain motion parameters using pixel displacements calculated from video sequences obtained from multiple calibrated cameras to perform tracking. We derive a linear relation between the 2D motion of pixels in terms of the 3D motion parameters of various body parts using a perspective projection model for the cameras, a rigid body motion model for the base body and the kinematic chain model for the body parts. An error analysis of the estimator is provided, leading to an iterative algorithm for calculating the motion parameters from the pixel displacements. We provide experimental results to demonstrate the accuracy of our formulation. We also compare our iterative algorithm to the noniterative algorithm and discuss its robustness in the presence of noise.
%B Image Processing, 2004. ICIP '04. 2004 International Conference on
%V 2
%P 1009 - 1012 Vol.2
%8 2004/10//
%G eng
%R 10.1109/ICIP.2004.1419472
%0 Journal Article
%J Magnetics, IEEE Transactions on
%D 2004
%T A novel approach to removing intersymbol interference from spin-stand images
%A Mayergoyz, Issak D
%A Tse,C.
%A Krafft,C.
%A McAvoy,P.
%K algorithm;
%K characterization;
%K Deconvolution
%K function
%K function;
%K giant
%K giant-magnetoresistive
%K Head
%K heads;
%K image
%K image;
%K imaging;
%K Interference
%K interference;
%K intersymbol
%K ISI-distorted
%K ISI-free
%K magnetic
%K magnetization
%K magnetoresistance;
%K patterns;
%K readback
%K removal;
%K response
%K signal;
%K spin-stand
%K suppression;
%X A novel intersymbol interference (ISI) removal technique based on the "response function" characterization of giant-magnetoresistive heads is presented. It is demonstrated that the ISI-free readback image that corresponds to the actual underlying magnetization patterns can be extracted from the ISI-distorted readback signal through deconvolution. A new image deconvolution algorithm has been implemented, and it has been shown that it effectively removes the ISI distortions.
%B Magnetics, IEEE Transactions on
%V 40
%P 2197 - 2199
%8 2004/07//
%@ 0018-9464
%G eng
%N 4
%R 10.1109/TMAG.2004.830153
%0 Conference Paper
%B Image Processing, 2004. ICIP '04. 2004 International Conference on
%D 2004
%T Object tracking by adaptive feature extraction
%A Han,Bohyung
%A Davis, Larry S.
%K adaptive
%K algorithm;
%K analysis;
%K colour
%K component
%K extraction;
%K feature
%K feature;
%K heterogeneous
%K image
%K image;
%K likelihood
%K Mean-shift
%K object
%K online
%K principal
%K tracking
%K tracking;
%X Tracking objects in a high-dimensional feature space is not only computationally expensive but also functionally inefficient. Selecting a low-dimensional discriminative feature set is a critical step to improve tracker performance. A good feature set for tracking can differ from frame to frame due to changes in the background against the tracked object, so an on-line algorithm that adaptively determines a distinctive feature set is advantageous. In this paper, multiple heterogeneous features are assembled, and likelihood images are constructed for various subspaces of the combined feature space. Then, the most discriminative feature is extracted by principal component analysis (PCA) based on those likelihood images. This idea is applied to the mean-shift tracking algorithm [D. Comaniciu et al., June 2000], and we demonstrate its effectiveness through various experiments.
%B Image Processing, 2004. ICIP '04. 2004 International Conference on
%V 3
%P 1501 - 1504 Vol. 3
%8 2004/10//
%G eng
%R 10.1109/ICIP.2004.1421349
%0 Conference Paper
%B Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on
%D 2004
%T Probabilistic identity characterization for face recognition
%A Zhou,S. K
%A Chellapa, Rama
%K characterization;
%K database
%K database;
%K encoding;
%K Face
%K identity
%K identity;
%K image
%K localization
%K management
%K object
%K PIE
%K probabilistic
%K problem;
%K recognition;
%K sequence;
%K sequences;
%K subspace
%K systems;
%K video
%X We present a general framework for characterizing the object identity in a single image or a group of images with each image containing a transformed version of the object, with applications to face recognition. In terms of the transformation, the group is made of either many still images or frames of a video sequence. The object identity is either discrete- or continuous-valued. This probabilistic framework integrates all the evidence of the set and handles the localization problem, illumination and pose variations through subspace identity encoding. Issues and challenges arising in this framework are addressed and efficient computational schemes are presented. Good face recognition results using the PIE database are reported.
%B Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on
%V 2
%P II-805 - II-812 Vol.2
%8 2004/07/02/june
%G eng
%R 10.1109/CVPR.2004.1315247
%0 Conference Paper
%B Image Processing, 2004. ICIP '04. 2004 International Conference on
%D 2004
%T Robust Bayesian cameras motion estimation using random sampling
%A Qian, G.
%A Chellapa, Rama
%A Qinfen Zheng
%K 3D
%K baseline
%K Bayesian
%K CAMERAS
%K cameras;
%K coarse-to-fine
%K consensus
%K density
%K estimation;
%K feature
%K function;
%K hierarchy
%K image
%K images;
%K importance
%K matching;
%K MOTION
%K posterior
%K probability
%K probability;
%K processing;
%K random
%K RANSAC;
%K real
%K realistic
%K sample
%K sampling;
%K scheme;
%K sequences;
%K stereo
%K strategy;
%K synthetic
%K wide
%X In this paper, we propose an algorithm for robust 3D motion estimation of wide baseline cameras from noisy feature correspondences. The posterior probability density function of the camera motion parameters is represented by weighted samples. The algorithm employs a hierarchy coarse-to-fine strategy. First, a coarse prior distribution of camera motion parameters is estimated using the random sample consensus scheme (RANSAC). Based on this estimate, a refined posterior distribution of camera motion parameters can then be obtained through importance sampling. Experimental results using both synthetic and real image sequences indicate the efficacy of the proposed algorithm.
%B Image Processing, 2004. ICIP '04. 2004 International Conference on
%V 2
%P 1361 - 1364 Vol.2
%8 2004/10//
%G eng
%R 10.1109/ICIP.2004.1419754
%0 Conference Paper
%B Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
%D 2004
%T Robust two-camera tracking using homography
%A Yue,Zhanfeng
%A Zhou,S. K
%A Chellapa, Rama
%K Carlo
%K filter;
%K filters;
%K frame
%K framework;
%K homography;
%K image
%K method;
%K methods;
%K Monte
%K nonlinear
%K occlusions;
%K optical
%K particle
%K processing;
%K robust
%K sequences;
%K sequential
%K signal
%K statistics;
%K tracking
%K tracking;
%K two
%K two-camera
%K video
%K view
%K visual
%X The paper introduces a two view tracking method which uses the homography relation between the two views to handle occlusions. An adaptive appearance-based model is incorporated in a particle filter to realize robust visual tracking. Occlusion is detected using robust statistics. When there is occlusion in one view, the homography from this view to other views is estimated from previous tracking results and used to infer the correct transformation for the occluded view. Experimental results show the robustness of the two view tracker.
%B Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP '04). IEEE International Conference on
%V 3
%P iii - 1-4 vol.3
%8 2004/05//
%G eng
%R 10.1109/ICASSP.2004.1326466
%0 Conference Paper
%B Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on
%D 2004
%T Role of shape and kinematics in human movement analysis
%A Veeraraghavan,A.
%A Chowdhury, A.R.
%A Chellapa, Rama
%K activity
%K algorithm;
%K algorithms;
%K analysis;
%K autoregressive
%K average
%K based
%K classification;
%K community;
%K Computer
%K definition;
%K dynamical
%K extraction;
%K feature
%K Gait
%K hidden
%K human
%K identification
%K image
%K Kendall
%K linear
%K manifold;
%K Markov
%K modeling;
%K models;
%K MOTION
%K Movement
%K moving
%K processes;
%K recognition
%K sequences;
%K SHAPE
%K spherical
%K system;
%K VISION
%K vision;
%X Human gait and activity analysis from video is presently attracting a lot of attention in the computer vision community. In this paper we analyze the role of two of the most important cues in human motion: shape and kinematics. We present an experimental framework whereby it is possible to evaluate the relative importance of these two cues in computer vision based recognition algorithms. In the process, we propose a new gait recognition algorithm by computing the distance between two sequences of shapes that lie on a spherical manifold. In our experiments, shape is represented using Kendall's definition of shape. Kinematics is represented using a linear dynamical system. We place particular emphasis on human gait. Our conclusions show that shape plays a role which is more significant than kinematics in current automated gait-based human identification algorithms. As a natural extension we study the role of shape and kinematics in activity recognition. Our experiments indicate that we require models that contain both shape and kinematics in order to perform accurate activity classification. These conclusions also allow us to explain the relative performance of many existing methods in computer-based human activity modeling.
%B Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on
%V 1
%P I-730 - I-737 Vol.1
%8 2004/07/02/june
%G eng
%R 10.1109/CVPR.2004.1315104
%0 Conference Paper
%B Image Processing, 2004. ICIP '04. 2004 International Conference on
%D 2004
%T Simultaneous background and foreground modeling for tracking in surveillance video
%A Shao, J.
%A Zhou,S. K
%A Chellapa, Rama
%K algorithm;
%K analysis;
%K background-foreground
%K displacement
%K estimation;
%K image
%K information;
%K INTENSITY
%K modeling;
%K MOTION
%K processes;
%K processing;
%K resolution;
%K sequences;
%K signal
%K Stochastic
%K Surveillance
%K surveillance;
%K tracking
%K tracking;
%K video
%X We present a stochastic tracking algorithm for surveillance video where targets are dim and at low resolution. The algorithm builds motion models for both background and foreground by integrating motion and intensity information. Other merits of the algorithm include adaptive selection of feature points for scene description and the definition of proper cost functions for displacement estimation. The experimental results show tracking robustness and precision on challenging video sequences.
%B Image Processing, 2004. ICIP '04. 2004 International Conference on
%V 2
%P 1053 - 1056 Vol.2
%8 2004/10//
%G eng
%R 10.1109/ICIP.2004.1419483
%0 Conference Paper
%B Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on
%D 2004
%T A system identification approach for video-based face recognition
%A Aggarwal,G.
%A Chowdhury, A.K.R.
%A Chellappa, Rama
%K and
%K autoregressive
%K average
%K dynamical
%K Face
%K gallery
%K identification;
%K image
%K linear
%K model;
%K moving
%K processes;
%K processing;
%K recognition;
%K sequences;
%K signal
%K system
%K system;
%K video
%K video-based
%X The paper poses video-to-video face recognition as a dynamical system identification and classification problem. We model a moving face as a linear dynamical system whose appearance changes with pose. An autoregressive and moving average (ARMA) model is used to represent such a system. The choice of the ARMA model is based on its ability to accommodate changes in appearance while modeling the dynamics of pose, expression, etc. Recognition is performed using the concept of subspace angles to compute distances between probe and gallery video sequences. The results obtained are very promising given the extent of pose, expression, and illumination variation in the video data used for experiments.
%B Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on
%V 4
%P 175 - 178 Vol.4
%8 2004/08//
%G eng
%R 10.1109/ICPR.2004.1333732
%0 Conference Paper
%B Image Processing, 2004. ICIP '04. 2004 International Conference on
%D 2004
%T Uncalibrated stereo rectification for automatic 3D surveillance
%A Lim,S.-N.
%A Mittal,A.
%A Davis, Larry S.
%A Paragios,N.
%K 3D
%K AUTOMATIC
%K conjugate
%K epipolar
%K image
%K lines;
%K matching;
%K method;
%K processing;
%K rectification
%K scene;
%K stereo
%K surveillance;
%K uncalibrated
%K urban
%X We describe a stereo rectification method suitable for automatic 3D surveillance. We take advantage of the fact that in a typical urban scene, there is ordinarily a small number of dominant planes. Given two views of the scene, we align a dominant plane in one view with the other. Conjugate epipolar lines between the reference view and plane-aligned image become geometrically identical and can be added to the rectified image pair line by line. Selecting conjugate epipolar lines to cover the whole image is simplified since they are geometrically identical. In addition, the polarities of conjugate epipolar lines are automatically preserved by plane alignment, which simplifies stereo matching.
%B Image Processing, 2004. ICIP '04. 2004 International Conference on
%V 2
%P 1357 - 1360 Vol.2
%8 2004/10//
%G eng
%R 10.1109/ICIP.2004.1419753
%0 Conference Paper
%B Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on
%D 2004
%T View independent human body pose estimation from a single perspective image
%A Parameswaran, V.
%A Chellappa, Rama
%K 3D
%K analysis;
%K biomechanics;
%K body
%K body-centric
%K camera;
%K capture
%K coordinate
%K coordinates;
%K detection;
%K epipolar
%K equation
%K estimation;
%K geometry;
%K human
%K image
%K image;
%K images;
%K model-based
%K models;
%K MOTION
%K object
%K optical
%K perspective
%K physiological
%K polynomial
%K polynomials;
%K pose
%K real
%K single
%K synthetic
%K system;
%K systems;
%K torso
%K tracking;
%K twist;
%K uncalibrated
%X Recovering the 3D coordinates of various joints of the human body from an image is a critical first step for several model-based human tracking and optical motion capture systems. Unlike previous approaches that have used a restrictive camera model or assumed a calibrated camera, our work deals with the general case of a perspective uncalibrated camera and is thus well suited for archived video. The input to the system is an image of the human body and correspondences of several body landmarks, while the output is the set of 3D coordinates of the landmarks in a body-centric coordinate system. Using ideas from 3D model-based invariants, we set up a polynomial system of equations in the unknown head pitch, yaw, and roll angles. If we make the often-valid assumption that the torso twist is small, there is a finite number of solutions for the head orientation that can be computed readily. Once the head orientation is computed, the epipolar geometry of the camera is recovered, leading to solutions for the 3D joint positions. Results are presented on synthetic and real images.
%B Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on
%V 2
%P II-16 - II-22 Vol.2
%8 2004/07/02/june
%G eng
%R 10.1109/CVPR.2004.1315139
%0 Journal Article
%J Multimedia, IEEE Transactions on
%D 2004
%T Wide baseline image registration with application to 3-D face modeling
%A Roy-Chowdhury, A.K.
%A Chellappa, Rama
%A Keaton, T.
%K 2D
%K 3D
%K algorithm;
%K baseline
%K biometrics;
%K Computer
%K configuration;
%K correspondence
%K doubly
%K error
%K extraction;
%K Face
%K feature
%K holistic
%K image
%K matching;
%K matrix;
%K minimization;
%K modeling;
%K models;
%K normalization
%K probability
%K probability;
%K procedure;
%K processes;
%K processing;
%K recognition;
%K registration;
%K representation;
%K sequences;
%K shapes;
%K Sinkhorn
%K spatial
%K statistics;
%K Stochastic
%K video
%K vision;
%K wide
%X Establishing correspondence between features in two images of the same scene taken from different viewing angles is a challenging problem in image processing and computer vision. However, its solution is an important step in many applications like wide baseline stereo, three-dimensional (3-D) model alignment, creation of panoramic views, etc. In this paper, we propose a technique for registration of two images of a face obtained from different viewing angles. We show that prior information about the general characteristics of a face obtained from video sequences of different faces can be used to design a robust correspondence algorithm. The method works by matching two-dimensional (2-D) shapes of the different features of the face (e.g., eyes, nose etc.). A doubly stochastic matrix, representing the probability of match between the features, is derived using the Sinkhorn normalization procedure. The final correspondence is obtained by minimizing the probability of error of a match between the entire constellation of features in the two sets, thus taking into account the global spatial configuration of the features. The method is applied for creating holistic 3-D models of a face from partial representations. Although this paper focuses primarily on faces, the algorithm can also be used for other objects with small modifications.
%B Multimedia, IEEE Transactions on
%V 6
%P 423 - 434
%8 2004/06//
%@ 1520-9210
%G eng
%N 3
%R 10.1109/TMM.2004.827511
%0 Conference Paper
%B Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on
%D 2004
%T Window-based, discontinuity preserving stereo
%A Agrawal,M.
%A Davis, Larry S.
%K algorithm;
%K approach;
%K based
%K cuts;
%K dense
%K discontinuity
%K global
%K graph
%K image
%K local
%K MATCHING
%K matching;
%K minimisation;
%K optimization;
%K Pixel
%K preserving
%K processing;
%K stereo
%K theory;
%K window
%X Traditionally, the problem of stereo matching has been addressed either by a local window-based approach or a dense pixel-based approach using global optimization. In this paper we present an algorithm which combines window-based local matching into a global optimization framework. Our local matching algorithm assumes that local windows can have at most two disparities. Under this assumption, the local matching can be performed very efficiently using graph cuts. The global matching is formulated as minimization of an energy term that takes into account the matching constraints induced by the local stereo algorithm. Fast, approximate minimization of this energy is achieved through graph cuts. The key feature of our algorithm is that it preserves discontinuities both during the local as well as global matching phase.
%B Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on
%V 1
%P I-66 - I-73 Vol.1
%8 2004/07/02/june
%G eng
%R 10.1109/CVPR.2004.1315015
%0 Journal Article
%J Image Processing, IEEE Transactions on
%D 2003
%T Accurate dense optical flow estimation using adaptive structure tensors and a parametric model
%A Liu,Haiying
%A Chellappa, Rama
%A Rosenfeld, A.
%K 3D
%K accuracy;
%K adaptive
%K and
%K coherent
%K confidence
%K dense
%K eigenfunctions;
%K eigenvalue
%K eigenvalues
%K eigenvectors;
%K estimation;
%K flow
%K generalized
%K ground
%K image
%K measure;
%K model;
%K MOTION
%K optical
%K parameter
%K parametric
%K problem;
%K real
%K region;
%K sequences;
%K structure
%K synthetic
%K tensor;
%K tensors;
%K three-dimensional
%K truth;
%X An accurate optical flow estimation algorithm is proposed in this paper. By combining the three-dimensional (3D) structure tensor with a parametric flow model, the optical flow estimation problem is converted to a generalized eigenvalue problem. The optical flow can be accurately estimated from the generalized eigenvectors. The confidence measure derived from the generalized eigenvalues is used to adaptively adjust the coherent motion region to further improve the accuracy. Experiments using both synthetic sequences with ground truth and real sequences illustrate our method. Comparisons with classical and recently published methods are also given to demonstrate the accuracy of our algorithm.
%B Image Processing, IEEE Transactions on
%V 12
%P 1170 - 1180
%8 2003/10//
%@ 1057-7149
%G eng
%N 10
%R 10.1109/TIP.2003.815296
%0 Conference Paper
%B Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on
%D 2003
%T Activity recognition using the dynamics of the configuration of interacting objects
%A Vaswani, N.
%A RoyChowdhury, A.
%A Chellappa, Rama
%K 2D
%K abnormal
%K abnormality
%K abnormality;
%K acoustic
%K activity
%K analysis;
%K change;
%K Computer
%K configuration
%K configuration;
%K data;
%K DETECTION
%K detection;
%K distribution;
%K drastic
%K dynamics;
%K event;
%K filter;
%K hand-picked
%K image
%K infrared
%K interacting
%K learning;
%K location
%K low
%K mean
%K model;
%K monitoring;
%K MOTION
%K moving
%K noise;
%K noisy
%K object
%K object;
%K observation
%K observation;
%K particle
%K pattern
%K plane;
%K point
%K polygonal
%K probability
%K probability;
%K problem;
%K processing;
%K radar
%K recognition;
%K resolution
%K sensor;
%K sensors;
%K sequence;
%K SHAPE
%K shape;
%K signal
%K slow
%K statistic;
%K strategy;
%K Surveillance
%K surveillance;
%K target
%K test
%K tracking;
%K video
%K video;
%K visible
%K vision;
%X Monitoring activities using video data is an important surveillance problem. A special scenario is to learn the pattern of normal activities and detect abnormal events from a very low resolution video where the moving objects are small enough to be modeled as point objects in a 2D plane. Instead of tracking each point separately, we propose to model an activity by the polygonal 'shape' of the configuration of these point masses at any time t, and its deformation over time. We learn the mean shape and the dynamics of the shape change using hand-picked location data (no observation noise) and define an abnormality detection statistic for the simple case of a test sequence with negligible observation noise. For the more practical case where observation (point locations) noise is large and cannot be ignored, we use a particle filter to estimate the probability distribution of the shape given the noisy observations up to the current time. Abnormality detection in this case is formulated as a change detection problem. We propose a detection strategy that can detect both 'drastic' and 'slow' abnormalities. Our framework can be directly applied for object location data obtained using any type of sensors - visible, radar, infrared or acoustic.
%B Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on
%V 2
%P II - 633-40 vol.2
%8 2003/06//
%G eng
%R 10.1109/CVPR.2003.1211526
%0 Conference Paper
%B Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on
%D 2003
%T Adaptive visual tracking and recognition using particle filters
%A Zhou,Shaohua
%A Chellappa, Rama
%A Moghaddam, B.
%K adaptive
%K adaptive-velocity
%K appearance
%K extra-personal
%K Filtering
%K filters;
%K image
%K intra-personal
%K model;
%K MOTION
%K particle
%K processing;
%K recognition;
%K sequence;
%K sequences;
%K series
%K signal
%K spaces;
%K theory;
%K TIME
%K tracking;
%K video
%K visual
%X This paper presents an improved method for simultaneous tracking and recognition of human faces from video, where a time series model is used to resolve the uncertainties in tracking and recognition. The improvements mainly arise from three aspects: (i) modeling the inter-frame appearance changes within the video sequence using an adaptive appearance model and an adaptive-velocity motion model; (ii) modeling the appearance changes between the video frames and gallery images by constructing intra- and extra-personal spaces; and (iii) utilization of the fact that the gallery images are in frontal views. By embedding them in a particle filter, we are able to achieve a stabilized tracker and an accurate recognizer when confronted by pose and illumination variations.
%B Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on
%V 2
%P II - 349-52 vol.2
%8 2003/07//
%G eng
%R 10.1109/ICME.2003.1221625
%0 Journal Article
%J Signal Processing, IEEE Transactions on
%D 2003
%T Anti-collusion fingerprinting for multimedia
%A Trappe,W.
%A M. Wu
%A Wang,Z.J.
%A Liu,K. J.R
%K (mathematics);
%K additive
%K algorithm;
%K and
%K anti-collusion
%K attack;
%K averaging
%K binary
%K code
%K codes;
%K codevectors;
%K coding;
%K colluders
%K collusion;
%K combinatorial
%K communication;
%K compression;
%K correlation;
%K cost-effective
%K data
%K data;
%K design
%K DETECTION
%K detection;
%K digital
%K embedding;
%K fingerprinting;
%K Gaussian
%K identification;
%K image
%K images;
%K keying;
%K logical
%K mathematics;
%K Modulation
%K modulation;
%K multimedia
%K multimedia;
%K of
%K on-off
%K operation;
%K orthogonal
%K processes;
%K real
%K redistribution;
%K Security
%K signal
%K signals;
%K theory;
%K tree-structured
%K TREES
%K watermarking;
%X Digital fingerprinting is a technique for identifying users who use multimedia content for unintended purposes, such as redistribution. These fingerprints are typically embedded into the content using watermarking techniques that are designed to be robust to a variety of attacks. A cost-effective attack against such digital fingerprints is collusion, where several differently marked copies of the same content are combined to disrupt the underlying fingerprints. We investigate the problem of designing fingerprints that can withstand collusion and allow for the identification of colluders. We begin by introducing the collusion problem for additive embedding. We then study the effect that averaging collusion has on orthogonal modulation. We introduce a tree-structured detection algorithm for identifying the fingerprints associated with K colluders that requires O(K log(n/K)) correlations for a group of n users. We next develop a fingerprinting scheme based on code modulation that does not require as many basis signals as orthogonal modulation. We propose a new class of codes, called anti-collusion codes (ACCs), which have the property that the composition of any subset of K or fewer codevectors is unique. Using this property, we can therefore identify groups of K or fewer colluders. We present a construction of binary-valued ACC under the logical AND operation that uses the theory of combinatorial designs and is suitable for both the on-off keying and antipodal form of binary code modulation. In order to accommodate n users, our code construction requires only O(√n) orthogonal signals for a given number of colluders. We introduce three different detection strategies that can be used with our ACC for identifying a suspect set of colluders. We demonstrate the performance of our ACC for fingerprinting multimedia and identifying colluders through experiments using Gaussian signals and real images.
%B Signal Processing, IEEE Transactions on
%V 51
%P 1069 - 1087
%8 2003/04//
%@ 1053-587X
%G eng
%N 4
%R 10.1109/TSP.2003.809378
%0 Conference Paper
%B Image Processing, 2003. ICIP 2003. Proceedings. 2003 International Conference on
%D 2003
%T An appearance based approach for human and object tracking
%A Capellades,M. B
%A David Doermann
%A DeMenthon,D.
%A Chellappa, Rama
%K algorithm;
%K analysis;
%K background
%K basis;
%K by
%K Color
%K colour
%K correlogram
%K detection;
%K distributions;
%K frame
%K histogram
%K human
%K image
%K information;
%K object
%K processing;
%K segmentation;
%K sequences;
%K signal
%K subtraction
%K tracking;
%K video
%X A system for tracking humans and detecting human-object interactions in indoor environments is described. A combination of correlogram and histogram information is used to model object and human color distributions. Humans and objects are detected using a background subtraction algorithm. The models are built on the fly and used to track them on a frame by frame basis. The system is able to detect when people merge into groups and segment them during occlusion. Identities are preserved during the sequence, even if a person enters and leaves the scene. The system is also able to detect when a person deposits or removes an object from the scene. In the first case the models are used to track the object retroactively in time. In the second case the objects are tracked for the rest of the sequence. Experimental results using indoor video sequences are presented.
%B Image Processing, 2003. ICIP 2003. Proceedings. 2003 International Conference on
%V 2
%P II - 85-8 vol.3
%8 2003/09//
%G eng
%R 10.1109/ICIP.2003.1246622
%0 Conference Paper
%B Image Processing, 2003. ICIP 2003. Proceedings. 2003 International Conference on
%D 2003
%T Classification-based spatial error concealment for images
%A Chen,M.
%A Wu,M.
%A Zheng,Y.
%K algorithm;
%K classification-based
%K classification;
%K classifier;
%K complexity;
%K computational
%K concealment
%K error
%K image
%K Machine;
%K machines;
%K moderate
%K scheme;
%K spatial
%K state-of-the-art
%K Support
%K SVM
%K vector
%X This paper presents a new, classification-based spatial error concealment algorithm for images. The proposed scheme takes advantage of two state-of-the-art concealment schemes and adaptively selects a better suitable concealment scheme for each corrupted block. Using a support vector machine (SVM) classifier, our proposed approach outperforms the prior art in terms of the concealment quality and has moderate computational complexity.
%B Image Processing, 2003. ICIP 2003. Proceedings. 2003 International Conference on
%V 2
%P II - 675-8 vol.3
%8 2003/09//
%G eng
%R 10.1109/ICIP.2003.1246770
%0 Conference Paper
%B Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
%D 2003
%T Combining multiple evidences for gait recognition
%A Cuntoor, N.
%A Kale, A.
%A Chellappa, Rama
%K analysis;
%K dynamic
%K evidences;
%K extraction;
%K feature
%K features;
%K frontal
%K Gait
%K height;
%K human
%K identification;
%K image
%K MIN
%K multiple
%K nonprobabilistic
%K probabilistic
%K Product
%K recognition;
%K rules;
%K sets;
%K side
%K static
%K Sum
%K sway;
%K swing;
%K techniques;
%K views;
%X In this paper, we systematically analyze different components of human gait, for the purpose of human identification. We investigate dynamic features such as the swing of the hands/legs, the sway of the upper body and static features like height, in both frontal and side views. Both probabilistic and non-probabilistic techniques are used for matching the features. Various combination strategies may be used depending upon the gait features being combined. We discuss three simple rules: the Sum, Product and MIN rules that are relevant to our feature sets. Experiments using four different datasets demonstrate that fusion can be used as an effective strategy in recognition.
%B Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
%V 3
%P III - 33-6 vol.3
%8 2003/04//
%G eng
%R 10.1109/ICASSP.2003.1199100
%0 Journal Article
%J Image Processing, IEEE Transactions on
%D 2003
%T Data hiding in image and video .I. Fundamental issues and solutions
%A M. Wu
%A Liu,Bede
%K adaptive
%K analysis;
%K bits;
%K colour
%K condition;
%K constant
%K CONTROL
%K data
%K embedded
%K EMBEDDING
%K embedding;
%K encapsulation;
%K extractable
%K hiding;
%K image
%K Modulation
%K modulation;
%K multilevel
%K multiplexing
%K multiplexing;
%K NOISE
%K nonstationary
%K processing;
%K rate;
%K reviews;
%K shuffling;
%K signal
%K signals;
%K simulation;
%K solution;
%K techniques;
%K variable
%K video
%K visual
%X We address a number of fundamental issues of data hiding in image and video and propose general solutions to them. We begin with a review of two major types of embedding, based on which we propose a new multilevel embedding framework to allow the amount of extractable data to be adaptive according to the actual noise condition. We then study the issues of hiding multiple bits through a comparison of various modulation and multiplexing techniques. Finally, the nonstationary nature of visual signals leads to highly uneven distribution of embedding capacity and causes difficulty in data hiding. We propose an adaptive solution switching between using constant embedding rate with shuffling and using variable embedding rate with embedded control bits. We verify the effectiveness of our proposed solutions through analysis and simulation.
%B Image Processing, IEEE Transactions on
%V 12
%P 685 - 695
%8 2003/06//
%@ 1057-7149
%G eng
%N 6
%R 10.1109/TIP.2003.810588
%0 Journal Article
%J Image Processing, IEEE Transactions on
%D 2003
%T Data hiding in image and video .II. Designs and applications
%A M. Wu
%A Yu,H.
%A Liu,Bede
%K access
%K annotation;
%K authentication;
%K capacity;
%K conditions;
%K content-based
%K CONTROL
%K control;
%K copy
%K data
%K distortions;
%K EMBEDDING
%K embedding;
%K encapsulation;
%K extraction;
%K frame
%K hiding;
%K image
%K information;
%K jitter;
%K message
%K multilevel
%K NOISE
%K noise;
%K payload
%K processing;
%K robust
%K signal
%K uneven
%K user
%K video
%X For pt. I see ibid., vol.12, no.6, p.685-95 (2003). This paper applies the solutions to the fundamental issues addressed in Part I to specific design problems of embedding data in image and video. We apply multilevel embedding to allow the amount of embedded information that can be reliably extracted to be adaptive with respect to the actual noise conditions. When extending the multilevel embedding to video, we propose strategies for handling uneven embedding capacity from region to region within a frame as well as from frame to frame. We also embed control information to facilitate the accurate extraction of the user data payload and to combat such distortions as frame jitter. The proposed algorithm can be used for a variety of applications such as copy control, access control, robust annotation, and content-based authentication.
%B Image Processing, IEEE Transactions on
%V 12
%P 696 - 705
%8 2003/06//
%@ 1057-7149
%G eng
%N 6
%R 10.1109/TIP.2003.810589
%0 Journal Article
%J Pattern Analysis and Machine Intelligence, IEEE Transactions on
%D 2003
%T Efficient kernel density estimation using the fast gauss transform with applications to color modeling and tracking
%A Elgammal,A.
%A Duraiswami, Ramani
%A Davis, Larry S.
%K algorithms;
%K Color
%K Computer
%K density
%K estimation;
%K fast
%K function;
%K Gauss
%K image
%K Kernel
%K modeling;
%K segmentation;
%K tracking;
%K transform;
%K transforms;
%K VISION
%K vision;
%X Many vision algorithms depend on the estimation of a probability density function from observations. Kernel density estimation techniques are quite general and powerful methods for this problem, but have a significant disadvantage in that they are computationally intensive. In this paper, we explore the use of kernel density estimation with the fast Gauss transform (FGT) for problems in vision. The FGT allows the summation of a mixture of M Gaussians at N evaluation points in O(M+N) time, as opposed to O(MN) time for a naive evaluation, and can be used to considerably speed up kernel density estimation. We present applications of the technique to problems from image segmentation and tracking and show that the algorithm allows application of advanced statistical techniques to solve practical vision problems in real-time with today's computers.
%B Pattern Analysis and Machine Intelligence, IEEE Transactions on
%V 25
%P 1499 - 1504
%8 2003/11//
%@ 0162-8828
%G eng
%N 11
%R 10.1109/TPAMI.2003.1240123
%0 Conference Paper
%B Image and Signal Processing and Analysis, 2003. ISPA 2003. Proceedings of the 3rd International Symposium on
%D 2003
%T Exemplar-based tracking and recognition of arm gestures
%A Elgammal,A.
%A Shet,V.
%A Yacoob,Yaser
%A Davis, Larry S.
%K arm
%K constrains;
%K correspondence-free
%K edge
%K exemplar-based
%K framework;
%K gesture
%K hidden
%K HMM;
%K image
%K logic;
%K Markov
%K MATCHING
%K matching;
%K model;
%K models;
%K probabilistic
%K recognition;
%K scheme;
%K segmentation;
%K temporal
%K tracking;
%K weighted
%X This paper presents a probabilistic exemplar-based framework for recognizing gestures. The approach is based on representing each gesture as a sequence of learned body poses. The gestures are recognized through a probabilistic framework for matching these body poses and for imposing temporal constraints between different poses. Matching individual poses to image data is performed using a probabilistic formulation for edge matching to obtain a likelihood measurement for each individual pose. The paper introduces a correspondence-free weighted matching scheme for edge templates that emphasizes discriminating features in the matching. The weighting does not require establishing correspondences between the different pose models. The probabilistic framework also imposes temporal constraints between different poses through a learned hidden Markov model (HMM) of each gesture.
%B Image and Signal Processing and Analysis, 2003. ISPA 2003. Proceedings of the 3rd International Symposium on
%V 2
%P 656 - 661 Vol.2
%8 2003/09//
%G eng
%R 10.1109/ISPA.2003.1296358
%0 Conference Paper
%B Image Processing, 2003. ICIP 2003. Proceedings. 2003 International Conference on
%D 2003
%T A hidden Markov model based framework for recognition of humans from gait sequences
%A Sundaresan,Aravind
%A RoyChowdhury,Amit
%A Chellappa, Rama
%K analysis;
%K background-subtracted
%K binarized
%K discrete
%K distance
%K feature
%K Gait
%K hidden
%K human
%K image
%K image;
%K Markov
%K metrics;
%K model;
%K models;
%K postures;
%K recognition;
%K sequences;
%K vector;
%X In this paper we propose a generic framework based on hidden Markov models (HMMs) for recognition of individuals from their gait. The HMM framework is suitable because the gait of an individual can be visualized as his adopting postures from a set, in a sequence which has an underlying structured probabilistic nature. The postures that the individual adopts can be regarded as the states of the HMM; they are typical to that individual and provide a means of discrimination. The framework assumes that, during gait, the individual transitions between N discrete postures or states, but it is not dependent on the particular feature vector used to represent the gait information contained in the postures. The framework, thus, provides flexibility in the selection of the feature vector. The statistical nature of the HMM lends robustness to the model. In this paper we use the binarized background-subtracted image as the feature vector and use different distance metrics, such as those based on the L_{1} and L_{2} norms of the vector difference, and the normalized inner product of the vectors, to measure the similarity between feature vectors. The results we obtain are better than the baseline recognition rates reported before.
%B Image Processing, 2003. ICIP 2003. Proceedings. 2003 International Conference on
%V 2
%P II - 93-6 vol.3
%8 2003/09//
%G eng
%R 10.1109/ICIP.2003.1246624
%0 Conference Paper
%B Proceedings. IEEE Conference on Advanced Video and Signal Based Surveillance, 2003.
%D 2003
%T Human body pose estimation using silhouette shape analysis
%A Mittal,A.
%A Liang Zhao
%A Davis, Larry S.
%K 3D
%K analysis;
%K body
%K classification;
%K clutter;
%K detection;
%K estimation;
%K extraction;
%K feature
%K function;
%K human
%K image
%K likelihood
%K multiple
%K object
%K parameter
%K parameters;
%K Pixel
%K pose
%K probability;
%K segmentation;
%K segmentations;
%K SHAPE
%K silhouette
%K structure;
%K surveillance;
%K views;
%X We describe a system for human body pose estimation from multiple views that is fast and completely automatic. The algorithm works in the presence of multiple people by decoupling the problems of pose estimation of different people. The pose is estimated based on a likelihood function that integrates information from multiple views and thus obtains a globally optimal solution. Other characteristics that make our method more general than previous work include: (1) no manual initialization; (2) no specification of the dimensions of the 3D structure; (3) no reliance on some learned poses or patterns of activity; (4) insensitivity to edges and clutter in the background and within the foreground. The algorithm has applications in surveillance and promising results have been obtained.
%B Proceedings. IEEE Conference on Advanced Video and Signal Based Surveillance, 2003.
%P 263 - 270
%8 2003/07//
%G eng
%R 10.1109/AVSS.2003.1217930
%0 Conference Paper
%B Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on
%D 2003
%T Image-based pan-tilt camera control in a multi-camera surveillance environment
%A Lim,Ser-Nam
%A Elgammal,A.
%A Davis, Larry S.
%K automated
%K camera
%K cameras;
%K control;
%K detection;
%K environment;
%K image
%K image-based
%K information;
%K multicamera
%K object
%K pan-tilt
%K position;
%K processing;
%K sensors;
%K Surveillance
%K surveillance;
%K systems;
%K vantage
%K zero-position;
%K zero-positions;
%X In automated surveillance systems with multiple cameras, the system must be able to position the cameras accurately. Each camera must be able to pan-tilt such that an object detected in the scene is in a vantage position in the camera's image plane and subsequently capture images of that object. Typically, camera calibration is required. We propose an approach that uses only image-based information. Each camera is assigned a pan-tilt zero-position. Position of an object detected in one camera is related to the other cameras by homographies between the zero-positions while different pan-tilt positions of the same camera are related in the form of projective rotations. We then derive that the trajectories in the image plane corresponding to these projective rotations are approximately circular for pan and linear for tilt. The camera control technique is subsequently tested in a working prototype.
%B Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on
%V 1
%P I - 645-8 vol.1
%8 2003/07//
%G eng
%R 10.1109/ICME.2003.1221000
%0 Journal Article
%J Pattern Analysis and Machine Intelligence, IEEE Transactions on
%D 2003
%T Lambertian reflectance and linear subspaces
%A Basri,R.
%A Jacobs, David W.
%K 2D
%K 4D
%K 9D
%K analog;
%K analytic
%K characterization;
%K convex
%K convolution
%K distant
%K functions;
%K harmonics;
%K image
%K image;
%K intensities;
%K Lambertian
%K light
%K lighting
%K linear
%K methods;
%K nonnegative
%K normals;
%K object
%K optimization;
%K programming;
%K query
%K recognition;
%K reflectance;
%K reflectivity;
%K set;
%K sources;
%K space;
%K spherical
%K subspace;
%K subspaces;
%K surface
%X We prove that the set of all Lambertian reflectance functions (the mapping from surface normals to intensities) obtained with arbitrary distant light sources lies close to a 9D linear subspace. This implies that, in general, the set of images of a convex Lambertian object obtained under a wide variety of lighting conditions can be approximated accurately by a low-dimensional linear subspace, explaining prior empirical results. We also provide a simple analytic characterization of this linear space. We obtain these results by representing lighting using spherical harmonics and describing the effects of Lambertian materials as the analog of a convolution. These results allow us to construct algorithms for object recognition based on linear methods as well as algorithms that use convex optimization to enforce nonnegative lighting functions. We also show a simple way to enforce nonnegative lighting when the images of an object lie near a 4D linear space. We apply these algorithms to perform face recognition by finding the 3D model that best matches a 2D query image.
%B Pattern Analysis and Machine Intelligence, IEEE Transactions on
%V 25
%P 218 - 233
%8 2003/02//
%@ 0162-8828
%G eng
%N 2
%R 10.1109/TPAMI.2003.1177153
%0 Conference Paper
%B Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on
%D 2003
%T Learning dynamics for exemplar-based gesture recognition
%A Elgammal,A.
%A Shet,V.
%A Yacoob,Yaser
%A Davis, Larry S.
%K arbitrary
%K body
%K by
%K Computer
%K constraint;
%K detection;
%K discrete
%K distribution
%K dynamics;
%K edge
%K estimation;
%K example;
%K exemplar
%K exemplar-based
%K extraction;
%K feature
%K framework;
%K gesture
%K gesture;
%K hidden
%K HMM;
%K human
%K image
%K learning
%K Markov
%K matching;
%K model;
%K models;
%K motion;
%K nonparametric
%K pose
%K probabilistic
%K recognition;
%K sequence;
%K space;
%K state;
%K statistics;
%K system
%K temporal
%K tool;
%K view-based
%K vision;
%X This paper addresses the problem of capturing the dynamics for exemplar-based recognition systems. Traditional HMMs provide a probabilistic tool to capture system dynamics, and in the exemplar paradigm, HMM states are typically coupled with the exemplars. Alternatively, we propose a non-parametric HMM approach that uses a discrete HMM with arbitrary states (decoupled from exemplars) to capture the dynamics over a large exemplar space, where a nonparametric estimation approach is used to model the exemplar distribution. This reduces the need for lengthy and non-optimal training of the HMM observation model. We used the proposed approach for view-based recognition of gestures. The approach is based on representing each gesture as a sequence of learned body poses (exemplars). The gestures are recognized through a probabilistic framework for matching these body poses and for imposing temporal constraints between different poses using the proposed non-parametric HMM.
%B Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on
%V 1
%P I-571 - I-578 vol.1
%8 2003/06//
%G eng
%R 10.1109/CVPR.2003.1211405
%0 Conference Paper
%B Image Processing, 2003. ICIP 2003. Proceedings. 2003 International Conference on
%D 2003
%T Mean-shift analysis using quasiNewton methods
%A Yang,C.
%A Duraiswami, Ramani
%A DeMenthon,D.
%A Davis, Larry S.
%K analysis;
%K classification;
%K clustering
%K clustering;
%K Convergence
%K density
%K estimation;
%K feature-space
%K image
%K irregular
%K iterative
%K Mean-shift
%K method;
%K methods;
%K Newton
%K nonparametric
%K pattern
%K procedure;
%K quasiNewton
%K rates;
%K segmentation;
%K technique;
%K topography;
%X Mean-shift analysis is a general nonparametric clustering technique based on density estimation for the analysis of complex feature spaces. The algorithm consists of a simple iterative procedure that shifts each of the feature points to the nearest stationary point along the gradient directions of the estimated density function. It has been successfully applied to many applications such as segmentation and tracking. However, despite its promising performance, there are applications for which the algorithm converges too slowly to be practical. We propose and implement an improved version of the mean-shift algorithm using quasiNewton methods to achieve higher convergence rates. Another benefit of our algorithm is its ability to achieve clustering even for very complex and irregular feature-space topography. Experimental results demonstrate the efficiency and effectiveness of our algorithm.
%B Image Processing, 2003. ICIP 2003. Proceedings. 2003 International Conference on
%V 2
%P II - 447-50 vol.3
%8 2003/09//
%G eng
%R 10.1109/ICIP.2003.1246713
%0 Conference Paper
%B Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on
%D 2003
%T Probabilistic tracking in joint feature-spatial spaces
%A Elgammal,A.
%A Duraiswami, Ramani
%A Davis, Larry S.
%K analysis;
%K appearance
%K appearance;
%K color;
%K colour
%K Computer
%K constraint;
%K deformation;
%K detection;
%K distribution;
%K edge
%K estimation;
%K extraction;
%K feature
%K feature-spatial
%K feature;
%K function
%K gradient;
%K image
%K intensity;
%K joint
%K likelihood
%K local
%K maximization;
%K maximum
%K nonparametric
%K object
%K objective
%K occlusion;
%K optical
%K partial
%K probabilistic
%K probability;
%K region
%K representation;
%K row
%K similarity-based
%K small
%K space;
%K structure;
%K target
%K tracker;
%K tracking;
%K transformation
%K vision;
%X In this paper, we present a probabilistic framework for tracking regions based on their appearance. We exploit the feature-spatial distribution of a region representing an object as a probabilistic constraint to track that region over time. The tracking is achieved by maximizing a similarity-based objective function over transformation space given a nonparametric representation of the joint feature-spatial distribution. Such a representation imposes a probabilistic constraint on the region feature distribution coupled with the region structure, which yields an appearance tracker that is robust to small local deformations and partial occlusion. We present the approach for the general form of joint feature-spatial distributions and apply it to tracking with different types of image features including raw intensity, color and image gradient.
%B Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on
%V 1
%P I-781 - I-788 vol.1
%8 2003/06//
%G eng
%R 10.1109/CVPR.2003.1211432
%0 Conference Paper
%B Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on
%D 2003
%T Shape and motion driven particle filtering for human body tracking
%A Yamamoto, T.
%A Chellapa, Rama
%K 3D
%K body
%K broadcast
%K camera;
%K cameras;
%K estimation;
%K Filtering
%K framework;
%K human
%K image
%K MOTION
%K motion;
%K particle
%K processing;
%K rotational
%K sequence;
%K sequences;
%K signal
%K single
%K static
%K theory;
%K tracking;
%K TV
%K video
%X In this paper, we propose a method to recover 3D human body motion from a video acquired by a single static camera. In order to estimate the complex state distribution of a human body, we adopt the particle filtering framework. We represent the human body using several layers of representation and compose the whole body step by step. In this way, more effective particles are generated and ineffective particles are removed as we process each layer. In order to deal with the rotational motion, the frequency of rotation is obtained using a preprocessing operation. In the preprocessing step, the variance of the motion field at each image is computed, and the frequency of rotation is estimated. The estimated frequency is used for the state update in the algorithm. We successfully track the movement of figure skaters in a TV broadcast image sequence, and recover the 3D shape and motion of the skater.
%B Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on
%V 3
%P III - 61-4 vol.3
%8 2003/07//
%G eng
%R 10.1109/ICME.2003.1221248
%0 Conference Paper
%B Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on
%D 2003
%T Simultaneous pose and correspondence determination using line features
%A David,P.
%A DeMenthon,D.
%A Duraiswami, Ramani
%A Samet, Hanan
%K algorithm;
%K algorithms;
%K annealing;
%K clutter;
%K cluttered
%K Computer
%K correspondence
%K detection;
%K determination;
%K deterministic
%K environment;
%K extraction;
%K feature
%K feature;
%K image
%K image;
%K imagery;
%K images;
%K joint
%K line
%K local
%K man-made
%K MATCHING
%K matching;
%K measurement;
%K model-to-image
%K noise;
%K occlusion;
%K optimum;
%K perspective
%K point
%K pose
%K position
%K problem;
%K processing;
%K real
%K realistic
%K registration
%K simulated
%K softassign;
%K SoftPOSIT
%K stereo
%K synthetic
%K vision;
%X We present a new robust line matching algorithm for solving the model-to-image registration problem. Given a model consisting of 3D lines and a cluttered perspective image of this model, the algorithm simultaneously estimates the pose of the model and the correspondences of model lines to image lines. The algorithm combines softassign for determining correspondences and POSIT for determining pose. Integrating these algorithms into a deterministic annealing procedure allows the correspondence and pose to evolve from initially uncertain values to a joint local optimum. This research extends to line features the SoftPOSIT algorithm proposed recently for point features. Lines detected in images are typically more stable than points and are less likely to be produced by clutter and noise, especially in man-made environments. Experiments on synthetic and real imagery with high levels of clutter, occlusion, and noise demonstrate the robustness of the algorithm.
%B Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on
%V 2
%P II-424 - II-431 vol.2
%8 2003/06//
%G eng
%R 10.1109/CVPR.2003.1211499
%0 Conference Paper
%B Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
%D 2003
%T Statistical shape theory for activity modeling
%A Vaswani, N.
%A Chowdhury, A.R.
%A Chellapa, Rama
%K abnormal
%K activities
%K activity
%K analysis;
%K behavior;
%K classification;
%K data;
%K image
%K mass;
%K matching;
%K modeling;
%K monitoring;
%K moving
%K normal
%K particle;
%K pattern
%K pattern;
%K point
%K polygonal
%K probability;
%K problem;
%K processing;
%K sequence;
%K sequences;
%K SHAPE
%K shape;
%K signal
%K statistical
%K Surveillance
%K surveillance;
%K theory;
%K video
%X Monitoring activities in a certain region from video data is an important surveillance problem. The goal is to learn the pattern of normal activities and detect unusual ones by identifying activities that deviate appreciably from the typical ones. We propose an approach using statistical shape theory based on the shape model of D.G. Kendall et al. (see "Shape and Shape Theory", John Wiley and Sons, 1999). In a low resolution video, each moving object is best represented as a moving point mass or particle. In this case, an activity can be defined by the interactions of all or some of these moving particles over time. We model this configuration of the particles by a polygonal shape formed from the locations of the points in a frame and the activity by the deformation of the polygons in time. These parameters are learned for each typical activity. Given a test video sequence, an activity is classified as abnormal if the probability for the sequence (represented by the mean shape and the dynamics of the deviations), given the model, is below a certain threshold. The approach gives very encouraging results in surveillance applications using a single camera and is able to identify various kinds of abnormal behavior.
%B Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
%V 3
%P III - 493-6 vol.3
%8 2003/04//
%G eng
%R 10.1109/ICASSP.2003.1199519
%0 Conference Paper
%B Proceedings. IEEE Conference on Advanced Video and Signal Based Surveillance, 2003.
%D 2003
%T Towards a view invariant gait recognition algorithm
%A Kale, A.
%A Chowdhury, A.K.R.
%A Chellapa, Rama
%K (access
%K algorithm;
%K analysis;
%K Biometrics
%K biometrics;
%K Calibration
%K calibration;
%K camera
%K canonical
%K control);
%K equations;
%K flow;
%K Gait
%K gait;
%K human
%K image
%K invariant
%K model;
%K MOTION
%K optical
%K perspective
%K phenomenon;
%K projection
%K recognition
%K scheme;
%K sequences;
%K spatio-temporal
%K view
%K view;
%X Human gait is a spatio-temporal phenomenon and typifies the motion characteristics of an individual. The gait of a person is easily recognizable when extracted from a side-view of the person. Accordingly, gait-recognition algorithms work best when presented with images where the person walks parallel to the camera image plane. However, it is not realistic to expect this assumption to be valid in most real-life scenarios. Hence, it is important to develop methods whereby the side-view can be generated from any other arbitrary view in a simple, yet accurate, manner. This is the main theme of the paper. We show that if the person is far enough from the camera, it is possible to synthesize a side view (referred to as canonical view) from any other arbitrary view using a single camera. Two methods are proposed for doing this: (i) using the perspective projection model; (ii) using the optical flow based structure from motion equations. A simple camera calibration scheme for this method is also proposed. Examples of synthesized views are presented. Preliminary testing with gait recognition algorithms gives encouraging results. A by-product of this method is a simple algorithm for synthesizing novel views of a planar scene.
%B Proceedings. IEEE Conference on Advanced Video and Signal Based Surveillance, 2003.
%P 143 - 150
%8 2003/07//
%G eng
%R 10.1109/AVSS.2003.1217914
%0 Conference Paper
%B Computer Vision, 2003. Proceedings. Ninth IEEE International Conference on
%D 2003
%T Using specularities for recognition
%A Osadchy,M.
%A Jacobs, David W.
%A Ramamoorthi,R.
%K 3D
%K formation;object
%K glass;computer
%K image
%K information;specular
%K light
%K measurement;reflection;stereo
%K objects;specular
%K objects;wine
%K processing;
%K property;pottery;recognition
%K recognition;object
%K recognition;position
%K reflectance
%K reflection;compact
%K reflection;transparent
%K shape;Lambertian
%K source;highlight
%K systems;shiny
%K vision;lighting;object
%X Recognition systems have generally treated specular highlights as noise. We show how to use these highlights as a positive source of information that improves recognition of shiny objects. This also enables us to recognize very challenging shiny transparent objects, such as wine glasses. Specifically, we show how to find highlights that are consistent with a hypothesized pose of an object of known 3D shape. We do this using only a qualitative description of highlight formation that is consistent with most models of specular reflection, so no specific knowledge of an object's reflectance properties is needed. We first present a method that finds highlights produced by a dominant compact light source, whose position is roughly known. We then show how to estimate the lighting automatically for objects whose reflection is part specular and part Lambertian. We demonstrate this method for two classes of objects. First, we show that specular information alone can suffice to identify objects with no Lambertian reflectance, such as transparent wine glasses. Second, we use our complete system to recognize shiny objects, such as pottery.
%B Computer Vision, 2003. Proceedings. Ninth IEEE International Conference on
%P 1512 - 1519 vol.2
%8 2003/10//
%G eng
%R 10.1109/ICCV.2003.1238669
%0 Conference Paper
%B Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on
%D 2003
%T Video based rendering of planar dynamic scenes
%A Kale, A.
%A Chowdhury, A.K.R.
%A Chellapa, Rama
%K (computer
%K 3D
%K analysis;
%K approximation;
%K based
%K camera;
%K cameras;
%K direction;
%K dynamic
%K graphics);
%K image
%K monocular
%K MOTION
%K perspective
%K planar
%K processing;
%K rendering
%K rendering;
%K scenes;
%K sequence;
%K sequences;
%K signal
%K video
%K weak
%X In this paper, we propose a method to synthesize arbitrary views of a planar scene from a monocular video sequence of it. The 3-D direction of motion of the object is robustly estimated from the video sequence. Given this direction, any other view of the object can be synthesized through a perspective projection approach, under assumptions of planarity. If the distance of the object from the camera is large, a planar approximation is reasonable even for non-planar scenes. Such a method has many important applications, one of them being gait recognition, where a side view of the person is required. Our method can be used to synthesize the side view of the person in case he/she does not present a side view to the camera. Since the planarity assumption is often an approximation, the effects of non-planarity can lead to inaccuracies in rendering and need to be corrected for. Regions where this happens are examined, and a simple technique based on a weak perspective approximation is proposed to offset rendering inaccuracies. Examples of synthesized views using our method and performance evaluation are presented.
%B Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on
%V 1
%P I - 477-80 vol.1
%8 2003/07//
%G eng
%R 10.1109/ICME.2003.1220958
%0 Conference Paper
%B Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
%D 2003
%T Video synthesis of arbitrary views for approximately planar scenes
%A Chowdhury, A.K.R.
%A Kale, A.
%A Chellapa, Rama
%K (access
%K 3D
%K applications;
%K approach;
%K approximately
%K approximation;
%K arbitrary
%K Biometrics
%K control);
%K data;
%K direction
%K estimation;
%K evaluation;
%K Gait
%K image
%K monocular
%K MOTION
%K performance
%K perspective
%K planar
%K processing;
%K projection
%K recognition;
%K recovery;
%K scenes;
%K sequence;
%K sequences;
%K side
%K signal
%K structure;
%K Surveillance
%K surveillance;
%K synthesis;
%K synthesized
%K video
%K view
%K views;
%X In this paper, we propose a method to synthesize arbitrary views of a planar scene, given a monocular video sequence. The method is based on the availability of knowledge of the angle between the original and synthesized views. Such a method has many important applications, one of them being gait recognition. Gait recognition algorithms rely on the availability of an approximate side-view of the person. From a realistic viewpoint, such an assumption is impractical in surveillance applications and it is of interest to develop methods to synthesize a side view of the person, given an arbitrary view. For large distances from the camera, a planar approximation for the individual can be assumed. In this paper, we propose a perspective projection approach for recovering the direction of motion of the person purely from the video data, followed by synthesis of a new video sequence at a different angle. The algorithm works purely in the image and video domain, though 3D structure plays an implicit role in its theoretical justification. Examples of synthesized views using our method and performance evaluation are presented.
%B Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
%V 3
%P III - 497-500 vol.3
%8 2003/04//
%G eng
%R 10.1109/ICASSP.2003.1199520
%0 Conference Paper
%B Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on
%D 2003
%T View synthesis of articulating humans using visual hull
%A Yue,Zhanfeng
%A Zhao, Liang
%A Chellapa, Rama
%K analysis;
%K body
%K convex
%K gesture
%K hull;
%K human
%K image
%K image-based
%K image;
%K mapping;
%K MOTION
%K part
%K parts;
%K postures;
%K recognition;
%K reconstruction;
%K segmentation;
%K silhouette
%K synthesis;
%K TEXTURE
%K texture;
%K view
%K virtual
%K visual
%X In this paper, we present a method that combines the image-based visual hull with human body part segmentation to overcome the inability of the visual hull method to reconstruct concave regions. The virtual silhouette image corresponding to the given viewing direction is first produced with the image-based visual hull. A human body part localization technique is used to segment the input images and the rendered virtual silhouette image into convex body parts. The body parts in the virtual view are generated separately from the corresponding body parts in the input views and then assembled together. The previously rendered silhouette image is used to locate the corresponding body parts in the input views and avoid unconnected or squeezed regions in the assembled final view. Experiments show that this method can improve the reconstruction of concave regions for human postures and texture mapping.
%B Multimedia and Expo, 2003. ICME '03. Proceedings. 2003 International Conference on
%V 1
%P I - 489-92 vol.1
%8 2003/07//
%G eng
%R 10.1109/ICME.2003.1220961
%0 Conference Paper
%B Multimedia and Expo, 2002. ICME '02. Proceedings. 2002 IEEE International Conference on
%D 2002
%T 3D face reconstruction from video using a generic model
%A Chowdhury, A.R.
%A Chellapa, Rama
%A Krishnamurthy, S.
%A Vo, T.
%K 3D
%K algorithm;
%K algorithms;
%K analysis;
%K Carlo
%K chain
%K Computer
%K Face
%K from
%K function;
%K generic
%K human
%K image
%K Markov
%K MCMC
%K methods;
%K model;
%K Monte
%K MOTION
%K optimisation;
%K OPTIMIZATION
%K processes;
%K processing;
%K recognition;
%K reconstruction
%K reconstruction;
%K sampling;
%K sequence;
%K sequences;
%K SfM
%K signal
%K structure
%K surveillance;
%K video
%K vision;
%X Reconstructing a 3D model of a human face from a video sequence is an important problem in computer vision, with applications to recognition, surveillance, multimedia etc. However, the quality of 3D reconstructions using structure from motion (SfM) algorithms is often not satisfactory. One common method of overcoming this problem is to use a generic model of a face. Existing work using this approach initializes the reconstruction algorithm with this generic model. The problem with this approach is that the algorithm can converge to a solution very close to this initial value, resulting in a reconstruction which resembles the generic model rather than the particular face in the video which needs to be modeled. We propose a method of 3D reconstruction of a human face from video in which the 3D reconstruction algorithm and the generic model are handled separately. A 3D estimate is obtained purely from the video sequence using SfM algorithms without use of the generic model. The final 3D model is obtained after combining the SfM estimate and the generic model using an energy function that corrects for the errors in the estimate by comparing local regions in the two models. The optimization is done using a Markov chain Monte Carlo (MCMC) sampling strategy. The main advantage of our algorithm over others is that it is able to retain the specific features of the face in the video sequence even when these features are different from those of the generic model. The evolution of the 3D model through the various stages of the algorithm is presented.
%B Multimedia and Expo, 2002. ICME '02. Proceedings. 2002 IEEE International Conference on
%V 1
%P 449 - 452 vol.1
%8 2002///
%G eng
%R 10.1109/ICME.2002.1035815
%0 Conference Paper
%B Image Processing. 2002. Proceedings. 2002 International Conference on
%D 2002
%T Anti-collusion codes: multi-user and multimedia perspectives
%A Trappe,W.
%A M. Wu
%A Liu,K. J.R
%K and
%K anti-collusion
%K authentication;
%K binary
%K code
%K codes;
%K coding;
%K combinatorial
%K computing;
%K content;
%K data
%K designs;
%K digital
%K embedding;
%K encapsulation;
%K fingerprinting;
%K image
%K images;
%K logical
%K mathematics;
%K message
%K Modulation
%K modulation;
%K multimedia
%K operation;
%K performance;
%K watermarking;
%X Digital fingerprinting is an effective method to identify users who might try to redistribute multimedia content, such as images and video. These fingerprints are typically embedded into the content using watermarking techniques that are designed to be robust to a variety of attacks. A cheap and effective attack against such digital fingerprints is collusion, where several differently marked copies of the same content are averaged or combined to disrupt the underlying fingerprint. We present a construction of collusion-resistant fingerprints based upon anti-collusion codes (ACC) and binary code modulation. ACC have the property that the composition of any subset of K or fewer codevectors is unique. Using this property, we build fingerprints that allow for the identification of groups of K or fewer colluders. We present a construction of binary-valued ACC under the logical AND operation using the theory of combinatorial designs. Our code construction requires only O(√n) orthogonal signals to accommodate n users. We demonstrate the performance of our ACC for fingerprinting multimedia by identifying colluders through experiments using real images.
%B Image Processing. 2002. Proceedings. 2002 International Conference on
%V 2
%P II-149 - II-152 vol.2
%8 2002///
%G eng
%R 10.1109/ICIP.2002.1039909
%0 Conference Paper
%B Image Processing. 2002. Proceedings. 2002 International Conference on
%D 2002
%T Bayesian structure from motion using inertial information
%A Qian,Gang
%A Chellapa, Rama
%A Qinfen Zheng
%K 3D
%K analysis;
%K Bayes
%K Bayesian
%K camera
%K estimation;
%K image
%K images;
%K importance
%K inertial
%K information;
%K methods;
%K MOTION
%K motion;
%K parameter
%K processing;
%K real
%K reconstruction;
%K sampling;
%K scene
%K sensors;
%K sequence;
%K sequences;
%K sequential
%K signal
%K structure-from-motion;
%K synthetic
%K systems;
%K video
%X A novel approach to Bayesian structure from motion (SfM) using inertial information and sequential importance sampling (SIS) is presented. The inertial information is obtained from camera-mounted inertial sensors and is used in the Bayesian SfM approach as prior knowledge of the camera motion in the sampling algorithm. Experimental results using both synthetic and real images show that, when inertial information is used, more accurate results can be obtained or the same estimation accuracy can be obtained at a lower cost.
%B Image Processing. 2002. Proceedings. 2002 International Conference on
%V 3
%P III-425 - III-428 vol.3
%8 2002///
%G eng
%R 10.1109/ICIP.2002.1038996
%0 Conference Paper
%B Pattern Recognition, 2002. Proceedings. 16th International Conference on
%D 2002
%T Binarization of low quality text using a Markov random field model
%A Wolf,C.
%A David Doermann
%K analysis;
%K annealing;
%K Bayes
%K Bayesian
%K binarization;
%K computing;
%K distributions;
%K document
%K documents;
%K field;
%K Gibbs
%K image
%K low
%K Markov
%K method;
%K methods;
%K multimedia
%K optimization;
%K probability;
%K processes;
%K processing;
%K QUALITY
%K random
%K simulated
%K text
%X Binarization techniques have been developed in the document analysis community for over 30 years and many algorithms have been used successfully. On the other hand, document analysis tasks are more and more frequently being applied to multimedia documents such as video sequences. Due to low resolution and lossy compression, the binarization of text included in the frames is a non-trivial task. Existing techniques work without a model of the spatial relationships in the image, which makes them less powerful. We introduce a new technique based on a Markov random field model of the document. The model parameters (clique potentials) are learned from training data and the binary image is estimated in a Bayesian framework. The performance is evaluated using commercial OCR software.
%B Pattern Recognition, 2002. Proceedings. 16th International Conference on
%V 3
%P 160 - 163 vol.3
%8 2002///
%G eng
%R 10.1109/ICPR.2002.1047819
%0 Conference Paper
%B Multimedia Signal Processing, 2002 IEEE Workshop on
%D 2002
%T Communication-friendly encryption of multimedia
%A M. Wu
%A Mao,Yinian
%K bit
%K bitstream
%K coding;
%K Communication
%K communication;
%K compression
%K compression;
%K COMPUTATION
%K controlled
%K cryptography;
%K data
%K encryption
%K encryption;
%K fine
%K friendliness;
%K granularity
%K image
%K index
%K intrabitplane
%K MAPPING
%K multimedia
%K operation;
%K overhead;
%K scalable
%K selective
%K sources
%K stream;
%K stuffing;
%K syntax-aware
%K syntax;
%K tool;
%X This paper discusses encryption operations that selectively encrypt content-carrying segments of multimedia data stream. We propose and analyze three techniques that work in different domains, namely, a syntax-aware selective bitstream encryption tool with bit stuffing, a generalized index mapping encryption tool with controlled overhead and an intra-bitplane encryption tool compatible with fine granularity scalable coding. The designs of these proposed encryption operations take into consideration the inherent structure and syntax of multimedia sources and have improved friendliness to communications, compression and computation.
%B Multimedia Signal Processing, 2002 IEEE Workshop on
%P 292 - 295
%8 2002/12//
%G eng
%R 10.1109/MMSP.2002.1203303
%0 Conference Paper
%B Pattern Recognition, 2002. Proceedings. 16th International Conference on
%D 2002
%T Content-based image retrieval using Fourier descriptors on a logo database
%A Folkers,A.
%A Samet, Hanan
%K abstraction;
%K analysis;
%K constraints;
%K content-based
%K contour
%K database
%K database;
%K databases;
%K descriptors;
%K detection;
%K edge
%K Fourier
%K image
%K logos;
%K pictorial
%K processing;
%K query
%K retrieval;
%K SHAPE
%K spatial
%K specification;
%K theory;
%K visual
%X A system that enables the pictorial specification of queries in an image database is described. The queries are comprised of rectangle, polygon, ellipse, and B-spline shapes. The queries specify which shapes should appear in the target image as well as spatial constraints on the distance between them and their relative position. The retrieval process makes use of an abstraction of the contour of the shape which is invariant against translation, scale, rotation, and starting point, that is based on the use of Fourier descriptors. These abstractions are used in a system to locate logos in an image database. The utility of this approach is illustrated using some sample queries.
%B Pattern Recognition, 2002. Proceedings. 16th International Conference on
%V 3
%P 521 - 524 vol.3
%8 2002///
%G eng
%R 10.1109/ICPR.2002.1047991
%0 Conference Paper
%B IEEE 2002 Symposia on Human Centric Computing Languages and Environments, 2002. Proceedings
%D 2002
%T Dynamic layout management in a multimedia bulletin board
%A Kang,Hyunmo
%A Shneiderman, Ben
%A Wolff,G. J
%K Asynchronous communication
%K asynchronous communication system
%K Collaboration
%K data mining
%K dynamic layout management
%K dynamic layout templates
%K groupware
%K Human computer interaction
%K image
%K Information services
%K Laboratories
%K moving picture
%K Multimedia Bulletin Board system
%K multimedia computing
%K Multimedia databases
%K multimedia objects
%K multimedia systems
%K office document
%K Prototypes
%K sound
%K text
%K user interface
%K User interfaces
%K user-controlled layout strategy
%K voice
%K Web
%K Web pages
%X This paper proposes a novel user interface to manage the dynamic layout of multimedia objects in the Multimedia Bulletin Board (MBB) system. The MBB has been designed and implemented as a prototype of an asynchronous communication system that enables rich communication and collaboration among users of multimedia objects such as text, image, moving picture, sound, voice, Web, office document, and other files. The layout properties of the multimedia objects on a board (e.g. x-y position, size, z-order, explicit and implicit links, etc.) show important and useful information on the user dynamics occurring within a board. However, a fixed layout created and edited by multiple users may prevent users from recognizing and identifying other information. This paper resolves this problem with a novel user-controlled layout strategy made visible with dynamic layout templates (DLT). Users can reorganize the objects to extract meaningful information related to time, source, geographic location, or topic.
%B IEEE 2002 Symposia on Human Centric Computing Languages and Environments, 2002. Proceedings
%I IEEE
%P 51 - 53
%8 2002///
%@ 0-7695-1644-0
%G eng
%R 10.1109/HCC.2002.1046344
%0 Conference Paper
%B Applications of Computer Vision, 2002. (WACV 2002). Proceedings. Sixth IEEE Workshop on
%D 2002
%T An experimental evaluation of linear and kernel-based methods for face recognition
%A Gupta, H.
%A Agrawala, Ashok K.
%A Pruthi, T.
%A Shekhar, C.
%A Chellapa, Rama
%K analysis;
%K classification;
%K component
%K discriminant
%K Face
%K image
%K Kernel
%K linear
%K Machine;
%K nearest
%K neighbor;
%K principal
%K recognition;
%K Support
%K vector
%X In this paper we present the results of a comparative study of linear and kernel-based methods for face recognition. The methods used for dimensionality reduction are Principal Component Analysis (PCA), Kernel Principal Component Analysis (KPCA), Linear Discriminant Analysis (LDA) and Kernel Discriminant Analysis (KDA). The methods used for classification are Nearest Neighbor (NN) and Support Vector Machine (SVM). In addition, these classification methods are applied on raw images to gauge the performance of these dimensionality reduction techniques. All experiments have been performed on images from the UMIST Face Database.
%B Applications of Computer Vision, 2002. (WACV 2002). Proceedings. Sixth IEEE Workshop on
%P 13 - 18
%8 2002///
%G eng
%R 10.1109/ACV.2002.1182137
%0 Journal Article
%J Image Processing, IEEE Transactions on
%D 2002
%T A generic approach to simultaneous tracking and verification in video
%A Li,Baoxin
%A Chellapa, Rama
%K approach;
%K Carlo
%K configuration;
%K correspondence
%K data;
%K density
%K density;
%K estimated
%K estimation;
%K evaluation;
%K extraction;
%K Face
%K facial
%K feature
%K generic
%K human
%K hypothesis
%K image
%K measurement
%K methods;
%K Monte
%K object
%K performance
%K posterior
%K probability
%K probability;
%K problem;
%K processing;
%K propagation;
%K recognition;
%K road
%K sequence
%K sequences;
%K sequential
%K signal
%K space;
%K stabilization;
%K state
%K synthetic
%K temporal
%K testing;
%K tracking;
%K vector;
%K vehicle
%K vehicles;
%K verification;
%K video
%K visual
%X A generic approach to simultaneous tracking and verification in video data is presented. The approach is based on posterior density estimation using sequential Monte Carlo methods. Visual tracking, which is in essence a temporal correspondence problem, is solved through probability density propagation, with the density being defined over a proper state space characterizing the object configuration. Verification is realized through hypothesis testing using the estimated posterior density. In its most basic form, verification can be performed as follows. Given a measurement vector Z and two hypotheses H0 and H1, we first estimate posterior probabilities P(H0|Z) and P(H1|Z), and then choose the one with the larger posterior probability as the true hypothesis. Several applications of the approach are illustrated by experiments devised to evaluate its performance. The idea is first tested on synthetic data, and then experiments with real video sequences are presented, illustrating vehicle tracking and verification, human (face) tracking and verification, facial feature tracking, and image sequence stabilization.
%B Image Processing, IEEE Transactions on
%V 11
%P 530 - 544
%8 2002/05//
%@ 1057-7149
%G eng
%N 5
%R 10.1109/TIP.2002.1006400
%0 Conference Paper
%B Motion and Video Computing, 2002. Proceedings. Workshop on
%D 2002
%T A hierarchical approach for obtaining structure from two-frame optical flow
%A Liu,Haiying
%A Chellapa, Rama
%A Rosenfeld, A.
%K algorithm;
%K aliasing;
%K analysis;
%K computer-rendered
%K depth
%K depth;
%K error
%K estimation;
%K extraction;
%K Face
%K feature
%K flow;
%K gesture
%K hierarchical
%K image
%K images;
%K inverse
%K iterative
%K methods;
%K MOTION
%K nonlinear
%K optical
%K parameter
%K processing;
%K real
%K recognition;
%K sequences;
%K signal
%K structure-from-motion;
%K system;
%K systems;
%K TIME
%K two-frame
%K variation;
%K video
%X A hierarchical iterative algorithm is proposed for extracting structure from two-frame optical flow. The algorithm exploits two facts: one is that in many applications, such as face and gesture recognition, the depth variation of the visible surface of an object in a scene is small compared to the distance between the optical center and the object; the other is that the time aliasing problem is alleviated at the coarse level for any two-frame optical flow estimate so that the estimate tends to be more accurate. A hierarchical representation for the relationship between the optical flow, depth, and the motion parameters is derived, and the resulting non-linear system is iteratively solved through two linear subsystems. At the coarsest level, the surface of the object tends to be flat, so that the inverse depth tends to be a constant, which is used as the initial depth map. Inverse depth and motion parameters are estimated by the two linear subsystems at each level and the results are propagated to finer levels. Error analysis and experiments using both computer-rendered images and real images demonstrate the correctness and effectiveness of our algorithm.
%B Motion and Video Computing, 2002. Proceedings. Workshop on
%P 214 - 219
%8 2002/12//
%G eng
%R 10.1109/MOTION.2002.1182239
%0 Conference Paper
%B Pattern Recognition, 2002. Proceedings. 16th International Conference on
%D 2002
%T Page classification through logical labelling
%A Liang,Jian
%A David Doermann
%A Ma,M.
%A Guo,J. K
%K article
%K attributed
%K base;
%K character
%K classification;
%K constraints;
%K document
%K document;
%K experimental
%K global
%K graph
%K graph;
%K hierarchical
%K image
%K images;
%K labelling;
%K logical
%K model
%K noise;
%K OCR;
%K optical
%K page
%K pages;
%K processing;
%K recognition;
%K relational
%K results;
%K technical
%K theory;
%K title
%K unknown
%X We propose an integrated approach to page classification and logical labelling. Layout is represented by a fully connected attributed relational graph that is matched to the graph of an unknown document, achieving classification and labelling simultaneously. By incorporating global constraints in an integrated fashion, ambiguity at the zone level can be reduced, providing robustness to noise and variation. Models are automatically trained from sample documents. Experimental results show promise for the classification and labelling of technical article title pages, and support the idea of a hierarchical model base.
%B Pattern Recognition, 2002. Proceedings. 16th International Conference on
%V 3
%P 477 - 480 vol.3
%8 2002///
%G eng
%R 10.1109/ICPR.2002.1047980
%0 Conference Paper
%B Pattern Recognition, 2002. Proceedings. 16th International Conference on
%D 2002
%T Performance evaluation of object detection algorithms
%A Mariano,V.Y.
%A Min,Junghye
%A Park,Jin-Hyeong
%A Kasturi,R.
%A Mihalcik,D.
%A Huiping Li
%A David Doermann
%A Drayer,T.
%K algorithms;
%K common
%K data
%K DETECTION
%K detection;
%K Evaluation
%K evaluation;
%K image
%K object
%K performance
%K recognition;
%K resource
%K set;
%K system;
%K text-detection
%K video
%X The continuous development of object detection algorithms is ushering in the need for evaluation tools to quantify algorithm performance. In this paper, a set of seven metrics is proposed for quantifying different aspects of a detection algorithm's performance. The strengths and weaknesses of these metrics are described. They are implemented in the Video Performance Evaluation Resource (ViPER) system and will be used to evaluate algorithms for detecting text, faces, moving people and vehicles. Results for running two previous text-detection algorithms on a common data set are presented.
%B Pattern Recognition, 2002. Proceedings. 16th International Conference on
%V 3
%P 965 - 969 vol.3
%8 2002///
%G eng
%R 10.1109/ICPR.2002.1048198
%0 Conference Paper
%B Image Processing. 2002. Proceedings. 2002 International Conference on
%D 2002
%T Probabilistic recognition of human faces from video
%A Chellapa, Rama
%A Kruger, V.
%A Zhou,Shaohua
%K Bayes
%K Bayesian
%K CMU;
%K distribution;
%K Face
%K faces;
%K gallery;
%K handling;
%K human
%K image
%K images;
%K importance
%K likelihood;
%K methods;
%K NIST/USF;
%K observation
%K posterior
%K probabilistic
%K probability;
%K processing;
%K propagation;
%K recognition;
%K sampling;
%K sequential
%K signal
%K still
%K Still-to-video
%K Uncertainty
%K video
%K Video-to-video
%X Most existing face recognition approaches recognize faces based on still images. We present a novel approach to recognize faces in video. In that scenario, the face gallery may consist of still images or may be derived from videos. For evidence integration we use classical Bayesian propagation over time and compute the posterior distribution using sequential importance sampling. The probabilistic approach allows us to handle uncertainties in a systematic manner. Experimental results using videos collected by NIST/USF and CMU illustrate the effectiveness of this approach in both still-to-video and video-to-video scenarios with appropriate model choices.
%B Image Processing. 2002. Proceedings. 2002 International Conference on
%V 1
%P I-41 - I-44 vol.1
%8 2002///
%G eng
%R 10.1109/ICIP.2002.1037954
%0 Conference Paper
%B Pattern Recognition, 2002. Proceedings. 16th International Conference on
%D 2002
%T Quasi-invariants for human action representation and recognition
%A Parameswaran, V.
%A Chellapa, Rama
%K 2D
%K action
%K analysis;
%K body
%K canonical
%K change
%K human
%K image
%K invariance;
%K MOTION
%K poses;
%K quasi-invariants;
%K recognition;
%K representation;
%K tolerance;
%K viewpoint
%X Although human action recognition has been the subject of much research in the past, the issue of viewpoint invariance has received scarce attention. In this paper, we present an approach to detect human action with a high tolerance to viewpoint change. Canonical body poses are modeled in a view invariant manner to enable detection from a general viewpoint. While there exist no invariants for 3D to 2D projection, there exists a wealth of techniques in 2D invariance that can be used to advantage in 3D to 2D projection. We employ 2D invariants to recognize canonical poses of the human body leading to an effective way to represent and recognize human action which we evaluate theoretically and experimentally on 2D projections of publicly available human motion capture data.
%B Pattern Recognition, 2002. Proceedings. 16th International Conference on
%V 1
%P 307 - 310 vol.1
%8 2002///
%G eng
%R 10.1109/ICPR.2002.1044699
%0 Conference Paper
%B Image Processing. 2002. Proceedings. 2002 International Conference on
%D 2002
%T A recognition algorithm for Chinese characters in diverse fonts
%A Wu,Xianli
%A M. Wu
%K algorithm;
%K central
%K character
%K Chinese
%K contributions;
%K direction
%K discriminator;
%K diverse
%K fonts;
%K image
%K interfaces;
%K LANGUAGE
%K MATCHING
%K matching;
%K multi-dictionary
%K natural
%K OCR
%K optical
%K peripheral
%K recognition
%K recognition;
%K sets;
%K software
%K system;
%X The paper proposes an algorithm for recognizing Chinese characters in many diverse fonts including Song, Fang, Kai, Hei, Yuan, Lishu, Weibei and Xingkai. The algorithm is based on features derived from peripheral direction contributions and utilizes a set of dictionaries. A 3-level matching is first performed with respect to each dictionary. The distance measures associated with these matches are then fed into a central discriminator to output the final recognition result. We propose a new multi-dictionary matching algorithm for use in the central discriminator that utilizes estimated information of neighborhood fonts. Experiments have been performed on a practical OCR software system whose recognition kernel is based on the proposed algorithm. Fast and accurate recognition has been accomplished both in title recognition, involving all of the 8 fonts, and in main-body recognition, which usually involves only the first 4 most commonly used fonts.
%B Image Processing. 2002. Proceedings. 2002 International Conference on
%V 3
%P 981 - 984 vol.3
%8 2002/06//
%G eng
%R 10.1109/ICIP.2002.1039139
%0 Conference Paper
%B Pattern Recognition, 2002. Proceedings. 16th International Conference on
%D 2002
%T A robust algorithm for probabilistic human recognition from video
%A Zhou,Shaohua
%A Chellapa, Rama
%K algorithm;
%K algorithms;
%K Carlo
%K continuity;
%K human
%K image
%K methods;
%K model;
%K Monte
%K parameterized
%K probabilistic
%K recognition;
%K robust
%K sequential
%K series
%K series;
%K space
%K state
%K state-space
%K temporal
%K TIME
%X Human recognition from video requires solving two tasks, recognition and tracking, simultaneously. This leads to a parameterized time series state space model, representing both motion and identity of the human. Sequential Monte Carlo (SMC) algorithms, like Condensation, can be developed to offer numerical solutions to this model. However, in outdoor environments, the solution is more likely to diverge from the foreground, causing failures in both recognition and tracking. In this paper we propose an approach for tackling this problem by incorporating the constraint of temporal continuity in the observations. Experimental results demonstrate improvements over its Condensation counterpart.
%B Pattern Recognition, 2002. Proceedings. 16th International Conference on
%V 1
%P 226 - 229 vol.1
%8 2002///
%G eng
%R 10.1109/ICPR.2002.1044661
%0 Conference Paper
%B Multimedia Signal Processing, 2002 IEEE Workshop on
%D 2002
%T Wide baseline image registration using prior information
%A Chowdhury, AM
%A Chellapa, Rama
%A Keaton, T.
%K 2D
%K 3D
%K algorithm;
%K alignment;
%K angles;
%K baseline
%K Computer
%K configuration;
%K constellation;
%K correspondence
%K creation;
%K doubly
%K error
%K extraction;
%K Face
%K feature
%K global
%K holistic
%K image
%K images;
%K matching;
%K matrix;
%K model
%K models;
%K normalization
%K panoramic
%K probability;
%K procedure;
%K processes;
%K processing;
%K registration;
%K robust
%K sequences;
%K SHAPE
%K signal
%K Sinkhorn
%K spatial
%K statistics;
%K stereo;
%K Stochastic
%K video
%K view
%K viewing
%K vision;
%K wide
%X Establishing correspondence between features in two images of the same scene taken from different viewing angles is a challenging problem in image processing and computer vision. However, its solution is an important step in many applications like wide baseline stereo, 3D model alignment, creation of panoramic views, etc. In this paper, we propose a technique for registration of two images of a face obtained from different viewing angles. We show that prior information about the general characteristics of a face obtained from video sequences of different faces can be used to design a robust correspondence algorithm. The method works by matching 2D shapes of the different features of the face. A doubly stochastic matrix, representing the probability of match between the features, is derived using the Sinkhorn normalization procedure. The final correspondence is obtained by minimizing the probability of error of a match between the entire constellations of features in the two sets, thus taking into account the global spatial configuration of the features. The method is applied for creating holistic 3D models of a face from partial representations. Although this paper focuses primarily on faces, the algorithm can also be used for other objects with small modifications.
%B Multimedia Signal Processing, 2002 IEEE Workshop on
%P 37 - 40
%8 2002/12//
%G eng
%R 10.1109/MMSP.2002.1203242
%0 Journal Article
%J Pattern Analysis and Machine Intelligence, IEEE Transactions on
%D 2000
%T Classification with nonmetric distances: image retrieval and class representation
%A Jacobs, David W.
%A Weinshall,D.
%A Gdalyahu,Y.
%K appearance-based
%K by
%K classification;image
%K correlation;correlation
%K dataspaces;nonmetric
%K distances;nonmetric
%K example;
%K functions;nonmetric
%K image
%K inequality;vector
%K judgments;robust
%K MATCHING
%K methods;image
%K methods;nonmetric
%K methods;triangle
%K points;boundary
%K points;class
%K representation;exemplar-based
%K representation;image
%K retrieval;learning
%K retrieval;nearest-neighbor
%K similarity
%K vision;atypical
%X A key problem in appearance-based vision is understanding how to use a set of labeled images to classify new images. Systems that model human performance, or that use robust image matching methods, often use nonmetric similarity judgments; but when the triangle inequality is not obeyed, most pattern recognition techniques are not applicable. Exemplar-based (nearest-neighbor) methods can be applied to a wide class of nonmetric similarity functions. The key issue, however, is to find methods for choosing good representatives of a class that accurately characterize it. We show that existing condensing techniques are ill-suited to deal with nonmetric dataspaces. We develop techniques for solving this problem, emphasizing two points: First, we show that the distance between images is not a good measure of how well one image can represent another in nonmetric spaces. Instead, we use the vector correlation between the distances from each image to other previously seen images. Second, we show that in nonmetric spaces, boundary points are less significant for capturing the structure of a class than in Euclidean spaces. We suggest that atypical points may be more important in describing classes. We demonstrate the importance of these ideas to learning that generalizes from experience by improving performance. We also suggest ways of applying parametric techniques to supervised learning problems that involve a specific nonmetric distance function, showing how to generalize the idea of linear discriminant functions in a way that may be more useful in nonmetric spaces.
%B Pattern Analysis and Machine Intelligence, IEEE Transactions on
%V 22
%P 583 - 600
%8 2000/06//
%@ 0162-8828
%G eng
%N 6
%R 10.1109/34.862197
%0 Journal Article
%J Image Processing, IEEE Transactions on
%D 2000
%T Residual coding in document image compression
%A Kia,O. E
%A David Doermann
%K coding
%K coding;compressed-domain
%K coding;entropy;lossless
%K coding;image
%K communication;
%K compact
%K compression;data
%K compression;document
%K compression;multiple
%K construction;rate-distortion;representative
%K detection;symbolic
%K difference;residual
%K distortion
%K document
%K edge;prototype
%K image
%K library
%K model;efficient
%K models;residual
%K pattern
%K patterns
%K pixels;similar
%K processing;compression
%K processing;entropy;image
%K prototype;residual
%K ratio;continuous
%K reconstruction;distance
%K reconstruction;progressive
%K reconstruction;rate
%K referencing;packet
%K theory;visual
%K transmission;prototype
%X Symbolic document image compression relies on the detection of similar patterns in a document image and construction of a prototype library. Compression is achieved by referencing multiple pattern instances ("components") through a single representative prototype. To provide a lossless compression, however, the residual difference between each component and its assigned prototype must be coded. Since the size of the residual can significantly affect the compression ratio, efficient coding is essential. We describe a set of residual coding models for use with symbolic document image compression that exhibit desirable characteristics for compression and rate-distortion and facilitate compressed-domain processing. The first model orders the residual pixels by their distance to the prototype edge. Grouping pixels based on this distance value allows for a more compact coding and lower entropy. This distance model is then extended to a model that defines the structure of the residue and uses it as a basis for continuous and packet reconstruction which provides desired functionality for use in lossy compression and progressive transmission.
%B Image Processing, IEEE Transactions on
%V 9
%P 961 - 969
%8 2000/06//
%@ 1057-7149
%G eng
%N 6
%R 10.1109/83.846239
%0 Conference Paper
%B Image Processing, 1999. ICIP 99. Proceedings. 1999 International Conference on
%D 1999
%T Digital watermarking using shuffling
%A M. Wu
%A Liu,B.
%K coding;
%K data
%K embeddin;data
%K encapsulation;document
%K hiding;digital
%K image
%K processing;image
%K shuffling;shuffling;synchronization;data
%K source;random
%K watermarking;multimedia
%X This paper applies shuffling to digital watermarking and data hiding. The data embedding capacity in the multimedia source generally varies significantly from one part of the source to another. Sequential embedding is very sensitive to noise, which may cause synchronization problems; the common but conservative solution of partitioning an image into large segments and embedding only one bit per segment is wasteful of the data embedding capacity. This paper shows how random shuffling can be used to equalize the uneven distribution of embedding capacity. The effectiveness of random shuffling is demonstrated by analysis and experiments.
%B Image Processing, 1999. ICIP 99. Proceedings. 1999 International Conference on
%V 1
%P 291 - 295 vol.1
%8 1999///
%G eng
%R 10.1109/ICIP.1999.821616
%0 Conference Paper
%B Simulation of Semiconductor Processes and Devices, 1999. SISPAD '99. 1999 International Conference on
%D 1999
%T Gate leakage current simulation by Boltzmann transport equation and its dependence on the gate oxide thickness
%A Han,Zhiyi
%A Lin,Chung-Kai
%A Goldsman,N.
%A Mayergoyz, Issak D
%A Yu,S.
%A Stettler,M.
%K 30
%K angstrom;60
%K angstrom;Boltzmann
%K Bias
%K calculations;leakage
%K charges;spherical
%K component;tunneling
%K current
%K currents;semiconductor
%K dependence;method
%K dependence;MOSFET;barrier
%K device
%K effect;distribution
%K equation;DC
%K equation;MOSFET;WKB
%K function;first
%K harmonic
%K image
%K leakage
%K lowering
%K method
%K method;gate
%K model;thermionic
%K models;tunnelling;
%K of
%K oxide
%K principle
%K probability;Boltzmann
%K simulation;gate
%K thickness
%K transport
%K WKB
%X As device dimensions shrink toward 0.1 μm, gate oxides are becoming so thin that MOSFET gate leakage current and oxide degradation are becoming limiting issues. We provide a more rigorous way to calculate gate leakage current. To achieve this we build upon the Spherical Harmonic Method of modeling, which deterministically solves the Boltzmann equation for an entire device. The method gives the distribution function and is 1000 times faster than MC. Once the distribution function is calculated, the tunneling probability is derived from the first principle WKB method. The barrier lowering effect is accounted for by the method of image charges. We calculate gate leakage current as a function of DC bias. The thermionic and tunneling components are compared at different DC bias points. The dependence of gate leakage current on gate oxide thickness is simulated.
%B Simulation of Semiconductor Processes and Devices, 1999. SISPAD '99. 1999 International Conference on
%P 247 - 250
%8 1999///
%G eng
%R 10.1109/SISPAD.1999.799307
%0 Conference Paper
%B Document Analysis and Recognition, 1999. ICDAR '99. Proceedings of the Fifth International Conference on
%D 1999
%T On musical score recognition using probabilistic reasoning
%A Stuckelberg,M.V.
%A David Doermann
%K analysis;document
%K attribute
%K class;document
%K descriptive
%K document
%K engine;local
%K estimation;musical
%K grammar;attribute
%K grammars;character
%K handling;
%K image
%K interpretation;stochastic
%K Markov
%K mechanisms;music;uncertainty
%K model;global
%K modeling
%K models;inference
%K parameter
%K processing;image
%K propagation;explicit
%K reasoning;scanned
%K recognition;document
%K recognition;end-to-end
%K recognition;inference
%K recognition;probabilistic
%K score
%K structure;hidden
%K Uncertainty
%X We present a probabilistic framework for document analysis and recognition and illustrate it on the problem of musical score recognition. Our system uses an explicit descriptive model of the document class to find the most likely interpretation of a scanned document image. In contrast to the traditional pipeline architecture, we carry out all stages of the analysis with a single inference engine, allowing for an end-to-end propagation of the uncertainty. The global modeling structure is similar to a stochastic attribute grammar, and local parameters are estimated using hidden Markov models
%B Document Analysis and Recognition, 1999. ICDAR '99. Proceedings of the Fifth International Conference on
%P 115 - 118
%8 1999/09//
%G eng
%R 10.1109/ICDAR.1999.791738
%0 Conference Paper
%B Acoustics, Speech, and Signal Processing, 1999. Proceedings., 1999 IEEE International Conference on
%D 1999
%T A rotation, scale and translation resilient public watermark
%A Wu,M.
%A Miller,M.L.
%A Bloom,J.A.
%A Cox,I.J.
%K algorithms;Fourier
%K coding;
%K coding;image
%K data;transform
%K dimensions;projection
%K Fourier-Mellin
%K image
%K invariant
%K methods;detector;experimental
%K of
%K pattern;rotation
%K projection;mapping;original
%K public
%K registration;security
%K resilient
%K results;geometric
%K transform;detection
%K transform;RST
%K transformations;image
%K transforms;image
%K transforms;Radon
%K watermark;RST
%K watermark;scale
%K watermark;translation
%K watermark;watermarking
%K waveform;registration
%X Summary form only given. Watermarking algorithms that are robust to the common geometric transformations of rotation, scale and translation (RST) have been reported for cases in which the original unwatermarked content is available at the detector so as to allow the transformations to be inverted. However, for public watermarks the problem is significantly more difficult since there is no original content to register with. Two classes of solution have been proposed. The first embeds a registration pattern into the content while the second seeks to apply detection methods that are invariant to these geometric transformations. This paper describes a public watermarking method which is invariant (or bears a simple relation) to the common geometric transforms of rotation, scale, and translation. It is based on the Fourier-Mellin transform which has previously been suggested. We extend this work, using a variation based on the Radon transform. The watermark is inserted into a projection of the image. The properties of this projection are such that RST transforms produce simple or no effects on the projection waveform. When a watermark is inserted into a projection, the signal must eventually be back projected to the original image dimensions. This is a one to many mapping that allows for considerable flexibility in the watermark insertion process. We highlight some theoretical and practical issues that affect the implementation of an RST invariant watermark. Finally, we describe preliminary experimental results.
%B Acoustics, Speech, and Signal Processing, 1999. Proceedings., 1999 IEEE International Conference on
%V 4
%P 2065 vol.4
%8 1999/03//
%G eng
%R 10.1109/ICASSP.1999.758337
%0 Conference Paper
%B Image Processing, 1998. ICIP 98. Proceedings. 1998 International Conference on
%D 1998
%T Watermarking for image authentication
%A M. Wu
%A Liu,Bede
%K alterations
%K analysis;image
%K analysis;message
%K authentication;image
%K authentication;table
%K camera;frequency
%K coding;image
%K colour
%K compression;frequency-domain
%K detection;alterations
%K domain;image
%K EMBEDDING
%K image
%K image;compressed
%K image;ownership
%K localisation;color
%K look-up;watermarking;data
%K lookup;
%K method;digital
%K protection;table
%K storage;data
%K tampering;marked
%X A data embedding method is proposed for image authentication based on table look-up in frequency domain. A visually meaningful watermark and a set of simple features are embedded invisibly in the marked image, which can be stored in the compressed form. The scheme can detect and localize alterations of the original image, such as the tampering of images exported from a digital camera
%B Image Processing, 1998. ICIP 98. Proceedings. 1998 International Conference on
%V 2
%P 437 - 441 vol.2
%8 1998/10//
%G eng
%R 10.1109/ICIP.1998.723413
%0 Conference Paper
%B Image Processing, 1997. Proceedings., International Conference on
%D 1997
%T OCR-based rate-distortion analysis of residual coding
%A Kia,O. E
%A David Doermann
%K analysis;redundancy;representative
%K character
%K coding;distortion
%K coding;image
%K coding;lossy
%K coding;row-order
%K coding;symbolic
%K compression;data
%K compression;document
%K compression;lossy
%K database
%K distortion
%K Evaluation
%K image
%K images;document
%K images;experiments;ground
%K measure;document
%K OCR
%K of
%K performance;University
%K processing;distance-order
%K processing;image
%K prototypes;residual
%K recognition;rate
%K representation;optical
%K representation;progressive
%K software;OCR
%K system
%K theory;
%K transmission;rate-distortion
%K truth;image
%K Washington;compressed-domain
%X Symbolic compression of document images provides access to symbols found in document images and exploits the redundancy found within them. Document images are highly structured and contain large numbers of repetitive symbols. We have shown that while symbolically compressing a document image we are able to perform compressed-domain processing. Symbolic compression forms representative prototypes for symbols and encodes the image by the location of these prototypes and a residual (the difference between symbol and prototype). We analyze the rate-distortion tradeoff by varying the amount of residual used in compression for both distance- and row-order coding. A measure of distortion is based on the performance of an OCR system on the resulting image. The University of Washington document database images, ground truth, and OCR evaluation software are used for experiments.
%B Image Processing, 1997. Proceedings., International Conference on
%V 3
%P 690 - 693 vol.3
%8 1997/10//
%G eng
%R 10.1109/ICIP.1997.632215
%0 Conference Paper
%B Multimedia Signal Processing, 1997., IEEE First Workshop on
%D 1997
%T The role of compressed document images in transmission and retrieval
%A Kia,O.
%A David Doermann
%K applications;progressive
%K coding;multimedia
%K component
%K compression;document
%K compression;image
%K computing;query
%K content;lossy
%K databases;
%K distortion
%K experiment;structural
%K hierarchy;symbols;visual
%K image
%K level
%K objects;data
%K processing;image
%K processing;visual
%K retrieval;image
%K structure;document
%K transmission;information
%K transmission;multimedia;network
%K transmission;rate
%X Document images belong to a unique class of images where the information content is contained in the language represented by a series of symbols on the page, rather than in the visual objects themselves. For this reason, it is essential to preserve the fidelity of individual components when considering methods of compression. Likewise, the component-level structure should be a prime consideration when ordering information for lossy or progressive transmission. We refine our work on document image compression as it applies to transmission and retrieval. We first overview the basic compression scheme, then describe a structural hierarchy which provides desirable properties for transmission. We present the results of a rate-distortion experiment and discuss the implications for network applications.
%B Multimedia Signal Processing, 1997., IEEE First Workshop on
%P 331 - 336
%8 1997/06//
%G eng
%R 10.1109/MMSP.1997.602657
%0 Journal Article
%J Pattern Analysis and Machine Intelligence, IEEE Transactions on
%D 1996
%T The space requirements of indexing under perspective projections
%A Jacobs, David W.
%K 2D
%K complexity;feature
%K complexity;table
%K extraction;image
%K hashing;indexing
%K image
%K images;3D
%K lookup;
%K lookup;computational
%K matching;geometric
%K matching;indexing;object
%K model
%K points;feature
%K process;invariants;object
%K processing;table
%K projections;space
%K recognition;perspective
%K recognition;stereo
%X Object recognition systems can be made more efficient through the use of table lookup to match features. The cost of this indexing process depends on the space required to represent groups of model features in such a lookup table. We determine the space required to perform indexing of arbitrary sets of 3D model points for lookup from a single 2D image formed under perspective projection. We show that in this case, one must use a 3D surface to represent model groups, and we provide an analytic description of such a surface. This is in contrast to the cases of scaled-orthographic or affine projection, in which only a 2D surface is required to represent a group of model features. This demonstrates a fundamental way in which the recognition of objects under perspective projection is more complex than is recognition under other projection models.
%B Pattern Analysis and Machine Intelligence, IEEE Transactions on
%V 18
%P 330 - 333
%8 1996/03//
%@ 0162-8828
%G eng
%N 3
%R 10.1109/34.485561
%0 Conference Paper
%B Pattern Recognition, 1996., Proceedings of the 13th International Conference on
%D 1996
%T Structural compression for document analysis
%A Kia,O. E
%A David Doermann
%K analysis;document
%K bitmap;error
%K coding;image
%K compression
%K compression;document
%K compression;symbol
%K decomposition;data
%K image
%K manipulation;
%K processing;image
%K ratios;document
%K recognition;probability;symbol
%K representations;structural
%K representations;symbolic
%K retrieval;document
%K storage;error
%K text
%X In this paper we describe a structural compression technique to be used for document text image storage and retrieval. The primary objective is to provide an efficient representation, storage, transmission and display. A secondary objective is to provide an encoding which allows access to specified regions within the image and facilitates traditional document processing operations without requiring complete decoding. We describe an algorithm which symbolically decomposes a document image and structurally orders the error bitmap based on a probabilistic model. The resultant symbol and error representations lend themselves to reasonably high compression ratios and are structured so as to allow operations directly on the compressed image. The compression scheme is implemented and compared to traditional compression methods.
%B Pattern Recognition, 1996., Proceedings of the 13th International Conference on
%V 3
%P 664 - 668 vol.3
%8 1996/08//
%G eng
%R 10.1109/ICPR.1996.547029
%0 Conference Paper
%B Document Analysis and Recognition, 1995., Proceedings of the Third International Conference on
%D 1995
%T Robust table-form structure analysis based on box-driven reasoning
%A Hori,O.
%A David Doermann
%K analysis;table
%K analysis;touching
%K BDR;box
%K characters;character
%K document
%K domain;robust
%K driven
%K form
%K image
%K images;document
%K lines;document
%K mechanisms;
%K PROCESSING
%K processing;inference
%K reasoning;broken
%K recognition;data
%K structure
%K structures;document
%K table
%X Table-form document structure analysis is an important problem in the document processing domain. The paper presents a method called Box Driven Reasoning (BDR) to robustly analyze the structure of table-form documents which include touching characters and broken lines. Most previous methods employ a line-oriented approach. Real documents are copied repeatedly and overlaid with printed data, resulting in characters which touch cells and lines which are broken. BDR deals with regions directly, in contrast with other previous methods. Experimental tests show that BDR reliably recognizes cells and strings in document images with touching characters and broken lines.
%B Document Analysis and Recognition, 1995., Proceedings of the Third International Conference on
%V 1
%P 218 - 221 vol.1
%8 1995/08//
%G eng
%R 10.1109/ICDAR.1995.598980
%0 Conference Paper
%B Computer Vision, 1995. Proceedings., Fifth International Conference on
%D 1995
%T Stochastic completion fields: a neural model of illusory contour shape and salience
%A Williams,L. R
%A Jacobs, David W.
%K boundary
%K completion
%K computational
%K Computer
%K contour
%K contours;
%K convolutions;
%K cortex;
%K curves
%K detection;
%K distribution;
%K edge
%K energy;
%K estimation;
%K fields;
%K fragments;
%K geometry;
%K illusory
%K image
%K lattice;
%K least
%K likelihood
%K mammalian
%K maximum
%K model;
%K nets;
%K neural
%K of
%K paths;
%K plane;
%K probability
%K probability;
%K random
%K recognition;
%K shape;
%K stimuli;
%K Stochastic
%K vector-field
%K visual
%K walk;
%X We describe an algorithm and representation-level theory of illusory contour shape and salience. Unlike previous theories, our model is derived from a single assumption: namely, that the prior probability distribution of boundary completion shape can be modeled by a random walk in a lattice whose points are positions and orientations in the image plane (i.e., the space which one can reasonably assume is represented by neurons of the mammalian visual cortex). Our model does not employ numerical relaxation or other explicit minimization, but instead relies on the fact that the probability that a particle following a random walk will pass through a given position and orientation on a path joining two boundary fragments can be computed directly as the product of two vector-field convolutions. We show that for the random walk we define, the maximum likelihood paths are curves of least energy; that is, on average, random walks follow paths commonly assumed to model the shape of illusory contours. A computer model is demonstrated on numerous illusory contour stimuli from the literature.
%B Computer Vision, 1995. Proceedings., Fifth International Conference on
%P 408 - 415
%8 1995/06//
%G eng
%R 10.1109/ICCV.1995.466910
%0 Conference Paper
%B Computer Vision and Pattern Recognition, 1994. Proceedings CVPR '94., 1994 IEEE Computer Society Conference on
%D 1994
%T Error propagation in full 3D-from-2D object recognition
%A Alter,T. D.
%A Jacobs, David W.
%K 3D-from-2D
%K algorithm;
%K error
%K extraction;
%K feature
%K features;
%K handling;
%K image
%K initial
%K linear
%K matches;
%K object
%K Programming
%K programming;
%K propagation;
%K recognition
%K recognition;
%K robust
%K sequences;
%K systems;
%K Uncertainty
%K uncertainty;
%X Robust recognition systems require a careful understanding of the effects of error in sensed features. Error in these image features results in uncertainty in the possible image location of each additional model feature. We present an accurate, analytic approximation for this uncertainty when model poses are based on matching three image and model points. This result applies to objects that are fully three-dimensional, where past results considered only two-dimensional objects. Further, we introduce a linear programming algorithm to compute this uncertainty when poses are based on any number of initial matches.
%B Computer Vision and Pattern Recognition, 1994. Proceedings CVPR '94., 1994 IEEE Computer Society Conference on
%P 892 - 898
%8 1994/06//
%G eng
%R 10.1109/CVPR.1994.323920
%0 Conference Paper
%B Pattern Recognition, 1994. Vol. 1 - Conference A: Computer Vision Image Processing., Proceedings of the 12th IAPR International Conference on
%D 1994
%T Finding structurally consistent motion correspondences
%A Jacobs, David W.
%A Chennubhotla,C.
%K 3D
%K boundaries;
%K common
%K consistent
%K correspondences;
%K estimation;
%K features;
%K image
%K independent
%K linear
%K MOTION
%K motion;
%K occlusion
%K programming;
%K scene
%K segmentation;
%K specularities;
%K structurally
%K structure;
%K tracked
%X Much work on deriving scene structure and motion from features assumes as input a set of tracked image features that share a common 3D motion. Producing this input requires segmenting independent motions, and detecting image features that do not correspond to 3D features, originating instead, for example, in occlusion boundaries or specularities. We derive a linear program that tells when a set of tracked points might have come from 3D points that share a single motion, assuming affine motion and bounded error. We can also use linear programming to place conservative bounds on the structure of the scene that corresponds to tracked points. We implement and test this algorithm on real images.
%B Pattern Recognition, 1994. Vol. 1 - Conference A: Computer Vision Image Processing., Proceedings of the 12th IAPR International Conference on
%V 1
%P 650 - 653 vol.1
%8 1994/10//
%G eng
%R 10.1109/ICPR.1994.576388
%0 Conference Paper
%B Motion of Non-Rigid and Articulated Objects, 1994., Proceedings of the 1994 IEEE Workshop on
%D 1994
%T Segmenting independently moving, noisy points
%A Jacobs, David W.
%A Chennubhotla,C.
%K 3D
%K common
%K consistent
%K estimation;
%K features;
%K image
%K independently
%K linear
%K MOTION
%K motion;
%K moving
%K noisy
%K point
%K points;
%K programming;
%K real
%K segmentation;
%K sequence;
%K sequences;
%K video
%X There has been much work on using point features tracked through a video sequence to determine structure and motion. In many situations, to use this work, we must first isolate subsets of points that share a common motion. This is hard because we must distinguish between independent motions and apparent deviations from a single motion due to noise. We propose several methods of searching for point-sets with consistent 3D motions. We analyze the potential sensitivity of each method for detecting independent motions, and experiment with each method on a real image sequence.
%B Motion of Non-Rigid and Articulated Objects, 1994., Proceedings of the 1994 IEEE Workshop on
%P 96 - 103
%8 1994/11//
%G eng
%R 10.1109/MNRAO.1994.346249
%0 Conference Paper
%B Computer Vision and Pattern Recognition, 1993. Proceedings CVPR '93., 1993 IEEE Computer Society Conference on
%D 1993
%T 2D images of 3-D oriented points
%A Jacobs, David W.
%K 2D
%K 3-D
%K database
%K derivation;
%K image
%K images;
%K indexing;
%K linear
%K model
%K nonrigid
%K oriented
%K points;
%K processing;
%K recovery;
%K structure-from-motion
%K transformation;
%X A number of vision problems have been shown to become simpler when one models projection from 3-D to 2-D as a nonrigid linear transformation. These results have been largely restricted to models and scenes that consist only of 3-D points. It is shown that, with this projection model, several vision tasks become fundamentally more complex in the somewhat more complicated domain of oriented points. More space is required for indexing models in a database, more images are required to derive structure from motion, and new views of an object cannot be synthesized linearly from old views.
%B Computer Vision and Pattern Recognition, 1993. Proceedings CVPR '93., 1993 IEEE Computer Society Conference on
%P 226 - 232
%8 1993/06//
%G eng
%R 10.1109/CVPR.1993.340985
%0 Conference Paper
%B Document Analysis and Recognition, 1993., Proceedings of the Second International Conference on
%D 1993
%T Image based typographic analysis of documents
%A David Doermann
%A Furuta,R.
%K 2D
%K analysis;
%K attributes;
%K based
%K character
%K commands;
%K component
%K data
%K description
%K document
%K DVI
%K extraction;
%K feature
%K figure
%K file;
%K formatting
%K hierarchical
%K image
%K language;
%K languages;
%K layout;
%K line
%K margins;
%K page
%K placement;
%K processing;
%K read-order;
%K relationships;
%K representation;
%K spacing;
%K spatial
%K structures;
%K syntax;
%K synthesis;
%K typographic
%K understanding;
%X An approach to image-based typographic analysis of documents is provided. The problem requires a spatial understanding of the document layout as well as knowledge of the proper syntax. The system performs a page synthesis from the stream of formatting commands defined in a DVI file. Since the two-dimensional relationships between document components are not explicit in the page language, the authors develop a representation which preserves the two-dimensional layout, the read-order and the attributes of document components. From this hierarchical representation of the page layout we extract and analyze relevant typographic features such as margins, line and character spacing, and figure placement.
%B Document Analysis and Recognition, 1993., Proceedings of the Second International Conference on
%P 769 - 773
%8 1993/10//
%G eng
%R 10.1109/ICDAR.1993.395624
%0 Journal Article
%J Signal Processing, IEEE Transactions on
%D 1993
%T VLSI implementation of a tree searched vector quantizer
%A Kolagotla,R. K.
%A Yu,S.-S.
%A JaJa, Joseph F.
%K (mathematics);
%K 2
%K 20
%K chips;
%K coding;
%K compression;
%K data
%K design;
%K digital
%K image
%K implementation;
%K MHz;
%K micron;
%K PROCESSING
%K quantisation;
%K quantizer;
%K searched
%K signal
%K tree
%K TREES
%K vector
%K VLSI
%K VLSI;
%X The VLSI design and implementation of a tree-searched vector quantizer is presented. The number of processors needed is equal to the depth of the tree. All processors are identical, and data flow between processors is regular. No global control signals are needed. The processors have been fabricated using a 2 µm N-well process on a 7.9 × 9.2 mm die. Each processor chip contains 25000 transistors and has 84 pins. The processors have been thoroughly tested at a clock frequency of 20 MHz.
%B Signal Processing, IEEE Transactions on
%V 41
%P 901 - 905
%8 1993/02//
%@ 1053-587X
%G eng
%N 2
%R 10.1109/78.193225
%0 Journal Article
%J Communications, IEEE Transactions on
%D 1983
%T Digital Image Compression by Outer Product Expansion
%A O'Leary, Dianne P.
%A Peleg,S.
%K approximation;
%K coding;
%K image
%K Least-squares
%K Transform
%X We approximate a digital image as a sum of outer products d·xy^T, where d is a real number and the vectors x and y have elements +1, -1, or 0 only. The expansion gives a least squares approximation. Work is proportional to the number of pixels; reconstruction involves only additions.
%B Communications, IEEE Transactions on
%V 31
%P 441 - 444
%8 1983/03//
%@ 0090-6778
%G eng
%N 3
%R 10.1109/TCOM.1983.1095823