Hal Daumé III

Associate Professor
3227 A.V. Williams Building
(301) 405-1073
Ph.D., University of Southern California (Computer Science)

Hal Daumé III is an associate professor of computer science with appointments in the Department of Linguistics and UMIACS.

His research focuses on understanding computational properties of learning and language.

He has written more than 50 research publications on problems in natural language processing and machine learning, one of which was awarded Best Paper in 2009. He currently serves as an associate editor for three journals: Machine Learning Journal, the ACM Transactions on Speech and Language Processing, and Computational Linguistics. From 2006 to 2008, he served on the executive committee of the North American Chapter of the Association for Computational Linguistics, during which time he was a driving force in a successful push to make the Computational Linguistics journal open access. He regularly serves on the senior program committees of conferences such as the Association for Computational Linguistics, the International Conference on Machine Learning, Neural Information Processing Systems, and Empirical Methods in Natural Language Processing.

He has twice received a Dean's Commendation in teaching and has authored two free online books: "Yet Another Haskell Tutorial," which has been used in undergraduate courses at several universities, including Yale University, and independently translated into Chinese and Portuguese, and "A Course in Machine Learning."

He received his doctorate in computer science from the University of Southern California in 2006. He was an assistant professor at the University of Utah from 2006 to 2010, when he moved to the University of Maryland. He served as co-director of the Computational Linguistics and Information Processing lab from 2011 to 2012.



Teo CL, Yang Y, Daumé H, Fermüller C, Aloimonos Y.  2011.  A Corpus-Guided Framework for Robotic Visual Perception. Workshops at the Twenty-Fifth AAAI Conference on Artificial Intelligence.

Pujara J, Daumé H, Getoor L.  2011.  Using classifier cascades for scalable e-mail classification. Collaboration, Electronic Messaging, Anti-Abuse and Spam Conference, ACM International Conference Proceedings Series.


Rai P, Daumé H.  2009.  Multi-label prediction via sparse infinite CCA. Advances in Neural Information Processing Systems. 22:1518-1526.

Daumé H.  2009.  Bayesian multitask learning with latent hierarchies. Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence.

Daumé H.  2009.  Semi-supervised or semi-unsupervised? Proceedings of the NAACL HLT Workshop on Semisupervised Learning for Natural Language Processing.

Daumé H.  2009.  Unsupervised search-based structured prediction. Proceedings of the 26th Annual International Conference on Machine Learning.

Daumé H.  2009.  Non-parametric Bayesian areal linguistics. Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics.

Rai P, Daumé H, Venkatasubramanian S.  2009.  Streamed learning: one-pass SVMs. Proceedings of the 21st International Joint Conference on Artificial Intelligence.

Daumé H.  2009.  Markov random topic fields. Proceedings of the ACL-IJCNLP 2009 Conference Short Papers.

Goyal A, Daumé H, Venkatasubramanian S.  2009.  Streaming for large scale NLP: Language modeling. Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics.

Agarwal A, Daumé H.  2009.  Exponential family hybrid semi-supervised learning. Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI-09).


Daumé H.  2008.  Cross-task knowledge-constrained self training. Proceedings of the Conference on Empirical Methods in Natural Language Processing.

Liu P, Shi Q, Daumé H, Voth GA.  2008.  A Bayesian statistics approach to multiscale coarse graining. The Journal of Chemical Physics. 129:214114-214114.


Daumé H, Campbell L.  2007.  A Bayesian model for discovering typological implications. Annual Meeting of the Association for Computational Linguistics. 45:65-65.

Daumé H.  2007.  Frustratingly easy domain adaptation. Annual Meeting of the Association for Computational Linguistics. 45:256-256.


Daumé H, Marcu D.  2006.  Domain adaptation for statistical classifiers. Journal of Artificial Intelligence Research. 26(1):101-126.

Daumé H, Marcu D.  2006.  Bayesian query-focused summarization. Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics.

Daumé H, Marcu D.  2006.  A Bayesian model for supervised clustering with the Dirichlet process prior. Journal of Machine Learning Research. 6(2):1551-1551.


Daumé H, Marcu D.  2005.  Learning as search optimization: approximate large margin methods for structured prediction. Proceedings of the 22nd International Conference on Machine Learning.

Daumé H, Marcu D.  2005.  A large-scale exploration of effective global features for a joint entity detection and tracking model. Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing.

Daumé H, Langford J, Marcu D.  2005.  Search-based structured prediction as classification. NIPS Workshop on Advances in Structured Learning for Text and Speech Processing, Whistler, Canada.


Daumé H, Marcu D.  2004.  Supervised clustering with the Dirichlet process. NIPS'04 Learning With Structured Outputs Workshop.

Daumé H, Marcu D.  2004.  Generic sentence fusion is an ill-defined summarization task. Proceedings of the Text Summarization Branches Out Workshop at ACL. 4:96-103.

Daumé H, Marcu D.  2004.  A tree-position kernel for document compression. Proceedings of the Fourth Document Understanding Conference (DUC 2004).

Daumé H, Brill E.  2004.  Web search intent induction via automatic query reformulation. Proceedings of HLT-NAACL 2004: Short Papers on XX.


Daumé H, Marcu D.  2002.  A noisy-channel model for document compression. Proceedings of the 40th Annual Meeting on Association for Computational Linguistics.

Daumé H, Echihabi A, Marcu D, Munteanu D, Soricut R.  2002.  GLEANS: A generator of logical extracts and abstracts for nice summaries. Workshop on Automatic Summarization.

Daumé H, Knight K, Langkilde-Geary I, Marcu D, Yamada K.  2002.  The importance of lexicalized syntax models for natural language generation tasks. Proc. of INLG.


Nyberg E, Daumé H.  2001.  Integrated information management: an interactive, extensible architecture for information retrieval. Proceedings of the First International Conference on Human Language Technology Research.