“Automatic Feature Engineering: Learning to Detect Malware by Mining the Scientific Literature”
Location: LTS Auditorium, 8080 Greenmead Drive
The detection of malware and network attacks increasingly relies on machine learning techniques, which use multiple features to separate malicious from benign behaviors. The effectiveness of these techniques depends primarily on the feature engineering process, which is based on human knowledge and intuition. However, given the adversaries’ efforts to evade detection and the growing volume of security reports and publications, human-driven feature engineering likely draws on only a fraction of the relevant knowledge.
In this talk, I will present an approach to engineering such features automatically by mining natural language documents such as research papers, industry reports and hacker forums. We use techniques inspired by IBM’s Watson question answering system, and we address challenges and opportunities specific to the security domain. As a proof of concept, we train a classifier with automatically engineered features for detecting Android malware, and we achieve performance comparable to that of a state-of-the-art malware detector that uses manually engineered features. In addition, our techniques can suggest informative features that are absent from the manually engineered set, and they can link the generated features to human-understandable concepts that describe malware behaviors.
Finally, I will discuss the remaining challenges for automatically extracting semantic security insights from natural language and the opportunities that this direction opens for understanding and predicting adversary behaviors.
Tudor Dumitras is an assistant professor in the Electrical and Computer Engineering Department and a member of UMIACS at the University of Maryland.
His research focuses on Big Data approaches to problems in system security and dependability. In his previous role at Symantec Research Labs, he built the Worldwide Intelligence Network Environment (WINE), a platform for experimenting with Big Data techniques.
Dumitras received an honorable mention in the NSA competition for the Best Scientific Cybersecurity Paper of 2012. He was also the recipient of the 2011 A. G. Jordan Award from the ECE Department at Carnegie Mellon University, the 2009 John Vlissides Award from ACM SIGPLAN, and the Best Paper Award at ASP-DAC’03.
He holds a doctorate in electrical and computer engineering from Carnegie Mellon University.