ABSTRACT
Modern detection systems use sensor outputs available in the deployment environment to probabilistically identify attacks. These systems are trained on past or synthetic feature vectors to create a model of anomalous or normal behavior. Thereafter, sensor outputs collected at run-time are compared to the model to identify attacks (or the lack of an attack). While this approach to detection has proven effective in many environments, it is limited to training only on features that can be reliably collected at detection time. Hence, such systems fail to leverage the often vast amount of ancillary information available from past forensic analysis and post-mortem data. In short, detection systems do not train on (and thus do not learn from) features that are unavailable or too costly to collect at run-time. Recent work proposed an alternate model construction approach that integrates forensic "privileged" information (features reliably available at training time, but not at run-time) to improve the accuracy and resilience of detection systems. In this paper, we further evaluate two of the proposed techniques for model training with privileged information: knowledge transfer and model influence. We explore the cultivation of privileged features, the efficiency of those processes, and their influence on detection accuracy. We observe that improved integration of privileged features makes the resulting detection models more accurate. Our evaluation shows that the use of privileged information yields up to an 8.2% relative decrease in detection error for fast-flux bot detection over a system with no privileged information, and up to 5.5% for malware classification.
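To make the knowledge-transfer idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy setting with one standard feature `x` and one privileged feature `p`, learns a mapping from `x` to `p` on training data, and then classifies using `x` plus the estimated `p`, so the true privileged value is never needed at run-time. The synthetic data, the least-squares mapping, the nearest-centroid classifier, the choice to train the classifier on estimated rather than true privileged values, and all function names are assumptions made for illustration.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the toy data is reproducible


def make_sample():
    """Hypothetical data: label y, standard feature x, privileged feature p."""
    y = rng.randint(0, 1)
    x = 4.0 * y + rng.gauss(0, 1)               # collectable at run-time
    p = 2.0 * y + 0.5 * x + rng.gauss(0, 0.5)   # only known at training time
    return x, p, y


def fit_line(xs, ys):
    """Ordinary least squares for y ~ a*x + b with a single input."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return a, my - a * mx


def fit_centroids(rows, labels):
    """Nearest-centroid classifier: store one mean vector per class."""
    centroids = {}
    for c in set(labels):
        pts = [r for r, l in zip(rows, labels) if l == c]
        centroids[c] = [mean(col) for col in zip(*pts)]
    return centroids


def predict(centroids, row):
    """Return the class whose centroid is closest in squared distance."""
    return min(
        centroids,
        key=lambda c: sum((u - v) ** 2 for u, v in zip(row, centroids[c])),
    )


train = [make_sample() for _ in range(400)]
test = [make_sample() for _ in range(200)]
xs = [x for x, _, _ in train]
ps = [p for _, p, _ in train]
ys = [y for _, _, y in train]

# Step 1 (training time): learn a mapping from standard to privileged feature.
a, b = fit_line(xs, ps)


def estimate_privileged(x):
    return a * x + b


# Step 2 (training time): fit the detector on x plus the estimated p.
centroids = fit_centroids([[x, estimate_privileged(x)] for x in xs], ys)


# Step 3 (run-time): only x is observed; p is filled in by the mapping.
def detect(x):
    return predict(centroids, [x, estimate_privileged(x)])


accuracy = mean(1.0 if detect(x) == y else 0.0 for x, _, y in test)
print(f"toy accuracy using estimated privileged feature: {accuracy:.3f}")
```

Because the mapping here is linear in a single feature, the estimated value adds no information beyond `x` itself; the sketch only illustrates the data flow between training time and run-time. Any accuracy gain in practice would depend on richer mappings and feature sets than this toy uses.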
Index Terms
- Feature Cultivation in Privileged Information-augmented Detection