DOI: 10.1145/3600160.3600182
research-article
Open Access

STIXnet: A Novel and Modular Solution for Extracting All STIX Objects in CTI Reports

Published: 29 August 2023

ABSTRACT

The automatic extraction of information from Cyber Threat Intelligence (CTI) reports is crucial in risk management. The increasing frequency with which these reports are published has led researchers to develop new systems for automatically recovering different types of entities and relations from textual data. Most state-of-the-art models leverage Natural Language Processing (NLP) techniques, which perform well when extracting a few types of entities at a time but cannot detect heterogeneous data or their relations. Furthermore, several paradigms, such as STIX, have become de facto standards in the CTI community and dictate a formal categorization of different entities and relations to enable organizations to share data consistently.

This paper presents STIXnet, the first solution for the automated extraction of all STIX entities and relationships in CTI reports. Through the use of NLP techniques and an interactive Knowledge Base (KB) of entities, our approach obtains F1 scores comparable to state-of-the-art models for entity extraction (0.916) and relation extraction (0.724) while considering significantly more types of entities and relations. Moreover, STIXnet constitutes a modular and extensible framework that manages and coordinates different modules to merge their contributions uniquely and exhaustively. With our approach, researchers and organizations can extend their Information Extraction (IE) capabilities by integrating the efforts of several techniques without needing to develop new tools from scratch.
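To make the idea of a modular framework that merges module outputs "uniquely and exhaustively" more concrete, the following is a minimal Python sketch. It is not STIXnet's implementation: the `Entity` record, the two toy extractors (`regex_ipv4` and `make_kb_lookup`), and the merge rule in `extract_all` are all hypothetical names and choices made for illustration. The only assumption taken from the abstract is that several independent modules, some rule-based and some backed by a knowledge base, each propose typed entities and a coordinator combines them.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

@dataclass(frozen=True)
class Entity:
    """A typed text span (hypothetical record, not a STIXnet class)."""
    stix_type: str  # STIX type name, e.g. "ipv4-addr" or "malware"
    text: str
    start: int
    end: int

# Every module is just a function from report text to candidate entities.
Extractor = Callable[[str], Iterable[Entity]]

IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def regex_ipv4(text: str) -> Iterable[Entity]:
    """Toy rule-based module: dotted-quad matches (octet range not validated)."""
    for m in IPV4_PATTERN.finditer(text):
        yield Entity("ipv4-addr", m.group(), m.start(), m.end())

def make_kb_lookup(kb: Dict[str, str]) -> Extractor:
    """Toy knowledge-base module: case-insensitive lookup of known names."""
    def lookup(text: str) -> Iterable[Entity]:
        for name, stix_type in kb.items():
            for m in re.finditer(re.escape(name), text, re.IGNORECASE):
                yield Entity(stix_type, m.group(), m.start(), m.end())
    return lookup

def extract_all(text: str, extractors: List[Extractor]) -> List[Entity]:
    """Union the outputs of all modules, keeping one entity per (span, type)."""
    merged = {}
    for extractor in extractors:
        for ent in extractor(text):
            merged.setdefault((ent.start, ent.end, ent.stix_type), ent)
    return sorted(merged.values(), key=lambda e: (e.start, e.end))
```

In this sketch, a new extraction technique is added by appending another function to the `extractors` list; the coordinator does not change. A real system would also need to resolve overlapping spans with conflicting types, which this example deliberately leaves out.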


Published in

ARES '23: Proceedings of the 18th International Conference on Availability, Reliability and Security
August 2023
1440 pages
ISBN: 9798400707728
DOI: 10.1145/3600160

Copyright © 2023 ACM


Publisher

Association for Computing Machinery
New York, NY, United States


Qualifiers

• research-article
• Research
• Refereed limited

Acceptance Rates

Overall Acceptance Rate: 228 of 451 submissions, 51%