research-article

Open Access

STIXnet: A Novel and Modular Solution for Extracting All STIX Objects in CTI Reports

Authors:
Francesco Marchiori, University of Padova, Italy (ORCID: 0000-0001-5282-0965)
Mauro Conti, University of Padova, Italy (ORCID: 0000-0002-3612-1934)
Nino Vincenzo Verde, Leonardo S.p.A., Italy (ORCID: 0000-0002-2379-6414)

ARES '23: Proceedings of the 18th International Conference on Availability, Reliability and Security, August 2023, Article No.: 3, Pages 1–11, https://doi.org/10.1145/3600160.3600182

Published: 29 August 2023

ABSTRACT

The automatic extraction of information from Cyber Threat Intelligence (CTI) reports is crucial in risk management. The increasing frequency with which these reports are published has led researchers to develop new systems for automatically recovering different types of entities and relations from textual data. Most state-of-the-art models leverage Natural Language Processing (NLP) techniques, which perform well when extracting a few types of entities at a time but cannot detect heterogeneous data or their relations. Furthermore, several paradigms, such as STIX, have become de facto standards in the CTI community and dictate a formal categorization of entities and relations so that organizations can share data consistently.
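To make the idea of a formal categorization concrete, the following is a minimal sketch of how two entities and one relation might be written in the general shape of STIX 2.1 objects, using plain Python dictionaries. The names ExampleGroup and ExampleRAT and the helper functions are hypothetical and do not come from the paper; the sketch illustrates the data format only, not how STIXnet produces it.

```python
import json
import uuid
from datetime import datetime, timezone


def make_id(object_type: str) -> str:
    """Build a STIX-style identifier of the form '<type>--<UUID>'."""
    return f"{object_type}--{uuid.uuid4()}"


def utc_timestamp() -> str:
    """Return a UTC timestamp with millisecond precision and a 'Z' suffix."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")
    return stamp[:-3] + "Z"  # trim microseconds to milliseconds


def make_object(object_type: str, **properties) -> dict:
    """Assemble a dict with the common STIX 2.1 properties plus extras."""
    stamp = utc_timestamp()
    return {
        "type": object_type,
        "spec_version": "2.1",
        "id": make_id(object_type),
        "created": stamp,
        "modified": stamp,
        **properties,
    }


# Two entities (hypothetical names) and the relation that links them.
actor = make_object("threat-actor", name="ExampleGroup")
malware = make_object("malware", name="ExampleRAT", is_family=False)
relation = make_object(
    "relationship",
    relationship_type="uses",
    source_ref=actor["id"],
    target_ref=malware["id"],
)

# A bundle is the container used to share a set of objects together.
bundle = {"type": "bundle", "id": make_id("bundle"),
          "objects": [actor, malware, relation]}
print(json.dumps(bundle, indent=2))
```

Because the relation refers to its endpoints by identifier, any consumer that understands the format can rebuild the same graph of entities and relations from the shared bundle.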

This paper presents STIXnet, the first solution for the automated extraction of all STIX entities and relationships in CTI reports. Using NLP techniques and an interactive Knowledge Base (KB) of entities, our approach obtains F1 scores comparable to state-of-the-art models for entity extraction (0.916) and relation extraction (0.724) while considering significantly more types of entities and relations. Moreover, STIXnet is a modular and extensible framework that manages and coordinates different modules to merge their contributions uniquely and exhaustively. With our approach, researchers and organizations can extend their Information Extraction (IE) capabilities by integrating the efforts of several techniques without needing to develop new tools from scratch.
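As a reminder of what the reported F1 scores measure, here is a minimal sketch that computes F1 as the harmonic mean of precision and recall over sets of extracted items. The abstract does not state the evaluation protocol, so exact matching on (type, text) pairs is an assumption, and the example entities are hypothetical.

```python
def f1_score(predicted: set, gold: set) -> float:
    """F1 over exact matches between predicted and gold items."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0  # also covers empty predicted or gold sets
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold annotations and system output for one report.
gold = {
    ("threat-actor", "ExampleGroup"),
    ("malware", "ExampleRAT"),
    ("tool", "ExampleTool"),
}
predicted = {
    ("threat-actor", "ExampleGroup"),
    ("malware", "ExampleRAT"),
    ("malware", "ExampleTool"),  # right text, wrong type: counted as an error
}

score = f1_score(predicted, gold)
print(f"F1 = {score:.3f}")
```

Under exact matching, a mention with the right text but the wrong type counts as both a false positive and a false negative, so the number of entity types a system handles directly affects how hard the metric is to satisfy.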

Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Information extraction
2. Security and privacy

Published in

ARES '23: Proceedings of the 18th International Conference on Availability, Reliability and Security
August 2023
1440 pages
ISBN: 9798400707728
DOI: 10.1145/3600160

Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Cyber Threat Intelligence
Information Extraction
Natural Language Processing
STIX
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 228 of 451 submissions, 51%