IACyC Proceedings - 'JaVul': A Novel Java Dataset for Code Vulnerability Detection Based on CWE Labeling

Conference papers

Authors

Klesida Gjana, Hristina Mihajloska and Emrullah Fatih Yetkin

Abstract

Overlooked code vulnerabilities are among the most significant causes of exploitable systems. To improve the machine learning (ML) models used to predict these vulnerabilities, researchers must carefully analyze the quality and relevance of the available data. This study addresses the need for a dataset labeled with CWE (Common Weakness Enumeration) identifiers to support automated vulnerability detection in Java/Spring Boot applications using advanced ML techniques.
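To make "CWE labeling" concrete, here is a minimal, hypothetical sketch of what a single labeled sample might look like, paired with a naive regex-based token representation of its code. The record `LabeledSample`, its fields, the `tokenize` helper, and the sample ID are illustrative assumptions only. They do not describe the actual JaVul schema or preprocessing, and a practical token representation would normally use a real Java lexer rather than a regular expression.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: a CWE-labeled Java code sample plus a naive token
 * representation. Field names and the tokenizer are assumptions, not the
 * JaVul dataset format.
 */
public class CweSampleDemo {

    /** One labeled sample: a code fragment and the CWE IDs it exhibits (empty if none). */
    record LabeledSample(String id, String code, List<String> cweIds) {
        boolean isVulnerable() {
            return !cweIds.isEmpty();
        }
    }

    // Matches identifiers/keywords, integer literals, double-quoted string
    // literals (with escapes), or any single non-whitespace symbol.
    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*|\\d+|\"(?:\\\\.|[^\"\\\\])*\"|\\S");

    /** Splits source text into a flat list of lexical tokens (no parsing). */
    static List<String> tokenize(String code) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(code);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Concatenating user input into a SQL query: a classic CWE-89 (SQL Injection) pattern.
        String code = "String q = \"SELECT * FROM users WHERE name = '\" + name + \"'\";";
        LabeledSample sample = new LabeledSample("sample-001", code, List.of("CWE-89"));

        System.out.println(sample.id() + " vulnerable=" + sample.isVulnerable()
                + " labels=" + sample.cweIds());
        System.out.println(tokenize(sample.code()));
    }
}
```

Running `main` prints the sample's labels followed by its token sequence. A graph representation, by contrast, would parse the code into a structure such as an abstract syntax tree, which this sketch does not attempt.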

Keywords

vulnerability detection, AI-driven security, graph representation, token representation, Java code, CWE labels