Microsoft has announced a new program that it says can distinguish between security and non-security software bugs 99 percent of the time.
Microsoft used a data set of 13 million work items and bugs from 47,000 developers, stored across Azure DevOps and GitHub repositories, to build a process and machine learning model that correctly distinguishes between security and non-security bugs.
Additionally, the program can reliably detect critical, high-priority security bugs 97 percent of the time on average.
The company plans to open source the technique on GitHub in the coming months along with examples of models and other tools so the program can be used to help human experts.
In designing the model, security experts approved the training data, and statistical sampling was used to give those experts a manageable amount of data to review. The data was then encoded into representations called feature vectors, and Microsoft researchers built the model using a two-step process.
The model first learned to distinguish security bugs from non-security bugs, and then learned to apply severity labels (critical, significant or low-impact) to the security bugs.
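To make the two-step idea concrete, here is a minimal sketch in Python. It is not Microsoft's implementation: the bag-of-words encoding, the small naive Bayes classifier, the toy bug titles, and the names `featurize`, `NaiveBayes` and `classify` are all assumptions chosen for illustration. It shows bug titles being turned into feature vectors, a first classifier separating security from non-security bugs, and a second classifier, trained only on security bugs, assigning a severity label.

```python
import math
from collections import Counter

# Toy, made-up training data: (bug title, severity), where None = non-security.
TRAINING = [
    ("remote code execution in file parser", "critical"),
    ("sql injection in login form", "critical"),
    ("xss in comment preview", "significant"),
    ("password logged in debug output", "significant"),
    ("verbose error page reveals version", "low-impact"),
    ("button misaligned on settings page", None),
    ("typo in welcome email", None),
    ("crash when sorting empty list", None),
    ("slow page load on dashboard", None),
]

# Vocabulary: every distinct word seen in the training titles.
VOCAB = sorted({word for title, _ in TRAINING for word in title.split()})


def featurize(title, vocab):
    """Encode a title as a bag-of-words count vector (one slot per vocab word)."""
    counts = Counter(title.lower().split())
    return [counts[word] for word in vocab]


class NaiveBayes:
    """Small multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.log_prior = {}
        self.log_like = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            # Per-word counts for this class, plus one for smoothing.
            totals = [sum(column) + 1 for column in zip(*rows)]
            denom = sum(totals)
            self.log_like[c] = [math.log(t / denom) for t in totals]
        return self

    def predict(self, x):
        def score(c):
            return self.log_prior[c] + sum(
                n * ll for n, ll in zip(x, self.log_like[c])
            )
        return max(self.classes, key=score)


# Step 1: security vs. non-security, trained on every item.
X_all = [featurize(title, VOCAB) for title, _ in TRAINING]
y_kind = ["security" if sev else "non-security" for _, sev in TRAINING]
stage1 = NaiveBayes().fit(X_all, y_kind)

# Step 2: severity, trained only on the items that are security bugs.
security_rows = [(title, sev) for title, sev in TRAINING if sev]
X_sec = [featurize(title, VOCAB) for title, _ in security_rows]
y_sev = [sev for _, sev in security_rows]
stage2 = NaiveBayes().fit(X_sec, y_sev)


def classify(title):
    """Return (kind, severity); severity is None for non-security bugs."""
    x = featurize(title, VOCAB)
    if stage1.predict(x) == "non-security":
        return ("non-security", None)
    return ("security", stage2.predict(x))


print(classify("sql injection in search form"))
print(classify("typo in settings page"))
```

Splitting the task this way means the severity classifier never has to learn from the much larger pool of non-security items. A production system would need far more data, a stronger encoding, and the expert review of training data described above.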
In a blog post announcing the new system, Microsoft explained how it used machine learning models and security experts to better identify security bugs, saying:
“Every day, software developers stare down a long list of features and bugs that need to be addressed. Security professionals try to help by using automated tools to prioritize security bugs, but too often, engineers waste time on false positives or miss a critical security vulnerability that has been misclassified. To tackle this problem data science and security teams came together to explore how machine learning could help. We discovered that by pairing machine learning models with security experts, we can significantly improve the identification and classification of security bugs.”