GitHub’s CodeQL-based code analysis technology has been overhauled and now uses machine learning (ML) to find potential security vulnerabilities in code.
GitHub acquired the technology for CodeQL as part of its acquisition of Semmle. CodeQL is used by security research teams to perform semantic analysis of code, and GitHub has made it open source.
CodeQL builds a database containing a relational representation of the code, and then queries are run against the database to check for specific security issues. The queries are based on patterns from known security issues, and creating the patterns takes time.
GitHub’s Tiferet Gazit said:
“Manual modeling can be time-consuming, and there will always be a long line of lesser-known libraries and private code that we can’t model manually. This is where machine learning comes in.”
The CodeQL team uses samples discovered using the manual models to train deep learning neural networks that can determine if a code snippet contains a potentially risky sink.
This means CodeQL can uncover vulnerabilities even if they arise from using a library the team has never seen before. For example, CodeQL can detect SQL injection vulnerabilities related to lesser-known or closed-source database abstraction libraries.
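To make the idea of a "sink" concrete, here is a minimal Python sketch of the kind of SQL injection pattern such a scanner tries to flag. It uses the standard-library sqlite3 module purely as a stand-in for any lesser-known database abstraction library; the function names are illustrative, not from CodeQL.

```python
import sqlite3

# In-memory database; sqlite3 stands in for an arbitrary DB library.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

def find_user_unsafe(name):
    # VULNERABLE: user input is concatenated straight into the query,
    # so `name` flows into a SQL sink -- the pattern a scanner would
    # want to flag even when the library itself is unfamiliar.
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

def find_user_safe(name):
    # Safe: a parameterized query lets the driver escape the value.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()

payload = "' OR '1'='1"
print(find_user_unsafe(payload))  # injection: every row leaks
print(find_user_safe(payload))    # no rows match the literal string
```

The unsafe variant returns the whole table for the injection payload, while the parameterized variant returns nothing, which is exactly the distinction a taint-tracking query encodes.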
In terms of accuracy, the team tested CodeQL on repositories that were not in the training set, comparing the warnings detected by machine learning against those from a manual query created by a security researcher, and measured a recall of about 80% with a precision of about 60%.
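As a rough illustration of what those figures mean, recall measures how many real issues the scanner finds, while precision (the per-warning accuracy) measures how many of its warnings are real. A minimal sketch, using made-up counts chosen only to reproduce a similar trade-off, not GitHub's actual data:

```python
def precision_recall(true_pos, false_pos, false_neg):
    # Recall: fraction of real vulnerabilities that were detected.
    recall = true_pos / (true_pos + false_neg)
    # Precision: fraction of reported warnings that were genuine.
    precision = true_pos / (true_pos + false_pos)
    return precision, recall

# Illustrative counts: 80 true alerts, 53 false alarms, 20 misses
# give roughly 60% precision and 80% recall.
p, r = precision_recall(80, 53, 20)
print(round(p, 2), round(r, 2))  # prints "0.6 0.8"
```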