Have you ever wished you could query your code the same way you query a SQL database? That’s exactly what GitHub’s CodeQL enables you to do. It’s a semantic code analysis engine that transforms your code into a structured database you can query to surface security vulnerabilities or discover new insights.

You don’t need to learn a thing about static analysis or structured queries to benefit from CodeQL. GitHub’s code scanning feature runs hundreds of predefined queries right out of the box—for free on public repositories or as part of GitHub Advanced Security for enterprises. There are also many more niche “query packs” available that go far beyond the default scans. But while the number of ready-made queries is growing all the time, you can also create your own queries to meet your specific needs.

We’ve been writing custom CodeQL queries at Betsson for about two years, including ones to moderate package use, research and quantify code and quality metrics, and facilitate adherence to code structure and preferred architecture design. In this guide, I share some of what we’ve learned to help you get up and running with custom queries as quickly as possible.

Set up a simple local environment

In this guide, we will use JavaScript and Visual Studio Code, but you should be able to follow along regardless of your language and code editor of choice. I invite you to fork this small repository where I collected most of the setup used for this article, including a minimal application called “health-app” that we can scan.

You need the CodeQL command-line interface (CLI) tool to create and configure databases, a language pack for your programming language of choice to convert your code into a query-able database, and one or more query packs. You can find the CLI, packs, and Visual Studio Code plugin on the CodeQL tools page. For help setting everything up, you can refer to the CodeQL CLI quick-start documentation.
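Once everything is installed, a quick sanity check can save debugging time later. The sketch below is our own helper (check_codeql is a hypothetical name, not part of the CLI). It confirms that codeql is on your PATH and that codeql resolve languages lists the extractor you need. It assumes each line of that command's output begins with the language name, which may vary between CLI versions.

```bash
#!/usr/bin/env bash
set -euo pipefail

# check_codeql is a hypothetical helper, not part of the CodeQL CLI.
check_codeql() {
  local lang="${1:-javascript}"
  if ! command -v codeql >/dev/null 2>&1; then
    echo "codeql CLI not found on PATH" >&2
    return 1
  fi
  # Assumes each output line starts with a language name,
  # e.g. "javascript (/path/to/extractor)". awk reads all input, then exits 0 on a match.
  if ! codeql resolve languages | awk -v l="$lang" '$1 == l { found = 1 } END { exit (found ? 0 : 1) }'; then
    echo "no CodeQL extractor found for: $lang" >&2
    return 1
  fi
  echo "codeql OK: $lang extractor available"
}
```

Run check_codeql (or check_codeql python, and so on) after sourcing the file.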

You’ll do most of your CodeQL work in the plugin for Visual Studio Code or a similar plugin for your code editor of choice. The Visual Studio Code plugin enables you to connect to different scan targets, design queries using IntelliSense, run scans, and view their results from the SARIF (Static Analysis Results Interchange Format) reports they produce.

Create and run your first custom CodeQL query

Before you can run a scan, you need the following:

  1. The project’s source code or repository
  2. A CodeQL database built from that repository
  3. A CodeQL configuration file for the project

Remember, your CodeQL setup, which includes scripts, packages, and databases, lives in its own directory, separate from the code you’re scanning.

Run all of the following commands from the project’s root directory. (If you’re using my health-app repository, all of this has been done already.) Alternatively, consider checking this article on how you can set up …

Initialize a CodeQL pack by running the following (“.” stands for the current directory):

```bash
codeql pack init -d . codeql
```

This creates a qlpack.yml file in a new subdirectory called codeql in the project’s root directory.

Configure qlpack.yml by adding the JavaScript library pack as a dependency:

```bash
codeql pack add --dir ./codeql codeql/javascript-all
```

Create a database from your codebase:

```bash
codeql database create codeql/db -s . -l javascript
```

This creates the database in a new subdirectory called db inside the codeql directory (codeql/db).
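If you expect to rebuild the database often (for example, after pulling new changes), you can wrap the three commands above in a small script. This is a sketch under our own assumptions: setup_codeql is a hypothetical helper that takes the project root as its first argument, skips pack initialization when codeql/qlpack.yml already exists, and passes the CLI's --overwrite flag so that an existing codeql/db is replaced instead of causing an error.

```bash
#!/usr/bin/env bash
set -euo pipefail

# setup_codeql is a hypothetical helper wrapping the three setup steps.
setup_codeql() {
  local root="${1:-.}"
  # Step 1: create codeql/qlpack.yml, but only the first time.
  if [ ! -f "$root/codeql/qlpack.yml" ]; then
    codeql pack init -d "$root" codeql
  fi
  # Step 2: add the JavaScript library pack as a dependency.
  codeql pack add --dir "$root/codeql" codeql/javascript-all
  # Step 3: extract the source into a database; --overwrite replaces an old codeql/db.
  codeql database create "$root/codeql/db" -s "$root" -l javascript --overwrite
}

# Usage (from the project root, after sourcing this file): setup_codeql .
```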

Now let’s make our first custom query! Create a new file in your code editor with the following:

```ql
import javascript

from PackageDependencies deps, string name
where deps.getADependency(name, _)
select deps, "Dependency found: '" + name + "'."
```

This simple query returns all of a project’s dependencies. Save it as deps.ql inside the newly created codeql subdirectory of the project’s directory. Of course, you can create far more interesting and sophisticated queries, but let’s start here.

From the VS Code plugin, select the db directory you just created. Then right-click anywhere within the .ql file to run the first scan. The query should produce a list of package.json dependencies.
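You can also run the same query from a terminal, which is handy for scripting. A sketch, assuming the database is in codeql/db and the query was saved as codeql/deps.ql; run_deps_query and the deps.bqrs file name are hypothetical. codeql query run stores raw results in a binary BQRS file, and codeql bqrs decode converts them to CSV.

```bash
#!/usr/bin/env bash
set -euo pipefail

# run_deps_query is a hypothetical helper; the defaults assume the layout built above.
run_deps_query() {
  local db="${1:-codeql/db}"
  local query="${2:-codeql/deps.ql}"
  local out="${3:-deps.bqrs}"
  # Evaluate the query; raw results go to a binary .bqrs file.
  codeql query run --database="$db" --output="$out" "$query"
  # Decode the result set to CSV on stdout so it can be diffed or post-processed.
  codeql bqrs decode --format=csv "$out"
}

# Usage (from the project root): run_deps_query > deps.csv
```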

You can perform many different types of scans with CodeQL. For example, you could block vulnerable log4j usage at scale by disallowing the affected versions of the package. Similarly, you could update the example query above to flag a specific library (dotenv in our case) and assign it an appropriate security severity level (read more about security severity and alert settings for the available options):

```ql
/**
 * @name dependencies
 * @description finds and lists referenced dependencies
 * @kind problem
 * @problem.severity error
 * @security-severity 10.0
 * @tags setup_check
 * @id setup
 */

import javascript

from PackageDependencies deps, string name
where
  deps.getADependency(name, _) and
  name.matches("dotenv")
select deps, "Dependency found: '" + name + "'."
```
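Because this version of the query carries metadata (notably @kind and @id), it can also be run with codeql database analyze, which interprets the results and writes a SARIF file similar to what code scanning consumes. A sketch; analyze_to_sarif and the deps.sarif output name are our own choices, and the query path again assumes codeql/deps.ql.

```bash
#!/usr/bin/env bash
set -euo pipefail

# analyze_to_sarif is a hypothetical helper; the defaults assume the layout used above.
analyze_to_sarif() {
  local db="${1:-codeql/db}"
  local query="${2:-codeql/deps.ql}"
  local out="${3:-deps.sarif}"
  # "database analyze" relies on the query metadata to interpret results;
  # sarif-latest selects the newest SARIF version the CLI supports.
  codeql database analyze "$db" "$query" --format=sarif-latest --output="$out"
}
```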

You can learn more about static analysis and using CodeQL for vulnerability detection from GitHub’s recent tutorial. A more exotic use for CodeQL would be implementing fitness functions to proactively pursue architectural designs in a measurable way.

As you can see, running custom queries locally is quite simple. Now let’s take it up a level with GitHub Actions.

Automating CodeQL query scans with GitHub Actions

The easiest way to run a custom query with GitHub Actions is with GitHub’s CodeQL Analysis workflow, which uses GitHub’s CodeQL action. It has three main components: setup, runner, and reporter. The setup and runner components are pretty self-explanatory. The reporter uploads scan results and a snapshot of your database to your repository context store, and makes them available in your Security tab. The best part is that you can download the database using a GitHub API call, should you want to investigate further or explore results in a semi-manual mode.
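As a sketch of that API call, the helper below uses the GitHub CLI against the REST endpoint for fetching a repository's CodeQL database (GET /repos/{owner}/{repo}/code-scanning/codeql/databases/{language}). It assumes gh is installed and authenticated with access to the repository's code scanning data; download_codeql_db and the output file name are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# download_codeql_db is a hypothetical helper around the REST endpoint
# GET /repos/{owner}/{repo}/code-scanning/codeql/databases/{language}.
download_codeql_db() {
  local repo="$1"                        # "owner/name"
  local lang="${2:-javascript}"
  local out="${3:-codeql-db-${lang}.zip}"
  # Requesting application/zip returns the database archive instead of JSON metadata.
  gh api "repos/${repo}/code-scanning/codeql/databases/${lang}" \
    -H "Accept: application/zip" > "$out"
  echo "$out"
}
```

You should then be able to add the downloaded archive as a database in the VS Code extension and query it locally.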

To run the custom dependency query we created above, be sure to add both the .ql and qlpack.yml files to your repository. Then set up the Actions workflow.

If you haven’t already enabled GitHub Actions for the repository, click Actions under your repository name. If you cannot see the Actions tab, select the “…” dropdown menu, then click Actions. Then click the button that says I understand my workflows, go ahead and enable them.

On the Actions tab, click New workflow and search for CodeQL Analysis. There should be one result. Click the Configure button.

You should see an Actions workflow YAML file. Add this line under the with: key of the github/codeql-action/init step, indented to match the existing keys there:

```yaml
queries: +./${{ env.CI_TMP_DIR }}/codeql/deps.ql
```

Click Commit. This should kick off a CodeQL scan. When the scan is complete you should see something like this in the repository’s Security tab:

[Screenshot: CodeQL Query results in the Security tab]

Note: If you’re using my health-app repository, please be aware that the included codeql-custom.yml workflow requires GitHub Advanced Security. If you don’t have Advanced Security, you can still test the custom workflow by following the steps above.

While this process will work for testing our workflow, in the long run, it’s better to use a custom CodeQL configuration file, not the Actions workflow, to manage which custom queries you run.
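A minimal configuration file that loads the custom query could look like the one written by this sketch. write_codeql_config is a hypothetical helper, and .github/codeql/codeql-config.yml is a conventional location rather than a required one. Queries listed under queries with uses run in addition to the default query suite.

```bash
#!/usr/bin/env bash
set -euo pipefail

# write_codeql_config is a hypothetical helper; the target directory is an assumption.
write_codeql_config() {
  local dir="${1:-.github/codeql}"
  mkdir -p "$dir"
  # Entries under "queries" are added on top of the default query suite.
  cat > "$dir/codeql-config.yml" <<'EOF'
name: "Custom CodeQL config"
queries:
  - uses: ./codeql/deps.ql
EOF
  echo "$dir/codeql-config.yml"
}
```

The init step would then reference the file with config-file: ./.github/codeql/codeql-config.yml in place of the queries line added earlier.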

Exploring further possibilities with CodeQL queries

You can create multiple-language or multiple-configuration setups to quickly gather more information from a single run or perform multiple scans at once. For example, instead of specifying languages up front, you can automatically detect which languages are used in a repository and spawn appropriate scans based on the results. Here is an example that uses the GitHub CLI to list the languages GitHub detects in a repository:

```bash
gh api repos/${{ env.CI_REPOSITORY }}/languages -q 'keys[]'
```
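GitHub's language names don't always match CodeQL's language identifiers (TypeScript, for instance, is analyzed by the javascript extractor), so the list usually needs mapping before it can drive one scan per language. A sketch: to_codeql_lang and codeql_languages are hypothetical helpers, and the mapping covers only common CodeQL-supported languages; check the CodeQL documentation for the current list and identifiers.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical mapping from GitHub language names to CodeQL language identifiers.
# Languages CodeQL does not analyze (HTML, CSS, Shell, ...) produce no output.
to_codeql_lang() {
  case "$1" in
    JavaScript|TypeScript) echo javascript ;;
    Python) echo python ;;
    Java|Kotlin) echo java ;;
    C|'C++') echo cpp ;;
    'C#') echo csharp ;;
    Go) echo go ;;
    Ruby) echo ruby ;;
    Swift) echo swift ;;
    *) ;;
  esac
}

# Prints the distinct CodeQL languages detected in a repository ("owner/name").
codeql_languages() {
  gh api "repos/$1/languages" -q 'keys[]' |
    while IFS= read -r lang; do
      to_codeql_lang "$lang"
    done |
    sort -u
}
```

In a workflow, this output could feed a job matrix so that each detected language gets its own CodeQL scan.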

For more options, see GitHub’s documentation on customizing your CodeQL scans.

Make something with a CodeQL query. Share it!

Of course, we’ve only scratched the surface of what you can do with CodeQL queries. There is much more to be discovered in the documentation and the application itself. As you explore this powerful platform, you’ll probably find yourself making things that other people can use. If you create a query that could be useful in practically all codebases, you can submit your query to the open-source CodeQL query repository. If it’s a bit more niche (for example, a query that’s only applicable to actions written in JavaScript), you can create your own query pack and share it through GitHub Packages. I look forward to seeing what you come up with.
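Publishing is done with codeql pack publish. Packs published to a registry need a scoped name in qlpack.yml (for example, my-org/health-queries, a hypothetical name), unlike the unscoped codeql pack created earlier. The sketch below checks for a scope before publishing; publish_query_pack is our own helper, and it assumes a GITHUB_TOKEN with permission to write packages is set in the environment.

```bash
#!/usr/bin/env bash
set -euo pipefail

# publish_query_pack is a hypothetical helper. The CLI is expected to read
# GITHUB_TOKEN from the environment when publishing to GitHub's registry.
publish_query_pack() {
  local dir="${1:-codeql}"
  # Registry packs need a "<scope>/<name>" pack name in qlpack.yml.
  if ! grep -Eq '^name:[[:space:]]*[^/[:space:]]+/[^/[:space:]]+' "$dir/qlpack.yml"; then
    echo "qlpack.yml in $dir needs a scoped name such as my-org/health-queries" >&2
    return 1
  fi
  codeql pack publish "$dir"
}
```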

Conclusion

Writing custom CodeQL queries is a practical way to strengthen your code analysis. In this guide, you set up a local environment, wrote and ran a custom query, and automated it with GitHub Actions. From here, explore the documentation and start building queries that fit your own codebase.