Verifying GitHub Actions Artifacts Is Not Easy

created: April 05, 2026

tags:

reproducible builds

I wrote a few shell scripts that, given a list of URLs to JavaScript/TypeScript GitHub Actions, each paired with a commit, try to reproduce their build artifacts in a container.
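
The core loop can be sketched roughly as follows. This is not my actual shell tooling. It is a minimal Python sketch that assumes a Node.js image with git available (`node:20` here) and committed artifacts living in `dist/`. It uses a hypothetical `container_command` helper that only builds the `docker run` argument list for one action at one commit.

```python
import shlex


def container_command(repo_url: str, commit: str, build_steps: list[str],
                      image: str = "node:20", artifact_dir: str = "dist") -> list[str]:
    """Build a `docker run` argv that tries to rebuild one action at one commit.

    Hypothetical helper: the image name and artifact directory are assumptions.
    """
    steps = [
        # Fetch the source and pin it to the exact commit a workflow references.
        f"git clone --quiet {shlex.quote(repo_url)} /src",
        "cd /src",
        f"git checkout --quiet {shlex.quote(commit)}",
        # Set the committed (published) artifacts aside before the build overwrites them.
        f"cp -r {shlex.quote(artifact_dir)} /published",
        *build_steps,
    ]
    # `&&` stops at the first failing step, so a broken build is easy to spot.
    return ["docker", "run", "--rm", image, "sh", "-c", " && ".join(steps)]
```

For example, `subprocess.run(container_command(url, sha, ["npm ci", "npm run build"]))` would run one attempt. Copying `/published` and the rebuilt `dist/` out of the container for comparison (e.g. through a bind mount) is left out of this sketch.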

I obtained the list of actions by curling the 500 most-starred GitHub repositories and following their workflows' references to see which actions, at which commits, they pulled into their build environments.
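
Following those references mostly means reading the `uses:` keys in each repository's workflow files. Here is a rough Python sketch of that extraction. It uses a hypothetical `action_refs` helper and a regular expression rather than a real YAML parser. It skips local (`./...`) and `docker://` actions, and it flags whether each reference is a full 40-character commit SHA or a movable tag or branch:

```python
import re

# Matches lines such as `- uses: actions/checkout@v4` or
# `uses: github/codeql-action/init@<sha>`. Local actions have no `@ref`
# and `docker://` references have no `owner/repo` prefix, so neither matches.
USES_RE = re.compile(
    r"""^\s*(?:-\s*)?uses:\s*["']?   # optional list dash and opening quote
        (?P<repo>[\w.-]+/[\w.-]+)    # owner/repo
        (?P<path>/[^@\s"']*)?        # optional sub-directory, e.g. /init
        @(?P<ref>[^\s"'#]+)          # tag, branch or commit SHA
    """,
    re.VERBOSE | re.MULTILINE,
)

SHA_RE = re.compile(r"[0-9a-f]{40}")


def action_refs(workflow_text: str) -> list[tuple[str, str, bool]]:
    """Return (owner/repo, ref, ref_is_full_commit_sha) for each remote action."""
    return [
        (m["repo"], m["ref"], SHA_RE.fullmatch(m["ref"]) is not None)
        for m in USES_RE.finditer(workflow_text)
    ]
```

A tag like `v3` can be moved to a different commit after the fact. Only references pinned to a full SHA identify exactly which build artifacts a repository pulled in.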

Including a third-party action in your CI/CD pipeline involves inherent trust. A developer who is especially wary of rogue scripts leaking build secrets or compiling backdoors into their software might audit the action's source code for malware themselves. But they probably won't check that the committed build artifacts (often multi-megabyte minified JavaScript files with all their dependencies baked in) actually correspond to the source code. The trust that comes with something libre and publicly auditable has been abused in ecosystems like NPM to trick people into assuming that the releases actually published aren't malware.

I assumed that, since my study was limited to actions used by very important software on GitHub (the top 500 repos), those actions would be on their best behavior in making sure people can build their artifacts independently and verify that they get the same output. Lots of them come from verified authors, or are owned by GitHub themselves! At the very least, I thought this meant they would fare better than the packages in a wider study already done on the reproducibility of NPM packages.

Of the 517 actions I tried to build (I have up to 628 more I haven't gotten through yet), 166 (about 32%) didn't provide a package-lock.json file. That means they don't fully describe the exact dependency tree, with pinned versions, that they used in their development environment to create and publish their build artifacts. In these cases I was forced to install the newest versions of the dependencies, which automatically makes the build non-deterministic: the result depends on whichever versions happen to be available at build time to be compiled into the final distributable.
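
The difference boils down to which install command a rebuild can use. A minimal sketch follows, with a hypothetical `install_command` helper. It also honors `npm-shrinkwrap.json`, which `npm ci` accepts in place of `package-lock.json`:

```python
from pathlib import Path


def install_command(repo: Path) -> tuple[str, bool]:
    """Return (install command, whether the dependency tree is pinned)."""
    for lockfile in ("package-lock.json", "npm-shrinkwrap.json"):
        if (repo / lockfile).is_file():
            # `npm ci` installs exactly what the lockfile records and fails
            # if the lockfile disagrees with package.json.
            return "npm ci", True
    # Without a lockfile, npm resolves each semver range in package.json to the
    # newest matching version published at build time, so rebuilds can differ.
    return "npm install", False
```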

Furthermore, I had to dig through the build failures of over 100 actions to separate those that failed because my scripts couldn't infer the proper build commands from those that no longer build at all because of missing inputs or deprecated dependencies.

When my scripts failed to build an action or to produce comparable artifacts, it was often because the required build commands are very ad hoc. This is a barrier to automating verification and transparency in the Actions ecosystem!
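
To show why inference is fragile, here is a hypothetical Python heuristic that guesses build commands from the `scripts` section of `package.json`. The list of script names and the `build`-then-`package` rule are assumptions for illustration, not a convention the actions follow. Anything outside them falls back to `None`, meaning a human has to read the README or the action's own CI config.

```python
import json
from typing import Optional

# Hypothetical preference order for npm script names that produce dist/.
CANDIDATE_SCRIPTS = ("build", "package", "bundle", "dist", "compile")


def infer_build_steps(package_json: str) -> Optional[list[str]]:
    """Guess build commands from package.json scripts; None means 'ask a human'."""
    scripts = json.loads(package_json).get("scripts", {})
    for name in CANDIDATE_SCRIPTS:
        if name in scripts:
            steps = [f"npm run {name}"]
            # Assumed split seen in some actions: "build" runs tsc,
            # then "package" bundles the output (e.g. with ncc).
            if name == "build" and "package" in scripts:
                steps.append("npm run package")
            return steps
    return None
```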

As for the other failures, I found that a fair number of actions forgot to declare a dependency on one of three programs: tsc, ncc, or yarn. These actions are a minority of those I looked at, but I'm not sure why the omission is so common, and I can't guarantee reproducibility without knowing which version of the compiler was used! Additionally, some actions can no longer be built because they require deprecated libraries (openssl1.1). Whichever top repositories are using these actions should update to a newer commit, or drop unmaintained actions from their workflows entirely! Frighteningly, for some actions the repository at the exact commit I built did not include a source tree corresponding to the artifacts at all!
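
A check for this can be sketched by scanning `package.json` scripts for the three tools and looking for a matching entry in `dependencies` or `devDependencies`. This hypothetical `undeclared_tools` helper makes two assumptions. First, it maps each tool to a package, including the older `@zeit/ncc` name for ncc. Second, it treats a `packageManager` field starting with `yarn@` as pinning yarn. Matching tool names with a regular expression is only a heuristic:

```python
import json
import re

# Assumed mapping from tool to the package names that would pin its version.
TOOL_PACKAGES = {
    "tsc": ("typescript",),
    "ncc": ("@vercel/ncc", "@zeit/ncc"),
    "yarn": ("yarn",),
}


def undeclared_tools(package_json: str) -> set[str]:
    """Tools invoked by npm scripts whose package the manifest never declares."""
    pkg = json.loads(package_json)
    declared = set(pkg.get("dependencies", {})) | set(pkg.get("devDependencies", {}))
    # Corepack-style pin, e.g. "packageManager": "yarn@1.22.19".
    if str(pkg.get("packageManager", "")).startswith("yarn@"):
        declared.add("yarn")
    missing = set()
    for command in pkg.get("scripts", {}).values():
        for tool, packages in TOOL_PACKAGES.items():
            # Whole-word match, so "tsconfig.json" does not count as "tsc".
            used = re.search(rf"\b{re.escape(tool)}\b", command) is not None
            if used and not declared.intersection(packages):
                missing.add(tool)
    return missing
```

Note that a declared range such as `^5.0.0` still doesn't pin the compiler version unless a lockfile is present.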

Of the 401 actions I deemed fair to include in my study after reviewing the build logs (i.e., where my scripts tried the correct build commands), only 57% produced ASTs identical to those of the pre-bundled artifacts. That means 43% either didn't build or produced output different from the executables that were published.
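
Comparing ASTs rather than bytes means differences in whitespace, positions, and literal spelling don't count as mismatches. Assume both the published and the rebuilt bundle have been parsed into ESTree JSON by a JavaScript parser such as acorn, which records `start`/`end` offsets, and `loc`/`range` when asked. A minimal Python sketch of the comparison could then look like this. The hypothetical `same_ast` helper also ignores each literal's `raw` spelling, so `0x1` and `1` compare equal:

```python
import json
from typing import Any

# Keys that describe where or how a node was written, not what it does
# (acorn-style ESTree output; assumed to be the only formatting metadata).
FORMATTING_KEYS = {"start", "end", "loc", "range", "raw"}


def strip_formatting(node: Any) -> Any:
    """Recursively drop position/formatting keys from an ESTree JSON tree."""
    if isinstance(node, dict):
        return {k: strip_formatting(v) for k, v in node.items() if k not in FORMATTING_KEYS}
    if isinstance(node, list):
        return [strip_formatting(v) for v in node]
    return node


def same_ast(published_json: str, rebuilt_json: str) -> bool:
    """True if two ESTree JSON dumps differ only in positions and literal spelling."""
    return strip_formatting(json.loads(published_json)) == strip_formatting(json.loads(rebuilt_json))
```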

I still need to manually inspect why builds produced different outputs, but I have noticed a trend of differing variable names and differently ordered statements. uglify-js minifies JavaScript by renaming variables to be as short as possible, and could possibly produce nondeterministic output. Differently ordered statements might mean that different dependency versions were used, or that a similar JavaScript minifier is inherently non-deterministic!
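
One way to test the variable-renaming explanation is to rename identifiers canonically before comparing. Two bundles that differ only in minified names then produce the same tree. This hypothetical Python sketch works on ESTree JSON loaded as a `dict`. It renames only names that are declared somewhere in the file, and it leaves globals such as `require` and non-computed property names alone. It does no scope analysis and doesn't handle destructuring, so treat a match as a hint rather than proof:

```python
from typing import Any

# Node types and the fields through which they bind a new name. Only plain
# Identifier targets are handled; destructuring patterns are ignored here.
BINDING_FIELDS = {
    "VariableDeclarator": ("id",),
    "FunctionDeclaration": ("id", "params"),
    "FunctionExpression": ("id", "params"),
    "ArrowFunctionExpression": ("params",),
    "ClassDeclaration": ("id",),
    "CatchClause": ("param",),
}


def declared_names(node: Any, found: set) -> set:
    """Collect every name bound anywhere in the tree (no scope analysis)."""
    if isinstance(node, dict):
        for field in BINDING_FIELDS.get(node.get("type"), ()):
            value = node.get(field)
            for target in (value if isinstance(value, list) else [value]):
                if isinstance(target, dict) and target.get("type") == "Identifier":
                    found.add(target["name"])
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        children = []
    for child in children:
        declared_names(child, found)
    return found


def canonicalize_names(ast: Any) -> Any:
    """Rename declared identifiers to v0, v1, ... in order of first appearance."""
    declared = declared_names(ast, set())
    mapping: dict = {}

    def walk(node: Any) -> Any:
        if isinstance(node, list):
            return [walk(child) for child in node]
        if not isinstance(node, dict):
            return node
        out = {}
        for key, value in node.items():
            # `a.foo` and `{foo: 1}` name properties, not variables: keep them.
            static_name = not node.get("computed") and (
                (node.get("type") == "MemberExpression" and key == "property")
                or (node.get("type") == "Property" and key == "key"))
            out[key] = value if static_name else walk(value)
        if out.get("type") == "Identifier" and out.get("name") in declared:
            out["name"] = mapping.setdefault(out["name"], f"v{len(mapping)}")
        return out

    return walk(ast)
```

Differently ordered statements still compare as different after canonicalization, and position fields would need to be stripped separately before comparing trees from two real bundles.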

I had previously come up with an idea to help automate all of this across GitHub, but I now think it's unlikely to scale. Instead, I may look at Docker actions, and work out the common causes of differing ASTs between JavaScript builds!