December 10th, 2025, by François Proulx, VP of Security Research @ BoostSecurity.io
TL;DR: 2025 didn’t give us a new, magical Supply Chain vuln class; instead it gave us attackers who finally started reading our manuals.
From Ultralytics’ pull_request_target 0‑day (which a BreachForums post indicates was found with our own poutine scanner) through Kong, tj-actions, GhostAction, Nx, GlassWorm and both Shai‑Hulud waves, the common pattern wasn’t typosquats but Pipeline Parasitism: living off CI/CD, maintainer accounts, and developer endpoints, using the same tools and patterns we published to defend them.
The vuln mechanics stayed boring: shell injections and over‑privileged tokens. But they were operationalized with worms, invisible Unicode payloads, blockchain C2, and even wiper failsafes.
Thankfully, platforms are finally improving, yet “pwn request” is here to stay; the only sustainable answer is to treat pipelines as production systems and publish future research assuming adversaries are our most diligent readers!
Table of Contents
Introduction: The Uncomfortable Baseline
Chapter 1: The turning point: Ultralytics & BreachForums
Chapter 2: Pipeline Parasitism goes mainstream
Chapter 3: Invisible Enemies: Unicode & GlassWorm
Chapter 4: Ecosystem Scale: The Shai-Hulud Worms
Chapter 5: Analysis: Research as a Requirements Document
Conclusion: Defensible by Design
In 2025, the unsettling part wasn’t that Supply Chain attacks suddenly became possible: it was that they started to look uncomfortably familiar. Campaign after campaign read like someone had taken years of CI/CD vulnerability research, SLSA Threat Model diagrams, and conference exploitation demos, and simply run them in production.
This article is NOT an end of year “Top 10 Most Bad Ass Breaches” recap. It’s the story of a pivot point: the year where pipeline hardening guides, LOTP talks, and Open Source scanners stopped being just defensive artifacts and became part of the offensive toolkit. If you spent 2023-2024 mapping how PRs, bots, and build runners could go wild, 2025 felt like watching those PTSD-inducing Tabletop whiteboard scenarios replayed at ecosystem scale.
To understand why, we have to start with the uncomfortable baseline: we already knew the foundations were fragile. I hate to say it, but I saw that movie playing in my head several years ago as I exploited my first “pwn request” and had my own “Oh Shit!” moment.
Filippo Valsorda captured the mood of 2025 in his essay “State of Supply Chain Compromise”. The thesis was simple and uncomfortable: our ecosystem is structurally fragile, compromise is inevitable, and we should stop pretending otherwise.
By the time his essay circulated through Blue Team Slack channels and newsletters, a loose coalition of researchers and builders had already spent the better part of the past couple of years stress‑testing CI/CD systems.
A large portion of that work was intentionally public. We wanted maintainers and platform teams to see how bad things were. We responsibly disclosed hundreds of vulnerabilities, so many that we had to build an agentic pipeline to triage them, automate validation, and generate draft reports.
We Open Sourced scanners like poutine to statically catch unsafe workflow patterns. We wrote articles like “Weaponizing Dependabot” to explain how seemingly benign automation can be chained into high‑impact attacks, and “Split‑Second Side Doors” to show how Bot‑Delegated TOCTOU in CI can break your core Threat Model assumptions even when humans think they are “approving” safe changes.
We spoke at several conferences, were invited on podcasts, and created a whole CTF training. Many of us were early contributors and cheerleaders for efforts like SLSA’s Source and Build tracks, trying to drag pipelines out of the “2005 PHP Web App” era of secure coding and into something more principled.
Maintainers and platform teams weren’t the only ones listening.
Figure 1: The smoking gun. Threat actors explicitly citing our defensive tools.
Every field has a moment where a vague concern crystallizes into hard evidence. For CI/CD Supply Chain security, that moment came for me in December 2024 with the Ultralytics incident.
A few months earlier, during routine research, we had flagged a painfully obvious injection bug in ultralytics/actions, a GitHub Action used in the YOLOv5 ecosystem. The pattern was depressingly familiar:
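In spirit, it looked like the sketch below. The YAML is a hypothetical reconstruction of the bug class, not the actual Ultralytics workflow, and the few lines of Python approximate the kind of static check poutine automates (the real scanner is far more thorough).

```python
import re

# Hypothetical workflow illustrating the class: a privileged trigger plus
# attacker-controlled event data interpolated straight into a shell script.
WORKFLOW = """
on: pull_request_target        # runs with write perms and secrets of the BASE repo
jobs:
  format:
    runs-on: ubuntu-latest
    steps:
      - run: echo "Formatting ${{ github.event.pull_request.title }}"
"""

# Event fields an attacker fully controls from a fork, no approval needed.
TAINTED = re.compile(
    r"\$\{\{\s*github\.event\.(pull_request\.(title|body|head\.ref)|"
    r"issue\.title|comment\.body)"
)

def looks_like_pwn_request(workflow: str) -> bool:
    """Crude check: privileged trigger + tainted expression in the same file."""
    privileged = "pull_request_target" in workflow or "issue_comment" in workflow
    return privileged and bool(TAINTED.search(workflow))

print(looks_like_pwn_request(WORKFLOW))  # True: the PR title can smuggle `"; cmd; "`
```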
A textbook “pwn request” PPE scenario. We made a note to come back to it. Other fires were burning.
In early December, news broke: Ultralytics’ PyPI package had been trojanized. A compromised GitHub Actions workflow had exfiltrated publishing credentials and shipped cryptomining payloads to unsuspecting ML users.
That was bad. The next discovery was worse.
Looking at Dark Web chatter using Flare, we found a freshly created BreachForums account. The actor’s posting history looked… concise.
This wasn’t vague speculation about “attackers probably read our blogs.” This was a threat actor explicitly crediting research tools and methodology, weaponizing it almost verbatim, and then disappearing.
It was the smoking gun.
We were frustrated. We had seen the bug months before. Our intuition was that it could have been caught even earlier, by looking at the ecosystem from the outside, correlating suspicious workflow changes, package publishes, and maintainer activity.
So we did what engineers do when they’re annoyed: we built more plumbing.
Over a hackathon, we stitched together what became our Package Threat Hunter pipeline, ingesting the GitHub public events firehose, layering on some secret sauce, and trying to catch Build Pipeline exploits “in the act” instead of reverse‑engineering them weeks later.
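The production pipeline has considerably more moving parts, but the core loop is easy to sketch. A toy version in Python, assuming nothing beyond the documented GitHub REST API (the correlation “secret sauce” is deliberately omitted, and in practice you need an auth token for the rate limits):

```python
import requests

API = "https://api.github.com"

def workflow_touching_pushes():
    """Yield (repo, filename) for public pushes that modify CI workflow files."""
    for event in requests.get(f"{API}/events", timeout=10).json():
        if event.get("type") != "PushEvent":
            continue
        repo = event["repo"]["name"]          # e.g. "owner/project"
        head = event["payload"]["head"]       # SHA of the push head
        # PushEvent payloads don't carry file paths, so fetch the head commit.
        commit = requests.get(f"{API}/repos/{repo}/commits/{head}", timeout=10).json()
        for changed in commit.get("files", []):
            if changed["filename"].startswith(".github/workflows/"):
                yield repo, changed["filename"]

for repo, path in workflow_touching_pushes():
    print(f"[!] workflow changed in a public push: {repo} -> {path}")
```

A hit is only a lead, of course, but it turns “reverse‑engineer the breach weeks later” into “watch the workflow change as it happens”.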
Ten days later came Kong’s Kubernetes Ingress Controller incident: an unauthorized 3.4.0 release pushed through their CI, using legitimate build scripts and signing paths, but shipping a cryptominer.
Package Threat Hunter had captured the whole thing, minute by minute.
That experience locked in a mental model that 2025 would keep reinforcing: the interesting compromises were no longer simple typosquats. They were pipeline‑centric operations, where attackers lived off CI/CD triggers, build runners, over‑privileged tokens, and maintainer accounts instead of shipping lookalike packages.
And increasingly, they were doing it using our own research as a field guide.
The first quarter of 2025 made it clear that Ultralytics and Kong weren’t anomalies.
In March, we saw what happens when an attacker decides to treat GitHub Actions themselves as a transitive Supply Chain.
By compromising the maintainer account for tj-actions/changed-files, a hugely popular Action used in tens of thousands of workflows, an adversary was able to backdoor all existing versions, injecting a one‑liner shell payload that ran in the context of each consumer’s workflow.
This wasn’t subtle. It was “GitHub‑Actions‑on‑GitHub‑Actions”, a meta Supply Chain attack squarely aligned with the PPE patterns offensive researchers had been demonstrating for years. A popular vulnerable GitHub Action is the holy grail; the blast radius can be gigantic. It’s a side‑door 0‑day into any workflow that has it as a dependency.
Figure 2: The meme from our Under The Radar talk slide deck (early 2024), now reality
GitHub’s response was equally telling: they took the unusual step of globally yanking tags, knowingly breaking builds to cut off impact. That’s the kind of move platforms make when they recognize they’re dealing with systemic, not local, risk.
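On the consumer side, the cheap, mechanical counterpart to that platform response is pinning every `uses:` reference to a full commit SHA, so a repointed tag can’t silently swap the code underneath you. A rough check in that spirit, covering a tiny subset of what scanners like poutine flag:

```python
import pathlib
import re

# `uses: owner/repo@ref` pinned to anything but a 40-hex commit SHA is mutable:
# the tag or branch can be repointed at backdoored code, as with tj-actions.
USES = re.compile(r"uses:\s*([\w.-]+/[\w./-]+)@([\w./-]+)")
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")

for path in pathlib.Path(".github/workflows").glob("*.y*ml"):
    for lineno, line in enumerate(path.read_text().splitlines(), start=1):
        match = USES.search(line)
        if match and not FULL_SHA.match(match.group(2)):
            print(f"{path}:{lineno}: {match.group(1)}@{match.group(2)} is not SHA-pinned")
```

Pinning isn’t a silver bullet (you still have to review what the SHA points at, and dependency bumps reintroduce churn), but it converts “trust the tag forever” into a deliberate, reviewable update.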
By September, GhostAction pushed the idea further. Here, the “malware” wasn’t in a dependency: it was a GitHub Actions workflow.
Attackers used compromised accounts (ATO) to plant a malicious YAML file wired to run on every push and pull request. The logic was simple but devastating: on every execution, the workflow walked through all the secrets it could reach (GitHub, Docker Hub, NPM, PyPI, and cloud providers), bundled them up, and exfiltrated them to attacker‑controlled infrastructure.
GitGuardian’s telemetry later showed hundreds of accounts and thousands of secrets impacted. More importantly, GhostAction behaved like a crude GitHub‑native worm. Once the malicious YAML landed in a few places, new tokens were exfiltrated and the cycle repeated.
The Nx incident followed a similar script to Kong, with a twist. The root cause was depressingly familiar: a flawed GitHub Actions workflow wired to pull_request_target that shoved unsanitized PR title and body straight into a shell, giving attackers high‑privilege RCE in the repo. The gut punch came when Adnan Khan pointed out that this workflow hadn’t been hand‑written at all: it was generated and committed by Claude Code. In other words, the attacker walked through a CI backdoor that an AI had casually stamped into the codebase for them before chaining other AI CLIs for recon in the malware itself. This is exactly why we’ve been wiring tools like poutine into MCP servers and code assistants: if you don’t put a LOTP‑aware linter in the loop, your “helper” can quietly gift attackers the perfect pull_request_target 0‑day.
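Stripped of the YAML, the whole bug class fits in a few lines of Python; the title below is attacker‑controlled and deliberately defanged (real payloads fetch a stager instead of running `id`):

```python
import subprocess

# A PR title any fork can set: closes the quoted string, injects a command.
title = 'fix typo"; id; echo "'

# What the generated workflow effectively did: interpolate attacker text into a
# shell, exactly like `run: echo "${{ github.event.pull_request.title }}"`.
subprocess.run(f'echo "Processing {title}"', shell=True)   # executes `id`: injection

# The boring fix: treat untrusted input as data, never as code.
subprocess.run(["echo", f"Processing {title}"])            # just prints the string
```

In workflow terms, the equivalent fix is routing the expression through `env:` and referencing it as a quoted shell variable inside `run:`, which is precisely the kind of transform a LOTP‑aware linter can enforce before the commit lands.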
If Nx was a reminder that AI can write you a perfect CI 0‑day, Grafana was the reminder that good instrumentation can still save you.
In April 2025, Grafana Labs had their own pull_request_target‑powered PPE: an insecure workflow let an attacker run a carefully crafted branch name through a shell, exfiltrating environment variables and a handful of credentials from a GitHub Actions job. On paper, that’s the same story as Ultralytics or Nx, a classic pwn‑request bug wired straight into CI.
The difference is how it ended. Grafana had seeded their environment with canary tokens, high‑value‑looking AWS keys and other decoys whose only purpose was to scream if anyone touched them. When the attacker validated one of those keys, Grafana’s team got an immediate page, swarmed the incident, rotated everything, and confirmed there was no production or customer impact. They followed up by hardening their workflows, leaning on tools like Zizmor, a scanner in the same family as poutine that catches vulnerable Actions at scale, and later wrote publicly about how they design and place canaries.
Grafana’s experience is worth calling out because it shows that Pipeline Parasitism isn’t automatically a death sentence for a well‑instrumented, well‑staffed project. Most small, understaffed but wildly popular dependencies will never have a dedicated SRE on pager duty; expecting them to hand‑craft canaries and incident playbooks is wishful thinking. The lesson here is less “everyone should do what Grafana did” and more “this kind of tripwire should exist as a platform‑level feature”. Registries, CI providers, or security tooling can make it cheap and mostly automatic for the long tail, not just a bespoke trick reserved for the Grafanas of the world.
As defenders got better at spotting crude pipeline abuse, attackers adapted in two directions: invisibility and persistence. Two campaigns in particular, the first Unicode‑steganography NPM malware and GlassWorm, felt like they were written specifically to humiliate our assumptions about what “obvious” malware looks like.
The os-info-checker-es6 family looked like yet another forgettable system‑info helper until fellow researchers pulled it apart and realized the malicious logic wasn’t just obfuscated, it was missing from the tools we trusted to show us code. The payload was woven through Unicode Private Use Area (PUA) and zero‑width characters, in a way that made the bad branches completely invisible in VS Code, vim, Emacs, and GitHub’s web diff. You could review the file line by line and never see the teeth. That shock is what led us to build an Open Source tool we called puant (“stinky” in French), whose only job is to sniff out suspicious PUA and zero‑width usage so maintainers have at least a fighting chance of noticing when a diff is lying to them. Pair that with a C2 channel that hid commands and exfiltrated data inside Google Calendar events, and you get tradecraft that feels a lot closer to Lazarus Group / “Contagious Interview”‑style operations than to a bored cryptominer.
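For the curious, puant’s core idea fits in a handful of lines. A minimal sketch using only the Python standard library (character classes per the Unicode spec; puant’s actual heuristics are richer):

```python
import sys
import unicodedata

# Code points most editors and diff views render as nothing at all.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}

def invisible_chars(text: str):
    """Yield (offset, codepoint, reason) for PUA and zero-width characters."""
    for offset, ch in enumerate(text):
        if unicodedata.category(ch) == "Co":      # "Co" = Private Use Area
            yield offset, f"U+{ord(ch):04X}", "private-use"
        elif ch in ZERO_WIDTH:
            yield offset, f"U+{ord(ch):04X}", "zero-width"

source = open(sys.argv[1], encoding="utf-8").read()
for offset, codepoint, reason in invisible_chars(source):
    print(f"offset {offset}: {codepoint} ({reason}): this diff may be lying to you")
```

Run it over anything you’re about to install from an unfamiliar publisher; a clean file prints nothing, and a single hit is worth a very close look.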
GlassWorm adapted this technique to target VS Code and Open VSX publishers directly. Its operators used the invisible Unicode trick to hide activation logic within benign‑looking extensions, but paired it with a sophisticated worming capability that used stolen tokens to infect the maintainers’ other projects. Perhaps most alarmingly, they hardened their infrastructure by using the Solana blockchain for C2 communication, ensuring their command channel remained up even if their servers were nuked.
Taken together, the first Unicode‑PUA NPM malware and GlassWorm were a visceral reminder that “just read the diff” or “just check the extension reviews” is no longer a sufficient review strategy, and that developer endpoints (IDEs and extension ecosystems) are now first‑class targets in their own right, not collateral damage.
By late summer, attackers stopped sniping individual packages and went straight after the humans who maintain the building blocks of the JavaScript ecosystem. The Great NPM Heist against the extremely popular and critical packages maintained by Qix (chalk, debug, ansi-styles, and many more) was pure social engineering: a convincing phishing campaign spoofing NPM support and a fake TOTP management page were enough to turn a single maintainer account into a distribution hub for malicious versions of dozens of core libraries. The payload was a crypto stealer quietly swapping wallet addresses in any app that handled them. It never fully realized its potential, but it proved the point: compromise the right person, and you can poison billions of weekly installs.
The first Shai‑Hulud wave took that insight and automated it. Instead of using a stolen NPM token to backdoor a single package, the worm’s postinstall hook enumerated every package owned by the victim maintainer, injected a huge obfuscated payload, and republished new versions. Exfiltration flowed into GitHub repositories acting as dropboxes for stolen tokens and secrets. Within days, “a few compromised packages” had become hundreds of trojanized libraries and thousands of downstream repos touched: a maintainer‑level worm, not a one‑off incident.
Shai‑Hulud 2.0 was the same concept turned up to eleven. Compromised packages dropped a preinstall script that installed or located the Bun runtime and spawned a heavily obfuscated bun_environment.js. On any machine with an NPM token, the worm walked the maintainer’s entire portfolio, backdooring and republishing packages en masse and spreading across several hundred packages. The payload was tuned for developer endpoints and CI: it hunted GitHub PATs, NPM credentials, cloud keys, and SSH keys, then exfiltrated them via public GitHub repos literally branded “Sha1‑Hulud: The Second Coming”, often mixing data from multiple victims.
On machines with enough access, Shai‑Hulud 2.0 went further and registered the host as a GitHub Actions self‑hosted runner, wiring in attacker‑controlled workflows so that revoking NPM tokens wasn’t enough: you now had a durable, GitHub‑mediated RCE path back into a victim’s laptop. And when it couldn’t steal or persist (no network, no usable tokens), it sometimes fell back to a blunt option: recursively deleting everything writable under the user’s home directory. At that point you’re not dealing with “just” a Supply Chain compromise, but with an ecosystem worm, a developer‑endpoint infostealer, a GitHub‑proxied backdoor, and a wiper rolled into one, all rooted in the same lesson as the Heist: maintainers are the real root of trust.
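The initial foothold in both waves was the same humble mechanism: an npm lifecycle script that runs automatically at install time. A blunt triage check you can point at any checkout’s node_modules tree (the hook names are the standard npm lifecycle fields):

```python
import json
import pathlib

# npm executes these automatically on `npm install`: the Shai-Hulud foothold.
LIFECYCLE = ("preinstall", "install", "postinstall")

for manifest in pathlib.Path("node_modules").rglob("package.json"):
    try:
        scripts = json.loads(manifest.read_text(encoding="utf-8")).get("scripts", {})
    except (json.JSONDecodeError, UnicodeDecodeError, OSError):
        continue
    for hook in LIFECYCLE:
        if hook in scripts:
            print(f"{manifest.parent}: {hook} -> {scripts[hook]!r}")
```

Plenty of legitimate packages use install hooks, so this is a triage list, not a verdict; installing with `--ignore-scripts` by default and reviewing that list is the low‑tech version of what registries should eventually do for everyone.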
Looking across Ultralytics, Kong, tj-actions, GhostAction, Nx, GlassWorm, the Great NPM Heist, and both Shai‑Hulud waves, a pattern emerges on the pipeline and maintainer side of the house: the same boring primitives (injected workflows, over‑privileged tokens, hijacked maintainer accounts), operationalized with ever more automation, stealth, and worming.
Campaigns like rand-user-agent or os-info-checker-es6 live in a slightly different bucket: they were born malicious, crafted from day one as watering‑hole artifacts (tiny helpers, convenience libraries, “just‑trust‑me” interview tools) rather than compromises of long‑standing, trusted packages. They don’t need our research to exist; registry incentives and the economics of “npm install whatever Stack Overflow says” are enough.
But even there, the ecosystem reaction is shaped by the same research. The reason we recognized the Unicode PUA trick, built tools like puant, and could talk about blockchain‑backed C2 with any precision is because the community has spent years poking holes in how we review, ship, and consume code.
Attackers are doing what we would do in their position: reading everything, cloning everything, scripting everything. That doesn’t mean we should stop publishing research. It does mean we need to internalize a hard truth: there is no longer a clean line between “defensive” and “offensive” Supply Chain research.
Every hardening guide is also a gap analysis for the next campaign. Every proof‑of‑concept is a template. Every detection rule is a hint about how to evade the next version.
The good news is that the ecosystem is responding. GitHub is on the verge (as of early December 2025) of partially tightening the Threat Model around pull_request_target and branch‑protected environments, chipping away at the “easy mode” PPE paths that powered Ultralytics and many copycats. NPM has changed token policies, including expiration for publish tokens, directly in response to the first Shai‑Hulud wave, making stolen publish tokens a less attractive prize. Extension marketplaces are starting to treat publisher accounts and signing keys as critical infrastructure, not just UX plumbing.
These changes matter. Some of the most egregious low‑hanging fruit is finally disappearing.
But as Clint Gibler put it in the TL;DRsec #307 newsletter referring to Shai‑Hulud 2.0, there’s a different kind of unease too: after each ecosystem‑level incident, we spin up a flurry of >80% overlapping vendor posts that can feel, in his words, like “a lot of duplicate work,” more marketing than new signal. The real opportunity cost isn’t just readers’ attention, it’s all the expert time that could be going into longer‑term, solve‑this‑class‑of‑problem work instead of re‑describing the same campaign every few months.
Right now, most CI/CD platforms are effectively RCE As A Service. We’ve handed every repo a Swiss‑army knife of triggers, runners, and integrations, plus a Threat Model you only really understand if you treat the docs as light bedtime horror. There are too many choices, too many sharp edges, and too many caveats that only surface after someone chains them into a real attack.
That’s barely acceptable for well‑funded product teams; it’s untenable for the tiny, under‑resourced Open Source projects that end up underpinning half the internet. We can’t keep playing the “SSO tax” game where enterprise‑grade detection and response sits behind paywalls while the weekend pet project, the one that quietly becomes that XKCD comic everyone loves to share about a single unpaid maintainer holding up the world, gets the most dangerous defaults and the fewest guardrails. And we all keep shrugging and reposting the joke instead of fixing the incentives.
If 2025 was the year attackers industrialized Living Off The Pipeline, 2026 needs to be the year we start designing pipelines and ecosystems that are defensible by design.
That means more batteries‑included, paved‑road platforms and fewer sharp edges, especially for Open Source: safer defaults, opinionated workflow behavior, and built‑in tripwires that make the secure path the easy path instead of an expert‑only side quest.
That points to a few concrete shifts: treating pipelines as production systems, making tripwires like canary tokens a platform‑level default rather than a bespoke trick, and shipping short‑lived, tightly scoped publish credentials out of the box.
Finally, we should keep doing what worked best in 2025: collaborating as humans.
As an olive branch to ourselves, we could make that collaboration a little more intentional: a small, vendor‑neutral Signal group where the people who actually reverse this stuff, ship the PoCs, and write the hardening guides can compare notes outside of weeks where marketing puts us in the line of fire. A place to trade weird telemetry, half‑baked ideas, and “does this look like the next Shai-Hulud or am I just tired?” questions, ideally without accidentally re‑creating a "SignalGate" of our own by inviting the threat actors into the chat 🤣.
The reason many of these campaigns were understood and contained as quickly as they were is that individuals across companies and communities quietly shared intel. People like Adnan Khan pushing the boundaries of CI/CD exploitation so others could understand the risk. Rami McCarthy sending a friendly “URGENT PSA - [redacted] got hit by the Nx issue. Revoke everything. Sorry!” DM when telemetry suggested one of our own employees had been hit. William “Woody” Woodruff keeping tools like Zizmor sharp so GitHub workflow bugs don’t stay bugs for long. Paul McCarty curating the Open Source Malware database and building tools like undelete that let us see what really happens on shady Asian NPM mirrors. Charlie Eriksen being one of the first on scene reversing new blobs so the rest of us don’t have to start from raw hex. Aviad Hahami et al. doing a massive forensics job retracing the first innings of complex campaigns like tj-actions. Analysts painstakingly teasing signal out of obfuscated bundles. Maintainers disclosing painful compromises in public so others could learn.
The irony of 2025 is that while attackers were busy weaponizing our research, the most effective defense was still relationships: Signal groups, DMs, and late‑night calls where we compared notes and turned isolated incidents into a coherent picture.
If we do this right, the story we tell a year from now won’t be “look how bad Shai‑Hulud 3.0 was.”
It will be: attackers kept reading our manuals and it stopped helping them.