LLMjacking Grows Up Into Weaponization
When Sysdig first documented LLMjacking in 2024, the threat was essentially theft. Attackers stole cloud AI credentials, resold the compute, and stuck the victim with the bill. It was expensive and annoying, but it was a billing problem. The team's latest research, published this June, marks the moment the threat grew teeth. On June 12, Sysdig watched an attacker use a misconfigured, internet-exposed Ollama model server not to resell compute, but as the reasoning engine for an autonomous offensive tool.
The tool, which the operator called VAPT, is a multi-stage offensive framework that fingerprints services, matches them to known vulnerabilities, synthesizes proof-of-concept exploits, and attempts intrusion, with the hijacked model making decisions at every step. As Sysdig put it bluntly, "the threat actor used exposed model capacity as the brain for their automated hacking tool." That is the line that should reset how security teams think about every unauthenticated inference endpoint in their estate.
A Pipeline That Verifies Its Own Success
What makes VAPT notable is not that it uses an LLM, but how disciplined the engineering is. The captured pipeline runs service fingerprinting, vulnerability matching and triage, web reconnaissance, proof-of-concept synthesis, blind SQL injection payload crafting, secret extraction across more than 100 invocations, and privilege escalation planning. Each stage feeds the next through structured output contracts, so the model's responses can be parsed deterministically rather than scraped from prose.
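To make the idea of a structured output contract concrete, here is a minimal sketch of how any orchestrator might accept a model response only when it parses as JSON with the expected fields. The field names (`stage`, `findings`) and the function `parse_contract` are hypothetical, chosen for illustration; Sysdig's write-up, as summarized here, does not specify VAPT's actual schema.

```python
import json
from typing import Optional

# Hypothetical contract: these field names and types are assumptions,
# not the schema used by the captured tool.
REQUIRED_FIELDS = {"stage": str, "findings": list}


def parse_contract(raw: str) -> Optional[dict]:
    """Return the parsed object if it satisfies the contract, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # free-form prose or malformed JSON is rejected outright
    if not isinstance(data, dict):
        return None
    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(name), expected_type):
            return None  # missing field or wrong type breaks the contract
    return data
```

The point of the design is that a response either satisfies the contract or is discarded, so the next stage never has to guess at meaning buried in prose.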
Most striking is the verification logic. The tool wraps command output in markers, echoing VAPTb3gin before a probe and VAPTfin after it, so the orchestrator can confirm execution with what Sysdig describes as a zero-false-positive invariant. It even treats target data as untrusted to defend its own pipeline against prompt injection from scraped content. This is not a script kiddie pasting prompts into a chatbot. It is an attacker building production-grade autonomous tooling on stolen infrastructure.
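The marker technique is simple enough to sketch in a few lines. The example below assumes that output counts as confirmed only when both markers appear in order, and it returns whatever sits between them. The marker strings come from Sysdig's description; the function name `bracketed_output` and the exact matching rules are assumptions, not a reconstruction of the captured orchestrator.

```python
from typing import Optional

BEGIN_MARKER = "VAPTb3gin"  # marker strings as reported by Sysdig
END_MARKER = "VAPTfin"


def bracketed_output(captured: str) -> Optional[str]:
    """Return the text between the markers, or None if either is missing."""
    start = captured.find(BEGIN_MARKER)
    if start == -1:
        return None  # the command never started echoing
    body_start = start + len(BEGIN_MARKER)
    end = captured.find(END_MARKER, body_start)
    if end == -1:
        return None  # the command did not run to completion
    return captured[body_start:end].strip()
```

Because a result is accepted only when the closing marker follows the opening one, a timeout, an error page, or a partial response cannot be mistaken for a successful run.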
175,000 Open Doors to Borrowed Compute
The supply side of this economy is alarmingly large. Sysdig catalogued roughly 175,000 publicly exposed Ollama instances worldwide. Ollama requires no credentials by default, so a server bound to a public interface on port 11434 will answer anyone who calls it. Each of those exposed endpoints is not merely free compute to resell. It is a free, unattributed execution engine that an attacker can borrow to run offensive tooling under someone else's resource footprint.
That unattributed quality is the strategic problem. "Self-hosted model capacity is unmetered and unattributed," Sysdig notes, calling it "an AI-supply-chain exposure rather than merely a billing risk." When the brain of an attack runs on a victim's borrowed inference server, attribution and rate-limiting both break down. The attacker pays nothing, leaves a thinner trail, and the actual owner of the compute sees only elevated usage and an open port, not the multi-stage attack pipeline running through their hardware.
Iterated Like Software, From a Residential ISP
The campaign also behaved like a software project rather than a one-off raid. Sysdig observed an initial 8.5-hour session on June 12, followed by return sessions on June 14 from additional residential IP addresses traced to ISPs in Hyderabad, India. Between sessions, the operator added new stages, rewrote existing prompts, and integrated the full set into the pipeline before returning. This is iterative development, complete with what amounts to a release cadence.
So far, the actor has pointed VAPT only at private practice ranges, specifically RFC 1918 addresses and HackTheBox lab networks, which suggests active development rather than live victim targeting. We would not take much comfort from that. Researchers have warned that a capable model handed a vulnerability description can autonomously exploit a large majority of one-day vulnerabilities, and Sysdig's capture shows the orchestration layer that turns that capability into a repeatable tool. The gap between lab practice and production use is measured in iterations, and this actor was iterating fast.
The Inference Endpoint Is Now Attack Surface
The defensive takeaway reorders enterprise priorities. An exposed Ollama or similar self-hosted inference server has always been a misconfiguration, but most teams filed it under compute waste. Sysdig's research reclassifies it as live attack surface that can power offensive operations against third parties. The immediate fixes are unglamorous and effective: bind model servers to localhost, never expose port 11434 to the internet, and require authentication through a reverse proxy.
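As a rough illustration of the first fix, the sketch below checks whether a bind value keeps a model server on the loopback interface. Ollama reads its bind address from the `OLLAMA_HOST` environment variable; the helper `is_loopback_bind` is hypothetical, and its parsing is a simplified assumption about the `host[:port]` forms that value may take.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_bind(value: str) -> bool:
    """True if a host[:port] bind value points at a loopback-only address."""
    if "//" not in value:
        value = "//" + value  # let urlsplit treat the value as a network location
    host = urlsplit(value).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # A hostname we cannot verify is treated as not provably loopback.
        return False
```

A check like this could run in a deployment pipeline, for example as `is_loopback_bind(os.environ.get("OLLAMA_HOST", "127.0.0.1:11434"))`, to flag a value such as `0.0.0.0:11434` before it ships. It says nothing about firewalls or proxies in front of the host, so it complements network controls rather than replacing them.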
The broader shift is mental. As organizations rush self-hosted models into production, every inference endpoint becomes part of the security perimeter, not just the AI roadmap. Monitoring for anomalous request volume and for the marker-bracketed command patterns Sysdig documented gives defenders a fighting chance. But the cleaner win is simply not standing up an unauthenticated model server on the open internet in the first place. In the agentic era, an exposed brain is a weapon, and right now far too many of them are sitting unlocked.
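For teams that already log inference requests, the monitoring described above can start small. The sketch below assumes each log record is a dictionary with hypothetical `source_ip` and `prompt` keys. It flags sources whose prompts contain either documented marker, plus sources that exceed a request-count threshold. The threshold value and the record shape are assumptions to adapt to whatever your proxy or gateway actually logs.

```python
from collections import Counter

MARKERS = ("VAPTb3gin", "VAPTfin")  # marker strings as reported by Sysdig


def flag_sources(records, volume_threshold=100):
    """Group suspicious source IPs by the reason they were flagged."""
    counts = Counter()
    marker_sources = set()
    for record in records:
        ip = record["source_ip"]  # hypothetical field name
        counts[ip] += 1
        if any(marker in record["prompt"] for marker in MARKERS):
            marker_sources.add(ip)
    high_volume = {ip for ip, n in counts.items() if n > volume_threshold}
    return {"marker_sources": marker_sources, "high_volume_sources": high_volume}
```

Marker strings are trivial for an operator to change, so a match is a strong signal while the absence of one proves little. The volume check is the more durable of the two.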



