Palo Alto Networks researchers recently disclosed a privilege escalation vulnerability in Google Cloud's Vertex AI SDK for Python that allows attackers with no access to the victim's project to hijack machine learning model uploads and execute code within Google's serving infrastructure. The flaw, termed "Pickle in the Middle," exploits an assumption in how the SDK resolves storage bucket names during model deployment.
How the Attack Works
The vulnerability hinges on bucket naming conventions and resolution order. When a developer uploads a model to Vertex AI, the SDK constructs a bucket identifier based on the project ID and a predictable naming scheme. Because Google Cloud Storage (GCS) bucket names share a single global namespace, an attacker can register a publicly accessible bucket in their own project with the name the victim's SDK will attempt to use, occupying it before the legitimate bucket is created in the target project.
Once the attacker controls that bucket, they can place a malicious Python pickle file in it. When the victim's upload workflow resolves to that bucket, the attacker's pickle object is deserialized in place of the intended model file. Python's pickle format lets a serialized object name a callable to invoke during loading, so deserializing untrusted data amounts to running untrusted code. That gives the attacker code execution within the model serving environment, a privileged context with access to the underlying GCP infrastructure.
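The pickle behaviour is easy to reproduce locally with a harmless payload. In the minimal sketch below, the `Payload` class is a made-up example; its `__reduce__` method tells pickle to call `eval` on a string of the producer's choosing when the data is loaded. A real attacker would substitute something far less benign than arithmetic.

```python
import pickle


class Payload:
    """Illustrative object whose pickled form runs code when loaded."""

    def __reduce__(self):
        # pickle stores "call eval('6 * 7')" instead of the object's state.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# The consumer thinks it is loading a model; it is actually calling eval().
result = pickle.loads(blob)
print(type(result).__name__, result)  # int 42
```

Note that the loaded value is not a `Payload` instance at all: the consumer gets back whatever the embedded call returned, and the call has already run by the time `pickle.loads` returns.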
The critical detail is that this attack requires no prior access to the victim's GCP project, no compromised credentials, and no network interception. The vulnerability exists purely in the SDK's logic for resolving where to place the model artefact.
Why This Matters for Infrastructure Teams
This class of flaw—namespace collision or "bucket squatting"—is not new in cloud environments, but its application to machine learning pipelines exposes a gap in the SDK's trust model. ML model serving often runs in highly privileged contexts: it may have permissions to read sensitive data, write to other storage buckets, or trigger downstream workloads. An attacker gaining code execution there gains a foothold into the infrastructure, potentially allowing lateral movement or data exfiltration.
For organisations running Vertex AI in production, the immediate concern is whether any existing deployments have been compromised. Palo Alto Networks' disclosure states that no active exploitation was observed in the wild, but the vulnerability existed for an extended period before discovery, meaning threat actors may have had time to probe or exploit it before a patch was available.
Mitigation and Best Practice
Google has released a patched version of the SDK, and all users should upgrade immediately. Beyond that, a few defensive layers are worth implementing. First, create the buckets your pipelines depend on yourself and reference them explicitly, and enforce strict Identity and Access Management (IAM) policies that limit who can create or modify GCS buckets in your projects. IAM restrictions in your own project cannot stop an outsider from registering a global bucket name in theirs, so owning the name before anything resolves to it is what closes the squatting window. Second, use Artifact Registry or other managed artefact storage with explicit project scoping rather than relying on free-form bucket naming.
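One way to make the trust assumption explicit is to check who owns a bucket before anything is written to or read from it. The following is a minimal sketch of such a guard, not part of any Google library: `BucketInfo`, `UntrustedBucketError`, `require_owned_bucket`, and the project numbers are hypothetical. It assumes the caller has already fetched the bucket's owning project number from the storage API's bucket metadata, and that a failed metadata lookup is treated the same as a mismatch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BucketInfo:
    """Hypothetical container for bucket metadata fetched by the caller."""
    name: str
    project_number: int  # project that owns the bucket


class UntrustedBucketError(RuntimeError):
    """Raised when a bucket is owned by a project outside the allow-list."""


def require_owned_bucket(info: BucketInfo, trusted_project_numbers: set[int]) -> str:
    # Refuse to proceed unless the bucket belongs to a project we control.
    if info.project_number not in trusted_project_numbers:
        raise UntrustedBucketError(
            f"bucket {info.name!r} is owned by project {info.project_number}, "
            "which is not in the trusted set"
        )
    return f"gs://{info.name}"


TRUSTED = {111111111111}  # placeholder project number

# A bucket we own passes the check and yields a URI to use explicitly.
uri = require_owned_bucket(BucketInfo("acme-ml-models", 111111111111), TRUSTED)

# A squatted bucket with the expected name but a foreign owner is rejected.
try:
    require_owned_bucket(BucketInfo("acme-ml-staging", 999999999999), TRUSTED)
    rejected = False
except UntrustedBucketError:
    rejected = True
```

The design choice here is to fail closed: a name match alone never counts as proof of ownership.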
For teams already relying on Vertex AI, audit your model upload workflows to ensure they're using the latest SDK version. If your CI/CD pipelines or model training jobs are locked to older versions, flag them for update. Additionally, consider enabling Cloud Audit Logs for GCS operations, including Data Access audit logs, which are disabled by default for Cloud Storage; while this won't prevent the attack, it will surface suspicious bucket access patterns if exploitation has occurred.
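Flagging locked versions can be partly automated. The sketch below scans requirements-style lines for exact pins of `google-cloud-aiplatform` and reports those below a minimum version supplied by the caller. The function names are illustrative, the parser handles only plain dotted numeric versions with `==` pins, and the `9.9.9` floor is a placeholder, not the actual patched release; take the real number from Google's advisory.

```python
import re

# Matches exact pins such as "google-cloud-aiplatform==1.2.3", with optional extras.
PIN = re.compile(
    r"^\s*google-cloud-aiplatform(?:\[[^\]]*\])?\s*==\s*([0-9]+(?:\.[0-9]+)*)",
    re.IGNORECASE,
)


def parse_version(text: str) -> tuple[int, ...]:
    return tuple(int(part) for part in text.split("."))


def is_older(found: tuple[int, ...], floor: tuple[int, ...]) -> bool:
    # Pad with zeros so that 1.2 and 1.2.0 compare as equal.
    width = max(len(found), len(floor))
    pad = lambda v: v + (0,) * (width - len(v))
    return pad(found) < pad(floor)


def find_stale_pins(requirement_lines: list[str], minimum: str) -> list[tuple[int, str]]:
    """Return (line_number, pinned_version) for every pin below `minimum`."""
    floor = parse_version(minimum)
    stale = []
    for lineno, line in enumerate(requirement_lines, start=1):
        match = PIN.match(line)
        if match and is_older(parse_version(match.group(1)), floor):
            stale.append((lineno, match.group(1)))
    return stale


sample = [
    "numpy==1.26.4",
    "google-cloud-aiplatform==1.2.3  # pinned by an old training job",
    "google-cloud-aiplatform[prediction]==10.0.0",
]

# "9.9.9" is a PLACEHOLDER minimum used only to exercise the function.
stale_pins = find_stale_pins(sample, minimum="9.9.9")
print(stale_pins)  # [(2, '1.2.3')]
```

Range specifiers such as `>=` or `~=` and lock-file formats are out of scope for this sketch and would need a proper requirements parser.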
The underlying lesson extends beyond Vertex AI: any cloud SDK that makes assumptions about where artefacts should live or how they'll be named introduces risk if that resolution can be hijacked. Whether you're deploying models on GCP, managing containerised workloads, or storing configuration files, namespace collisions remain a viable attack vector in loosely coordinated multi-tenant cloud environments.
Closing Thought
Cloud SDKs abstract away the complexity of interacting with remote infrastructure, but that abstraction sometimes obscures critical assumptions about authentication and resource ownership. This flaw is a reminder that security reviews of client libraries—especially those handling sensitive workloads like ML model deployment—need to scrutinise not just the code but the fundamental trust model. Patching is necessary but not sufficient; understanding why the flaw existed in the first place informs how to design and audit similar systems in your own infrastructure.
