October 26, 2025
AI workloads amplify Kubernetes’ flexibility and its failure modes. Integrity requires controls that understand models, data paths, and runtime drift, not just pods and namespaces. Push identity down with workload-bound credentials; push policy up with context from model criticality and data sensitivity. Watch the gray zones: GPU device plugins, sidecar sprawl, and egress to model registries. Least privilege, immutable images, and runtime enforcement are table stakes; without AI-aware guardrails, the blast radius grows silently. If latency budgets rule out syscall hooks, pivot to eBPF plus network policy rather than “trust me” exemptions.
Threats unique to GenAI on K8s.
- Model tampering vs. rollout drift: A “clean” image does not guarantee an unmodified model or tokenizer at runtime. Pin digests for model artifacts and verify on start; measure at runtime with attestation (e.g., signature or hash check in init).
- Shadow egress to feature stores/model hubs: Fine-grained egress is mandatory—differentiate telemetry from training data movement. Enforce DNS-aware and CIDR-scoped policies; block wildcard artifact pulls.
- GPU pathway blind spots: Device plugins and drivers sit outside typical policy engines. Treat them as part of your TCB: control versions, hash drivers, and restrict privileged mounts.
- Prompt/response leakage: Sidecars that proxy tokens or redact content become high-value targets; isolate and audit them as first-class security components.
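The "pin digests and verify on start" advice from the first bullet can be sketched as a small check that an init step might run before the model server starts. This is a minimal illustration, not a prescribed implementation: the function names, the file layout, and the idea of passing pinned digests as a dictionary are all assumptions made for the example.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large weight files are never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifacts(root: Path, pinned: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose digest does not match.

    `pinned` maps a relative path (hypothetical layout) to its expected
    SHA-256 hex digest, as recorded at build time.
    """
    failures = []
    for relative_path, expected in pinned.items():
        target = root / relative_path
        if not target.is_file():
            failures.append(relative_path)
            continue
        # compare_digest avoids leaking match position through timing.
        if not hmac.compare_digest(sha256_of(target), expected.lower()):
            failures.append(relative_path)
    return failures
```

An init step could call `verify_artifacts` over both the model weights and the tokenizer files and exit non-zero when the returned list is not empty, so the main container never starts against a modified artifact.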
Architecture that scales safety.
- Workload identity as default. No long-lived secrets in env vars. Bind short-lived credentials (federated identity) to the service account, limited to exactly the model and data scopes needed.
- Policy informed by data classification. Labels like `model=payment-fraud; sensitivity=high` must influence not only cluster policy (egress, volumes) but also delivery (node placement, encryption).
- Immutable by design. Build images that contain models at pinned digests; forbid runtime `pip install` or `git clone`. Any change equals a new build.
- Runtime enforcement without fragility. Where syscall hooks are too expensive, prefer eBPF visibility with network and process policy; couple with rate-limited inline responses (quarantine, restart) rather than delete-and-pray.
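To make "policy informed by data classification" concrete, the sketch below derives a Kubernetes NetworkPolicy manifest from workload labels. The manifest fields follow the `networking.k8s.io/v1` schema, but the CIDR ranges, the sensitivity tiers, and the function name are hypothetical values chosen for illustration.

```python
from typing import Any

# Hypothetical mapping from the sensitivity label to permitted egress CIDRs.
EGRESS_BY_SENSITIVITY: dict[str, list[str]] = {
    "high": ["10.20.0.0/24"],                  # assumed: internal model registry only
    "low": ["10.20.0.0/24", "10.30.0.0/16"],   # assumed: registry plus telemetry range
}


def egress_policy(name: str, namespace: str, labels: dict[str, str]) -> dict[str, Any]:
    """Build a NetworkPolicy manifest whose egress rules follow the sensitivity label."""
    # Unlabeled or unknown tiers fall back to the strictest rule set.
    sensitivity = labels.get("sensitivity", "high")
    cidrs = EGRESS_BY_SENSITIVITY.get(sensitivity, EGRESS_BY_SENSITIVITY["high"])
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"{name}-egress", "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": labels},
            "policyTypes": ["Egress"],
            "egress": [
                {
                    "to": [{"ipBlock": {"cidr": cidr}} for cidr in cidrs],
                    "ports": [{"protocol": "TCP", "port": 443}],
                }
            ],
        },
    }
```

Calling `egress_policy("fraud-scorer", "ml", {"model": "payment-fraud", "sensitivity": "high"})` yields a manifest that permits only the registry range on port 443. A real policy would likely also need an explicit rule for cluster DNS, which this sketch leaves out for brevity.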
Operations that prove intent.
- Pre-flight attestations (image + model + config) block drift.
- CI/CD signs artifacts; the cluster verifies before admission.
- Observability correlates prompts, egress, and GPU utilization per Service to detect abuse without harvesting payloads.
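The "cluster verifies before admission" step can be approximated with a check that rejects any pod whose images are not pinned by digest or whose digests CI has not approved. This is a deliberately simplified stand-in: it uses a set of trusted digests (assumed to be published by the CI pipeline) in place of cryptographic signature verification, and the function name and the plain-dict pod shape are assumptions for the example.

```python
import re

# An image reference pinned by digest ends in "@sha256:" plus 64 hex characters.
DIGEST_PATTERN = re.compile(r"@sha256:([0-9a-f]{64})$")


def review_pod(pod: dict, trusted_digests: set[str]) -> tuple[bool, list[str]]:
    """Return (allowed, reasons) for a pod manifest given as a plain dict."""
    reasons = []
    spec = pod.get("spec", {})
    # Init containers are checked too, since they run with access to the same volumes.
    containers = spec.get("initContainers", []) + spec.get("containers", [])
    for container in containers:
        name = container.get("name", "?")
        match = DIGEST_PATTERN.search(container.get("image", ""))
        if match is None:
            reasons.append(f"{name}: image is not pinned by digest")
        elif match.group(1) not in trusted_digests:
            reasons.append(f"{name}: digest is not in the trusted set")
    return (not reasons, reasons)
```

In a real cluster this logic would sit behind an admission webhook or a policy engine, and the set lookup would be replaced by verifying the signatures that CI/CD attached to the image, model, and config.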
Security enables safety. When identity is anchored in workloads and policy flows from business context, GenAI on Kubernetes becomes predictable—and predictability is what we ultimately trust.