Table of Contents

  1. Why Kubernetes Security Matters
  2. Pod Security Standards (Privileged, Baseline, Restricted)
  3. Network Policies for Microsegmentation
  4. RBAC Configuration and Best Practices
  5. Admission Controllers and OPA Gatekeeper
  6. Runtime Security with Falco
  7. Container Image Scanning and Supply Chain Security
  8. Secrets Management in Kubernetes
  9. Multi-Cloud Kubernetes Security: EKS, AKS, GKE
  10. Top 10 Kubernetes Security Best Practices
  11. Pod Security Standards Comparison
  12. Frequently Asked Questions

Why Kubernetes Security Matters

Kubernetes has become the de facto standard for container orchestration, powering workloads from startup microservices to the most demanding enterprise applications. However, its power and flexibility come with a sprawling attack surface that demands deliberate security engineering. According to the NSA/CISA Kubernetes Hardening Guide, misconfigured Kubernetes clusters are among the most commonly exploited infrastructure targets in both public and private sectors.

The Kubernetes threat model spans multiple layers: the control plane API, kubelet communications, inter-pod networking, container runtimes, supply chain integrity, and the workloads themselves. A single misconfigured RBAC binding or a pod running as root can be the foothold an attacker needs to move laterally through your entire cluster. I have spent years deploying production Kubernetes clusters across AWS EKS, Azure AKS, and Google GKE through my infrastructure-as-code modules at terraform-aws-eks, and the security lessons I have learned are distilled into this guide.

Security in Kubernetes is not a single tool or toggle. It is a defense-in-depth strategy that requires coordinated controls at every layer. This article walks through each critical security domain, provides working configurations, and ties everything back to real-world infrastructure deployments.

Pod Security Standards: Privileged, Baseline, and Restricted

With the deprecation of PodSecurityPolicy in Kubernetes 1.21 and its removal in 1.25, the community introduced Pod Security Standards (PSS) enforced via the built-in Pod Security Admission controller. PSS defines three profiles that represent progressively stricter security postures.

Understanding the Three Levels

The Privileged level applies no restrictions. It is intended for system-level workloads such as CNI plugins, logging agents, and storage drivers that genuinely need elevated access. The Baseline level blocks known privilege escalation vectors while remaining compatible with the vast majority of applications. The Restricted level enforces the tightest hardening posture and is the recommended target for all application workloads.

You apply Pod Security Standards at the namespace level using labels. The admission controller then enforces, audits, or warns depending on the mode you select. Most production clusters should enforce the Restricted standard on application namespaces while allowing Privileged only in dedicated system namespaces such as kube-system.

In our terraform-azure-aks module, namespace security labels are applied automatically during provisioning, ensuring that new namespaces start in a hardened state from the moment they are created. This eliminates the gap between cluster creation and security configuration that often plagues manual setups.

Applying PSS Labels to a Namespace

The following YAML demonstrates how to configure a production namespace with the Restricted standard enforced, Restricted auditing, and Baseline warnings:

apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/audit-version: latest
    pod-security.kubernetes.io/warn: baseline
    pod-security.kubernetes.io/warn-version: latest

This configuration rejects any pod that violates the Restricted standard, logs audit events for Restricted violations, and warns developers about Baseline violations. The layered approach provides visibility into workloads that would break under stricter policies before you enforce them.

Network Policies for Microsegmentation

By default, Kubernetes allows unrestricted pod-to-pod communication across the entire cluster. This means that if an attacker compromises a single pod, they can freely scan and reach every other pod, including databases, internal APIs, and control-plane components. Network Policies are the primary mechanism for implementing microsegmentation within a cluster.

How Network Policies Work

Network Policies are namespace-scoped Kubernetes resources that define ingress and egress rules for pods selected by label. They require a CNI plugin that supports NetworkPolicy enforcement, such as Calico, Cilium, or Antrea. Without a compatible CNI, NetworkPolicy resources are accepted by the API server but silently ignored, creating a false sense of security.

The fundamental principle is default-deny: once any Network Policy selects a pod, all traffic not explicitly allowed by a policy is blocked. This is the opposite of the default Kubernetes behavior, so the very first policy you should apply to any namespace is a blanket deny-all rule.

Default Deny and Selective Allow Policies

The following YAML implements a default deny-all ingress policy followed by a selective allow policy for a typical three-tier application:

# Default deny all ingress traffic in the namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
---
# Default deny all egress traffic in the namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Egress
---
# Allow frontend to receive traffic from ingress controller
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-ingress
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: frontend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
          podSelector:
            matchLabels:
              app.kubernetes.io/name: ingress-nginx
      ports:
        - protocol: TCP
          port: 8080
---
# Allow backend to receive traffic only from frontend
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-backend-from-frontend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 3000
---
# Allow backend egress to database only
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-backend-to-database
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: database
      ports:
        - protocol: TCP
          port: 5432
    - to:  # Allow DNS resolution
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53

Notice the DNS egress rule in the last policy. This is a commonly overlooked requirement. Without DNS egress, pods cannot resolve service names, and applications fail in ways that are difficult to debug. Always include DNS as an allowed egress target when applying deny-all egress policies.

RBAC Configuration and Best Practices

Role-Based Access Control (RBAC) governs who can perform which actions on which resources within your Kubernetes cluster. RBAC is enabled by default in all modern Kubernetes distributions, but a default-enabled feature is only as good as its configuration. Poorly configured RBAC is functionally equivalent to no RBAC at all.

Roles, ClusterRoles, and Bindings

Kubernetes RBAC consists of four primary objects: Role (namespace-scoped permissions), ClusterRole (cluster-scoped permissions), RoleBinding (binds a Role or ClusterRole to subjects within a namespace), and ClusterRoleBinding (binds a ClusterRole to subjects across the entire cluster). The golden rule is to always prefer the most narrow scope possible: use Roles over ClusterRoles and RoleBindings over ClusterRoleBindings wherever possible.
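
As a minimal sketch of the narrow-scope pattern (the group name `app-team` and the `production` namespace are illustrative), a namespace-scoped Role granting read-only pod access, bound with a RoleBinding rather than a ClusterRoleBinding:

```yaml
# Namespace-scoped Role: read-only access to pods and their logs
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: production
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
---
# RoleBinding: grant the Role to a group within this namespace only
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-reader-binding
  namespace: production
subjects:
  - kind: Group
    name: app-team                      # illustrative group name
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```

Because both objects are namespace-scoped, the permissions cannot leak beyond the production namespace even if the binding is later widened to more subjects.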

Service Account Hardening

Every pod in Kubernetes runs under a service account. If no service account is specified, Kubernetes assigns the default service account in the namespace, which may have more permissions than your workload needs. Best practice demands creating dedicated service accounts per application and disabling automatic token mounting when the application does not need to interact with the Kubernetes API.

The Kubernetes Security documentation strongly recommends treating service account tokens as sensitive credentials, rotating them regularly, and using projected service account tokens with short expiration times rather than legacy non-expiring tokens.
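
A minimal sketch of both recommendations, dedicated account plus disabled token automounting (the application name `payment-service` and image reference are illustrative):

```yaml
# Dedicated service account with API token automounting disabled
apiVersion: v1
kind: ServiceAccount
metadata:
  name: payment-service                 # illustrative application name
  namespace: production
automountServiceAccountToken: false
---
# Pod running under the dedicated account; no API token is mounted
apiVersion: v1
kind: Pod
metadata:
  name: payment-service
  namespace: production
spec:
  serviceAccountName: payment-service
  containers:
    - name: app
      image: registry.example.com/payment-service:1.4.2   # illustrative
```

If the workload later needs API access, re-enable mounting on the pod spec alone so the exception stays visible in the workload definition.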

Audit RBAC Regularly

RBAC configurations tend to accumulate cruft over time. Teams add permissions for debugging sessions and never revoke them. Developers leave the organization but their bindings persist. Implement automated RBAC auditing using tools like kubectl-who-can, rakkess, or Fairwinds Insights to continuously evaluate the effective permissions in your cluster and flag overly permissive configurations.

Admission Controllers and OPA Gatekeeper

Admission controllers intercept requests to the Kubernetes API server after authentication and authorization but before the object is persisted to etcd. They are the last line of defense for enforcing policy before a resource is created. Kubernetes ships with several built-in admission controllers (such as the Pod Security Admission controller discussed earlier), but OPA Gatekeeper extends this capability with custom, declarative policies.

OPA Gatekeeper Architecture

Gatekeeper operates as a validating webhook admission controller. It consists of two custom resource types: ConstraintTemplate (defines the policy logic in Rego) and Constraint (applies the template to specific resources with parameters). This separation allows platform teams to write reusable policy templates while allowing individual teams to customize enforcement parameters.

OPA Gatekeeper Constraint Template Example

The following ConstraintTemplate and Constraint enforce that all containers must specify CPU and memory resource limits, preventing noisy-neighbor problems and protecting cluster stability:

# ConstraintTemplate: require resource limits on all containers
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredresources
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredResources
      validation:
        openAPIV3Schema:
          type: object
          properties:
            exemptImages:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredresources

        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          not container.resources.limits.cpu
          msg := sprintf("Container '%v' must specify cpu limits", [container.name])
        }

        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          not container.resources.limits.memory
          msg := sprintf("Container '%v' must specify memory limits", [container.name])
        }

        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          not container.resources.requests.cpu
          msg := sprintf("Container '%v' must specify cpu requests", [container.name])
        }

        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          not container.resources.requests.memory
          msg := sprintf("Container '%v' must specify memory requests", [container.name])
        }
---
# Constraint: apply the template to all pods
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredResources
metadata:
  name: require-resource-limits
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    excludedNamespaces:
      - kube-system
      - gatekeeper-system
  parameters:
    exemptImages:
      - "registry.k8s.io/pause:*"

This policy blocks any pod deployment that lacks resource requests or limits on any container. The excludedNamespaces field ensures that system components are not blocked. Note that the exemptImages parameter is declared in the template schema so that the Rego can be extended to skip matching images; the simplified rules shown above do not yet consult it.

Runtime Security with Falco

Admission controllers and policies prevent bad configurations from entering the cluster, but they cannot detect runtime threats: a legitimate container that starts behaving maliciously after deployment. This is where runtime security tools like Falco come in.

How Falco Works

Falco, a CNCF graduated project, monitors system calls at the Linux kernel level using eBPF probes (or a kernel module on older systems). It evaluates every syscall against a rules engine and generates alerts when it detects anomalous behavior such as a shell being spawned in a container, unexpected file writes to sensitive paths, outbound network connections to unknown endpoints, or privilege escalation attempts.

Key Detection Scenarios

Falco excels at detecting behaviors that static analysis and admission control cannot catch. These include cryptomining (sudden CPU consumption and connections to mining pools), reverse shells (interactive shell processes spawned by web servers), data exfiltration (unexpected DNS queries or outbound connections), lateral movement (attempts to access the Kubernetes API or cloud metadata service), and container escape (mount namespace manipulation or ptrace usage).

When deployed alongside our terraform-aws-auto-healing-eks module, Falco alerts can trigger automated remediation through the auto-healing pipeline, isolating compromised nodes and replacing them with clean instances within minutes.

Container Image Scanning and Supply Chain Security

The container image is the atomic unit of deployment in Kubernetes, and compromised images are one of the most common attack vectors. Supply chain security ensures that only trusted, scanned, and signed images run in your cluster.

Scanning Pipeline

A robust image security pipeline includes vulnerability scanning during CI (using tools like Trivy, Grype, or Snyk), image signing with Cosign and verification with Sigstore, Software Bill of Materials (SBOM) generation with Syft, admission enforcement using Kyverno or Gatekeeper to require signatures, and base image pinning to specific digests rather than mutable tags.

Image Policy Enforcement

Never allow images from public registries to run directly in production. Establish an internal registry (ECR, ACR, or Artifact Registry), mirror approved base images, scan them on import, and configure admission policies to reject any image not originating from your approved registries.
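
As a sketch of that admission policy in Kyverno (the registry hostname is illustrative; substitute your ECR, ACR, or Artifact Registry endpoint):

```yaml
# Kyverno ClusterPolicy: reject pods whose images come from outside
# the approved internal registry
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-image-registries
spec:
  validationFailureAction: Enforce
  rules:
    - name: internal-registry-only
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Images must come from registry.example.com."
        pattern:
          spec:
            containers:
              - image: "registry.example.com/*"
```

A production version would also match initContainers and ephemeralContainers; this sketch covers only the main container list.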

Secrets Management in Kubernetes

Kubernetes Secrets are base64-encoded, not encrypted. By default, they are stored unencrypted in etcd. This means anyone with access to the etcd datastore can read every secret in the cluster. Proper secrets management requires multiple layers of protection.

Encryption at Rest

Enable etcd encryption at rest using the EncryptionConfiguration resource. Managed Kubernetes services like EKS, AKS, and GKE support envelope encryption with cloud KMS keys. Our terraform-gcp-gke module configures Application-layer Secrets Encryption using Google Cloud KMS by default, ensuring that secrets are encrypted with customer-managed keys from the moment the cluster is provisioned.
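
For self-managed control planes, the configuration file passed to the API server via --encryption-provider-config looks like the following sketch (on managed services this file is not user-accessible, which is why the cloud KMS integrations above exist):

```yaml
# API server encryption config: encrypt Secrets at rest in etcd
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      # First provider is used for writes; generate the key with:
      #   head -c 32 /dev/urandom | base64
      - aescbc:
          keys:
            - name: key1
              secret: <base64-encoded 32-byte key>
      # identity allows reading secrets written before encryption was enabled
      - identity: {}
```

After enabling it, rewrite existing secrets (for example with kubectl get secrets -A -o json piped back through kubectl replace) so they are re-stored encrypted.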

External Secrets Operators

For the strongest security posture, avoid Kubernetes Secrets entirely and inject secrets at runtime from external vaults. The External Secrets Operator (ESO) synchronizes secrets from AWS Secrets Manager, Azure Key Vault, Google Secret Manager, or HashiCorp Vault into Kubernetes Secrets, with automatic rotation and least-privilege IAM policies.
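
A sketch of an ESO ExternalSecret pulling a database password from AWS Secrets Manager; the SecretStore name and the remote key path are illustrative and assume a SecretStore named aws-secrets-manager already exists in the namespace:

```yaml
# ExternalSecret: sync one key from an external vault into a Kubernetes Secret
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: database-credentials
  namespace: production
spec:
  refreshInterval: 1h                  # re-sync interval; picks up rotations
  secretStoreRef:
    name: aws-secrets-manager          # assumed pre-existing SecretStore
    kind: SecretStore
  target:
    name: database-credentials         # Kubernetes Secret that ESO creates
  data:
    - secretKey: password
      remoteRef:
        key: prod/database             # illustrative path in the vault
        property: password
```

The vault remains the source of truth; deleting the ExternalSecret removes the synchronized copy, and IAM policy on the SecretStore scopes which paths the cluster can read.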

Multi-Cloud Kubernetes Security: EKS, AKS, GKE

Each managed Kubernetes service has unique security features and default configurations. Understanding these differences is critical for organizations operating multi-cloud environments.

AWS EKS Security

EKS integrates with IAM for authentication via the aws-iam-authenticator and supports IAM Roles for Service Accounts (IRSA) for pod-level AWS permissions. Enable EKS control plane logging for all log types (api, audit, authenticator, controllerManager, scheduler), use managed node groups with custom launch templates that enforce encrypted EBS volumes, and configure security groups to restrict node-to-node and node-to-control-plane traffic.
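
IRSA is driven by a single annotation on the service account; the account ID and role name below are illustrative. Pods using this account receive a projected web identity token that AWS SDKs automatically exchange for the role's credentials:

```yaml
# Service account annotated for IRSA (role ARN is illustrative)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: s3-reader
  namespace: production
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/s3-reader
```

This removes the need for node-level instance profiles with broad permissions: each workload gets exactly the IAM policy attached to its own role.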

Azure AKS Security

AKS integrates with Microsoft Entra ID (formerly Azure Active Directory) for authentication, supports Azure RBAC for Kubernetes authorization, and provides Azure Policy for Kubernetes (powered by Gatekeeper) as a first-class feature. Enable Defender for Containers for vulnerability assessment and runtime protection, use authorized IP ranges for API server access, and deploy private clusters with Private Link.

GKE Security

GKE offers the most opinionated security defaults, including Shielded GKE nodes, Workload Identity for pod-to-GCP-service authentication, Binary Authorization for image signing enforcement, and Container-Optimized OS nodes. GKE Autopilot enforces security best practices by default, making it difficult to deploy insecure workloads.

Top 10 Kubernetes Security Best Practices

  1. Enforce Pod Security Standards at the namespace level. Apply the Restricted standard to all application namespaces and reserve Privileged only for system namespaces that genuinely need elevated access.
  2. Implement default-deny Network Policies. Start with deny-all ingress and egress, then selectively allow only the traffic your applications require. Always remember to allow DNS egress.
  3. Follow the principle of least privilege for RBAC. Use namespace-scoped Roles and RoleBindings over ClusterRoles and ClusterRoleBindings. Avoid wildcard permissions and audit RBAC regularly.
  4. Deploy OPA Gatekeeper or Kyverno for policy enforcement. Encode your organization's security requirements as admission policies that automatically reject non-compliant resources.
  5. Run Falco or a similar runtime security tool. Static policies cannot detect runtime threats. Monitor system calls to detect container escapes, reverse shells, and data exfiltration.
  6. Scan images in CI/CD and enforce signatures. Never deploy unscanned images. Use Cosign for signing, Trivy for scanning, and admission policies to enforce both.
  7. Encrypt secrets at rest and use external vaults. Enable etcd encryption, use cloud KMS for envelope encryption, and consider the External Secrets Operator for runtime secret injection.
  8. Disable automounting of service account tokens. Set automountServiceAccountToken: false on pods and service accounts unless the workload explicitly needs Kubernetes API access.
  9. Run containers as non-root with read-only file systems. Set runAsNonRoot: true, readOnlyRootFilesystem: true, and drop all Linux capabilities except those explicitly needed.
  10. Audit and rotate credentials continuously. Enable Kubernetes audit logging, ship logs to a SIEM, rotate service account tokens, and review RBAC bindings at least quarterly.
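
Practices 8 and 9 above can be expressed together in a single pod spec that passes the Restricted Pod Security Standard; the pod name and image are illustrative:

```yaml
# Hardened pod: non-root, read-only filesystem, no capabilities, no API token
apiVersion: v1
kind: Pod
metadata:
  name: hardened-app                    # illustrative
  namespace: production
spec:
  automountServiceAccountToken: false
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: registry.example.com/app:1.0.0   # illustrative
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
```

Workloads that need writable scratch space under a read-only root filesystem can mount an emptyDir volume at the paths that require writes.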

Pod Security Standards Comparison

Feature / Control                   | Privileged                            | Baseline                                  | Restricted
HostProcess (Windows)               | Allowed                               | Disallowed                                | Disallowed
Host Namespaces (PID, IPC, Network) | Allowed                               | Disallowed                                | Disallowed
Privileged Containers               | Allowed                               | Disallowed                                | Disallowed
Linux Capabilities                  | Unrestricted                          | Add only from the default set             | Drop ALL; only NET_BIND_SERVICE may be added
HostPath Volumes                    | Allowed                               | Disallowed                                | Disallowed
Host Ports                          | Allowed                               | Disallowed (or a known allowed list)      | Disallowed
AppArmor Profile                    | Unrestricted                          | Must not be overridden to unconfined      | Must be RuntimeDefault or localhost
SELinux Context                     | Unrestricted                          | Type limited to well-known set            | Type limited to well-known set
Seccomp Profile                     | Unrestricted                          | Must not be set to Unconfined             | Must be RuntimeDefault or Localhost
Run As Non-Root                     | Not required                          | Not required                              | Required (runAsNonRoot: true)
Privilege Escalation                | Allowed                               | Allowed                                   | Disallowed (allowPrivilegeEscalation: false)
Volume Types                        | All                                   | All except hostPath                       | Limited to core volume types only
Recommended Use Case                | System-level workloads (CNI, CSI, logging) | General-purpose, non-sensitive workloads | Security-sensitive, production workloads

Frequently Asked Questions

What are the three Pod Security Standards in Kubernetes?

Kubernetes defines three Pod Security Standards: Privileged (unrestricted, for system-level workloads), Baseline (minimally restrictive, prevents known privilege escalations), and Restricted (heavily restricted, follows hardening best practices). These standards replaced PodSecurityPolicy, which was deprecated in Kubernetes 1.21 and removed in 1.25.

How do Kubernetes Network Policies work?

Kubernetes Network Policies are namespace-scoped resources that control traffic flow between pods. They use label selectors to define ingress and egress rules, and require a CNI plugin (like Calico or Cilium) that supports NetworkPolicy enforcement. By default, all traffic is allowed; once a policy selects a pod, only explicitly allowed traffic is permitted.

What is OPA Gatekeeper and how does it secure Kubernetes?

OPA Gatekeeper is a Kubernetes admission controller that enforces policies written in Rego. It intercepts API requests and evaluates them against ConstraintTemplates and Constraints, rejecting resources that violate policies such as requiring resource limits, blocking privileged containers, or enforcing image registries.

How does Falco provide runtime security for Kubernetes?

Falco is a CNCF runtime security tool that monitors system calls at the kernel level using eBPF or a kernel module. It detects anomalous behavior in containers such as shell spawning, unexpected file access, network connections, and privilege escalation attempts, then generates alerts for incident response.

What RBAC best practices should I follow in Kubernetes?

Key RBAC best practices include: use the principle of least privilege, prefer RoleBindings over ClusterRoleBindings, avoid wildcard permissions, regularly audit RBAC configurations, use service accounts per workload rather than the default, disable automounting of service account tokens when not needed, and integrate with external identity providers via OIDC.

Kehinde Ogunlowo

Principal Multi-Cloud DevSecOps Architect at Citadel Cloud Management. Specializing in infrastructure-as-code, Kubernetes security, and multi-cloud architecture across AWS, Azure, and GCP.


Secure Your Kubernetes Clusters with Production-Ready Terraform Modules

Explore my open-source Terraform modules for deploying hardened EKS, AKS, and GKE clusters with security best practices built in from day one.

Explore Modules on GitHub