GKE Autopilot with Terraform: Serverless Kubernetes on Google Cloud


Why GKE Autopilot for Serverless Kubernetes

Google Kubernetes Engine Autopilot represents a fundamental shift in how organizations consume Kubernetes. Instead of requiring you to manage node pools, configure autoscalers, and patch operating systems, Autopilot delegates all infrastructure decisions to Google: you define your workloads, and GKE provisions exactly the right compute resources to run them. The result is a serverless Kubernetes experience that maintains full API compatibility while eliminating undifferentiated operational overhead.

For platform engineering teams running multi-cloud environments, GKE Autopilot offers a compelling value proposition. The per-pod billing model means you pay only for the resources your containers actually request, not for idle VM capacity. Security hardening is enforced by default: container-optimized OS, shielded nodes, workload identity, and automatic node upgrades are all non-negotiable. This aligns perfectly with zero-trust principles that enterprise organizations demand.

In this guide, I walk through deploying a production-grade GKE Autopilot cluster using Terraform, covering VPC networking, workload identity federation, Gateway API for traffic management, and binary authorization for supply chain security. The infrastructure code follows the modular patterns I use at Citadel Cloud Management for client engagements across Google Cloud, AWS, and Azure.

GKE Standard vs GKE Autopilot vs Cloud Run Comparison

Choosing the right Google Cloud compute platform depends on your workload characteristics, team expertise, and operational requirements. The following comparison table breaks down the key differences between GKE Standard, GKE Autopilot, and Cloud Run to help you make an informed decision.

| Feature | GKE Standard | GKE Autopilot | Cloud Run |
| --- | --- | --- | --- |
| Node Management | User-managed node pools | Fully managed by Google | Fully serverless, no nodes |
| Billing Model | Per VM (node), regardless of utilization | Per pod resource request (CPU, memory, ephemeral storage) | Per request + CPU/memory per 100ms |
| Kubernetes API Access | Full API access | Full API with some restrictions | No Kubernetes API |
| DaemonSets | Fully supported | Allowed partner DaemonSets only | Not applicable |
| Privileged Containers | Supported | Not allowed by default | Not applicable |
| GPU Workloads | Full GPU support | Supported (NVIDIA L4, A100, H100) | Limited GPU support |
| Minimum Resources | No minimum per pod | 250m CPU, 512Mi memory per container | No minimum |
| Scaling Speed | Minutes (node provisioning) | Seconds to minutes | Seconds (scale to zero) |
| Workload Identity | Optional, configurable | Mandatory, always enabled | Built-in service identity |
| Security Hardening | User responsibility | Enforced by Google | Managed by Google |
| Best For | Custom infrastructure, special requirements | Production Kubernetes without node ops | Stateless HTTP services, event-driven |

GKE Autopilot is the ideal middle ground for teams that need the full power of the Kubernetes API but want to eliminate node management overhead. If your workloads require privileged containers, custom kernel modules, or DaemonSets beyond the allowed partner list, GKE Standard remains the appropriate choice. For simple stateless HTTP services that benefit from scale-to-zero, Cloud Run provides the simplest operational model.

Architecture Overview and Prerequisites

The architecture we deploy spans a VPC with secondary IP ranges for pods and services, a private GKE Autopilot cluster with authorized networks, workload identity federation for secure GCP API access, Gateway API for L7 traffic routing, and binary authorization for image verification. All resources are defined as Terraform modules to enable reuse across environments.

Prerequisites

Before deploying, ensure you have the following in place:

  • Google Cloud project with billing enabled
  • Terraform 1.5+ installed locally or in your CI/CD pipeline
  • Google Cloud SDK (gcloud) configured with appropriate permissions
  • APIs enabled: container.googleapis.com, compute.googleapis.com, binaryauthorization.googleapis.com
  • IAM roles: roles/container.admin, roles/compute.networkAdmin, roles/iam.serviceAccountAdmin

The Terraform modules referenced in this guide are available in my GitHub repositories: terraform-gcp-gke for the cluster configuration, terraform-gcp-vpc-network for the VPC foundation, and terraform-gcp-iam for IAM bindings and workload identity setup.

VPC Network Foundation with Terraform

GKE Autopilot requires a VPC-native cluster configuration with secondary IP ranges for pods and services. Proper network planning is essential because the pod and service CIDR ranges cannot be changed after cluster creation. For production deployments, I recommend using a Shared VPC to centralize network management while granting service projects access to deploy GKE clusters.

Secondary IP Range Planning for Google Kubernetes Engine

The pod IP range determines the maximum number of pods your cluster can support. GKE Autopilot uses /17 as the default pod range, supporting up to 32,768 pod IPs. The service range defaults to /22, providing 1,024 service IPs. For large-scale deployments, plan your IP ranges carefully to avoid conflicts with other VPCs connected via VPC Peering or Cloud Interconnect.

A well-structured VPC module from my terraform-gcp-vpc-network repository handles subnet creation, secondary ranges, Cloud NAT for egress, and firewall rules. The network foundation is intentionally separated from the GKE cluster module so that networking changes do not trigger cluster recreation.

Private Cluster Networking

Production GKE clusters should always use private nodes with no public IP addresses. The control plane endpoint can be configured as private, public, or public with authorized networks. For most enterprise deployments, I recommend a private endpoint with authorized networks that include your CI/CD runners, VPN gateways, and developer workstations. Cloud NAT provides outbound internet access for nodes that need to pull container images from public registries, though Artifact Registry with VPC Service Controls eliminates this need for internal images.

Deploying GKE Autopilot with Terraform

The following Terraform configuration deploys a production-grade GKE Autopilot cluster with Gateway API enabled, workload identity federation, private networking, and binary authorization. This configuration reflects the patterns used in my terraform-gcp-gke module.

# providers.tf
terraform {
  required_version = ">= 1.5.0"
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.20"
    }
    google-beta = {
      source  = "hashicorp/google-beta"
      version = "~> 5.20"
    }
  }
  backend "gcs" {
    bucket = "your-terraform-state-bucket"
    prefix = "gke-autopilot"
  }
}

# variables.tf
variable "project_id" {
  description = "GCP project ID"
  type        = string
}

variable "region" {
  description = "GCP region for the GKE cluster"
  type        = string
  default     = "us-central1"
}

variable "cluster_name" {
  description = "Name of the GKE Autopilot cluster"
  type        = string
  default     = "autopilot-prod"
}

variable "network_name" {
  description = "VPC network name"
  type        = string
}

variable "subnet_name" {
  description = "Subnet name for GKE nodes"
  type        = string
}

# vpc.tf - VPC with secondary ranges for GKE
resource "google_compute_network" "vpc" {
  name                    = var.network_name
  project                 = var.project_id
  auto_create_subnetworks = false
  routing_mode            = "REGIONAL"
}

resource "google_compute_subnetwork" "gke_subnet" {
  name          = var.subnet_name
  project       = var.project_id
  region        = var.region
  network       = google_compute_network.vpc.id
  ip_cidr_range = "10.0.0.0/20"

  secondary_ip_range {
    range_name    = "pods"
    ip_cidr_range = "10.4.0.0/14"
  }

  secondary_ip_range {
    range_name    = "services"
    ip_cidr_range = "10.8.0.0/20"
  }

  private_ip_google_access = true

  log_config {
    aggregation_interval = "INTERVAL_5_SEC"
    flow_sampling        = 0.5
    metadata             = "INCLUDE_ALL_METADATA"
  }
}

# Cloud NAT for outbound connectivity
resource "google_compute_router" "router" {
  name    = "${var.cluster_name}-router"
  project = var.project_id
  region  = var.region
  network = google_compute_network.vpc.id
}

resource "google_compute_router_nat" "nat" {
  name                               = "${var.cluster_name}-nat"
  project                            = var.project_id
  router                             = google_compute_router.router.name
  region                             = var.region
  nat_ip_allocate_option             = "AUTO_ONLY"
  source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES"

  log_config {
    enable = true
    filter = "ERRORS_ONLY"
  }
}

# gke.tf - GKE Autopilot cluster with Gateway API
resource "google_container_cluster" "autopilot" {
  provider = google-beta

  name     = var.cluster_name
  project  = var.project_id
  location = var.region

  # Enable Autopilot mode
  enable_autopilot = true

  # Networking configuration
  network    = google_compute_network.vpc.id
  subnetwork = google_compute_subnetwork.gke_subnet.id

  ip_allocation_policy {
    cluster_secondary_range_name  = "pods"
    services_secondary_range_name = "services"
  }

  # Private cluster configuration
  private_cluster_config {
    enable_private_nodes    = true
    enable_private_endpoint = false
    master_ipv4_cidr_block  = "172.16.0.0/28"
  }

  # Authorized networks for control plane access
  master_authorized_networks_config {
    cidr_blocks {
      cidr_block   = "10.0.0.0/8"
      display_name = "Internal VPC"
    }
    cidr_blocks {
      cidr_block   = "YOUR_CI_CD_IP/32"
      display_name = "CI/CD Pipeline"
    }
  }

  # Release channel for automatic upgrades
  release_channel {
    channel = "REGULAR"
  }

  # Gateway API configuration
  gateway_api_config {
    channel = "CHANNEL_STANDARD"
  }

  # Binary authorization
  binary_authorization {
    evaluation_mode = "PROJECT_SINGLETON_POLICY_ENFORCE"
  }

  # Workload identity (always enabled in Autopilot)
  workload_identity_config {
    workload_pool = "${var.project_id}.svc.id.goog"
  }

  # Security posture configuration
  security_posture_config {
    mode               = "BASIC"
    vulnerability_mode = "VULNERABILITY_BASIC"
  }

  # DNS configuration
  dns_config {
    cluster_dns        = "CLOUD_DNS"
    cluster_dns_scope  = "CLUSTER_SCOPE"
    cluster_dns_domain = "cluster.local"
  }

  # Logging and monitoring
  logging_config {
    enable_components = [
      "SYSTEM_COMPONENTS",
      "WORKLOADS"
    ]
  }

  monitoring_config {
    enable_components = [
      "SYSTEM_COMPONENTS",
      "STORAGE",
      "POD",
      "DEPLOYMENT",
      "STATEFULSET"
    ]
    managed_prometheus {
      enabled = true
    }
  }

  # Maintenance window
  maintenance_policy {
    recurring_window {
      start_time = "2026-01-01T09:00:00Z"
      end_time   = "2026-01-01T17:00:00Z"
      recurrence = "FREQ=WEEKLY;BYDAY=SA,SU"
    }
  }

  # Resource labels
  resource_labels = {
    environment = "production"
    managed_by  = "terraform"
    team        = "platform"
  }

  deletion_protection = true
}

# Outputs
output "cluster_name" {
  value = google_container_cluster.autopilot.name
}

output "cluster_endpoint" {
  value     = google_container_cluster.autopilot.endpoint
  sensitive = true
}

output "cluster_ca_certificate" {
  value     = google_container_cluster.autopilot.master_auth[0].cluster_ca_certificate
  sensitive = true
}

Several elements of this configuration deserve attention. The enable_autopilot = true flag is what differentiates this from a Standard cluster. The google-beta provider is required for Gateway API and some monitoring features. The release_channel is set to REGULAR, providing a balance between new features and stability. For more conservative environments, use STABLE.

Understanding Autopilot Resource Classes

GKE Autopilot introduced compute classes to give workloads access to different hardware profiles. The general-purpose class runs on E2 or N2 machines and suits most workloads. The scale-out class uses T2A ARM instances for cost-optimized batch processing. The accelerator class provisions GPU-attached nodes for machine learning inference. Specify the compute class in your pod spec using the cloud.google.com/compute-class node selector.
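As a minimal sketch, selecting a compute class is done entirely in the pod spec. The Deployment below requests the Scale-Out class with ARM architecture; the workload name and image path are illustrative, not from the module in this guide:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker        # hypothetical workload
spec:
  replicas: 3
  selector:
    matchLabels:
      app: batch-worker
  template:
    metadata:
      labels:
        app: batch-worker
    spec:
      nodeSelector:
        cloud.google.com/compute-class: Scale-Out
        kubernetes.io/arch: arm64   # required to land on T2A ARM instances
      containers:
        - name: worker
          image: us-docker.pkg.dev/PROJECT_ID/repo/worker:latest
          resources:
            requests:          # Autopilot bills on these requests
              cpu: "500m"
              memory: "1Gi"
```

Omitting the nodeSelector leaves the pod on the general-purpose class, which is the right default for most services.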

Workload Identity Federation Configuration

Workload identity federation eliminates the need for service account key files by allowing Kubernetes service accounts to impersonate Google Cloud IAM service accounts. On GKE Autopilot, workload identity is always enabled and cannot be disabled, which is an intentional security design decision.

The configuration requires three components: a Google Cloud IAM service account, an IAM binding that grants the Kubernetes service account permission to impersonate it, and a Kubernetes service account annotation that references the IAM service account. My terraform-gcp-iam module handles this setup with a reusable pattern.
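The three components can be sketched in Terraform as follows. The service account name, namespace, and Kubernetes service account (payments/payments-ksa) are hypothetical examples, not the actual contents of the terraform-gcp-iam module:

```hcl
# 1. Google Cloud IAM service account for the workload
resource "google_service_account" "app" {
  project      = var.project_id
  account_id   = "payments-app"
  display_name = "Payments service workload identity SA"
}

# 2. Allow the Kubernetes service account (namespace/name) to impersonate it
resource "google_service_account_iam_member" "workload_identity" {
  service_account_id = google_service_account.app.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "serviceAccount:${var.project_id}.svc.id.goog[payments/payments-ksa]"
}

# 3. Annotate the Kubernetes service account (via the kubernetes provider
#    or kubectl), e.g.:
#    kubectl annotate serviceaccount payments-ksa -n payments \
#      iam.gke.io/gcp-service-account=payments-app@PROJECT_ID.iam.gserviceaccount.com
```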

Configuring Workload Identity for Application Pods

When a pod uses a Kubernetes service account annotated with workload identity, the GKE metadata server intercepts calls to the instance metadata endpoint and returns short-lived OAuth2 tokens scoped to the IAM service account. This approach is significantly more secure than distributing JSON key files, which cannot be automatically rotated and are frequently leaked in source control.

The principle of least privilege is critical here. Each microservice should have its own IAM service account with only the permissions it needs. A payment processing service might need roles/cloudkms.cryptoKeyEncrypterDecrypter and roles/secretmanager.secretAccessor, while a frontend service might only need roles/storage.objectViewer to serve static assets from Cloud Storage.

Gateway API for Advanced Traffic Routing

The Kubernetes Gateway API is the successor to Ingress, providing a more expressive, extensible, and role-oriented model for traffic management. GKE implements Gateway API through Google Cloud Load Balancers, giving you enterprise-grade traffic routing with native Kubernetes resources.

Gateway API vs Ingress on Google Kubernetes Engine

Gateway API introduces several concepts that Ingress lacks. The Gateway resource defines the load balancer and its listeners, owned by cluster operators. HTTPRoute resources define routing rules and are owned by application teams. This separation of concerns maps cleanly to platform engineering team structures where infrastructure teams manage gateways and application teams manage their own routes.

Key advantages of Gateway API on GKE include traffic splitting for canary deployments, header-based routing for A/B testing, URL rewrites and redirects, cross-namespace routing, and TLS passthrough. These capabilities previously required third-party ingress controllers like NGINX or Istio.
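The split between operator-owned Gateways and team-owned HTTPRoutes, plus weighted traffic splitting for canaries, looks like this in practice. This is an illustrative sketch: the names, namespaces, and the 90/10 split are assumptions, while gke-l7-global-external-managed is GKE's managed external Gateway class:

```yaml
# Owned by the platform team
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: external-gateway
  namespace: infra
spec:
  gatewayClassName: gke-l7-global-external-managed
  listeners:
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: All          # permit routes from application namespaces
---
# Owned by the application team; splits traffic for a canary rollout
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: checkout-route
  namespace: shop
spec:
  parentRefs:
    - name: external-gateway
      namespace: infra
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /checkout
      backendRefs:
        - name: checkout-v1
          port: 8080
          weight: 90         # stable version
        - name: checkout-v2
          port: 8080
          weight: 10         # canary
```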

Connecting to Artifact Registry for Container Images

GKE Autopilot clusters pull container images from Artifact Registry without additional authentication when the cluster and registry are in the same project. For cross-project access, configure the GKE service account with roles/artifactregistry.reader on the registry project. Artifact Registry integrates with binary authorization for image signing verification, closing the loop on supply chain security.
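For the cross-project case, the binding can be expressed in Terraform. The repository name and the two variables are placeholders; this assumes the cluster nodes run as the project's default compute service account, which is the Autopilot default unless you configure a custom one:

```hcl
# Grant the GKE project's node service account pull access on the
# registry project's repository (names are illustrative).
resource "google_artifact_registry_repository_iam_member" "gke_pull" {
  project    = var.registry_project_id
  location   = var.region
  repository = "containers"
  role       = "roles/artifactregistry.reader"
  member     = "serviceAccount:${var.gke_project_number}-compute@developer.gserviceaccount.com"
}
```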

Binary Authorization for Supply Chain Security

Binary Authorization is a deploy-time security control that ensures only trusted container images run on your GKE cluster. It uses attestations signed by trusted authorities to verify that images have passed through your CI/CD pipeline and meet your security policies before they are allowed to deploy.

How Binary Authorization Works with GKE Autopilot

The binary authorization flow begins in your CI/CD pipeline. After building a container image and pushing it to Artifact Registry, a vulnerability scan runs via Container Analysis. If the scan passes your defined threshold, an attestor signs the image digest with a Cloud KMS key. When the image is deployed to GKE, the binary authorization admission controller verifies the attestation exists and is valid before allowing the pod to start.

This chain of trust ensures that images cannot bypass your security pipeline. Even if an attacker gains access to your Artifact Registry, they cannot deploy unsigned images to clusters with binary authorization enforced. The evaluation mode PROJECT_SINGLETON_POLICY_ENFORCE in the Terraform configuration above activates this enforcement.

For development environments, you can configure binary authorization in MONITORING mode to log violations without blocking deployments. This gives teams time to integrate image signing into their pipelines before enforcement is activated.
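A project-level policy expressing both modes might be sketched as below. This assumes an attestor resource named google_binary_authorization_attestor.ci is defined elsewhere (for example by your CI/CD module); swap the enforcement_mode for the dry-run behavior described above:

```hcl
resource "google_binary_authorization_policy" "policy" {
  project = var.project_id

  # Allow Google-managed system images so cluster components keep running
  admission_whitelist_patterns {
    name_pattern = "gcr.io/gke-release/*"
  }

  default_admission_rule {
    evaluation_mode         = "REQUIRE_ATTESTATION"
    enforcement_mode        = "ENFORCED_BLOCK_AND_AUDIT_LOG"
    # For development environments, use "DRYRUN_AUDIT_LOG_ONLY" instead
    # to log violations without blocking deployments.
    require_attestations_by = [google_binary_authorization_attestor.ci.name]
  }

  global_policy_evaluation_mode = "ENABLE"
}
```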

Best Practices for GKE Autopilot in Production

After deploying GKE Autopilot clusters for multiple enterprise clients, I have consolidated the following best practices that consistently reduce operational incidents and optimize cost.

  1. Right-size resource requests carefully. Autopilot bills based on pod resource requests, not actual usage. Over-requesting wastes money. Under-requesting causes OOM kills and CPU throttling. Use Vertical Pod Autoscaler in recommendation mode to identify optimal requests, then set them explicitly in your manifests.
  2. Use the REGULAR release channel for production. The REGULAR channel balances feature availability with stability. RAPID exposes you to newer but less-tested versions. STABLE trails significantly behind. Set maintenance windows during off-peak hours and use PodDisruptionBudgets to maintain availability during node upgrades.
  3. Implement namespace-level resource quotas. Even though Autopilot manages node capacity, resource quotas prevent individual teams from over-provisioning. Define CPU, memory, and pod count quotas per namespace aligned with your cost allocation model.
  4. Enable GKE Managed Prometheus for observability. Google Cloud Managed Prometheus provides a fully managed, cost-effective metrics backend compatible with PromQL. It integrates with Cloud Monitoring dashboards and alerting, eliminating the need to run Prometheus and Thanos infrastructure.
  5. Use Cloud DNS for GKE instead of kube-dns. The dns_config block in the Terraform configuration enables Cloud DNS, which provides better performance and reliability than the in-cluster kube-dns deployment. It also reduces pod resource consumption since Cloud DNS runs outside the cluster.
  6. Enforce network policies from day one. GKE Autopilot supports Kubernetes NetworkPolicy resources. Define default-deny ingress policies per namespace, then explicitly allow traffic between services. This prevents lateral movement in the event of a container compromise.
  7. Leverage compute classes for cost optimization. Assign the scale-out compute class with ARM-based T2A instances for batch workloads, data processing, and CI/CD runners. These instances cost approximately 20% less than comparable x86 instances while delivering equivalent performance for most workloads.
  8. Configure backup for GKE with Backup for GKE. Enable the Backup for GKE service to create scheduled backups of cluster state and persistent volumes. Define backup plans per namespace with appropriate retention policies and test restores regularly to validate your disaster recovery process.
  9. Separate environments with Fleet management. Use GKE Fleet to manage multiple Autopilot clusters across environments and regions. Fleet enables centralized policy management, multi-cluster services, and consistent configuration through Config Sync with GitOps patterns.
  10. Monitor costs with GKE cost allocation. Enable cost allocation in the GKE cluster settings to attribute compute costs to individual namespaces and workloads. Integrate with Cloud Billing export to BigQuery for detailed cost analytics and chargeback reporting.
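Practices 3 and 6 above translate into small, per-namespace manifests. The namespace and quota figures here are illustrative; align them with your own cost allocation model:

```yaml
# Cap what one team can request in its namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: payments     # hypothetical team namespace
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 64Gi
    pods: "100"
---
# Default-deny ingress; allow specific traffic with additional policies
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: payments
spec:
  podSelector: {}         # selects every pod in the namespace
  policyTypes:
    - Ingress             # no ingress rules listed, so all ingress is denied
```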

Frequently Asked Questions

What is GKE Autopilot and how does it differ from GKE Standard?

GKE Autopilot is a fully managed Kubernetes mode where Google manages the node infrastructure, scaling, and security hardening. Unlike GKE Standard where you manage node pools and VMs, Autopilot abstracts away node management entirely, charging per pod resource request rather than per VM. It enforces security best practices by default including workload identity, shielded nodes, and container-optimized OS.

How do I deploy GKE Autopilot with Terraform?

Deploy GKE Autopilot with Terraform using the google_container_cluster resource with enable_autopilot set to true. Configure VPC-native networking with secondary IP ranges for pods and services, private cluster settings, a release channel, and workload identity. Use the google-beta provider for features like Gateway API and binary authorization. Structure your Terraform into modules for VPC, IAM, and GKE for reusability.

Does GKE Autopilot support Gateway API?

Yes, GKE Autopilot fully supports the Kubernetes Gateway API. Gateway API provides a more expressive and role-oriented approach to traffic routing compared to Ingress. Enable it via the gateway_api_config block in Terraform. GKE implements Gateway API through Google Cloud Load Balancers, supporting HTTP routing, traffic splitting, and header-based matching natively.

What are the limitations of GKE Autopilot?

GKE Autopilot has specific limitations: no SSH access to nodes, no privileged containers by default, limited DaemonSet support, restricted hostPath volumes, minimum resource requests of 250m CPU and 512Mi memory per container, no custom machine types, and limited GPU type support. These trade-offs enable the fully managed security and operations model.

How does workload identity federation work with GKE Autopilot?

Workload identity federation is enabled by default on GKE Autopilot and cannot be disabled. It maps Kubernetes service accounts to Google Cloud IAM service accounts, eliminating the need for exported service account keys. Pods authenticate to Google Cloud APIs using short-lived tokens automatically rotated by the GKE metadata server. Configure it by annotating Kubernetes service accounts with the IAM service account email and granting the workloadIdentityUser role.

About Kehinde Ogunlowo

Kehinde Ogunlowo is a Principal Multi-Cloud DevSecOps Architect at Citadel Cloud Management, specializing in enterprise cloud architecture across AWS, Azure, and Google Cloud. With deep expertise in Terraform, Kubernetes, and zero-trust security, Kehinde helps organizations build scalable, secure, and cost-effective cloud infrastructure.

GitHub | LinkedIn | Website

Need Help with GKE Autopilot or Multi-Cloud Kubernetes?

Citadel Cloud Management delivers production-grade Kubernetes platforms across Google Cloud, AWS, and Azure. From architecture design to Terraform automation, we help enterprises ship faster with confidence.

Get in Touch