terraform-gcp-gke

Production-Ready Google Kubernetes Engine Module

Google Cloud Platform (GCP) Terraform Module

Executive Summary

This Terraform module provisions production-grade Google Kubernetes Engine (GKE) clusters on GCP, supporting both Autopilot and Standard cluster modes. It implements enterprise security controls including Workload Identity, Binary Authorization, Confidential Nodes, Dataplane V2 (Cilium), shielded nodes, and private cluster configurations. The module manages the full lifecycle: dedicated node service accounts with least-privilege IAM roles, configurable node pools with GPU support, cluster autoscaling (NAP), vertical pod autoscaling, maintenance windows, and integrated Cloud Logging/Monitoring.

Overview

| Count | Category | Details |
|-------|----------|---------|
| 4 | Core Resources | GKE Cluster, Node Pools, Service Account, IAM Bindings |
| 5 | IAM Roles | Log Writer, Metric Writer, Monitoring Viewer, AR Reader, GCR Reader |
| 26 | Input Variables | Comprehensive configuration for networking, security, autoscaling |
| 14 | Outputs | Cluster, node pool, networking, service account details |

Architecture Diagram

GKE Cluster (Standard / Autopilot)
|
Control Plane (Release Channel) <--> Private Cluster (Master Auth Networks) <--> Workload Identity (Pool Config)
|
Node Pools (Autoscaling) <--> Node SA (Least Privilege) <--> Shielded Nodes (Secure Boot)
|
VPC Network (Subnet) --> Pods Range (Secondary CIDR) --> Services Range (Secondary CIDR)
|
Cloud Logging <--> Cloud Monitoring <--> Maintenance Window

Component Breakdown

GKE Cluster (google_container_cluster)

Core cluster resource using google-beta provider. Supports Autopilot mode, release channels, private cluster config, master authorized networks, Dataplane V2, Binary Authorization, Confidential Nodes, VPA, and cluster-level autoscaling (NAP) with CPU/memory/GPU limits. Deletion protection is enabled by default.
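The cluster configuration described above can be sketched as follows. This is an illustrative excerpt, not the module's actual source: attribute names follow the google-beta provider's `google_container_cluster` resource, while `var.enable_private_endpoint` and the exact variable wiring are assumptions.

```hcl
resource "google_container_cluster" "this" {
  provider = google-beta

  name       = var.cluster_name
  location   = var.region
  network    = var.network
  subnetwork = var.subnetwork

  # Enabled by default; must be set to false before destroying the cluster.
  deletion_protection = true

  release_channel {
    channel = var.release_channel # RAPID | REGULAR | STABLE
  }

  private_cluster_config {
    enable_private_nodes    = var.enable_private_cluster
    enable_private_endpoint = var.enable_private_endpoint # hypothetical variable
  }

  workload_identity_config {
    workload_pool = "${var.project_id}.svc.id.goog"
  }

  # Dataplane V2 (Cilium/eBPF) vs. legacy kube-proxy datapath.
  datapath_provider = var.enable_dataplane_v2 ? "ADVANCED_DATAPATH" : "LEGACY_DATAPATH"
}
```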

Node Pools (google_container_node_pool)

Dynamically created for Standard clusters via for_each. Each pool supports configurable machine types, disk size/type, preemptible/spot instances, GPU accelerators, taints, labels, auto-repair, auto-upgrade, and shielded instance config with secure boot and integrity monitoring.
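A node pool entry passed to the module might look like the sketch below. The field names are illustrative, since the module's exact `node_pools` object schema is not shown here; adjust to match the variable definition.

```hcl
node_pools = [
  {
    name         = "batch-pool"
    machine_type = "e2-standard-4"
    disk_size_gb = 100
    disk_type    = "pd-ssd"
    spot         = true            # cheaper, interruptible nodes
    min_count    = 0
    max_count    = 10
    labels       = { workload = "batch" }
    taints = [
      { key = "workload", value = "batch", effect = "NO_SCHEDULE" }
    ]
  }
]
```

Taints and labels let you reserve the pool for specific workloads, while `min_count = 0` allows it to scale to zero when idle.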

Node Service Account (google_service_account)

Dedicated service account for GKE nodes with 5 least-privilege IAM roles: logging.logWriter, monitoring.metricWriter, monitoring.viewer, artifactregistry.reader, and storage.objectViewer. Optional Workload Identity User binding.
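The optional Workload Identity User binding typically looks like this sketch, which lets a Kubernetes service account (`default/my-ksa` here is a placeholder) impersonate a GCP service account without exported keys:

```hcl
resource "google_service_account_iam_member" "workload_identity" {
  service_account_id = google_service_account.nodes.name # assumed resource name
  role               = "roles/iam.workloadIdentityUser"
  member             = "serviceAccount:${var.project_id}.svc.id.goog[default/my-ksa]"
}
```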

Networking Integration

References existing VPC network and subnetwork via data sources. Configures pod and service secondary IP ranges, private cluster with optional private endpoint, and master authorized networks for API server access control.
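A sketch of that wiring, assuming the secondary ranges already exist on the subnet (the `var.pods_range_name` and `var.services_range_name` names are hypothetical):

```hcl
data "google_compute_subnetwork" "this" {
  name   = var.subnetwork
  region = var.region
}

# Inside google_container_cluster: map pods and services to the
# subnet's secondary IP ranges (VPC-native / alias IP cluster).
# ip_allocation_policy {
#   cluster_secondary_range_name  = var.pods_range_name
#   services_secondary_range_name = var.services_range_name
# }
```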

Data Flow

Developer / CI --> terraform apply --> GCP API (google-beta)
|
Service Account Created --> IAM Roles Bound --> Cluster Provisioned --> Node Pools Scaled
|
Pods schedule on Nodes --> Workload Identity tokens --> GCP Services

Security Controls

| Control | Implementation | Default |
|---------|----------------|---------|
| Private Cluster | Nodes have no public IPs; optional private endpoint for master | Enabled |
| Workload Identity | Pod-to-GCP-service authentication without key files | Enabled |
| Dataplane V2 (Cilium) | eBPF-based networking with built-in network policy | Enabled |
| Network Policy | Calico-based (for non-Dataplane V2 clusters) | Enabled |
| Binary Authorization | Container image verification before deployment | Disabled |
| Confidential Nodes | Hardware-level memory encryption (AMD SEV) | Disabled |
| Shielded Nodes | Secure boot + integrity monitoring on all node pools | Always On |
| Master Auth Networks | CIDR-based access control to API server | Configurable |
| Least-Privilege SA | Dedicated node SA with 5 minimal IAM roles | Always On |
| Legacy Endpoints Disabled | Metadata server hardening via node metadata | Always On |

Industry Adaptation

Financial Services

Enable Confidential Nodes + Binary Authorization + private endpoint for regulatory compliance. Use Stable release channel.

Healthcare / HIPAA

Private cluster with master authorized networks. Enable VPC Flow Logs on subnets. Use CMEK for etcd encryption.

E-Commerce

Autopilot mode for cost optimization. Enable cluster autoscaling (NAP) with spot instances for batch workloads.

AI / ML

Standard cluster with GPU node pools (gpu_type, gpu_count). Use Rapid release channel for latest K8s features.
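An ML-oriented configuration might combine the Rapid channel with a GPU pool. The `gpu_type` and `gpu_count` fields come from the module's node pool schema as described above; the remaining fields are illustrative assumptions.

```hcl
release_channel = "RAPID"

node_pools = [
  {
    name         = "gpu-pool"
    machine_type = "g2-standard-8"
    gpu_type     = "nvidia-l4"
    gpu_count    = 1
    disk_size_gb = 200
    min_count    = 0
    max_count    = 4
  }
]
```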

Production Readiness Checklist

Configuration Reference

Key Input Variables

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| project_id | string | required | GCP project ID |
| cluster_name | string | required | Name of the GKE cluster |
| region | string | required | GCP region |
| network | string | required | VPC network name |
| subnetwork | string | required | Subnetwork name |
| enable_autopilot | bool | false | Enable Autopilot mode |
| enable_private_cluster | bool | true | Enable private cluster |
| enable_workload_identity | bool | true | Enable Workload Identity |
| enable_dataplane_v2 | bool | true | Dataplane V2 (Cilium) |
| release_channel | string | REGULAR | Release channel |
| node_pools | list(object) | [default-pool] | Node pool configurations |
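Putting the required and common variables together, a minimal module invocation might look like this (the `source` path and literal values are placeholders):

```hcl
module "gke" {
  source = "./terraform-gcp-gke" # illustrative path

  project_id   = "my-project"
  cluster_name = "prod-cluster"
  region       = "us-central1"
  network      = "prod-vpc"
  subnetwork   = "prod-subnet"

  enable_private_cluster   = true
  enable_workload_identity = true
  release_channel          = "STABLE"
}
```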

Key Outputs

| Output | Description | Sensitive |
|--------|-------------|-----------|
| cluster_id | Unique identifier of the cluster | No |
| cluster_endpoint | IP address of the cluster master | Yes |
| cluster_ca_certificate | Base64-encoded CA certificate | Yes |
| node_service_account_email | Node service account email | No |
| workload_identity_pool | Workload Identity pool identifier | No |

Deployment

```bash
# Initialize and deploy
terraform init
terraform plan -out=tfplan
terraform apply tfplan

# Get cluster credentials
gcloud container clusters get-credentials $(terraform output -raw cluster_name) \
  --region $(terraform output -raw cluster_location) \
  --project YOUR_PROJECT_ID
```

Links

GKE Documentation | Terraform Registry | GitHub Repository | Autopilot Overview | Cluster Hardening