
Creating Production GKE Cluster with Terraform

In the Module 2 Assignment we created the project $ORG-$PRODUCT-$ENV -> $student_name-notepad-dev with a custom VPC using Terraform. We also followed naming-convention best practices and stored the Terraform state in a GCS backend with versioning enabled.

In the Module 3 Assignment we created a production GKE cluster with gcloud, which is a good first step toward automating your infrastructure environment. The next logical step is to use those gcloud commands as the basis for building Terraform resources.

In Module 4 we are going to continue using the infrastructure created in Module 2 and create production GKE clusters in it.

Objective:

  • Create a subnet and Cloud NAT with Terraform
  • Create a regional, private GKE Standard cluster with Terraform
  • Delete the default node pool and create a custom node pool with Terraform
  • Create a GKE Autopilot cluster with Terraform

1 Creating Production GKE Cluster

Prepare Lab Environment

This lab can be executed in your GCP cloud environment using Google Cloud Shell.

Open the Google Cloud Shell by clicking on the icon on the top right of the screen:


Once opened, you can use it to run the instructions for this lab.

1.1 Locate Module 4 Assignment

Step 1 Locate your personal Google Cloud Source Repository:

export student_name=<write_your_name_here_and_remove_brackets>

Note

Replace the placeholder with your name (the value assigned to $student_name).

cd ~/$student_name-notepad
git pull                              # Pull latest code from your repo

Step 2 Create the ycit020_module4 folder from your ycit020_module2 folder:

Important

Finish all tasks of Module 2 before doing this step.

cp -r ycit020_module2 ycit020_module4

Step 3 Commit ycit020_module4 folder using the following Git commands:

cd ~/$student_name-notepad
git status 
git add .
git commit -m "adding documentation for ycit020 module 4 assignment"

Step 4 Once you've committed the code to the local repository, add its contents to Cloud Source Repositories using the git push command:

git push origin master

1.2 Create a user-managed subnet with terraform

Using the google_compute_subnetwork resource, create a user-managed subnet with Terraform.

Locate the notepad-infrastructure folder, where we are going to continue building the GCP service layer with Terraform configuration:

cd ~/$student_name-notepad/ycit020_module4/notepad-infrastructure
cat <<EOF >> subnets.tf
resource "google_compute_subnetwork" "gke_standard_subnet" {
  name          = format("gke-standard-%s-%s-%s-subnet", var.org, var.product, var.environment)
  network       = google_compute_network.vpc_network.self_link
  region        = var.gcp_region
  project       = var.gcp_project_id
  ip_cidr_range = var.network_cidr
  secondary_ip_range {
    range_name    = var.pods_cidr_name
    ip_cidr_range = var.pods_cidr
  }
  secondary_ip_range {
    range_name    = var.services_cidr_name
    ip_cidr_range = var.services_cidr
  }
}
resource "google_compute_subnetwork" "gke_auto_subnet" {
  name          = format("gke-auto-%s-%s-%s-subnet", var.org, var.product, var.environment)
  network       = google_compute_network.vpc_network.self_link
  region        = var.gcp_region
  project       = var.gcp_project_id
  ip_cidr_range = var.network_auto_cidr
  secondary_ip_range {
    range_name    = var.pods_auto_cidr_name
    ip_cidr_range = var.pods_auto_cidr
  }
  secondary_ip_range {
    range_name    = var.services_auto_cidr_name
    ip_cidr_range = var.services_auto_cidr
  }
}
EOF

Note

Notice the power of Terraform resource references. Here we link the subnet to our VPC network using the google_compute_network.vpc_network.self_link attribute exported by the network created in the previous step.

Define variables:

cat <<EOF >> variables.tf

# variables used to create VPC subnets

variable "network_cidr" {
  type = string
}
variable "pods_cidr" {
  type = string
}
variable "pods_cidr_name" {
  type    = string
  default = "gke-standard-pods"
}
variable "services_cidr" {
  type = string
}
variable "services_cidr_name" {
  type    = string
  default = "gke-standard-services"
}
variable "network_auto_cidr" {
  type = string
}
variable "pods_auto_cidr" {
  type = string
}
variable "pods_auto_cidr_name" {
  type    = string
  default = "gke-auto-pods"
}
variable "services_auto_cidr" {
  type = string
}
variable "services_auto_cidr_name" {
  type    = string
  default = "gke-auto-services"
}
EOF

Define outputs:

cat <<EOF >> outputs.tf
output "subnet_selflink" {
  value = "\${google_compute_subnetwork.gke_standard_subnet.self_link}"
}
output "subnet_auto_selflink" {
  value = "\${google_compute_subnetwork.gke_auto_subnet.self_link}"
}
EOF

Task N1: Update the terraform.tfvars file with the following values:

  • Node range: see the subnet column in Table 1 below for the dev cluster
  • GKE Standard secondary ranges:
    • Service range CIDR: see the srv range column in Table 1 below
    • Pods range CIDR: see the pod range column in Table 1 below
  • GKE Auto secondary ranges:
    • Service range CIDR: see the srv range column in Table 1 below
    • Pods range CIDR: see the pod range column in Table 1 below
Table 1

Project   | Subnet Name  | subnet        | pod range   | srv range     | kubectl api range
app 1 Dev | gke-standard | 10.130.0.0/24 | 10.0.0.0/16 | 10.100.0.0/23 | 172.16.0.0/28
          | gke-auto     | 10.131.0.0/24 | 10.1.0.0/16 | 10.100.2.0/23 | 172.16.0.16/28

edit terraform.tfvars

Update the file with values according to the VPC subnet design in Table 1 above:

#gke-standard subnet vars
network_cidr       = "TODO"
pods_cidr          = "TODO"
services_cidr      = "TODO"
#gke-auto subnet vars
network_auto_cidr  = "TODO"
pods_auto_cidr     = "TODO"
services_auto_cidr = "TODO"

Review TF Plan:

terraform plan -var-file terraform.tfvars

Create the subnets:

terraform apply -var-file terraform.tfvars

Review created subnet:

export ORG=$student_name
export PRODUCT=notepad
export ENV=dev
gcloud compute networks subnets list
gcloud compute networks subnets describe gke-standard-$ORG-$PRODUCT-$ENV-subnet --region us-central1
gcloud compute networks subnets describe gke-auto-$ORG-$PRODUCT-$ENV-subnet --region us-central1

Example of the Output:

enableFlowLogs: false
fingerprint: sWtjHJyqrM8=
gatewayAddress: 10.130.0.1
id: '251139681714314790'
ipCidrRange: 10.130.0.0/24
kind: compute#subnetwork
logConfig:
  enable: false
name: gke-standard-archy-notepad-dev-subnet
network: https://www.googleapis.com/compute/v1/projects/archy-notepad-dev-898/global/networks/vpc-archy-notepad-dev
privateIpGoogleAccess: false
privateIpv6GoogleAccess: DISABLE_GOOGLE_ACCESS
purpose: PRIVATE
region: https://www.googleapis.com/compute/v1/projects/archy-notepad-dev-898/regions/us-central1
secondaryIpRanges:
- ipCidrRange: 10.0.0.0/16
  rangeName: gke-standard-pods
- ipCidrRange: 10.100.0.0/23
  rangeName: gke-standard-services
selfLink: https://www.googleapis.com/compute/v1/projects/archy-notepad-dev-898/regions/us-central1/subnetworks/gke-standard-archy-notepad-dev-subnet
stackType: IPV4_ONLY

Also check in the Google Cloud UI:

Networking -> VPC Networks -> click your VPC network and check the `Subnets` tab

Task N2: Update subnets.tf so that the google_compute_subnetwork resources support the following features:

* Flow Logs configuration
  * Aggregation interval: 15 min  # reduces cost for VPC Flow Logs, as the default interval is 5 seconds
  * Flow logs metadata config: EXCLUDE_ALL_METADATA
* Private IP Google Access

Hint

Use google_compute_subnetwork resource to update subnet configurations with terraform.

edit subnets.tf
TODO
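
For reference, one possible shape of the change is sketched below, shown only for the gke_standard_subnet resource (the gke_auto_subnet resource would receive the same additions); the existing arguments from subnets.tf are elided and stay unchanged:

resource "google_compute_subnetwork" "gke_standard_subnet" {
  # ... existing arguments (name, network, region, ip_cidr_range, secondary ranges) stay as-is ...

  # Let nodes without external IPs reach Google APIs and services
  private_ip_google_access = true

  # VPC Flow Logs: 15-minute aggregation and no metadata, to reduce logging cost
  log_config {
    aggregation_interval = "INTERVAL_15_MIN"
    metadata             = "EXCLUDE_ALL_METADATA"
  }
}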

Review TF Plan:

terraform plan -var-file terraform.tfvars

Update the subnets:

terraform apply -var-file terraform.tfvars

Review created subnet:

gcloud compute networks subnets describe gke-standard-$ORG-$PRODUCT-$ENV-subnet --region us-central1
gcloud compute networks subnets describe gke-auto-$ORG-$PRODUCT-$ENV-subnet --region us-central1

Example of the Output:

creationTimestamp: '2022-10-03T19:56:09.579-07:00'
enableFlowLogs: true
fingerprint: Y8-BjoK1gK8=
gatewayAddress: 10.130.0.1
id: '251139681714314790'
ipCidrRange: 10.130.0.0/24
kind: compute#subnetwork
logConfig:
  aggregationInterval: INTERVAL_15_MIN
  enable: true
  filterExpr: 'true'
  flowSampling: 0.5
  metadata: EXCLUDE_ALL_METADATA
name: gke-standard-archy-notepad-dev-subnet
network: https://www.googleapis.com/compute/v1/projects/archy-notepad-dev-898/global/networks/vpc-archy-notepad-dev
privateIpGoogleAccess: true
privateIpv6GoogleAccess: DISABLE_GOOGLE_ACCESS
purpose: PRIVATE
region: https://www.googleapis.com/compute/v1/projects/archy-notepad-dev-898/regions/us-central1
secondaryIpRanges:
- ipCidrRange: 10.0.0.0/16
  rangeName: gke-standard-pods
- ipCidrRange: 10.100.0.0/23
  rangeName: gke-standard-services
selfLink: https://www.googleapis.com/compute/v1/projects/archy-notepad-dev-898/regions/us-central1/subnetworks/gke-standard-archy-notepad-dev-subnet
stackType: IPV4_ONLY

1.3 Create a Cloud router

Create a Cloud Router for the custom-mode VPC network, in the same region as the instances that will use Cloud NAT. The Cloud Router is only used to hold the NAT configuration; it is not part of the actual NAT gateway data path.

Task N3: Define a google_compute_router resource inside router.tf that will back a NAT gateway, so that nodes in the private cluster can reach Docker Hub and external APIs. Use the following parameters:

  • Create the router for the custom vpc_network created above with Terraform
  • Same project as the VPC
  • Same region as the VPC
  • Router name: gke-net-router
  • Local BGP Autonomous System Number (ASN): 64514

Use the google_compute_router reference documentation to create the Cloud Router resource with Terraform.

Hint

You can automatically reference the VPC created earlier using the exported attribute google_compute_network.vpc_network.self_link.

edit router.tf
TODO
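
A minimal sketch of what router.tf could contain for this task is shown below; the resource label gke_net_router matches the reference google_compute_router.gke_net_router.name used in cloudnat.tf in the next section:

resource "google_compute_router" "gke_net_router" {
  name    = "gke-net-router"
  project = var.gcp_project_id
  region  = var.gcp_region
  # Attach the router to the custom VPC created earlier
  network = google_compute_network.vpc_network.self_link

  bgp {
    asn = 64514
  }
}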

Save the populated file.

Review TF Plan:

terraform plan -var-file terraform.tfvars

Create Cloud Router:

terraform apply -var-file terraform.tfvars

Verify created Cloud Router:

CLI:

gcloud compute routers list
gcloud compute routers describe gke-net-router --region us-central1

Output:

bgp:
  advertiseMode: DEFAULT
  asn: 64514
  keepaliveInterval: 20
kind: compute#router
name: gke-net-router

UI:

Networking -> Hybrid Connectivity -> Cloud Routers

Result

Router resource has been created for VPC Network

1.4 Create a Cloud NAT

Set up a simple Cloud NAT configuration using the google_compute_router_nat resource, which automatically allocates the necessary external IP addresses to provide NAT services for a region.

When you use auto-allocation, Google Cloud reserves IP addresses in your project automatically.

cat <<EOF >> cloudnat.tf
resource "google_compute_router_nat" "gke_cloud_nat" {
  project                = var.gcp_project_id
  name                   = "gke-cloud-nat"
  router                 = google_compute_router.gke_net_router.name
  region                 = var.gcp_region
  nat_ip_allocate_option = "AUTO_ONLY"
  source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES"
}
EOF

Review TF Plan:

terraform plan -var-file terraform.tfvars

Create Cloud NAT:

terraform apply -var-file terraform.tfvars

Verify the created Cloud NAT:

CLI:

# List available Cloud Nat Routers
gcloud compute routers nats list --router gke-net-router --router-region us-central1
# Describe Cloud Nat Routers `gke-cloud-nat`:
gcloud compute routers nats describe gke-cloud-nat --router gke-net-router --router-region us-central1

Output:

enableEndpointIndependentMapping: true
endpointTypes:
- ENDPOINT_TYPE_VM
icmpIdleTimeoutSec: 30
name: gke-cloud-nat
natIpAllocateOption: AUTO_ONLY
sourceSubnetworkIpRangesToNat: ALL_SUBNETWORKS_ALL_IP_RANGES
tcpEstablishedIdleTimeoutSec: 1200
tcpTransitoryIdleTimeoutSec: 30
udpIdleTimeoutSec: 30

UI:

Networking -> Network Services -> Cloud NAT

Result

A NAT service created in a router

Task N4: Additionally, turn ON the Cloud NAT logging feature for ALL types of connections.

edit cloudnat.tf
TODO
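
One possible sketch of the change, added inside the existing gke_cloud_nat resource in cloudnat.tf (existing arguments elided):

resource "google_compute_router_nat" "gke_cloud_nat" {
  # ... existing arguments stay as-is ...

  # Log both successful NAT translations and dropped connections (ALL log types)
  log_config {
    enable = true
    filter = "ALL"
  }
}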

Review TF Plan:

terraform plan -var-file terraform.tfvars

Update the Cloud NAT configuration:

terraform apply -var-file terraform.tfvars

Output:

Apply complete! Resources: 0 added, 1 changed, 0 destroyed.

Verify the updated configuration:

gcloud compute routers nats describe gke-cloud-nat --router gke-net-router --router-region us-central1

Result

Cloud NAT logging is now enabled. Cloud NAT logging allows you to log NAT connections and errors. When it is enabled, one log entry can be generated for each of the following scenarios:

  • When a network connection using NAT is created.
  • When a packet is dropped because no port was available for NAT.

1.5 Create a Private GKE Cluster and delete default node pool

1.5.1 Enable GCP Beta Provider

In order to create a GKE cluster with Terraform we will leverage the google_container_cluster resource.

Some google_container_cluster arguments, such as VPC-native networking mode and the VPA, Istio, and CSI driver add-ons, require the google-beta provider.

The google-beta provider is distinct from the google provider in that it supports GCP products and features that are in beta, while google does not. Fields and resources that are only present in google-beta will be marked as such in the shared provider documentation.

Configure and initialize the GCP beta provider, similar to how we did it for the GCP provider in 1.3.3 Initialize Terraform, by updating the provider.tf and main.tf configuration files.

rm provider.tf
cat <<EOF >> provider.tf
terraform {
  required_providers { 
    google = {
      source  = "hashicorp/google"
      version = "~> 4.37.0"
    }
    google-beta = {
      source  = "hashicorp/google-beta"
      version = "~> 4.37.0"
    }
  }
}
EOF
cat <<EOF >>  main.tf
provider "google-beta" {
  project = var.gcp_project_id
  region  = var.gcp_region
}
EOF

Initialize google-beta provider plugin:

terraform init

Success

Terraform has been successfully initialized!

1.5.2 Enable Kubernetes Engine API

The Kubernetes Engine API is used to build and manage container-based applications, powered by the open-source Kubernetes technology. It must be enabled before you can create a GKE cluster.

Task N5: Enable container.googleapis.com in the services.tf file, similar to what we already did in 2.2 Enable required GCP Services API. Make sure the service is set to disable_on_destroy = false, which helps prevent errors during redeployments of the system (aka immutable infra).

edit services.tf
TODO

Note

Adding disable_on_destroy=false helps to prevent errors during redeployments of the system.
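
A minimal sketch of what services.tf could contain for this task (the resource label container_api is just an example name):

resource "google_project_service" "container_api" {
  project = var.gcp_project_id
  service = "container.googleapis.com"

  # Keep the API enabled when this resource is destroyed, to avoid breaking redeployments
  disable_on_destroy = false
}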

Review TF Plan:

terraform plan -var-file terraform.tfvars

Enable the Kubernetes Engine API:

terraform apply -var-file terraform.tfvars

1.5.3 Create a Private GKE Cluster and delete default node pool

Using the Terraform google_container_cluster resource, create a regional, private GKE cluster with the following characteristics:

Cluster Configuration:

  • Cluster name: gke-standard-$ORG-$PRODUCT-$ENV
  • GKE Control plane is replicated across three zones of a region: us-central1
  • Private cluster with unrestricted access to the public endpoint:
    • Cluster Nodes access: Private Node GKE Cluster with Public API endpoint
    • Cluster K8s API access: with unrestricted access to the public endpoint
  • Cluster Node Communication: VPC Native
  • Secondary pod range with name: gke-standard-pods
  • Secondary service range with name: gke-standard-services
  • GKE Release channel: regular
  • GKE master and node version: "1.22.12-gke.300"
  • Terraform Provider: google-beta
  • Timeouts to finish creation of cluster and deletion of default node pool: 30M
  • Features:
    • Enable Cilium based Networking DataplaneV2: enable-dataplane-v2
    • Configure Workload Identity Pool: PROJECT_ID.svc.id.goog
    • Enable HTTP Load-balancing Addon: http_load_balancing

Terraform resource google_container_node_pool creates a custom GKE Node Pool.

Custom Node Pool Configuration:

  • The name of a GCE machine (VM) type: e2-micro
  • Node count: 1 per zone
  • Node images: Container-Optimized OS
  • GKE Node Pool Boot Disk Size: "100 Gb"

Note

Why delete the default node pool? The default node pool causes trouble when managing a cluster created with Terraform, because it is not part of the Terraform lifecycle. GKE architecture best practice recommends deleting the default node pool and creating a custom one instead, so that node pools are managed explicitly.

Note

Why define timeouts for the GKE resource? Normally GKE creation takes a few minutes. However, in our case we create the GKE cluster and then the system cordons, drains, and destroys the default node pool. This process may take 10-20 minutes, and we want to make sure Terraform does not time out during that time.

Step 1: Let's define GKE resource first:

edit gke.tf
resource "google_container_cluster" "primary_cluster" {
  provider = google-beta

  project = var.gcp_project_id

  name               = format("gke-standard-%s-%s-%s", var.org, var.product, var.environment)
  min_master_version = var.kubernetes_version
  network            = google_compute_network.vpc_network.self_link
  subnetwork         = google_compute_subnetwork.gke_standard_subnet.self_link

  location                    = var.gcp_region
  logging_service             = var.logging_service
  monitoring_service          = var.monitoring_service

  remove_default_node_pool = true
  initial_node_count       = 1

  private_cluster_config {
    enable_private_nodes   = var.enable_private_nodes
    enable_private_endpoint = var.enable_private_endpoint
    master_ipv4_cidr_block = var.master_ipv4_cidr_block
  }

  # Enable Dataplane V2
  datapath_provider = "ADVANCED_DATAPATH"

  release_channel {
    channel = "REGULAR"
  }


  addons_config {
    http_load_balancing {
      disabled = var.disable_http_load_balancing
    }
  }

  ip_allocation_policy {
    cluster_secondary_range_name  = var.pods_cidr_name
    services_secondary_range_name = var.services_cidr_name
  }

  timeouts {
    create = "30m"
    update = "30m"
    delete = "30m"
  }

  workload_identity_config {
    workload_pool  = "${var.gcp_project_id}.svc.id.goog"
  }
}

Step 2: Next define GKE cluster specific variables:

cat <<EOF >> gke_variables.tf

# variables used to create GKE Cluster Control Plane

variable "kubernetes_version" {
  default     = ""
  type        = string
  description = "The GKE version of Kubernetes"
}

variable "logging_service" {
  description = "The logging service that the cluster should write logs to."
  default     = "logging.googleapis.com/kubernetes"
}

variable "monitoring_service" {
  default     = "monitoring.googleapis.com/kubernetes"
  description = "The GCP monitoring service scope"
}

variable "disable_http_load_balancing" {
  default     = false
  description = "Enable HTTP Load balancing GCP integration"
}

variable "pods_range_name" {
  description = "The pre-defined IP Range the Cluster should use to provide IP addresses to pods"
  default     = ""
}

variable "services_range_name" {
  description = "The pre-defined IP Range the Cluster should use to provide IP addresses to services"
  default     = ""
}

variable "enable_private_nodes" {
  default     = false
  description = "Enable Private-IP Only GKE Nodes"
}

variable "enable_private_endpoint" {
  default     = false
  description = "When true, the cluster's private endpoint is used as the cluster endpoint and access through the public endpoint is disabled."
}

variable "master_ipv4_cidr_block" {
  description = "The ipv4 cidr block that the GKE masters use"
}

variable "release_channel" {
  type        = string
  default     = ""
  description = "The release channel of this cluster"
}
EOF

Step 3: Define GKE cluster specific outputs:

edit outputs.tf

Add following outputs and save file:

output "id" {
  value = "${google_container_cluster.primary_cluster.id}"
}
output "endpoint" {
  value = "${google_container_cluster.primary_cluster.endpoint}"
}
output "master_version" {
  value = "${google_container_cluster.primary_cluster.master_version}"
}

Task N6: Complete terraform.tfvars with the required values for the GKE cluster specified above:

edit terraform.tfvars
//gke specific
enable_private_nodes   = "TODO"  
master_ipv4_cidr_block = "TODO"    # Using Table 1 `kubectl api range` for GKE Standard
kubernetes_version     = "TODO"    # From GKE Cluster requirements
release_channel        = "TODO"    # From GKE Cluster requirements

In the next step, we are going to create a custom GKE node pool.

1.5.4 Create a GKE custom Node pool

Using the google_container_node_pool resource, create a custom GKE node pool with the following characteristics:

Custom Node Pool Configuration:

  • The name of a GCE machine (VM) type: e2-micro
  • Node count: 1 per zone
  • Node images: Container-Optimized OS
  • GKE Node Pool Boot Disk Size: "100 Gb"

Step 1: Let's define the node pool resource first:

cat <<EOF >> gke.tf

#Node Pool Resource
resource "google_container_node_pool" "custom-node_pool" {

  name       = "main-pool"
  location = var.gcp_region
  project  = var.gcp_project_id
  cluster    = google_container_cluster.primary_cluster.name
  node_count = var.gke_pool_node_count
  version    = var.kubernetes_version

  node_config {
    image_type   = var.gke_pool_image_type
    disk_size_gb = var.gke_pool_disk_size_gb
    disk_type    = var.gke_pool_disk_type
    machine_type = var.gke_pool_machine_type
  }

  timeouts {
    create = "10m"
    delete = "10m"
  }

  lifecycle {
    ignore_changes = [
      node_count
    ]
  }
}
EOF

Step 2: Next define node pool specific variables:

cat <<EOF >> gke_variables.tf

#Node Pool specific variables
variable "gke_pool_machine_type" {
  type = string
}
variable "gke_pool_node_count" {
  type = number
}
variable "gke_pool_disk_type" {
  type = string
  default = "pd-standard"
}
variable "gke_pool_disk_size_gb" {
  type = string
}
variable "gke_pool_image_type" {
  type = string
}
EOF

Task N7 (Continued): Complete terraform.tfvars with the required values for the GKE node pool specified above:

edit terraform.tfvars
//pool specific
gke_pool_node_count   = "TODO"
gke_pool_image_type   = "TODO"
gke_pool_disk_size_gb = "TODO"
gke_pool_machine_type = "TODO"

Step 3: Review TF Plan:

terraform plan -var-file terraform.tfvars

Step 4: Create GKE Cluster and Node Pool:

terraform apply -var-file terraform.tfvars

Output:

google_container_cluster.primary_cluster: Creating...
...
google_container_cluster.primary_cluster: Creation complete after 20m9s 
google_container_node_pool.custom-node_pool: Creating...
google_container_node_pool.custom-node_pool: Creation complete after 2m10s

Note

GKE cluster control plane creation and deletion of the default node pool may take about 12 minutes; creating the custom node pool takes another 6 minutes or so.

Verify Cluster has been created:

export ORG=$student_name
export PRODUCT=notepad
export ENV=dev
gcloud container clusters list
gcloud container clusters describe gke-standard-$ORG-$PRODUCT-$ENV --region us-central1

1.5.5 Update GKE Node Pool Auto Upgrade and Auto Repair settings

Note

GKE master nodes are managed by Google and get upgraded automatically. Users can only specify a maintenance window if they have a preference for when that process occurs (e.g. outside busy hours). Users can, however, control the node pool upgrade lifecycle: they can choose to upgrade node pools themselves or rely on Auto Upgrade.

Task N8: Using the google_container_node_pool resource, update the node pool to turn off the Auto Upgrade and Auto Repair features that are enabled by default.

edit gke.tf
TODO

Solution:

  management {
    auto_repair  = false
    auto_upgrade = false
  }

Step 3: Review TF Plan:

terraform plan -var-file terraform.tfvars

No errors.

Step 4: Update GKE Cluster Node Pool configuration:

terraform apply -var-file terraform.tfvars

Summary

Congrats! You've now learned how to deploy production grade GKE clusters.

1.6 Create GKE Autopilot Cluster with Terraform

Step 1: Define GKE Autopilot resource first:

cat <<EOF >> gke_auto.tf
resource "google_container_cluster" "auto_cluster" {
  provider = google-beta

  project = var.gcp_project_id

  name               = format("gke-auto-%s-%s-%s", var.org, var.product, var.environment)
  min_master_version = var.kubernetes_version
  network            = google_compute_network.vpc_network.self_link
  subnetwork         = google_compute_subnetwork.gke_auto_subnet.self_link

  location                    = var.gcp_region
  logging_service             = var.logging_service
  monitoring_service          = var.monitoring_service

# Enable Autopilot for this cluster
  enable_autopilot = true

# Private Autopilot GKE cluster
  private_cluster_config {
    enable_private_nodes   = var.enable_private_nodes
    enable_private_endpoint = var.enable_private_endpoint
    master_ipv4_cidr_block = var.auto_master_ipv4_cidr_block
  }

# Configuration options for the Release channel feature, which provide more control over automatic upgrades of your GKE clusters.
  release_channel {
    channel = "REGULAR"
  }

# Configuration of cluster IP allocation for VPC-native clusters
  ip_allocation_policy {
    cluster_secondary_range_name  = var.pods_auto_cidr_name
    services_secondary_range_name = var.services_auto_cidr_name
  }

  timeouts {
    create = "20m"
    update = "20m"
    delete = "20m"
  }
}
EOF

Step 2: Next define GKE Autopilot cluster specific variables:

cat <<EOF >> gke_variables.tf

# variables used to create GKE AutoPilot Cluster Control Plane

variable "auto_master_ipv4_cidr_block" {
  description = "The ipv4 cidr block that the GKE masters use"
}
EOF

Step 3: Define GKE cluster specific outputs:

edit outputs.tf

Add following outputs and save file:

output "autopilot_id" {
  value = "${google_container_cluster.auto_cluster.id}"
}
output "autopilot_endpoint" {
  value = "${google_container_cluster.auto_cluster.endpoint}"
}
output "autopilot_master_version" {
  value = "${google_container_cluster.auto_cluster.master_version}"
}

Update tfvars

edit terraform.tfvars
//gke autopilot specific
auto_master_ipv4_cidr_block = "172.16.0.16/28"

Step 4: Review TF Plan:

terraform plan -var-file terraform.tfvars

Step 5: Create GKE Autopilot Cluster:

terraform apply -var-file terraform.tfvars

Output:

google_container_cluster.auto_cluster: Still creating..
...
google_container_cluster.auto_cluster: Creation complete after 7m58s 

Verify Cluster has been created:

export ORG=$student_name
export PRODUCT=notepad
export ENV=dev
gcloud container clusters list
gcloud container clusters describe gke-auto-$ORG-$PRODUCT-$ENV --region us-central1

Task N9: Using the google_container_cluster resource, make sure a maintenance window is set to occur on weekdays from 9:00-17:00 UTC-4 (skipping weekends), starting from October 2nd. The equivalent gcloud flags are:

--maintenance-window-start 2022-10-02T09:00:00-04:00 \
--maintenance-window-end 2022-10-02T17:00:00-04:00 \
--maintenance-window-recurrence 'FREQ=WEEKLY;BYDAY=MO,TU,WE,TH,FR'

edit gke_auto.tf
TODO
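
One possible sketch of the block to add inside the google_container_cluster.auto_cluster resource, mirroring the gcloud flags above:

  maintenance_policy {
    recurring_window {
      start_time = "2022-10-02T09:00:00-04:00"
      end_time   = "2022-10-02T17:00:00-04:00"
      # Weekly on weekdays only, skipping weekends
      recurrence = "FREQ=WEEKLY;BYDAY=MO,TU,WE,TH,FR"
    }
  }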

Step 6: Review TF Plan:

terraform plan -var-file terraform.tfvars

Step 7: Update the GKE Autopilot cluster configuration:

terraform apply -var-file terraform.tfvars

Output:

google_container_cluster.auto_cluster: Modifying... [id=projects/archy-notepad-dev-898/locations/us-central1/clusters/gke-auto-archy-notepad-dev]
google_container_cluster.auto_cluster: Modifications complete after 2s [id=projects/archy-notepad-dev-898/locations/us-central1/clusters/gke-auto-archy-notepad-dev]

Browse the GKE UI and verify that the GKE Autopilot cluster is configured with the maintenance window.

Success

We can now create GKE Autopilot clusters with Terraform.

1.7 (Optional) Repeatable Infrastructure

When you are doing IaC, it is important to ensure that you can both create and destroy resources consistently. This is especially important for CI/CD testing.

Step 1: Destroy all resources:

terraform destroy -var-file terraform.tfvars

No errors.

Step 2: Recreate all resources:

terraform plan -var-file terraform.tfvars
terraform apply -var-file terraform.tfvars

1.8 Create Documentation for terraform code

Documentation for your terraform code is an important part of IaC. Make sure all your variables have a good description!

Community tools have been developed to make the documentation process smoother when documenting Terraform resources and requirements. It's good practice to also include a usage example snippet.

Terraform-Docs is a good example of a tool that can generate documentation based on the description arguments of your input variables and output values, and from your required_providers configuration.
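
For example, terraform-docs will pick up description arguments such as the ones below (an illustrative snippet; the description text is just an example you can adapt):

variable "gke_pool_machine_type" {
  type        = string
  description = "GCE machine type used for the GKE node pool VMs"
}

output "endpoint" {
  description = "IP address of the GKE cluster's Kubernetes API endpoint"
  value       = google_container_cluster.primary_cluster.endpoint
}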

Task N10: Create Terraform Documentation for your infrastructure.

TODO

Step 1 Install the terraform-docs cli to your Google CloudShell environment:

curl -sSLo ./terraform-docs.tar.gz https://terraform-docs.io/dl/v0.16.0/terraform-docs-v0.16.0-$(uname)-amd64.tar.gz
tar -xzf terraform-docs.tar.gz
chmod +x terraform-docs
sudo mv terraform-docs /usr/local/bin/
terraform-docs

Step 2 Generate Terraform documentation with Terraform-Docs:

cd ~/$student_name-notepad/ycit020_module4/foundation-infrastructure
terraform-docs markdown . > README.md
cd ~/$student_name-notepad/ycit020_module4/notepad-infrastructure
terraform-docs markdown . > README.md

Verify created documentation:

edit README.md

1.9 Commit Readme doc to repository and share it with Instructor/Teacher

Step 1 Commit ycit020_module4 folder using the following Git commands:

cd ~/$student_name-notepad
git add .
git commit -m "TF manifests for Module 4 Assignment"

Step 2 Push commit to the Cloud Source Repositories:

git push origin master

Result

Your instructor will be able to review your code and grade it.

1.10 Cleanup

We are only going to clean up the GCP service layer (notepad-infrastructure), as we are going to keep using the GCP project (foundation layer) in future modules.

cd ~/$student_name-notepad/ycit020_module4/notepad-infrastructure
terraform destroy -var-file terraform.tfvars

2. Workaround for Project Quota issue

If you see the following error during project creation in the foundation layer:

Error: Error setting billing account "010BE6-CA1129-195D77" for project "projects/ayrat-notepad-dev-244": googleapi: Error 400: Precondition check failed., failedPrecondition

This is because our billing account has a quota of 5 projects per account.

To solve this issue, list the projects linked to the billing account and identify unused ones:

gcloud beta billing projects list --billing-account $ACCOUNT_ID

And unlink them, so you have fewer than 5 projects per account:

gcloud beta billing projects unlink $PROJECT_ID