Another bout in the IaC war?

I’m a big fan of infrastructure-as-code (IaC). Being able to rebuild a complete set of cloud resources with a single command makes it straightforward to spin up new environments, whether for testing, disaster recovery, or staging; having the configuration stored in code enables tracking of changes over time, along with the associated reasons for any changes; and just being able to add comments to explain why things are set up the way they are is an often overlooked benefit. Knowing that I haven’t simply forgotten to make an adjustment on the production system after making it on the development system gives a lot of comfort.

Most of my IaC experience up until now has been with HashiCorp Terraform. I have briefly dabbled with CloudFormation while using the Serverless Framework, and I have built some environments with Ansible, but Terraform has been my tool of choice in this field. However, the team I have joined for my current contract use AWS CDK for their IaC needs, and it has been interesting to give it a spin. I have found that in some areas it is incredibly powerful while in others it is fragile, and I thought I would add to the pile of comparisons on the web with my own perspective on the matter.

First, a quick summary:

CDK

The AWS Cloud Development Kit (CDK) is a tool and set of libraries that allow developers to use a familiar language to specify their infrastructure requirements. The cdk tool translates the specification into a CloudFormation configuration and uploads the configuration to AWS for deployment.

Being based upon CloudFormation has both pros and cons. The main points that I want to note here are:

  • :+1: the state is simpler to set up (as everything lives within the AWS account)
  • :-1: imports, moves, and other adjustments to existing resources are difficult or impossible.

In addition to the features inherited from CloudFormation, the CDK libraries have a rich collection of high-level constructs that make common patterns extremely simple.

Terraform

Terraform is a tool by HashiCorp that allows developers to specify their infrastructure requirements in its own configuration language (the HashiCorp Configuration Language, or HCL). The terraform tool uses the APIs of the relevant cloud providers to bring the infrastructure in line with the configured requirements.

  • :+1: As a provider-independent tool, Terraform is able to manage resources in multiple different providers from the same configuration, e.g., storage and processing in AWS, DNS records and content distribution with Cloudflare, and authentication with Auth0.
  • :+1: Terraform’s approach to state management is a lot more flexible than CloudFormation (and thus CDK), allowing the IaC structure to be reorganised without having to destroy state-holding resources (e.g. databases).
  • :-1: Decent non-local infrastructure state storage requires more work with Terraform than CDK.
  • :-1: While higher-level constructs exist, they primarily come from third-party sources of variable quality and reliability.

How do they look?

Let’s assume you want to set up a static website with CloudFront as a content delivery network. In Terraform, you might write something like

provider "aws" {
  region = "ap-southeast-2"
}

provider "aws" {
  # TLS certificate for CloudFront must be in us-east-1
  region = "us-east-1"
  alias  = "tls_cert_provider"
}

locals {
  s3_origin_id = "example_website"
  dns_root     = "example.opie.nz"
  domain_name  = "examplesite.${local.dns_root}"
}

# S3

resource "aws_s3_bucket" "bucket" {
  acl           = "private"
  bucket_prefix = "static-site"
}

module "template_files" {
  source = "hashicorp/dir/template"

  base_dir = "${path.module}/website_files"
}

resource "aws_s3_bucket_object" "files" {
  for_each = module.template_files.files

  bucket       = aws_s3_bucket.bucket.bucket
  key          = each.key
  source       = each.value.source_path
  content_type = each.value.content_type
}

data "aws_iam_policy_document" "cf_access" {
  statement {
    actions   = ["s3:GetObject"]
    resources = ["${aws_s3_bucket.bucket.arn}/*"]

    principals {
      type        = "AWS"
      identifiers = [aws_cloudfront_origin_access_identity.oai.iam_arn]
    }
  }
}

resource "aws_s3_bucket_policy" "cf_access" {
  bucket = aws_s3_bucket.bucket.id
  policy = data.aws_iam_policy_document.cf_access.json
}

resource "aws_cloudfront_origin_access_identity" "oai" {
  comment = "Identity to allow CloudFront access to files on S3"
}

# CloudFront

resource "aws_cloudfront_distribution" "cdn" {
  aliases             = [local.domain_name]
  enabled             = true
  default_root_object = "index.html"
  default_cache_behavior {
    allowed_methods        = ["HEAD", "GET"]
    cached_methods         = ["HEAD", "GET"]
    target_origin_id       = local.s3_origin_id
    viewer_protocol_policy = "redirect-to-https"
    forwarded_values {
      query_string = false
      cookies {
        forward = "none"
      }
    }
  }
  origin {
    domain_name = aws_s3_bucket.bucket.bucket_regional_domain_name
    origin_id   = local.s3_origin_id
    s3_origin_config {
      origin_access_identity = aws_cloudfront_origin_access_identity.oai.cloudfront_access_identity_path
    }
  }
  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }
  viewer_certificate {
    acm_certificate_arn = module.acm.acm_certificate_arn
    ssl_support_method  = "sni-only"
  }
}

# DNS record

data "aws_route53_zone" "dns_zone" {
  name         = local.dns_root
  private_zone = false
}

resource "aws_route53_record" "website" {
  name    = local.domain_name
  type    = "A"
  zone_id = data.aws_route53_zone.dns_zone.zone_id
  alias {
    evaluate_target_health = false
    name                   = aws_cloudfront_distribution.cdn.domain_name
    zone_id                = aws_cloudfront_distribution.cdn.hosted_zone_id
  }
}

# TLS certificate

module "acm" {
  source = "terraform-aws-modules/acm/aws"

  providers = {
    aws = aws.tls_cert_provider
  }

  domain_name = local.domain_name
  zone_id     = data.aws_route53_zone.dns_zone.zone_id
}

In CDK, assuming you want to write your configuration with Python, this would look more like

#!/usr/bin/env python3
import os

from aws_cdk import (
    core as cdk,
    aws_certificatemanager as acm,
    aws_cloudfront as cloudfront,
    aws_cloudfront_origins as origins,
    aws_route53 as route53,
    aws_route53_targets as targets,
    aws_s3 as s3,
    aws_s3_deployment as s3_deployment,
)
from constructs import Construct

DNS_ROOT = "example.opie.nz"
DOMAIN_NAME = f"examplesite.{DNS_ROOT}"


class MainStack(cdk.Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs):
        super().__init__(scope, construct_id, **kwargs)

        dns_zone = route53.HostedZone.from_lookup(
            self,
            "dns_zone",
            domain_name=DNS_ROOT,
        )

        # S3
        bucket = s3.Bucket(
            self,
            "bucket",
            access_control=s3.BucketAccessControl.PRIVATE,
        )

        # TLS certificate
        certificate = acm.DnsValidatedCertificate(
            self,
            "certificate",
            hosted_zone=dns_zone,
            domain_name=DOMAIN_NAME,
            region="us-east-1",
            cleanup_route53_records=True,
        )

        # CloudFront
        distribution = cloudfront.Distribution(
            self,
            "cdn",
            domain_names=[DOMAIN_NAME],
            enabled=True,
            default_root_object="index.html",
            default_behavior=cloudfront.BehaviorOptions(
                allowed_methods=cloudfront.AllowedMethods.ALLOW_GET_HEAD,
                cached_methods=cloudfront.CachedMethods.CACHE_GET_HEAD,
                origin=origins.S3Origin(bucket),
                viewer_protocol_policy=cloudfront.ViewerProtocolPolicy.REDIRECT_TO_HTTPS,
            ),
            certificate=certificate,
        )

        # S3 continued...
        s3_deployment.BucketDeployment(
            self,
            "files",
            destination_bucket=bucket,
            sources=[s3_deployment.Source.asset("./website_files")],
            distribution=distribution,
        )

        # DNS record
        route53.RecordSet(
            self,
            "website",
            record_name=DOMAIN_NAME,
            record_type=route53.RecordType.A,
            zone=dns_zone,
            target=route53.RecordTarget.from_alias(targets.CloudFrontTarget(distribution))
        )


app = cdk.App()
MainStack(
    app,
    "ExampleStack",
    env=cdk.Environment(account=os.getenv('CDK_DEFAULT_ACCOUNT'), region=os.getenv('CDK_DEFAULT_REGION')),
)

app.synth()

From these samples, we can see several important differences. First of all, Terraform uses a declarative language to define the resources, while CDK uses a procedural one. Using a declarative language can require a change in mindset when you’re accustomed to a procedural perspective; I found this very much when I first learnt VHDL - the “ah-ha!” moment of grasping that everything happens at the same time makes a big difference. Since the resources being deployed are all expected to exist together, at the same time, a declarative approach makes a lot of sense for specifying the desired configuration. CloudFormation is also declarative (its templates are just JSON or YAML documents), so CDK does eventually produce something in this form, but on the surface it is procedural, which can be a trap for new players who treat it as a way of specifying a sequence of instructions to send to AWS.
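
To make this concrete, running cdk synth on the stack above produces a plain CloudFormation template. A heavily trimmed, illustrative fragment (the logical ID suffix is generated by CDK and will differ in practice) might look something like

{
  "Resources": {
    "bucket43879C71": {
      "Type": "AWS::S3::Bucket",
      "Properties": {
        "AccessControl": "Private"
      },
      "DeletionPolicy": "Retain"
    }
  }
}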

Secondly, even in this simple example we start to see the greater expressiveness of CDK with its higher-level constructs. Consider the deployment of files to S3. In Terraform, each static file is a separate resource to be uploaded and tracked, while in CDK, the whole source directory is treated as a single asset to be uploaded to S3. Behind the scenes, CDK creates a Lambda function to handle the delivery of the individual files into the distribution bucket, but from our perspective, we just defined an “asset” and it was uploaded. Even more useful is the distribution parameter: this allows us to specify a distribution whose cache should be invalidated whenever changed files are uploaded. Again, this is implemented by the Lambda function, but all we need to do is specify a single argument.

Another helpful feature of CDK can be seen in the DnsValidatedCertificate construct. This wraps up the TLS certificate resource as well as any DNS records required to validate it; again, this is implemented with a Lambda function. In Terraform this can also be handled elegantly with the terraform-aws-modules/acm/aws module, but that comes from a third-party source rather than being built into the core library. I wasn’t able to find a similar third-party Terraform module that takes care of invalidating a CloudFront cache when files change. And one last convenience offered by CDK here is the automatic handling of the Origin Access Identity.

Breaking it down

Of course, software is never finished and infrastructure requirements are never set in stone. When your project’s needs change, your IaC configuration may need some adjustments. Perhaps you need to add some dynamic content to your website, and want an API to do some calculations for it. Both tools will, of course, allow you to make these additions: just add a Lambda with an API Gateway, or an ECS service with a load-balancer. But now you might look at your configuration, still in one file, and think “that could really be separated into a few smaller parts”; or maybe you want to reuse some subset of the configuration.

In Terraform, there are several ways to break a configuration into smaller, more focused chunks. The simplest approach is to just put the configuration into multiple files in the same directory - Terraform will pick them all up and use them all. A slightly more sophisticated approach is to use modules, which allow a collection of resources to be abstracted away; along with keeping related things together, a module also allows reuse.

To demonstrate how to create and use a module, here’s a brief, somewhat contrived example based on the sample code above. Perhaps we want to abstract out the idea of “upload this source directory of files to an S3 bucket”. In a separate directory, we would create a file with the following content:

variable "source_directory" {
  type        = string
  description = "The directory holding the files to upload"
}

resource "aws_s3_bucket" "bucket" {
  acl           = "private"
  bucket_prefix = "uploaded-files"
}

module "template_files" {
  source = "hashicorp/dir/template"

  base_dir = var.source_directory
}

resource "aws_s3_bucket_object" "files" {
  for_each = module.template_files.files

  bucket       = aws_s3_bucket.bucket.bucket
  key          = each.key
  source       = each.value.source_path
  content_type = each.value.content_type
}

output "bucket" {
  value = aws_s3_bucket.bucket
}

and we could use this like

module "source_files" {
  source           = "<the directory holding the above file>"
  source_directory = "${path.module}/website_files"
}

resource "aws_s3_bucket_policy" "cf_access" {
  bucket = module.source_files.bucket.id
  policy = data.aws_iam_policy_document.cf_access.json
}

...

Meanwhile, in CDK we generally group related resources into a Construct. Constructs can then be separated into their own source files as appropriate, and all pulled into the main source file in the same way as usual for the language being used (e.g., in Python, you’d import them). Larger groupings of resources are made with Stacks, which are then actually deployed separately with CloudFormation.

In CDK the BucketDeployment is somewhat similar to the module I’ve shown above, so I’ll show an even more contrived example of how to group resources into a Construct:

class DistributionWithSource(cdk.Construct):
    def __init__(
            self,
            scope: Construct,
            construct_id: str,
            source_directory: str,
            domain_name: str,
            certificate: acm.ICertificate
    ):
        super().__init__(scope, construct_id)

        # S3
        bucket = s3.Bucket(
            self,
            "bucket",
            access_control=s3.BucketAccessControl.PRIVATE,
        )

        # CloudFront
        distribution = cloudfront.Distribution(
            self,
            "cdn",
            domain_names=[domain_name],
            enabled=True,
            default_root_object="index.html",
            default_behavior=cloudfront.BehaviorOptions(
                allowed_methods=cloudfront.AllowedMethods.ALLOW_GET_HEAD,
                cached_methods=cloudfront.CachedMethods.CACHE_GET_HEAD,
                origin=origins.S3Origin(bucket),
                viewer_protocol_policy=cloudfront.ViewerProtocolPolicy.REDIRECT_TO_HTTPS,
            ),
            certificate=certificate,
        )

        # S3 continued...
        s3_deployment.BucketDeployment(
            self,
            "files",
            destination_bucket=bucket,
            sources=[s3_deployment.Source.asset(source_directory)],
            distribution=distribution,
        )

and this would then be used like

class MainStack(cdk.Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs):
        super().__init__(scope, construct_id, **kwargs)
        
        ...

        DistributionWithSource(self, "distribution", "./website_files", DOMAIN_NAME, certificate)
  
        ...

Shaking things up

Now that we can break our configuration into component parts, what happens to resources we’ve already deployed when we reorganise them? Both CDK and Terraform maintain a mapping from resource definitions to the deployed resources so that they can find those resources again and update them on subsequent deployments. If we move those resources around (e.g., by taking them out of the single monolithic definition and putting them into a reusable module), our tool won’t know that mymodule->myresource is actually just root->myresource, and will create a new resource for the former and destroy the latter. For some resources (especially stateless ones), this may not cause any problems. But for many types of resource, particularly stateful ones (e.g., databases and S3 buckets), this would be terrible, resulting in data loss!

To address this situation, Terraform has a mechanism to tell it that a resource has changed its location in the configuration: state mv. This command alters the (path-qualified) name of the resource in the state map so that Terraform knows where the resource now lives. For example, if we created a module like I showed in the previous section (moving the S3 bucket into a module), the command might look something like

terraform state mv aws_s3_bucket.bucket module.source_files.aws_s3_bucket.bucket

I haven’t had to try to incorporate this into a continuous-deployment pipeline, and while it sounds a little tricky I imagine it should be possible if treated in a similar way to database migrations - an ordered list of state movements to be performed.
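
As a thought experiment, here is a minimal sketch of that idea, written in Python for consistency with the CDK examples. Everything in it is hypothetical - the marker file, the wrapper, and the particular moves are just one way the idea could be expressed.

#!/usr/bin/env python3
"""Illustrative sketch: apply Terraform state moves in order, like database migrations."""
import pathlib
import subprocess

# Ordered list of (old address, new address) pairs. Only append new moves;
# never reorder or remove earlier entries, so every environment replays the
# same history.
STATE_MOVES = [
    ("aws_s3_bucket.bucket", "module.source_files.aws_s3_bucket.bucket"),
]

# Hypothetical marker file recording how many moves this environment has applied.
marker = pathlib.Path(".state_moves_applied")
applied = int(marker.read_text()) if marker.exists() else 0

for old_address, new_address in STATE_MOVES[applied:]:
    subprocess.run(["terraform", "state", "mv", old_address, new_address], check=True)
    applied += 1
    marker.write_text(str(applied))

In a real pipeline the marker would need to live somewhere shared (alongside the remote state, perhaps), but the shape would be the same.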

Meanwhile, I’m not aware of a good way to move existing resources around in CDK. In a real squeeze, renameLogicalId might make some movements possible, but it doesn’t look very sustainable. To me this is the biggest limitation of CDK (inherited from its CloudFormation base).
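
If you did find yourself in that squeeze, the escape hatch is a method on the Stack (rename_logical_id in Python). Here is a minimal sketch only - both logical IDs below are made up, and in practice you would read the real ones out of the synthesised templates before and after the move:

class MainStack(cdk.Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs):
        super().__init__(scope, construct_id, **kwargs)

        # ... resources, now moved into the DistributionWithSource construct ...

        # Pin the bucket back to the logical ID it had before the move so that
        # CloudFormation keeps treating it as the same resource.
        # Both IDs here are illustrative, not real.
        self.rename_logical_id("distributionbucket1A2B3C4D", "bucket83908E77")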

A higher plane

Cloud computing has been around long enough now that common patterns of resources can be found for various use cases. CDK offers high-level constructs to quickly and easily implement many of these patterns without needing to build them up from their constituent resources by hand. For example, the ApplicationLoadBalancedFargateService is a single construct that wraps up a Fargate service on ECS with an application load-balancer and the appropriate configuration to connect the two. Similarly, the LambdaRestApi sets up a REST API backed by a Lambda function.

To show how expressive these can be, here’s how the first example above might be used:

ecs_patterns.ApplicationLoadBalancedFargateService(
    self,
    "service",
    task_image_options=ecs_patterns.ApplicationLoadBalancedTaskImageOptions(
        image=ecs.ContainerImage.from_asset("./service_src"),
        container_port=443
    )
)

These 8 lines will set up a new ECS cluster in a new VPC, build a Docker image from the source files in ./service_src, create a service running that image on the cluster, create an application load-balancer, and configure it to pass traffic to the service. Magic!
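
The LambdaRestApi construct is similarly compact. Here is a rough sketch, assuming aws_apigateway and aws_lambda have been imported alongside the other CDK modules, and with the ./api_src directory and handler name as placeholders:

handler = lambda_.Function(
    self,
    "api_handler",
    runtime=lambda_.Runtime.PYTHON_3_8,
    handler="index.handler",
    code=lambda_.Code.from_asset("./api_src"),
)

# By default this creates a REST API with a greedy proxy resource that forwards
# every request to the Lambda function.
apigateway.LambdaRestApi(self, "api", handler=handler)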

¿Por qué no los dos?

Terraform has an experimental (as of writing) tool, CDK for Terraform (CDKTF), that allows you to write your configuration in the CDK style but have it converted into Terraform configuration files instead of CloudFormation configuration. This exposes all the multi-provider benefits of Terraform without requiring developers to learn a new language. I haven’t tried CDKTF, and I’m not sure whether all of the high-level constructs from regular CDK are available, nor how it handles reorganisation of the IaC source. If you have tried it, let me know how you found it in the comments below :arrow_double_down:

Conclusion

I have enjoyed my adventure with CDK and have found it to be powerful and expressive. I haven’t yet run into the headaches of refactoring CDK resources, so at this stage I would happily use it again on a new project. But I have also always enjoyed using Terraform, so to be honest I’m not sure which I will pick next time the choice is mine!