I’m a big fan of infrastructure-as-code (IaC). Being able to rebuild a complete set of cloud resources with a single command makes it straightforward to spin up new environments, whether for testing, disaster recovery, or staging; having the configuration stored in code enables tracking of changes over time, along with the associated reasons for any changes; and just being able to add comments to explain why things are set up the way they are is an often overlooked benefit. Knowing that I haven’t simply forgotten to make an adjustment on the production system after making it on the development system gives a lot of comfort.
Most of my IaC experience up until now has been with HashiCorp Terraform. I have briefly dabbled with CloudFormation while using the Serverless Framework and I have built some environments with Ansible, but Terraform has been my tool of choice in this field. However, the team I have joined for my current contract use AWS CDK for their IaC needs, and it has been interesting to give it a spin. I have found that in some areas it is incredibly powerful while in others it is fragile, and I thought I would add to the pile of comparisons on the web with my own perspective on the matter.
First, a quick summary:
CDK
The AWS Cloud Development Kit (CDK) is a tool and set of libraries that allow developers to specify their infrastructure requirements in a familiar programming language. The `cdk` tool translates the specification into a CloudFormation template and uploads it to AWS for deployment.
Being based upon CloudFormation has both pros and cons. The main points that I want to note here are:
- 👍 The state is simpler to set up (as everything lives within the AWS account).
- 👎 Imports, moves, and other adjustments to existing resources are difficult or impossible.
In addition to the features inherited from CloudFormation, the CDK libraries have a rich collection of high-level constructs that make common patterns extremely simple.
Terraform
Terraform is a tool by HashiCorp which allows developers to specify their infrastructure requirements using a specific language (HashiCorp Configuration Language, or HCL). The `terraform` tool uses the APIs of relevant cloud providers to make the infrastructure meet the configured requirements.
- 👍 As a provider-independent tool, Terraform is able to manage resources in multiple different providers from the same configuration, e.g., storage and processing in AWS, DNS records and content distribution with Cloudflare, and authentication with Auth0.
- 👍 Terraform’s approach to state management is a lot more flexible than CloudFormation’s (and thus CDK’s), allowing the IaC structure to be reorganised without having to destroy state-holding resources (e.g. databases).
- 👎 Decent non-local infrastructure state storage requires more work with Terraform than CDK.
- 👎 While higher-level constructs exist, they are primarily from third party sources with variable quality and reliability.
How do they look?
Let’s assume you want to set up a static website with CloudFront as a content delivery network. In Terraform, you might write something like
In CDK, assuming you want to write your configuration with Python, this would look more like
From these samples, we can see several important differences. First of all, Terraform uses a declarative language to define the resources, while CDK uses a procedural one. Using a declarative language can require a change in mindset when you’re accustomed to a procedural perspective; I found this very much when I first learnt VHDL - the “ah-ha!” moment of grasping that everything is happening at the same time makes a big difference. Since the resources being deployed are all expected to exist together, at the same time, a declarative approach makes a lot of sense for specifying the desired configuration. CloudFormation is also declarative (it’s just a bunch of JSON files), so CDK does eventually produce something in this form, but on the surface it is procedural, which can be a trap for new players that treat it as a way of specifying a sequence of instructions to send to AWS.
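To make the "procedural on the surface, declarative underneath" point a bit more concrete, here is a toy sketch in plain Python. It does not use the CDK library at all: the `Template` class, its `add_resource` and `synth` methods, and the bucket names are all made up for illustration, and the output is only loosely shaped like a CloudFormation template.

```python
import json


class Template:
    """Toy stand-in for a CDK stack: it only records what should exist."""

    def __init__(self):
        self.resources = {}

    def add_resource(self, logical_id, resource_type, **properties):
        # Nothing is sent to AWS here; we only add an entry to a dictionary.
        self.resources[logical_id] = {"Type": resource_type, "Properties": properties}

    def synth(self):
        # The result is a declarative document listing the desired resources.
        return json.dumps({"Resources": self.resources}, indent=2, sort_keys=True)


def build(environments):
    template = Template()
    # Ordinary procedural control flow decides what gets declared...
    for env in environments:
        template.add_resource(
            f"SiteBucket{env.capitalize()}",
            "AWS::S3::Bucket",
            BucketName=f"example-site-{env}",
        )
    return template


# ...but the order in which the code ran leaves no trace in the result.
forward = build(["dev", "prod"]).synth()
backward = build(["prod", "dev"]).synth()
print(forward)
print(forward == backward)  # True
```

The loop runs step by step, yet what comes out is a description of what should exist rather than a sequence of instructions, which is roughly the mental model I find helpful when reading CDK code.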
Secondly, even in this simple example we start to see the greater expressiveness of CDK with its higher-level constructs. Consider the deployment of files to S3. In Terraform, each static file is a separate resource to be uploaded and tracked, while in CDK, the whole source directory is treated as a single asset to be uploaded to S3. Behind the scenes, CDK creates a Lambda function to handle the delivery of the individual files into the distribution bucket, but from our perspective, we just defined an “asset” and it was uploaded. Even more useful is the `distribution` parameter: this allows us to specify a distribution whose cache should be invalidated whenever changed files are uploaded. Again, this is implemented by the Lambda function, but all we need to do is specify a single argument.
Another helpful feature of CDK can be seen in the `DnsValidatedCertificate` construct. This wraps up the TLS certificate resource as well as any DNS records required to validate it; again this is implemented with a Lambda function. In Terraform this can also be handled elegantly with the `terraform-aws-modules/acm/aws` module, but this is from a third-party source, not built right into the core library. I wasn’t able to find a similar third-party module that takes care of invalidating a CloudFront cache on file-change for Terraform. And one last convenience we can see offered by CDK here is the automatic handling of the Origin Access Identity.
Breaking it down
Of course, software is never finished and infrastructure requirements are never set in stone. When your project’s needs change, your IaC configuration may need some adjustments. Perhaps you need to add some dynamic content to your website, and want an API to do some calculations for it. Both tools will, of course, allow you to make these additions: just add a Lambda with an API Gateway, or an ECS service with a load-balancer. But now you might look at your configuration, still in one file, and think “that could really be separated into a few smaller parts”; or maybe you want to reuse some subset of the configuration.
In Terraform, there are several ways to break a configuration into smaller, more focused chunks. The simplest approach is to just put the configuration into multiple files in the same directory - Terraform will pick them all up and use them all. A slightly more sophisticated approach is to use modules, which allow a collection of resources to be abstracted away; along with keeping related things together, a module also allows reuse.
To demonstrate how to create and use a module, here’s a brief, somewhat contrived example based on the sample code above. Perhaps we want to abstract out the idea of “upload this source directory of files to an S3 bucket”. In a separate directory, we would create a file with the following content:
and we could use this like
Meanwhile, in CDK we generally group related resources into a Construct. Constructs can then be separated into their own source files as appropriate, and all pulled into the main source file in the same way as usual for the language being used (e.g., in Python, you’d `import` them). Larger groupings of resources are made with Stacks, which are then actually deployed separately with CloudFormation.
In CDK the `BucketDeployment` is somewhat similar to the module I’ve shown above, so I’ll show an even more contrived example of how to group resources into a Construct:
and this would then be used like
Shaking things up
Now that we can break our configuration into component parts, what happens to anything we’ve already deployed if we want to do that? Both CDK and Terraform maintain a mapping from resource definitions to the deployed resources so that they can find the resources again and update them on subsequent deployments. If we move those resources around (e.g., by taking them out of the single monolithic definition and putting them into a reusable module), our tool won’t know that `mymodule->myresource` is actually just `root->myresource`, and will create a new resource for the former and destroy the latter. For some resources (especially stateless ones), this may not cause any problems. But for many types of resource, particularly stateful ones (e.g., databases and S3 buckets), this would be terrible, resulting in data loss!
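The effect of such a move can be sketched with a few lines of plain Python. This is only a toy model of the comparison step, not how either tool is actually implemented; the `plan` function and the resource ID are invented for illustration, and the addresses reuse the `root->…` notation from above.

```python
def plan(state, config):
    """Compare recorded addresses with configured ones, as a toy 'plan' step.

    state:  dict of address -> ID of the deployed resource
    config: set of addresses currently defined in the configuration
    """
    to_destroy = sorted(address for address in state if address not in config)
    to_create = sorted(address for address in config if address not in state)
    return {"destroy": to_destroy, "create": to_create}


# One bucket has been deployed from the root of the configuration.
state = {"root->bucket": "bucket-1234"}

# Unchanged configuration: nothing to do.
print(plan(state, {"root->bucket"}))
# {'destroy': [], 'create': []}

# The same bucket definition moved into a module: the tool sees an unknown
# address to create and a known address that has vanished, so it plans to
# destroy the real bucket and create a fresh, empty one.
print(plan(state, {"mymodule->bucket"}))
# {'destroy': ['root->bucket'], 'create': ['mymodule->bucket']}
```

Nothing in the comparison knows that the two addresses describe the same bucket, so the link has to be supplied from outside.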
To address this situation, Terraform has a mechanism to tell it that a resource has changed its location in the configuration: `state mv`. This command alters the (path-qualified) name of the resource in the state map so that Terraform knows where the resource now lives. For example, if we created a module like I showed in the previous section (moving the S3 bucket into a module), the command might look something like
terraform state mv aws_s3_bucket.bucket module.source_files.aws_s3_bucket.bucket
I haven’t had to try to incorporate this into a continuous-deployment pipeline, and while it sounds a little tricky I imagine it should be possible if treated in a similar way to database migrations - an ordered list of state movements to be performed.
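As a rough sketch of that migration-style idea (untested against a real pipeline), the ordered list could live in a small Python script that works out which `terraform state mv` commands are still needed. The first move is the one shown above; the second move, the `site_files` module name, and the `pending_commands` helper are hypothetical. The script only builds the command strings, it does not run Terraform.

```python
import shlex

# Ordered list of moves, oldest first, in the spirit of database migrations.
# The second entry is a hypothetical later rename of the module.
MOVES = [
    ("aws_s3_bucket.bucket", "module.source_files.aws_s3_bucket.bucket"),
    ("module.source_files.aws_s3_bucket.bucket", "module.site_files.aws_s3_bucket.bucket"),
]


def pending_commands(state_addresses, moves):
    """Return the `terraform state mv` commands still needed, in order.

    state_addresses: addresses currently in the state, e.g. the lines
    printed by `terraform state list`.
    """
    addresses = set(state_addresses)
    commands = []
    for source, destination in moves:
        # A move is pending only if its source still exists; moves applied
        # on an earlier run are skipped, so re-running is harmless.
        if source in addresses and destination not in addresses:
            commands.append(shlex.join(["terraform", "state", "mv", source, destination]))
            # Track the effect so that later moves see the updated addresses.
            addresses.remove(source)
            addresses.add(destination)
    return commands


# A state that has never been migrated needs both moves, in order.
for command in pending_commands(["aws_s3_bucket.bucket"], MOVES):
    print(command)

# A state that already had the first move applied only needs the second.
print(pending_commands(["module.source_files.aws_s3_bucket.bucket"], MOVES))
```

Deriving the pending moves from the current state, rather than keeping a separate "already applied" record, is just one possible design. Terraform 1.1 and later also accept `moved` blocks in the configuration itself, which may make a script like this unnecessary.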
Meanwhile, I’m not aware of a good way to move existing resources around in CDK. In a real squeeze the `renameLogicalId` method might make some movements possible, but it doesn’t look very sustainable. To me this is the biggest limitation of CDK (inherited from its base of CloudFormation).
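To illustrate why moving things is awkward, here is a toy model of an identifier derived from a resource’s position in the construct tree. It is only an approximation for illustration: the `logical_id` function, the hashing scheme, and the `SourceFiles` construct name are assumptions, not CDK’s actual algorithm, and the rename table merely mimics the idea behind `renameLogicalId`.

```python
import hashlib


def logical_id(path):
    """Derive an identifier from a construct path (an approximation only)."""
    readable = "".join(path)
    digest = hashlib.sha256("/".join(path).encode("utf-8")).hexdigest()[:8].upper()
    return readable + digest


# The bucket defined directly in the stack...
before = logical_id(["Bucket"])
# ...and the same bucket after being moved into a hypothetical Construct.
after = logical_id(["SourceFiles", "Bucket"])

print(before)
print(after)
print(before == after)  # False: the deployment would see a brand new resource

# A rename table can pin the moved resource back to its old identifier,
# but every moved resource needs its own hard-coded entry.
renames = {after: before}
resolved = renames.get(after, after)
print(resolved == before)  # True
```

In this model the old identifier has to be written out by hand for each resource that moves, which is why the approach looks hard to sustain across a larger refactoring.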
A higher plane
Cloud computing has been around long enough now that common patterns of resources can be found for various use cases. CDK offers high-level constructs to quickly and easily implement many of these patterns without needing to build them up from their constituent resources by hand. For example, the `ApplicationLoadBalancedFargateService` is a single construct that wraps up a Fargate service on ECS with an application load-balancer and appropriate configuration to connect the two. Similarly, the `LambdaRestApi` sets up a REST API backed by a Lambda function.
To show how expressive these can be, here’s how the first example above might be used:
Given the source in `./service_src`, this will build a Docker image, create a service running that Docker image on the cluster, create an application load-balancer and configure it to pass traffic to the service. Magic!
¿Por qué no los dos?
Terraform has an experimental (as of writing) tool, CDK for Terraform (CDKTF), that allows you to write your configuration in the CDK style but have it converted into Terraform configuration files instead of a CloudFormation template. This exposes all the multi-provider benefits of Terraform, without requiring developers to learn a new language. I haven’t tried CDKTF, and I’m not sure if all of the high-level constructs from regular CDK are available, nor how it handles reorganisation of the IaC source. If you have tried it, let me know how you found it in the comments below ⏬
Conclusion
I have enjoyed my adventure with CDK, and found it to be powerful and expressive. I haven’t yet run into the headaches of refactoring CDK resources, so at this stage I would happily use it again on a new project. I have also always enjoyed using Terraform, so to be honest I’m not sure which I will choose next time I have a choice!