Giving Concourse AWS Permissions With IAM Roles
I’m currently running Concourse for my team at work. It’s a two-instance deployment: one web and one worker EC2 instance.
Most of the pipelines we write need permissions for various AWS services, so we needed a way to grant those permissions to our pipelines.
Our first thought was to create an IAM role and assign that role to the Concourse Worker’s EC2 instance. Done! Every pipeline instantly had the permissions needed.
This worked as a proof of concept when I was initially getting Concourse set up at work and wanted to test out some simple workflows. It broke down in our production environment, though, because our production workloads are spread across multiple AWS accounts and the Concourse worker can only have one IAM role assigned to it.
At this point I thought of two ways to solve this multi-account permissions issue:
- Create a Concourse Worker in each AWS account
- Create an IAM role in each AWS account that tasks on the single Concourse worker can assume
We went with option 2 because:
- It’s the cheapest; spinning up a bunch of EC2 instances costs $$$$$
- It’s more modular; we can make more granular roles since our pipelines always have to assume roles
- Pipelines don’t depend on Workers that have certain attributes
To make it easier to assume IAM roles, we did assign the Concourse Worker a role that allows it to assume other IAM roles. In my opinion this is a very minimal requirement, and it still forces us to think about the permissions each pipeline needs rather than about the state of the Worker the pipeline is running on.
IAM Roles Setup
We created an IAM Role called concourse-worker. This role was assigned to the
Concourse Worker’s EC2 instance. Its set of permissions looked like this:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": ["arn:aws:iam::*:role/worker-permissions"]
    }
  ]
}
This role allows processes running on that worker to assume any IAM role called
worker-permissions in any AWS account. Let’s look at the setup for that role next.
We called the next set of IAM roles worker-permissions. We created one in
each AWS account that we want Concourse to have access to. Each
worker-permissions role had a set of IAM permissions and, most importantly, a
Trust Policy that only allows the concourse-worker role to assume it. The
Trust Policy looks like this (where 111111111111 is the AWS Account ID
that the concourse-worker role is in):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Principal": {
        "AWS": "arn:aws:iam::111111111111:role/concourse-worker"
      }
    }
  ]
}
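Before wiring the roles into a pipeline, it can help to check by hand that the trust relationship works. The following is a minimal sketch, not something from our repo: the function name check_assume_role and the session name trust-policy-check are made up for illustration, and it assumes the aws CLI is installed and that the command is run from the Concourse worker (or anywhere else that holds the concourse-worker role's credentials).

```shell
#!/usr/bin/env bash

# Hypothetical helper (name and session name are illustrative only).
# Tries to assume the given role and prints the ARN of the resulting
# assumed-role session. A non-zero exit status means the call was rejected,
# for example because the trust policy or the worker's policy is wrong.
check_assume_role() {
  local role_arn="$1"

  aws sts assume-role \
    --role-arn "${role_arn}" \
    --role-session-name "trust-policy-check" \
    --query "AssumedRoleUser.Arn" \
    --output text
}

# Example call, using the placeholder account ID from this post:
#   check_assume_role "arn:aws:iam::222222222222:role/worker-permissions"
```

If the call succeeds, the printed ARN should be an assumed-role ARN in the target account. If it fails with an access-denied error, the two policies above are the first things to re-check.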
Assuming Roles Inside Pipelines
Now that our IAM roles are set up we can figure out how to assume them. On my
team, we mainly use bash and go, so we created some helper functions and
pipeline semantics to make it easy to assume roles in any tasks that we
create. First, the pipeline semantics.
To assume an IAM role we need one piece of information: the role’s ARN.
On any task that we make you can add an aws_assume_role_arn environment
variable set to the ARN of one of the worker-permissions roles. That env var
will then be picked up by the helper functions (see below), which will try to
assume the role.
We also decided to expose a variable to allow us to set how long the
credentials for the assumed role should be good for. The shortest session STS
allows is 15 minutes, which is also the default of the Go SDK’s assume-role
provider (the aws sts assume-role CLI command defaults to one hour). We decided
to set the duration to one hour explicitly in our helper functions. This can be
overridden by setting the env var aws_assume_role_duration, which takes a value
in seconds.
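Because aws_assume_role_duration comes straight from pipeline config, a typo only shows up when the STS call fails. The sketch below shows one way to catch that earlier. It is not part of our helper scripts: the function name validate_assume_role_duration is hypothetical, and the bounds are the 900 to 43200 second range that STS documents for AssumeRole. The role's own maximum session duration, or the one-hour cap AWS documents for role chaining, may impose a lower ceiling than this check enforces.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the original aws-auth.sh).
# Returns 0 when $1 is a whole number of seconds inside the range that
# STS AssumeRole accepts (900 to 43200), and non-zero otherwise.
validate_assume_role_duration() {
  local duration="${1:-}"

  # Accept only plain digits; the length cap avoids integer overflow below.
  if ! [[ "${duration}" =~ ^[0-9]{1,9}$ ]]; then
    echo "aws_assume_role_duration must be a whole number of seconds, got '${duration}'" >&2
    return 1
  fi

  # The 10# prefix forces base 10 so a leading zero is not read as octal.
  if (( 10#${duration} < 900 || 10#${duration} > 43200 )); then
    echo "aws_assume_role_duration must be between 900 and 43200 seconds, got '${duration}'" >&2
    return 1
  fi

  return 0
}

# Example call, falling back to the one-hour default used in this post:
#   validate_assume_role_duration "${aws_assume_role_duration:-3600}" || exit 1
```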
In practice, our pipeline YAML for a single task looks like this:
jobs:
- name: my-job
  plan:
  - ...<get steps>
  - task: install-foobar
    params:
      aws_assume_role_arn: arn:aws:iam::222222222222:role/worker-permissions
We don’t actually store the role ARN in the pipeline config, as shown above. We
store the ARN in our secret manager for Concourse (Vault) and reference the
path in Vault in our pipelines. So the env var ends up looking like
aws_assume_role_arn: ((roles/worker-permissions)). See the Concourse docs for
how to reference Vault secrets in your pipelines:
https://concourse-ci.org/vault-credential-manager.html
Assume Role Using bash
The aws CLI is required in order for this helper script to work.
In a file called aws-auth.sh we have the following script:
#!/usr/bin/env bash
set -euo pipefail

# stop the aws CLI from sending its output through a pager
export AWS_PAGER=""

# ":-" keeps "set -u" from aborting before we can print a useful message
if [[ -z "${aws_assume_role_arn:-}" ]]; then
  echo "aws_assume_role_arn not provided. Please provide an IAM role for the task to assume."
  exit 1
fi

if [[ -z "${aws_assume_role_duration:-}" ]]; then
  echo "aws_assume_role_duration not provided. Defaulting to 1hr (3600 seconds) session length"
fi

# auth and assume the role
export $(printf "AWS_ACCESS_KEY_ID=%s AWS_SECRET_ACCESS_KEY=%s AWS_SESSION_TOKEN=%s" \
  $(aws sts assume-role \
    --role-arn "${aws_assume_role_arn}" \
    --role-session-name "concourse-task" \
    --duration-seconds "${aws_assume_role_duration:-3600}" \
    --query "Credentials.[AccessKeyId,SecretAccessKey,SessionToken]" \
    --output text))
Near the top of all our bash scripts we then source this script in order to
assume the role.
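One thing to be aware of with the export-and-printf approach in aws-auth.sh: if the aws sts assume-role call fails, the export itself still succeeds and the three variables end up empty. The following is a sketch of an alternative, under the assumption that failing fast is preferred. The function name assume_role is hypothetical and this is not the script we actually run. It captures the CLI output first so a failed call is noticed, then splits the single tab-separated line that --output text produces.

```shell
#!/usr/bin/env bash

# Hypothetical variant of the credential export in aws-auth.sh.
# $1: ARN of the role to assume
# $2: optional session length in seconds (defaults to 3600)
assume_role() {
  local role_arn="$1"
  local duration="${2:-3600}"
  local creds

  # Capture the output on its own line so a failing aws call makes
  # this function return non-zero instead of exporting empty values.
  creds="$(aws sts assume-role \
    --role-arn "${role_arn}" \
    --role-session-name "concourse-task" \
    --duration-seconds "${duration}" \
    --query "Credentials.[AccessKeyId,SecretAccessKey,SessionToken]" \
    --output text)" || return 1

  # The three values arrive on one line separated by tabs.
  read -r AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN <<< "${creds}"

  if [[ -z "${AWS_SESSION_TOKEN:-}" ]]; then
    echo "assume-role returned incomplete credentials" >&2
    return 1
  fi

  export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN
}

# Example call from a task script:
#   assume_role "${aws_assume_role_arn}" "${aws_assume_role_duration:-3600}" || exit 1
```

Since the function is meant to be sourced and called in the task's own shell, the exported variables are visible to every command that runs afterwards in that script.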
Assume Role Using go
We have a function that looks like the following that will assume roles. An
aws.Config is passed back to the caller and the standard set of AWS
credential env vars are also set in case we spawn a separate process that needs
permissions as well.
package helpers

import (
	"context"
	"log"
	"os"
	"strconv"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/credentials/stscreds"
	"github.com/aws/aws-sdk-go-v2/service/sts"
)

// NOTE: awsRegion is not defined in this snippet; it is assumed to be
// declared elsewhere in the package.

// AssumeIamRole assumes the role named by the aws_assume_role_arn env var
// and returns an aws.Config that uses the assumed role's credentials.
func AssumeIamRole() aws.Config {
	roleArn := os.Getenv("aws_assume_role_arn")
	if roleArn == "" {
		log.Fatal("aws_assume_role_arn not provided. Please provide an IAM role for the task to assume.")
	}

	// default to one hour unless aws_assume_role_duration (seconds) is set
	sessionDuration := 1 * time.Hour
	readDuration := os.Getenv("aws_assume_role_duration")
	if readDuration != "" {
		i, err := strconv.Atoi(readDuration)
		if err != nil {
			log.Fatal("error parsing aws_assume_role_duration, might not be an int: ", err)
		}
		sessionDuration = time.Duration(i) * time.Second
	} else {
		log.Println("aws_assume_role_duration not provided. Defaulting to 1hr (3600 seconds) session length")
	}

	// load the worker's own credentials (the concourse-worker role)
	cfg, err := config.LoadDefaultConfig(context.Background(),
		config.WithRegion(awsRegion),
	)
	if err != nil {
		log.Fatal("error loading aws config: ", err)
	}

	// swap the config's credentials for ones that come from assuming the role
	stsClient := sts.NewFromConfig(cfg)
	provider := stscreds.NewAssumeRoleProvider(stsClient, roleArn,
		func(aro *stscreds.AssumeRoleOptions) {
			aro.Duration = sessionDuration
		},
	)
	cfg.Credentials = aws.NewCredentialsCache(provider)

	creds, err := cfg.Credentials.Retrieve(context.Background())
	if err != nil {
		log.Fatal("failed to authenticate to AWS with IAM role ", roleArn, ": ", err)
	}

	// in case we spawn processes that also need permissions
	os.Setenv("AWS_ACCESS_KEY_ID", creds.AccessKeyID)
	os.Setenv("AWS_SECRET_ACCESS_KEY", creds.SecretAccessKey)
	os.Setenv("AWS_SESSION_TOKEN", creds.SessionToken)

	return cfg
}
IMDSv2 And Hop Limit
By default we’re using IMDSv2 on our EC2 instances. In order for processes
running inside containers that Concourse spins up to successfully access the
IMDSv2 endpoints, we had to set the hop limit on the instance to 2.
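The extra hop is needed because a request from inside a container crosses one more network hop than a request from the host, and with a hop limit of 1 the IMDSv2 token response never makes it back to the container. As a rough sketch of how the limit can be changed with the aws CLI (the function name set_imds_hop_limit and the instance ID in the example are placeholders, and this may not match how the instance is actually provisioned, for example if the setting lives in a launch template or in Terraform):

```shell
#!/usr/bin/env bash

# Hypothetical helper: set the IMDSv2 PUT response hop limit on one instance.
# $1: EC2 instance ID of the Concourse worker
# $2: optional hop limit (defaults to 2, the value used in this post)
set_imds_hop_limit() {
  local instance_id="$1"
  local hop_limit="${2:-2}"

  # --http-tokens required keeps the instance on IMDSv2 only.
  aws ec2 modify-instance-metadata-options \
    --instance-id "${instance_id}" \
    --http-tokens required \
    --http-put-response-hop-limit "${hop_limit}" \
    --http-endpoint enabled
}

# Example call with a placeholder instance ID:
#   set_imds_hop_limit "i-0123456789abcdef0"
```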