
So, I had a problem. I decided to solve it with Docker; now I had two problems. But I’m a glutton for punishment, so I decided to orchestrate my containers with Kubernetes and ended up with 99 problems.
What was I trying to solve? I wanted scalable, self-hosted Azure DevOps agents. The examples here and here got me started. Sadly, most of the examples were for Azure Kubernetes Service, and by all accounts Azure just seems to know how to do services. AWS, on the other hand, was an acute pain.
AWS’ Kubernetes offering is called Elastic Kubernetes Service (EKS), and by default you’re expected to configure your node groups with EC2 instances. You see, AWS loves EC2 instances. The addiction runs deep. But as they say: we move! I created my cluster to use Fargate.
Creating a Docker image
To start with, I built the Docker image based on the examples, then built and ran the container on my local machine (macOS). Sweet. Agents were registered with Azure DevOps (on-prem) and Bob’s your uncle. Then I tried to host the image with AWS and got the following error:
standard_init_linux.go:228: exec user process caused: no such file or directory
Google results point to CRLF vs LF line endings and dos2unix utilities. Let me just say, I lost some time to that red herring. After searching for just the error code “standard_init_linux.go:228”, we realised the image was the culprit. You see, the Azure DevOps examples are based on Ubuntu containers, but Fargate only supports Amazon Linux based containers. On their documentation page they state:

To address all this, I had to start with a new Dockerfile:
# yup
FROM amazonlinux:latest
# To make it easier for build and release pipelines to run apt-get,
# configure apt to not require confirmation (assume the -y argument by default)
# ENV DEBIAN_FRONTEND=noninteractive
RUN yum update -y && yum install -y git cmake gcc make \
    openssl-devel libssh2-devel openssh-server \
    ca-certificates curl wget jq tar libicu67 zip unzip \
    openssl tree \
    git-daemon
RUN yum install -y java-11-amazon-corretto-headless... #WHY?!?!?!
#....
All the apt-get install commands had to be replaced with yum install, where possible. I installed the .NET Core SDK, and then had some weird behaviour with yum install -y java-11-openjdk-devel: the package was not found. Eventually I settled on yum install -y java-11-amazon-corretto-headless, and this worked consistently.
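Before pushing anything, a quick local smoke test confirmed the rebuilt image still registered an agent. Roughly (a sketch; the AZP_* environment variables follow the Azure DevOps self-hosted agent examples, and the URL, token and pool values are placeholders):
docker build -t azdo-agent-amazonlinux .
docker run -e AZP_URL="https://my-azure-devops-server/my-collection" \
    -e AZP_TOKEN="<personal-access-token>" \
    -e AZP_POOL="Default" \
    azdo-agent-amazonlinux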
The Docker image was then built and pushed to Amazon’s Elastic Container Registry (ECR).
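The push itself is the usual ECR dance (account ID, region and repository name are placeholders):
aws ecr get-login-password --region eu-central-1 \
    | docker login --username AWS --password-stdin 123456789012.dkr.ecr.eu-central-1.amazonaws.com
docker tag azdo-agent-amazonlinux:latest 123456789012.dkr.ecr.eu-central-1.amazonaws.com/azdo-agent:latest
docker push 123456789012.dkr.ecr.eu-central-1.amazonaws.com/azdo-agent:latest
Now on to hosting.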
Creating a “serverless” Kubernetes cluster in AWS
The cluster creation came with its own challenges. I initially used eksctl, which creates and executes a dynamic CloudFormation template. However, since I wanted a repeatable process, I copied the resulting template and tweaked it before applying it.
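For reference, the eksctl route is a one-liner; a sketch using the cluster name and region that appear in my template:
eksctl create cluster --name p80-solutions-devops-cluster --region eu-central-1 --fargate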
Now I had an image, and with my Kubernetes manifest I ran:
kubectl apply -f /tmp/azdo.yaml --namespace=azure-devops
No dice: error: You must be logged in to the server (Unauthorized). It turns out that if you create the cluster as one IAM role, you can forget about connecting to it with kubectl from another role. Just forget it. So tear it all down and deploy it with the same role you intend to use (or one that can assume that role).
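Once the roles line up, pointing kubectl at the cluster looks roughly like this (the role ARN is a placeholder for whichever role created the cluster):
aws eks update-kubeconfig --name p80-solutions-devops-cluster --region eu-central-1 \
    --role-arn arn:aws:iam::123456789012:role/<cluster-creator-role>
kubectl get svc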
After connecting, the pods were all stuck in a Pending state, including the coredns pods. Remember the addiction to EC2 instances? No CoreDNS, no agents. Well, Amazon says under the section Update CoreDNS:

So, you need to run the following to fix that:
kubectl patch deployment coredns \
-n kube-system \
--type json \
-p='[{"op": "remove", "path": "/spec/template/metadata/annotations/eks.amazonaws.com~1compute-type"}]'
Then delete and recreate the pods: kubectl rollout restart -n kube-system deployment coredns
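Once the annotation is gone and the deployment restarted, the CoreDNS pods should land on Fargate nodes. A quick sanity check (output will vary):
kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide
kubectl get nodes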
After this, I was finally able to deploy the pods. Oh, did I mention I also got ECS working? But that’s a story for another day. I use the metadata from the pod name (or ECS task name) to name my agents for ease of identification, as in the snippet below.
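On the Kubernetes side, that naming can be done with the Downward API. A minimal sketch of the container spec from my agent Deployment, assuming the agent’s start script reads the standard AZP_* variables (image URI, URL, pool and secret name are placeholders):
containers:
  - name: azdo-agent
    image: 123456789012.dkr.ecr.eu-central-1.amazonaws.com/azdo-agent:latest
    env:
      - name: AZP_AGENT_NAME   # agent shows up in Azure DevOps under the pod's name
        valueFrom:
          fieldRef:
            fieldPath: metadata.name
      - name: AZP_URL
        value: https://my-azure-devops-server/my-collection
      - name: AZP_POOL
        value: aws-fargate
      - name: AZP_TOKEN        # personal access token, kept in a Kubernetes secret
        valueFrom:
          secretKeyRef:
            name: azdo-agent-secret
            key: AZP_TOKEN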

Below is my CloudFormation template for creating the cluster:
AWSTemplateFormatVersion: 2010-09-09
Description: >-
  EKS cluster (dedicated VPC: false, dedicated IAM: true) [created and managed
  by eti@p82.com]
Parameters:
  1Name: # I always use this to have the stack name easily accessible :)
    Description: Name
    Type: String
    Default: p80-solutions-azure-devops-eks-cluster
Mappings:
  ServicePrincipalPartitionMap:
    aws:
      EC2: ec2.amazonaws.com
      EKS: eks.amazonaws.com
      EKSFargatePods: eks-fargate-pods.amazonaws.com
    aws-cn:
      EC2: ec2.amazonaws.com.cn
      EKS: eks.amazonaws.com
      EKSFargatePods: eks-fargate-pods.amazonaws.com
    aws-us-gov:
      EC2: ec2.amazonaws.com
      EKS: eks.amazonaws.com
      EKSFargatePods: eks-fargate-pods.amazonaws.com
  Tags:
    Name:
      Value: "my-cluster"
    Appid:
      Value: "p80"
    Appname:
      Value: "p80-solutions"
    Costcenter:
      Value: "12345"
    Owner:
      Value: "eti@p82.com"
Resources:
  ClusterSharedNodeSecurityGroup:
    Type: 'AWS::EC2::SecurityGroup'
    Properties:
      GroupDescription: Communication between all nodes in the cluster
      Tags:
        - Key: Name
          Value: !Sub '${AWS::StackName}/ClusterSharedNodeSecurityGroup'
        - Key: Appid
          Value: !FindInMap
            - Tags
            - Appid
            - Value
        - Key: Appname
          Value: !FindInMap
            - Tags
            - Appname
            - Value
        - Key: Costcenter
          Value: !FindInMap
            - Tags
            - Costcenter
            - Value
        - Key: Owner
          Value: !FindInMap
            - Tags
            - Owner
            - Value
      VpcId: vpc-123456789
  ControlPlane:
    Type: 'AWS::EKS::Cluster'
    Properties:
      KubernetesNetworkConfig:
        IpFamily: ipv4
      Name: p80-solutions-devops-cluster
      ResourcesVpcConfig:
        EndpointPrivateAccess: true
        EndpointPublicAccess: true
        SecurityGroupIds:
          - !Ref ControlPlaneSecurityGroup
          - sg-123456
          - sg-789012
          - sg-345678
        SubnetIds:
          - subnet-12345678
          - subnet-56781234
          - subnet-90121234
      EncryptionConfig:
        - Provider:
            KeyArn: 'arn:aws:kms:eu-central-1:123456789012:key/27d5e1dc-fd51-4a2a-bdc4-932f5e83bcce'
          Resources:
            - secrets
      RoleArn: 'arn:aws:iam::123456789012:role/p80-solutions-eks-role'
      Logging:
        ClusterLogging:
          EnabledTypes:
            - Type: api
            - Type: audit
            - Type: controllerManager
            - Type: scheduler
            - Type: authenticator
      Tags:
        - Key: Name
          Value: !Sub '${AWS::StackName}/ControlPlane'
        - Key: Appid
          Value: !FindInMap
            - Tags
            - Appid
            - Value
        - Key: Appname
          Value: !FindInMap
            - Tags
            - Appname
            - Value
        - Key: Costcenter
          Value: !FindInMap
            - Tags
            - Costcenter
            - Value
        - Key: Owner
          Value: !FindInMap
            - Tags
            - Owner
            - Value
        - Key: purpose
          Value: 'Azure DevOps'
      Version: '1.22'
  ControlPlaneSecurityGroup:
    Type: 'AWS::EC2::SecurityGroup'
    Properties:
      GroupDescription: Communication between the control plane and worker nodegroups
      Tags:
        - Key: Name
          Value: !Sub '${AWS::StackName}/ControlPlaneSecurityGroup'
        - Key: Appid
          Value: !FindInMap
            - Tags
            - Appid
            - Value
        - Key: Appname
          Value: !FindInMap
            - Tags
            - Appname
            - Value
        - Key: Costcenter
          Value: !FindInMap
            - Tags
            - Costcenter
            - Value
        - Key: Owner
          Value: !FindInMap
            - Tags
            - Owner
            - Value
      VpcId: vpc-123456789
  IngressDefaultClusterToNodeSG:
    Type: 'AWS::EC2::SecurityGroupIngress'
    Properties:
      Description: >-
        Allow managed and unmanaged nodes to communicate with each other (all
        ports)
      FromPort: 0
      GroupId: !Ref ClusterSharedNodeSecurityGroup
      IpProtocol: '-1'
      SourceSecurityGroupId: !GetAtt
        - ControlPlane
        - ClusterSecurityGroupId
      ToPort: 65535
  IngressInterNodeGroupSG:
    Type: 'AWS::EC2::SecurityGroupIngress'
    Properties:
      Description: Allow nodes to communicate with each other (all ports)
      FromPort: 0
      GroupId: !Ref ClusterSharedNodeSecurityGroup
      IpProtocol: '-1'
      SourceSecurityGroupId: !Ref ClusterSharedNodeSecurityGroup
      ToPort: 65535
  IngressNodeToDefaultClusterSG:
    Type: 'AWS::EC2::SecurityGroupIngress'
    Properties:
      Description: Allow unmanaged nodes to communicate with control plane (all ports)
      FromPort: 0
      GroupId: !GetAtt
        - ControlPlane
        - ClusterSecurityGroupId
      IpProtocol: '-1'
      SourceSecurityGroupId: !Ref ClusterSharedNodeSecurityGroup
      ToPort: 65535
Outputs:
  ARN:
    Value: !GetAtt
      - ControlPlane
      - Arn
    Export:
      Name: !Sub '${AWS::StackName}::ARN'
  CertificateAuthorityData:
    Value: !GetAtt
      - ControlPlane
      - CertificateAuthorityData
  ClusterSecurityGroupId:
    Value: !GetAtt
      - ControlPlane
      - ClusterSecurityGroupId
    Export:
      Name: !Sub '${AWS::StackName}::ClusterSecurityGroupId'
  ClusterStackName:
    Value: !Ref 'AWS::StackName'
  Endpoint:
    Value: !GetAtt
      - ControlPlane
      - Endpoint
    Export:
      Name: !Sub '${AWS::StackName}::Endpoint'
  FeatureNATMode:
    Value: Disable
  SecurityGroup:
    Value: !Ref ControlPlaneSecurityGroup
    Export:
      Name: !Sub '${AWS::StackName}::SecurityGroup'
  SharedNodeSecurityGroup:
    Value: !Ref ClusterSharedNodeSecurityGroup
    Export:
      Name: !Sub '${AWS::StackName}::SharedNodeSecurityGroup'
  SubnetsPrivate:
    Value: !Join
      - ','
      - - subnet-12345678
        - subnet-56781234
    Export:
      Name: !Sub '${AWS::StackName}::SubnetsPrivate'
  SubnetsPublic:
    Value: subnet-90121234
    Export:
      Name: !Sub '${AWS::StackName}::SubnetsPublic'
  VPC:
    Value: vpc-123456789
    Export:
      Name: !Sub '${AWS::StackName}::VPC'
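To stand the stack up from that template, run the deploy as the role you intend to use with kubectl (stack and file names here are just what I use):
aws cloudformation deploy \
    --stack-name p80-solutions-azure-devops-eks-cluster \
    --template-file eks-cluster.yaml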