AWS ECS & Fargate: containers without the cluster headache


Series: AWS Field Guide · Part 4 of 6 · Compute

This series: 01 Overview · 02 EC2 · 03 Lambda · 04 ECS & Fargate · 05 EKS · 06 Wrap-up


The scenario

Your team has been running a Python API on EC2. It works, but deployments are a manual process — SSH in, pull the latest code, restart the process, hope nothing breaks. A new engineer joins and asks why you're not using Docker. You don't have a good answer.

You containerise the app. Now what? You could run the container on EC2 and manage it yourself — but then you're back to babysitting servers. You've heard of Kubernetes but the team isn't ready for that complexity. You need something in between: a way to run containers reliably at scale, without becoming a platform team.

That's ECS with Fargate.

TL;DR: ECS (Elastic Container Service) is AWS's container orchestration platform. Fargate is the launch type that removes the need to manage the underlying servers — you define your container, ECS runs it, and AWS handles the machines underneath. It's the pragmatic middle path between "raw EC2" and "full Kubernetes."


The problem it solves

Running containers in production is harder than running them locally. You need to handle scheduling (which machine does this container run on?), health checking (is the container still alive?), rolling deployments (update without downtime), service discovery (how do other services find this one?), and scaling (spin up more when load increases).

Doing all of this yourself on raw EC2 is possible — and painful. ECS handles it for you. Fargate goes further by removing the underlying EC2 fleet entirely, so you're not managing instance types, cluster capacity, or OS patches.


Core concepts

Clusters

A cluster is the logical boundary for your ECS resources — think of it as the environment (production, staging, dev). It doesn't represent physical infrastructure on its own; it's a namespace that groups tasks and services together.

Task definitions

A task definition is a blueprint for your container workload — the equivalent of a docker-compose.yml but for ECS. It specifies:

  • Which container image(s) to run and from where (ECR, Docker Hub)

  • CPU and memory requirements

  • Port mappings

  • Environment variables and secrets

  • IAM task role (what AWS permissions the container has)

  • Logging configuration

  • Volume mounts

Task definitions are versioned. Every change creates a new revision, which means deployments are auditable and rollbacks are straightforward.
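
To make the list above concrete, here is a minimal Python sketch that assembles a Fargate task definition as a plain dictionary and serialises it to JSON. The field names follow the ECS task definition schema. The account ID, role names, image tag, environment variable, and secret ARN are placeholder assumptions, not values from a real account, and volume mounts are left out for brevity.

```python
import json


def build_task_definition(family, image, cpu="256", memory="512"):
    """Return a minimal Fargate task definition as a dict.

    All ARNs and names below are hypothetical placeholders.
    """
    account = "123456789012"  # placeholder 12-digit account ID
    return {
        "family": family,
        "networkMode": "awsvpc",  # required for Fargate
        "requiresCompatibilities": ["FARGATE"],
        "cpu": cpu,        # CPU units as a string: 256 = 0.25 vCPU
        "memory": memory,  # MiB as a string
        # Used by ECS itself to pull the image and ship logs.
        "executionRoleArn": f"arn:aws:iam::{account}:role/ecsTaskExecutionRole",
        # Assumed by the application code for its own AWS API calls.
        "taskRoleArn": f"arn:aws:iam::{account}:role/ecsTaskRole",
        "containerDefinitions": [
            {
                "name": family,
                "image": image,
                "essential": True,
                "portMappings": [{"containerPort": 8080, "protocol": "tcp"}],
                "environment": [{"name": "APP_ENV", "value": "production"}],
                "secrets": [
                    {
                        "name": "DB_PASSWORD",
                        # Hypothetical Secrets Manager ARN.
                        "valueFrom": f"arn:aws:secretsmanager:us-east-1:{account}:secret:my-api/db",
                    }
                ],
                "logConfiguration": {
                    "logDriver": "awslogs",
                    "options": {
                        "awslogs-group": f"/ecs/{family}",
                        "awslogs-region": "us-east-1",
                        "awslogs-stream-prefix": "ecs",
                    },
                },
            }
        ],
    }


task_def = build_task_definition(
    "my-api", "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-api:1.4.2"
)
print(json.dumps(task_def, indent=2))
```

Saved to a file, the output can be passed to `aws ecs register-task-definition --cli-input-json file://task-def.json`. Each successful registration produces the next revision of the `my-api` family.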

Tasks vs services

A task is a single running instance of a task definition — a container (or group of containers) running until it completes or is stopped. Tasks are for one-off or batch workloads: run a migration, process a file, execute a job.

A service is ECS's way of keeping a long-running task alive. You tell ECS "I want 3 copies of this task definition running at all times," and ECS ensures exactly that — replacing failed tasks, spreading them across availability zones, integrating with a load balancer, and handling rolling deployments.

For APIs, web servers, and background workers that need to stay up, you want a service.
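
The "keep N copies running" behaviour is easiest to picture as a reconciliation loop. The following is a toy model in Python, not ECS's actual scheduler: the `reconcile` function and its least-loaded-zone rule are assumptions made for illustration only.

```python
from collections import Counter


def reconcile(desired_count, running_by_az, zones):
    """Return the zones in which to start replacement tasks.

    running_by_az maps zone name -> number of healthy tasks.
    New tasks go to the least-loaded zone first (a simple spread).
    """
    load = Counter({zone: running_by_az.get(zone, 0) for zone in zones})
    missing = desired_count - sum(load.values())
    placements = []
    for _ in range(max(missing, 0)):
        # Pick the zone with the fewest tasks; ties break by zone name.
        target = min(zones, key=lambda z: (load[z], z))
        load[target] += 1
        placements.append(target)
    return placements


# One of three tasks has died in us-east-1b; the service wants 3 again.
zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
print(reconcile(3, {"us-east-1a": 1, "us-east-1b": 0, "us-east-1c": 1}, zones))
# prints ['us-east-1b']
```

A standalone task has no such loop behind it: if it exits, nothing brings it back, which is exactly what you want for a migration or batch job.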

Launch types: Fargate vs EC2

This is the most important architectural decision in ECS:

Fargate — AWS manages the underlying compute entirely. You specify vCPU and memory per task; AWS picks a machine, runs your container, and you never see the host. No EC2 instances to patch, no cluster capacity to manage, no AMIs to maintain. You pay per task per second.

EC2 launch type — you manage a cluster of EC2 instances that ECS schedules containers onto. More control (custom AMIs, GPU instances, specific networking configurations), but you're back to managing servers — just with containers on top.

For most teams, start with Fargate. Move to EC2 launch type only when you have a specific reason: GPU workloads, very high container density requirements, or compliance constraints around shared tenancy.
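
One practical consequence of specifying vCPU and memory per task is that Fargate only accepts certain pairings of the two. The table below is a sketch transcribed from the AWS documentation as I understand it, so treat the exact values as an assumption and check the current docs. Larger 8 and 16 vCPU sizes also exist and are omitted here.

```python
# Fargate task sizes: CPU units -> allowed memory values in MiB.
# Assumed from the AWS documentation; verify before relying on it.
FARGATE_SIZES = {
    256: [512, 1024, 2048],                      # 0.25 vCPU
    512: list(range(1024, 4096 + 1, 1024)),      # 0.5 vCPU: 1-4 GB
    1024: list(range(2048, 8192 + 1, 1024)),     # 1 vCPU: 2-8 GB
    2048: list(range(4096, 16384 + 1, 1024)),    # 2 vCPU: 4-16 GB
    4096: list(range(8192, 30720 + 1, 1024)),    # 4 vCPU: 8-30 GB
}


def is_valid_fargate_size(cpu_units, memory_mib):
    """Return True if the CPU/memory pair appears in the table above."""
    return memory_mib in FARGATE_SIZES.get(cpu_units, [])


print(is_valid_fargate_size(256, 512))   # the size used in this article
print(is_valid_fargate_size(256, 4096))  # too much memory for 0.25 vCPU
```

A check like this in CI can catch an invalid pairing before the task definition is registered.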

ECR — Elastic Container Registry

ECR is AWS's Docker image registry — where your container images live before ECS pulls them. It integrates natively with IAM (no Docker Hub credentials to rotate), supports image scanning for vulnerabilities, and keeps your images in the same region as your cluster (faster pulls, no egress costs).


Minimal working example

Deploy a containerised API to ECS with Fargate using the AWS CLI:

# 1. Create an ECS cluster
aws ecs create-cluster --cluster-name my-api-cluster

# 2. Create the CloudWatch log group the task definition refers to
#    (the awslogs driver does not create it by default)
aws logs create-log-group --log-group-name /ecs/my-api

# 3. Register a task definition
#    (123456789012 is a placeholder; use your own 12-digit account ID)
aws ecs register-task-definition --cli-input-json '{
  "family": "my-api",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::123456789012:role/ecsTaskRole",
  "containerDefinitions": [{
    "name": "my-api",
    "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-api:latest",
    "portMappings": [{"containerPort": 8080, "protocol": "tcp"}],
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "/ecs/my-api",
        "awslogs-region": "us-east-1",
        "awslogs-stream-prefix": "ecs"
      }
    }
  }]
}'

# 4. Create a service to keep 2 tasks running
#    (subnet and security group IDs are placeholders; the shorthand
#    value is kept on a single line)
aws ecs create-service \
  --cluster my-api-cluster \
  --service-name my-api-service \
  --task-definition my-api \
  --desired-count 2 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-abc123,subnet-def456],securityGroups=[sg-xyz789],assignPublicIp=ENABLED}"
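
The CLI steps above can also be scripted in Python. The sketch below uses the boto3 ECS client method names (`create_cluster`, `register_task_definition`, `create_service`) but takes the client as a parameter, so it can be dry-run without AWS credentials. `DryRunECS` is a hypothetical stand-in written for this example, and the sketch assumes the log group already exists.

```python
def deploy(ecs, task_definition, subnets, security_groups):
    """Create the cluster, register the task definition, create the service.

    `ecs` is any object exposing the boto3 ECS client methods used below,
    for example boto3.client("ecs").
    """
    ecs.create_cluster(clusterName="my-api-cluster")

    registered = ecs.register_task_definition(**task_definition)
    task_def_arn = registered["taskDefinition"]["taskDefinitionArn"]

    ecs.create_service(
        cluster="my-api-cluster",
        serviceName="my-api-service",
        taskDefinition=task_def_arn,  # pin the exact revision
        desiredCount=2,
        launchType="FARGATE",
        networkConfiguration={
            "awsvpcConfiguration": {
                "subnets": subnets,
                "securityGroups": security_groups,
                "assignPublicIp": "ENABLED",
            }
        },
    )
    return task_def_arn


class DryRunECS:
    """Stand-in client that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def create_cluster(self, **kwargs):
        self.calls.append(("create_cluster", kwargs))
        return {}

    def register_task_definition(self, **kwargs):
        self.calls.append(("register_task_definition", kwargs))
        # Made-up ARN, used for the dry run only.
        arn = f"arn:aws:ecs:us-east-1:123456789012:task-definition/{kwargs['family']}:1"
        return {"taskDefinition": {"taskDefinitionArn": arn}}

    def create_service(self, **kwargs):
        self.calls.append(("create_service", kwargs))
        return {}


client = DryRunECS()
arn = deploy(client, {"family": "my-api"}, ["subnet-abc123", "subnet-def456"], ["sg-xyz789"])
print(arn)
print([name for name, _ in client.calls])
```

Passing the registered ARN to the service, rather than the bare family name, makes it explicit which revision is being deployed.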

Pricing model

Fargate pricing is based on the vCPU and memory your tasks request, billed per second with a 1-minute minimum.

Rates at the time of writing (us-east-1, Linux/x86):

  • vCPU: $0.04048 per vCPU per hour

  • Memory: $0.004445 per GB per hour

Example — two-task API service running 24/7:

Each task: 0.25 vCPU, 0.5GB memory

vCPU cost:    2 tasks × 0.25 vCPU × $0.04048 × 730 hrs  = $14.78/month
Memory cost:  2 tasks × 0.5 GB    × $0.004445 × 730 hrs =  $3.24/month
Total:                                                    ~$18/month
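
The same arithmetic as a small Python sketch, handy for trying other task sizes. The rates are the us-east-1 figures quoted in this article. The function name and the 730-hour month are assumptions of the example.

```python
HOURS_PER_MONTH = 730
VCPU_RATE = 0.04048      # USD per vCPU-hour (us-east-1, from the article)
MEMORY_RATE = 0.004445   # USD per GB-hour (us-east-1, from the article)


def monthly_fargate_cost(tasks, vcpu, memory_gb, hours=HOURS_PER_MONTH):
    """Return (vcpu_cost, memory_cost, total) in USD for always-on tasks."""
    vcpu_cost = tasks * vcpu * VCPU_RATE * hours
    memory_cost = tasks * memory_gb * MEMORY_RATE * hours
    return vcpu_cost, memory_cost, vcpu_cost + memory_cost


vcpu_cost, memory_cost, total = monthly_fargate_cost(tasks=2, vcpu=0.25, memory_gb=0.5)
print(f"vCPU ${vcpu_cost:.2f}, memory ${memory_cost:.2f}, total ${total:.2f}")
# prints vCPU $14.78, memory $3.24, total $18.02
```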

Compare that to a t3.small EC2 instance (~$15/month) running the same two containers — Fargate is slightly more expensive at this scale, but you're not managing the EC2 instance, its OS, or its capacity. The operational saving is worth the premium for most teams.

At higher scale (50+ tasks), the EC2 launch type often becomes cheaper because you can pack more containers per instance and use Reserved Instance or Spot pricing. Fargate Spot also exists — up to 70% off for interruption-tolerant workloads.


When to use ECS + Fargate (and when not to)

Use ECS + Fargate when:

  • ✅ Your workload is containerised (or you're ready to containerise it)

  • ✅ You want managed container orchestration without Kubernetes complexity

  • ✅ You need long-running services that Lambda's 15-minute limit rules out

  • ✅ You want rolling deployments, health checks, and load balancer integration out of the box

  • ✅ Your team knows Docker but not Kubernetes

  • ✅ You have interruption-tolerant, non-critical workloads that can run cheaply on Fargate Spot

Don't use ECS + Fargate when:

  • ❌ Your workload is short-lived and event-driven — Lambda is simpler and cheaper

  • ❌ You need Kubernetes-specific features (custom operators, CRDs, Helm charts from your existing infra)

  • ❌ You need GPU instances or bare metal — use EC2 launch type instead of Fargate

  • ❌ You have very high container density requirements — EC2 launch type with large instances is more cost-efficient

  • ❌ Your team is already running Kubernetes everywhere else — consistency across your stack matters


Common gotchas

1. The IAM task role vs execution role confusion. ECS uses two separate IAM roles and conflating them is the most common source of permission errors on first deployments.

The execution role (ecsTaskExecutionRole) is used by ECS itself, not by your code: it's what allows the platform to pull your image from ECR and write logs to CloudWatch on the task's behalf. AWS provides a managed policy for this, AmazonECSTaskExecutionRolePolicy.

The task role is used by your application code — it's the IAM role your container assumes when making AWS API calls (reading from S3, writing to DynamoDB, etc.). This is the one you customise for your application's needs.

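Missing the task role entirely is a frequent cause of "my container can't access S3" errors: without it, your application has no AWS credentials of its own, and the execution role's permissions are not available to your code.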
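
A sketch of how the two roles differ in practice, written as Python dictionaries that serialise to IAM policy JSON. Both roles share the same trust policy, which lets `ecs-tasks.amazonaws.com` assume them. Only the attached permissions differ. The bucket name `my-api-uploads` and the helper `s3_read_policy` are hypothetical.

```python
import json

# Trust policy shared by both roles: lets ECS tasks assume the role.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ecs-tasks.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def s3_read_policy(bucket):
    """Task-role permissions: read objects from one (hypothetical) bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            }
        ],
    }


task_role_policy = s3_read_policy("my-api-uploads")
print(json.dumps(task_role_policy, indent=2))
```

The execution role would carry the AWS-managed execution policy instead. Application permissions such as this S3 statement belong on the task role only.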

2. awsvpc networking and security groups. Fargate tasks run in awsvpc network mode: each task gets its own elastic network interface (ENI) and its own private IP address. This is great for security isolation, but it means security groups apply to each task directly, so the task's security group must allow inbound traffic on the container port from the load balancer's security group and from any other tasks that call it. A common first-deploy failure is a task that starts and runs fine but never receives traffic (and fails its load balancer health checks) because its security group doesn't allow inbound on the container port.
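
As an illustration, the ingress rule that is usually missing can be expressed as the keyword arguments for the boto3 EC2 client's `authorize_security_group_ingress` call. This sketch only builds the arguments. The helper name and the load balancer security group ID `sg-alb123` are hypothetical.

```python
def alb_to_task_ingress(task_sg_id, alb_sg_id, container_port=8080):
    """Build kwargs for EC2 authorize_security_group_ingress.

    Allows TCP traffic on the container port into the task's security
    group, but only from members of the load balancer's security group.
    """
    return {
        "GroupId": task_sg_id,
        "IpPermissions": [
            {
                "IpProtocol": "tcp",
                "FromPort": container_port,
                "ToPort": container_port,
                "UserIdGroupPairs": [{"GroupId": alb_sg_id}],
            }
        ],
    }


rule = alb_to_task_ingress("sg-xyz789", "sg-alb123")
print(rule)
```

With credentials configured, `boto3.client("ec2").authorize_security_group_ingress(**rule)` would apply it. Referencing the load balancer's security group rather than a CIDR range keeps the container port closed to everything else.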

3. Container image pull failures at deploy time. ECS pulls your image at task launch time. If ECR authentication fails, the image tag doesn't exist, the URI points at the wrong region or account, or the task has no network route to ECR, the task fails to start with a cryptic error. Always verify your image URI and tag before deploying, and make sure your execution role has the ecr:GetAuthorizationToken, ecr:BatchCheckLayerAvailability, ecr:GetDownloadUrlForLayer, and ecr:BatchGetImage permissions (all included in the AmazonECSTaskExecutionRolePolicy managed policy).
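
A small pre-deploy check can catch the URI mistakes before ECS does. The sketch below validates the shape of a private ECR image URI and flags a region that differs from where the tasks run, which is worth a second look. The function name and messages are assumptions of this example. It handles tag references only, not `@sha256:` digests, and it cannot tell whether the tag actually exists.

```python
import re

# <12-digit account>.dkr.ecr.<region>.amazonaws.com/<repository>[:<tag>]
ECR_URI = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repo>[a-z0-9._/-]+)(?::(?P<tag>[\w.-]+))?$"
)


def check_image_uri(image_uri, task_region):
    """Return a list of human-readable problems found in an ECR image URI."""
    match = ECR_URI.match(image_uri)
    if match is None:
        return ["not a well-formed private ECR image URI"]
    problems = []
    if match["region"] != task_region:
        problems.append(f"image is in {match['region']}, tasks run in {task_region}")
    if match["tag"] is None:
        problems.append("no tag given; the runtime will fall back to 'latest'")
    return problems


print(check_image_uri("123456789012.dkr.ecr.us-east-1.amazonaws.com/my-api:1.4.2", "us-east-1"))
print(check_image_uri("123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-api", "us-east-1"))
```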

4. Log configuration is not automatic. Unlike EC2 where you might SSH in and tail a log file, Fargate containers have no accessible host. If you don't configure the awslogs log driver in your task definition, your container logs go nowhere. Always set up CloudWatch Logs in the task definition — it's a one-time setup that saves hours of debugging later.
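
Once the awslogs driver is configured, each task writes to its own log stream, named from the stream prefix, the container name, and the task ID. The helper below is a sketch of that naming rule for locating a task's logs. The function name and the task ARN are made up for the example.

```python
def log_stream_name(stream_prefix, container_name, task_arn):
    """Return the CloudWatch Logs stream name the awslogs driver writes to."""
    # The task ID is the final path segment of the task ARN.
    task_id = task_arn.rsplit("/", 1)[-1]
    return f"{stream_prefix}/{container_name}/{task_id}"


# Hypothetical task ARN for illustration.
task_arn = "arn:aws:ecs:us-east-1:123456789012:task/my-api-cluster/0123456789abcdef0123456789abcdef"
print(log_stream_name("ecs", "my-api", task_arn))
# prints ecs/my-api/0123456789abcdef0123456789abcdef
```

Given the task ID from a failed deployment, this points to the stream to open inside the `/ecs/my-api` log group.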

5. Desired count of zero doesn't mean free. Scaling a service to zero tasks stops compute charges, but you still pay for any attached load balancers (~$16/month for an ALB before traffic charges). For development environments you want to shut down completely, you may also need to delete the load balancer and recreate it when the environment comes back.


Compared to the alternatives

ECS + Fargate vs Lambda

Lambda is cheaper and simpler for event-driven, short-lived tasks. Fargate wins for long-running services, persistent connections, and workloads that don't fit Lambda's execution model. The practical dividing line is the 15-minute timeout and whether your workload is request-driven or continuously running.

ECS + Fargate vs EKS

EKS gives you the full Kubernetes API — custom resource definitions, Helm charts, advanced scheduling, and portability across cloud providers. ECS is simpler, AWS-native, and has less operational overhead. If your team doesn't have a specific Kubernetes requirement, ECS will serve you well until you're a much larger engineering organisation. We cover this in detail in the next article.

ECS + Fargate vs Google Cloud Run

Cloud Run is the closest equivalent on GCP — managed containers, scale to zero, pay per use. Cloud Run is arguably simpler to get started with and has better scale-to-zero economics for very low-traffic services. If you're AWS-native, Fargate's tighter integration with IAM, VPC, and the rest of the AWS ecosystem makes it the clear choice.


Key takeaways

  • ECS is AWS's container orchestration platform. Fargate is the launch type that removes the underlying server management entirely.

  • The core primitives are clusters (environments), task definitions (blueprints), tasks (one-off runs), and services (long-running, self-healing workloads).

  • Start with Fargate. Move to the EC2 launch type only when you have a specific reason — GPU workloads, high container density, or cost at very large scale.

  • The IAM execution role and task role are different things with different purposes. Confusing them is the most common source of permission errors on first deployments.

  • Always configure CloudWatch Logs in your task definition. Fargate containers have no accessible host — logs are your only debugging surface.


Up next

Part 5 → AWS EKS: Kubernetes on AWS — power, complexity, and when it's worth it

We go deep on EKS — the managed control plane, node groups, the ECS vs EKS decision framework, and the networking complexity that catches most teams off guard on their first cluster.


Previously

Part 3 → AWS Lambda: the serverless sweet spot (and where it falls apart)

Covers Lambda's execution model, invocation types, concurrency, cold starts, and the hidden production pitfalls — including the database connection problem that takes down more than a few first deployments.


Part of the AWS Field Guide series. Tags: #aws #ecs #fargate #containers #aws-field-guide
