ECS

From DikapediaV2
Revision as of 00:10, 27 August 2024 by Ardika Sulistija

ECS - Amazon Elastic Container Service


https://docs.aws.amazon.com/AmazonECS/latest/developerguide/Welcome.html

Related Topics: Docker, Docker Troubleshooting and CLI, ECR

Really good blog post about how ECS manages CPU and memory resources: https://aws.amazon.com/blogs/containers/how-amazon-ecs-manages-cpu-and-memory-resources/

ECS Components


Cluster

A logical grouping of resources, namely services and tasks. With Fargate, clusters can exist without instances:

  • Region specific
  • Supports the EC2 and/or Fargate launch types
  • An EC2 launch type cluster may contain different instance types (t2, m4, m5, c4, etc.)
  • Access to cluster resources for IAM Users can be restricted using policies

Each instance can be a member of only one cluster.

Container Instance

EC2 instance running the ECS Agent which has registered itself to a cluster.

Requirements:

  • You must have unrestricted outbound connectivity to the Public Internet (NAT, HTTP proxy, or IGW) in order to connect to the ECS HTTPS endpoint.
  • Linux kernel 3.10 and above
  • Latest ECS agent running
  • Docker daemon v1.5.0 and above.
  • Must be an EC2 instance
  • Once a container instance is registered on a cluster, you cannot change the instance type. You must replace the instance with a new one.
  • AWS provides an ECS Optimized AMI (recommended), which simplifies the setup needed to run ECS compared with a normal instance. You can also run the ECS Agent on your own AMI to build a customized ECS AMI.
    • Doesn't apply to Fargate, as Fargate abstracts the Container Instances outside the customer control.


Container Instance States:

Active
    • Container instance can accept run task requests.
    • Active container instance is the one that will be used to place tasks scheduled by a Scheduler or the user.
Draining
    • https://docs.aws.amazon.com/AmazonECS/latest/developerguide/container-instance-draining.html
    • Container instance draining enables you to remove a container instance from a cluster without impacting tasks in your cluster.
    • When you set a container instance to DRAINING, Amazon ECS prevents new tasks from being scheduled for placement on the container instance. Service tasks on the draining container instance that are in the PENDING state are stopped immediately. If there are container instances in the cluster that are available, replacement service tasks are started on them.
    • Service tasks on the container instance that are in the RUNNING state are stopped and replaced according to the service's deployment configuration parameters, minimumHealthyPercent and maximumPercent.
    • Use draining to gracefully stop the tasks on an instance.
    • No new tasks are placed on the instance.
    • Running service tasks are stopped and relocated to other instances.
Inactive
    • If you deregister or terminate a container instance, the container instance status changes to INACTIVE immediately.
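
For the DRAINING state, the effect of minimumHealthyPercent and maximumPercent can be worked out numerically. The following is a minimal sketch, assuming (as the ECS deployment configuration documentation describes) that the lower limit is rounded up and the upper limit is rounded down. The function name deployment_bounds is hypothetical and not part of any AWS SDK.

```python
def deployment_bounds(desired_count, minimum_healthy_percent, maximum_percent):
    """Return (min_running, max_running) task counts for a service."""
    # Lower limit: rounded up, done here as ceiling division on integers.
    min_running = -(-desired_count * minimum_healthy_percent // 100)
    # Upper limit: rounded down, plain floor division.
    max_running = desired_count * maximum_percent // 100
    return min_running, max_running
```

With a desired count of 4, a minimumHealthyPercent of 50 and a maximumPercent of 200, the service must keep at least 2 tasks running and may run at most 8 while tasks on the draining instance are replaced.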


Bootstrapping

In general, bootstrapping refers to a self-starting process that is supposed to proceed without any external input.

Container instances can be bootstrapped to run configurations on startup by leveraging supported user-data formats: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/bootstrap_container_instance.html

  • User Data script
    • User-data scripts run only during the first boot cycle.
    • Mainly used to pass configuration options to the ECS Agent. The most common example is to join an instance to the cluster or even pull a pre-configured ecs.config file from an S3 bucket.
    • The ECS agent starts later in the boot process, hence the shell-script passed through user data will be executed before the ECS agent starts.
    • The cloud-init service saves the scripts passed through user data in the /var/lib/cloud/instances/<instance ID>/scripts directory. These scripts are executed in the final stage of the boot.
    • Any supported configuration option that needs to be applied to ECS agent can be passed in this way.
    • The Docker daemon is started before the shell scripts in user data are executed.
    • Example of using user data to configure ECS agent:
      • Here we are passing the ECS cluster name and ECS agent log level configuration. When the ECS agent starts, it will read the cluster name from this file and register itself with the specified cluster.
#!/bin/bash
echo "ECS_CLUSTER=MyCluster" >> /etc/ecs/ecs.config
echo "ECS_LOGLEVEL=debug" >> /etc/ecs/ecs.config
  • Cloud-boothook
    • Cloud-boothook is the earliest "hook" available. User-data scripts run much later in the process.
    • A cloud-boothook runs on every instance boot; use cloud-init-per to control the frequency of execution.
    • To configure Docker daemon before the daemon starts, use this. The content in boothook is saved in /var/lib/cloud directory and is executed immediately.
    • If we check the order of execution in /etc/init we can see that cloud-init is executed before the docker service is started. Hence passing Docker configuration through boothook will make sure the Docker configuration file is updated as soon as cloud-init detects it and before Docker daemon is started.
    • A common example of passing configuration settings to the docker daemon would be the modification of the base device size. By default, docker limits the size of images and containers to 10GB. Some users will need to increase dm.basesize:
#cloud-boothook
cloud-init-per once docker_options echo 'OPTIONS="${OPTIONS} --storage-opt dm.basesize=20G"' >> /etc/sysconfig/docker
  • MIME multipart
    • This format allows you to specify more than one type of data, e.g. a user-data script and a cloud-boothook. For an example where the ECS container agent is configured to register the instance to a cluster and the base device size is set to 20GB, see "Specifying Multiple User Data Blocks Using a MIME Multi Part Archive" in the bootstrapping documentation linked above.
    • To configure the Docker daemon, use cloud-boothook; to configure the ECS agent, use the shell script format.
    • A MIME multi part file contains a "Content-Type" attribute with the value "multipart/mixed", which indicates that the file consists of multiple user data blocks. There are boundaries indicating the start and end of the whole MIME file and also boundaries indicating the start and end of each block within the file.
    • Within each block there is again a "Content-Type" attribute indicating what type of user data format it represents. In the example we can see that a cloud-boothook format and a shell script format are combined.
    • The MIME multi part file example shown below configures the Docker daemon to set the container base size to 200GB and also sets ECS agent configuration options.
Content-Type: multipart/mixed; boundary="==BOUNDARY=="
MIME-Version: 1.0

--==BOUNDARY==
Content-Type: text/cloud-boothook; charset="us-ascii"

# Set Docker Daemon options
cloud-init-per once docker_options echo 'OPTIONS="${OPTIONS} --storage-opt dm.basesize=200GB"' >> /etc/sysconfig/docker

--==BOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
# Set any ECS agent configuration options
echo ECS_CLUSTER=MyCluster >> /etc/ecs/ecs.config
echo ECS_LOGLEVEL=debug >> /etc/ecs/ecs.config

--==BOUNDARY==--
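
A multi part archive like this does not have to be written by hand. The following is a minimal sketch that assembles one with Python's standard email.mime modules; the helper name build_user_data is hypothetical, and the 20G base size and MyCluster name are illustrative values only.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_user_data(parts, boundary="==BOUNDARY=="):
    """Combine (subtype, body) pairs into one multipart/mixed user-data string."""
    archive = MIMEMultipart("mixed", boundary=boundary)
    for subtype, body in parts:
        # MIMEText emits 'Content-Type: text/<subtype>; charset="us-ascii"' per block.
        archive.attach(MIMEText(body, subtype, "us-ascii"))
    return archive.as_string()


# Runs early via cloud-boothook, before the Docker daemon starts.
BOOTHOOK = (
    "cloud-init-per once docker_options echo "
    "'OPTIONS=\"${OPTIONS} --storage-opt dm.basesize=20G\"' >> /etc/sysconfig/docker\n"
)
# Runs later as an ordinary user-data shell script.
SHELL_SCRIPT = (
    "#!/bin/bash\n"
    'echo "ECS_CLUSTER=MyCluster" >> /etc/ecs/ecs.config\n'
)

user_data = build_user_data([
    ("cloud-boothook", BOOTHOOK),
    ("x-shellscript", SHELL_SCRIPT),
])
```

The resulting user_data string can be passed as instance user data. The library also adds a Content-Transfer-Encoding header to each block, which is harmless MIME metadata.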


Bootstrapping can also be used to mount an existing EFS volume on the container instance.



Container agent

Application running on ECS Container instances that performs the "bridge" between the local Docker Daemon API and the ECS Endpoint.


Scheduler

ECS mechanism that determines the task placement on specific container instances based on a set algorithm that takes into account several variables.

  • On ECS you can use the ECS Scheduler, but because AWS provides an API to start/run tasks, you can write your own scheduler in any language supported by the AWS SDK.
  • You can also use the Blox open source scheduler project with ECS.
  • Mesos also has a driver provided by AWS for scheduling tasks.

How to create a custom scheduler for ECS: https://aws.amazon.com/blogs/compute/how-to-create-a-custom-scheduler-for-amazon-ecs/


Task Definition

JSON formatted template describing what the associated Task will run. Contains one or more container specifications (Docker run parameters). Also defines the Networking Mode, Task Placement constraints and volume mounts associated with the task.

https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definitions.html

  • Before you can run a task on your ECS cluster, you must register a task definition. Task definitions are lists of containers grouped together.
  • Aside from using the console, you can also register a Task Definition by passing a JSON formatted file or input to the AWS CLI command.


Task

Good Read: https://aws.amazon.com/blogs/containers/how-amazon-ecs-manages-cpu-and-memory-resources/

In ECS, the basic unit of a deployment is a task, a logical construct that models one or more containers. This means that the ECS APIs operate on tasks rather than individual containers. In ECS, you can’t run a container: rather, you run a task, which, in turn, runs your container(s). A task contains (no pun intended) one or more containers.

In other words, a task is an instantiation of a Task Definition which will launch the container or containers listed within the Task Definition.

  • Placement can be defined by the scheduler or specified by the API.
  • Ideal for short-lived tasks or batch processing.


Scheduled Tasks
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/scheduling_tasks.html
  • Cron like tasks
  • CloudWatch Events

ECS Scheduled Tasks provide the ability to run ECS tasks on a regular, scheduled basis and in response to CloudWatch Events Rules. This makes it easier to launch Docker containers that you need to run only at certain times.

  • ECS scheduled tasks can run at a fixed interval, called a rate, or on a more detailed pattern using a cron expression.
  • Scheduled tasks are managed through the CloudWatch Events service events API and not ECS API calls.
  • CloudWatch Events also allows you to run ECS Tasks as a target of event based rules.


Rate: the minimum amount of time allowed to schedule a Task is 1 minute.


Task definitions that use the awsvpc network mode are not supported for scheduled tasks.


Common Use cases:

  • Batch processing: launch a Task to process a specific file that is uploaded into an S3 bucket every day at the same time.
  • Retrieving log files in the most cost-effective way: running tasks that are long and not suitable for Lambda functions, like gathering and processing big logs from an RDS database, using Spot container instances to keep the cost down.


How to Run Scheduled Tasks:

  • Before you can use ECS Scheduled Tasks you need to be sure you have the right permissions in place, as the CloudWatch Events service needs permission to run ECS tasks on your behalf.
  • These permissions are provided by the CloudWatch Events IAM role, called the ecsEventsRole. The AWS Management Console creates this role automatically when a scheduled task is configured, but it must be created manually if you want to use the same role for all your Scheduled Tasks or run them programmatically.


Schedule Expressions for Rules:
For a rule to be triggered every day at 12:00pm UTC:

cron(0 12 * * ? *)

For a rule to be triggered every day, at 5 and 35 minutes past 2:00pm UTC:

cron(5,35 14 * * ? *)

For a rule to be triggered at 09:10am UTC on the last Friday of each month during the years 2013 to 2018:

cron(10 9 ? * 6L 2013-2018)
  • You can't specify the day-of-month and day-of-week fields in the same cron expression.
  • If you specify a value or an asterisk in one of the fields, you must use a question mark in the other.
  • Also, cron expressions that lead to rates faster than 1 minute are not supported.

For a rule to be triggered every 5 minutes:

rate(5 minutes)

For a rule to be triggered every 1 hour:

rate(1 hour)

For a rule to be triggered every 1 day:

rate(1 day)
  • On rate expressions, if the value is equal to 1, then the unit must be singular. Similarly for values greater than 1, the unit must be plural.
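
The rules listed in this section can be checked before a rule is created. The following is a minimal sketch that tests only the rules stated here (the day-of-month/day-of-week restriction and the singular/plural unit rule), assuming six-field cron expressions; it is not a full validator and does not call any AWS API. The function name check_schedule_expression is hypothetical.

```python
import re

_RATE = re.compile(r"^rate\((\d+) (minute|minutes|hour|hours|day|days)\)$")
_CRON = re.compile(r"^cron\((.+)\)$")


def check_schedule_expression(expression):
    """Return a list of problems; an empty list means none of the rules was violated."""
    rate = _RATE.match(expression)
    if rate:
        value, unit = int(rate.group(1)), rate.group(2)
        if value < 1:
            return ["rate value must be a positive number"]
        # A value of 1 needs a singular unit; any larger value needs a plural unit.
        if (value == 1) == unit.endswith("s"):
            return ["use a singular unit for 1 and a plural unit otherwise"]
        return []
    cron = _CRON.match(expression)
    if cron:
        # Field order: minutes, hours, day-of-month, month, day-of-week, year.
        fields = cron.group(1).split()
        if len(fields) != 6:
            return ["cron expressions need six fields"]
        if fields[2] != "?" and fields[4] != "?":
            return ["day-of-month and day-of-week cannot both be specified"]
        return []
    return ["expression is neither rate(...) nor cron(...)"]
```

For example, check_schedule_expression("rate(1 hours)") reports the unit problem, while the expressions shown in this section pass.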


Troubleshooting ECS Scheduled Tasks:

  • Troubleshooting ECS scheduled tasks starts with checking the state of the task, and determining whether the task is in running or in stopped state.
  • If the Task should be running but is stopped or failing to start, the troubleshooting is done on ECS side.
  • If the task was not running on the ECS cluster, then it needs to be investigated why it was not invoked by CloudWatch events. So check:
    • Failed Invocations
    • Invocations metrics.


One of the most common causes for a Failed Invocation Task is the lack of permissions. If the scheduled task requires a specific task execution role or a task role override, then the iam:PassRole permissions need to be added to the CloudWatch Events IAM role, for the CloudWatch Event rule to be able to "forward" the role to the ECS Scheduled Task.

  • Solution: A new policy needs to be created and attached to the CloudWatch Events IAM role. The policy needs to specify the full ARN of the task execution role or task role override. This is an example of a policy for an ECS Scheduled Task that needs a specific task role.
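
A policy of this kind can be sketched as follows. This is a minimal illustration that builds the policy document in Python so the list of role ARNs can be varied; the helper name pass_role_policy, the account ID and the role name are hypothetical placeholders.

```python
import json


def pass_role_policy(role_arns):
    """Build an IAM policy document that allows passing the given roles."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "iam:PassRole",
                # Full ARNs of the task role override and/or task execution role.
                "Resource": list(role_arns),
            }
        ],
    }


# Hypothetical account ID and role name, for illustration only.
policy = pass_role_policy(["arn:aws:iam::111122223333:role/MyTaskRole"])
policy_json = json.dumps(policy, indent=2)
```

The policy_json string is what would be attached to the CloudWatch Events IAM role as an additional policy.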

Other Common issues include:

  • Cron Validation Error
    • A scheduled task using a cron expression fails with a validation error, even though the syntax looks correct, because it is not possible to specify the day-of-month and day-of-week fields in the same cron expression. To solve this, if a value or a * is specified in one of the fields, a question mark (?) must be used in the other.
  • Task Definition Misconfiguration
    • Task definitions that use the awsvpc network mode are not supported for scheduled tasks. For this reason, the Fargate launch type is not supported either. To solve this, select a different network mode such as bridge.
  • The rule did not trigger at the expected time
    • CloudWatch Events doesn't support setting an exact start time when you create a rule to run every time period. The count down to run time begins as soon as you create the rule.


Pending Task

PENDING tasks count = the number of tasks scheduled to launch which are yet to receive the task result state from ECS agent.

ECS Scheduler will examine the cluster for task placement.

ECS service increments PENDING task count after acknowledgement, clears PENDING task count after receiving tasks result state.


The ECS Scheduler will MAINTAIN the PENDING task count in the following scenarios:

  • The ECS agent received the scheduled task and task launch was already initiated.
  • The ECS agent is experiencing an issue while updating the task result state to Scheduler due to issues local to the instance.
  • The ECS agent is experiencing a problem in the middle of task launch and lost connectivity with Scheduler due to external factors.
  • The ECS agent is in the process of stopping the existing tasks, and a new task will not start until the stopping task has fully stopped.


The ECS Scheduler will CLEAR the PENDING task count in the following scenarios:

  • ECS agent successfully launched the task and updated the Scheduler with task in RUNNING state.
  • The container instance where the task was scheduled to launch is terminated.


If the ECS Scheduler continues to show PENDING status for a while, it could indicate a problem on the instance side that prevents the ECS agent from updating the ECS scheduler with the result state of the task launch. The following are common areas that could affect the ECS agent in updating the task result state to the ECS scheduler:

  • ECS Instance
    • Network congestion
      • The ECS agent running on a container instance depends on outbound internet connectivity over HTTPS.
      • The following are issues that could affect the container instance's network connectivity to the ECS service endpoint or a dependent service like ECR:
        • NAT instance - ENI bandwidth or CPU credits
        • NAT gateway - ENI bandwidth
        • VPC Endpoint - ENI bandwidth
        • VPN Gateway issues
        • HTTP Proxy issues.
    • Resource issues
      • The ECS agent and tasks require system resources to run. If the container instance runs short of resources in the middle of task placement, the task launch or the agent's polling of the scheduler is delayed. The scheduler keeps a positive PENDING task value until it receives the task launch state from the ECS agent.
      • The following are the issues that could affect the resource allocation:
        • Container instances (T2) CPU utilization exceeds CPU credit balance for a longer duration or ran out of CPU credits. Analyzing the following CW instance metrics will be helpful in confirming the issue:
          • CPUUtilization
          • CPUCreditUsage
          • CPUCreditBalance
        • Container Instances (GP) disk IOPS utilization exceeds IO credit balance for a longer duration or ran out of IO credits. CW metrics to look at:
          • BurstBalance
          • VolumeReadOps
          • VolumeWriteOps
          • VolumeThroughputPercentage
  • ECS agent
    • IAM permission
      • The "ecs:Submit*" IAM permission is crucial for ECS agent to send a result state of task launch to scheduler, to avoid task hung up in PENDING status.
      • You will see "access denied" errors in the ECS agent logs.
    • Agent Status Issues
      • If the ECS agent disconnects and fails to reconnect to cluster/scheduler in the middle of task launch that will cause a PENDING task issue. Several reasons that might impact the ECS agent status and connectivity:
        • The ECS agent service status on container instance is not UP, perhaps due to crash or stopped by user.
        • ECS agent may be impacted by a bug affecting a current or old version. (Review github for info)
        • Network issues
    • Configuration Issues
      • If the customer raises the 'ECS_CONTAINER_STOP_TIMEOUT' configuration variable in the ECS agent configuration, that may lead to PENDING task issues.
      • 'ECS_CONTAINER_STOP_TIMEOUT' instructs the Docker service to wait for the given period so the container can exit gracefully; a 'SIGKILL' signal is sent if the container does not exit within the stop timeout period.
      • When the ECS agent stops a task, the equivalent of docker stop is issued to the containers running in the task. This results in a SIGTERM and a default 30-second timeout, after which SIGKILL is sent and the containers are forcibly stopped. If the container handles the SIGTERM gracefully and exits within 30 seconds of receiving it, no SIGKILL is sent.
      • Example: with a stop timeout of 30 minutes and a container that fails to exit gracefully, the ECS agent has to wait up to 30 minutes before sending SIGKILL and reporting the result of the container stop to the ECS scheduler. The ECS scheduler keeps the tasks scheduled on the container instance in PENDING status until it receives an update from the ECS agent.
  • Image registry slowness.
    • The ECS agent runs a task on the container instance once it has received the schedule from the ECS service. Task launch depends on an image pull from ECR, Docker Hub or a private image registry.
    • Network congestion between the container instance and the image repository slows down the image download. The default 'net.ipv4.tcp_keepalive_time' is 7200 seconds, so the connection to the image repository can stay active for up to 2 hours, and this delays the task launch. The ECS service keeps the task in PENDING status until docker run completes or times out.
    • The following may affect image download speed:
      • Image registry performance issues
      • S3 performance issues (ECR depends on S3)
      • VPC Endpoint Issues - ENI bandwidth

The PENDING tasks count can be found per cluster, per service and per instance.


Service

Allows you to run and maintain a specified number of instances of a task definition. Services can integrate with ELB/ALB and can be scaled up or down based on specific CloudWatch metrics by leveraging Application Auto Scaling.


Container

A Docker container that was launched as part of a task.


ECS Optimized AMI

It's optimized, meaning there's a minimal number of packages installed, and it is an EC2 image preconfigured and tested by AWS engineers.


ECS Optimized AMI components:

  • Minimalized Amazon Linux AMI
  • ECS Agent
  • Docker
  • ECS-init
  • Pre-configured Storage

Launching options to launch an instance that uses the ECS Optimized AMI:

  • First Run Wizard
  • EC2 Console (via the Marketplace and Community AMIs)
  • AWS CLI/SDK
  • Cloudformation
  • OpsWorks

Storage:
ECS Optimized AMIs come with 30 GB of storage by default. Older AMIs differ in how the storage is configured.

  • AMI version 2015.09c and older had a single 30 GB EBS volume.
  • AMI versions 2015.09d and newer split it into two EBS volumes:
    • 8GB volume for root
    • 22GB volume for Docker image and metadata.


Container Agent



ECS container agent:

  • Is an open source application and programmed in Go.
  • ECS agent is essentially what allows the instances themselves to communicate with the cluster, and it processes ECS commands and turns them into Docker commands. This instructs the EC2 instances to start and stop containers, and monitor available resources that it has across the cluster.
  • Running the ECS container agent outside of Amazon EC2 is not supported.
  • Included in ECS Optimized AMI.


ECS Agent states:

  • Connected
    • ECS Agent CONNECTED status indicates the agent is able to communicate with the ECS Scheduler on AWS.
  • Disconnected
    • ECS Agent DISCONNECTED status indicates there is a problem with the agent. It can be networking or permissions generally.
    • The agent periodically disconnects and reconnects (several times per hour) as part of its normal operation. This is done to ensure that connectivity can be established and that there are no issues with it.
      • If an agent is kept in disconnected state for a long period, this indicates a problem with the agent, instance, permissions, connectivity, or the service.


Agent Configuration

The Amazon ECS container agent supports a number of configuration options, most of which should be set through environment variables. The following environment variables are available, and all of them are optional.

The ECS Agent reads configuration options from the /etc/ecs/ecs.config file.

Example: To register an EC2 instance with ECS cluster:

  • Attach IAM role to instance
  • Add the ECS_CLUSTER attribute to the file:
$ cat /etc/ecs/ecs.config 
ECS_CLUSTER=FirstCluster
ECS_LOGLEVEL=debug

Here are the available parameters: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-agent-config.html#ecs-agent-availparam
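
When checking which cluster an instance is configured to join, it can help to read the file programmatically. The following is a minimal sketch that treats the file as plain KEY=VALUE lines; the helper name parse_ecs_config is hypothetical, and the agent's own handling of the file may differ in edge cases such as quoting.

```python
def parse_ecs_config(text):
    """Parse KEY=VALUE lines, ignoring blank lines and # comments."""
    options = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        if separator:
            options[key.strip()] = value.strip()
    return options


# Sample content matching the file shown above.
sample = "ECS_CLUSTER=FirstCluster\nECS_LOGLEVEL=debug\n"
options = parse_ecs_config(sample)
# When ECS_CLUSTER is not set, the agent registers with the cluster named "default".
cluster = options.get("ECS_CLUSTER", "default")
```

On a container instance, the same function can be applied to the contents of /etc/ecs/ecs.config.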


Amazon/Linux AMI

On Amazon or other Linux AMIs, ECS container agent will most likely be run in a Docker container on an EC2 instance.

The ECS container agent may also be run outside of a Docker container as a Go binary. However, this is not recommended for production on Linux, but it can be useful for development or easier integration with your local Go tools.


Windows Server 2016

On Windows Server 2016, the container agent runs as a service on the host. Unlike Linux, the agent may not run inside a container as it uses the host's registry and the named pipe at \\.\pipe\docker_engine to communicate with the Docker daemon.


API

Also, ECS container agent provides an API for gathering details about the container instance.

You can use the curl command from within the container instance to query the Amazon ECS container agent (port 51678) and return container instance metadata or task information.
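
The same query can be made from a script instead of curl. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the field names (Cluster, ContainerInstanceArn, Version) are those returned by the /v1/metadata endpoint. fetch_metadata only works on a container instance where the agent is running.

```python
import json
import urllib.request

INTROSPECTION_URL = "http://localhost:51678/v1/metadata"


def summarize_metadata(document):
    """Pick the commonly used fields out of a /v1/metadata JSON document."""
    data = json.loads(document)
    return {
        "cluster": data.get("Cluster"),
        "container_instance_arn": data.get("ContainerInstanceArn"),
        "agent_version": data.get("Version"),
    }


def fetch_metadata(url=INTROSPECTION_URL, timeout=2):
    """Query the agent's introspection endpoint and summarize the response."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return summarize_metadata(response.read().decode("utf-8"))
```

Keeping the parsing in summarize_metadata separate from the HTTP call means the parsing can be tried against saved curl output on any machine.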


Agent Debugging

There are logs available from the Amazon ECS container agent and the ecs-init service that controls the state of the agent (start/stop) on the container instance.

You can increase the verbosity of the container agent logs by setting ECS_LOGLEVEL=debug and restarting the container agent.

If you are unsure how to collect all of the various logs on your container instances, you can use the Amazon ECS log collector: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-logs-collector.html

See "Configuration and log files" below for the log file locations.


Updating the container agent

Updating the Amazon ECS Container Agent https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-agent-update.html

Checking Your Amazon ECS Container Agent Version https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-agent-update.html#checking_agent_version

Updating the Amazon ECS Container Agent on an Amazon ECS-optimized AMI https://docs.aws.amazon.com/AmazonECS/latest/developerguide/agent-update-ecs-ami.html

Manually Updating the Amazon ECS Container Agent (for Non-Amazon ECS-Optimized AMIs) https://docs.aws.amazon.com/AmazonECS/latest/developerguide/manually_update_agent.html


How to stop/start the ecs agent
$ sudo stop ecs
$ sudo start ecs

# OR

$ sudo systemctl stop ecs
$ sudo systemctl start ecs


How to check ECS Agent status and connectivity

Commands to verify ECS Agent Status and Connectivity:

$ sudo status ecs
$ sudo docker ps --filter name=ecs-agent

Commands to view ECS container instance metadata & running tasks info:

$ curl http://localhost:51678/v1/metadata
$ curl http://localhost:51678/v1/tasks

How to verify the instance's connectivity to the ECS/TCS/ACS endpoints:

$ telnet ecs.<region>.amazonaws.com 443
$ telnet ecs-t-1.<region>.amazonaws.com 443
$ telnet ecs-a-1.<region>.amazonaws.com 443
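
Where telnet is not installed, the same TCP check can be scripted. The following is a minimal sketch using the Python standard library; the function names are hypothetical, and the hostnames are the ones from the telnet commands above. It only confirms that a TCP connection on port 443 can be opened and says nothing about TLS or IAM permissions.

```python
import socket


def endpoints_for(region):
    """Hostnames taken from the telnet commands above."""
    return [
        f"ecs.{region}.amazonaws.com",
        f"ecs-t-1.{region}.amazonaws.com",
        f"ecs-a-1.{region}.amazonaws.com",
    ]


def can_connect(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


def check_region(region):
    """Map each endpoint hostname to its reachability."""
    return {host: can_connect(host) for host in endpoints_for(region)}
```

Calling check_region("us-east-1") on a container instance returns a dictionary showing which of the three endpoints are reachable.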


The ecs-init service


ecs-init service

  • Controls the starting and stopping of the agent at boot and shutdown.
  • Also handles starting the agent if it crashes.

Amazon ECS RPM

  • The Amazon ECS RPM is packaged for RPM-based systems that utilize Upstart as the init system.
    • The upstart script installed by the ECS container service RPM can be started or stopped as shown under "How to stop/start the ecs agent".
  • RPM Logs info:
    • Logs from the RPM are available at /var/log/ecs/ecs-init.log, while logs from the Amazon ECS container agent are available at /var/log/ecs/ecs-agent.log.
    • The Amazon ECS RPM makes the Amazon ECS Container Agent introspection endpoint available at http://127.0.0.1:51678/v1 or http://localhost:51678/v1
    • Configuration for the Amazon ECS container agent is read from /etc/ecs/ecs.config.

Ecs-init Github source code: https://github.com/aws/amazon-ecs-init


Configuration and log files


  • Configuration for the Amazon ECS container agent is read from /etc/ecs/ecs.config.
  • Logs from the RPM are available at /var/log/ecs/ecs-init.log
  • Logs from the Amazon ECS container agent are available at /var/log/ecs/ecs-agent.log.
  • For Windows: both ECS agent and ecs-init logs are located at: C:\ProgramData\Amazon\ECS\log
  • The ECS CLI configuration file: ~/.ecs/config
  • The ECS CLI credentials file: ~/.ecs/credentials
  • The checkpoint file: /var/lib/ecs/data/ecs_agent_data.json or /var/lib/ecs/data/agent.db.
    • Delete this file if you have made changes to the /etc/ecs/ecs.config, and you want to restart the agent.


Autoscaling in ECS


There are two levels of autoscaling in ECS:

  • Instance level: autoscaling groups
  • Container level: Application autoscaling

Scale out is not a problem, as you add instances and add tasks.

Scale in: ASG will stop any instance, by default, without ECS knowing which one is being terminated.

  • Solution: Implement instance draining in Lambda, allowing ECS to stop the tasks before the instance is terminated.

Recommended, but not mandatory: adapt the alarms so that Application Auto Scaling scales in first, then the instances.

https://aws.amazon.com/blogs/compute/how-to-automate-container-instance-draining-in-amazon-ecs/
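
The core of such a draining function can be sketched as below. This is a minimal illustration, not the Lambda from the blog post: it assumes ecs_client is a boto3 ECS client (boto3.client("ecs"), which requires boto3 to be installed) and that the container instance ARN is already known. The function names are hypothetical; update_container_instances_state and list_tasks are the boto3 ECS client calls used.

```python
def drain_container_instance(ecs_client, cluster, container_instance_arn):
    """Set one container instance to DRAINING and return the API response."""
    return ecs_client.update_container_instances_state(
        cluster=cluster,
        containerInstances=[container_instance_arn],
        status="DRAINING",
    )


def has_running_tasks(ecs_client, cluster, container_instance_arn):
    """Return True while tasks are still listed for the container instance."""
    response = ecs_client.list_tasks(
        cluster=cluster,
        containerInstance=container_instance_arn,
    )
    return len(response["taskArns"]) > 0
```

A lifecycle hook handler would call drain_container_instance once, then poll has_running_tasks and let the Auto Scaling group proceed with termination only when it returns False.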


Load Balancers in ECS


Dynamic Port Mapping:

  • Allows you to run more than one task on the same host.
  • Supported by ALB and NLB only.
  • HostPort on the task definition must be set to '0'.

https://aws.amazon.com/premiumsupport/knowledge-center/dynamic-port-mapping-ecs/
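
In a task definition, dynamic port mapping comes down to the hostPort value in the container's portMappings. The following is a minimal sketch; the helper name port_mapping is hypothetical, while containerPort, hostPort and protocol are the task definition keys.

```python
def port_mapping(container_port, dynamic=True, protocol="tcp"):
    """Build one portMappings entry; hostPort 0 requests a dynamically assigned host port."""
    return {
        "containerPort": container_port,
        "hostPort": 0 if dynamic else container_port,
        "protocol": protocol,
    }


# Two tasks built from this definition can share a host, because neither pins a host port.
container_definition = {
    "name": "web",  # hypothetical container name
    "portMappings": [port_mapping(80)],
}
```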

Classic support:
The Classic Load Balancer supports more than one port per service. You can map as many ports as you want on each container (HTTP and HTTPS on the same container, for example).

Supported load balancers

  • ALB
  • NLB
  • CLB

Application and Network Load Balancers support dynamic port mapping

  • You can only map 1 port per service/target group.
  • With a NLB, you cannot register instances by instance ID if they have the following instance types: C1, CC1, CC2, CG1, CG2, CR1, CS1, G1, G2, HI1, HS1, M1, M2, M3, and T1. You can register instances of these types by IP address instead.


Task Networking (awsvpc)


https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-networking.html

ENI is attached to the Task:

  • One ENI per task, not per container; all the containers in the task share the same ENI.

Task Networking...

  • Improves security - Security groups are attached to the ENI not the instance.
  • Only supported on Optimized ECS AMI or Amazon Linux with ecs-init >= 1.15.0-4
  • Linked Service Role is required -
$ aws iam create-service-linked-role --aws-service-name ecs.amazonaws.com

Task networking considerations: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-networking.html#task-networking-considerations

  • The awsvpc network mode does not provide task elastic network interfaces with public IP addresses. You need to use NAT or a load balancer to allow the task to access resources outside the VPC. A load balancer is the better choice overall, as it scales better.
  • awsvpc is only supported on Linux (ECS Optimized AMI or Amazon Linux with ecs-init >= 1.15.0-4)
  • The ENI limit depends on the EC2 instance type in use, e.g. a c4.large can have up to 3 ENIs.
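
The ENI limit translates directly into how many awsvpc tasks fit on an instance. The following is a minimal sketch of that arithmetic, assuming one ENI is taken by the instance's primary network interface and that ENI trunking is not in use; the function name is hypothetical.

```python
def max_awsvpc_tasks(max_enis_per_instance):
    """Each awsvpc task needs its own ENI; one ENI is the instance's primary interface."""
    return max(max_enis_per_instance - 1, 0)


# A c4.large allows up to 3 ENIs, which leaves room for 2 awsvpc tasks.
c4_large_tasks = max_awsvpc_tasks(3)
```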


Some considerations with awsvpc mode:

  • No dynamic-port mapping
  • No Public IP with awsvpc mode in EC2 Launch Type


ENI Limit per instance:

Container Storage


You can create volumes in a Docker container to store and/or share information with other containers.

  • In the Linux ECS Optimized AMI the volumes will take space on the xvda volume and not the xvdcz
  • On Windows they will be created inside the C: drive

By default volumes in a container are managed by Docker and are empty on start and removed on container removal.

To use persistent storage you may use EBS or EFS


ECS CLI


ECS CLI provides the ability to create, update and monitor clusters and tasks.

ECS CLI can be used to push/pull Docker images from Amazon ECR repository.

ECS CLI supports Docker Compose with limits. Docker compose is used to define and run multi-container applications. It is used as an alternative to AWS Management console.

This section will walk you through the tutorial:

Creating a Cluster with an EC2 Task Using the Amazon ECS CLI
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-cli-tutorial-ec2.html


How to install the ECS CLI

https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ECS_CLI_installation.html


How to configure the Amazon ECS CLI

https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ECS_CLI_Configuration.html

The Amazon ECS CLI requires some basic configuration information before you can use it, such as:

  1. AWS credentials
  2. AWS region in which to create your cluster
  3. Amazon ECS cluster name


These settings are passed using the $ ecs-cli configure command.

The settings are then stored in the ~/.ecs/config file.

There are three ways to set the AWS Credentials to be used with the $ ecs-cli configure command:

  1. Set the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY on your development computer; when you call the ecs-cli command, those values are passed to it.
  2. Use a named profile from ~/.aws/credentials; both the access key and the secret key are then passed to the ecs-cli command.
  3. Pass the access key and secret key as arguments to the ecs-cli command using the --access-key and --secret-key options.
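
Before relying on the first method, it can help to confirm that both variables are actually set in the current shell. A small sketch; the function name aws_env_credentials_set is hypothetical and not part of the ECS CLI:

```shell
# Hypothetical helper: succeeds only when both credential variables
# are set and non-empty in the current environment.
aws_env_credentials_set() {
    [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]
}

if aws_env_credentials_set; then
    echo "Credentials found in the environment (method 1)"
else
    echo "No environment credentials; use a named profile or --access-key/--secret-key"
fi
```

The check only looks at whether the variables are present; it does not validate the keys against AWS.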


How to set environment variables: https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html#envvars-set

How to configure:

$ ecs-cli configure profile --profile-name main_ecs --access-key $AWS_ACCESS_KEY_ID --secret-key $AWS_SECRET_ACCESS_KEY
$ ecs-cli configure --cluster SecondCluster --default-launch-type EC2 --region us-east-1 --config-name SecondCluster_config


ECS CLI configuration file: ~/.ecs/config

$ cat ~/.ecs/config
version: v1
default: SecondCluster_config
clusters:
  SecondCluster_config:
    cluster: SecondCluster
    region: us-east-1
    default_launch_type: EC2


ECS CLI credentials file: ~/.ecs/credentials

$ cat ~/.ecs/credentials 
version: v1
default: main_ecs
ecs_profiles:
  main_ecs:
    aws_access_key_id: AKIAIOSFODNN7EXAMPLE
    aws_secret_access_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY


How to create a cluster
$ ecs-cli up --keypair [your_key] --capability-iam --size 2 --instance-type t2.large
INFO[0000] Using recommended Amazon Linux 2 AMI with ECS Agent 1.40.0 and Docker version 19.03.6-ce 
INFO[0000] Created cluster                               cluster=SecondCluster region=us-east-1
INFO[0000] Waiting for your cluster resources to be created... 
INFO[0000] Cloudformation stack status                   stackStatus=CREATE_IN_PROGRESS
INFO[0061] Cloudformation stack status                   stackStatus=CREATE_IN_PROGRESS
INFO[0121] Cloudformation stack status                   stackStatus=CREATE_IN_PROGRESS
VPC created: vpc-00e137a44a3d76999
Security Group created: sg-07f3954ba17662632
Subnet created: subnet-0f250960e3e51e10a
Subnet created: subnet-021053587cc96f053
Cluster creation succeeded.
  • The command above launches 2 instances.
  • By default, port 80 is opened on the new security group; to open a different port, use the --port flag.


How to deploy the Compose File to a Cluster

After you create the compose file, you can deploy it to your cluster with the ecs-cli compose up command. By default the command looks for a file called docker-compose.yml in the current directory, but you can specify a different file with the --file option. By default, the resources created by this command have the current directory in the title, but you can override that with the '--project-name project_name' option.

  • A Docker Compose file that creates 2 containers, WordPress and MySQL database:
$ sudo vi docker-compose.yml
version: "2"
services:
  wordpress:
    cpu_shares: 100
    image: wordpress
    links:
      - mysql
    mem_limit: 524288000
    ports:
      - "80:80"
  mysql:
    cpu_shares: 100
    environment:
      MYSQL_ROOT_PASSWORD: password
    image: "mysql:5.7"
    mem_limit: 524288000
  • Run: $ ecs-cli compose --file docker-compose.yml up to deploy the Compose File to a Cluster.
$ ecs-cli compose --file docker-compose.yml up
INFO[0000] Using ECS task definition                     TaskDefinition="ec2-user:1"
INFO[0000] Starting container...                         container=1824492b-1422-495d-b66d-8185c111bda6/mysql
INFO[0000] Starting container...                         container=1824492b-1422-495d-b66d-8185c111bda6/wordpress
INFO[0000] Describe ECS container status                 container=1824492b-1422-495d-b66d-8185c111bda6/mysql desiredStatus=RUNNING lastStatus=PENDING taskDefinition="ec2-user:1"
INFO[0000] Describe ECS container status                 container=1824492b-1422-495d-b66d-8185c111bda6/wordpress desiredStatus=RUNNING lastStatus=PENDING taskDefinition="ec2-user:1"
INFO[0012] Describe ECS container status                 container=1824492b-1422-495d-b66d-8185c111bda6/mysql desiredStatus=RUNNING lastStatus=PENDING taskDefinition="ec2-user:1"
INFO[0012] Describe ECS container status                 container=1824492b-1422-495d-b66d-8185c111bda6/wordpress desiredStatus=RUNNING lastStatus=PENDING taskDefinition="ec2-user:1"
INFO[0024] Describe ECS container status                 container=1824492b-1422-495d-b66d-8185c111bda6/mysql desiredStatus=RUNNING lastStatus=PENDING taskDefinition="ec2-user:1"
INFO[0024] Describe ECS container status                 container=1824492b-1422-495d-b66d-8185c111bda6/wordpress desiredStatus=RUNNING lastStatus=PENDING taskDefinition="ec2-user:1"
INFO[0030] Started container...                          container=1824492b-1422-495d-b66d-8185c111bda6/mysql desiredStatus=RUNNING lastStatus=RUNNING taskDefinition="ec2-user:1"
INFO[0030] Started container...                          container=1824492b-1422-495d-b66d-8185c111bda6/wordpress desiredStatus=RUNNING lastStatus=RUNNING taskDefinition="ec2-user:1"
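
The compose file above sets mem_limit: 524288000 for each container. The value is in bytes; the quick conversion below shows what that is in MiB. This is only an arithmetic sketch, and the helper name bytes_to_mib is made up:

```shell
# Hypothetical helper: convert a byte count (as used by mem_limit in the
# compose file) to whole MiB. 1 MiB = 1048576 bytes.
bytes_to_mib() {
    echo $(( $1 / 1048576 ))
}

bytes_to_mib 524288000
```

This prints 500, so each of the two containers is limited to 500 MiB of memory.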


How to view the running containers on the cluster

After you deploy the compose file, you can view the containers that are running on your cluster with the $ ecs-cli ps command.

Run: $ ecs-cli ps

$ ecs-cli ps
Name                                            State    Ports                     TaskDefinition  Health
1824492b-1422-495d-b66d-8185c111bda6/mysql      RUNNING                            ec2-user:1      UNKNOWN
1824492b-1422-495d-b66d-8185c111bda6/wordpress  RUNNING  34.236.254.87:80->80/tcp  ec2-user:1      UNKNOWN


How to scale the Tasks on a Cluster

You can scale up the task count to run more instances of your application with the $ ecs-cli compose scale command. This example increases the count to two.

$ ecs-cli compose scale 2 --cluster-config SecondCluster_config --ecs-profile main_ecs
INFO[0000] Starting container...                         container=5e0a1741-5f4b-4621-8ed6-d69aa6e13a6f/mysql
INFO[0000] Starting container...                         container=5e0a1741-5f4b-4621-8ed6-d69aa6e13a6f/wordpress
INFO[0000] Describe ECS container status                 container=5e0a1741-5f4b-4621-8ed6-d69aa6e13a6f/wordpress desiredStatus=RUNNING lastStatus=PENDING taskDefinition="ec2-user:1"
INFO[0000] Describe ECS container status                 container=5e0a1741-5f4b-4621-8ed6-d69aa6e13a6f/mysql desiredStatus=RUNNING lastStatus=PENDING taskDefinition="ec2-user:1"
INFO[0012] Describe ECS container status                 container=5e0a1741-5f4b-4621-8ed6-d69aa6e13a6f/wordpress desiredStatus=RUNNING lastStatus=PENDING taskDefinition="ec2-user:1"
INFO[0012] Describe ECS container status                 container=5e0a1741-5f4b-4621-8ed6-d69aa6e13a6f/mysql desiredStatus=RUNNING lastStatus=PENDING taskDefinition="ec2-user:1"
INFO[0024] Describe ECS container status                 container=5e0a1741-5f4b-4621-8ed6-d69aa6e13a6f/wordpress desiredStatus=RUNNING lastStatus=PENDING taskDefinition="ec2-user:1"
INFO[0024] Describe ECS container status                 container=5e0a1741-5f4b-4621-8ed6-d69aa6e13a6f/mysql desiredStatus=RUNNING lastStatus=PENDING taskDefinition="ec2-user:1"
INFO[0030] Started container...                          container=5e0a1741-5f4b-4621-8ed6-d69aa6e13a6f/wordpress desiredStatus=RUNNING lastStatus=RUNNING taskDefinition="ec2-user:1"
INFO[0030] Started container...                          container=5e0a1741-5f4b-4621-8ed6-d69aa6e13a6f/mysql desiredStatus=RUNNING lastStatus=RUNNING taskDefinition="ec2-user:1"

Running $ ecs-cli ps should now show two more containers in your cluster.
Example 1:

$ ecs-cli ps
Name                                            State                Ports                     TaskDefinition  Health
1824492b-1422-495d-b66d-8185c111bda6/mysql      RUNNING                                        ec2-user:1      UNKNOWN
1824492b-1422-495d-b66d-8185c111bda6/wordpress  RUNNING              34.236.254.87:80->80/tcp  ec2-user:1      UNKNOWN
5e0a1741-5f4b-4621-8ed6-d69aa6e13a6f/wordpress  RUNNING              3.88.22.104:80->80/tcp    ec2-user:1      UNKNOWN
5e0a1741-5f4b-4621-8ed6-d69aa6e13a6f/mysql      RUNNING                                        ec2-user:1      UNKNOWN

Example 2:

$ ecs-cli ps --cluster-config SecondCluster_config --ecs-profile main_ecs
Name                                            State                Ports                     TaskDefinition  Health
1824492b-1422-495d-b66d-8185c111bda6/mysql      RUNNING                                        ec2-user:1      UNKNOWN
1824492b-1422-495d-b66d-8185c111bda6/wordpress  RUNNING              34.236.254.87:80->80/tcp  ec2-user:1      UNKNOWN
5e0a1741-5f4b-4621-8ed6-d69aa6e13a6f/wordpress  RUNNING              3.88.22.104:80->80/tcp    ec2-user:1      UNKNOWN
5e0a1741-5f4b-4621-8ed6-d69aa6e13a6f/mysql      RUNNING                                        ec2-user:1      UNKNOWN


How to create an ECS Service from a Compose File

Now that you know that your containers work properly, you can make sure that they are replaced if they fail or stop. You can do this by creating a service from your compose file with the $ ecs-cli compose service up command. This command creates a task definition from the latest compose file (if it does not already exist) and creates an ECS service with it, with a desired count of 1.

Before starting your service, stop the containers from your compose file with the $ ecs-cli compose down command so that you have an empty cluster to work with.
Example 1:

$ ecs-cli compose --file docker-compose.yml down

Example 2:

$ ecs-cli compose down --cluster-config SecondCluster_config --ecs-profile main_ecs
INFO[0000] Stopping container...                         container=1824492b-1422-495d-b66d-8185c111bda6/mysql
INFO[0000] Stopping container...                         container=1824492b-1422-495d-b66d-8185c111bda6/wordpress
INFO[0000] Stopping container...                         container=5e0a1741-5f4b-4621-8ed6-d69aa6e13a6f/wordpress
INFO[0000] Stopping container...                         container=5e0a1741-5f4b-4621-8ed6-d69aa6e13a6f/mysql
INFO[0000] Describe ECS container status                 container=1824492b-1422-495d-b66d-8185c111bda6/mysql desiredStatus=STOPPED lastStatus=RUNNING taskDefinition="ec2-user:1"
INFO[0000] Describe ECS container status                 container=1824492b-1422-495d-b66d-8185c111bda6/wordpress desiredStatus=STOPPED lastStatus=RUNNING taskDefinition="ec2-user:1"
INFO[0000] Describe ECS container status                 container=5e0a1741-5f4b-4621-8ed6-d69aa6e13a6f/wordpress desiredStatus=STOPPED lastStatus=RUNNING taskDefinition="ec2-user:1"
INFO[0000] Describe ECS container status                 container=5e0a1741-5f4b-4621-8ed6-d69aa6e13a6f/mysql desiredStatus=STOPPED lastStatus=RUNNING taskDefinition="ec2-user:1"
INFO[0006] Stopped container...                          container=1824492b-1422-495d-b66d-8185c111bda6/mysql desiredStatus=STOPPED lastStatus=STOPPED taskDefinition="ec2-user:1"
INFO[0006] Stopped container...                          container=1824492b-1422-495d-b66d-8185c111bda6/wordpress desiredStatus=STOPPED lastStatus=STOPPED taskDefinition="ec2-user:1"
INFO[0006] Stopped container...                          container=5e0a1741-5f4b-4621-8ed6-d69aa6e13a6f/wordpress desiredStatus=STOPPED lastStatus=STOPPED taskDefinition="ec2-user:1"
INFO[0006] Stopped container...                          container=5e0a1741-5f4b-4621-8ed6-d69aa6e13a6f/mysql desiredStatus=STOPPED lastStatus=STOPPED taskDefinition="ec2-user:1"
$ ecs-cli ps
Name                                            State                Ports                     TaskDefinition  Health
1824492b-1422-495d-b66d-8185c111bda6/mysql      STOPPED ExitCode: 0                            ec2-user:1      UNKNOWN
1824492b-1422-495d-b66d-8185c111bda6/wordpress  STOPPED ExitCode: 0  34.236.254.87:80->80/tcp  ec2-user:1      UNKNOWN
5e0a1741-5f4b-4621-8ed6-d69aa6e13a6f/wordpress  STOPPED ExitCode: 0  3.88.22.104:80->80/tcp    ec2-user:1      UNKNOWN
5e0a1741-5f4b-4621-8ed6-d69aa6e13a6f/mysql      STOPPED ExitCode: 0                            ec2-user:1      UNKNOWN

Now you can create your service using the $ ecs-cli compose service up command. Once it has run, you should be able to see the service in the ECS console:

$ ecs-cli compose service up --cluster-config SecondCluster_config --ecs-profile main_ecs
INFO[0000] Using ECS task definition                     TaskDefinition="ec2-user:1"
INFO[0005] (service ec2-user) has started 1 tasks: (task 3a18eb82-2d13-4697-b326-9024bd4fb497).  timestamp="2020-06-15 21:39:16 +0000 UTC"
INFO[0010] Service status                                desiredCount=1 runningCount=1 serviceName=ec2-user
INFO[0010] ECS Service has reached a stable state        desiredCount=1 runningCount=1 serviceName=ec2-user
INFO[0010] Created an ECS service                        service=ec2-user taskDefinition="ec2-user:1"


How to clean up resources

Delete the service with the $ ecs-cli compose service rm command so that it stops the existing containers and does not try to run any more tasks:
Example 1:

$ ecs-cli compose --file docker-compose.yml service rm

Example 2:

$ ecs-cli compose service rm --cluster-config SecondCluster_config --ecs-profile main_ecs


Now, take down your cluster, which cleans up the resources that you created earlier with $ ecs-cli up.
Example 1:

$ ecs-cli down --force
INFO[0000] Waiting for your cluster resources to be deleted... 
INFO[0000] Cloudformation stack status                   stackStatus=DELETE_IN_PROGRESS
INFO[0060] Cloudformation stack status                   stackStatus=DELETE_IN_PROGRESS
INFO[0121] Cloudformation stack status                   stackStatus=DELETE_IN_PROGRESS
INFO[0151] Deleted cluster                               cluster=SecondCluster

Example 2:

$ ecs-cli down --force --cluster-config SecondCluster_config --ecs-profile main_ecs


Other Notes


Troubleshooting Basics

Amazon ECS Troubleshooting: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/troubleshooting.html

When troubleshooting a container instance, some log analysis might be required. Here are the most common logs:

  • /var/log/messages
  • /var/log/dmesg
  • /var/log/cloud-init(-output).log
  • docker logs <container id> - note these logs are only available when using the default 'json-file' log driver. If you are using 'awslogs' as many customers do, the CloudWatch log group and stream (as configured in the task definition) can be checked.

All logs can be collected for analysis using the ecs-logs-collector tool: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-logs-collector.html

Suppose you notice that the ECS container agent is not 'connected' and need to find out why. There are many reasons why the agent may not be connected to the ECS service; start with these checks:

  • Is the instance currently running?
  • Can you SSH to the container instance?
  • Is the Docker daemon running?
$ ps aux | grep -i docker
$ docker ps
    • If you need to troubleshoot the Docker daemon further, set the Docker log level to debug and restart the service. On the ECS-optimized AMI, enable debugging by adding the -D option to the OPTIONS line as shown below, then restart Docker:
$ sudo vi /etc/sysconfig/docker
...
OPTIONS="-D --default-ulimit nofile=1024:4096"
$ sudo service docker restart

See the Docker Troubleshooting and CLI page for more Docker troubleshooting tips.

If the ECS container agent is running but still shows as not connected, run $ docker ps and check its status. If the agent container has been up for only 1 second, it was just restarted. If you see this constantly, it suggests that the ECS container agent is restarting in a loop.
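
The "up for only 1 second" check can be scripted against the Status column of docker ps. A rough sketch, assuming the status text has the usual "Up N seconds" / "Up Less than a second" form; the function name just_restarted is hypothetical:

```shell
# Hypothetical helper: succeeds when a `docker ps` status string shows an
# uptime of less than a minute, i.e. the container was (re)started recently.
# Assumption: status strings look like "Up 1 second", "Up 45 seconds",
# "Up Less than a second", "Up 3 hours", ...
just_restarted() {
    case "$1" in
        "Up Less than a second"*) return 0 ;;
        "Up "[0-9]" second"*)     return 0 ;;
        "Up "[0-9][0-9]" second"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Sample status strings, not real output
for status in "Up 1 second" "Up 3 hours"; do
    if just_restarted "$status"; then
        echo "$status -> recently (re)started"
    else
        echo "$status -> stable"
    fi
done
```

On a container instance the status could be fed in with something like just_restarted "$(docker ps --filter name=ecs-agent --format '{{.Status}}')", assuming the agent container is named ecs-agent. Seeing a fresh uptime on several consecutive checks points to a restart loop.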


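Where to check what tasks a local ecs-agent is tracking

  • Agent introspection API (instance metadata, then the tasks the agent knows about):
$ curl http://localhost:51678/v1/metadata | python -mjson.tool
$ curl http://localhost:51678/v1/tasks | python -mjson.tool
  • Agent log files: /var/log/ecs/ecs-agent.log*
  • Agent state file: /var/lib/ecs/data/ecs_agent_data.json (replaced by /var/lib/ecs/data/agent.db in later agent versions)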


Useful 'aws ecs' AWS CLI commands

How to list the container instances in your cluster using $ aws ecs list-container-instances:

$ aws ecs list-container-instances --cluster default
{
    "containerInstanceArns": [
        "arn:aws:ecs:us-east-1:648818476623:container-instance/1542e693-2d21-4ccd-b476-455faef68e68",
        "arn:aws:ecs:us-east-1:648818476623:container-instance/99c833f7-5561-4aa3-92d6-ae18608f06e5"
    ]
}


Now that you have the ARN or ID of a container instance, you can use the $ aws ecs describe-container-instances command to get valuable information on the instance, such as remaining and registered CPU and memory resources:

$ aws ecs describe-container-instances --container-instances 1542e693-2d21-4ccd-b476-455faef68e68
{
    "containerInstances": [
        {
            "containerInstanceArn": "arn:aws:ecs:us-east-1:648818476623:container-instance/1542e693-2d21-4ccd-b476-455faef68e68",
            "ec2InstanceId": "i-0c567d7b38900c18c",
            "version": 95,
            "versionInfo": {
                "agentVersion": "1.40.0",
                "agentHash": "17e8d834",
                "dockerVersion": "DockerVersion: 19.03.6-ce"
            },
            "remainingResources": [
                {
                    "name": "CPU",
                    "type": "INTEGER",
                    "doubleValue": 0.0,
                    "longValue": 0,
                    "integerValue": 1024
                }, 
                {
                    "name": "MEMORY",
                    "type": "INTEGER", 
[......]


You can list task definitions by using the list-task-definitions command. The output of this command shows the family and revision values that you can use together when calling run-task or start-task:

$ aws ecs list-task-definitions


How to bootstrap an ECS container instance to mount an existing EFS volume to the instance

An EFS file system can be mounted on container instances, and the mount points on the host instance can be shared with the containers running on that host.

Using an EFS file system with Docker containers means the containers do not run out of host volume space, and the same EFS file system can be shared with containers running on multiple hosts.

  • Boothook example:
    • The EFS file system is mounted to the directory /efs on the host instance through a cloud-boothook script:
#cloud-boothook
# Install nfs-utils
cloud-init-per once yum-update yum update -y
cloud-init-per once install_nfs_utils yum install -y nfs-utils
# Create /efs folder
cloud-init-per once mkdir_efs mkdir /efs
# Mount /efs
cloud-init-per once mount_efs echo -e 'fs-2aa70323.efs.us-east-1.amazonaws.com:/ /efs nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 0 0' >> /etc/fstab
mount -a
    • When the instance is up, SSH into it and create a couple of empty files in the /efs directory, for example two files named "123" and "abc":
$ cd /efs
$ touch 123 abc
    • You can then launch an nginx container and mount the host's /efs directory at /mnt in the container. Inside the container, the two example files should be visible in the /mnt directory:
$ docker run -it -v /efs:/mnt nginx bash
# cd /mnt/
# ls
123  abc
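
The /etc/fstab entry in the boothook can be generated rather than typed by hand. A minimal sketch, assuming the standard EFS DNS name layout of file-system-id.efs.region.amazonaws.com and the mount options nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2; the function name efs_fstab_entry and the file system ID fs-12345678 are placeholders:

```shell
# Hypothetical helper: print an /etc/fstab line for an EFS file system.
# Arguments: file system ID, region, mount point on the host.
efs_fstab_entry() {
    fs_id=$1
    region=$2
    mount_point=$3
    printf '%s.efs.%s.amazonaws.com:/ %s nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 0 0\n' \
        "$fs_id" "$region" "$mount_point"
}

# fs-12345678 is a placeholder file system ID
efs_fstab_entry fs-12345678 us-east-1 /efs
```

The printed line can be appended to /etc/fstab before running mount -a, in the same way as the boothook does.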


How to find PENDING tasks count per cluster, service and instance? (using CLI)
  • Describe the CLUSTER to find the number of tasks in the CLUSTER that are in the RUNNING and PENDING state:
$ aws ecs describe-clusters --cluster default --region us-east-1 --query "clusters[].[pendingTasksCount,runningTasksCount]"
[
   [
      0,
      2
   ]
]
  • Describe a CONTAINER INSTANCE to find the number of tasks on the CONTAINER instance that are in RUNNING and PENDING status:
$ aws ecs describe-container-instances --cluster default --container-instances 2131234-3243243-af34-af23-vda3-1431gds --region us-east-1 --query "containerInstances[].[runningTasksCount,pendingTasksCount]"
[
   [
      1,
      0
   ]
] 
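  • Describe an ECS SERVICE to find the desired number of tasks in the service as well as the actual number of tasks in the RUNNING and PENDING states:
$ aws ecs describe-services --cluster default --services <service_name> --region us-east-1 --query "services[].[desiredCount,runningCount,pendingCount]"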
[
   [
      2,
      2,
      0
   ]
]


Capacity Providers



MUST READ / ALWAYS READ THIS: https://aws.amazon.com/blogs/containers/deep-dive-on-amazon-ecs-cluster-auto-scaling/

https://docs.aws.amazon.com/AmazonECS/latest/developerguide/cluster-capacity-providers.html

WHAT IS A CAPACITY PROVIDER?
[+] Capacity providers are used to manage the infrastructure that the tasks in your clusters use.

[+] Per Gilbert "Service auto scaling is responsible for scaling additional tasks when CPU utilization is 60%, the capacity provider simply scales the cluster in relation to the "desired tasks" number."

[+] A capacity provider is associated with a cluster and is used in a capacity provider strategy to determine the infrastructure that a task runs on.

[+] For Amazon ECS on Amazon EC2 users, a capacity provider consists of a capacity provider name, an Auto Scaling group, and the settings for managed scaling and managed termination protection. With managed scaling, Amazon ECS manages the scale-in and scale-out actions of the Auto Scaling group, which provides auto scaling for your cluster's infrastructure.

AUTO SCALING GROUP CAPACITY PROVIDERS:
[+] Amazon ECS capacity providers can use Auto Scaling groups to manage the Amazon EC2 instances registered to their clusters. You can use the managed scaling feature to have Amazon ECS manage the scale-in and scale-out actions of the Auto Scaling group or you can manage the scaling actions yourself.

AUTO SCALING GROUP CAPACITY PROVIDERS CONSIDERATIONS
[+] It is recommended that you create a new empty Auto Scaling group to use with a capacity provider rather than using an existing one. If you use an existing Auto Scaling group, any Amazon EC2 instances associated with the group that were already running and registered to an Amazon ECS cluster prior to the Auto Scaling group being used to create a capacity provider may not be properly registered with the capacity provider. This may cause issues when using the capacity provider in a capacity provider strategy. The DescribeContainerInstances API can confirm whether a container instance is associated with a capacity provider or not.

[+] When using managed termination protection, managed scaling must be enabled otherwise managed termination protection will not work.

[+] When using managed scaling, the Auto Scaling group shouldn't have any scaling policies attached to it other than the ones Amazon ECS creates, otherwise the Amazon ECS created scaling plans will receive an ActiveWithProblems error.

https://docs.aws.amazon.com/AmazonECS/latest/developerguide/asg-capacity-providers-update-capacity-provider.html:

  • For Target capacity %, if managed scaling is enabled, specify an integer between 1 and 100. The target capacity value is used as the target value for the CloudWatch metric used in the Amazon ECS-managed target tracking scaling policy. This target capacity value is matched on a best effort basis. For example, a value of 100 will result in the Amazon EC2 instances in your Auto Scaling group being completely utilized and any instances not running any tasks will be scaled in, but this behavior is not guaranteed at all times.

Other notes from Nitheesha:

  • So 80% utilization/reservation: 80% can be 8 instances, more or less, depending on how much memory and CPU they have left.
  • If you use 100% target capacity, you are telling the capacity provider to use 100% of the available resources and not leave any excess: only launch more instances when required, and utilize the resources to 100%.
  • If you set 50% target capacity, you are instructing the capacity provider to keep the combined CPU/memory reservation at 50%, so that you always have 50% excess available in the ASG.
  • This combined CPU/memory figure is called the capacity provider reservation.
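
The arithmetic behind these notes can be sketched as follows. As far as I understand the deep-dive blog post linked above, the metric is CapacityProviderReservation = M / N x 100, where N is the number of instances currently in the Auto Scaling group and M is the number of instances needed to run all tasks. The function names are made up, integer division truncates, and the special cases the post describes for N = 0 are not handled:

```shell
# Hypothetical helper: CapacityProviderReservation = M / N * 100
#   $1 = M, instances needed to run all tasks
#   $2 = N, instances currently in the Auto Scaling group (must be > 0)
capacity_provider_reservation() {
    needed=$1
    running=$2
    if [ "$running" -le 0 ]; then
        echo "N must be greater than 0 in this sketch" >&2
        return 1
    fi
    echo $(( needed * 100 / running ))
}

# Hypothetical helper: instances the group settles at so that the
# reservation equals the target capacity % (rounded up).
instances_for_target() {
    needed=$1
    target=$2
    echo $(( (needed * 100 + target - 1) / target ))
}

capacity_provider_reservation 5 10   # 5 needed, 10 running
instances_for_target 8 80            # 8 needed at 80% target
instances_for_target 5 50            # 5 needed at 50% target
```

This prints 50, 10 and 10: with a 50% target, 5 needed instances lead to a group of 10, i.e. half of the capacity is kept spare, which matches the note above.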