Autoscaling
One of the original benefits that customers gained by using public cloud computing providers was the ability to scale up server resources to meet large spikes in traffic. There are many stories of startups using AWS to scale up to hundreds, if not thousands, of server instances to meet the demand generated by their wildly successful projects. In addition to scaling server resources, companies paid only for what they used.
When EC2 first launched, customers had to create their own way to add and remove server resources for their applications. This usually involved custom scripts, the use of worker queues, or third-party services.
Eventually AWS saw how critical this feature was to EC2 and created the Autoscaling service. In this lesson we will explore this service to fully understand how it works and how to use it.
How Autoscaling Works
The basic premise is that autoscaling will dynamically manage the number of servers in a cluster depending on a set of pre-established conditions. Often the condition is the utilization of an infrastructure resource, such as CPU utilization or network I/O. But it can also include custom metrics at the OS or application level.
To achieve this, Autoscaling works closely with the CloudWatch service. CloudWatch is a monitoring service that is integrated with many AWS services, including EC2. CloudWatch feeds metric data to Autoscaling, and Autoscaling uses that data to make decisions, such as whether to add servers to a cluster or remove servers from it.
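The decision step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how the Autoscaling service is implemented: the function name `scaling_decision` and both threshold values are assumptions made for the example, and the CPU samples stand in for the metric data CloudWatch would supply.

```python
from statistics import mean


def scaling_decision(cpu_samples, scale_out_above=70.0, scale_in_below=30.0):
    """Return +1 to add a server, -1 to remove one, or 0 to do nothing.

    Hypothetical helper: the thresholds are illustrative defaults,
    not values taken from AWS.
    """
    if not cpu_samples:
        return 0  # no metric data yet, so take no action
    average = mean(cpu_samples)
    if average > scale_out_above:
        return 1
    if average < scale_in_below:
        return -1
    return 0


print(scaling_decision([82.0, 91.5, 77.0]))  # 1  (busy: add a server)
print(scaling_decision([12.0, 18.5, 9.0]))   # -1 (idle: remove a server)
print(scaling_decision([45.0, 55.0, 50.0]))  # 0  (within range: no change)
```

Averaging several samples rather than reacting to a single reading keeps one brief spike from triggering a scaling action.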
Autoscaling is also integrated with the Elastic Load Balancer (ELB). If a server group uses an ELB for traffic distribution, then the ELB needs to know when servers are added to or removed from the group. Autoscaling and ELB work together to keep operations running smoothly.
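To see why this coordination matters, here is a small in-memory model, assuming a simplified load balancer that only tracks a list of registered instance IDs. The class names `LoadBalancer` and `ServerGroup` and the instance ID format are hypothetical and do not correspond to any AWS API.

```python
class LoadBalancer:
    """Toy stand-in for an ELB: it only tracks which instances receive traffic."""

    def __init__(self):
        self.targets = []

    def register(self, instance_id):
        if instance_id not in self.targets:
            self.targets.append(instance_id)

    def deregister(self, instance_id):
        if instance_id in self.targets:
            self.targets.remove(instance_id)


class ServerGroup:
    """Toy stand-in for a scaling group that keeps its load balancer in sync."""

    def __init__(self, load_balancer):
        self.load_balancer = load_balancer
        self.instances = []
        self._next_id = 1

    def launch(self):
        instance_id = f"i-{self._next_id:04d}"  # made-up ID format
        self._next_id += 1
        self.instances.append(instance_id)
        self.load_balancer.register(instance_id)  # new server starts receiving traffic
        return instance_id

    def terminate(self, instance_id):
        # Stop routing traffic first, then remove the server from the group.
        self.load_balancer.deregister(instance_id)
        self.instances.remove(instance_id)


elb = LoadBalancer()
group = ServerGroup(elb)
first = group.launch()
second = group.launch()
group.terminate(first)
print(elb.targets)  # ['i-0002']
```

Deregistering before removal is the design choice worth noting: it ensures the load balancer never routes requests to a server that is already gone.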
Autoscaling Anatomy
Below is a diagram showing an Autoscaling configuration. Take a moment to review the diagram and read through the review below it.
Diagram Review
- Autoscaling Group: An Autoscaling Group is created for each cluster, or group of servers, that you want to dynamically scale.
- EC2 Instances: Autoscaling works solely with EC2 instances. They are at the core of Autoscaling.
- Capacity Settings: Each Autoscaling group requires a min, desired, and max value. The min value sets the absolute lowest number of servers in the group. The desired value is the number of instances you want right now; this value can change if scaling plans are created and their conditions are met. The max value sets the upper limit on the number of servers allowed in the group.
- Scaling Plan: The scaling plan sets the conditions and actions for dynamic scaling. For example, you can create a plan that adds 2 server instances when CPU is > 70% utilized.
- Launch Configuration: The Launch Configuration defines the settings of the EC2 instances launched in the Autoscaling Group. This includes the AMI, instance type, security group, storage size, and key pair.
- Elastic Load Balancer: Autoscaling Groups can be configured to work with ELB. You must explicitly specify which load balancer the group should use.
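The way the capacity settings and a scaling plan interact can be sketched as follows. This is a minimal model under stated assumptions, not the AWS API: the class `AutoscalingGroup`, the function `apply_scaling_plan`, and the starting values of 2, 3, and 6 are all hypothetical. The plan itself mirrors the example from the list: add 2 instances when CPU utilization is above 70%.

```python
from dataclasses import dataclass


@dataclass
class AutoscalingGroup:
    """Hypothetical model of a group's min, desired, and max capacity."""

    min_size: int
    desired: int
    max_size: int

    def __post_init__(self):
        if not (self.min_size <= self.desired <= self.max_size):
            raise ValueError("expected min_size <= desired <= max_size")

    def adjust(self, change):
        """Apply a scaling action while keeping desired within [min_size, max_size]."""
        self.desired = max(self.min_size, min(self.max_size, self.desired + change))
        return self.desired


def apply_scaling_plan(group, cpu_percent, threshold=70.0, step=2):
    """Add `step` instances when CPU utilization exceeds `threshold`."""
    if cpu_percent > threshold:
        return group.adjust(step)
    return group.desired


group = AutoscalingGroup(min_size=2, desired=3, max_size=6)
print(apply_scaling_plan(group, 85.0))  # 5
print(apply_scaling_plan(group, 85.0))  # 6 (capped at max_size)
print(apply_scaling_plan(group, 40.0))  # 6 (condition not met, no change)
```

The second call shows the role of the max value: the plan asks for 7 instances, but the group stops at 6.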
Setting up Autoscaling
Setting up Autoscaling is a multi-step process. Here is a summary of those steps with links to how to perform them.