AWS EC2 Scaling gotchas no one told you

Here are some things I wish someone had told me before I needed to manage over 5 million connections a day, averaging 35k concurrent (Melbourne Cup). If that sounds like a lot, it is: the Commonwealth Bank had fewer than 1 million hits a day this month, and apart from me there was only one other tech person in my little startup team to deal with the complexities of scaling.

AWS can look overwhelming. There are now over 40 different AWS services, and many concepts to think about when building your app. AWS provides excellent documentation on each of its services, helpful guides and white papers, reference architectures for common apps, and even a program for startups that offers one month free to speak with AWS Cloud Support Engineers.

My target audience is the developer who has started using AWS with scalability in mind, or perhaps one who has used AWS for a long time for some small app but is looking to scale it now.

AWS involves a paradigm shift in thinking

When moving from physical servers to the "cloud", it is not always obvious what needs to be done. Architecting your service for scalability is less obvious still.

Scalability is not so much about "what" you are using as "how" you use it

It's one thing to architect a stack to be ready for scale, and another to build an application that is scalable within that stack. You've probably chosen all of the correct services already and configured them sensibly for high availability and fault tolerance, but your apps seem to underperform, or the AWS bill seems disproportionately high compared with what you expected.

It's time to analyse where you went wrong.

Application Development

You've handled the obvious things: storing no application state on your ephemeral servers, centralising logs, disabling SSH access to all production servers to focus on CI/CD, avoiding EIPs for auto-scaled instances, scaling horizontally and granularly, and going Multi-AZ.

Now let's talk about some of the concerns that are not in the mainstream.

Production code release to an auto-scaled stack

There are 2 main methods to do this without taking your autoscale group offline, and which one you choose comes down to a single consideration:

  • Are your AMIs created with the code on them, ready to run immediately after provisioning?
  • OR: is your AMI simply the environment, pulling in the latest code when it is provisioned?

With this answered, you will need to use the corresponding method:

  • Build the new AMI with the new code, update the autoscale group configuration with the new AMI ID (CloudFormation template), and launch it into the autoscale group by detaching an existing instance that it replaces. This is called cycling your autoscale group.
  • Using Git, synchronise with all servers: git pull the repo files into a temp directory, and when your files are ready to run, link them to the working directory being served. Doing this allows any running scripts to complete in memory on the old files, while any new ones use the new code.
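The "link them to your working directory" step of the second method can be sketched as an atomic symlink swap. This is a minimal sketch and not necessarily the exact mechanism the author used: it assumes a POSIX filesystem, that the web server's document root is a symlink, and that each release was already pulled into its own directory. The function name `activate_release` and the paths are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def activate_release(release_dir: Path, live_link: Path) -> None:
    """Repoint live_link at release_dir in a single rename step."""
    # Build the new symlink under a temporary name next to the live link.
    tmp_link = live_link.with_name(live_link.name + ".tmp")
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()
    os.symlink(release_dir, tmp_link, target_is_directory=True)
    # On POSIX, rename replaces the old link atomically, so a request
    # resolves either the old release or the new one, never a half state.
    os.replace(tmp_link, live_link)


# Usage in a throwaway directory (hypothetical layout: releases/v1, releases/v2).
with tempfile.TemporaryDirectory() as root:
    base = Path(root)
    for version in ("v1", "v2"):
        (base / "releases" / version).mkdir(parents=True)
        (base / "releases" / version / "index.php").write_text(version)
    current = base / "current"
    activate_release(base / "releases" / "v1", current)
    activate_release(base / "releases" / "v2", current)
    print((current / "index.php").read_text())  # prints "v2"
```

Keeping the previous release directory on disk means a rollback is just another call to `activate_release` with the older path.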

Gotchas for the first option: it requires you to use connection draining on the ELB. Failing to do this terminates open customer connections abruptly, giving the appearance of being offline. If you use a cache in front of your instances (and you should), you run the risk of an error page being cached for a long time, which to users looks like an outage.
Another drawback of the first option is having to cycle servers, leaving some servers on old code whilst new ones are introduced to the stack. This can produce unpredictable symptoms, often leading to unreproducible bug reports.

The second option is by far what you should aim for, but it is the less traditional of the two and seems risky until you understand how inodes are handled: a request that has already opened the old files keeps reading them, while new requests resolve the link to the new files. The switch is synchronised across servers, so there are no consistency issues or error pages to worry about, and no ELB connection draining is needed. This method depends on your server-side language serving each request in its own process. Apache/Nginx in front of a language like PHP is perfect, but if your server-side language handles all requests through the main process (Ruby, Node.js), this option is off the table for you entirely.

Granular Auto scaling rules

We all want to be granular, and if we are on AWS we are not running our stacks at full capacity 24/7, or we'd have chosen cheaper dedicated hardware. No, in AWS we know we have large periods of low activity. Sometimes we may encounter a scale-up followed by an instant scale-down event. This symptom is consistently due to poorly considered scale policies.

Here is a simple policy based on CPU:

  • Add 1 instance when Average CPU > 60% for 3 consecutive periods of 5 minutes.
  • Remove 1 instance when Average CPU < 60% for 3 consecutive periods of 5 minutes.

[Figure: EC2 scale policy data]

The above is an example where server load steadily increased, and yet we encountered a scale-up followed by a scale-down and then an out-of-CPU-capacity event.
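The flapping can be reproduced with a toy simulation of the 60%/60% policy. This is only a sketch under simplifying assumptions that are mine, not the author's: load is spread evenly across instances, each step stands in for one full evaluation window (3 periods of 5 minutes), and load is expressed in units where 100 means one fully busy instance. The function name and the load figures are made up for illustration.

```python
def simulate(loads, instances, scale_out_above=60.0, scale_in_below=60.0,
             min_instances=1):
    """Return the instance count after each evaluation window."""
    history = []
    for load in loads:
        avg_cpu = load / instances  # average CPU % across the group
        if avg_cpu > scale_out_above:
            instances += 1
        elif avg_cpu < scale_in_below and instances > min_instances:
            instances -= 1
        history.append(instances)
    return history


# Total load rises steadily, yet the group keeps bouncing between 2 and 3.
steady_growth = [125, 130, 135, 140, 145, 150]
print(simulate(steady_growth, instances=2))  # -> [3, 2, 3, 2, 3, 2]
```

Adding an instance drops the average below 60%, which immediately satisfies the removal rule, so the group ends each cycle back at two instances while the load on them keeps climbing.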

This is a very common example, and it is always caused by reliance on the base EC2 CPU metric. Don't fall into the trap: scale your stack based on metrics that make sense for your business case, report these metrics to CloudWatch, and create one (and only one) scale policy each for scale-up and scale-down events.
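Reporting a business metric to CloudWatch might look roughly like the sketch below, which uses boto3's `put_metric_data` call. The metric name `ConcurrentConnections`, the namespace `MyApp/Web`, and the helper names are hypothetical, and the publish step assumes boto3 is installed and AWS credentials are configured on the host.

```python
from datetime import datetime, timezone


def build_metric_datum(name: str, value: float, unit: str = "Count") -> dict:
    """Build one entry for the MetricData list of a PutMetricData request."""
    return {
        "MetricName": name,
        "Timestamp": datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": unit,
    }


def publish(namespace: str, datum: dict) -> None:
    """Send the datum to CloudWatch (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without it

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_data(Namespace=namespace, MetricData=[datum])


# Hypothetical metric: current open connections, as counted by your app.
datum = build_metric_datum("ConcurrentConnections", 35000)
# publish("MyApp/Web", datum)  # uncomment on a host with AWS credentials
```

A CloudWatch alarm on that custom metric can then drive the scale policy instead of the base CPU metric.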

Create your own predictive algorithm

If you finally decide that CPU is all you need, might I suggest you compose your own CPU metric, using a basic algorithm to predict the state of the stack under its current load: what would CPU look like on the remaining instances if you were to reduce capacity? Use this new CPU-based metric to decide whether a scale event will have a positive outcome.
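One possible form of such a metric is sketched below. It is an illustration rather than the author's algorithm, and it assumes the total work stays constant and is spread evenly, so removing instances concentrates the same load on fewer machines. The function names and the 60% threshold are illustrative.

```python
def projected_cpu_after_scale_in(avg_cpu: float, instances: int,
                                 remove: int = 1) -> float:
    """Estimate average CPU % on the remaining instances after a scale-in."""
    remaining = instances - remove
    if remaining < 1:
        raise ValueError("at least one instance must remain")
    # Same total load (avg_cpu * instances), divided among fewer instances.
    return avg_cpu * instances / remaining


def safe_to_scale_in(avg_cpu: float, instances: int,
                     scale_out_threshold: float = 60.0) -> bool:
    """Scale in only if the projected CPU would not trigger a scale-out."""
    return projected_cpu_after_scale_in(avg_cpu, instances) < scale_out_threshold


print(projected_cpu_after_scale_in(43.0, 3))  # -> 64.5
print(safe_to_scale_in(43.0, 3))              # -> False
print(safe_to_scale_in(30.0, 4))              # -> True
```

With 3 instances at 43%, a plain "below 60%" rule would remove one and push the remaining two to roughly 64.5%, straight back over the scale-up threshold; the projected metric blocks that removal.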


Please leave comments below or reach me on Twitter.

I will be adding more to this article in the coming days.