
Private AWS S3 - How hard could that be?

Applying private routing to AWS management APIs is hard. AWS S3 has received some poor press coverage, but to Amazon's credit it has always required authentication by default and has never itself leaked customer data.

Before I begin I’d first like to clarify a few things:

  • I have used AWS for almost a decade, since 2011, yes before the Sydney region release (good times)
  • I will continue to use AWS; personally and professionally
  • Yes, I will continue to recommend S3 and AWS as a whole; knowing is not hating.
  • I value truth, secrecy is harmful

Critical thinking is virtuous. Without it, we have no intelligence or innovation.
Being critical is a positive, not a negative; the real negative in technical conversation is dogmatic or ignorant opinion.

You think you know the S3 history, right?

You're probably wrong about S3's past and current insecurities.

For a long time it was simply not possible to make S3 private, yet S3 has always been secured from the public by default.
Data leaks have always been the result of users removing authentication, making every object stored in the S3 bucket accessible to anyone.

This was initially an ACL issue, and Amazon addressed it by introducing S3 resource policies. Since their release, these policies have been continuously updated, providing new and improved options to help secure the data they govern.
More recent changes to the Console UI now tell users, in the most obvious way, that they have made their data Public. Combined with scanning tools like AWS Config that inspect your accounts for risky configurations, users can now identify leaky buckets, perhaps unintentionally made public, at the time they are created.
These additional services and improvements are not AWS fixing anything wrong with S3; it is quite bluntly AWS screaming out to customers "Hey, you have made this public yo! Fix it?"

S3 was always subject to authentication by default.

We all know that everything in AWS is an API, right? Read this post to better understand why this is an important security consideration.

The Query Request API is public; all AWS service endpoints are Public.
Meaning the entire management plane is inherently Public facing, by design, but every request is subject to authentication checks. Some other services, like Elasticsearch and Memcached for ElastiCache to name a couple, expose their data plane publicly as well, but not S3.
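Because the management plane is public, every request must carry a valid signature. The Signature Version 4 signing-key derivation is a documented HMAC-SHA256 chain, which can be sketched with the standard library alone (the credential, date, and region values below are placeholders):

```python
import hashlib
import hmac

def derive_signing_key(secret_key: str, date: str, region: str, service: str) -> bytes:
    """Derive the AWS Signature Version 4 signing key (the documented HMAC chain)."""
    def sign(key: bytes, msg: str) -> bytes:
        return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

    k_date = sign(("AWS4" + secret_key).encode("utf-8"), date)  # e.g. "20200101"
    k_region = sign(k_date, region)                             # e.g. "ap-southeast-2"
    k_service = sign(k_region, service)                         # e.g. "s3"
    return sign(k_service, "aws4_request")

# Placeholder secret; a real client then signs the canonical request with this key.
key = derive_signing_key("EXAMPLESECRETKEY", "20200101", "ap-southeast-2", "s3")
```

An unsigned or badly signed call to any endpoint is rejected before it touches your data, which is why "public endpoint" does not mean "public data".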

Can AWS S3 be made Private?

First what does it mean to be private?

  • Confidentiality: data we transmit cannot be intercepted and read without authorisation
  • Internal Address: unless customer configuration specifies otherwise, the IP resolved for the endpoint is internal to AWS when the caller is inside the same AWS account, and transmissions do not leave our network
  • Default Protections: unauthenticated external actors cannot identify its existence, i.e. it is not Public; no public endpoint URLs appear in public indexes

So, can AWS meet these? The answer is simply: not by default.
Is it easy or straightforward to configure AWS to be private? No. There are too many resource-level configurations with different names depending on the service context, and, more confusing still, various client configurations are needed for specific scenarios that are actually some of the most common use cases. That is not even the most frustrating thing about private endpoints in AWS: even if you know how to configure the AWS resource and your client, the Public route is still the default and it remains active!
Did I say that was the last problem? No, no, no; we still haven't discussed troubleshooting all of the mess above, have we? If the caller service is EC2, i.e. it allows you to run traceroute or similar, you're going to have no trouble. But if the caller service is highly abstracted, like Fargate, you might try collecting VPC Flow Logs or mirroring network packets with the newer VPC Traffic Mirroring (both cost you extra). If you are advanced enough to analyse this data, you might be able to observe whether the private route is being utilised.

So what are all of the private options?

  • AWS PrivateLink: creates an Elastic Network Interface (ENI)
  • Interface VPC Endpoint: provides connection to AWS PrivateLink
  • Gateway VPC Endpoint: configures VPC route table to supported AWS Resources
  • AWS Direct Connect: physical network backbone connectivity from a non-AWS data centre to an availability zone or region
  • AWS Site-to-Site VPN: an encrypted tunnel over the public internet that terminates in your VPC and can be configured to resolve VPC private routes from the origin end of the tunnel

Since 2015, S3 has used the Gateway VPC Endpoint method, which in turn can be utilised with VPN and Direct Connect. You can use the describe-vpc-endpoint-services command to get a list of other available services.

More often than not, private routing is configured but is never utilised in practice. It just sits there looking good for your audit while public routes are still being used.

See the confusing troubleshooting guide published by AWS, "Why can't I connect to an S3 bucket using a gateway VPC endpoint?" Yes, it states "Outbound rules with Destination as the public IPs used by Amazon S3" for your NACL.

Even AWS can't effectively help customers make Gateway Endpoint private routes work; their instructions tell customers to route traffic publicly. They instruct you to open your firewall rules to the public IP addresses to get the Gateway Endpoint private route to work, effectively bypassing the Gateway Endpoint. When you follow AWS guidance and fix the routing issue, you did not get the private route you wanted: your Gateway Endpoint remains idle, unused, redundant.

Configuring a Gateway VPC Endpoint

There are of course prerequisites and constraints, which I'll raise as they relate to specific steps.

Using the Console is the common (but not recommended) way to use AWS. I encourage the use of Infrastructure as Code tools such as CloudFormation, the AWS CDK, and Terraform by HashiCorp.

1. From the VPC Dashboard, select Endpoints, then Create Endpoint. You'll have to choose the VPC and the S3 service

The endpoint needs to be created in the same region as the S3 bucket

2. Specify the optional Custom access policy to define your own using the following JSON as a base, adjust to your needs

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "NotPrincipal": {
        "AWS": "1234567890"
      },
      "Action": "*",
      "Effect": "Deny",
      "Resource": [
        "arn:aws:s3:::targetbucket",
        "arn:aws:s3:::targetbucket/*"
      ]
    }
  ]
}
Important: The default access policy allows all users, all services, and all AWS accounts

3. Gateway VPC Endpoints configure your route tables, and as such can provide routes to S3 limited to certain subnets. If you're doing this you probably only want to choose a subnet that doesn't have a NAT Gateway. Save that and take note of the vpce-xxxx identifier.

4. This next step is necessary to ensure the default public route to the S3 bucket is not utilised, even when you think you are using the private route.

Caution: Enforcing this will likely break things; in particular, things you thought were using the private route will stop working too

Now go to S3, choose the target bucket, choose Bucket Policy, use the following JSON as a base, adjust as needed.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Principal": "*",
      "Action": "*",
      "Effect": "Deny",
      "Resource": [
        "arn:aws:s3:::targetbucket",
        "arn:aws:s3:::targetbucket/*"
      ],
      "Condition": {
        "StringNotEquals": {
          "aws:sourceVpce": "vpce-xxxx"
        }
      }
    }
  ]
}
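If you manage this with Infrastructure as Code rather than the Console, the same Deny-unless-via-endpoint policy can be templated. A minimal sketch (the bucket name and vpce-xxxx identifier are the placeholders from above):

```python
import json

def vpce_only_policy(bucket: str, vpce_id: str) -> str:
    """Render a bucket policy denying any access that does not arrive via the endpoint."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Principal": "*",
                "Action": "*",
                "Effect": "Deny",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                "Condition": {"StringNotEquals": {"aws:sourceVpce": vpce_id}},
            }
        ],
    }
    return json.dumps(policy, indent=2)

print(vpce_only_policy("targetbucket", "vpce-xxxx"))
```

Generating the document in code keeps the endpoint ID in one place, so rotating the endpoint later means changing a single variable rather than hand-editing JSON in the Console.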

5. Configure the clients. For demonstration purposes I chose the lowest common service, EC2, which other services sit on top of.

Only IPv4 is supported when using a Gateway VPC Endpoint

Check the following:

  • EC2 instance is in the correct subnet (and therefore the right VPC and region)
  • Configure the default AWS region for your tools or SDK. If you don't use profiles, you can pass the --region option with each command instead.
  • Verify IPv6 is not used: aws configure get default.s3.use_dualstack_endpoint should be false (also check profiles if they are used). When dualstack is false, communication stays on IPv4.
  • If you're using your own DNS server, be sure DNS requests to AWS services resolve to AWS-maintained IP addresses.
  • Check that the outbound rules of your Security Groups allow traffic to the prefix list ID associated with the gateway VPC endpoint.
Important: DNS resolution must be enabled in your VPC

Now you have a private route from EC2 to your S3 bucket. Easy right?
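One cheap sanity check, assuming you have fetched the endpoint's prefix list CIDRs (for example with aws ec2 describe-prefix-lists; the CIDRs below are made-up examples), is to test whether the IP your client resolves for S3 actually falls inside them. If it doesn't, the route-table entry for the gateway endpoint can never match:

```python
import ipaddress

# Example CIDRs only; fetch the real ones for com.amazonaws.<region>.s3
# with: aws ec2 describe-prefix-lists
S3_PREFIX_CIDRS = ["52.95.128.0/21", "54.231.248.0/22"]

def uses_gateway_route(ip: str, cidrs=S3_PREFIX_CIDRS) -> bool:
    """True when the resolved S3 address is covered by the endpoint prefix list."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(c) for c in cidrs)

# On the instance, resolve the bucket endpoint first, e.g.:
#   socket.gethostbyname("targetbucket.s3.ap-southeast-2.amazonaws.com")
# then feed the result to uses_gateway_route().
```

This doesn't prove packets took the private route, but it cheaply rules out the common failure mode where DNS hands back an address the endpoint route can never cover.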

Apply better security options

Focusing on the data-transmission topic of this article, you might also want to consider a few additional configuration changes to improve the public S3 endpoint. Private communication can also benefit if your threat model includes any assumed-breach vectors, or your risk posture demands defence in depth.

  1. Force HMAC-SHA256: Specifying the Signature Version in Request Authentication using aws configure set default.s3.signature_version s3v4 ensures Signature Version 2, which relies on the deprecated SHA-1 hashing algorithm, is never used.
  2. Force Forward Secrecy: Many SDKs can't do this, including the Python SDK (botocore and Boto3), which the AWS command line tools use and which is built into all AWS compute services based on Amazon Linux 1/2 AMIs (so almost all of them).
    To get around this, I have developed an optional extension to various libraries which introduces a feature flag to support Forward Secrecy. I submitted this change to AWS via PR, but it is yet to be merged. Hopefully AWS developers see this as an important feature and accept it for all to use.
    If, however, you are using a tool or SDK that supports custom ciphers, you might try specifying the following Forward Secrecy cipher suite: kEECDH:kEDH:!aNULL:!eNULL:!DES:!3DES:!RC4

While AWS offers Forward Secrecy, it is near impossible to enforce in most places. I only know how to do it when manually interacting with the Query Request API using an HTTP client, not with any SDK or command line tool.
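For tools that do let you set an OpenSSL cipher string, here is what applying the suite above looks like with Python's standard ssl module. This is a sketch of the client-side restriction for a hand-rolled HTTP client, not an AWS SDK feature:

```python
import ssl

def forward_secrecy_context() -> ssl.SSLContext:
    """Build a TLS context restricted to the forward-secrecy cipher string above."""
    ctx = ssl.create_default_context()
    # Only ephemeral (EC)DH key exchange; excludes NULL, DES, 3DES and RC4.
    ctx.set_ciphers("kEECDH:kEDH:!aNULL:!eNULL:!DES:!3DES:!RC4")
    return ctx

ctx = forward_secrecy_context()
enabled = [cipher["name"] for cipher in ctx.get_ciphers()]
```

Passing such a context to http.client.HTTPSConnection (or any client that accepts an SSLContext) enforces the restriction for every request it makes, which is exactly the knob the bundled SDKs don't expose.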