May 12, 2018 6 min read DevOps

Software Engineer's guide to AWS Solution Architecture

So you're a developer or operations engineer (or both, DevOps) and work in a small team that either has no access to a Solution Architect, or has one but you are still expected to re-architect solutions for security and reliability.

Get to know the AWS Well-Architected Framework and the new Operational Excellence Pillar white papers well. Review each of your apps individually against the framework, and don't worry if you find serious issues: it is well understood within AWS that even the AWS Solution Architects 'always' find critical problems in every workplace they visit to review.

In your architecture diagrams, be sure to include the critical infrastructure components that are consistently forgotten, such as:

How AWS Accounts connect

Define these properly, show each account ID visually, and show not just VPC but also the availability zones and their subnets and endpoints.

If an RDS instance is multi-AZ, show it as 2 or 3 RDS instances, not one. The same goes for EC2 Auto Scaling groups: show the EC2 instance in each AZ.
For Lambda it helps to know which subnet it deploys to and whether there are VPC endpoints for things like S3, because accessing S3 over the internet is sometimes not ideal.

Host OS agents

AWS CloudWatch Logs is extremely easy to set up and will eliminate the need to SSH into an instance just to read logs. Once a log group is created you can easily set an expiry on it, via a one-time CLI command or in the console, so that archived logs are reliably purged.
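As a minimal sketch of that expiry step, assuming boto3's CloudWatch Logs client and a hypothetical log group name: the helper takes the client as an argument, so the stand-in class below can exercise it without AWS credentials. CloudWatch Logs only accepts a fixed set of retention values; 30 days is one of them.

```python
def set_log_retention(logs_client, log_group_name, days):
    """Set an expiry on a log group so older events are purged automatically.

    `logs_client` is assumed to be a boto3 CloudWatch Logs client, for example
    boto3.client("logs"); any object exposing put_retention_policy works.
    """
    if days <= 0:
        raise ValueError("retention must be a positive number of days")
    logs_client.put_retention_policy(
        logGroupName=log_group_name,
        retentionInDays=days,
    )
    return {"logGroupName": log_group_name, "retentionInDays": days}


class RecordingLogsClient:
    """Stand-in for the real client: records calls instead of calling AWS."""

    def __init__(self):
        self.calls = []

    def put_retention_policy(self, **kwargs):
        self.calls.append(kwargs)


fake_logs = RecordingLogsClient()
# "/app/web/access" is a hypothetical log group name used for illustration.
result = set_log_retention(fake_logs, "/app/web/access", 30)
print(result)
```

With real credentials you would pass boto3.client("logs") instead of the stand-in.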

The SSM agent is another great tool: it allows easy patching and provides compliance reporting alongside AWS Config and Inspector.

And don't forget any 3rd party intrusion prevention or endpoint protection agents such as Trend Micro Deep Security.

Single-Tenant vs Multi-tenant

Make sure you show tenancy aspects of the solution design.
Any ISO (Information Security Officer) or auditor will eventually need to know this from you anyway, so be sure to understand your choices properly in the planning stage rather than resorting to compensating controls much later in the implementation. Those usually come at the highest cost in developer time, when there is little or no time left on the project.

Secrets management

Be sure to identify and visually represent SSM Parameter Store for encrypted secret strings such as tokens used by 3rd party integrations, and Secrets Manager for managing database passwords seamlessly.
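For the Parameter Store half of that, here is a hedged sketch of how application code might read an encrypted string. It assumes a boto3 SSM client is passed in; the parameter name and token value are made up, and the stand-in class only mimics the response shape of get_parameter.

```python
def read_secret(ssm_client, name):
    """Fetch a SecureString parameter and return its decrypted value.

    `ssm_client` is assumed to be boto3.client("ssm") or a compatible object.
    """
    response = ssm_client.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]


class FakeSsmClient:
    """Stand-in that mimics the response shape of get_parameter."""

    def __init__(self, store):
        self._store = store
        self.requests = []

    def get_parameter(self, Name, WithDecryption=False):
        self.requests.append({"Name": Name, "WithDecryption": WithDecryption})
        return {
            "Parameter": {
                "Name": Name,
                "Type": "SecureString",
                "Value": self._store[Name],
            }
        }


# Hypothetical parameter name and value, for illustration only.
fake_ssm = FakeSsmClient({"/myapp/prod/partner-api-token": "example-token"})
token = read_secret(fake_ssm, "/myapp/prod/partner-api-token")
```

The role running this code needs permission to read the parameter and to use the KMS key that encrypts it, which is exactly the kind of dependency worth drawing on the diagram.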

KMS or CloudHSM, and the key management requirements that developers will be required to implement, such as revocation and rotation, should be clearly represented visually and annotated, as these will likely require some additional infrastructure (Lambda) to achieve.

Events and Integration Hooks

As with key management, there are many used and unused event sources to show: CloudWatch Events, S3 lifecycle events, and service-specific events (like Kinesis and AWS Glue).

These events are usually of very high business value, under-utilised, and in most cases extremely easy for a developer to leverage. But if a requirement never makes its way to a developer because a business unit was not aware of the capability, then as an architect you've pretty much failed to identify and produce value from an essentially free resource that has potential for some groundbreaking user experiences and statistical or behavioural data.

Alerts

Monitoring is usually an afterthought.

But alerting capabilities are often never given a first thought.

SNS Topics, PagerDuty, and Raygun.io are all great, but what I am talking about is CloudWatch Rules. Just like some of the events mentioned earlier, these can alert you to actions that CloudWatch Alarms cannot. One of my favorite uses for a CloudWatch Rule is with the Glue Data Catalog, because the Apache Hive metastore it generates is useful for EMR, Athena, Redshift Spectrum, and Glue ETL. With a CloudWatch Rule I am notified when interesting things happen, like a database schema change or a failed crawl of S3, so I can automate a retry or make changes to downstream services such as my Athena SQL or models running in EMR.
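To make the failed-crawl case concrete, here is a rough sketch of an event pattern and a deliberately simplified matcher. The detail-type and state strings are my assumption of what Glue publishes, and the crawler name is hypothetical; check both against a real event before wiring up a rule. The matcher only does exact-value matching, while the real service supports richer matching.

```python
import json

# Event pattern in the style used by CloudWatch Events rules.
# Assumption: Glue emits "Glue Crawler State Change" events with a "state" field.
FAILED_CRAWL_PATTERN = {
    "source": ["aws.glue"],
    "detail-type": ["Glue Crawler State Change"],
    "detail": {"state": ["Failed"]},
}


def matches(pattern, event):
    """Simplified matcher: every pattern key must exist in the event and the
    event value must be one of the listed values; nested dicts recurse."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


# Hypothetical event, trimmed to the fields the pattern inspects.
sample_event = {
    "source": "aws.glue",
    "detail-type": "Glue Crawler State Change",
    "detail": {"crawlerName": "sales-raw-crawler", "state": "Failed"},
}

print(json.dumps(FAILED_CRAWL_PATTERN))
print(matches(FAILED_CRAWL_PATTERN, sample_event))
```

The printed JSON is what you would paste into the rule's event pattern; the rule's target (Lambda, SNS, and so on) then handles the retry or notification.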

Tips by service

AWS Shield

Free service
Protects: ELB, CloudFront, Route 53
- SYN/UDP floods
- Reflection attacks
- Layer 3/4 attacks

Pen Testing

Can only be carried out on certain services:
- EC2, RDS, Aurora, CloudFront, API Gateway, Lambda, Lightsail, DNS Zone Walking.
- Small or micro RDS instances are not permitted.
- m1.small, t1.micro or t2.nano EC2 instances are not permitted

AWS Certificate Manager

Cannot export
Works for any domain you can validate by DNS or email, not only Route53 registered domains
Works with CloudFront and Load Balancers

API Gateway

Always throttles requests (default max 10k per sec)
If the burst limit is exceeded, 429 Too Many Requests is returned
Use TTL caching to mitigate, max 3600 sec
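API Gateway's throttling is documented as a token bucket, so a toy version helps show how the steady rate and the burst limit interact. This is only an illustrative sketch with made-up numbers (2 requests per second, burst of 3), not how the service is implemented; the class and function names are mine.

```python
class TokenBucket:
    """Toy token bucket: `rate_per_sec` refills tokens, `burst` caps them."""

    def __init__(self, rate_per_sec, burst, now=0.0):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)  # start full, so a burst is allowed at once
        self.last = now

    def allow(self, now):
        # Refill in proportion to elapsed time, never above the burst size.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def handle(bucket, now):
    """Return the HTTP status a throttled endpoint would give at time `now`."""
    return 200 if bucket.allow(now) else 429


bucket = TokenBucket(rate_per_sec=2, burst=3)
at_start = [handle(bucket, 0.0) for _ in range(4)]   # four requests at t=0
half_second_later = [handle(bucket, 0.5) for _ in range(2)]
print(at_start, half_second_later)
```

Clients that receive a 429 should back off and retry, and a cached response never reaches the bucket at all, which is why caching helps.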

Systems Manager

Parameter Store can be read from: EC2 (RunCommand), Lambda, CloudFormation, API
RunCommand works on instances or Tags, and executes as root

S3

Replication uses SSL by default, versioning must be enabled at both ends
Object DELETEs are replicated, DELETEs of a specific version are not
Enforce SSL access with a bucket policy using the aws:SecureTransport Condition
Pre-signed URLs default to 1hr expiry, you can pre-sign PutObject requests too
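The SSL enforcement point is easiest to see as an actual policy document. A minimal sketch, assuming a placeholder bucket name: it denies every S3 action when the request did not arrive over TLS.

```python
import json


def deny_insecure_transport_policy(bucket_name):
    """Build a bucket policy that rejects any request not made over SSL/TLS."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


# "example-bucket" is a placeholder name.
policy = deny_insecure_transport_policy("example-bucket")
print(json.dumps(policy, indent=2))
```

An explicit Deny wins over any Allow elsewhere, so this can sit alongside whatever other statements the bucket policy already has.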

CloudTrail

Delivered to S3 about every 5 mins, typically within 15 mins of the API call
Only logs API calls, does not record instance data, SSH or RDP sessions
Logs include API request metadata, identity, time, sourceIP, parameters, and response elements
Use digest files for integrity validation of logs (SHA-256 hashing, with RSA for signing)
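The hashing half of that validation is simple enough to sketch with the standard library. This assumes a digest entry shaped like the per-file records in a digest file (hashAlgorithm and hashValue fields) and, as I understand it, that the hash covers the uncompressed log content. Verifying the RSA signature on the digest file itself is left out here; the AWS CLI can do the full validation for you.

```python
import hashlib
import hmac


def log_file_matches_digest(log_bytes, digest_entry):
    """Compare the SHA-256 of a log file's content with its digest entry.

    `log_bytes` is assumed to be the uncompressed log content, and
    `digest_entry` a dict with "hashAlgorithm" and a hex "hashValue".
    """
    if digest_entry.get("hashAlgorithm") != "SHA-256":
        raise ValueError("unsupported hash algorithm")
    actual = hashlib.sha256(log_bytes).hexdigest()
    # Constant-time comparison, a good habit whenever comparing hashes.
    return hmac.compare_digest(actual, digest_entry["hashValue"].lower())


# Made-up log content; the entry is computed here just so the example runs.
sample_log = b'{"Records": []}'
digest_entry = {
    "hashAlgorithm": "SHA-256",
    "hashValue": hashlib.sha256(sample_log).hexdigest(),
}

intact = log_file_matches_digest(sample_log, digest_entry)
tampered = log_file_matches_digest(sample_log + b" ", digest_entry)
print(intact, tampered)
```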

CloudWatch

Events arrive in near real-time from: resource changes, CloudTrail, schedules, or custom events from code
Rules match incoming Events, Rule Targets can be Lambda, SNS, SQS, Kinesis

EC2

Dedicated instances are locked to an account, and may share hardware with non-dedicated instances from the same account
AWS Staff have access to Hosts and Hypervisors, they cannot access guest operating systems
RAM and storage are securely scrubbed before delivery to customers

AWS Config

Resource inventory, configuration history, with change notification
Trigger periodic, or filtered snapshots

AWS Inspector

Needs the agent installed on the EC2 assessment targets
Create assessment templates to run and verify rules against findings
Detects common vulnerabilities, checks CIS benchmarks, analyses runtime behaviour of network, file, and process, and advises on remediation

Trusted Advisor

Cost optimisation, performance, availability, and limited security
Requires business/enterprise support for unlocked features

CloudHSM

Only CloudHSM (full control of keys) meets FIPS 140-2 Level 3, KMS does not.
CloudHSM offers asymmetric encryption, KMS only has symmetric available.
CloudHSM is single tenanted, KMS is multi-tenanted.
4 main user types for keys:
- PRECO – Precrypto officer
  - User and password management
- PCO or CO – Crypto officer
  - All Key management, key access issuing, material creation, signing, verifying, chaining
- CU – Crypto User
  - Basic key management (create, rotate), allowed key exporting, signing, verifying
- AU – Appliance User
  - Can perform cloning and sync operations in a cluster

KMS

KMS is region specific
KMS cannot be used for EC2 key pairs: these are asymmetric, and letting AWS generate them would give AWS access to the EC2 instances
KMS integrates with EBS, S3, Redshift, Elastic Transcoder, WorkMail, RDS, and more
- Keys include: alias, creation date, description, state, material
- Cannot be exported
AWS-managed CMK have no material
Customer-managed CMK need symmetric 256-bit key material provided
- You can also build extra resiliency by storing the key yourself outside of AWS
- Avoids 7-30day wait when deleting keys
- No automatic key rotation

AWS Shield

On by default, $3000 a month for advanced options
- Incident response team
- in-depth reporting
- payment reductions when victim of an attack

AWS WAF

Integrates regionally with load balancers, or associate it with a CloudFront distribution for global coverage
IPv6 is supported, and CIDR blocks /8, /16, /24, /32
Allow or Block all except specified
Count requests based on properties matched
Properties: IP, Country, header values, body length, SQL with known exploits, scripts with known XSS

AWS Marketplace

AWS Partners and Authorised appliances may be used to conduct testing
Firewalls, Hardened OS, WAF, Antivirus, Security monitoring, etc
CIS Benchmarked OS

VPC

Ensure default route table has no public route out to the internet so new subnets that are associated by default are secure by default (AWS default route tables are not secure by default)
- The same applies to the NACL allowing all inbound and outbound traffic
NAT instances must be behind a security group and in a public subnet, need source/destination checks disabled, are not patched for you, are not scalable, and are single AZ and single subnet.
- Prefer NAT Gateways that are patched by AWS, scalable to 10Gbps, no security groups associated, no need to disable source/destination checks, apply to a route table, and get a public IP by default
ALBs need at least 2 public subnets
Flow logs cannot be tagged, and the IAM role or configuration cannot be edited
VPC Peered Flow logs only work when both VPCs are in the same account
Flow logs do not monitor DHCP, DNS, instance metadata address 169.254.169.254, Windows activation, or the AWS reserved IP addresses
VPC Endpoints are either:
- Interface: single ENI
- Gateways: more durable (not single ENI)
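The first tip in this list, keeping the default (main) route table free of a public route, can be checked mechanically. Here is a sketch that works on dicts shaped like the entries EC2's describe_route_tables returns; the IDs in the sample data are made up, and in practice you would feed it the RouteTables list from a real response.

```python
def has_public_default_route(route_table):
    """True if the table sends 0.0.0.0/0 or ::/0 to an internet gateway."""
    for route in route_table.get("Routes", []):
        destination = route.get("DestinationCidrBlock") or route.get(
            "DestinationIpv6CidrBlock"
        )
        gateway = route.get("GatewayId") or ""
        if destination in ("0.0.0.0/0", "::/0") and gateway.startswith("igw-"):
            return True
    return False


def is_main_table(route_table):
    """The main table is the one new subnets are implicitly associated with."""
    return any(a.get("Main") for a in route_table.get("Associations", []))


# Made-up sample data in the shape of the RouteTables list.
tables = [
    {
        "RouteTableId": "rtb-main-public",
        "Associations": [{"Main": True}],
        "Routes": [
            {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
            {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0abc"},
        ],
    },
    {
        "RouteTableId": "rtb-private",
        "Associations": [{"Main": False, "SubnetId": "subnet-1"}],
        "Routes": [{"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0def"}],
    },
]

risky = [
    t["RouteTableId"]
    for t in tables
    if is_main_table(t) and has_public_default_route(t)
]
print(risky)
```

Anything that ends up in the risky list means a newly created subnet would be public by default until someone explicitly associates it with another table.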

General Security

CIA: confidentiality, integrity, availability
AAA: authentication, authorization, accounting
Non-repudiation = cannot deny a fact
The Shared Responsibility model changes per service level (infrastructure, container, abstracted):
- Infrastructure (EC2, VPC, EBS)
- Container (RDS, EMR)
- Abstracted (S3, DynamoDB, SQS)
ECDHE for Perfect Forward Secrecy; Elliptic Curve DHE (Diffie-Hellman Ephemeral) key exchange.
