A look into the suitability of AWS KMS and CloudHSM for use with workloads in-scope of PCI DSS.
Who owns my encryption key in AWS?
By ownership we might mean who has the potential to access the key and decrypt the data. I like to think of ownership as being based on who creates the keys used for encryption.
Others may consider an owner to be anyone with the ability to create and revoke keys on demand, but for me the ability to decrypt the data is far more synonymous with ownership.
Ownership can mean different things depending on your audit requirements, or the frameworks and standards you implement.
If we look at ownership in AWS, Amazon staff can technically access your systems. We have enough evidence to trust that AWS has done all it can to ensure that staff will not access our operating systems and data.
That evidence is provided in the form of third-party attestations in AWS Artifact.
Important: get familiar with how AWS decided to use the term CMK in their documentation
I will delineate between "Customer-managed CMK" and "AWS-managed CMK" in this article. In my opinion the latter is an oxymoron, given that "AWS" managed "Customer" master keys are not distinct in any way that relates to a "customer". Only when a customer provides key material to a CMK is there any change to ownership, yet both scenarios, where you do and do not provide key material, are still called a "Customer-managed CMK".
Are you confused? I was, and most people are
Don't worry, most people aren't aware there is even an option for a customer to provide key material to a "Customer-managed CMK", let alone any of the differences between customer or AWS managed CMKs.
What do I need for PCI DSS?
Before we look at suitability, just what are the PCI DSS requirements that affect these services directly?
In the current PCI DSS version 3.2.1 there is one particular requirement group;
- 3.5 Protect cryptographic keys used for encryption of cardholder data against both disclosure and misuse
Specifically in 3.5 there are these two important requirements;
- 3.5.2 restrict access to cryptographic keys to the fewest number of custodians necessary
- 3.5.4 store cryptographic keys in the fewest possible locations
Several AWS services must meet these requirements, including CodeCommit, EC2, KMS, CloudHSM, and ACM.
It is important to understand this does not apply to public keys in a key pair scenario, like RSA used with SSH, as they are intended for public use and can only encrypt, never decrypt.
For other PCI requirements related to these services, see the Automatic deficiencies section for a detailed analysis of PCI requirements related to other AWS services used when KMS and CloudHSM are in scope of PCI.
The AWS Artifact collateral will inform you these requirements are the customer, or shared, responsibility.
Looking deeper into PCI for what type of cryptography we should use, requirement 3.5.3 says to store secret and private keys used to encrypt/decrypt cardholder data in one or more of the following forms at all times;
- Encrypted with a key-encrypting key that is at least as strong as the data encrypting key, and that is stored separately from the data-encrypting key
- Within a secure cryptographic device, such as a hardware (host) security module (HSM) or PTS-approved point-of-interaction device
- As at least two full-length key components or key shares, in accordance with an industry accepted method
Firstly, the statement "stored separately from the data-encrypting key", referring to a "key-encrypting key" (KEK), must also be read together with requirement 3.6.6 because;
- the data-encrypting key (DEK) provided by KMS includes a clear-text copy of the key
- the KEK, simply put, is a CMK, and for ownership we will be creating the CMK with our own key material. If that material is not generated via CloudHSM it is generated outside of AWS
Therefore having KMS responsible for both DEK and KEK, both concerned with clear-text, does not satisfy this requirement.
Next, looking at the PTS-approved point-of-interaction device list by company name, there are no AWS offerings at the time of writing, leaving the HSM as a prescriptive requirement.
Lastly, "industry accepted method" and "full-length" are not prescriptive, and not quantifiable, so refer to your QSA to clarify the requirement at the time of the audit.
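To make the third form more concrete, here is a minimal sketch of splitting a 256-bit key into two full-length components using XOR, so that neither component on its own reveals anything about the key. Whether XOR splitting counts as an "industry accepted method" for your audit is an assumption to confirm with your QSA, and the function names `split_key` and `combine_key` are hypothetical.

```python
import secrets
from typing import Tuple


def split_key(key: bytes) -> Tuple[bytes, bytes]:
    """Split a key into two components, each as long as the key itself."""
    # The first component is pure randomness of the same length as the key.
    component_a = secrets.token_bytes(len(key))
    # The second component is the key XORed with the first component.
    component_b = bytes(k ^ a for k, a in zip(key, component_a))
    return component_a, component_b


def combine_key(component_a: bytes, component_b: bytes) -> bytes:
    """Recover the key; this needs both components (dual control)."""
    if len(component_a) != len(component_b):
        raise ValueError("components must be the same length")
    return bytes(a ^ b for a, b in zip(component_a, component_b))


# Example: a 256-bit key handed to two separate custodians.
key = secrets.token_bytes(32)
custodian_one, custodian_two = split_key(key)
assert combine_key(custodian_one, custodian_two) == key
```

Each custodian would hold one component, which is also the shape of the split knowledge and dual control that requirement 3.6.6 asks for in manual key operations.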
Only CloudHSM provides full control of keys.
Not PCI requirements, however still interesting;
- CloudHSM meets FIPS 140-2 Level 3, and KMS also meets this now.
- CloudHSM offers asymmetric encryption, KMS operates symmetric encryption
- CloudHSM is single tenanted, KMS is multi-tenanted
- CloudHSM is limited to a single VPC, therefore requires a VPC and applications must be able to route to the IP of all HSMs in your cluster.
- The control plane of CloudHSM (the service) is the AWS Query API (public internet), with no VPC Endpoint available, whereas KMS now offers a VPC Endpoint.
While KMS is entirely suitable overall, it requires far more effort to achieve a state of compliance.
When providing key material to a "Customer-managed CMK", KMS requires you to apply additional workstation security for managing key material generation.
Employee physical and logical access comes into scope, and device hardware with wireless interfaces or radios, as described in requirement 9, can also be a challenge.
The largest concern for you to address is whether or not you will use a Customer-managed CMK with provided key material that is not generated by CloudHSM.
Generating key material outside of AWS will inherently broaden the scope of PCI to employee workstations.
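For orientation, the import flow that would run on such a workstation can be sketched as below. This is a minimal sketch under stated assumptions, not a hardened procedure: `kms` is assumed to be a boto3 KMS client created elsewhere (for example with `boto3.client("kms")`), the CMK is assumed to already exist with an `EXTERNAL` origin, and `wrap` is a hypothetical callable standing in for whichever RSA OAEP implementation you use. The function name `import_own_key_material` is also hypothetical.

```python
import secrets
from typing import Callable


def import_own_key_material(
    kms,                                   # assumed: a boto3 KMS client
    key_id: str,                           # assumed: CMK created with Origin="EXTERNAL"
    wrap: Callable[[bytes, bytes], bytes]  # hypothetical: (public_key_der, plaintext) -> ciphertext
) -> None:
    # 1. Ask KMS for a wrapping public key and an import token.
    #    KMS lists a SHA-256 variant of RSAES OAEP alongside the SHA-1 one.
    params = kms.get_parameters_for_import(
        KeyId=key_id,
        WrappingAlgorithm="RSAES_OAEP_SHA_256",
        WrappingKeySpec="RSA_2048",
    )

    # 2. Generate 256 bits of key material locally. At this point the
    #    clear-text key exists in this machine's memory, which is why
    #    the workstation is pulled into PCI scope.
    key_material = secrets.token_bytes(32)

    # 3. Wrap the key material with the public key KMS returned.
    wrapped = wrap(params["PublicKey"], key_material)

    # 4. Hand the wrapped material back to KMS together with the token.
    kms.import_key_material(
        KeyId=key_id,
        ImportToken=params["ImportToken"],
        EncryptedKeyMaterial=wrapped,
        ExpirationModel="KEY_MATERIAL_DOES_NOT_EXPIRE",
    )
```

Passing the client and the wrapping function in as arguments keeps the sketch free of third-party imports, and makes it easy to swap in an HSM-backed generator for step 2.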
Note: at the time of writing there are no known Certificate Authorities (CAs) that will integrate with the KMS endpoints to supply key material directly. The majority of CAs do not offer programmatic access to the key material either, which you would need in order to automate the process.
Using any HSM (external to AWS) is an alternative.
To mitigate some concern when broadening the scope outside AWS, you might already apply appropriate governance and process to manage encryption keys, and use an HSM of your own. If not, it is recommended to use CloudHSM.
You can also simply choose not to utilise a Customer-managed CMK with provided key material at all. It is an option, but that option would require you to complete an entry in the Compensating Controls Worksheet (CCW) to meet requirement 3.5 beyond the prescriptive intent and to the satisfaction of the QSA.
Some key considerations for KMS;
- KMS cannot be used for EC2 key/pairs, these are asymmetric
- No automatic key rotation for a customer-managed CMK when providing key materials
- For CMKs where you do not provide key material, AWS generates key material for you, potentially allowing malicious insiders to decrypt through vectors such as factorisation, deriving the secrets with additional knowledge, or other implanted vulnerabilities. This is a risk you may choose to accept given the attestations.
- It is common to set KEY_MATERIAL_DOES_NOT_EXPIRE for CMKs
- AWS documentation informs you to use SHA1 for the wrapping key, specifically RSAES OAEP with SHA1. In 2005 a paper (ISBN 978-3-540-31870-5) demonstrated weaknesses in SHA1, followed in 2007 by an RSA conference talk by Daniel R. L. Brown that demonstrated secure RSAES OAEP implementations and that SHA1 should not be used
Not PCI requirements, however still interesting;
- Keys cannot be "exported", which is to say both the key-signing keys and the private key are secret.
- An AWS-managed CMK cannot be shared cross-account, meaning anything encrypted by one cannot be migrated as-is because the target account would not be able to decrypt it. So using a Customer-managed CMK is best from a security and operational perspective.
- Customer-managed CMKs require symmetric 256-bit key material to be provided. Note: a 3072-bit RSA key is equivalent to a 128-bit symmetric key
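The RSA to symmetric comparison in the note above comes from the comparable-strength table in NIST SP 800-57 Part 1. A small sketch of that table as a lookup follows; the function name `symmetric_strength_of_rsa` is hypothetical, and the table only covers the RSA sizes listed in that publication.

```python
# Security strength in bits -> minimum RSA modulus size in bits,
# as listed in the comparable-strength table of NIST SP 800-57 Part 1.
RSA_MODULUS_FOR_STRENGTH = {
    80: 1024,
    112: 2048,
    128: 3072,
    192: 7680,
    256: 15360,
}


def symmetric_strength_of_rsa(modulus_bits: int) -> int:
    """Return the highest listed security strength an RSA modulus reaches."""
    reached = [
        strength
        for strength, modulus in RSA_MODULUS_FOR_STRENGTH.items()
        if modulus_bits >= modulus
    ]
    if not reached:
        raise ValueError("modulus is smaller than any size in the table")
    return max(reached)


print(symmetric_strength_of_rsa(3072))  # 128
print(symmetric_strength_of_rsa(2048))  # 112
```

Read the other way, matching the 256-bit symmetric key material that KMS expects would take a 15360-bit RSA key, which is one reason the wrapping key is usually the weaker link in an import.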
Great parts of KMS
- Effortless to use CMKs when not providing key material
- KMS is region specific, whereas CloudHSM is limited to a VPC
- Keys can be aliased
- AWS-managed CMKs require nothing more than a configuration value, great if you just want to encrypt things (outside of your usual obligations)
There are several deficiencies you should consider when using AWS. These are due to the inherent nature of the services AWS provides and implementation details that are by design.
Some PCI requirements automatically require an entry in the CCW. A detailed analysis of how we should reason with these concerns can be seen in the table of specific PCI requirements below;
|1.3||Prohibit direct public access between the Internet and any system component in the cardholder data environment||All AWS APIs are inherently public internet addressable, and services such as KMS and CloudHSM offer no controls to avoid this. While a VPC Endpoint can be used for your implementation, the inherent public nature is unchanged and communications cannot be restricted to be only via the VPC Endpoint.|
|2.3||Encrypt all non-console administrative access using strong cryptography||While this point is contentious, because the AWS Query request API (the main HTTP API) itself gives you the ability to enforce the prescriptive controls, the AWS CLI built in to EC2 allows SSL/early TLS by default and cannot currently support Perfect Forward Secrecy (PFS) at all, due to its reliance on the boto3/botocore Python libraries (I'm trying to fix this). Additionally, most SDKs do not offer configuration to restrict AWS Query API requests directly, therefore the SDK defaults, while varied, also mostly allow SSL/early TLS. Use of SDKs may require you to address the requirements in Appendix A2|
|3.5||Document and implement procedures to protect keys used to secure stored cardholder data against disclosure and misuse||This applies in 2 scenarios; 1) The KEK, conditional on the use of customer provided key material for Customer-managed CMKs when not integrated with CloudHSM. 2) The DEK, as clear-text provided by the AWS APIs traverses the public internet and requires a VPC Endpoint. An entry in the CCW is needed for this, which is a trivial exercise if all requesters are AWS services, and non-trivial if any requester is bespoke or external to AWS|
|3.6.4||Cryptographic key changes for keys that have reached the end of their crypto period (for example, after a defined period of time has passed and/or after a certain amount of cipher-text has been produced by a given key), as defined by the associated application vendor or key owner, and based on industry best practices and guidelines (for example, NIST Special Publication 800-57)||In KMS there is an option, not the default, to never expire a CMK. For this reason AWS does not meet the requirement and it is the customer's responsibility to ensure this requirement is met|
|3.6.5||Retirement or replacement (for example, archiving, destruction, and/or revocation) of keys as deemed necessary when the integrity of the key has been weakened (for example, departure of an employee with knowledge of a clear-text key component), or keys are suspected of being compromised||KMS automatic key rotation has no knowledge of your staff off-boarding process, therefore automatic key rotation alone is not sufficient. Key rotation must be initiated for all data encrypted using keys the ex-employee had access to, including backups|
|3.6.6||If manual clear-text cryptographic key-management operations are used, these operations must be managed using split knowledge and dual control. Note: Examples of manual key-management operations include, but are not limited to: key generation, transmission, loading, storage and destruction.||Applies when choosing to provide key material to Customer-managed CMKs in KMS (without integration to CloudHSM), due to the necessity of a process external to KMS to provide the key material, which is inherently a clear-text file key generation and transmission operation|
|4.1||Use strong cryptography and security protocols to safeguard sensitive cardholder data during transmission over open, public networks||as 2.3 above|
|8.1.6||Limit repeated access attempts by locking out the user ID after not more than six attempts||Not an available feature of the KMS API|
|8.1.8||If a session has been idle for more than 15 minutes, require the user to re-authenticate to re-activate the terminal or session||AWS provides configuration of the total session duration only, there are no idle settings available|
|8.2||In addition to assigning a unique ID, ensure proper user-authentication management for non-consumer users and administrators on all system components by employing at least one of the following methods to authenticate all users||While MFA is available to IAM Users, it is not the default, and IAM Users are not the best practice access pattern|
|8.3||Secure all individual non-console administrative access and all remote access to the CDE using multi-factor authentication||as 8.2 above|
|8.5||Do not use group, shared, or generic IDs, passwords, or other authentication methods||the default and general best practice access pattern is to use federated or assumed role based access, which is a violation of this prescriptive requirement|
|10.2||Implement automated audit trails for all system components to reconstruct the specific events||Both KMS and CloudHSM rely on CloudTrail to meet this requirement which is not available by default|
|10.3||Record at least the following audit trail entries for all system components for each event||as 10.2 above|
|10.5||Secure audit trails so they cannot be altered||While CloudTrail is not used by default, when it is in use the digest feature is required to meet this requirement and it is also not a default. Additionally S3 log delivery must be configured with controls in place to prevent tampering of the S3 Objects that represent both the log data and digests|
|10.7||Retain audit trail history for at least one year, with a minimum of three months immediately available for analysis (for example, online, archived, or restorable from backup)||While not used by default, S3 life-cycle events and glacier storage classes are available to meet these requirements|
|11.5||Deploy a change-detection mechanism (for example, file-integrity monitoring tools) to alert personnel to unauthorized modification (including changes, additions and deletions) of critical system files, configuration files, or content files; and configure the software to perform critical file comparisons at least weekly||as 10.7 above while also considering CloudWatch usage for monitoring activities|
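As an illustration of the 10.7 row, the retention rule can be expressed as an S3 life-cycle configuration. The sketch below builds the configuration as a plain Python dictionary in the shape that the S3 `PutBucketLifecycleConfiguration` API accepts. The rule ID, the log prefix, and the day counts are assumptions chosen for the example (93 days to cover any three calendar months, 366 to cover a leap year), and the function name `audit_log_lifecycle` is hypothetical.

```python
import json


def audit_log_lifecycle(prefix: str, online_days: int = 93, retain_days: int = 366) -> dict:
    """Build an S3 life-cycle configuration for audit log retention.

    Objects stay in their original storage class for `online_days`
    (immediately available), then move to Glacier, and are deleted
    only after `retain_days`.
    """
    if retain_days <= online_days:
        raise ValueError("retain_days must be greater than online_days")
    return {
        "Rules": [
            {
                "ID": "pci-10-7-audit-retention",  # assumed rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": online_days, "StorageClass": "GLACIER"}
                ],
                "Expiration": {"Days": retain_days},
            }
        ]
    }


# "AWSLogs/" is an assumed prefix for CloudTrail log delivery.
print(json.dumps(audit_log_lifecycle("AWSLogs/"), indent=2))
```

With boto3 the resulting dictionary could be passed as the `LifecycleConfiguration` argument of `put_bucket_lifecycle_configuration`. Preventing tampering, as discussed in the 10.5 row, is a separate control and is not covered by this rule.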
The above considerations are particularly critical when you are operating a Level 1 CDE.
You may choose to focus more on PCI requirement 11 if you operate a Level 2-5 CDE, leaving the SAQ and CCW efforts relatively relaxed. This would of course also depend on your overall risk appetite, or other obligations outside PCI.
Basically, PCI DSS is a prescriptive standard but has two key facts which seem to be unknown to many organisations;
- The QSA has full discretionary decision making power
- There is the Compensating Controls Worksheet (CCW) that is available to you if you do not implement any of the prescriptive requirements.
Yes, you read these correctly.
You can actually achieve PCI DSS without meeting a single prescriptive requirement as set out in the PCI guidance material, if the QSA agrees you provided evidence of compensating controls in the CCW that meets or exceeds the intent of the requirements themselves.
So the verdict is; to have a quick audit without many questions from the QSA you want to avoid the CCW.
In the context of this article, using CloudHSM provides the highest degree of control and assurance to the QSA while also meeting the PCI prescriptive requirements for its feature set.
PCI is not all about requirement 3, Encryption.
But while we are on the topic; ensure you apply appropriate key management as briefly mentioned above.
There are three important factors to consider here;
- All keys are single workload use, so revoking a key won't take down more than one workload
- Have separate access patterns for key encryption and key decryption, so that the key creators cannot also read the data. This is good for operations too, due to the separation of concerns
- Be careful of scope increase due to access controls. This is commonly where access to the CDE is federated, so there might be too many users with permissions to access the CDE due to groups and time. Another is direct programmatic access being permitted to the CDE from devices which are not locked down to access the CDE alone. That is to say an employee workstation on wifi accessing the CDE is going to affect scope if they have even read-only access to any cardholder data (which is transferred to the workstation to be viewed).
If you are not applying this level of completeness, prepare to spend a far longer time with your QSA addressing every detail of every one of the PCI requirements, or the entries in the CCW. They will also scrutinise every detail of any collateral linked from the CCW, because it is their personal reputation on the line if they use their discretionary decision making power poorly.
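The separation of access patterns described in the list above can be sketched as a KMS key policy with one statement per duty. This is a minimal sketch, not a complete policy: the account ID and role names are placeholders, the helper names are hypothetical, and a real key policy also needs a statement that keeps the key manageable, which is omitted here.

```python
import json

ACCOUNT_ID = "111122223333"  # placeholder account ID


def role_arn(role_name: str) -> str:
    return f"arn:aws:iam::{ACCOUNT_ID}:role/{role_name}"


def statement(sid: str, role_name: str, actions: list) -> dict:
    """One Allow statement granting a single role a set of KMS actions."""
    return {
        "Sid": sid,
        "Effect": "Allow",
        "Principal": {"AWS": role_arn(role_name)},
        "Action": actions,
        "Resource": "*",
    }


KEY_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        # Key administrators manage the key but get no cryptographic actions.
        statement("KeyAdministration", "KeyAdmin", [
            "kms:Create*", "kms:Describe*", "kms:Enable*", "kms:Disable*",
            "kms:Put*", "kms:Update*", "kms:Revoke*",
            "kms:ScheduleKeyDeletion", "kms:CancelKeyDeletion",
        ]),
        # The writing side of the workload can encrypt only.
        statement("EncryptOnly", "WorkloadWriter",
                  ["kms:Encrypt", "kms:GenerateDataKey*"]),
        # The reading side of the workload can decrypt only.
        statement("DecryptOnly", "WorkloadReader", ["kms:Decrypt"]),
    ],
}


def actions_for(sid: str) -> list:
    """Look up the actions granted by the statement with the given Sid."""
    for stmt in KEY_POLICY["Statement"]:
        if stmt["Sid"] == sid:
            return stmt["Action"]
    raise KeyError(sid)


print(json.dumps(KEY_POLICY, indent=2))
```

One caveat with this shape: an administrator who holds `kms:Put*` can rewrite the key policy, so the separation also relies on change monitoring of the policy itself.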