Before we begin: for this discussion, DevSecOps is analogous to AppSec, as both share the same principles even if we consider them to have certain distinctions in responsibilities or niche focus.
In practice they serve the same audience and should operate with the same 3 principles.
The 3 principles
1. Developer Flow
2. Security Assurance
3. Operational Observability
What is 'Flow'?
Flow requires all developer activities to be efficient
Flow in DevSecOps means security must be non-blocking until the code release is considered ready by the developer
This dictates where breaking CI/CD may occur: in short, never break builds
What does 'Assurance' mean in this context?
Assurance is how we think about risk; it is provided by every release having the necessary attestations that form the assurance itself.
A release is any point where a person outside the set of CI/CD contributors can access the changes, or where the released changes themselves access data or systems.
An attestation is a record that the release has completed a required assurance activity, such as a security scan of the signed commit hash being released.
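As an illustrative sketch only (the record shape, activity names, and signing scheme below are assumptions, not a standard), an attestation can be as simple as a signed statement that a specific assurance activity passed for a specific commit hash:

```python
import hashlib
import hmac
import json

# Hypothetical signing key for illustration; in practice this would come
# from a KMS or signing service, never be hard-coded.
SIGNING_KEY = b"example-attestation-key"

def make_attestation(commit_hash: str, activity: str) -> dict:
    """Record that a required assurance activity passed for a commit."""
    statement = {"commit": commit_hash, "activity": activity, "result": "pass"}
    payload = json.dumps(statement, sort_keys=True).encode()
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return {"statement": statement, "signature": signature}

def verify_attestation(att: dict) -> bool:
    """Verify the signature covers the exact statement, unmodified."""
    payload = json.dumps(att["statement"], sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, att["signature"])

att = make_attestation("9f2c3a1", "security-scan")
print(verify_attestation(att))  # True
```

Because the signature binds the activity to the commit hash, an attestation made for one commit cannot be replayed against a different release.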
Observability is not all about the metrics; it's about the story they tell.
Observability in platform engineering and site reliability engineering is covered at length elsewhere; here we examine observability in a DevSecOps context.
An example of a bad metric is the vulnerability count: no one should care how many vulnerabilities you encounter, but rather how efficient and effective your team is in addressing them. There is no way to avoid encountering vulns, so everyone does, unless you are applying ignorance to your process. Measured and observable systems become resilient, and security capabilities become transparent when we have the ability to share our efforts widely.
Rationale for breaking deployments
Distinct from breaking builds
High-maturity organisations never break a pipeline during build steps, as this impacts delivery flow.
All security policy evaluations should run in report-only mode at build, enabling developers to identify violations and fix them, and enabling the PR approver to make appropriately informed exception decisions.
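A minimal sketch of the report-only pattern (the policy names and violation shape are hypothetical; real results would come from your SAST/SCA tooling): the same evaluation function informs at build and may only block at deploy:

```python
# Hypothetical violations produced by a policy scan.
violations = [
    {"policy": "no-hardcoded-secrets", "file": "app/config.py"},
    {"policy": "pinned-dependencies", "file": "requirements.txt"},
]

def report_policy_results(violations: list, stage: str) -> int:
    """Surface every violation; only the deploy stage may return non-zero."""
    for v in violations:
        print(f"[{stage}] violation: {v['policy']} in {v['file']}")
    if stage == "build":
        return 0  # never break the build: inform, don't block
    return 1 if violations else 0  # deploy may block on unresolved violations

exit_code = report_policy_results(violations, stage="build")
print(exit_code)  # 0: the build succeeds even though violations were reported
```

The developer and approver see every violation at build time, so nothing that later blocks a deploy is a surprise.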
Breaking occurs during the deployment stage. By this point the developer was informed of all the policy violations (locally and at build), and the approver was enabled to record accepted exceptions if required (at PR approval). Therefore it is fair for any violations encountered during deployment to stop the deployment if defective changes are still present.
The deploy stage is no longer considered the developer 'workflow' because the change is 'shipped' (done), so no productivity complaints are defensible, and no one can claim they did not know about the security issues found.
Having a deploy step look for attestations and block any unauthorised deployment attempt is the only time blocking the release is needed. It doesn't impact developer flow, and the policy violation had numerous locations where it could be found and fixed: the developer IDE, the developer's local build, the PR pre-approval build (non-breaking, informational-only scans), and non-production releases (testing environments); instead it was ignored and not addressed as an organisationally acknowledged risk exception.
What is a production environment
You may have many semantically named environments for different purposes: sandpit, dev, uat, staging, nonprod, etc. They are NOT non-production simply because you named them something else.
A production system needs only 1 of the following production characteristics:
- Customer data: includes pseudo-randomised (masked) values; these are reversible* and most privacy laws acknowledge them as PII too
- Publicly addressable: can be routed to from an internet-connected device, even if authentication is necessary (VPN auth, login, allow lists)
- Accessed by anyone who is not contributing to the release: any customer or team member who uses the system rather than directly changes it is a user, not a contributor
All of these characteristics carry the same risks as any production system; therefore, from a risk perspective, they are production systems that require production security controls to mitigate production risk, even if you call them a semantic name like UAT.
*reversible: a value is only considered non-reversible when you use a secure one-way hashing function, or an equally verifiable standard approach, to produce one-way randomised values
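A sketch of the distinction (the token name and lookup table are hypothetical): a masked value kept alongside any mapping is trivially reversible, while a one-way hash has no inverse function. Note that hashing low-entropy values such as email addresses can still be brute-forced, which is why the text requires a secure, verifiable standard approach rather than hashing alone:

```python
import hashlib

# Reversible masking: the pseudo-randomised token maps back to the original
# via the masking store, so it remains PII.
mask_table = {"tok_8271": "alice@example.com"}  # hypothetical masking store
original = mask_table["tok_8271"]  # trivially reversed

# One-way: a salted secure hash cannot be inverted to the original value.
def one_way(value: str, salt: bytes = b"per-dataset-salt") -> str:
    return hashlib.sha256(salt + value.encode()).hexdigest()

digest = one_way("alice@example.com")
# There is no function that recovers the email address from `digest`.
```

The practical test: if any table, key, or algorithm anywhere in your estate can turn the stored value back into the original, the data is reversible and the environment holding it has a production characteristic.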