
5 Myths of Software Composition Analysis (SCA)

Recently someone asked me to help them with an SCA tool (you know the name; I snicker a bit when I hear it).

They had a bunch of findings in the SaaS dashboard that were confusing them. It was all presented really nicely, showing exploit samples, what good configuration looks like, and thousands of words to help with whatever context you might need.

Except it doesn't address the 5 main myths of SCA, and the confusion this person had was a CLASSIC myth.

I simply responded that:

Your SCA tool is not doing what you think it is

At first I thought I was explaining 1 myth, but after they asked me many clarifying questions I learned that the myth is actually 5 very different myths that all play a part!

Let's explore them in logical order: from the most common misconception to the myths that are more edge cases or may be SCA tool dependent.

Myth 1: You need an SCA tool

Obviously SCA is important, but do you actually need to acquire a specific tool to perform SCA or do you already do SCA and just not realise it?

If you write code and install dependencies, which is essentially standard practice for modern software development in most programming languages, then you probably use a package management tool. These package management tools often have a built-in capability to inform you of known vulnerabilities in the dependencies of their ecosystem. If it's not built into the tool, the ecosystem often tracks these known vulnerabilities in a public advisory database, and there's likely a popular add-on to check whether your dependencies have any known vulnerabilities from that database.

All an SCA tool from a commercial vendor really does is analyse multiple programming languages in one single tool, the same way you could use each programming language's own capability yourself. It's seamless for you, so it is not too difficult for them to bring it all together either. Maybe they add some additional features not available upstream, but that is the same as saying they could have contributed those features back to the ecosystem and chose to be commercial instead to take your money; they don't want to improve security for everyone.

Some examples of built-in SCA are:

  • Node.js npm includes npm audit
  • Python has pip-audit for the PyPI advisory database
  • Java has mvn verify with the NVD feed via OWASP dependency-check
  • Go has govulncheck for the Go vulnerability database (vulndb)
  • Ruby has bundler-audit for the RubySec advisory database

You get the idea
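
To make this concrete, here is a minimal sketch of driving one of those built-in tools yourself; it's my own illustration, not something from any vendor. It assumes pip-audit is installed and a requirements.txt sits in the working directory, and the JSON field names follow pip-audit's documented output, which may shift between releases.

```python
# A rough sketch: drive the ecosystem's own audit tooling instead of buying a
# multi-language SCA product. Assumes `pip install pip-audit` and a
# requirements.txt in the working directory; JSON field names follow pip-audit's
# output format and may vary between versions.
import json
import subprocess

result = subprocess.run(
    ["pip-audit", "--requirement", "requirements.txt", "--format", "json"],
    capture_output=True,
    text=True,
)

report = json.loads(result.stdout)
for dependency in report.get("dependencies", []):
    for vuln in dependency.get("vulns", []):
        fixes = ", ".join(vuln.get("fix_versions", [])) or "no fix released"
        print(f"{dependency['name']} {dependency['version']}: {vuln['id']} (fixed in: {fixes})")
```

A commercial SCA dashboard is, at its core, this same loop repeated per ecosystem.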

Myth 2: SCA detects known vulnerabilities

Okay, to be fair, it does analyse your software and the results will tell you if you have any known vulnerabilities, but the reality is the results are an extremely narrow subset of what are considered 'known vulnerabilities'.

SCA only detects OBVIOUS 'too hard to miss' known vulnerabilities

SCA will not give you 50% of known vulnerabilities.
It will not even give you 20% of known vulnerabilities.
I'd hesitate to even say it informs you of 10% of all known vulnerabilities!
Maybe, and I am really stretching it here for generosity's sake, it will inform you of 5% of known vulnerabilities MAX.

The most obvious reason for this is that most known vulnerabilities are not reported as vulnerabilities at all, and patches (if any are produced) are not released as security patches, just as normal patches.
These are still known to the developers, and to any malicious actor or security professional looking at the project.

Even if the changes are treated as a security issue by the project, they are almost never submitted to any vulnerability database or recorded as a vulnerability against the project. If it isn't in these vulnerability databases it won't be included in SCA tool databases either, because they mostly just reference or collate these vulnerability databases.

For example, the history of Pull Requests on GitHub contains fixes for vulnerabilities that are well known, and the many bug bounty programs have 'bugs' that are just known weaknesses in the project participating in the bug bounty platform, but these are not submitted to CVE issuers and will not be in the SCA databases either.

A perfect example is a recent news article about zlib, in which Tavis Ormandy (Google Project Zero team) rediscovered a vulnerability: a fix addressing the bug was merged in 2004 (CVE-2004-0797), but a patch release was never made available even though zlib merged the fix 17 years ago; it was just "merged". It was publicly known the whole time, it was in the CVE database, and no SCA tool ever advised you about it!

The obvious point here is that SCA tools are also going to have a limited scope of vulnerability database coverage, because they can't scale to have it all. They most likely won't track your technology directly because that's your job; they rely on the collection of vulnerability sources and will not be tracking the individual projects you care about. Even when you have SCA you should still be monitoring your chosen technology directly, because no one, not even the SCA vendors, will do that for you.
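
To illustrate how thin that detection layer really is, here is a minimal sketch (mine, not any vendor's internals) of the kind of lookup SCA boils down to, using the public OSV.dev aggregation API; the package and version are arbitrary examples.

```python
# A minimal sketch of what "detecting known vulnerabilities" amounts to: a
# lookup against collated advisory databases (here the public OSV.dev API).
# If a weakness was never filed in one of those databases, nothing comes back,
# no matter how well known it is to the project's own developers.
import json
import urllib.request

query = {
    "package": {"name": "requests", "ecosystem": "PyPI"},
    "version": "2.19.0",  # an arbitrary old version, purely for illustration
}
request = urllib.request.Request(
    "https://api.osv.dev/v1/query",
    data=json.dumps(query).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    advisories = json.loads(response.read()).get("vulns", [])

# Only what was reported and recorded ever shows up here.
print([advisory["id"] for advisory in advisories])
```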

Myth 3: SCA analyses entire dependency trees

Wrong again, but why is this a myth?

Simply put, SCA merely analyses the tree of the app in the following context:

  • You have fully and completely installed the app before it is scanned
  • It is built and run in production mode when scanned
  • If it runs on a specific Linux distribution in production, you must build and scan it on the same Linux distribution, not scan it on your Windows, Mac or other Linux distribution locally
  • The same architecture as production is used when scanned
  • The exact same build tools are used to build before scanning, not just the high level tools but also the low level compiler and related tool chain on the operating system
  • The exact configuration is used for the build tools when scanning that would be used for building the application in production

Essentially, if you don't SCA scan in production you are likely going to miss something

The issues that arise for each point:

  • If it is not 'installed' exactly as it runs in production, the scan skips any part of the tree not present in the CI or local build environment, and different conditional dependencies deep in the dependency tree (that you don't control) get installed instead (see the sketch below)
  • If your app has a development and production mode, the dependency tree may be polluted in development mode and may even force the dependency conflict resolver to install production dependencies at a different version that is more compatible with the development tool chain
  • If the SCA runs in non-production environments like Windows or Mac locally, the tree is not the same as would be built and run in production. Even if the build is done in a Linux environment, it is unlikely the local or CI pipeline runs the same Linux distribution; SCA scanning on Alpine, Ubuntu, or Arch is not going to help you secure dependencies that only exist on the RHEL system where your app runs in production
  • Many if not all dependencies are built upon install, and the build system will produce different artefacts for different architectures; for example, an overflow on a 32-bit architecture may not exist if built on a 64-bit architecture, and Intel/AMD/ARM/M1 processors introduce differences in compiled binaries, so a bug may not exist on all architectures
  • You can end up with a different dependency depending on whether you build with GCC or Clang/LLVM; both will gladly compile the source of dependencies transparently but may produce a more or less secure binary due to compiler differences
  • Similar to the above, your tool chain may be configured differently from system to system; even if both systems use GCC they may be configured differently, and will still gladly compile the source of dependencies transparently, but may produce a more or less secure binary due to configuration differences

Essentially the SCA tool cannot know what it doesn't know. If you did not build your software in a reproducible way, it can only scan the software the way you happened to build it, and the results may not match what a scan of the same software running in production would show.
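
As a concrete illustration of the conditional dependency problem above, here is a small sketch using Python's packaging library; the requirements shown are made up for illustration.

```python
# A minimal sketch of why the scan environment changes the tree. PEP 508
# environment markers are evaluated at install time, so a dependency that only
# applies on Windows, or only on a 64-bit Linux host, never shows up when you
# install and scan on a different platform. Assumes `pip install packaging`.
from packaging.markers import Marker

# Hypothetical conditional requirements, like ones buried deep in a dependency tree.
conditional_requirements = {
    "pywin32>=300": 'sys_platform == "win32"',
    "uvloop>=0.17": 'sys_platform != "win32" and platform_machine == "x86_64"',
}

for requirement, marker_text in conditional_requirements.items():
    # evaluate() uses the environment of the machine running this code, i.e. the
    # machine the SCA scan happens on, not the machine the app runs on in production.
    installed_here = Marker(marker_text).evaluate()
    print(f"{requirement}: part of the tree on this machine? {installed_here}")
```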

Myth 4: The SCA tool knows your dependency tree

We talked about how the SCA tool only sees the software in the state you built it, but even then the SCA tool will ignore many dependencies that are actually present!

There are many ways to include a dependency in your software; SCA tools typically standardise on package managers. For a single programming language there are many package managers: JavaScript has more than 20 I can name, and even the one ranked 20th has millions of active users. Yet SCA tools typically support only the top 1 or 2 package managers, and users of the rest, even the tens of millions of active users of the 3rd most popular package manager, are simply not supported.

This is a big problem because I have seen teams run SCA scans and on day 1 the report showed 0 findings. Good, right? It was 0 because the SCA tool detected a file format it supports, but the team used a different build process, and what the SCA actually scanned was a stub, a file used for something other than dependencies, or an informational-only file.

Lastly, even if you use the supported package manager you might also decide to introduce additional dependencies outside that package manager. You may include dependencies by "vendoring" them, directly embedding them as copy/pasted files, "bundling/mangling" them (popular for JavaScript), or by other means that include a dependency's source code, binary, archive, etc.

Who does this!? AWS does it for botocore; they 'vendored' a subset of the popular requests library directly. There are many requests vulnerabilities and none would ever have been found by SCA tools scanning botocore. I pointed this out and it was soon refactored so that requests is consumed using pip, but for years SCA would never have identified vulnerabilities I know existed (and still exist in some customer environments that have not upgraded, including the AWS managed environments that build in old versions of the AWS CLI, like the EC2 AMIs used for Lambda, ECS, AWS Batch, EMR, etc.).
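
If you want a rough idea of whether this affects you, here is a small heuristic sketch of my own (not a feature of any SCA tool) that looks for vendored copies of other projects inside installed Python packages; the directory names are just common conventions.

```python
# A rough heuristic: find vendored copies of other projects hiding inside
# installed packages, which manifest-based SCA will never match to advisories.
# pip itself ships pip/_vendor, and older botocore shipped botocore/vendored/requests.
# The directory names below are common conventions, not an exhaustive list.
import site
from pathlib import Path

VENDOR_DIR_NAMES = {"vendored", "_vendor", "vendor"}

for site_dir in site.getsitepackages():
    for path in Path(site_dir).rglob("*"):
        if path.is_dir() and path.name in VENDOR_DIR_NAMES:
            embedded = sorted(p.name for p in path.iterdir() if p.is_dir())
            print(f"{path} embeds: {embedded or '(single-module vendoring)'}")
```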

Bottom line: if this is a widespread issue for the largest cloud vendor, it is common, and SCA tools cannot and do not try to match against their vulnerability database unless you use the supported dependency management method, and only that method.

Myth 5: Pinning dependencies is more secure

Many sources are going to tell you that pinning dependencies will prevent malicious software from being installed.

There is an SCA relationship with this topic, but first what is dependency pinning?

Software is often made up of mostly code you don't write; these are your dependencies.

These may be versioned or unversioned; for this conversation we're only interested in versioned dependencies.

Pinning means your software will integrate only a specific version of a dependency; even if a security patch becomes available, your software ignores it when you pin dependencies.

OWASP DSOMM claims this is a security practice:

Risk: Unauthorized manipulation of artifacts might be difficult to spot. For example, this may result in using images with malicious code. Also, intended major changes, which are automatically used in an image used might break the functionality.

Didn't we just learn that pinning prevents security updates? This is mostly why SCA tools are ineffective: non-security-aware developers believe pinning is secure, so they pin everything, and then the SCA tools cannot safely update dependencies when security patches are identified.

Why would DSOMM recommend a clearly by-design insecure practice?

Opportunity: Pinning of artifacts ensure that changes are performed only when intended.

Myth 3 described reproducible builds; versioning really isn't close to addressing that. There's no verification built in to ensure the result is what you expected, so unauthorized manipulation of artifacts with the same version you pinned can still occur, and without verification you're completely unaware. Pinning did not help one iota; pinned or not, nothing changed.

So we know pinning ignores security patches, and you don't get the benefit you thought you did; what exactly does pinning help with?

Maybe partial pinning is what DSOMM should have described, which only lets the security patches through if SemVer is followed. But this is not always true, and any change can make it through even if it's not a security patch, because even partial pinning doesn't verify artefacts; at least we can get security patches as long as semantic versioning is followed.
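
Here is a small sketch of what partial pinning looks like in practice, assuming Python's packaging library and the 'compatible release' operator; the behaviour still hinges on the upstream project actually following SemVer.

```python
# A minimal sketch of "partial pinning": a compatible-release specifier that
# accepts patch releases (hopefully security fixes) but blocks minor/major bumps.
# It only does what you hope if the upstream project follows semantic versioning.
# Assumes `pip install packaging`.
from packaging.specifiers import SpecifierSet
from packaging.version import Version

spec = SpecifierSet("~=2.28.0")  # equivalent to >=2.28.0, ==2.28.*

print(Version("2.28.2") in spec)  # True:  a patch release is allowed through
print(Version("2.29.0") in spec)  # False: a new minor release is not
```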

Okay, so what's negative about not pinning? Every time the software is built the dependencies will be updated to the latest version, even if your usage has not taken the changes into account, which can cause your software to break, and malicious code introduced in a new version can slip in too.

The main argument for not pinning requires the software testing to be trustworthy. Updating dependencies is fine if you test software before it is deployed.

People who prefer pinning describe how not pinning is dangerous, referring of course to a lack of sufficient testing, so they pin to a version known to be good for their implementation, not known to be good in terms of security.

Clearly they fail to consider they're still going to have dependencies break future builds because they lack testing; for example, pinning still downloads the dependencies from the internet. What happens when a dependency cannot be downloaded, or it downloads but the code is different even though the version hasn't changed (common with Go, which relies on git tags)?

Essentially, people who pin want repeatable builds but do not understand the implementation needed to achieve repeatable builds.

To be fair, not pinning can be dangerous if you do it blindly, but that is the same risk with pinning too. Pinning is not the cause of, or the solution to, this problem of verification. Not pinning carries the risk of malicious code being introduced; pinning shares this risk as well. Pinning is neither the cause of nor the solution to this risk.
In fact, whether or not you test your code is related to pinning, but the risks associated with pinning or not pinning are not correlated with whether you test. They remain valid risks whether you pin or not, and regardless of whether you test.

Pinning dependencies is the wrong conversation to have for a vulnerability related risk in the ecosystem.

Trust, and therefore verification, is the right conversation to have.

Pinning or not pinning is irrelevant; how do you assure trust has been established, ergo, what validation do you do to ensure trustworthiness every time a dependency is built into your software?

Pinning is not the trust validation mechanism, because the artefact can change even when pinned.

Do you pin and blind trust?
Do you not pin and blind trust?
Both are blind trust, pinning is an irrelevant security conversation.

Do you validate every version you build into your software? Yes? Then it doesn't matter how you validated; the point is that you validated, and pinning is irrelevant, because what you pin or don't pin is trusted and validated, which prevents unexpected changes! Huzzah!
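
For what that validation could look like, here is a minimal sketch; the expected digest is a made-up placeholder, not a real one. The version string can stay the same while the bytes change, but a content hash cannot.

```python
# A minimal sketch of verification, independent of pinning: compare the exact
# bytes you are about to build in against a digest you recorded when you last
# reviewed this dependency. The expected digest below is a made-up placeholder.
import hashlib

EXPECTED_SHA256 = "0000000000000000000000000000000000000000000000000000000000000000"

def artifact_is_trusted(artifact_path: str) -> bool:
    with open(artifact_path, "rb") as artifact:
        digest = hashlib.sha256(artifact.read()).hexdigest()
    return digest == EXPECTED_SHA256

# Whether the requirement was written as requests==2.31.0 (pinned) or
# requests>=2.31 (unpinned), this check is what actually establishes trust.
# Example: artifact_is_trusted("requests-2.31.0-py3-none-any.whl")
```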

Dependency pinning is an irrelevant security conversation; unless of course you are pinning dependencies, in which case you should consider how secure you will be when the pinned version is vulnerable and your SCA tool skips the security update.