Back in the summer I wrote a blog about capability versus usability, in which I highlighted that industry testing typically focuses on capability, despite skills being one of the key challenges in the industry. EDR, by its nature, is a technical capability, and as such the skills gap in this space is even greater. I will always remember a good friend sharing in his keynote, a number of years ago, that there is little point in buying a best-of-breed solution if you don't have the people power to actually use it.
In our recent SOC optimization research we saw that, on average, only 50-80% of alerts are processed the same day, with false positives being a significant challenge and distraction for SOC analysts.
All of which makes me extremely happy to see that industry testing organizations such as MITRE are expanding their scope to look increasingly at both capability and usability.
Capability
MITRE has increased the scope of capability testing, this year looking at both ransomware and extortion from three different actors, spanning Windows, macOS and Linux systems. Whilst our eyes are typically drawn to the outcome, namely how much you detected and how much you prevented, there is a third metric which for me is the most important when looking at capability testing: out-of-the-box detection coverage.
Why is this just as important as the actual capabilities? Firstly, if behavioral detection methods need to be tuned to weed out false positives, it is a sign that the evaluation required turning on detection controls that the vendor does not enable by default, presumably because they were not confident the capability could avoid 'crying wolf' too many times. For the vendor this can result in lots of support calls, and for the customer it can significantly impact their ability to process all their alerts the same day.
Secondly, it increases the knowledge requirements for the customer. They need the skills to understand the value of each detection capability, and then the expertise to build out the appropriate allow and block lists to actually realize that value.
Usability
In this year's testing, MITRE actually tracked the volume of false positives, and the results varied from zero false positives, which I'm proud to share Cybereason achieved, to some vendors that scored virtually as many false positives as genuine detections. Put simply, you would need double the staff to reach the same outcome, and in reality even that isn't enough, as it typically takes longer to prove something is a false positive than it does to verify a genuine detection.
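To make that staffing arithmetic concrete, here is a back-of-the-envelope sketch in Python. Every figure in it is an illustrative assumption of mine, not a number from the MITRE results:

# Back-of-the-envelope staffing maths. Every figure here is an
# illustrative assumption, not a number from the MITRE evaluation.
true_positives = 100    # genuine detections per day (assumed)
false_positives = 100   # a vendor scoring ~1 FP per detection (assumed)
mins_per_tp = 15        # minutes to verify a genuine detection (assumed)
mins_per_fp = 30        # proving a false positive takes longer (assumed)

baseline_hours = true_positives * mins_per_tp / 60
actual_hours = (true_positives * mins_per_tp + false_positives * mins_per_fp) / 60

print(f"Analyst hours without false positives: {baseline_hours:.0f}")
print(f"Analyst hours with false positives:    {actual_hours:.0f}")
print(f"Staffing multiplier:                   {actual_hours / baseline_hours:.1f}x")
# With these assumptions the workload triples rather than doubles,
# because each false positive costs more time than a true positive.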
This naturally links into another important metric: what percentage of the attack steps were detected 'out of the box'. If an analyst is unsure of the results being presented to them, the first thing they will want to do is go back to the evidence that proves or disproves the hypothesis. The lower that percentage, the more of the attack was missed. Put simply, it's like trying to guess the picture on a jigsaw when you haven't got all the pieces: the more pieces you are missing, the harder it is to see the picture. The same applies to threat detection.
For me, the most interesting result is just how many alerts were triggered to detect the threats used in the MITRE testing. We know that threats can be made up of hundreds of tactics and techniques, and every endpoint solution is made up of many different layers and capabilities used to detect and prevent attacks. Likewise, we know that attacks often go unnoticed, and the negative business impact typically grows the longer the adversary can reside in your organization. Whilst an attack triggers alerts in the security solution, each alert in isolation is not enough to define the detection as important, and so we rely heavily on the analyst to be the clairvoyant: to see beyond the raw alerts and, through skill, intelligence and expertise, join disparate bits of cyber security telemetry into something meaningful. So just how big is the potential alert overload? Cybereason generated just 18 alerts to detect one hundred percent of the attacks. The results varied hugely across participants, with the worst generating over 600,000 alerts (yes, that's correct, it's not a typo)! Going back to our SOC optimization research, the average MTTD/R is between 2-4 hours, so I will leave you to do the maths on how long it would take to process hundreds of thousands of alerts.
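For anyone who wants a starting point for that maths, here is a minimal sketch. The five-minute triage time is my assumption for illustration; real figures will vary by team:

# Rough maths on the alert volumes above. The per-alert triage time
# is an assumption for illustration; the alert count is from the text.
alerts = 600_000        # worst-scoring participant in the evaluation
mins_per_alert = 5      # optimistic triage time per alert (assumed)
shift_hours = 8         # one analyst working day (assumed)

total_hours = alerts * mins_per_alert / 60
analyst_days = total_hours / shift_hours

print(f"Total triage effort: {total_hours:,.0f} analyst hours")
print(f"Equivalent to {analyst_days:,.0f} eight-hour analyst days")
# 50,000 hours, or 6,250 analyst days: roughly 25 analyst-years
# of triage before a single genuine detection is confirmed.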
But you may already be thinking: what if I just focus on the high-risk alerts? Even then, most vendors were still in the hundreds of alerts, and some well into the thousands. Regardless, by taking this approach you fall into exactly the trap many adversaries hope you will. They know organizations typically struggle with alert fatigue and so are often unconcerned when lower-risk alerts are triggered; the adversary assumes you simply won't get to them and join the dots. Can you imagine going to the business and saying, sorry, we missed the threat because we didn't deem it important enough to focus on!
So why is Cybereason able to do this with so few alerts? It is our unique MalOp™ approach. We gather all the alert telemetry, classify it against the MITRE ATT&CK framework, and then use machine learning to effectively look left and right of each alert in the attack lifecycle to find the adjoining pieces, not just at the machine level but across all the systems you have. We call this MalOp™ Cross-Machine Correlation (CMC).
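To illustrate the general idea, a heavily simplified sketch of correlating alerts across machines is below. The time window, indicators and grouping rule are all my assumptions to make the concept concrete; the actual MalOp engine uses machine learning, not a fixed rule like this:

from collections import defaultdict

# Simplified sketch of cross-machine correlation: fold raw alerts into
# candidate "operations" when they share an indicator (e.g. a user or
# file hash) or a host within a time window. This illustrates the
# concept only; it is not Cybereason's MalOp implementation, which
# relies on machine learning rather than a fixed rule like this.
WINDOW_MINUTES = 60  # correlation window (assumed for illustration)

alerts = [  # toy telemetry, ordered by time: (minute, host, tactic, indicator)
    (0,   "host-a", "initial-access",   "user:jdoe"),
    (12,  "host-a", "execution",        "hash:abc123"),
    (25,  "host-b", "lateral-movement", "user:jdoe"),
    (40,  "host-c", "exfiltration",     "user:jdoe"),
    (300, "host-d", "execution",        "hash:zzz999"),  # unrelated noise
]

operations = defaultdict(list)
for minute, host, tactic, indicator in alerts:
    placed = False
    for op_alerts in operations.values():
        last_seen = op_alerts[-1][0]
        related = any(indicator == a[3] or host == a[1] for a in op_alerts)
        if related and minute - last_seen <= WINDOW_MINUTES:
            op_alerts.append((minute, host, tactic, indicator))
            placed = True
            break
    if not placed:
        operations[len(operations)].append((minute, host, tactic, indicator))

for op_id, op_alerts in operations.items():
    hosts = sorted({a[1] for a in op_alerts})
    tactics = [a[2] for a in op_alerts]
    print(f"Operation {op_id}: {len(op_alerts)} alerts across {hosts} -> {tactics}")

In this toy example, five raw alerts collapse into two operations, one of which tells the whole story across three machines; that kind of consolidation is what keeps the analyst-facing alert count low.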
Some of you may be using a managed endpoint service and be thinking this isn't your problem to solve, but in reality any managed service relies on the technology behind it, and a managed service will scale better and respond to you faster the more efficiently it can process all of its customers' alerts. You should consider challenging providers on all of the metrics in the MITRE ATT&CK Enterprise Evaluation before you sign up to any managed endpoint service, as they are a good leading indicator of the provider's ability to deliver an effective service.
Takeaways
It's clear that cyber threats continue to become more complex, as do the solutions we all use to detect and prevent them. Historically, industry testing focused on capability, and we each had to do our own usability testing.
Every cyber security team today is struggling with alert fatigue, a problem that will only become more prevalent as we have more to secure and as the volume and complexity of threats grows, requiring ever more creative detection methods. At the same time, business and regulator expectations on MTTD/R continue to shrink. This is the cyber time paradox.
MITRE, like many testers, continues to evolve its testing capabilities, and it's great to see that usability is now becoming an integral part of testing. Many tools, EDR especially, require a lot of technical knowledge. I would challenge you to consider the following when deciding which metrics will help you find the right solution:
(1) What skill levels do you have in your cyber security team?
(a) How many alerts can they triage each day?
(b) What degree of knowledge depth can they get into?
(2) When looking at test results consider:
(a) How much tuning is required to get the best value from the solution?
(b) How many false positives do they produce?
(c) How many alerts are generated to get you to the point that you can actually respond to the threat with confidence?
From this you should be able to determine the right balance of capability and usability metrics for your business. What is clear is that both are critical to the success of any endpoint cyber security strategy.