As the world becomes ever more digital, every business periodically steps back and asks itself a number of questions about its digital resilience. At or close to the top is the question: Are my response processes still fit for purpose?
This is perhaps the most fundamental agreement between security and the business: what the expectation is, not just today, but over the next 12, 24, or 36 months. Because the world is so dynamic, I would go further and ask whether that is simply too hard for most businesses to predict.
You should also consider that the complexity of incidents varies enormously. This is why most SOCs have tiers of capability, and it's worth considering the expected response level at each tier. It is in part why many are moving to a mixed model of in-house and as-a-service, using their own skills and capabilities where they fit best and supplementing the gaps with external services. For many it's simpler to outsource the entire SOC to a managed service, so that you hold someone else accountable for achieving your expected outcomes.
Either way, you need to consider what “response” means. Typically, Mean Time to Detect (MTTD) and Mean Time to Respond (MTTR) are the key metrics used. However, in the Cyber Defenders Council we have been discussing what “response” really means. Is it tactical, i.e. we have stopped this iteration of the adversary, and so they will be knocking on our door again within hours?
Or have we stopped to think like the adversary and put more systemic controls in place, so that the adversary has to go back as far as possible to ground zero, i.e. they have to start again completely to be successful?
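To make the distinction concrete, here is a minimal Python sketch of how MTTD and MTTR might be computed next to a simple recurrence count. It assumes MTTD is measured from first malicious activity to detection and MTTR from detection to resolution; definitions vary between teams. The `Incident` fields, the adversary labels, the 30-day window, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    # Hypothetical fields; map them to whatever your case tool records.
    adversary: str      # label for the actor or campaign, if known
    started: datetime   # earliest evidence of malicious activity
    detected: datetime  # when the SOC first flagged it
    resolved: datetime  # when the response was declared complete


def mttd_hours(incidents):
    """Mean time from first activity to detection, in hours."""
    return mean((i.detected - i.started).total_seconds() for i in incidents) / 3600


def mttr_hours(incidents):
    """Mean time from detection to resolution, in hours."""
    return mean((i.resolved - i.detected).total_seconds() for i in incidents) / 3600


def repeat_incidents(incidents, window=timedelta(days=30)):
    """Count incidents that start within `window` of the same adversary's
    previous resolution (or before it). The window is an assumption."""
    last_resolved = {}
    repeats = 0
    for i in sorted(incidents, key=lambda x: x.started):
        previous = last_resolved.get(i.adversary)
        if previous is not None and i.started - previous <= window:
            repeats += 1
        last_resolved[i.adversary] = i.resolved
    return repeats


# Illustrative data only.
incidents = [
    Incident("actor-a", datetime(2024, 1, 1, 0), datetime(2024, 1, 1, 4), datetime(2024, 1, 1, 10)),
    Incident("actor-a", datetime(2024, 1, 2, 0), datetime(2024, 1, 2, 2), datetime(2024, 1, 2, 6)),
    Incident("actor-b", datetime(2024, 3, 1, 0), datetime(2024, 3, 1, 6), datetime(2024, 3, 1, 8)),
]

print(f"MTTD: {mttd_hours(incidents):.1f} h")
print(f"MTTR: {mttr_hours(incidents):.1f} h")
print(f"Repeat incidents: {repeat_incidents(incidents)}")
```

In this sample both averages look healthy, yet the same adversary was back within a day of the first resolution. A recurrence count of this kind is one possible signal of a response that was tactical rather than systemic.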
Take ransomware as an example: we know that 80% of those that paid a ransom were attacked again. The question is whether every systemic step that could be taken was taken. All too often I have seen first hand that SOC teams are under pressure from the sheer volume of security events, so it's a little like being on a production line: no matter how hard or fast you work, the events just keep on coming.
If we measure purely on aspects such as MTTD and MTTR, we effectively encourage a generic “production line” mentality. Instead we need to consider, much as Henry Ford did as he built the first car production line: What are the efficiencies you are looking to achieve?
Ford had 3 key principles:
Over the years, we have continued to add to and modify our SOC production lines. The question should be: Are they optimised for the skills and capabilities available today? I would also challenge you on your mindset and goals: What should our few simple principles and metrics be?
Not so long ago, I attended a tour of a very large German car manufacturer's production line. It was amazing to watch how every step had been carefully considered and optimised. And it made me think back to typical SOC production lines.
Right from the start of the car's production, there is an associated “build sheet” that ensures each specific part for that specific car arrives just in time to be used as the car moves along the line. It means the engineer isn’t searching through parts bins or deciding which part to use; they simply reach round and the right part is there. It's the only selection.
Now consider what happens in the SOC production line: an artefact is discovered, and the humans then have to go and search to figure out whether there is a product that matches the part they have found. If they can find that match, they then have to find the production plans and effectively forage through every parts bin to find the other parts needed to build out the attack.
When you consider how we piece together attacks versus how these cars were being made, we are far from having an efficient production line in the SOC. As such, my first principle and metric would be around the quality of what we give to analysts.
How many potential incidents are analysed per day that turn out not to be an incident? In the car world, this would be having extra parts in the bin that don’t fit. What's key is that this slows down the production of genuine threat analysis, as you are trying to rule out whether they fit.
When analysed in their own right, these are time consuming, because it takes longer to verify that something is not a threat than to confirm that it is. As such, one of the best ways to optimise is to weed out the noise.
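As a rough illustration of that quality metric, the following Python sketch computes what share of analysed cases turned out not to be incidents, and what share of analyst time they consumed. The triage log format, the verdict labels, and the numbers are hypothetical and purely illustrative.

```python
from collections import Counter

# Hypothetical triage log: (verdict, analyst_minutes). Values are illustrative only.
triage_log = [
    ("incident", 20),
    ("not_incident", 45),
    ("not_incident", 30),
    ("incident", 25),
    ("not_incident", 40),
]


def noise_summary(log):
    """Return the share of cases, and of analyst time, spent on non-incidents."""
    if not log:
        raise ValueError("triage log is empty")
    cases = Counter()
    minutes = Counter()
    for verdict, spent in log:
        cases[verdict] += 1
        minutes[verdict] += spent
    return {
        "non_incident_share": cases["not_incident"] / sum(cases.values()),
        "non_incident_time_share": minutes["not_incident"] / sum(minutes.values()),
    }


summary = noise_summary(triage_log)
print(f"Cases that were not incidents: {summary['non_incident_share']:.0%}")
print(f"Analyst time spent on them:    {summary['non_incident_time_share']:.0%}")
```

Tracking the time share alongside the case share matters here, because ruling something out tends to cost more per case than confirming it.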
We need to look at the whole picture and its outcomes, not just pieces of the puzzle, and typically what comes to a SOC team is a bucket full of bad stuff. Detection requires known Indicators of Compromise (IOCs) or unknown Indicators of Behavior (IOBs) based on chains of potentially malicious activity used during a compromise. But at the heart of every decision to take remedial action is the thought process: Do I have enough evidence to be confident that what I’m seeing is what I think it is?
Henry Ford's second principle is somewhat similar to the problem SOCs face: data from multiple sources in multiple structures. Turning that into a production line requires a production process that's scalable and easy to achieve. In recent years MITRE has provided a scalable production blueprint in its ATT&CK framework.
Having a structure allows you to more easily put together the pieces in the same context as the adversary. Key points of continual assessment are how quickly this can be done, and what degree of confidence it gives you that the threat has been correctly identified. As the data pools can be large, computational power is increasingly important.
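As a minimal sketch of what such a structure buys you, the Python below groups alerts by host and orders them by ATT&CK tactic, so that scattered artefacts read as one chain. The tactic names follow the Enterprise ATT&CK matrix as I understand it at the time of writing, so check the current matrix before relying on them. The alert fields, host names, and the coverage score are hypothetical, and matrix order is only an approximation of how a real attack unfolds.

```python
from collections import defaultdict

# Enterprise ATT&CK tactics in matrix order (verify against the current matrix).
TACTIC_ORDER = [
    "Reconnaissance", "Resource Development", "Initial Access", "Execution",
    "Persistence", "Privilege Escalation", "Defense Evasion", "Credential Access",
    "Discovery", "Lateral Movement", "Collection", "Command and Control",
    "Exfiltration", "Impact",
]

# Hypothetical alerts, already tagged with a tactic by some upstream tool.
alerts = [
    {"host": "ws-17", "tactic": "Command and Control", "detail": "beacon to rare domain"},
    {"host": "ws-17", "tactic": "Initial Access", "detail": "macro-enabled attachment opened"},
    {"host": "db-02", "tactic": "Discovery", "detail": "account enumeration"},
    {"host": "ws-17", "tactic": "Execution", "detail": "script interpreter spawned by office app"},
]


def build_chains(alerts):
    """Group alerts per host and sort each group into tactic order."""
    rank = {name: position for position, name in enumerate(TACTIC_ORDER)}
    chains = defaultdict(list)
    for alert in alerts:
        chains[alert["host"]].append(alert)
    for chain in chains.values():
        chain.sort(key=lambda a: rank[a["tactic"]])
    return dict(chains)


def coverage(chain):
    """Distinct tactics observed: a crude, assumed proxy for confidence."""
    return len({a["tactic"] for a in chain})


chains = build_chains(alerts)
for host, chain in chains.items():
    steps = " -> ".join(a["tactic"] for a in chain)
    print(f"{host}: {steps} (tactics seen: {coverage(chain)})")
```

A chain that spans several tactics on one host is a stronger candidate for analyst attention than a lone alert, which is the kind of prioritisation a common structure makes possible.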
What is the longevity of the remediation steps? A common structure allows you to map the threat by thinking like the adversary: How will the attack work end to end? What impact would it have on my business, which is key to prioritisation? And of course, what steps do you take to remediate?
It's typically easy to block a new binary or a registry key change, but at the same time these are the aspects that are easiest for the adversary to also change. As such, the question has to be where can you make easy changes that would have more long term impact?
For example, most attackers leverage either their own or, more typically, compromised public infrastructure to deliver the attack and enable command and control (C2) communications. Blocking their method of command and control, for example, requires far more effort on the adversary's part to resolve, as they now have to redesign how their attack functions rather than simply change what it looks like, as they would when you block the binary.
SOC modernisation is (at least for the foreseeable future) an evergreen problem: Technology is increasingly key in business, so the time to respond is reducing. Complexity is growing in every aspect of how the technology is being used and how it is being protected against both the adversary and their techniques.
At the heart of this, we need to keep rethinking what our optimum production line looks like. There are huge amounts of data that can be generated; indeed at times many security products were seen as better if they produced more alerts. Today we need to move from an alert gathering and processing mindset to a threat disruption mindset.
Think like the adversary by understanding the threat world and its potential impact on your business. Build the production line (SOC) that allows you to confidently aggregate data into malicious operations (MalOps), adopt an offensive mindset, and ask yourself: how do I disrupt rather than simply block the adversary?