Skip ahead if you have heard this story, but when I started in anti-virus at Dr Solomon’s, Alan Solomon would share how he moved from hard disk data recovery into antivirus. He received a drive to recover and recognized that the corruption was logical, so to fix the damage he wrote an algorithm (he was a mathematician by education) to undo it. A few months later he was recovering another drive and recognized the same logical corruption, which led him to write an algorithm to detect it; this was how Dr Solomon’s antivirus software started. The point here is that traditional anti-virus has always been based on pattern matching: find something unique to each attack in its code, and you can write an algorithm, or as it is more commonly called these days a signature, to detect, block and repair the attack. I remember Alan saying, in effect, that signatures had solved the virus problem; the volume would continue to grow, as would the complexity, but the same signature solution would always apply.
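To make the idea concrete, here is a minimal sketch of what byte-pattern signature matching looks like. The family names and signature bytes are invented for illustration; real engines use far more sophisticated formats, wildcards and repair logic.

```python
# Minimal sketch of byte-pattern signature scanning (illustrative only;
# the family names and byte sequences below are made up, not real detections).

SIGNATURES = {
    # family name -> a byte sequence assumed to be unique to that family's code
    "Example.Virus.A": bytes.fromhex("deadbeef4150"),
    "Example.Trojan.B": b"\x4d\x5a\x90\x00EVIL",
}

def scan_file(path: str) -> list[str]:
    """Return the names of any signatures found in the file's bytes."""
    with open(path, "rb") as f:
        data = f.read()
    return [name for name, pattern in SIGNATURES.items() if pattern in data]

# Usage: print(scan_file("suspect.bin"))
```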
Adversaries, however, never stand still. Be it polymorphism, metamorphism, fileless attacks, attacks that compile in memory at run time, or AI-generated attacks, the game of cat and mouse has never shown any sign of abating. Whilst most endpoint protection solutions still leverage signatures to help detect, protect and recover, most today also rely on multiple other methods of detection, all of which exist to plug gaps that traditional signature detection struggled to solve.
The first step to thwart signature-based detection was polymorphism: attacks that were dynamically encrypted and decrypted on each compromised system. In many ways it introduced behavioral detection, as this was how the decrypter loaders were identified, and once that had been done, traditional signature detection could be applied to the decrypted payload.
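The sketch below shows why polymorphism breaks a static signature and why it matches again once the payload is decrypted. The payload bytes and the single-byte XOR scheme are invented for illustration; real polymorphic engines are far more elaborate.

```python
# Sketch: the same payload, XOR-encrypted with a per-infection key, never
# looks the same on disk, so the static signature misses it. Once the
# decrypter loader's behavior is spotted and the payload decrypted, the
# signature matches again. Payload bytes and key scheme are illustrative only.
import os

PAYLOAD_SIGNATURE = b"MALICIOUS_CORE_ROUTINE"   # what a signature would match

def polymorphic_sample(payload: bytes) -> tuple[bytes, int]:
    """Simulate one infection: encrypt the payload with a random key."""
    key = os.urandom(1)[0]
    return bytes(b ^ key for b in payload), key

encrypted, key = polymorphic_sample(PAYLOAD_SIGNATURE + b"...rest of virus...")
print(PAYLOAD_SIGNATURE in encrypted)          # False: static signature misses
decrypted = bytes(b ^ key for b in encrypted)  # what the loader does at run time
print(PAYLOAD_SIGNATURE in decrypted)          # True: signature matches again
```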
It was, however, when attacks started to use vulnerabilities to gain access that true behavior-based detection really became the norm. At the time buffer overflow exploits were typical, and rather than keep churning out signatures for each exploit, the industry recognized that you could detect accurately based on the behavior of more memory being written than the process had requested (hence an overflow). This gave longevity of protection without the need for unique identification.
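A toy illustration of that heuristic follows: flag any write larger than the buffer the process asked for. Real products enforce this at the memory-manager and exploit-mitigation layer, not in application code; the class and method names here are hypothetical.

```python
# Toy sketch of the overflow heuristic: compare what a process wrote against
# what it requested. Purely illustrative; names and structure are invented.

class BufferMonitor:
    def __init__(self):
        self.allocations = {}          # buffer id -> requested size in bytes

    def on_alloc(self, buf_id: str, size: int) -> None:
        self.allocations[buf_id] = size

    def on_write(self, buf_id: str, data: bytes) -> bool:
        """Return True if the write exceeds the requested allocation."""
        requested = self.allocations.get(buf_id, 0)
        return len(data) > requested

monitor = BufferMonitor()
monitor.on_alloc("input_buf", 64)
print(monitor.on_write("input_buf", b"A" * 200))   # True -> overflow-like behavior
```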
At the same time a silent battle was occurring for endpoint vendors: the more signatures you add to the detection database, the larger it gets. Today, with hundreds of millions of threats, detection signatures have grown from fitting on a single 360k floppy disk when I started to being hundreds of megabytes in size, which for real-time detection must be loaded into memory. The silent battle, then, has been between the ongoing growth in compute power and memory versus the amount traditional signatures require to function. There has always been a fine balance in how much impact anti-virus has on computing performance.
To combat this and speed up the deployment of prevention controls, one of the key shifts in detection methodology became the use of multiple basic behavioral detection methods on the client to identify anything that looked suspicious, then sending the suspicious files and triggered behaviors to the cloud, where much more compute power was available to run emulation tools against the file, validate whether it was using known threat tactics and techniques, and join the behaviors together to better map them against known attack families. Whether it was McAfee GTI, Symantec GIN, Palo Alto WildFire or another brand, the cloud became the core hub for joining together behavioral detection techniques and validating new threat detections.
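The client-to-cloud hand-off might look something like the sketch below: the agent submits a file hash plus the behaviors it observed and gets a verdict back after deeper analysis. The URL, JSON fields and verdict shape are entirely hypothetical and do not correspond to GTI, GIN, WildFire or any specific vendor's API.

```python
# Sketch of an endpoint agent asking a cloud service for a verdict.
# Endpoint URL and field names are hypothetical, not a real vendor API.
import hashlib
import requests

CLOUD_VERDICT_URL = "https://cloud.example-av.com/v1/verdict"   # hypothetical

def request_verdict(path: str, observed_behaviors: list[str]) -> dict:
    with open(path, "rb") as f:
        sha256 = hashlib.sha256(f.read()).hexdigest()
    submission = {"sha256": sha256, "behaviors": observed_behaviors}
    response = requests.post(CLOUD_VERDICT_URL, json=submission, timeout=10)
    return response.json()   # e.g. {"verdict": "malicious", "family": "..."}

# Usage: request_verdict("dropper.exe", ["writes_to_startup", "spawns_powershell"])
```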
In today's language we would call these machine learning based detection techniques. And as generative AI sweeps across the world, we arrive at what is, for me, a painful question considering my career: are signature detections no longer a front line of defense?
When moving from signatures to behavior-based detections, one other shift occurred. When you could put a name to an attack, you could traditionally look it up in threat intel tools to understand what the attack might do, so you knew what else was required to recover from it. With behavior-based detections you often lose this ability to put a name to the attack. In reality, though, the volume and variance of attacks has become so great that the industry's ability to keep pace in naming them and identifying their impact has long since become a lost battle. EDR has replaced this need in a more effective way: whilst signatures told you what the attack should do, EDR tells you what it actually has done in your environment. As long as you can correlate this across all your business systems, you have a more accurate picture of what needs to be done to recover, rather than what you might need to do to recover, which was the case with signatures.
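A minimal sketch of that correlation idea: gather the recorded events from many hosts into a per-host, time-ordered view so you can see what the attack actually did. The event shapes, hostnames and field names here are invented for illustration; real EDR telemetry is far richer.

```python
# Sketch of correlating EDR telemetry into per-host timelines.
# Events, hostnames and field names are illustrative only.
from collections import defaultdict

events = [
    {"host": "hr-laptop-01", "ts": 1, "action": "email attachment opened"},
    {"host": "hr-laptop-01", "ts": 2, "action": "powershell spawned by winword"},
    {"host": "fileserver-02", "ts": 3, "action": "smb write from hr-laptop-01"},
    {"host": "hr-laptop-01", "ts": 4, "action": "registry run key added"},
]

def correlate(events: list[dict]) -> dict[str, list[str]]:
    """Group events per host and order them in time."""
    timeline = defaultdict(list)
    for e in sorted(events, key=lambda e: e["ts"]):
        timeline[e["host"]].append(e["action"])
    return dict(timeline)

for host, actions in correlate(events).items():
    print(host, "->", " | ".join(actions))
```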
Looking at the evolution already under way in generative AI, we are seeing it become scalable, with Google’s PaLM 2 coming in differently sized versions that scale from on-device to in-cloud. Generative AI models are also being trained to understand cybersecurity. Whilst, when the cloud is available, it makes sense to use it for the heavy lifting of behavioral threat detection, in today's reality there are too many instances where, for numerous different reasons, the cloud either at times or all of the time simply can't be used. In such instances, have we reached the tipping point where AI-based behavioral detection can now do a better job than signatures?
There is an obvious yes answer when connectivity to the cloud really is limited or non-existent: signatures rely on being able to be updated, whereas AI models, once trained, need far less tweaking and updating to be effective.
Over the last decade, and this is the part that makes me feel like an old timer in the industry, I find myself saying more and more often that the techniques and tactics used by threat actors really haven't changed much for a long time now. Yes, they continue to evolve existing ones, and more often than not look for new opportunities, be that technology platforms, markets or geographies, to keep applying the same fundamentals successfully. At the same time the cyber security industry has got better at classifying these tactics and techniques, all of which, if you think about it, empowers the use of AI by providing better taggable training data from which to build new behavior-based detection algorithms.
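As a sketch of how tagged tactics and techniques become training data, consider the toy classifier below. The ATT&CK-style feature names, samples and labels are invented for illustration, and a real behavioral model would be trained on vastly more telemetry.

```python
# Sketch: behaviors tagged with ATT&CK-style technique labels used as
# features for a simple classifier. Samples and labels are invented.
from sklearn.feature_extraction import DictVectorizer
from sklearn.ensemble import RandomForestClassifier

# Each sample: which tagged techniques were observed, and the known outcome.
samples = [
    ({"T1059_command_scripting": 1, "T1547_persistence": 1}, "malicious"),
    ({"T1059_command_scripting": 1, "T1071_c2_channel": 1}, "malicious"),
    ({"T1082_system_info": 1}, "benign"),
    ({"T1057_process_discovery": 1}, "benign"),
]

vectorizer = DictVectorizer()
X = vectorizer.fit_transform([features for features, _ in samples])
y = [label for _, label in samples]

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
new_host = vectorizer.transform([{"T1059_command_scripting": 1, "T1547_persistence": 1}])
print(model.predict(new_host))   # ['malicious']
```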
Summary
Today endpoint protection has long since evolved from simple anti-virus based on a signature database. It uses multiple layers of detection that, more and more, are based on some level of machine learning. And most solutions now combine endpoint protection (EPP) with endpoint detection and response (EDR) capabilities.
However, at the heart, if you like the foundations, of some solutions are the last few legacy signature-based engines (many vendors use the same OEM signature engines). It's hard to let go of what’s helped protect us for three and a half decades, but for many years we have struggled with keeping signatures current and with the overhead they bring to endpoint systems. The reality is that today's adversaries are using ML-based tools and high degrees of automation that allow each attack to be unique, which renders the value of signature-based detection limited.
Threat prevention today is a battle of:
With the focus on these factors, it's clear that behavioral detection better enables being left of bang: detection before the breach. And the scale and scope of such behavioral detection methods will only grow. The key shift now is how AI can help, both with new detection methods and, critically, with correlating the outcomes of behavioral detections into more meaningful and actionable results. And whilst the cloud gives us seemingly endless compute power to do this, there is a clear race to miniaturize generative AI, and the same is occurring in how cyber security vendors leverage such capabilities for threat detection, prevention and response.