Generative AI & data: Potential in cybersecurity if the risks can be curtailed

Artificial intelligence (AI) in 2023 feels a bit like déjà vu to me. Back in 2001, as I was just entering the venture industry, I remember the typical VC reaction to a start-up pitch was, “Can’t Microsoft replicate your product with 20 people and a few months of effort, given the resources they have?” Today, any time a new company is pitching its product that uses AI to do ‘X,’ the VC industry asks, “Can’t ChatGPT do that?”

Twenty-two years later, Microsoft is at the table once again. This time they’re making a $13 billion bet by partnering with OpenAI and bringing to market new products like Security Copilot to make sense of the threat landscape using the recently launched text-generating GPT-4 (more on that below). But just as Microsoft did not inhibit the success of thousands of software start-ups in the early 2000s, I do not expect Microsoft or any vendor to own this new AI-enabled market. 

However, the market explosion and hype around AI across the business and investment spectrum over the past few months have led people to ask: what are we to make of it all? And more specifically, how do CIOs, CSOs, and cybersecurity teams learn to deal with technology that may pose serious security and privacy risks?

The good, the bad, and the scary

I look at the good, the bad, and the scary of this recent Microsoft announcement. What’s incredible about ChatGPT and its offspring is that they bring an accessible level of functionality to the masses. They are versatile, easy to use, and usually produce solid results.

Traditionally, organizations have needed sophisticated, trained analysts to sort through, analyze, and run processes for their security data. This required knowledge of particular query languages and configurations relevant to each product, like Splunk, Elastic, Palo Alto/Demisto, and QRadar. It was a difficult task, and the available talent pool was never large enough.

That difficulty in SIEM (Security Information and Event Management) and SOAR (Security Orchestration, Automation, and Response) still exists today. SIEM helps enterprises collect and analyze security-related data from servers, applications, and network devices. The data is analyzed to identify potential security threats, alert security teams to suspicious activity, and provide insights into a company’s security defenses. SIEM systems typically use advanced analytics to identify patterns, anomalies, and other indicators of potential threats.
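To make the idea of "identifying anomalies" concrete, here is a minimal sketch of one very simple analytic: counting failed logins per host and flagging hosts that sit far above the average. The event fields (`host`, `action`), the `login_failed` label, and the function name are assumptions made up for illustration; real SIEM products use their own schemas and far richer analytics.

```python
from collections import Counter
from statistics import mean, pstdev


def flag_anomalous_hosts(events, threshold=2.0):
    """Return hosts whose failed-login count is more than `threshold`
    standard deviations above the mean across hosts with failures.

    `events` is a list of dicts with hypothetical keys "host" and "action".
    Hosts with no failed logins are not part of the baseline.
    """
    counts = Counter(
        e["host"] for e in events if e["action"] == "login_failed"
    )
    if len(counts) < 2:
        return []  # not enough hosts to compare against each other

    avg = mean(counts.values())
    spread = pstdev(counts.values())
    if spread == 0:
        return []  # every host looks the same, nothing stands out

    return sorted(
        host for host, n in counts.items() if (n - avg) / spread > threshold
    )
```

Given five hosts with one failed login each and a sixth with fifty, only the sixth would be flagged. The threshold is a tuning knob, and choosing it well is exactly the kind of judgment that has traditionally required a trained analyst.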

SOAR builds on SIEM capabilities by automating security workflows and helping businesses respond more quickly and efficiently to security incidents. SOAR platforms can integrate with various security products, including enterprise firewalls, intrusion detection systems, and vulnerability scanners. SIEM/SOAR is where you orchestrate the actions of an incident response plan, and those actions drive the remediation process. Managing the process and products involved in remediation is difficult.
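The orchestration idea can be sketched as a playbook table that maps an alert type to an ordered list of response steps. Everything here is hypothetical: the alert fields, the playbook names, and the step functions, which only describe what they would do rather than call any real firewall or ticketing API.

```python
def block_ip(alert):
    # Stand-in for a call to a firewall integration.
    return f"block {alert['source_ip']} at firewall"


def isolate_host(alert):
    # Stand-in for a call to an endpoint-isolation integration.
    return f"isolate {alert['host']}"


def open_ticket(alert):
    # Stand-in for a call to a ticketing integration.
    return f"open ticket for {alert['type']} on {alert['host']}"


# Hypothetical playbooks: alert type -> ordered response steps.
PLAYBOOKS = {
    "brute_force": [block_ip, open_ticket],
    "malware": [isolate_host, open_ticket],
}


def respond(alert):
    """Run the playbook for this alert type; unknown types only get a ticket."""
    steps = PLAYBOOKS.get(alert["type"], [open_ticket])
    return [step(alert) for step in steps]
```

Even in this toy form, the difficulty is visible: each step depends on a separate product integration, and someone has to decide which steps belong in which playbook and in what order.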

Now, Microsoft is putting a stake in the ground with its generative AI Security Copilot tool. With Security Copilot, the tech company is looking to boost the capability of its data security products for deep integrated analysis and responses. By integrating GPT-4 into Security Copilot, Microsoft hopes to work with companies to

more easily identify malicious activity;

summarize and make sense of threat intelligence;

gather data on various attack incidents by prioritizing the type and level of incidents; and

recommend to clients how to remove and remediate diverse threats in real-time.

And guess what? Theoretically, it should be easier to sort through all that data using GPT APIs and other tools, or at least to figure out how to apply them to incident data. These systems should also make automated response and orchestration much simpler.

Overall, the emergence of GPT-4 may be a step towards the industry’s dream of “Moneyball for cyber,” allowing for a more robust defensive posture by leveraging the experience and wisdom of the crowds. And it will allow for a stronger defense for smaller organizations that do not have sufficient resources and expertise today.

It’s all about trust

However, there are still significant obstacles to overcome regarding adoption and trust. First and foremost, many organizations are still reluctant to share their incident data with others, even if de-identified, as it could potentially lead to leaked information, bad press, and brand damage. Sharing has been talked about for years, but for these reasons it is rarely done in a systematic or technology-delivered manner. The best sharing practice followed today is industry CISOs talking amongst their tight peer group when something significant occurs. Thus, given the previous reluctance to share in any meaningful way, I suspect that the industry will take a long time to put its data in this or any third-party platform for fear that it exposes them in some way.

Another hurdle is overcoming hesitancy about privacy and security concerns. Microsoft claims that integrating data into its systems will maintain privacy and security, and that Security Copilot will neither train on nor learn from its customers’ incident or vulnerability data. However, without full transparency, the market will have lingering doubts. Users may fear that attackers will use the same GPT-based platform to develop attacks that target the vulnerabilities it has become aware of in their systems, no matter what the enterprise license agreement (ELA) states to the contrary. Wouldn’t an attacker love to ask, “Write an exploit that allows me to navigate the defenses at Corporation X?”

There is also a question about how the system can learn from the newest attacks if it is not training on the data from customer organizations. The system would be more powerful if it did learn in the wild from customer incident and vulnerability data.

Even assuming full transparency on security and privacy, and even if the system learns no specific details from any one customer, it can still draw on a wide aperture of knowledge from other public and non-public sources. Won’t this AI-based system become an adversary’s favorite exploit development tool?

Given all of this, there are potential risks and rewards involved in using ChatGPT in cybersecurity.

Microsoft has major ambitions for Security Copilot. It’s a tall order to fill, and I hope they get it right for everyone’s sake.

Know the potential consequences

GPT-4 under Microsoft auspices might be a great tool if Microsoft figures out ways to cut off all that potentially harmful activity. If it can train the system to focus on the positive, and do so without compromising proprietary internal data, it would be a potent tool for mainstream analysis of security incidents. To date, this has only been done with very sophisticated, high-priced people and complex systems that cater to the higher end of the market.

But suppose the mid-tier companies, which can’t afford top-quality cybersecurity resources or the best data security teams, choose to open up their data to Microsoft and GPT-4? In that case, I just hope they know there may be side effects. Caveat emptor!
