Check Point Research today published a study into OpenAI’s ChatGPT and Codex, highlighting how threat actors can use these tools to produce malicious emails, code, and a full infection chain.

ChatGPT is a prototype chatbot built on OpenAI’s large language model (LLM) whose stated “purpose is to assist with a wide range of tasks and answer questions to the best of my ability.” Codex is an artificial intelligence (AI)-based system that translates natural language to code, according to Check Point and OpenAI.

“Like any technology, ChatGPT’s increased popularity also carries increased risk,” Check Point researchers noted, adding that examples of ChatGPT-generated malicious code or dialogues can easily be found on Twitter.

To illustrate this point, the research team created a full infection flow that includes a phishing email and a malicious Excel file weaponized with macros that download a reverse shell.

The team asked ChatGPT to write a phishing email that appears to come from a fictional web-hosting service, Host4u. Although ChatGPT warned that this content may violate its content policy, it still provided an answer.

After further iteration to clarify the requirements, ChatGPT produced an “excellent” phishing email. The researchers then turned to ChatGPT to generate the malicious VBA code for the Excel document.

“ChatGPT proved that given good textual prompts, it can give you working malicious code,” the Check Point team noted.

Additionally, they tested Codex’s ability to produce a basic reverse shell, using a placeholder IP address and port, along with scanning and mitigation tools.

“Once again, our AI buddies come through for us,” Check Point Research noted. “The infection flow is complete. We created a phishing email with an attached Excel document that contains malicious VBA code that downloads a reverse shell to the target machine.”

“The hard work was done by the AIs, and all that’s left for us to do is to execute the attack,” they added.

The study also demonstrated that these AI tools can augment defenders. Researchers used Codex to generate two Python functions: one that searches files for URLs using the YARA package, and another that queries VirusTotal for the number of detections of a specific file hash.
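Check Point did not publish the functions themselves, but a minimal sketch of what such blue-team helpers might look like follows, assuming the yara-python package and VirusTotal’s public v3 REST API; the YARA rule, function names, and return values here are illustrative rather than the researchers’ actual code.

```python
"""Illustrative sketch of the two defensive helpers described above.

Assumes yara-python (pip install yara-python) and requests; the rule
and function names are hypothetical, not Check Point's actual code.
"""
import requests
import yara

# A simple YARA rule that matches http/https URLs inside a file.
URL_RULE = yara.compile(source=r"""
rule find_urls {
    strings:
        $url = /https?:\/\/[^\s"'<>]+/ ascii wide
    condition:
        $url
}
""")


def extract_urls(path: str) -> set[str]:
    """Scan a file with the YARA rule above and return the matched URLs."""
    urls = set()
    for match in URL_RULE.match(path):
        for s in match.strings:
            try:
                # yara-python >= 4.3: StringMatch objects with instances
                for inst in s.instances:
                    urls.add(inst.matched_data.decode("utf-8", "replace"))
            except AttributeError:
                # yara-python < 4.3: (offset, identifier, data) tuples
                urls.add(s[2].decode("utf-8", "replace"))
    return urls


def vt_detection_count(file_hash: str, api_key: str) -> int:
    """Query VirusTotal's v3 API for how many engines flagged a hash."""
    resp = requests.get(
        f"https://www.virustotal.com/api/v3/files/{file_hash}",
        headers={"x-apikey": api_key},
        timeout=30,
    )
    resp.raise_for_status()
    stats = resp.json()["data"]["attributes"]["last_analysis_stats"]
    return stats["malicious"]
```

Wired together, helpers like these let a threat hunter pull suspect URLs out of an attachment and immediately check the reputation of any file hash against VirusTotal’s aggregated engine verdicts.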

“We hope to spark the imagination of blue teamers and threat hunters to use the new LLMs to automate and improve their work,” they wrote.

The research team underscored the importance of vigilance when using these developing AI technologies, which can change the cyberthreat landscape significantly.

“This is just an elementary showcase of the impact of AI research on cybersecurity. Multiple scripts can be generated easily with slight variations using different wordings. Complicated attack processes can also be automated as well using the LLMs APIs to generate other malicious artifacts,” they wrote.

“Defenders and threat hunters should be vigilant and cautious about adopting this technology quickly, otherwise our community will be one step behind the attackers,” Check Point Research warned.