Among the other attacks created by Bargury is a demonstration of how a hacker—who, again, must already have hijacked an email account—can gain access to sensitive information, such as people’s salaries, without triggering Microsoft’s protections for sensitive files. When asking for the data, Bargury’s prompt demands the system does not provide references to the files data is taken from. “A bit of bullying does help,” Bargury says.

In other instances, he shows how an attacker—who doesn’t have access to email accounts but poisons the AI’s database by sending it a malicious email—can manipulate answers about banking information to provide their own bank details. “Every time you give AI access to data, that is a way for an attacker to get in,” Bargury says.

Another demo shows how an external hacker could get some limited information about whether an upcoming company earnings call will be good or bad, while the final instance, Bargury says, turns Copilot into a “malicious insider” by providing users with links to phishing websites.

Phillip Misner, head of AI incident detection and response at Microsoft, says the company appreciates Bargury identifying the vulnerability and says it has been working with him to assess the findings. “The risks of post-compromise abuse of AI are similar to other post-compromise techniques,” Misner says. “Security prevention and monitoring across environments and identities help mitigate or stop such behaviors.”

As generative AI systems, such as OpenAI’s ChatGPT, Microsoft’s Copilot, and Google’s Gemini, have developed in the past two years, they’ve moved onto a trajectory where they may eventually be completing tasks for people, like booking meetings or online shopping. However, security researchers have consistently highlighted that allowing external data into AI systems, such as through emails or accessing content from websites, creates security risks through indirect prompt injection and poisoning attacks.

“I think it’s not that well understood how much more effective an attacker can actually become now,” says Johann Rehberger, a security researcher and red team director, who has extensively demonstrated security weaknesses in AI systems. “What we have to be worried [about] now is actually what is the LLM producing and sending out to the user.”

Bargury says Microsoft has put a lot of effort into protecting its Copilot system from prompt injection attacks, but he says he found ways to exploit it by unraveling how the system is built. This included extracting the internal system prompt, he says, and working out how it can access enterprise resources and the techniques it uses to do so. “You talk to Copilot and it’s a limited conversation, because Microsoft has put a lot of controls,” he says. “But once you use a few magic words, it opens up and you can do whatever you want.”

Rehberger broadly warns that some data issues are linked to the long-standing problem of companies allowing too many employees access to files and not properly setting access permissions across their organizations. “Now imagine you put Copilot on top of that problem,” Rehberger says. He says he has used AI systems to search for common passwords, such as Password123, and it has returned results from within companies.

Both Rehberger and Bargury say there needs to be more focus on monitoring what an AI produces and sends out to a user. “The risk is about how AI interacts with your environment, how it interacts with your data, how it performs operations on your behalf,” Bargury says. “You need to figure out what the AI agent does on a user’s behalf. And does that make sense with what the user actually asked for.”

Shares:

Leave a Reply

Your email address will not be published. Required fields are marked *