LLM risks from A to Z

Code reflected in a man's eyeglasses

A new paper from SRI and Brazil’s Instituto Eldorado delivers a comprehensive update on the security risks to large language models.


“I get a new pre-print paper about AI-related security risks in my inbox almost every day,” says SRI advanced computer scientist Briland Hitaj.

While that might seem like a good thing, it has its drawbacks. For researchers working on AI security, the danger of information overload is very real. And it’s not just a problem for researchers — it’s also problem for information security teams in organizations and governments. Security professionals are looking to the research community for both updates on emerging threats and data-driven analysis of how those threats might be disrupted or contained. A muddled information space makes their jobs that much harder.

To confront this information overload, researchers at SRI and Brazil’s Instituto Eldorado decided to collaborate on a paper that would provide the global cybersecurity community with a comprehensive analysis of every potential cyber risk that surrounds today’s large language models (LLMs).

“We wanted to make sense of all of that noise,” comments Hitaj.

The result is a timely paper analyzing more than 25 distinct threats that researchers and cybersecurity teams need to consider in order to secure LLM-related workflows.

The state of LLM risks

To understand the current state of risks around LLMs, SRI and Instituto Eldorado spent more than a year examining more than a thousand papers that captured relevant risks, ultimately down-selecting to the 300-or-so papers that represented the highest-quality scholarly work on those risks.

“We went really deep,” explains Instituto Eldorado researcher Vitor Hugo Galhardo Moia, “looking at the entire training-to-deployment pipeline. We wanted to identify and understand attacks and threats on all the different components of the pipeline and how distinct LLM use cases are affected.”

“LLMs provide this natural language interface where the right prompt can become a back door to more sophisticated, more complicated and sensitive systems within a network.” — Briland Hitaj

That meant looking at more than just the large language models themselves. The researchers considered the various software applications, data storage practices, and human actions that might compromise the output of LLMs. These threats range from data poisoning and various kinds of jailbreaking to strategies like “time consuming” and “token wasting,” which don’t necessarily impact the outputs of the model, but can exert a drain on the system, resulting in slow performance, inefficient energy use, and even outright service disruption.

All told, the team identified more than 25 threat vectors, providing an overall risk score for each vector. The team also documented nearly 50 classes of mitigation techniques, and mapped attack strategies with corresponding mitigation techniques.

How the paper advances AI security

The researchers at SRI and Instituto Eldorado see the paper as more than an academic exercise. The aim was to create a pragmatic resource for security practitioners who need some guidance in finding the best papers on AI-related risks. All of these individuals, the authors observe, are getting bombarded daily by research articles that may or may not reflect high-quality work.

“One of our major contributions,” says Ulf Lindqvist, senior technical director at SRI, “is making a conscious effort to curate the very best research currently available. If you want to accelerate your journey into AI security and AI red-teaming, you will know where to start and what to read.”

Another high-level takeaway from the paper is the growing recognition that the improving capabilities of LLMs themselves can, paradoxically, amplify the risks to LLMs.

“LLMs provide this natural language interface where the right prompt can become a back door to more sophisticated, more complicated and sensitive systems within a network,” Hitaj points out.

An early example, he points out, is the directive to “ignore all previous instructions,” a method that bad actors quickly discovered could cause LLMs to misbehave. As these tactics became more sophisticated, stronger security and privacy attacks like “membership inference” attacks were developed. These were shown to force LLMs to reveal the data that was used in their training, including sensitive data that can pose major privacy risks.

The biggest unknown, Hitaj observes, is that we simply can’t predict what the next natural language attack might look like.

“There’s always that next prompt,” he warns. “That next smart way to bypass safeguards. We’ve come a long way since those early natural language attacks, but that doesn’t mean that the problem is solved. This problem is very much still open. And it turns out that the more the model learns, the more it may become willing to reveal information. For an adversary, it just becomes a matter of patience, and how crafty they can be.”

“AI security must be at the core of technological development,” adds Mateus Pierre, R&D director at Instituto Eldorado. “With this work, we aim to support the community and our partners in creating and protecting generative AI solutions that combine power and reliability.”

Read the paper or learn more about SRI’s security-related innovations.


Read more from SRI