SRI researchers tap Large Language Models to improve password security

Take security seriously by using passphrases and two-factor authentication.

Using advanced AI to analyze billions of leaked passwords, SRI researchers reveal the risky human tendencies that make passwords far too easy to guess.

If you think that clever-yet-memorable password you just thought up for your bank accounts is fool-proof, SRI’s Briland Hitaj has a word of warning: Think again.

Hitaj is an Advanced Computer Scientist at SRI’s Computer Science Laboratory and a cybersecurity expert. In several peer-reviewed studies, he has used emerging artificial intelligence methods, such as the large language models (LLMs) behind ChatGPT, to crack a slew of human-generated passwords.

In one recent paper, Hitaj and his collaborators, Javier Rando of ETH Zurich and Fernando Perez-Cruz of the Swiss Data Science Center, introduced PassGPT, which uses LLMs to reveal the all-too-human patterns people employ when creating passwords. Despite the password creators’ best intentions, these patterns actually make their passwords easier to guess.

“PassGPT was able to guess twice as many previously unseen passwords as any prior model,” Hitaj notes.

The adversary’s adversary

Hitaj imagines that an application like PassGPT can raise broader awareness of password security concerns and can also be used to develop new strength estimators and randomizers that help everyday users improve their passwords.

“AI reveals the weaknesses of human-generated passwords and can be used to warn people of weak passwords and to suggest better, stronger passwords,” he says.

He knows from his work, for instance, that most people build passwords from words that are important to them, such as the name of a pet or a loved one, often substituting lookalike characters for standard letters: 4’s for A’s, zeros for O’s, 3’s for E’s and so forth.

“We’ve all heard the story of people who use ‘password123456’ as their password — which I don’t recommend, by the way — but if you think making your password ‘pa5swØrd’ is more secure, we should talk,” Hitaj says.
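To see why lookalike substitutions add so little protection, consider the minimal Python sketch below. It is not PassGPT or any tool from Hitaj’s research; the substitution table and the tiny word list are illustrative assumptions. It simply shows that a few lines of code can undo the disguise and recover the underlying word.

```python
# Hypothetical lookalike table (an assumption for illustration);
# real guessing tools apply far larger rule sets.
LOOKALIKES = str.maketrans({
    "4": "a", "@": "a",
    "0": "o", "Ø": "o", "ø": "o",
    "3": "e",
    "5": "s", "$": "s",
})

# Illustrative stand-in for a list of commonly leaked passwords.
COMMON_WORDS = {"password", "iloveyou", "qwerty"}


def normalize(candidate: str) -> str:
    """Undo common lookalike substitutions, then lowercase the result."""
    return candidate.translate(LOOKALIKES).lower()


def is_disguised_common_word(candidate: str) -> bool:
    """Return True if the candidate reduces to a word on the common list."""
    return normalize(candidate) in COMMON_WORDS


if __name__ == "__main__":
    for guess in ["pa5swØrd", "P4ssw0rd", "ilov3you"]:
        print(guess, "->", normalize(guess), is_disguised_common_word(guess))
```

An attacker can run the same substitutions in the other direction, generating every lookalike variant of a common word, so the disguised version is guessed almost as quickly as the original.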

What’s more troubling than weak passwords, Hitaj says, is that many people reuse the same password across multiple accounts, which increases the reward for bad actors who crack it. Once a cybercriminal knows a single password, they can gain access to every account that shares it, including that user’s important personal information and financial accounts.


A light on the dark web

Hitaj notes that hackers have stolen vast databases of passwords and posted them on the dark web for other hackers to exploit. In fact, Hitaj himself used those very leaked databases as the basis for PassGPT’s analysis.

One of the largest password leaks yet, known as RockYou, included 14 million passwords. Hitaj’s analysis of the RockYou database revealed, for instance, that variations of “iloveyou” are among the most common passwords — ilovetyler4ever, ilovematt4eva, ilovehotmail, iloveyousomuch among them. Hitaj notes with chagrin that the suffixes “4ever” and “4eva” are particularly common variations.

Another of Hitaj’s AI-based algorithms is PassGAN, which uses a generative adversarial network (GAN) to autonomously analyze password leaks and generate high-quality password guesses based on the patterns the AI has revealed.

“With PassGAN we trained a neural network to learn typical password characteristics and structures — the patterns people use to create passwords — and create guesses using those rules,” Hitaj explains. Often, those guesses turn out to be right.

Secure advice

As for advice on creating stronger passwords, Hitaj says you may want to be cautious about trusting the strength estimators on websites that tell you whether your password is “strong” or “weak.” AI-based password-guessing strategies like PassGAN and PassGPT are constantly pushing these boundaries.

He also recommends using passphrases instead of passwords. Unlike passwords of eight or 10 characters, passphrases run to 20, 30, or more characters. They are memorable and much harder to guess. Better yet, he says, use the random password generators built into most browsers today.
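A rough calculation shows why length matters. The sketch below estimates the guessing difficulty, in bits, of a secret whose symbols are each chosen uniformly at random. The 94-character alphabet (printable ASCII without the space) and the 7,776-word list (the size of a typical dice-based word list) are assumptions for illustration, and the formula applies only to randomly generated secrets. Passwords people invent themselves follow patterns and are much weaker than these figures suggest.

```python
import math


def random_secret_bits(symbols: int, choices_per_symbol: int) -> float:
    """Bits of entropy when each symbol is picked uniformly at random."""
    return symbols * math.log2(choices_per_symbol)


# Assumed alphabet sizes (illustrative, not taken from the article).
PRINTABLE_ASCII = 94   # letters, digits, and punctuation
WORD_LIST_SIZE = 7776  # words in a typical dice-based word list

if __name__ == "__main__":
    print(f"8 random characters:  {random_secret_bits(8, PRINTABLE_ASCII):.1f} bits")
    print(f"10 random characters: {random_secret_bits(10, PRINTABLE_ASCII):.1f} bits")
    print(f"6 random words:       {random_secret_bits(6, WORD_LIST_SIZE):.1f} bits")
```

Each extra bit doubles the number of guesses an attacker needs on average, so under these assumptions a six-word random passphrase is far harder to guess than an eight-character random password, while remaining easier to remember.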

“If I have any advice, it is to randomize as much as possible. These passwords may not be memorable, but they will keep your personal data and your money safe,” Hitaj says, adding: “And, for your most important data — bank accounts, retirement savings, medical records, tax accounts, email — please, please set up two-factor authentication.”
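For readers curious what “randomize as much as possible” looks like in practice, here is a minimal Python sketch using the standard library’s `secrets` module, which is designed for security-sensitive random choices. The function names, default lengths, and the tiny sample word list are hypothetical; this is not how any particular browser or password manager works, and a real passphrase generator would draw from a list of thousands of words.

```python
import secrets
import string

# Characters to draw from: letters, digits, and punctuation.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def random_password(length: int = 20) -> str:
    """Build a password by picking each character uniformly at random."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def random_passphrase(words: list[str], count: int = 6, sep: str = "-") -> str:
    """Join randomly chosen words from the supplied list into a passphrase."""
    return sep.join(secrets.choice(words) for _ in range(count))


if __name__ == "__main__":
    # Tiny sample list for demonstration only; far too small for real use.
    sample_words = ["apple", "river", "cloud", "stone", "maple", "tiger"]
    print(random_password())
    print(random_passphrase(sample_words))
```

In everyday use, the generator built into a browser or password manager does the same job and also stores the result, so nothing has to be memorized.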
