Story April 3, 2023

SRI taps AI to hunt for linguistic DNA that proves authorship

With bad information running rampant, assigning authorship is essential. SRI is developing the tools to make it happen.

In an age where unreliable or deceptive information threatens public health, economic well-being, and, perhaps, even democracy itself, verifying who wrote a given piece of content has become more valuable than ever.

Against that backdrop, a team of experts in linguistics and artificial intelligence led by SRI International is developing new ways of examining written text for the “linguistic DNA” that can positively identify its author. The program is known as Signature.

Every writer has certain tendencies—ways of phrasing things, common grammatical or spelling mistakes, capitalization choices, and even ways of laying out their arguments. “These patterns are linguistic DNA—unmistakable and unchangeable,” said Dayne Freitag, the principal investigator for Signature and technical director of SRI’s Artificial Intelligence Center. “Aided by AI, we can spot these patterns to determine who the author of a given text is with near-absolute accuracy.”

Signature works in both directions, Freitag says. It can say affirmatively that the named author of a piece did indeed write the piece. And it can take an anonymous piece and identify who wrote it.

Next-generation tools

Signature’s ability to examine and evaluate patterns in an anonymous author’s written style, punctuation, phrasing, average word length and vocabulary size is part of a field of linguistic research known as stylometrics, Freitag says. But stylometrics can only go so far in a world of 140-character tweets.
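A few of the stylometric features named above (punctuation use, average word length, and vocabulary size) can be illustrated with a minimal Python sketch. This is an illustration only, not Signature's implementation, which the article does not describe; the function name `stylometric_features`, the feature keys, and the simple word-matching rule are all assumptions made for the example.

```python
import re
import string


def stylometric_features(text: str) -> dict:
    """Compute a few simple stylometric features of a text.

    Hypothetical helper for illustration; the feature names and the
    tokenization rule are assumptions, not part of any real system.
    """
    # Treat runs of letters (with an optional internal apostrophe) as words.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    n_words = len(words)
    if n_words == 0:
        return {
            "avg_word_length": 0.0,
            "vocabulary_size": 0,
            "type_token_ratio": 0.0,
            "punctuation_per_word": 0.0,
        }

    # Count ASCII punctuation characters anywhere in the text.
    n_punct = sum(1 for ch in text if ch in string.punctuation)
    vocabulary = set(words)

    return {
        # Mean number of characters per word.
        "avg_word_length": sum(len(w) for w in words) / n_words,
        # Number of distinct words used.
        "vocabulary_size": len(vocabulary),
        # Distinct words divided by total words (lexical variety).
        "type_token_ratio": len(vocabulary) / n_words,
        # Punctuation marks per word.
        "punctuation_per_word": n_punct / n_words,
    }


if __name__ == "__main__":
    sample = "The cat sat. The cat ran!"
    for name, value in stylometric_features(sample).items():
        print(f"{name}: {value}")
```

On a very short text, such as a single tweet, each of these numbers rests on only a handful of words, which is one way to see why surface statistics alone can only go so far.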

Freitag and his team aim to push Signature beyond stylometrics, and they recently received a multi-year contract under the Intelligence Advanced Research Projects Activity (IARPA) HIATUS program to make that a reality.

Freitag will lead a team that includes SRI’s Aaron Lawson, Prof. Chris Reed of the University of Dundee, Profs. Alan Ritter and Wei Xu at the Georgia Institute of Technology, Prof. Yulia Tsvetkov at the University of Washington, linguist Natalie Schilling at Georgetown University, Dr. Adam Bradley of Uncharted Software, and attribution expert Prof. Benno Stein, founder of the PAN conference.

The Signature team will capitalize on a new approach that probes an author’s argumentative habits in addition to stylistic features. Freitag calls this area of research discourse analysis, and the work is being done in collaboration with colleagues at the University of Dundee, a subcontractor on the IARPA contract.

Patterns revealed by discourse analysis are more habitual and less conscious than stylistic patterns and are, therefore, more reliable than stylometrics in attributing authorship. They are also harder to consciously manipulate or mask without harming the text’s meaning.

“To our knowledge, Signature is the first time these high-level discourse features have been used in author identification, and we have barely scratched the surface of what is possible,” Freitag said. The team at the University of Dundee has already identified at least 700 rhetorical patterns, some dating to the time of Aristotle, that may benefit discourse analysis. While not all those patterns are computable, Freitag believes other as-yet-unexplored patterns could expand Signature’s feature set into the many thousands.

Decisive edge

As with many new avenues of AI, the more data available, the more accurate the results. But Freitag and his team are finding ways to make accurate predictions from shorter segments of text, and with less reference content to compare them against. In a world where the pool of potential authors numbers in the millions, these tools could prove decisive.

Such technologies could be a boon in law enforcement, in matters of copyright infringement and plagiarism, and in privacy protection. On the privacy front, Freitag thinks that a project like Signature could lead to applications that identify an author’s tendencies and suggest changes to mask their identity, all while maintaining the original intent of the piece. On a more topical front, Signature might even be used to identify content “written” by generative AI apps, like ChatGPT, to help spot falsified essays in academic settings and on college entrance applications.

“In the age of social media and instantaneous, anonymous global communication, author attribution is more critical than ever,” Freitag says. “Potential applications for Signature are wide open.”

  • Copyright © 2023 SRI International