With funding from DARPA, researchers are building an AI technology designed to work alongside humans to promote more prosocial online behavior
Social media has connected billions of people around the world, but it has also become a hotbed of hostility and even hate speech. The phenomenon has raised an increasingly urgent question: How do you maintain the positive connections and flow of accurate information among participants while minimizing the significant harms that can come from antisocial behavior? Now, researchers at SRI International are taking on this challenge.
Working alongside scientists at the University of Pittsburgh and funded by the Defense Advanced Research Projects Agency (DARPA), they’re building what is called the Great Facilitator (TGF), an AI technology designed to work alongside humans to generate constructive dialogue among social media participants. “We don’t want to prevent people from expressing what they’re thinking, but we also do not want outright hate speech,” said Karan Sikka, senior computer scientist in SRI’s Center for Vision Technologies and principal investigator for the project. “However, there’s a gray area, and we don’t want to mitigate expression excessively. So, how do you balance free speech versus hateful speech? A pragmatics-based approach became our differentiator.”
The Great Facilitator system comprises two key technologies: first, an automatic assessment of the intention and semantics of a social media post or group conversation; and second, an automatic generation of moderating content based on that assessment.
At the heart of TGF is top-down machine learning that learns from, and ultimately suggests, constructive dialogue based on input from human moderators who make decisions about appropriate content. Content is deemed appropriate based on platform rules and guidelines for acceptable behavior. For example, a given social media platform may have norms that prohibit participants from asking others for money, advertising products or engaging in hate speech or bullying. If a post triggers either of two levels of restrictions, common civility (such as offensive language or outright hate speech) or community guidelines, TGF generates a report for the user explaining why the post falls outside the platform guidelines. Users are encouraged to modify the post themselves or to adopt TGF's suggestions, which give details about how to change the content so that it does not include hate speech, misinformation, bullying or any other content deemed unfit. For example, TGF may respond, "Maybe you could replace your post with…" followed by a suggested rewrite.
A critical feature of TGF is the dynamic interaction with the person behind the post. “TGF encourages users to get a thread-wide picture of the discussion and find a way forward that is constructive and conformant to the platform guidelines,” said Sikka.
“TGF is a big step up in actively intervening on social networks to maintain civility,” said Ajay Divakaran, Senior Technical Director of Vision and Learning in the Center for Vision Technologies. “The technology is part of a multi-pronged social media analytics and applications R&D effort at SRI spanning over a decade. We started out by jointly analyzing posted content and the connections between users, then applied a framework that projected users’ video, audio, text and still imagery to a common geometric space.”
Preliminary results of The Great Facilitator
TGF proof-of-concept tests were run on the social media platform Reddit, host to thousands of active communities. Sikka and his team created a subreddit—a forum dedicated to a specific topic on the site—to conduct studies on real user interactions. The subreddit allowed the team to measure whether users adopted the suggested paraphrased content or wrote their own paraphrases. Successful interactions included posts that replaced the original phrasing with new, improved phrasing, and the team also tracked how often changes followed the suggested wording. The preliminary evidence shows that gently engaging people successfully encouraged a more civil tone in their communication. “This result is great,” said Sikka. “We hope that if people communicate civilly, they communicate more deeply with each other.”
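The study outcomes described above can be tallied with a simple classification: each flagged post either adopted the suggested paraphrase, was rephrased in the user's own words, or was left unchanged. The sketch below is an illustrative assumption about how such counts might be computed, not SRI's actual evaluation code.

```python
# Illustrative tally of interaction outcomes from a moderation study.
# The categories mirror the article's description; the code is a hypothetical
# sketch, not SRI's evaluation pipeline.
from collections import Counter

def classify_outcome(original: str, suggested: str, final: str) -> str:
    """Classify one interaction: did the user keep, adopt, or self-rephrase?"""
    if final == original:
        return "unchanged"
    if final == suggested:
        return "adopted_suggestion"
    return "own_paraphrase"

def tally(interactions):
    """interactions: iterable of (original, suggested, final) post triples."""
    return Counter(classify_outcome(*triple) for triple in interactions)
```

Counting both "adopted_suggestion" and "own_paraphrase" as successes matches the article's framing: the goal is a more civil rewrite, whether or not the user takes the system's exact wording.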
The next step, he said, is to beta test the system against a variety of social platforms to maximize transparency and user control in an efficient, streamlined way, while also reducing the load on human moderators. To this end, Sikka and his team hope to make the technology an intrinsic part of web browsers, which he expects will help to propagate the technology, improve transparency and aid user accessibility and usability. Direct browser integration could provide a standardized filter not just for social platforms, but also for comment sections of digital publications, public forums and online reviews, helping to facilitate more thoughtful and respectful online communities.
This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Agreement No. HR00112290024. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the position or policy of the U.S. Air Force Research Lab, DARPA or the DoD, and no official endorsement should be inferred.