Sikka brings expertise in deep learning and multimodal learning to improve how social media operates, including how we use and communicate on it.
Karan Sikka is a computer scientist at SRI’s Center for Vision Technologies, where he applies his expertise in vision, language, and behavior understanding to improve social media algorithms, data, and the way users communicate. Here, Sikka discusses the value of collaboration and support in creating innovative technologies:
I came to SRI as an intern six months before I finished my PhD in 2015 at the University of California San Diego. I heard of SRI through other students who had internships here and had their names in published papers—and thought: “SRI hires good people who do interesting work—I want to be there.”
Joining as an intern greatly influenced the focus of my career, both in what I wanted to do and in knowing I wanted to be at SRI.
Early in my career, I specialized in facial emotion recognition and video behavior understanding. After joining SRI full-time, I remained broadly focused on vision and language, but my main research now involves social media, which wasn’t something I had much experience in originally. Deep learning and multimodal learning were just starting to catch fire as important subjects at the time, so I was lucky to start with at least three programs that allowed me to do a lot of research.
The primary process we developed was multimodal embedding, where the key idea is to learn vector representations for content and users that live in the same space, such that similar users and content end up close together and dissimilar ones end up far apart.
Our primary aim was to understand what users are interested in by having the model review a large amount of data, without requiring humans to label it.
We wanted to know: If a certain kind of post is sent to such-and-such user, will they be interested in it? We also wanted to be able to do this in many different ways, not just with images or text or user types but with combinations of all three, which is the multimodal aspect of the process. On the way to our goal, we ended up developing a lot of intellectual property and published many papers in addition to creating the embedding process.
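The core idea of multimodal embedding can be sketched in a few lines. The sketch below is purely illustrative and is not SRI's actual model: it uses toy random projection matrices (in a real system these would be learned, e.g. with a contrastive objective) to map image, text, and user features into one shared space, then ranks posts for a user by cosine similarity.

```python
# Minimal sketch of multimodal embedding: project features from several
# modalities into one shared vector space, then score content for a user
# by cosine similarity. Dimensions and projections are toy placeholders.
import numpy as np

rng = np.random.default_rng(0)
EMBED_DIM = 8

# Hypothetical "learned" projections, one per modality (fixed random
# matrices here; in practice these would be trained neural encoders).
W_image = rng.standard_normal((16, EMBED_DIM))
W_text = rng.standard_normal((32, EMBED_DIM))
W_user = rng.standard_normal((12, EMBED_DIM))

def embed(features, projection):
    """Project raw features into the shared space and L2-normalize."""
    v = features @ projection
    return v / np.linalg.norm(v)

def rank_content(user_vec, content_vecs):
    """Return post indices ordered by cosine similarity to the user."""
    scores = content_vecs @ user_vec  # dot product of unit vectors
    return np.argsort(scores)[::-1]

# Toy example: one user and three posts (two image posts, one text post)
# all live in the same 8-dimensional space despite different raw features.
user = embed(rng.standard_normal(12), W_user)
posts = np.stack([
    embed(rng.standard_normal(16), W_image),
    embed(rng.standard_normal(16), W_image),
    embed(rng.standard_normal(32), W_text),
])
order = rank_content(user, posts)  # most relevant post index first
```

Because every modality is normalized into the same space, a single similarity score can compare a user against an image post, a text post, or any combination, which is what makes the approach multimodal.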
The other big area I’ve been focusing on involves algorithms and misinformation in media. The DARPA-funded projects Civil Sanctuary and MIPS (Modeling Influence Pathways) use a combination of AI and large language modeling to identify misinformation and messaging within and across platforms, such as social media.
These projects have the potential to dramatically improve how we interact on these platforms (preventing hate speech) and the information we receive (preventing misinformation), and being behind the technologies that do this is a big privilege for me.
Since I started full time at SRI, I’ve been lucky to work with amazing interns and have support from my manager, which is so vital with research in these areas, where you don’t quite know where the work will take you.
One of the things I’ve really enjoyed about working at SRI is that we can bring in these amazing interns. At our center, they are hands-on colleagues who contribute to our work in behavior modeling, multimodal learning, computer vision, navigation, AI, and machine learning, and they can get their names on published papers, which is important for their career growth. I continue to be driven by the collaborative environment here at the Center and the hands-on, impactful work we do.