EnCORE Workshop on Interpretability in Modern AI - Foundations, Methods, and Emerging Directions

📅 Date: Feb 25–27, 2026 · 📍 Location: UC San Diego Campus (see workshop schedule below)


Objective

The goal of this EnCORE workshop is to bring together researchers working on different aspects of interpretability in modern AI systems, with the broader aim of enabling trustworthy AI. We aim to discuss recent advances, theoretical foundations, and emerging directions across topics such as automated interpretability methods, representation- and concept-level analysis, next-generation interpretable model architectures, the trustworthiness and limitations of explanations, and new tools for understanding both deep vision models and large language models.

More broadly, the workshop seeks to consider how interpretability supports reliability, safety, and effective human oversight in AI. It will also highlight the growing significance of interpretability for the theoretical computer science and data science communities, where questions related to model structure, guarantees, abstraction, and reasoning have become increasingly central. By bringing these perspectives together, the workshop aims to foster deeper dialogue and shape future research directions in the study of transparent and understandable AI systems.

Workshop Schedule

πŸ“ Note that the workshop location will change on Day 3:

Day 1 — Wednesday, Feb 25 (Atkinson Hall, 4F)
8:00 – 8:45 🥐 Breakfast
8:45 – 9:00 Opening Remarks — Rajesh Gupta (Dean of the School of Computing, Information and Data Sciences)
9:00 – 9:30 Overview — Lily Weng (UCSD)
9:30 – 10:15 Keynote 1 — Mikhail Belkin (UCSD), Feature Learning and the linear representation hypothesis ▶️
10:15 – 10:45 Invited Talk 1 — Tuomas Oikarinen (UCSD), Towards Automated Mechanistic Interpretability for Deep Neural Networks ▶️
10:45 – 11:15 Invited Talk 2 — Lesia Semenova (Rutgers), Interpretability as Personalized Alignment under the Rashomon Effect ▶️
11:15 – 11:45 Invited Talk 3 — Laura Ruis (MIT), Hidden Computations: Planning and Reasoning in the Forward Pass ▶️
12:00 – 2:00 🍽️ Lunch
2:00 – 2:45 Keynote 2 — Yan Liu (USC), Interpretability through the Lenses of Feature Interaction ▶️
2:45 – 3:15 Invited Talk 4 — Vatsal Sharan (USC), On the Hope and Challenge of Understanding LLM Mechanisms ▶️
3:15 – 3:45 Invited Talk 5 — Freda Shi (U Waterloo), Tracing the Mechanistic Emergence of Symbol Grounding in Multimodal Language Models ▶️
3:45 – 4:15 Invited Talk 6 — Leilani Gilpin (UCSC), Expanding Coverage and Optimality for Neuron Explanations ▶️
4:30 – 6:00 🥂 Welcome Reception
Day 2 — Thursday, Feb 26 (Atkinson Hall, 4F)
8:15 – 9:00 🥐 Breakfast
9:00 – 9:45 Keynote 1 — Cho-Jui Hsieh (UCLA), Interpretable Learning with Large Language Models ▶️
9:45 – 10:15 Invited Talk 1 — Jeremias Sulam (JHU), Testing Semantic Importance via Betting ▶️
10:15 – 10:45 Invited Talk 2 — Eric Enouen (Cornell), Debugging and Steering Concept Bottleneck Models ▶️
10:45 – 11:15 Invited Talk 3 — Ge Yan (UCSD), Faithful Interpretation for Deep Networks via Human Understandable Concepts ▶️
11:15 – 11:45 Invited Talk 4 — Josh Engels (DeepMind), A Pragmatic Vision for Interpretability ▶️
12:00 – 2:00 🍽️ Lunch
2:00 – 2:45 Keynote 2 — Sameer Singh (UCI), Explaining in the Dark: Perils of Interpretability Without Training Data ▶️
2:45 – 3:45 Student Poster Lightning Talks ▶️
4:00 – 5:30 📌 Poster Reception
Day 3 — Friday, Feb 27 (Jacobs Hall, 1F Qualcomm Conference Center)
8:15 – 9:00 🥐 Breakfast
9:00 – 9:45 Keynote 1 — Kai-Wei Chang (UCLA), Verbalized Representation Learning for Explainable Inference ▶️
9:45 – 10:30 Keynote 2 — Kate Saenko (Boston U / Meta), Segment Anything Model 3: A new tool for explainability in vision and language models ▶️
10:30 – 11:00 Invited Talk 1 — Robin Jia (USC), Are We Having the Wrong Dreams about Interpretability? ▶️
11:00 – 11:30 Invited Talk 2 — Deepti Ghadiyaram (Boston U), Inside the minds of generative models ▶️
11:30 – 12:00 Invited Talk 3 — Emanuel Moss (Intel AI), Social Dimensions of Trustworthy AI on the Factory Floor ▶️
12:00 – 12:30 Invited Talk 4 — Chung-En Sun (UCSD), Training Inherently Interpretable Large Language Models ▶️
12:30 – 2:00 🍽️ Lunch
2:00 – 2:45 Keynote 3 — Judy Hoffman (UCI), Building Trustworthy Foundations: Scaling Collaboration, Demonstrations, and Integration ▶️
2:45 – 3:10 Invited Talk 5 — Akshay Kulkarni (UCSD), Interpretable-by-Design Generative Models ▶️
3:10 – 3:35 Invited Talk 6 — Ali Arsanjani (Google), Compositional Interpretability in Agentic Architectures ▶️
3:35 – 4:00 Invited Talk 7 — Jiaxin Zhang (Salesforce AI), Building Reliable Long-Horizon Agents ▶️
4:00 – 4:25 Invited Talk 8 — Shruti Palaskar (Apple), VLSU: Mapping the Limits of Joint Multimodal Understanding for AI Safety ▶️
4:25 – 4:50 Invited Talk 9 — Tz-Ying Wu (Intel AI), Interpretability as Factual Diagnosis for Video Captioning: Reference-Free Evaluation with Explanations ▶️
4:50 – 5:00 🎤 Concluding Remarks

Invited Speakers & Participants

Yan Liu — USC
Mikhail Belkin — UC San Diego
Kate Saenko — Boston U & FAIR
Sameer Singh — UC Irvine
Kai-Wei Chang — UCLA
Cho-Jui Hsieh — UCLA
Judy Hoffman — UC Irvine
Vatsal Sharan — USC
Robin Jia — USC
Leilani Gilpin — UC Santa Cruz
Deepti Ghadiyaram — Boston U
Jeremias Sulam — Johns Hopkins U
Freda Shi — U Waterloo
Lesia Semenova — Rutgers
Nghia Hoang — Washington State U
Laura Ruis — MIT
Tuomas Oikarinen — UCSD
Ge Yan — UCSD
Chung-En Sun — UCSD
Akshay Kulkarni — UCSD
Eric Enouen — Cornell
Ali Arsanjani — Google
Tz-Ying (Gina) Wu — Intel AI Lab
Giuseppe Raffa — Intel Labs
Lam Nguyen — IBM Research
Josh Engels — DeepMind
Jiaxin Zhang — Salesforce AI Research
Emanuel Moss — Intel Labs & UVA
Shruti Palaskar — Apple

Organizers

Lily Weng — UC San Diego
Sanjoy Dasgupta — UC San Diego

Contact

📧 Lily Weng (lweng@ucsd.edu), Sanjoy Dasgupta (sadasgupta@ucsd.edu)