The goal of this EnCORE workshop is to bring together researchers working on different aspects of interpretability in modern AI systems, with the broader aim of enabling trustworthy AI. We will discuss recent advances, theoretical foundations, and emerging directions across topics such as automated interpretability methods, representation- and concept-level analysis, next-generation interpretable model architectures, the trustworthiness and limitations of explanations, and new tools for understanding both deep vision models and large language models.

More broadly, the workshop considers how interpretability supports reliability, safety, and effective human oversight of AI. It also highlights the growing significance of interpretability for the theoretical computer science and data science communities, where questions about model structure, guarantees, abstraction, and reasoning have become increasingly central. By bringing these perspectives together, the workshop aims to foster deeper dialogue and shape future research directions in the study of transparent and understandable AI systems.
Note: the workshop location will change on Day 3.
**Day 1**

| Time | Session |
|---|---|
| 8:00 – 8:45 | Breakfast |
| 8:45 – 9:00 | Opening Remarks – Rajesh Gupta (Dean, School of Computing, Information and Data Sciences) |
| 9:00 – 9:30 | Overview – Lily Weng (UCSD) |
| 9:30 – 10:15 | Keynote 1 – Mikhail Belkin (UCSD), Feature Learning and the linear representation hypothesis |
| 10:15 – 10:45 | Invited Talk 1 – Tuomas Oikarinen (UCSD), Towards Automated Mechanistic Interpretability for Deep Neural Networks |
| 10:45 – 11:15 | Invited Talk 2 – Lesia Semenova (Rutgers), Interpretability as Personalized Alignment under the Rashomon Effect |
| 11:15 – 11:45 | Invited Talk 3 – Laura Ruis (MIT), Hidden Computations: Planning and Reasoning in the Forward Pass |
| 12:00 – 2:00 | Lunch |
| 2:00 – 2:45 | Keynote 2 – Yan Liu (USC), Interpretability through the Lenses of Feature Interaction |
| 2:45 – 3:15 | Invited Talk 4 – Vatsal Sharan (USC), On the Hope and Challenge of Understanding LLM Mechanisms |
| 3:15 – 3:45 | Invited Talk 5 – Freda Shi (U Waterloo), Tracing the Mechanistic Emergence of Symbol Grounding in Multimodal Language Models |
| 3:45 – 4:15 | Invited Talk 6 – Leilani Gilpin (UCSC), Expanding Coverage and Optimality for Neuron Explanations |
| 4:30 – 6:00 | Welcome Reception |
**Day 2**

| Time | Session |
|---|---|
| 8:15 – 9:00 | Breakfast |
| 9:00 – 9:45 | Keynote 1 – Cho-Jui Hsieh (UCLA), Interpretable Learning with Large Language Models |
| 9:45 – 10:15 | Invited Talk 1 – Jeremias Sulam (JHU), Testing Semantic Importance via Betting |
| 10:15 – 10:45 | Invited Talk 2 – Eric Enouen (Cornell), Debugging and Steering Concept Bottleneck Models |
| 10:45 – 11:15 | Invited Talk 3 – Ge Yan (UCSD), Faithful Interpretation for Deep Networks via Human Understandable Concepts |
| 11:15 – 11:45 | Invited Talk 4 – Josh Engels (DeepMind), A Pragmatic Vision for Interpretability |
| 12:00 – 2:00 | Lunch |
| 2:00 – 2:45 | Keynote 2 – Sameer Singh (UCI), Explaining in the Dark: Perils of Interpretability Without Training Data |
| 2:45 – 3:45 | Student Poster Lightning Talks |
| 4:00 – 5:30 | Poster Reception |
**Day 3**

| Time | Session |
|---|---|
| 8:15 – 9:00 | Breakfast |
| 9:00 – 9:45 | Keynote 1 – Kai-Wei Chang (UCLA), Verbalized Representation Learning for Explainable Inference |
| 9:45 – 10:30 | Keynote 2 – Kate Saenko (Boston U / Meta), Segment Anything Model 3: A new tool for explainability in vision and language models |
| 10:30 – 11:00 | Invited Talk 1 – Robin Jia (USC), Are We Having the Wrong Dreams about Interpretability? |
| 11:00 – 11:30 | Invited Talk 2 – Deepti Ghadiyaram (Boston U), Inside the minds of generative models |
| 11:30 – 12:00 | Invited Talk 3 – Emanuel Moss (Intel AI), Social Dimensions of Trustworthy AI on the Factory Floor |
| 12:00 – 12:30 | Invited Talk 4 – Chung-En Sun (UCSD), Training Inherently Interpretable Large Language Models |
| 12:30 – 2:00 | Lunch |
| 2:00 – 2:45 | Keynote 3 – Judy Hoffman (UCI), Building Trustworthy Foundations: Scaling Collaboration, Demonstrations, and Integration |
| 2:45 – 3:10 | Invited Talk 5 – Akshay Kulkarni (UCSD), Interpretable-by-Design Generative Models |
| 3:10 – 3:35 | Invited Talk 6 – Ali Arsanjani (Google), Compositional Interpretability in Agentic Architectures |
| 3:35 – 4:00 | Invited Talk 7 – Jiaxin Zhang (Salesforce AI), Building Reliable Long-Horizon Agents |
| 4:00 – 4:25 | Invited Talk 8 – Shruti Palaskar (Apple), VLSU: Mapping the Limits of Joint Multimodal Understanding for AI Safety |
| 4:25 – 4:50 | Invited Talk 9 – Tz-Ying Wu (Intel AI), Interpretability as Factual Diagnosis for Video Captioning: Reference-Free Evaluation with Explanations |
| 4:50 – 5:00 | Concluding Remarks |
Contact: Lily Weng (lweng@ucsd.edu), Sanjoy Dasgupta (sadasgupta@ucsd.edu)