The goal of this EnCORE workshop is to bring together researchers working on different aspects of interpretability in modern AI systems, with the broader aim of enabling trustworthy AI. We will discuss recent advances, theoretical foundations, and emerging directions across topics such as automated interpretability methods, representation- and concept-level analysis, next-generation interpretable model architectures, the trustworthiness and limitations of explanations, and new tools for understanding both deep vision models and large language models.

More broadly, the workshop considers how interpretability supports reliability, safety, and effective human oversight of AI. It also highlights the growing significance of interpretability for the theoretical computer science and data science communities, where questions about model structure, guarantees, abstraction, and reasoning have become increasingly central. By bringing these perspectives together, the workshop aims to foster deeper dialogue and shape future research directions in the study of transparent and understandable AI systems.
Note: the workshop location will change on Day 3.
**Day 1**

| Time | Session |
|---|---|
| 8:00 – 8:45 | Breakfast |
| 8:45 – 9:00 | Opening Remarks – Rajesh Gupta (Dean, School of Computing, Information and Data Sciences) |
| 9:00 – 9:30 | Overview – Lily Weng (UCSD) |
| 9:30 – 10:15 | Keynote 1 – Mikhail Belkin (UCSD), Feature Learning and the linear representation hypothesis |
| 10:15 – 10:45 | Invited Talk 1 – Tuomas Oikarinen (UCSD), Towards Automated Mechanistic Interpretability for Deep Neural Networks |
| 10:45 – 11:15 | Invited Talk 2 – Lesia Semenova (Rutgers), Interpretability as Personalized Alignment under the Rashomon Effect |
| 11:15 – 11:45 | Invited Talk 3 – Laura Ruis (MIT), Hidden Computations: Planning and Reasoning in the Forward Pass |
| 12:00 – 2:00 | Lunch |
| 2:00 – 2:45 | Keynote 2 – Yan Liu (USC), Interpretability through the Lenses of Feature Interaction |
| 2:45 – 3:15 | Invited Talk 4 – Vatsal Sharan (USC), On the Hope and Challenge of Understanding LLM Mechanisms |
| 3:15 – 3:45 | Invited Talk 5 – Freda Shi (U Waterloo), Tracing the Mechanistic Emergence of Symbol Grounding in Multimodal Language Models |
| 3:45 – 4:15 | Invited Talk 6 – Leilani Gilpin (UCSC), Expanding Coverage and Optimality for Neuron Explanations |
| 4:30 – 6:00 | Welcome Reception |
**Day 2**

| Time | Session |
|---|---|
| 8:15 – 9:00 | Breakfast |
| 9:00 – 9:45 | Keynote 1 – Cho-Jui Hsieh (UCLA), Interpretable Learning with Large Language Models |
| 9:45 – 10:15 | Invited Talk 1 – Jeremias Sulam (JHU), Testing Semantic Importance via Betting |
| 10:15 – 10:45 | Invited Talk 2 – Eric Enouen (Cornell), Debugging and Steering Concept Bottleneck Models |
| 10:45 – 11:15 | Invited Talk 3 – Ge Yan (UCSD), Faithful Interpretation for Deep Networks via Human Understandable Concepts |
| 11:15 – 11:45 | Invited Talk 4 – Josh Engels (DeepMind), A Pragmatic Vision for Interpretability |
| 12:00 – 2:00 | Lunch |
| 2:00 – 2:45 | Keynote 2 – Sameer Singh (UCI), Explaining in the Dark: Perils of Interpretability Without Training Data |
| 2:45 – 3:45 | Student Poster Lightning Talks |
| 4:00 – 5:30 | Poster Reception |
**Day 3**

| Time | Session |
|---|---|
| 8:15 – 9:00 | Breakfast |
| 9:00 – 9:45 | Keynote 1 – Kai-Wei Chang (UCLA), Verbalized Representation Learning for Explainable Inference |
| 9:45 – 10:30 | Keynote 2 – Kate Saenko (Boston U / Meta), Segment Anything Model 3: A new tool for explainability in vision and language models |
| 10:30 – 11:00 | Invited Talk 1 – Robin Jia (USC), Are We Having the Wrong Dreams about Interpretability? |
| 11:00 – 11:30 | Invited Talk 2 – Deepti Ghadiyaram (Boston U), Inside the minds of generative models |
| 11:30 – 12:00 | Invited Talk 3 – Emanuel Moss (Intel AI), Social Dimensions of Trustworthy AI on the Factory Floor |
| 12:00 – 12:30 | Invited Talk 4 – Chung-En Sun (UCSD), Training Inherently Interpretable Large Language Models |
| 12:30 – 2:00 | Lunch |
| 2:00 – 2:45 | Keynote 3 – Judy Hoffman (UCI), Building Trustworthy Foundations: Scaling Collaboration, Demonstrations, and Integration |
| 2:45 – 3:10 | Invited Talk 5 – Akshay Kulkarni (UCSD), Interpretable-by-Design Generative Models |
| 3:10 – 3:35 | Invited Talk 6 – Ali Arsanjani (Google), Compositional Interpretability in Agentic Architectures |
| 3:35 – 4:00 | Invited Talk 7 – Jiaxin Zhang (Salesforce AI), Building Reliable Long-Horizon Agents |
| 4:00 – 4:25 | Invited Talk 8 – Shruti Palaskar (Apple), VLSU: Mapping the Limits of Joint Multimodal Understanding for AI Safety |
| 4:25 – 4:50 | Invited Talk 9 – Tz-Ying Wu (Intel AI), Interpretability as Factual Diagnosis for Video Captioning: Reference-Free Evaluation with Explanations |
| 4:50 – 5:00 | Concluding Remarks |
Contact: Lily Weng (lweng@ucsd.edu), Sanjoy Dasgupta (sadasgupta@ucsd.edu)