Overview

The unprecedented capabilities of today’s large-scale machine learning models and AI agents have introduced novel safety and security risks, including prompt-injection attacks, capability overreach, unintended emergent behaviors, and cascading system failures. While landmark regulations like the EU AI Act, the first comprehensive AI law establishing a risk-based classification and mandatory requirements for general-purpose models, have begun to address transparency, human oversight, and prohibition of unacceptable uses, significant gaps remain in covering safety and security throughout the training and deployment pipeline of powerful AI systems.

At the same time, the International AI Safety Report 2025 synthesizes over 100 expert contributions on AI risks (e.g., malicious use, malfunctions, and systemic threats) and highlights deep uncertainty in AI’s trajectory and the urgent need for evidence-based mitigation strategies. Moreover, technical AI safety research has identified both cooperation opportunities and new vulnerabilities in large-scale model deployment; for example, international collaborations may help develop shared verification protocols, but they also risk leaking sensitive capabilities or introducing backdoors. Despite these efforts, considerable gaps remain in the safety of state-of-the-art models, and recent work highlights several failure cases of state-of-the-art LLMs and agents. For instance, internal red-team evaluations of the latest Claude Opus 4 showed that, when prompted by inexperienced users, the model can generate step-by-step instructions for creating biological agents and, during structured shutdown-threat tests, occasionally resorted to “hijack” strategies (e.g., threatening to leak internal secrets) to avoid being turned off. These gaps and tensions have been further exacerbated by the advent of AI workflows and agents.

The main goal of this workshop is to bridge the gap between state-of-the-art ML safety/security research and evolving regulatory frameworks.

Please check out our Call for Papers. We invite researchers, practitioners, and community members to serve as reviewers for the workshop; detailed information can be found in the reviewer application form.

Important Dates:

Keynote Talks

Technical safeguards from epistemically cautious AI: Scientist AI

Yoshua Bengio

Professor

Université de Montréal

Hallucinations, jailbreaks, and beyond

Yarin Gal

Professor

Oxford University

Observations at the Intersection of Privacy and Machine Learning

Gary Howarth

Physical Scientist

NIST

Guarding the Age of Agents: Advancing Risk Assessment, Guardrails, and Security Certification

Bo Li

Associate Professor

University of Illinois Urbana-Champaign

Panel Discussion

Schedule

Dec 07, 2025, Upper Level Room 1AB @ San Diego Convention Center

Time         Activity
08:45-09:00  Opening Remarks
09:00-09:40  Contributed Talk Session 1
               Contributed Talk 1: LatentGuard: Controllable Latent Steering for Robust Refusal of Attacks and Reliable Response Generation (Presenter: Yi Huang)
               Contributed Talk 2: Policy-as-Prompt: Real-Time Guardrails for AI Agents (Presenter: Gauri Kholkar)
               Contributed Talk 3: SemScore: Practical Explainable AI through Quantitative Methods to Measure Semantic Spuriosity (Presenter: Wei May Chen)
               Contributed Talk 4: Rule Construction and Interpretation for Constitutional AI (Presenter: Lucy He)
09:40-10:00  Coffee Break
10:00-10:30  Invited Talk 1: Yarin Gal
10:30-11:00  Invited Talk 2: Yoshua Bengio
11:00-12:15  Poster Session + Speaker Office Hours
12:15-13:30  Lunch
13:30-14:00  Invited Talk 3: Gary Howarth
14:00-14:40  Contributed Talk Session 2
               Contributed Talk 5: How do data owners say no? A case study of data consent mechanisms in web-scraped vision-language AI training datasets (Presenter: Chung Peng Lee)
               Contributed Talk 6: On the Regulatory Potential of User Interfaces for AI Agent Governance (Presenter: Kevin Feng)
               Contributed Talk 7: SpecEval: Evaluating Model Adherence to Behavior Specifications (Presenter: Ahmed Ahmed)
               Contributed Talk 8: Anatomy of a Machine Learning Ecosystem: 2 Million Models on Hugging Face (Presenter: Hamidah Oderinwale)
14:40-15:10  Invited Talk 4: Bo Li
15:10-15:50  Coffee Break + Speaker Office Hours
15:50-16:50  Panel: The AI Wars: Regulation, Open Science, and the Race for Global Power
               Panelists: Melissa Fabros, Rich Caruana, Jiahao Chen, Ian Eisenberg, Avijit Ghosh (Moderator: Chirag Agarwal)
16:50-17:30  Networking + Wrap Up

Past Editions

Organizers

If you have any questions, please contact us via the following email: regulatableml25@googlegroups.com.

Core Organizing Team

Chirag Agarwal

University of Virginia

Hima Lakkaraju

Harvard University

Jiaqi Ma

University of Illinois Urbana-Champaign

Sarah Tan

Salesforce
Cornell University

Student Organizers

Junwei Deng

University of Illinois Urbana-Champaign

Pingbang Hu

University of Illinois Urbana-Champaign

Eileanor LaRocco

University of Virginia

Karolina Naranjo

University of Virginia

Shichang Zhang

Harvard University

Program Committee

Name                      Affiliation
Sina Abdidizaji           University of Central Florida
Amina A. Abdu             University of Michigan - Ann Arbor
Sepideh Abedini           Vector Institute
Ahmed M Ahmed             Stanford University
Ashutosh Ahuja            Starbucks
Ayoub Ajarra              INRIA
Rohan Deepak Ajwani       University of Toronto
Nicolas Alder             Hasso Plattner Institute
Hadi Asghari              Technische Universität Berlin
Muhammad H. Ashiq         University of Wisconsin - Madison
Alexander Bakarsky        ETHZ - ETH Zurich
Aparna Balagopalan        Massachusetts Institute of Technology
Solon Barocas             Microsoft Research
Seán Boddy                University of Dublin, Trinity College
Rishi Bommasani           Stanford University
Vamshi Krishna Bonagiri   Mohamed bin Zayed University of Artificial Intelligence
Edisy Kin Wai Chan        University of Southampton
Nischal Reddy Chandra     Adobe Systems
Abhiroop Chatterjee       Jadavpur University
Jiahong Chen              University of Sheffield
Peijie Chen               Noteworthy AI
Jiahao Chen               New York City
Elliot Creager            University of Waterloo
Madeleine I. G. Daepp     Research, Microsoft
Jessica Dai               University of California, Berkeley
Junwei Deng               University of Illinois at Urbana-Champaign
Sihao Ding                Mercedes-Benz R&D NA
Kate Donahue              Massachusetts Institute of Technology
Timothy R. Dubber         Australian National University
Eric Enouen               Cornell University
Carson Ezell              Harvard University
Fatima Ezzeddine          Universita della Svizzera Italiana
Kevin Feng                University of Washington
Ashley Ferreira           CIGI
Philippe Giabbanelli      Old Dominion University
Vacslav Glukhov           Next Step Fusion
David Gray Grant          University of Florida
Mingfei Guo               Stanford University
Tessa Han                 Harvard University
Leif Hancox-Li            vijil
Galen Harrison            University of Virginia, Charlottesville
Muhammad Hassan           University of Illinois at Urbana-Champaign
Carl-Leander Henneking    Epiq AI Labs
Sayash Raaj Hiraou        Fidelity Investments
Pingbang Hu               University of Illinois at Urbana-Champaign
Amtul B. Ifra             BISXP
Ismat Jarin               University of California, Irvine
Tyler M. John             Rutgers University
Nari Johnson              CMU, Carnegie Mellon University
Santhosh Kakarla          George Mason University
Arturs Kanepajs           Pour Demain
Gauri Kholkar             Pure Storage
David Kinney              Washington University, Saint Louis
Arinbjörn Kolbeinsson     University of Virginia, Charlottesville
Jeanice Koorndijk         Decathlon
Satyapriya Krishna        Harvard University
Eileanor LaRocco          University of Virginia, Charlottesville
Chung Peng Lee            Princeton University
Xiaoxia Lei               Shanghai Jiao Tong University
Zichao Li                 University of Waterloo
Xiaomin Li                Harvard University
Ilija Lichkovski          AI Safety Initiative Groningen
Vítor Lourenço            Universidade Federal Fluminense
Kuan Lu                   Cornell University
Arushi GK Majha           University of Cambridge
Chris Marsden             Monash University
Audra McMillan            Apple
Carlos Mougan             University of Southampton
Karolina Naranjo          University of Virginia, Charlottesville
Imran Nasim               University of Surrey
Ezinne Nwankwo            University of California, Berkeley
Tony O'Halloran           National University of Ireland, Galway
Hamidah Oderinwale        McGill University
Alex Oesterling           Harvard University
Victor Ojewale            Brown University
Lorenzo Pacchiardi        University of Cambridge
Wesley Pasfield           US Census Bureau
Patricia Paskov           RAND Corporation
Krishna Pillutla          Indian Institute of Technology, Madras, Dhirubhai Ambani Institute Of Information and Communication Technology
Gokul Srinath Seetha Ram  California Polytechnic State University, Pomona
Atul Rawal                Towson University
Shaina Raza               Vector Institute
Lauren Aris Richardson    RAND Corporation
Anthony J. Ripa           State University of New York at Stony Brook
Ananya Salian             University of Melbourne
Arpita Sarker             Hochschule Heilbronn
Pratinav Seth             Lexsi.ai
Mohit Sharma              Indraprastha Institute of Information Technology, Delhi
Xudong Shen               National University of Singapore
Huizhen Shu               hydrox.ai
Varshini Subhash          Amazon
Dippu Kumar Singh         Fujitsu, Fujitsu Research and Development Center Co. Ltd.
Jeff Smith                2nd Set AI
Harshini Suresha          PES University
Susanna Di Vita           ETHZ - ETH Zurich
Jennifer Wang             Brown University
Fulton Wang               Meta
Azmine Toushik Wasi       Computational Intelligence and Operations Laboratory
Alina Wernick             Eberhard-Karls-Universität Tübingen
Han Wu                    Stanford University
Yang Xiao                 University of Tulsa
Zou Yang                  Dartmouth College
Rui-Jie Yew               Brown University
James Zhang               Department of Computer Science, Princeton University
Churan Zhi                University of California, San Diego
Tracy Yixin Zhu           University of Chicago