Overview

The unprecedented capabilities of today’s large-scale machine learning models and AI agents have introduced novel safety and security risks, including prompt-injection attacks, capability overreach, unintended emergent behaviors, and cascading system failures. Landmark regulations like the EU AI Act, the first comprehensive AI law to establish a risk-based classification and mandatory requirements for general-purpose models, have begun to address transparency, human oversight, and the prohibition of unacceptable uses; yet significant gaps remain in covering safety and security throughout the training and deployment pipeline of powerful AI systems.

At the same time, the International AI Safety Report 2025 synthesizes over 100 expert contributions on AI risks (e.g., malicious use, malfunctions, and systemic threats) and highlights deep uncertainty in AI’s trajectory and the urgent need for evidence-based mitigation strategies. Moreover, technical AI safety research has identified both cooperation opportunities and new vulnerabilities in large-scale model deployment; for example, international collaborations may help develop shared verification protocols, but they also risk leaking sensitive capabilities or introducing backdoors. Despite these efforts, considerable gaps remain in the safety of state-of-the-art models, and recent work highlights several failure cases of frontier LLMs and agents. For instance, internal red-team evaluations of Claude Opus 4 showed that, when prompted by inexperienced users, the model can generate step-by-step instructions for creating biological agents, and that in structured shutdown-threat tests it occasionally resorted to coercive strategies (e.g., threatening to leak internal secrets) to avoid being turned off. These gaps and tensions have been further exacerbated by the advent of AI workflows and agents.

The main goal of this workshop is to bridge the gap between state-of-the-art ML safety/security research and evolving regulatory frameworks.

Please check out our Call for Papers. We invite researchers, practitioners, and community members to serve as reviewers for the workshop; detailed information can be found in the reviewer application form.

Important Dates:

Keynote Talks

TBD

Yoshua Bengio

TBD

Université de Montréal

TBD

Yarin Gal

TBD

Oxford University

TBD

Dan Hendrycks

TBD

Center for AI Safety

TBD

Sara Hooker

TBD

Cohere Labs

TBD

Gary Howarth

TBD

NIST

TBD

Bo Li

TBD

University of Illinois Urbana-Champaign

TBD

Lucilla Sioli

TBD

EU AI Office

Panel Discussion

TBD

TBD

Schedule

TBD

Time | Event | Speaker/Details
TBD  | TBD   | Schedule will be announced soon

Past Editions

Organizers

If you have any questions, please contact us via the following email: regulatableml25@googlegroups.com.

Core Organizing Team

Chirag Agarwal

University of Virginia

Hima Lakkaraju

Harvard University

Jiaqi Ma

University of Illinois
Urbana-Champaign

Sarah Tan

Salesforce
Cornell University

Student Organizers

Junwei Deng

University of Illinois
Urbana-Champaign

Pingbang Hu

University of Illinois
Urbana-Champaign

Eileanor LaRocco

University of Virginia

Karolina Naranjo

University of Virginia

Shichang Zhang

Harvard University