IN THIS LESSON

This is the fourteenth lecture in the Language Models and Intelligent Agentic Systems course, run by Meridian Cambridge in collaboration with the Cambridge Centre for Data Driven Discovery (C2D3).

This lecture covers two promising approaches to mitigating risks from AI: AI control and safety cases. We begin by motivating the AI control research agenda, detailing its basic ideas, and then exploring some concrete examples. After this we discuss safety cases and how they can be used to structure arguments that AI systems deployed in specific contexts will be safe. Finally, we discuss a recent paper that takes a detailed look at the intersection of these two approaches.