IN THIS LESSON

This is the sixth lecture in the Language Models and Intelligent Agentic Systems course, run by Meridian Cambridge in collaboration with the Cambridge Centre for Data Driven Discovery (C2D3).

This lecture covers reward modelling: the problem of training a neural network to assign rewards to behaviours such that those rewards accurately capture human preferences over the behaviours. We'll cover the motivation behind reward modelling, the Bradley-Terry loss used to train reward models, how the preference data used to train these models is obtained, and the shortcomings and open problems of current reward modelling.

View Lecture Slides

6. Reward Modelling