ActiveUltraFeedback: Efficient Preference Data Generation using Active Learning

Published in International Conference on Machine Learning (ICML), 2026

Authors: Davit Melikidze*, Marian Schneider*, Jessica Lam*, Martin Wertich*, Ido Hakimi, Barna Pásztor, and Andreas Krause

* Equal contribution

ActiveUltraFeedback studies how to reduce the annotation cost of preference data collection for Reinforcement Learning from Human Feedback (RLHF) while preserving strong downstream model performance.

The paper introduces a modular active learning pipeline that uses uncertainty-aware reward estimates to identify the most informative response pairs for labeling. In addition to standard selection strategies, we evaluate methods such as Double Reverse Thompson Sampling and DeltaUCB, which prioritize comparisons with large predicted quality gaps.
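To make the selection step concrete, here is a minimal sketch of a DeltaUCB-style acquisition rule. It assumes the uncertainty-aware reward model yields a per-response mean and standard deviation, and scores each pair by its predicted quality gap plus an uncertainty bonus; the function names, the `beta` weight, and the exact scoring form are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def delta_ucb_scores(mu: np.ndarray, sigma: np.ndarray, beta: float = 1.0) -> np.ndarray:
    """Pairwise acquisition scores for one prompt's candidate responses.

    mu, sigma: per-response reward mean and std from an uncertainty-aware
    reward model (hypothetical interface), each of shape [n_responses].
    Returns an [n, n] matrix: predicted gap plus an uncertainty bonus.
    """
    gap = np.abs(mu[:, None] - mu[None, :])                # predicted quality gap
    unc = np.sqrt(sigma[:, None] ** 2 + sigma[None, :] ** 2)  # joint pair uncertainty
    return gap + beta * unc                                # optimistic gap estimate

def select_pair(mu: np.ndarray, sigma: np.ndarray, beta: float = 1.0) -> tuple[int, int]:
    """Pick the response pair with the highest acquisition score."""
    scores = delta_ucb_scores(mu, sigma, beta)
    np.fill_diagonal(scores, -np.inf)                      # a response cannot pair with itself
    i, j = np.unravel_index(np.argmax(scores), scores.shape)
    return int(i), int(j)

# Example: a large gap with high uncertainty beats a slightly larger, certain gap.
mu = np.array([0.0, 1.0, 0.2])
sigma = np.array([0.1, 0.1, 0.5])
print(select_pair(mu, sigma))  # → (1, 2)
```

In this toy run the (1, 2) pair wins even though the (0, 1) gap is larger, because response 2's high reward uncertainty makes that comparison more informative under the optimistic score.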

Across our experiments, the actively selected datasets consistently improve data efficiency, matching or exceeding static baselines with substantially fewer annotations. The project combines methodological contributions, an open-source implementation, and released preference datasets for further research.
