Berry, P. M., Donneau-Golencer, T., Duong, K., Gervasio, M., Peintner, B., & Yorke-Smith, N. (2009, April). Evaluating user-adaptive systems: Lessons from experiences with a personalized meeting scheduling assistant. In Twenty-First IAAI Conference.
We discuss experiences from evaluating the learning performance of a user-adaptive personal assistant agent, including the challenge of designing an adequate evaluation and the tension of collecting adequate data without a fully functional, deployed system. Reflections on negative and positive experiences point to the challenges of evaluating user-adaptive AI systems. Lessons learned concern early consideration of evaluation and deployment, characteristics of AI technology and domains that make controlled evaluations appropriate or not, holistic experimental design, implications of “in the wild” evaluation, and the impact of AI-enabled functionality upon existing tools and work practices.