Benchmarking LLMs for Agricultural Advisory: Insights from a Global Community of Practice

As large language models (LLMs) are increasingly applied in agricultural advisory services, there is growing recognition of the need for shared benchmarks to evaluate their performance, equity, and contextual relevance. In response, a global community of practice, supported by the Gates Foundation, has begun developing standards and tools to assess how well LLMs perform across diverse use cases and geographies.

This webinar shares emerging insights from the first six months of collaboration. Building on an initial convening in May 2025, the session highlights practical approaches for evaluating AI performance in real-world agricultural settings, with attention to linguistic diversity, contextual relevance, and gender responsiveness.

Jagannath R (Precision Development, PxD) will present an AI evaluation framework built on three pillars: AI model evaluations, user evaluations, and product evaluations. He will also discuss the golden Q&A dataset that PxD is building.

Josué Kpodo (Michigan State University) will discuss his experience creating question-answering benchmark datasets tailored to practical agricultural extension use cases. He will also share his approaches to developing qualitative and quantitative metrics that assess how well LLMs stay grounded in agricultural domains.

Michael Minkoff (Athena Infonomics) will share key takeaways from the development of one of the community of practice’s Discussion Papers (https://agxai.notion.site/a-look-at-benchmarking-initiatives). The paper covers the core elements of effective benchmarking, additional benchmarking tools and solutions developed or under development, key challenges that need to be addressed, and ways in which the community of practice broadly, and the Benchmarking LLMs working group specifically, hopes to mobilize further progress toward effective benchmarking of AI for Ag solutions.

This session aims to build momentum around a shared benchmarking agenda—one that supports responsible innovation in the use of AI for agricultural extension.

Speakers

Jagannath R, Research Manager, Precision Development (PxD)
Josué Kpodo, PhD Candidate, Michigan State University
Michael Minkoff, Director, Athena Infonomics

Discussant

Niyati Singaraju, Postdoctoral Fellow, Gender Research, International Rice Research Institute (IRRI); Gender and Inclusion Focal Point, CGIAR Gender + AI Accelerator & Digital Transformation Initiative

Moderator

Eliot Jones-Garcia, Senior Research Analyst, IFPRI; PhD Candidate, Wageningen University

November 6, 2025
