How Do LLMs Rate?
A Data-Driven Discussion of AI-Powered Item Review
Thursday, July 31 | 1pm EDT / 10am PDT
PANEL
Reese Butterfuss, AI Psychometrician
Neil Wilkinson, Director of Product and Innovation
Abby Esten Attieh, VP of Certification
What if AI could spot the same flaws in exam questions that expert reviewers do ... but in seconds? Building on findings from our recent study, this webinar will show how large language models (LLMs) like GPT-4.1 can identify item-quality issues such as implausible distractors, conspicuous keys, and unclear stems with remarkable accuracy.
We'll explore:
- The challenges of traditional item review and how AI can streamline the process
- A head-to-head comparison of GPT-4.1 and GPT-4o in detecting common item flaws
- Real-world results from an exam analysis
- What's next: Evaluating AI-generated revisions and scaling across item types and domains