How Do LLMs Rate?
A Data-Driven Discussion of AI-Powered Item Review
Thursday, July 31 | 1pm EDT / 10am PDT
PANEL
Reese Butterfuss, AI Psychometrician
Neil Wilkinson, Director of Product and Innovation
Abby Esten Attieh, VP of Certification
What if AI could spot the same flaws in exam questions that expert reviewers do ... but in seconds? Building on findings from our recent study, this webinar will show how large language models (LLMs) like GPT-4.1 can identify item-quality issues such as implausible distractors, conspicuous keys, and unclear stems with remarkable accuracy.
We'll explore:
- The challenges of traditional item review and how AI can streamline the process
- A head-to-head comparison of GPT-4.1 and GPT-4o in detecting common item flaws
- Real-world results from an exam analysis
- What's next: Evaluating AI-generated revisions and scaling across item types and domains