How can we leverage technology to keep assessments up-to-date, relevant and flexible without sacrificing quality?

Automated Item Generation

Creating a scalable approach to generating high-quality exam content

As the health care landscape continues to evolve and more emphasis is placed on the formative assessment of physician competencies, the need for up-to-date and relevant exam content is greater than ever. Traditionally, developing high-quality and reliable questions has required significant time and resources, and the process has not been easily scalable.

This created a unique challenge for medical educators, one that NBME sought to address by exploring the application of automated item generation (AIG) in medical education assessment. AIG involves the use of cognitive modeling to automatically generate a high volume of multiple-choice questions, enabling educators to access a wider selection of content to measure student knowledge.
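To make the idea concrete, the sketch below shows one simple, template-based way an item model could produce many question variants from a single stem. It is an illustration only and does not describe NBME's models or tooling: the names STEM, AGES, SEXES, KEY_BY_FINDING and generate_items are hypothetical, and the clinical pairings are toy data chosen for readability.

```python
from itertools import product

# Hypothetical item model: one stem template plus the values each variable may take.
STEM = (
    "A {age}-year-old {sex} presents with {finding}. "
    "Which of the following is the most likely diagnosis?"
)

AGES = [34, 52]
SEXES = ["woman", "man"]

# Toy mapping from a presenting finding to the keyed (correct) answer.
# Illustrative only; a real model would be authored and reviewed by experts.
KEY_BY_FINDING = {
    "polyuria and polydipsia": "Diabetes mellitus",
    "cold intolerance and weight gain": "Hypothyroidism",
    "fever and productive cough": "Pneumonia",
}


def generate_items():
    """Fill the stem with every combination of variable values."""
    # Every item shares the same option list; the key changes with the finding.
    options = sorted(set(KEY_BY_FINDING.values()))
    items = []
    for age, sex, finding in product(AGES, SEXES, KEY_BY_FINDING):
        items.append(
            {
                "stem": STEM.format(age=age, sex=sex, finding=finding),
                "options": options,
                "key": KEY_BY_FINDING[finding],
            }
        )
    return items


items = generate_items()
print(len(items))  # 2 ages x 2 sexes x 3 findings
print(items[0]["stem"])
```

Even this small model yields a dozen variants from one template, which hints at why the approach scales. Generated items would still need expert review and pretesting before operational use.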

After running an initial proof-of-concept pilot in early 2020, NBME trained 38 subject matter experts from our test development committees on creating AIG models, which led to the generation of questions that were pretested on four NBME® Clinical Science Subject Exams and the Health & Wellness Coach Certifying Exam. Item analyses indicate the AIG questions are performing at least as well as traditionally written items.

The success of this pilot inspired the creation of a new AIG tool specially tailored to NBME test items called IMAGE (Item Modeling and Automated Generation Engine), which is currently being used to generate questions for additional NBME exams.

Natural Language Processing

Developing a more efficient and transparent process for scoring clinical text

From speech recognition software to translation apps that foster better communication, Natural Language Processing (NLP) has made remarkable advances over the past decade. NBME is exploring new ways that this technology can be applied to medical education assessment.

One such application involves using NLP to automate the scoring of clinical text, such as patient notes or short-answer questions, with the goal of making the process more transparent and efficient. As outlined by Victoria Yaneva, Senior Data Scientist at NBME, this poses a unique research challenge:

“Having physicians score clinical text requires significant time, along with human and financial resources. NLP can aid this process by automatically mapping concepts from a scoring rubric to concepts found in the clinical text written by an examinee. However, this mapping is not always straightforward.

Some concepts such as ‘loss of interest in activities’ can be expressed as ‘no longer plays tennis,’ while other concepts may require combining multiple text segments or dealing with ambiguous linguistic constructions as in ‘no cold intolerance, hair loss, palpitations, or tremor’ corresponding to ‘lack of other thyroid symptoms.’ These cases require creative computational solutions.”
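The mapping problem described in the quote can be illustrated with a deliberately naive baseline: look up each rubric concept through a hand-written list of phrases and check whether any of them appears in the examinee's note. This is a sketch under stated assumptions, not NBME's scoring method. The names RUBRIC, normalize and match_concepts are hypothetical, and the "hair loss" rubric entry is invented here purely to expose a failure case.

```python
import re

# Hypothetical rubric: each concept maps to surface phrases that may express it.
RUBRIC = {
    "loss of interest in activities": [
        "loss of interest",
        "no longer plays tennis",
    ],
    # Invented entry, included to show where naive matching goes wrong.
    "hair loss": ["hair loss"],
}


def normalize(text):
    """Lowercase the text and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def match_concepts(note, rubric):
    """Return, per rubric concept, whether any of its phrases occurs in the note."""
    note_norm = normalize(note)
    return {
        concept: any(normalize(phrase) in note_norm for phrase in phrases)
        for concept, phrases in rubric.items()
    }


note = (
    "Patient no longer plays tennis. "
    "No cold intolerance, hair loss, palpitations, or tremor."
)
result = match_concepts(note, RUBRIC)
print(result)
```

The paraphrase "no longer plays tennis" is credited only because someone listed it by hand, and "hair loss" is reported as present even though the note negates it. Both behaviors show why simple phrase lookup falls short and why handling paraphrase and negation scope calls for more capable approaches.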

To help advance the area of clinical text scoring, NBME sponsored a Kaggle competition on this topic. Data scientists from all over the world enrolled to develop the best NLP scoring approach using the same dataset of clinical text. The May 2022 results will remain open source, so others can benefit and continue developing shared value for the community.