Human-AI Collaboration for Knowledge-in-use Assessment Design: Leveraging LLMs with RAG
Juanhui Li, Tingting Li, Hang Li, and 4 more authors
In 2025 IEEE International Conference on Teaching, Assessment, and Learning for Engineering (TALE), Apr 2025
Recent advances in pedagogy have shifted assessment from static benchmarks toward dynamic tools, with knowledge-in-use gaining prominence. However, effectively assessing knowledge-in-use remains a significant challenge, necessitating assessments that capture students' ability to transfer knowledge beyond the classroom. Although the Next Generation Science Standards (NGSS) advocate for high-quality formative assessments, many educators struggle to design and implement NGSS-aligned assessments because doing so is time-consuming and labor-intensive. Artificial Intelligence
(AI), particularly Large Language Models (LLMs), presents a
promising solution by automating knowledge-in-use assessment
generation, with the potential to significantly enhance efficiency.
However, LLMs often lack domain-specific expertise, and no
standardized framework exists for evaluating their outputs. To
address these challenges, we integrate Retrieval-Augmented Generation (RAG) to enhance LLMs’ comprehension of educational
content and develop a Human-in-the-Loop strategy to refine
and evaluate AI-generated assessments. In addition, we design evaluation rules that serve as quality standards and involve both human experts and LLMs in assessing the generated content. This
LLM-based pipeline substantially improves efficiency, and the results demonstrate that human guidance markedly improves LLM generations, yielding high-quality assessments that align with the proposed evaluation rules.
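
For readers curious what the RAG step of such a pipeline can look like, the sketch below retrieves the curriculum passages most similar to a topic and conditions an assessment-generation prompt on them. It is a minimal illustration only: the toy `embed` function, the sample corpus, and `build_prompt` are hypothetical stand-ins, not the paper's implementation, and a real system would use a trained embedding model and an actual LLM call.

```python
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy bag-of-words embedder (stand-in for a real embedding model)."""
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# Illustrative snippets standing in for NGSS-aligned curriculum materials.
corpus = [
    "Energy is conserved when it transfers between objects or fields.",
    "Matter cycles and energy flows through ecosystems and food webs.",
    "Forces acting on an object determine changes in its motion.",
]
corpus_vecs = np.stack([embed(doc) for doc in corpus])

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k corpus passages most similar to the query."""
    sims = corpus_vecs @ embed(query)
    return [corpus[i] for i in np.argsort(sims)[::-1][:k]]

def build_prompt(topic: str) -> str:
    """Condition the generation request on retrieved context (the RAG step)."""
    context = "\n".join(retrieve(topic))
    return (
        f"Context from curriculum materials:\n{context}\n\n"
        f"Task: Write one knowledge-in-use assessment item on '{topic}' "
        "asking students to apply these ideas to a real-world scenario."
    )

# The resulting prompt would be sent to an LLM, and its output routed to
# human reviewers for refinement (the human-in-the-loop step).
print(build_prompt("energy transfer"))
```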