
Dr. Kristen Murphy, University of Wisconsin-Milwaukee
Seminar Title: "It’s the items that make the assessment: Investigating test and item properties for better measurement"
Host: Matt Wu, wu.6250@osu.edu
Seminar Zoom Link: https://osu.zoom.us/j/9303326442?pwd=UncwWmlxMkU4bzV5OThmYWU0NHJiZz09
ABSTRACT
Since 1930, the Examinations Institute of the American Chemical Society’s Division of Chemical Education (ACS Exams) has developed and produced chemistry tests using exam committees of expert instructors. Similarly, instructors regularly write, administer, and grade assessments, ultimately to assign a grade for a course. These classroom assessment efforts can be valuable measures of students’ progress through a program, and efforts to associate measurements across multiple courses in a program are routinely included in programmatic assessment requirements. But how are tests developed, and, once written, what processes are in place to evaluate them? Because tests are commonly used to measure the degree to which a student understands the content, it is important that they be psychometrically sound instruments. Careful consideration of what the best measure of performance is, and how to capture it, is therefore important. In addition to traditional scoring, other scoring methods can capture important measures or inform decisions, such as placement into courses, subscoring by content area, or identifying item fairness across student subgroups. This leads into item-level investigations, including classic test statistics, item content, and item fairness, as well as item environment effects and guessing. Standardized tests can be criticized for limited alignment with the content choices made by an instructor or department, and for norms constructed from national samples that may not reflect a given institution type. Recent investigations into custom exam and norm generation have addressed these concerns. Finally, methods for examining item performance over time for stability, and for connecting varying samples or different exams, will be presented.
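
Note (illustrative only, not from the seminar materials): the "classic test statistics" mentioned above commonly include an item's difficulty (the proportion of examinees answering it correctly) and its discrimination (the correlation between the item score and the rest of the test). A minimal Python sketch of both follows, assuming a 0/1 scored response matrix; the input data here are invented purely for demonstration.

import numpy as np

def item_statistics(responses):
    # responses: (n_students, n_items) array of 0/1 scored answers (hypothetical input)
    total = responses.sum(axis=1)                 # each student's total score
    difficulty = responses.mean(axis=0)           # proportion correct per item (p-value)
    n_items = responses.shape[1]
    discrimination = np.empty(n_items)
    for j in range(n_items):
        rest = total - responses[:, j]            # total score excluding item j
        discrimination[j] = np.corrcoef(responses[:, j], rest)[0, 1]  # corrected item-total correlation
    return difficulty, discrimination

# Demonstration with invented responses for 200 students on 10 items
rng = np.random.default_rng(0)
demo = (rng.random((200, 10)) > rng.random(10)).astype(int)
p, r = item_statistics(demo)
for j, (pj, rj) in enumerate(zip(p, r), start=1):
    print(f"Item {j:2d}: difficulty = {pj:.2f}, discrimination = {rj:+.2f}")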