Why Validity Matters to Grading

A photograph of a handheld calculator on a tabletop, with the number 100 on its screen.

Instructors should pay careful attention to grading validity because grades can have a profound effect on a student’s life and opportunities. At some point during most courses, one or more summative assessments are administered so the instructor can evaluate student learning and assign grades. Summative assessments are administered after instruction is completed, and unlike formative assessments, they do not give students a chance to retake the assessment. A summative assessment could take the form of a midterm or end-of-course exam, a final project or paper, or a performance. The points awarded for a particular summative assessment could vary based on the importance or difficulty of the content being assessed. Often, an instructor will also include non-cognitive points in the final grade, such as points for attendance or for completing assignments on time. These non-cognitive points typically reflect motivation, integrity, and interpersonal interaction (ACT WorkKeys, 2014). Adding non-cognitive and extra-credit points creates a validity issue because it affects how the final score can be interpreted and used for its intended purpose, which is to evaluate learning. This article gives an overview of the issues associated with valid grading based on scores from summative exams.

Cumulative Exams

A cumulative exam tests students on all the course content covered since the start of the semester, and it is typically given towards the end of a course. More generally, cumulative exams are designed to assess the learning outcomes covered during a specific timeframe: the material on the exam accumulates gradually from the first day to the last day of study in that timeframe. If the course is designed to be split into two or more distinct parts, then there could be a cumulative exam for each part. Often the end-of-course cumulative exam determines the student’s grade for the course, or is at least heavily weighted when determining the final grade. Put another way, a cumulative exam contains material from all the chapters studied from the beginning of the course to the present. Students tend to cram when studying for cumulative exams, but a spaced retrieval study strategy would be less stressful, and what students learned would last well beyond the exam.

Non-Cumulative Exams

Non-cumulative exams contain material only from the new or current chapter or unit that was studied. In other words, they measure a part of the whole. Since a non-cumulative exam covers only a portion of the course rather than the entire course, the exam is based on the most recent material. In terms of percentages, a cumulative exam may cover, for example, sixty percent new material and forty percent material from previous tests, whereas a non-cumulative exam would be one hundred percent new material. For a cumulative exam, the student would have to restudy what was covered on all previous exams. For a non-cumulative exam, the student would only have to study what was covered since the last exam. Research shows that even though students do not like cumulative exams, learning is enhanced when students expect a cumulative final, even if they do not end up taking one, because they study differently than they would for a non-cumulative exam (Szpunar, McDermott, & Roediger, 2007). Another study found that students who took cumulative exams throughout the semester did better on the cumulative portion of the final exam (Lawrence, 2013). Students completing courses with cumulative finals also retained more than students who took non-cumulative finals (Khanna, Brack, & Finken, 2013).

Creating a Points Scale to Assign Grades

Typically, the points earned on cognitive assessments and any non-cognitive and extra-credit points awarded are put on the same points scale. The scale could, for example, run from 1 to 400 points. Cognitive, non-cognitive, and extra-credit points could all have equal value, or certain points could be weighted to have more or less value. Because it is easier, an equal-value point scale is most often used.
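
To make the difference between equal-value and weighted points concrete, here is a minimal Python sketch. The category names, point totals, and weights are hypothetical and chosen only to add up to a 400-point scale; they do not come from any particular course.

```python
# Hypothetical point categories: (points earned, points possible).
earned = {
    "exams": (255, 300),        # cognitive points
    "attendance": (50, 50),     # non-cognitive points
    "extra_credit": (20, 50),   # extra-credit points
}

def equal_value_total(scores):
    """Add up earned points, treating every point as worth the same."""
    return sum(points for points, _ in scores.values())

def weighted_percent(scores, weights):
    """Combine category percentages using weights that sum to 1."""
    return 100 * sum(
        weights[name] * points / possible
        for name, (points, possible) in scores.items()
    )

# Assumed weights that make exam points count for more than the rest.
weights = {"exams": 0.9, "attendance": 0.05, "extra_credit": 0.05}

print(equal_value_total(earned))                    # 325 (of 400 points)
print(round(weighted_percent(earned, weights), 1))  # 83.5
```

On the equal-value scale this student has 325 of 400 points (81.25 percent), while the assumed weighting gives 83.5 percent, so the choice of scale alone can change a score that sits near a grade boundary.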

An equal-value point scale, however, does not resolve the validity issue of what exactly a certain score or letter grade means in terms of what the student is expected to know and do based on the course’s learning outcomes. A study by Carriveau & Blake (2016), which analyzed 1,490 individual student grades, found only thirty percent agreement between grades awarded using an all-possible-points scale and grades awarded based only on cognitive assessments that matched the learning outcome statements.
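
As a rough illustration of what an agreement figure like this measures, the Python sketch below computes the percentage of students who receive the same letter grade under two grading approaches. The function name and the five grades are made up for illustration; this is not the study's data or necessarily its method.

```python
def percent_agreement(grades_a, grades_b):
    """Percentage of students who get the same letter grade on both scales."""
    if len(grades_a) != len(grades_b) or not grades_a:
        raise ValueError("grade lists must be non-empty and equal in length")
    matches = sum(a == b for a, b in zip(grades_a, grades_b))
    return 100 * matches / len(grades_a)

# Made-up grades for five students, for illustration only.
all_points_grades = ["A", "B", "B", "C", "A"]
cognitive_only_grades = ["B", "B", "C", "C", "B"]

print(percent_agreement(all_points_grades, cognitive_only_grades))  # 40.0
```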

Standard Error of Measurement and Cut Scores

In addition to the validity issues associated with non-cognitive score points, there is also a validity concern when awarding grades based on summative assessment cut scores. A cut score is the lowest score on the scale at which a particular letter grade is awarded. For example, on a 400-point scale, it may be determined that a grade of B requires 350 to 375 points. The cut score for a B would then be 350 points.
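
A cut-score table can be sketched in a few lines of Python. Only the B range of 350 to 375 points comes from the example above; the A cut score follows from it, and the C and D cut scores and the function name are assumptions added for illustration.

```python
# Cut scores on a 400-point scale, highest grade first.
# B = 350 (and therefore A = 376) follows the example; C and D are assumed.
CUT_SCORES = [("A", 376), ("B", 350), ("C", 320), ("D", 290)]

def letter_grade(points, cut_scores=CUT_SCORES):
    """Return the first grade whose cut score the student meets or exceeds."""
    for grade, cut in cut_scores:
        if points >= cut:
            return grade
    return "F"

print(letter_grade(350))  # B
print(letter_grade(349))  # C
print(letter_grade(376))  # A
```

A single point separates the B at 350 from the C at 349, which is why measurement error around a cut score matters.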

The index used to describe the consistency or reliability of a particular student’s performance is referred to as the standard error of measurement (SEM). The SEM is an estimate of how consistent an individual student’s score would be if the assessment were administered to that student multiple times. Since repeated administrations of a summative test are not possible, the SEM is computed from the test reliability index. The SEM is like the margin of error that national polls report, such as “plus or minus 3% margin of error.”
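
One common way to compute the SEM from reliability, taken from classical test theory, is to multiply the standard deviation of the test scores by the square root of one minus the reliability coefficient. The Python sketch below assumes that formula; the function name and the input values are hypothetical.

```python
import math

def standard_error_of_measurement(score_sd, reliability):
    """Classical test theory estimate: SEM = SD * sqrt(1 - reliability)."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return score_sd * math.sqrt(1 - reliability)

# Assumed values: a score standard deviation of 10 points
# and a test reliability of 0.96.
print(round(standard_error_of_measurement(10, 0.96), 2))  # 2.0
```

The more reliable the test, the smaller the SEM: a perfectly reliable test (reliability of 1) would have an SEM of zero.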

The SEM means that a student’s “true” score could be slightly higher or lower than the score the student received, and thus the student’s grade could be higher or lower. For example, if the SEM is two points, then a score of 88 could reflect a true score of 90. If the cut score for an A is 90 points, applying the SEM in the student’s favor would turn the student’s score of 88 into an A.
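
The example with a score of 88, an SEM of two points, and a cut score of 90 can be checked with a short Python sketch. The function names are hypothetical, and using a band of one SEM on either side of the score is an assumption; an instructor could reasonably choose a different width.

```python
def score_band(observed, sem):
    """Range from one SEM below to one SEM above the observed score."""
    return observed - sem, observed + sem

def cut_within_band(observed, sem, cut_score):
    """True if the cut score falls inside the observed score's SEM band."""
    low, high = score_band(observed, sem)
    return low <= cut_score <= high

print(score_band(88, 2))           # (86, 90)
print(cut_within_band(88, 2, 90))  # True
print(cut_within_band(85, 2, 90))  # False
```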

Instructors rarely compute the SEM, but it is reasonable to infer that there is always some measurement error, and it is the instructor’s prerogative to decide whether a particular student’s score should be adjusted. If an instructor wants to consider the SEM to resolve a student’s complaint about a grade, or because the student’s grade seems inconsistent with the student’s other observed performances, such as high scores and outstanding work, then the instructor could decide to adjust the student’s grade upward. Since the SEM is a plus-or-minus measure, the “true” score could also be lower, but that should only be mentioned for argument’s sake and not acted on. The idea is not to punish students, but rather to make sure each student gets a fair evaluation. Although it is rarely recommended, the instructor may also choose to consider non-cognitive points to some small degree.

References

ACT WorkKeys. (2014). Key facts, cognitive and non-cognitive skills. Retrieved from https://www.act.org/content/dam/act/unsecured/documents/WK-Brief-KeyFacts-CognitiveandNoncognitiveSkills.pdf

Carriveau, R. & Blake, A. (2016). Challenging the coin of the realm. Presented at Scholarship of Teaching & Learning conference, Savannah, GA.

Khanna, M. M., Brack, A. S., & Finken, L. L. (2013). Short- and long-term effects of cumulative finals on student learning. Teaching of Psychology, 40(3), 175–182. https://doi.org/10.1177/0098628313487458

Lawrence, N. K. (2013). Cumulative exams in the introductory psychology course. Teaching of Psychology, 40(1), 15–19. https://doi.org/10.1177/0098628312465858

Szpunar, K. K., McDermott, K. B., & Roediger III, H. L. (2007). Expectation of a final cumulative test enhances long-term retention. Memory & Cognition, 35, 1007–1013.