Perelman, Les. “Construct Validity, Length, Score, and Time in Holistically Graded Writing Assessments: The Case Against Automated Essay Scoring (AES).” International Advances in Writing Research: Cultures, Places, Measures. Ed. Charles Bazerman, Chris Dean, Jessica Early, Karen Lunsford, Suzie Null, Paul Rogers, and Amanda Stansell. Fort Collins: WAC Clearinghouse and Parlor, 2012. 121-32. Web.
Perelman makes four points about automated essay scoring systems and why they should not be used to score student writing. First, AES scores largely reflect the length of a piece rather than the quality of its writing, because the software merely counts features. Second, because the timeframe for these timed essays is so short, length correlates strongly with score; such timed essays are artificial not only because of that correlation, but because their prompts are inauthentic. Third, the claimed validity of AES scoring has been shown to be false. Finally, MS Word has better grammar-checking software than AES systems do, and MS Word is itself quite limited (128).
- “Although White (1995) has made a case for the timed-impromptu for certain assessment decisions, it is a genre of writing that has no real analogue in real human communication and therefore is invalid as a measure. Indeed, the timed impromptu exists in no activity system except for mass-market writing assessments and education geared towards mass-market writing assessments.” (Perelman 122)
- “When students have one hour to write, the shared variance predicted by length decreases to approximately 20%, and when students are given 72 hours, length predicts 10% or less of the shared variance of the holistic score.” (Perelman 124)
- “They do not understand meaning, and they are not sentient. They do not react to language; they merely count it.” (Perelman 125)
If the grammar-checking software is as limited as Perelman claims, how can it truly assess students’ grammar usage? How many students have been scored lower because of those limitations?
Perelman explains that AES emerged in 1966 because human scorers were not reliable. What was tried first to make human scorers more reliable? Did norming sessions occur?
I have seen World War II referenced in the history of composition in a few articles. What happened during or just after World War II to make composition change as much as it did? Was there a paradigm shift during this time? How did World War II affect other disciplines?