Reflections on “robo-grading” and my first month teaching first-year writing

Some of my students’ comments on the first month of class, including my personal favorite

I have 18 students in an 8:00 a.m. class who have been dutifully (if a bit groggily and yawningly) showing up three times a week to work on their writing. The course is not just about writing, though. It’s titled “Writing and Academic Inquiry,” and my goal has been to work with them on their writing as a skill but also to introduce them to writing as a subject of inquiry. The topic I selected goes by various names. Its proponents call it “automated essay scoring” (its handle on Wikipedia), “automated writing assessment,” “automated writing evaluation,” and the like. Its opponents often call it “robo-grading” and similar-sounding names. I prefer “computer-aided writing assessment,” or CAWA.

This post introduces CAWA, describes how my students and I have been exploring it, identifies what I think is a significant gap in the research, and explains how we will sustain our inquiry going forward. I’ll also briefly share my thoughts, and the students’, about how it’s going so far.

Introducing CAWA

You can easily find popular accounts of CAWA, including ones that my students read in their first week from Reuters, TechCrunch, the New York Times, and Slate. CAWA works from the notion that automated tools may be useful for assessing student writing, either at the formative stage–when students are still writing and revising–or at the summative stage–when student writing is being evaluated. Formative tools, like ETS’s Criterion® and Pearson’s WriteClick™, are supposed to help students improve their writing before turning it in. Summative tools, like ETS’s e-rater® (which is also embedded in Criterion®), can theoretically be used to “grade” student writing; in fact, e-rater® is used for that purpose in the administration of the GMAT (for admission to graduate business school).

My sense of the debate is that the two sides are talking past each other. The opponents (including most rhetoric/composition scholars who appear to have weighed in on the topic; see, for example, the essays in Freitag Ericsson & Haswell, 2006) attack CAWA on the grounds that (a) computers can’t really understand student writing; (b) use of CAWA is motivated by political and economic concerns that do not take the best interests of students into account; (c) CAWA is not reliable or valid; and (d) reliability and validity are not applicable measures where writing assessment is concerned anyway. The proponents (see, for example, the essays in Shermis & Burstein, 2003) sometimes acknowledge the controversy, but focus mainly on improving the validity and reliability of the tools.

How we’ve explored CAWA and the gap we’ve found

At the beginning of the semester, my students were ignorant of CAWA (and so, for the most part, was I). What’s more, my students were also ignorant of current theories and practices in post-secondary writing assessment. (I have 8 years’ practice in that space, but I must confess that my practice has not been as informed by theory as I’d like.) So, in addition to the treatment in the popular press that I mentioned above, we’ve read four other pieces together: Wang and Brown (2007), Glenn and Goldthwaite (2008), McAllister and White (2006), and Burstein (2003). These texts provide some understanding of how humans assess (or purport to assess) writing, how CAWA does it, and what some of the controversy surrounding CAWA is. Some of these pieces are tough to read because of technical jargon and statistical concepts, but we’ve been working through them together.

One thing is profoundly clear: Despite reams of paper (or GB of data, in the case of online journals) written on CAWA, hardly anyone has taken account of student perceptions of it. In a book chapter ostensibly devoted to considering the various voices in the CAWA “dialectic,” McAllister and White (2006) devote exactly one paragraph to students’ interests. As it happens, my students so far appear to be ready to take a wait-and-see approach. They appear to accept, at least in theory, both the criticisms of CAWA and the arguments for its utility, but I have not yet asked them to judge CAWA. So far, they’ve just turned in an annotated bibliography (A/B) with abstracts of the nine sources we’ve read together.

October and forward

In October, we are focusing on two things: gathering some data for students to use in assessing CAWA from their own perspectives, and helping students settle on final paper topics. For the former, each student’s A/B from the end of September will be rated by a group of students according to the assignment grading criteria that the students and I developed together two weeks ago. Each student A/B will also be evaluated by e-rater® (an implementation of which is available through Minnesota’s Moodle/Turnitin module). Finally, each student will receive an evaluation from me. Then, the author of each paper will write a brief reflection on the evaluations she’s received and whether and how they helped her prepare her next major assignment, which includes a revision of the A/B and is due November 2.

Meanwhile, each student is required to read and write analytical abstracts about four more sources (articles or book chapters) that they select themselves during October. They are selecting these pieces based on their own interests related to our broad topic. I’ve told them that they don’t need to write their final papers on “robo-grading,” but that whatever their final topic is, they should use the sources we read together in September in the paper. So, for example, a student might write a final paper on the standardized-test mania that has gripped the U.S. and use CAWA as just one example of it. They will also be able to use the data we gather as a class in October in their papers, if they choose to.

I think this is a good approach, but this final paper topic, and the uncertainty about what it will be in each student’s case, is the cause of some anxiety, as we shall see.

How’s that going for you?

I’m happy with the progress the students are making. Their writing on low-stakes, ungraded assignments so far has been quite good. I’m looking forward to working with them to improve it. As for their impressions, I asked them on October 1 to break into their groups and (while I was absent from the room) write on the board their impressions of how the class is going so far: what they’d change, the quantity of work, whether they are learning, whether I’m giving them what they need. I took down what they wrote, and here it is, verbatim:

Group 1
–lack of ‘gripping’ subject matter [the word ‘gripping’ has become an inside joke in our class… long story]
–good pace
–learning HOW to write?
–assignments are clear
–Moodle is very clear… kept up to date

Group 2
Too much reading
Topic—too technical(?)
Thank you for thorough explanations
One-on-one conferences=good idea
Moodle is helpful

Group 3
To much reading
To much writing
Incredibly boring topics
-Very funny
-You have a good heart
-Moodle issues
-Final topic is still confusing

Group 4
Over Explaining
A lot of reading
The instructor is helpful
perfect

I’m trying not to overinterpret their comments. Based on them, I would describe the students as either satisfied or dissatisfied in a way that’s acceptable to me 😉

Until next time!

-Brian

References

Burstein, J. (2003). The E-rater® scoring engine: automated essay scoring with natural language processing. In M. D. Shermis & J. Burstein (Eds.), Automated Essay Scoring: A Cross-Disciplinary Perspective (pp. 113–121). Mahwah, NJ: Lawrence Erlbaum.

Freitag Ericsson, P., & Haswell, R. H. (Eds.). (2006). Machine Scoring of Student Essays: Truth and Consequences. Logan, UT: Utah State University Press.

Glenn, C., & Goldthwaite, M. A. (2008). Evaluating student essays. The St. Martin’s Guide to Teaching Writing (Sixth ed., pp. 114–147). Boston, MA: Bedford/St. Martin’s.

McAllister, K. S., & White, E. M. (2006). Interested complicities: the dialectic of computer-assisted writing assessment. In P. Freitag Ericsson & R. H. Haswell (Eds.), Machine Scoring of Student Essays: Truth and Consequences (pp. 8–27). Logan, UT: Utah State University Press.

Shermis, M. D., & Burstein, J. (Eds.). (2003). Automated Essay Scoring: A Cross-Disciplinary Perspective. Mahwah, NJ: Lawrence Erlbaum.

Wang, J., & Brown, M. S. (2007). Automated essay scoring versus human scoring: a comparative study. Journal of Technology, Learning, and Assessment, 6(2).

Brian Larson says:

October 2, 2012 at 20:49

One of my students located this article, which I had seen before, as at least one example where someone has studied the effect of CAWA on students, if not the students’ perceptions:

Warschauer, M., & Grimes, D. (2008). Automated writing assessment in the classroom. Pedagogies: An International Journal, 3(1), 22–36.