Assessing Purdue’s first year writing program

I’m currently heading up an assessment of Purdue’s first year writing program. We are collecting and analyzing a variety of student writing and beginning to report the results. This might be of interest to you if you teach or are interested in theories and practices of writing assessment. Note that this and other assessments are exploratory pilots, which will provide evidence for which aspects to refine for the full study next year.

Here’s a brief update I prepared on one of the assignments we piloted, the rhetorical analysis: Assessing the rhetorical analysis.

A theory of educational relativity

The theory of classical relativity understands all motion as relative. No object moves absolutely; an object moves only relative to other objects. The same can be said about much of learning and education: our educational growth is frequently measured relative to that of others–our classmates, coworkers, friends, family, and so on.

Relative educational growth recalls a concept central to test theory I’ve discussed before: norm referenced assessment. When norm referencing academic achievement, individual students are compared relative to one another and to overall group averages, which are other objects in motion. Norm referenced assessment differs from criterion referenced assessment, which measures individual growth relative to established objective criteria, stationary objects; that is, for criterion referencing, performance relative to peers doesn’t matter. Think of a driving test: you either pass or you don’t, and passing depends not on scoring better than your neighbor but on meeting the state-established criteria required for licensure.

In fact, I would argue most educational assessment consists of broad norm referencing often masquerading as criterion referencing. As far as I’m concerned this is not really good or bad. “Masquerading” has negative connotations, of course, but I believe the masquerade is less deliberate than inevitable. Any teacher will tell you it’s really, really hard not to compare students to one another, even if subconsciously. Try reading a stack of 20 papers without keeping an unofficial mental rank of performance.

Although norm referencing students against their peers’ class performance is somewhat inevitable, I think with careful attention (and a little statistics) our assessments can prioritize a superior norm referenced comparison: not the student against their peers, but the student against themselves.

Comparing a student’s performance to themselves recalls the familiar growth vs. proficiency debate in education circles, which our current Secretary of Education is infamously ignorant about. Basically, the argument is that schools should assess growth and not proficiency, since not all students are afforded the same resources and there is incredible variation in individual academic ability and talent. Because not all students start at the same place, they cannot all be expected to meet the same proficiency criteria. I agree. (Incidentally, this is why No Child Left Behind utterly failed, since it was predicated on the concept of all children meeting uniform proficiency criteria.)

One way to prioritize the assessment of growth over proficiency in a writing class is to use z-scores (a standardized unit of measurement) to measure how many standard deviations students are growing by during each assignment. Writing classes are particularly conducive to such measures since most writing assignments are naturally administered as “pre” and “post” tests, or, more commonly, rough and final drafts. Such an assignment design allows for growth to be easily captured, since a student provides two reference points for a teacher to assess.

By calculating the whole class’s mean score difference (μ) from rough to final draft, subtracting that number from an individual student’s rough and final draft score difference (x), and dividing by the standard deviation of the class score difference (σ), you obtain an individual z-score for each student, which tells you how many standard deviations their improvement (or decline) from rough to final draft represents.
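That recipe can be sketched in a few lines of Python; this is my own illustration, not part of the original assessment, and every score below is invented:

```python
def growth_z_scores(rough, final):
    """Z-score each student's rough-to-final growth against the class.

    rough, final: lists of scores in the same student order.
    """
    diffs = [f - r for r, f in zip(rough, final)]
    mu = sum(diffs) / len(diffs)                              # mean class growth
    sigma = (sum((d - mu) ** 2 for d in diffs) / len(diffs)) ** 0.5
    return [(d - mu) / sigma for d in diffs]                  # z = (x - mu) / sigma

# Hypothetical rough- and final-draft scores for a class of four.
rough = [70, 65, 80, 75]
final = [78, 75, 82, 85]
z = growth_z_scores(rough, final)
# Student 3 (index 2) gained only 2 points while the average class gain
# was 7.5, so their z-score is negative: below-average growth.
```

Note that the z-scores necessarily sum to zero: the statistic measures growth relative to the class, not absolute growth.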

Why do all this? Why not simply look at each student’s improvement from rough to final draft? Because we should expect some nominal amount of growth from any rough-to-final assignment design, so not all observed improvement reflects genuine individual growth. Calculating a z-score controls for overall class growth, so an improvement in scores from rough to final draft can be interpreted against an expected, quantified baseline.

To assess for an individual student’s growth relative to themselves, then, you can calculate these individual z-scores for each assignment and compare the z-scores for a single student across all assignments, regardless of differing assignment scales or values. This provides a simple way to look at a somewhat controlled (relative to observed class growth) measure of growth for an individual student relative to themselves over the course of the semester. In this way, we can see more carefully the often imperceptible educational “motion” of our students relative to themselves and to peers.
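Here is a hypothetical sketch of that cross-assignment comparison in Python, with two assignments graded on different point scales (the assignment types and all gains are invented):

```python
def growth_z(gains, i):
    """Z-score of student i's rough-to-final gain against the class's gains."""
    mu = sum(gains) / len(gains)
    sigma = (sum((g - mu) ** 2 for g in gains) / len(gains)) ** 0.5
    return (gains[i] - mu) / sigma

# Rough-to-final gains for a class of five; our student is index 0.
essay_gains = [12, 5, 7, 3, 8]              # 100-point essay
memo_gains = [1.5, 0.5, 2.0, 1.0, 0.0]      # 10-point memo

essay_z = growth_z(essay_gains, 0)
memo_z = growth_z(memo_gains, 0)
# Despite the different point scales, the two z-scores are directly
# comparable: this student grew more, relative to the class, on the essay.
```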

If we transfer what we learn, then we need a map

This past weekend I traveled to the College English Association 2018 conference in St. Pete, Florida, to give a talk about “learning transfer.” Learning transfer, often simply “transfer” in education literature, is the idea that when we talk about education broadly and learning something specifically, what we really mean is the ability to transfer knowledge learned in one context (the classroom, e.g.) to another (the office, e.g.). It’s the idea that we have only really learned something when we can successfully move the learned knowledge into a new context.

As far as theories of learning go, I think transfer is fairly mainstream and intuitive. Of course, the particular metaphor of “transfer” has both affordances and limitations, as all metaphors do. Some critics offer “generalization” or “abstraction” as more appropriate metaphors, and there might be a case to be made for those. But as long as the theory of transfer prevails I think we first need to get some things straight. This is what my talk was about.

If learning is concerned with transferring, and thus moving, between two places like the classroom and the office, then we must have a map to help us navigate that move. In the humanities, the literature on transfer disproportionately focuses on the vehicle of transfer, not a map to guide us through the landscape the vehicle traverses. The vehicle is of course literacy–reading and writing. We painstakingly focus on honing the best kinds of writing prompts and essay projects, as well as assigning the most thought-provoking reading material. And this is good. If we try to move, or transfer, from point A to B but our vehicle is broken or old or slow or unfit for the terrain, we can’t go anywhere. We like to say that our students don’t just learn to write, but write to learn as well. Literacy is a vehicle for all knowledge. All disciplines understand this. It’s why we write papers in all our classes, not just English.

But no matter how luxurious our transfer vehicles (our writing assignments and reading requirements) are, if we don’t know where we’re going, how to get there, or what obstacles lie in our way, then it doesn’t much matter. So how do we map the abstract terrain of education and cognition? Here are two simple statistics that can help us: R-Squared and Effect Size, or Cohen’s d.

R-Squared

The R-Squared statistic is commonly reported when using regression analysis. At its simplest, the R-Squared stat is a proportion (that is, a number between 0 and 1) that tells you how much of the variance, or change, in one quantitative variable can be explained by another. An example: suppose you’re a teacher who assigns four major papers each semester, and you’re interested in which of the essays can best predict, or explain, student performance in the class overall. For this, you’d regress students’ final course grades on their scores on one of the assignments. Some percentage of the variance in your Y variable (final course grades) will be explained by your X variable (performance on one of your assignments). This can give you a (rough) idea of which of your assignments most comprehensively tests the outcomes your whole course is designed to measure. (R-Squared proportions are often low but still instructive.)
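A minimal sketch of that regression in Python (my own example; the scores are invented, and for simple linear regression R-Squared is just the squared Pearson correlation):

```python
def r_squared(x, y):
    """Squared Pearson correlation: the proportion of variance in y
    explained by a simple linear regression on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

assignment = [72, 85, 90, 65, 78, 88]   # scores on one major paper (X)
course = [75, 82, 94, 70, 80, 85]       # final course grades (Y)
r2 = r_squared(assignment, course)      # share of grade variance explained
```

Running this for each of the four assignments, and comparing the resulting R-Squared values, is the comparison described above.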

If we extend the transfer/map metaphor, R-Squared is like the highways on a map–it can help us find more or less efficient routes, or combinations of routes, to get where we want to go.

Effect Size, or Cohen’s d


Effect Size is a great statistic, because its units are standard deviations. This means Effect Sizes can be reported and compared across studies. Effect Sizes are thus often used in Meta-Analyses, one of the most powerful research techniques we have. At the risk of oversimplification, Effect Size is important because it takes statistical significance one step further. Many fret over whether a result in an experiment, like the test of an educational intervention, is statistically significant. This just means: if we observe a difference between the control and experimental group, what is the likelihood that the difference is simply due to chance? If the likelihood falls below a certain threshold (which we arbitrarily set, usually 5%, but that’s a different discussion), then we say the result is statistically significant and very likely real and not due to chance. However, some difference being statistically real doesn’t tell us much else about it, like how big of a difference it is. This is where Effect Sizes come in.

An Effect Size can tell us how much of a difference our intervention makes. An example: suppose you develop a new homework reading technique for your students and you test it against a control group, measuring both groups’ performance on a later reading comprehension test. If the group means (experimental vs. control group) differ significantly, great! But don’t stop there. You can also calculate the Effect Size to see just how much they differ–and, as mentioned, Effect Size units are standard deviations! So you are able to say something like: my new homework reading technique is so effective that a student testing at the 50th percentile in a different class will test about one standard deviation higher in my course, around the 84th percentile.
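A sketch of that calculation, assuming the common pooled-standard-deviation form of Cohen’s d (all scores invented):

```python
def cohens_d(treatment, control):
    """Difference in group means divided by the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    m1, m2 = sum(treatment) / n1, sum(control) / n2
    v1 = sum((x - m1) ** 2 for x in treatment) / (n1 - 1)   # sample variances
    v2 = sum((x - m2) ** 2 for x in control) / (n2 - 1)
    pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    return (m1 - m2) / pooled_sd

# Hypothetical reading comprehension scores for the two groups.
new_technique = [82, 88, 75, 90, 85, 79]
control = [70, 76, 68, 80, 74, 72]
d = cohens_d(new_technique, control)
# d is in standard-deviation units, so it can be compared across studies.
```

A d of 1.0 is what licenses the 50th-to-84th-percentile claim above, since about 84% of a normal distribution falls below one standard deviation above the mean.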

If we again extend the transfer/map metaphor, Effect Size helps us find the really good places to visit, or transfer to. It’s like a map of the best vacation spots. After all, most of us would rather visit the beach than North Dakota in the winter, so it’s good to know where’s worth going (for certain purposes).

Stats can help teachers

Basically, what I tried to argue in my talk is not that quantitative methods/analysis are superior, but that they can do a lot of cool things, and that they are particularly important for building maps. Maps are, after all, inherently approximate and generalized, just like grades and student performance on tests and any kind of quantitative measure are merely approximations. Maps are indeed limited in many ways: looking at a (Google street) map of Paris, for instance, obviously pales in comparison to staring up at the Palace of Versailles in person. But looking at a map beforehand can help you get around once you do visit. It’s the same with quantitative measures of learning. They can help us get a lay of the land, which then allows us to use our pedagogical experience and expertise to attend to the particular subtleties of each class and each student. Teachers are experimenters, after all, and each class, each semester, is a sample of students whose mean knowledge we hope is statistically significantly and sizably improved by the end of the course.

Dernst’s language compass

I’m currently working on my dissertation prospectus, which considers the next steps Automated Essay Scoring (AES) and Natural Language Processing (NLP) technologies are taking to enable machines to read and interpret language beyond surface level textual features. The big question mark is whether machines can assess the rhetorical dimension of writing, such as the use of metaphor, analogy, and persuasion, and how those and linguistic qualities like them might best be modeled, both statistically and conceptually. Here’s a quick mock-up of a compass-style model of two aspects of language I’m constantly assessing, the actual words used and the intent behind them:

[Figure: a compass-style diagram with a metaphor vs. literalness axis and an orthogonal irony vs. sincerity axis.]

The degree of metaphor of an utterance’s words is contrasted with that of its literalness, while the orthogonal axis contrasts the degree of irony of an utterance’s intent with that of its sincerity.

Edit: An example might prove instructive. I would map the phrase “tax relief” onto the yellow square, as “ironic and metaphorical.” The metaphorical quality is obvious: “relief” is borrowed from a domain other than taxation, figuring taxation as evil theft, which is itself a metaphor rather than a literal description. The irony is that those who utter the phrase “tax relief” don’t actually mean it.
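A toy Python sketch of how the compass might be encoded; the coordinates below are my own hand-placed guesses for illustration, not the output of any model:

```python
def quadrant(metaphor, irony):
    """Label a point on the compass.

    metaphor: -1 (fully literal) to +1 (fully metaphorical).
    irony:    -1 (fully sincere) to +1 (fully ironic).
    """
    m = "metaphorical" if metaphor > 0 else "literal"
    i = "ironic" if irony > 0 else "sincere"
    return f"{i} and {m}"

# Hand-placed coordinates for two utterances.
utterances = {
    "tax relief": (0.8, 0.7),              # the yellow-square example
    "the cat is on the mat": (-0.9, -0.9), # plainly literal and sincere
}
labels = {u: quadrant(m, i) for u, (m, i) in utterances.items()}
```

The hard research question, of course, is producing those coordinates automatically from text rather than by hand.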

“I can’t teach writing in only one semester”

One complaint I’ve consistently heard (and myself made) during all my years teaching college writing is that one or two semesters is not enough time to teach writing. One finding I’ve repeatedly come across in all my years poring over educational research is that the largest source of variance in student writing performance is the interaction between student and task. In other words, the fewer the number of tasks assigned in your course (assignments, papers, tests, etc.), the higher the variation in student performance, which means the less reliably (consistently) your class measures whatever it is we conceive of as “writing ability.” This seems intuitive to me, and it is consistently found in the literature. On average, for two different assignments in the same course, and even for the same student, performance on one is basically nonpredictive of performance on the other. A student can write an amazing editorial and absolutely bomb a research paper. Does that make your course an unreliable measurement of writing ability? Maybe.

The logical response to this is simple: just assign many more and wildly different tasks in your course to better cover the vast domain of writing ability and thus reduce performance variance. But, alas, that’s where the time constraint comes in. It’s virtually impossible to assign more than four major assignments a semester. Are four writing assignments enough to reliably capture writing ability? No way. Not given the endless number of genres and writing tasks out there and our evolving definition of writing ability.

So then, what if we increased the number of assignments, but made them smaller and spent less time on each? 10 small writing assignments a semester, instead of four major ones? Would 10 assignments more reliably capture writing ability and minimize our measurement error? Statistically, yes. Intuitively, I think yes, too. But I understand the resistance to this idea. There is value, I think, in longer, more in-depth writing assignments. I bet most freshman college students haven’t written a paper longer than 10 pages, and at some point they absolutely should write one. (I think multiple.)
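That “statistically, yes” can be made concrete with the Spearman-Brown prophecy formula from classical test theory; this is my addition, not a formula from the post, and the starting reliability is hypothetical:

```python
def spearman_brown(r, k):
    """Predicted reliability when a test is lengthened by a factor of k,
    assuming the added tasks are parallel to the originals."""
    return k * r / (1 + (k - 1) * r)

r_four_tasks = 0.45                                # hypothetical reliability with 4 tasks
r_ten_tasks = spearman_brown(r_four_tasks, 10 / 4) # lengthen from 4 to 10 tasks
# More comparable tasks yield a higher predicted reliability.
```

The formula assumes the new tasks behave like the old ones, which ten small, varied assignments may not, but the direction of the effect is exactly the intuition above.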

But I wonder if that value can be realized in a freshman writing class. What is the realistic purpose of a one-semester, freshman college writing class after all? If our time, and thus our measurement instrument, is narrowed to one semester, maybe we should break up the cognitive trait we intend to measure into smaller chunks, since the whole construct can never be reliably captured in a semester with four major assignments. It’s like we’re trying to measure several miles with four yardsticks. If we’re only going to get one (or at most two) semesters, maybe we should adjust the use of our narrowed instrument accordingly, by using it on more, smaller, varied tasks. Then instead of measuring miles with yardsticks we’ll at least be measuring yards with rulers.

What are grades?



Should teachers rank-order their students against one another?

The fancy way to ask this: Do we want norm-referenced grades? As opposed to criterion-referenced grades?

Norm-referenced means grades are assigned relative to a class average, while criterion-referenced means grades are assigned relative to a set of objective outcomes. Put another way: Do we care about what order the runners finish the race in? Or do we just want everyone to finish the race? What is a race?

Norm-referencing is common at the highest levels of educational and professional attainment, like in law or medical school. First in your class. Fifth in your class. Last in your class. Criterion-referencing, meanwhile, is more common in lower-stakes situations, such as a driving test. As long as you meet some level of competence, it doesn’t matter how your driving test score compares to that of your neighbor, you both get driver’s licenses. (Curiously, the Bar Exam and certain Medical Exams are criterion-referenced, even if prior schooling for the students taking them isn’t.)

Aside from law and medicine, school grades represent something of a middle ground between pure ranking and broad licensure or credentialing. School grades reflect performance relative both to established criteria and to the performance of other students in the class. Think about it: we’ve all at times done A work or C work or D work, which means there are objective tiers of performance criteria. Theoretically, nothing prevents multiple, or even all, students from achieving the same grade, if they perform at roughly the same level. Yet this almost never happens. And grades, though tiered, are still ranked; the A tier is obviously better than the C tier. You can’t say you just care about everyone finishing the race if you rank-grade their performance. So what are grades?

My answer? Depends on the class. Some classes are indeed more like driving tests, and ranking students in such classes serves little purpose if we’re just trying to get students driving. But other classes perhaps do benefit from a more granular assessment of performance, like the rank-ordering of law and medical school students. Surgery isn’t driving.

My–and I suspect many others’–gut says ranking is bad, inherently hierarchical, discouraging, and inegalitarian. Yet, at the same time, I fear the service model of education comports all too well with a credentialing-style approach to grades, where students expect to just “finish the race” with little thought to how much learning actually occurs along the way.

Teach metaphor not grammar

At least once a semester since I started grad school (seven semesters now) I find myself embroiled in a familiar debate: to teach grammar or not to teach grammar. The majority of people outside of my field and outside the world of education broadly would laugh that this is even a question, because to most people, of course a writing teacher teaches grammar. Every time my job comes up in conversation with a non-teacher there’s inevitably a side comment made about watching their grammar around me. You can set your watch by it. In fact, to most people, grammar’s all I should teach. This is of course wrong, my field will eagerly point out. Writing is more than grammar; good grammar does not equal good writing; traditional, decontextualized grammar instruction–in the form of grammar handbooks or sentence diagramming–is largely ineffective as a pedagogical approach to teaching writing and usage to young, learning writers.

However, I’ve always felt that because the grammar jokes persist, and because grammar is so strongly associated with writing and teaching in general, it simply can’t be ignored. In a sense, these people are right. I’ve thus argued, every semester I get into this debate, that we absolutely need to teach grammar. It’s irresponsible to ignore something so heavily weighted and so widely rewarded in our culture. Yes, we ought to live in a society that doesn’t disproportionately reward a certain interpretation of correct grammar, but we don’t. The question, then, is not should we teach grammar but how should we teach grammar.

In recent semesters I’ve concluded one effective way to teach grammar is to teach metaphor. The first unit of my course introduces Kenneth Burke’s four “master tropes” of language–metaphor, metonymy, synecdoche, and irony, the latter three each a species of the first. Now, in most composition circles, it’s currently fashionable to critique the teaching of writing at the sentence level; I might be laughed at by serious compositionists for emphasizing such dull linguistic nuances and sentence level tropes as metaphor and irony instead of spending valuable class time on having students explore their identity or something similar through their writing. I happen to believe assignments dealing with personal identity can (but not always) institutionalize identities in a way that doesn’t sit well with me, but that’s a different story. The real resistance to sentence-level writing pedagogy is that people associate it with sentence diagramming and rote grammar drills, which, many claim (and some studies support), are ineffective strategies for teaching writing.

But while sentence diagramming and decontextualized, rote grammar drills do in fact fail to teach writing, we must not confuse the method of pedagogical delivery for its content. Which is to say: the problem with attention to the sentence level is not the sentence-level content itself; it’s the way we present the sentence-level content. A big part of teaching writing, I’ve learned, is changing students’ perception of what writing and language is and does in the first place, an undoing of the view of language as solely grammatical, a product of the rote grammar exercises that program us to think of language as merely a set of rules to feverishly follow. Coming at the sentence level from a different angle, then–that of metaphor broadly and its various species, metonymy, synecdoche, irony–can help students re-think the content of the sentence level in productive and, more importantly, novel ways. Language becomes less a system of rules to anxiously follow and more a vast toolbox for helping you describe and externalize some feeling inside you. Language becomes fun. Not to mention, teaching language as metaphorical, rather than simply “grammatical,” is a lot more intuitive to students.

To be sure, emphasizing language as metaphorical as opposed to grammatical doesn’t remove the rules of grammar; it just transforms them. Foregrounding writing as metaphorical actually gets at grammatical concepts through a backdoor. It reframes the rules so they’re less about prohibition and more about assistance. It’s like crushing up your medicine and sprinkling it in pudding. Making students write deliberately metaphorical prose forces them to use more sophisticated grammatical constructions without even realizing it. For example, early in the semester I often assign the task of writing an extended metaphor for attending college; asked not only to describe a complex event (attending college) but to do so metaphorically, students are forced to reach for new kinds of tools in the toolbox of language to help construct their metaphor. Whether the students are conscious of it or not, these new tools contain various grammatical complexities, which then filter out into their writing more generally. After they write, we can chat about what kinds of grammatical constructions are present, how changing them might change the sentence or the metaphor as a whole, and so on.

I think we often teach grammar with the hope that, if students just memorize all the rules, then they can write beautiful and grammatically complex sentences. But the rules of grammar are unintuitive, endless, and inherently restrictive (they are, in fact, rules). Meanwhile, metaphors, or the four tropes I’ve briefly discussed, are generative. They’re not prohibitions, but sparks, suggestions. What’s more likely to enable you to write complex sentences, memorizing a bunch of rules or playing a game?

It’s not a perfect system and I’m still tweaking it, but it’s been a good move for me, and it helps resolve some of the anxieties I have about teaching grammar in Freshman Composition.

Education Discernments for 2017


The education journalist Kristina Rizga spent four years embedded at Mission High School in San Francisco and arrived at this key insight concerning modern education reform: “The more time I spent in classrooms, the more I began to realize that most remedies that politicians and education reform experts were promoting as solutions for fixing schools were wrong” (Mission High, p. ix).

California Adopts Reckless Corporate Education Standards

Standards based education is bad education theory. Bad standards are a disaster. I wrote a 2015 post about the NGSS science standards concluding:

 “Like the CCSS the NGSS is an untested new theory of education being foisted on communities throughout America by un-American means. These were not great ideas that attained ‘an agreement through conviction.’ There is nothing about this heavy handed corporate intrusion into the life of American communities that promises greater good. It is harmful, disruptive and expensive.”


Assonance and brand names

What’s in a name? Assonance. Assonance, of course, is the repetition of vowel sounds in successive words. I’ve long theorized, without any real research to either confirm or deny,* that one way to create a memorable or pleasant name is to use assonance. Incidentally, this mostly applies to brand and band/album names, I’ve found.** For example, the fifth studio (and third best) album by The National, a band I like, is titled “High Violet.” Now, there could be some artistic rationale for the title, but there is no corresponding track name and as far as I can tell no mention in any lyrics of either “high” or “violet.” At the risk of oversimplification, I’m left to conclude that, on some level, The National decided on High Violet because it sounds cool. Whether or not that’s why they arrived at that name is irrelevant, because it does “sound cool.” Why? Because the long i in “high” assonates with that in “violet.”

At least, that’s the only reason I can find for pairing High with Violet. Moreover, other, non-assonant but equally conceptually distant word pairs don’t sound as cool or pleasant or memorable. Consider: does Low Violet work as well? What about Quick Violet, Wet Violet, Brown Violet, Dumb Violet, Credible Violet, Expensive Violet? Now consider these assonant alternatives: Dry Violet, My Violet, Sky Violet, Why Violet, Like Violet, Shy Violet, White Violet. The second set preserves the same pleasance as High Violet, presumably through assonance.

But maybe you disagree with me or you don’t like assonance, yet you still want to pair your words in some subtle, playful way. You’re left with three other common phonological maneuvers: rhyme, consonance, and alliteration. Unfortunately, all three are too easy, too campy, too childish. Assonance achieves a more sophisticated attention to phonological detail. It suggests rhyme, but doesn’t go so far as to rhyme for you; it repeats open and round vowel sounds, not harsh, quick consonants (consonance); and unlike alliteration it doesn’t occur at the beginning of words, something you might find in a tongue twister or nursery rhyme. No, assonance is adult.

Assonance can occur within a single word, like Nirvana. And single word names/titles are trending hard, last I checked, especially for restaurants and bands.*** But what I’m diagnosing here is more the deliberate assonating of two or more conceptually unrelated words to create some ambiguously pleasant aura; that assonance, then, becomes the only connection between the words. Iconic band names frequently employ it: Led Zeppelin, Creedence Clearwater Revival, AC/DC, Lynyrd Skynyrd, Joy Division, Rolling Stones, and so on. And corporate America confirms my thesis too. When their names consist of more than one word, corporate brands love assonance. Out of the 50 most profitable brands, only 10 have names consisting of more than one word (though the one-worders often inter-assonate, like Toyota and Microsoft), but 6 of those 10 use assonance. Examples include Coca-Cola, General Electric, Wal-mart, Home Depot. In general, brand names tend to have less conceptual distance between the name and the product they offer (Home Depot is a home improvement store, after all) than artistic projects, which involve layers of interpretive distance. But that proves my point further: even when constructing a practical brand name, using assonance can make your name/title that much more memorable. What if Home Depot were called Home Warehouse?

Maybe I will tell my students that I am renaming this Rhetorical Device Thursday to Assonate Day.

*I only do this with silly theories of no importance. I promise I don’t make a habit of willful ignorance.

**This is because over the last 7 years if I’ve ever been trying to name something, it’s either a band I’ve been in (all of which have had terrible names, maybe with the exception of Mote) or a hypothetical brewery Zack and I will start and the subsequent beers we will brew. The name of one of the last beers I brewed used assonance–it was called “Hot Gold,” a phrase I got from Toni Morrison’s Sula.

***Bands also have a fixation on one word plurals (e.g. Battles).