Covid-19 by the numbers

Historians of science debate it, but many consider the first example of “true science”—defined as the effort to numerically describe natural world phenomena—to be when the ancient Greeks calculated the ratios of musical tone intervals. The Greeks discovered that a string of the same thickness and tension as another but twice its length vibrated at a frequency an octave lower, meaning the ratio of an octave is 2:1. The ratio of a perfect fourth was calculated to be 4:3, a perfect fifth 3:2, and a tone (whole step) 9:8.

This process of quantifying intervallic harmony came to structure the Greeks’ entire way of thinking, extending beyond the strings of the lyre all the way to the heavens. They soon formed an early version of astronomy in which the planets were thought to exist in various degrees of harmony with one another. Sometimes consonant and other times dissonant, the planets orbited at differing harmonic intervals above or below one another. Venus and Earth were a minor third apart.

This is hailed as scientific thinking—translating reality into numbers, identifying patterns, and inferring broad conclusions. But describing the distance between Earth and Venus as a minor third, we now know, is not science but poetry. While the ratios have proved correct and useful for musical tones, they are a useless framework for analyzing astronomical patterns. Yet numbers have that kind of seductive power. Their seeming objectivity projects a certainty that is both comforting and dangerous.

I’ve been thinking about the ancient Greeks as we continue to battle Covid-19. We are deep in the process of describing this novel virus using our own modern quantitative language. The pandemic has brought with it a dizzying array of numbers to decipher: total confirmed cases, running death toll, case fatality rate, R0 (the basic reproduction number), growth rate, daily test totals, 7-day case average, incubation period, six feet of social distancing, and at-risk age ranges are just some of the most common metrics. Obviously, vital research is being conducted using these quantities, and this is not meant as an anti-intellectual or conspiratorial screed. But some days I feel like we are the ancient Greeks, staring at the sky and charting the planets using major and minor musical scales.

Almost every day, I visit the CDC Covid Data Tracker, which collects all these numbers and more. The interactive graphs and quantitative specificity exude an air of authority. Yet, crucially, each of these metrics is an approximation, a best guess. We will never know the true totals and rates. No matter how many decimal places we account for and log-adjusted rates we calculate, numbers are crude, blunt instruments that can only ever describe the vague contours of the pandemic.

Now that we are beginning to open the country back up, I worry that our obsession with numbers may be leveraged for callous justifications. Although numbers help us stay vigilant, they can also be wielded to justify reckless action. Blind confidence in numbers has the potential to transform a rubric of compassion and caution into a cost-benefit ledger sheet of risk vs. reward, inoculating us against the horror of death by cloaking it in relative and impersonal percentages.

I often think about my personal relationship to the Covid-19 numbers, how they factor into my current state of mind. Normally, when I’m dealing with quantities, the smaller the number, the more intimate, and the less abstract, it is. I’m able to see things few in number as themselves, not as quantities. “The death of one man is a tragedy; the death of a million men is a statistic,” as they say. But this pandemic is peculiar. As the numbers add up and the death toll rises, I feel less distant from it. Instead I feel the pandemic closing in on me; the mounting case figures make me feel like I’m sinking into quicksand.

I also think about the political significance of the numbers. We’re over 135,000 deaths now and confirmed cases are once again rising. Who’s to blame for all this senseless death? The president? Governors? Mayors? Individual Americans? Will anyone be tried for crimes? I happen to believe the more we focus on the personal transgressions of individual Americans the more we are distracted from holding the true perpetrators—politicians—accountable. They must pay us to fight this thing, or else we won’t fight it, not because we are bad people but because without actual relief such as paid leave, unemployment benefits, healthcare, and mortgage/rent suspension, fighting the virus by staying home currently amounts to a different kind of death—financial ruin.

Finally, I think about the numbers yet to enter the equation. When there is a vaccine, a new number will emerge as the most salient: how many people get it and who. There won’t be enough for everyone, we know that much. And if I know America, someone will be turning a huge profit. I fear the calculus to come.

Numbers are prisms. Held right, so the light hits at the correct angle, and a whole spectrum becomes visible. Held wrong, or stowed away in darkness, and the prism is empty and blank, nothing inside. Numbers are better than nothing, and I believe all policy decisions going forward should draw on the best numbers we’ve got. Once we do get a handle on this thing, though, I hope we are careful not to confuse harmony with astronomy.

A theory of educational relativity

The theory of classical relativity understands all motion as relative. No object moves absolutely; an object moves relative only to the motion of other objects. The same can be said about much of learning and education: our educational growth is frequently measured relative to that of others–our classmates, coworkers, friends, family, and so on.

Relative educational growth recalls a concept central to test theory I’ve discussed before: norm referenced assessment. When norm referencing academic achievement, individual students are compared relative to one another and to overall group averages, which are other objects in motion. Norm referenced assessment differs from criterion referenced assessment, which measures individual growth relative to established objective criteria, stationary objects; that is, for criterion referencing, performance relative to peers doesn’t matter. Think of a driving test: you either pass or you don’t, and passing doesn’t depend on scoring better than your neighbor but on meeting the state-established criteria required for licensure.

That said, I would argue most educational assessment consists of broad norm referencing often masquerading as criterion referencing. As far as I’m concerned this is neither good nor bad. “Masquerading” has negative connotations, of course, but I believe the masquerade is less deliberate than inevitable. Any teacher will tell you it’s really, really hard not to compare their students to one another, even if subconsciously. Try reading a stack of 20 papers and not keeping an unofficial mental rank of performance.

Although norm referencing students relative to their peers’ class performance is somewhat inevitable, I think with careful attention (and a little statistics) our assessments can prioritize a norm referenced comparison superior to that of students against their peers: the comparison of a student with themselves.

Comparing a student’s performance to themselves recalls the familiar growth vs. proficiency debate in education circles, which our current Secretary of Education is infamously ignorant about. Basically, the argument is that schools should assess growth and not proficiency, since not all students are afforded the same resources and there is incredible variation in individual academic ability and talent. Because not all students start at the same place, they cannot all be expected to meet the same proficiency criteria. I agree. (Incidentally, this is why No Child Left Behind utterly failed, since it was predicated on the concept of all children meeting uniform proficiency criteria.)

One way to prioritize the assessment of growth over proficiency in a writing class is to use z-scores (a standardized unit of measurement) to measure how many standard deviations students are growing by during each assignment. Writing classes are particularly conducive to such measures since most writing assignments are naturally administered as “pre” and “post” tests, or, more commonly, rough and final drafts. Such an assignment design allows for growth to be easily captured, since a student provides two reference points for a teacher to assess.

By calculating the whole class’s mean score difference (μ) from rough to final draft, subtracting that number from an individual student’s rough and final draft score difference (x), and dividing by the standard deviation of the class score difference (σ), you obtain an individual z-score for each student, which tells you how many standard deviations their improvement (or decline) from rough to final draft represents.
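In symbols, that is z = (x − μ) / σ. Here is a minimal sketch of the calculation in Python, using invented scores purely for illustration:

```python
import numpy as np

# Hypothetical rough- and final-draft scores for a class of five students
rough = np.array([70, 62, 85, 74, 90])
final = np.array([78, 75, 88, 80, 91])

diff = final - rough          # each student's improvement (x)
mu = diff.mean()              # mean class improvement (mu)
sigma = diff.std(ddof=1)      # standard deviation of class improvement (sigma)

z = (diff - mu) / sigma       # improvement expressed in standard deviations

for student, score in enumerate(z, start=1):
    print(f"Student {student}: z = {score:+.2f}")
```

A positive z-score means a student improved more than the class did on average; a negative one means less.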

Why do all this? Why not simply look at each student’s improvement from rough to final draft? Because we should expect some nominal amount of growth given the rough-and-final-draft assignment design, so not all raw improvement is meaningful improvement in such a context. Calculating a z-score controls for overall class growth, so an improvement in scores from rough to final draft can be interpreted in the context of an expected, quantified amount.

To assess for an individual student’s growth relative to themselves, then, you can calculate these individual z-scores for each assignment and compare the z-scores for a single student across all assignments, regardless of differing assignment scales or values. This provides a simple way to look at a somewhat controlled (relative to observed class growth) measure of growth for an individual student relative to themselves over the course of the semester. In this way, we are better able to see more carefully the often imperceptible educational “motion” of our students relative to themselves and to peers.

If we transfer what we learn, then we need a map

This past weekend I traveled to the College English Association 2018 conference in St. Pete, Florida, to give a talk about “learning transfer.” Learning transfer, often simply “transfer” in education literature, is the idea that when we talk about education broadly and learning something specifically, what we really mean is the ability to transfer knowledge learned in one context (the classroom, e.g.) to another (the office, e.g.). It’s the idea that we have only really learned something when we can successfully move the learned knowledge into a new context.

As far as theories of learning go, I think transfer is fairly mainstream and intuitive. Of course, the particular metaphor of “transfer” has both affordances and limitations, as all metaphors do. Some critics offer “generalization” or “abstraction” as more appropriate metaphors, and there might be a case to be made for those. But as long as the theory of transfer prevails I think we first need to get some things straight. This is what my talk was about.

If learning is concerned with transferring, and thus moving, between two places like the classroom and the office, then we must have a map to help us navigate that move. In the humanities, the literature on transfer disproportionately focuses on the vehicle of transfer, not a map to guide us through the landscape the vehicle traverses. The vehicle is of course literacy–reading and writing. We painstakingly focus on honing and developing the best kinds of writing prompts and essay projects, as well as assigning the most thought-provoking reading material. And this is good. If we try to move, or transfer, from point A to B but our vehicle is broken or old or slow or unfit for the terrain, we can’t go anywhere. We like to say that our students don’t just learn to write, but they write to learn as well. Literacy is a vehicle for all knowledge. All disciplines understand this. It’s why we write papers in all our classes, not just English.

But no matter how luxurious our transfer vehicles (our writing assignments and reading requirements) are, if we don’t know where we’re going, how to get there, or what obstacles lie in our way, then it doesn’t much matter. So how do we map the abstract terrain of education and cognition? Here are two simple statistics that can help us: R-Squared and Effect Size, or Cohen’s d.

R-Squared

The R-Squared statistic is commonly reported when using regression analysis. At its simplest, the R-Squared stat is a proportion (that is, a number between 0 and 1) that tells you how much of the variance, or change, in one quantitative variable can be explained by another. An example: suppose you’re a teacher who assigns four major papers each semester, and you’re interested in which of the essays can best predict, or explain, student performance in the class overall. For this, you’d regress students’ final course grades on their performances on one of the assignments. Some percentage of the variance in your Y variable (final course grades) will be explained by performance in your X variable (one of your assignments). This can give you a (rough) idea of which of your assignments most comprehensively tests the outcomes your whole course is designed to measure. (R-Squared proportions are often low but still instructive.)
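As a minimal sketch of that regression (hypothetical grades, using NumPy and SciPy; the variable names are mine, purely for illustration):

```python
import numpy as np
from scipy import stats

# Hypothetical scores on one assignment (X) and final course grades (Y)
essay_scores = np.array([72, 85, 90, 65, 78, 88, 70, 95])
final_grades = np.array([75, 88, 92, 70, 80, 85, 74, 96])

# Regress final course grades on the assignment scores
result = stats.linregress(essay_scores, final_grades)
r_squared = result.rvalue ** 2

print(f"R-squared = {r_squared:.2f}")
# An R-squared of, say, 0.90 would mean roughly 90% of the variance in
# final grades is "explained" by performance on this one assignment.
```

Repeating this for each of the four assignments and comparing the R-Squared values would show which essay tracks overall course performance most closely.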

If we extend the transfer/map metaphor, R-Squared is like the highways on a map–it can help us find more or less efficient routes, or combinations of routes, to get where we want to go.

Effect Size, or Cohen’s d

[Figure: Cohen’s d]

Effect Size is a great statistic, because its units are standard deviations. This means Effect Sizes can be reported and compared across studies. Effect Sizes are thus often used in Meta-Analyses, one of the most powerful research techniques we have. At the risk of oversimplification, Effect Size is important because it takes statistical significance one step further. Many fret over whether a result in an experiment, like the test of an educational intervention, is statistically significant. This just means: if we observe a difference between the control and experimental group, what is the likelihood that the difference is simply due to chance? If the likelihood falls below a certain threshold (which we arbitrarily set, usually 5%, but that’s a different discussion), then we say the result is statistically significant and very likely real and not due to chance. However, some difference being statistically real doesn’t tell us much else about it, like how big of a difference it is. This is where Effect Sizes come in.

An Effect Size can tell us how much of a difference our intervention makes. An example: suppose you develop a new homework reading technique for your students and you test it against a control group, comparing the two groups’ performance on a later reading comprehension test. If the group means (experimental vs. control group) differ significantly, great! But don’t stop there. You can also calculate the Effect Size to see just how much they differ–and, as mentioned, Effect Size units are standard deviations! So you are able to say something like: my new homework reading technique is so effective that a student testing at the 50th percentile in a different class will test about one standard deviation higher in my course, around the 84th percentile.
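A minimal sketch of both steps, using hypothetical scores: a two-sample t-test for the significance question, then the pooled-standard-deviation version of Cohen’s d, with the normal CDF supplying the percentile interpretation.

```python
import numpy as np
from scipy import stats

# Hypothetical reading comprehension scores
experimental = np.array([82, 88, 75, 90, 85, 79, 84, 91])
control = np.array([74, 80, 70, 78, 76, 72, 81, 77])

# First, the significance question: could the difference be due to chance?
t_stat, p_value = stats.ttest_ind(experimental, control)

# Then the size question: Cohen's d, the mean difference expressed in
# pooled standard deviation units
n1, n2 = len(experimental), len(control)
s1, s2 = experimental.std(ddof=1), control.std(ddof=1)
pooled_sd = np.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
d = (experimental.mean() - control.mean()) / pooled_sd

# A student at the control group's 50th percentile would be expected to land
# at roughly this percentile under the intervention (assuming normality):
percentile = stats.norm.cdf(d) * 100

print(f"p = {p_value:.3f}, Cohen's d = {d:.2f}, "
      f"expected percentile = {percentile:.0f}th")
# d = 1.0 corresponds to about the 84th percentile, as described above.
```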

If we again extend the transfer/map metaphor, Effect Size helps us find the really good places to visit, or transfer to. It’s like a map of the best vacation spots. After all, most of us would rather visit the beach than North Dakota in the winter, so it’s good to know where’s worth going (for certain purposes).

Stats can help teachers

Basically, what I tried to argue in my talk is not that quantitative methods/analysis are superior, but that they can do a lot of cool things, and that they are particularly important for building maps. Maps are, after all, inherently approximate and generalized, just like grades and student performance on tests and any kind of quantitative measure are merely approximations. Maps are indeed limited in many ways: looking at a (Google street) map of Paris, for instance, obviously pales in comparison to staring up at the Palace of Versailles in person. But looking at a map beforehand can help you get around once you do visit. It’s the same with quantitative measures of learning. They can help us get a lay of the land, which then allows us to use our pedagogical experience and expertise to attend to the particular subtleties of each class and each student. Teachers are experimenters, after all, and each class, each semester, is a sample of students whose mean knowledge we hope is statistically significantly and sizably improved by the end of the course.