The surprising power of n=2

We are enmeshed in data every day. It shapes our decisions, informs our perspectives, and drives much of modern life. Often, we wish for more data; rarely do we wish for less.

Yet there are moments when all we have is a single datapoint. And what can we do with just one? One datapoint offers almost nothing. It is isolated, contextless, and inert—a fragment of information without relationship or meaning. One datapoint might as well be no datapoint.

But two datapoints? That’s transformative. Moving from one to two is not just an incremental improvement; it is a fundamental shift. Your dataset has doubled in size, a 100% increase. More importantly, with two datapoints, you can begin to make connections. You can compare and combine, correlate and coordinate.

From Isolation to Interaction

Consider the possibilities unlocked by having two datapoints rather than one. A single name—first or last—is practically useless; it cannot identify a person. But a full name—two datapoints—suddenly carries weight. It situates someone in a specific context, distinguishing them from others and enabling meaningful identification.

The same holds true for testimony. A single witness to a crime might not provide enough perspective to reconstruct what happened. Their account could be unreliable, incomplete, or subjective. But with two witnesses, we gain a second perspective. Their testimonies can corroborate or contradict each other, offering a deeper understanding of the event.

Or think about computation. A solitary binary digit—0 or 1—cannot do much. It is a static state. But introduce a second binary digit, and the world changes. With two bits, you unlock four possible combinations (00, 01, 10, 11), the foundation of all logical computation. Every computer, no matter how powerful, builds its intricate systems of thought from this basic doubling.
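A quick illustration in Python, if you like (nothing here is specific to any system; it simply enumerates the states):

```python
from itertools import product

# Every state available to one bit versus two bits.
one_bit = ["".join(bits) for bits in product("01", repeat=1)]
two_bits = ["".join(bits) for bits in product("01", repeat=2)]

print(one_bit)   # ['0', '1']
print(two_bits)  # ['00', '01', '10', '11']
```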

The Exponential Power of Pairing

Why is the shift from one to two so significant? It is not simply the doubling of data, but the transition from isolation to interaction. A single datapoint cannot create relationships, patterns, or meaning. It is static. Two datapoints, however, introduce dynamics. They allow for comparison and combination, for movement between states, for a framework within which meaning can emerge.

This leap—from one to two—is the smallest step toward creating systems of knowledge. Science relies on comparisons to establish causality. A single experimental result is meaningless without a control group to measure it against. Literature and language depend on dualities—protagonist and antagonist, question and answer, speaker and audience. Even human vision is based on the comparison of binocular inputs; it is our two eyes that allow us to perceive depth.

AI and the Power of Two

The transformative power of n=2 finds its most recent demonstration in the operation of generative AI. At its core, generative AI depends on the interaction of two distinct but interdependent datasets: the training data and the user’s prompt. The training data serves as the foundation—a vast repository of language patterns, structures, and examples amassed from diverse sources. This data alone, however, is inert; it is an immense collection of information without activation or direction. Similarly, a prompt—a fragment of input text provided by a user—is meaningless without context. It is a solitary datapoint, incapable of producing anything on its own.

When these two datasets combine, however, the true power of AI is unlocked. The training data provides a rich, multidimensional context, while the prompt activates specific pathways within that context, directing the AI to generate meaningful output. This dynamic interaction transforms static data into a creative process. Much like the leap from one to two datapoints, the relationship between the training data and the prompt enables the emergence of patterns, coherence, and utility. Without the prompt, the AI remains silent; without the training data, the prompt is purposeless. Together, they form a system capable of producing complex and contextually relevant language.

This relationship between training data and prompts underscores the profound significance of pairing, the power of n=2. The interaction between these two elements mirrors a broader principle: meaning arises not from isolation, but from connection. Just as two witnesses can construct a fuller account of an event, and two binary digits can enable computation, the union of training data and prompts enables AI to simulate human-like language and reasoning, creating systems that are both dynamic and generative. The leap from one to two here is not just a quantitative doubling—it is a qualitative transformation that makes the impossible possible.

Building Toward Complexity

Two is not the end point; it is the beginning. Once we have two datapoints, we can imagine three, then four, and so on, building increasingly complex systems. But we should not overlook the profound importance of the leap from one to two. It is the first and most crucial step toward understanding—toward the ability to identify patterns, make connections, and draw conclusions.

n=2 is the minimum threshold for meaning, the simplest structure capable of supporting complexity. From two datapoints, entire worlds of logic, creativity, and understanding can unfold.

A theory of educational relativity

The theory of classical relativity understands all motion as relative. No object moves absolutely; an object moves relative only to the motion of other objects. The same can be said about much of learning and education: our educational growth is frequently measured relative to that of others–our classmates, coworkers, friends, family, and so on.

Relative educational growth recalls a concept central to test theory I’ve discussed before: norm-referenced assessment. When norm-referencing academic achievement, individual students are compared relative to one another and to overall group averages, which are other objects in motion. Norm-referenced assessment differs from criterion-referenced assessment, which measures individual growth relative to established objective criteria, stationary objects; that is, for criterion referencing, performance relative to peers doesn’t matter. Think of a driving test: you either pass or you don’t, and passing doesn’t depend on scoring better than your neighbor but on meeting the state-established criteria required for licensure.

I would argue that most educational assessment consists of broad norm referencing often masquerading as criterion referencing. As far as I’m concerned, this is neither good nor bad. “Masquerading” has negative connotations, of course, but I believe the masquerade is less deliberate than inevitable. Any teacher will tell you it’s really, really hard not to compare their students to one another, even if subconsciously. Try reading a stack of 20 papers and not keeping an unofficial mental rank of performance.

Although norm-referencing students against their peers’ class performance is somewhat inevitable, I think that with careful attention (and a little statistics) our assessments can prioritize a better norm-referenced comparison than student against peers: the comparison of a student with themselves.

Comparing a student’s performance with their own past performance recalls the familiar growth vs. proficiency debate in education circles, which our current Secretary of Education is infamously ignorant about. Basically, the argument is that schools should assess growth rather than proficiency, since not all students are afforded the same resources and there is incredible variation in individual academic ability and talent. Because not all students start in the same place, they cannot all be expected to meet the same proficiency criteria. I agree. (Incidentally, this is why No Child Left Behind utterly failed: it was predicated on the concept of all children meeting uniform proficiency criteria.)

One way to prioritize the assessment of growth over proficiency in a writing class is to use z-scores (a standardized unit of measurement) to measure, in standard deviations, how much each student grows on each assignment. Writing classes are particularly conducive to such measures since most writing assignments are naturally administered as “pre” and “post” tests, or, more commonly, rough and final drafts. Such an assignment design makes growth easy to capture, since each student provides two reference points for a teacher to assess.

By calculating the whole class’s mean score difference from rough to final draft (μ), subtracting that mean from an individual student’s rough-to-final score difference (x), and dividing by the standard deviation of the class score differences (σ), you obtain an individual z-score for each student: z = (x − μ) / σ. This tells you how many standard deviations their improvement (or decline) from rough to final draft represents.
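Here’s a minimal sketch of that calculation in Python. The student names and scores are invented for illustration; the only assumption is that rough and final drafts are scored on the same scale.

```python
from statistics import mean, stdev

# Hypothetical rough- and final-draft scores for one assignment.
rough = {"Ana": 72, "Ben": 80, "Cal": 65, "Dee": 90}
final = {"Ana": 85, "Ben": 84, "Cal": 78, "Dee": 93}

# Each student's improvement (x), the class mean improvement (mu),
# and the standard deviation of the improvements (sigma).
diffs = {name: final[name] - rough[name] for name in rough}
mu = mean(diffs.values())
sigma = stdev(diffs.values())  # sample standard deviation

# z = (x - mu) / sigma: each student's growth in standard-deviation
# units, relative to the growth the whole class showed.
z_scores = {name: (x - mu) / sigma for name, x in diffs.items()}

for name, z in z_scores.items():
    print(f"{name}: grew {diffs[name]} points, z = {z:+.2f}")
```

Because z-scores are unitless, the same calculation works for any assignment, whatever its point scale, which is what makes the cross-assignment comparison below possible.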

Why do all this? Why not simply look at each student’s improvement from rough to final draft? Because the rough-and-final-draft design itself builds in some nominal amount of growth, raw improvement is not necessarily meaningful improvement. Calculating a z-score controls for overall class growth, so a student’s gain from rough to final draft can be interpreted against an expected, quantified amount.

To assess an individual student’s growth relative to themselves, then, you can calculate these individual z-scores for each assignment and compare a single student’s z-scores across all assignments, regardless of differing assignment scales or values. This provides a simple, somewhat controlled (relative to observed class growth) measure of an individual student’s growth over the course of the semester. In this way, we can see more clearly the often imperceptible educational “motion” of our students, relative both to themselves and to their peers.

If we transfer what we learn, then we need a map

This past weekend I traveled to the College English Association 2018 conference in St. Pete, Florida, to give a talk about “learning transfer.” Learning transfer, often simply “transfer” in the education literature, is the idea that when we talk about education broadly and learning something specifically, what we really mean is the ability to transfer knowledge learned in one context (e.g., the classroom) to another (e.g., the office). It’s the idea that we have only really learned something when we can successfully move that knowledge into a new context.

As far as theories of learning go, I think transfer is fairly mainstream and intuitive. Of course, the particular metaphor of “transfer” has both affordances and limitations, as all metaphors do. Some critics offer “generalization” or “abstraction” as more appropriate metaphors, and there might be a case to be made for those. But as long as the theory of transfer prevails I think we first need to get some things straight. This is what my talk was about.

If learning is concerned with transferring, and thus moving, between two places like the classroom and the office, then we must have a map to help us navigate that move. In the humanities, the literature on transfer disproportionately focuses on the vehicle of transfer, not on a map to guide us through the landscape the vehicle traverses. The vehicle is of course literacy–reading and writing. We painstakingly hone and develop the best kinds of writing prompts and essay projects, and we assign the most thought-provoking reading material. And this is good. If we try to move, or transfer, from point A to point B but our vehicle is broken or old or slow or unfit for the terrain, we can’t go anywhere. We like to say that our students don’t just learn to write; they write to learn as well. Literacy is a vehicle for all knowledge. All disciplines understand this. It’s why we write papers in all our classes, not just English.

But no matter how luxurious our transfer vehicles (our writing assignments and reading requirements) are, if we don’t know where we’re going, how to get there, or what obstacles lie in our way, then it doesn’t much matter. So how do we map the abstract terrain of education and cognition? Here are two simple statistics that can help us: R-Squared and Effect Size, or Cohen’s d.

R-Squared


The R-Squared statistic is commonly reported when using regression analysis. At its simplest, R-Squared is a proportion (that is, a number between 0 and 1) that tells you how much of the variance, or change, in one quantitative variable can be explained by another. An example: suppose you’re a teacher who assigns four major papers each semester, and you’re interested in which of the essays best predicts, or explains, student performance in the class overall. For this, you’d regress final course grades on student performance on one of the assignments. Some percentage of the variance in your Y variable (final course grades) will be explained by your X variable (performance on one of your assignments). This can give you a (rough) idea of which of your assignments most comprehensively tests the outcomes your whole course is designed to measure. (R-Squared proportions are often low but still instructive.)
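As a rough sketch, here’s what that might look like in Python for a single assignment. The scores are invented, and the snippet leans on the fact that, for a one-predictor linear regression, R-Squared equals the squared Pearson correlation between X and Y.

```python
from statistics import correlation  # Python 3.10+

# Hypothetical scores on one essay assignment (X) and final
# course grades (Y) for the same eight students.
essay_scores = [78, 85, 62, 90, 74, 88, 70, 95]
final_grades = [81, 88, 70, 92, 75, 85, 72, 96]

# For simple (one-predictor) regression, R-squared is just the
# squared Pearson correlation between the two variables.
r = correlation(essay_scores, final_grades)
r_squared = r ** 2

print(f"R-squared = {r_squared:.2f}")
# A value of, say, 0.90 would mean ~90% of the variance in final
# grades is explained by performance on this essay.
```

Repeating this for each of the four papers shows which assignment carries the most explanatory weight.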

If we extend the transfer/map metaphor, R-Squared is like the highways on a map–it can help us find more or less efficient routes, or combinations of routes, to get where we want to go.

Effect Size, or Cohen’s d


Effect Size is a great statistic because its units are standard deviations. This means Effect Sizes can be reported and compared across studies. Effect Sizes are thus often used in meta-analyses, one of the most powerful research techniques we have. At the risk of oversimplification, Effect Size is important because it takes statistical significance one step further. Many fret over whether a result in an experiment, like the test of an educational intervention, is statistically significant. This just means: if we observe a difference between the control and experimental groups, what is the likelihood that the difference is simply due to chance? If that likelihood falls below a certain threshold (which we set somewhat arbitrarily, usually 5%, but that’s a different discussion), then we say the result is statistically significant and very likely real rather than due to chance. However, knowing that a difference is statistically real doesn’t tell us much else about it, like how big it is. This is where Effect Sizes come in.

An Effect Size can tell us how much of a difference our intervention makes. An example: suppose you develop a new homework reading technique for your students and test it against a control group, comparing the two groups’ performance on a later reading comprehension test. If the group means (experimental vs. control) differ significantly, great! But don’t stop there. You can also calculate the Effect Size to see just how much they differ–and, as mentioned, Effect Size units are standard deviations. So you are able to say something like: my new homework reading technique is so effective that a student who would test at the 50th percentile in a different class will test about one standard deviation higher in my course, around the 84th percentile.
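Here’s a sketch of the arithmetic in Python, with invented test scores chosen so the effect lands near the one-standard-deviation example above. Cohen’s d divides the difference in group means by the pooled standard deviation, and a normal CDF converts d into the percentile shift.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical reading-comprehension scores for the two groups.
control = [70, 74, 68, 77, 72, 69, 75, 71]
treatment = [73, 77, 71, 80, 75, 72, 78, 74]

# Cohen's d: difference in group means over the pooled standard deviation.
n1, n2 = len(control), len(treatment)
s1, s2 = stdev(control), stdev(treatment)
pooled_sd = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
d = (mean(treatment) - mean(control)) / pooled_sd

# A 50th-percentile student shifted up by d standard deviations
# lands at this percentile of the control distribution.
percentile = NormalDist().cdf(d) * 100

print(f"Cohen's d = {d:.2f}")
print(f"50th-percentile student -> ~{percentile:.0f}th percentile")
```

With these made-up numbers d comes out just under 1, so the shift lands around the 83rd percentile, close to the 84th-percentile example in the text.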

If we again extend the transfer/map metaphor, Effect Size helps us find the really good places to visit, or transfer to. It’s like a map of the best vacation spots. After all, most of us would rather visit the beach than North Dakota in the winter, so it’s good to know where’s worth going (for certain purposes).

Stats can help teachers

Basically, what I tried to argue in my talk is not that quantitative methods and analysis are superior, but that they can do a lot of cool things, and that they are particularly important for building maps. Maps are, after all, inherently approximate and generalized, just as grades, student performance on tests, and any kind of quantitative measure are merely approximations. Maps are indeed limited in many ways: looking at a (Google) street map of Paris, for instance, obviously pales in comparison to staring up at the Palace of Versailles in person. But looking at a map beforehand can help you get around once you do visit. It’s the same with quantitative measures of learning. They can help us get the lay of the land, which then allows us to use our pedagogical experience and expertise to attend to the particular subtleties of each class and each student. Teachers are experimenters, after all, and each class, each semester, is a sample of students whose mean knowledge we hope is significantly and sizably improved by the end of the course.