Prepare yourselves for the third episode of my podcast, Counterintuitive! If you’re up for it, join me for journeys into stories that are not what they seem to be. Episodes examine unusual events from a broad spectrum that will surprise you and then make you think. Because there’s always a layer beneath. You can find new episodes on Spotify, Apple Podcasts, Stitcher, TuneIn, YouTube, or SoundCloud. Below, you’ll find the transcript of this episode with some references / further-reading hyperlinks. The music for this episode comes from FreeSound, specifically these pieces:
This is Counterintuitive, the critical thinking podcast about things which are not what they seem to be. My name is Daniel Bojar and here we explore the hidden effects governing our world as well as the science behind them. This time, some of the elements of our story are inspired by a great blog post from Allen Downey over on his blog Probably Overthinking It and the usual slew of academic research, linked to in the show notes of this episode.
Join me on a journey. Imagine you’re a researcher investigating the educational effects of small class sizes. Now, before we start our project, pause this podcast and put ‘small class sizes’ into Google. What do you find? Unless the world has dramatically changed since the recording of this podcast, you’ll find a veritable deluge of amazing benefits of smaller classes in schools. Teachers have more time for their students, there is less overall noise, and everyone can participate in class. Sounds great! Those are just a few examples among the flurry of advantages you can find online about small class sizes. The measure of shrinking classes is so beloved that esteemed charities such as the Bill & Melinda Gates Foundation at one point spent billions of dollars to achieve it. Billions, with a b. But, as good scientists, we put aside those preconceived notions and delve into our experiment, trying to replicate the educational benefits of small class sizes.
As the school you’ll be visiting today is located in another corner of the country, you decide to take a flight. In the morning, you try to catch the bus to the airport, which typically comes by every seven minutes. Inexplicably, however, you end up waiting a full 15 minutes until the next bus appears. In the rain. Musing about your bad luck, you’re at least content that you planned enough time at the airport. Arriving there, things don’t look better: the queue for security control is the longest you’ve ever witnessed and then, worst of all, the plane itself is so overcrowded that a mild sense of claustrophobia sets in. At this point, you start questioning your life choices and resolve to send someone else to the next school. After an excruciating flight, you land in the city of your school of interest. Fast forward a while and you’re trying to hail a taxi to get to the school but, of course, there just doesn’t seem to be an available taxi in the whole city. Finally reaching the school after an arduous odyssey that has slowly consumed your sanity, you start your actual research project.
Your study consists of two parts: first you’ll interview the school administration, including gathering data about academic performance, and then the students themselves to collect all the information you need.
Let’s pause for a brief detour here.
The data you collect at the school will eventually have to be analysed with statistical methods. One major branch of statistics is referred to as inferential statistics, with another major branch being descriptive statistics. In inferential statistics, we try to draw conclusions about a certain population even though we didn’t sample all of it, but rather a decidedly smaller subpopulation. If you, for instance, ask a certain number of people which presidential candidate they would vote for, you don’t have to survey every citizen to reach a good estimate of the proportions. While a ‘survey’ of two of your best friends may not result in a representative sample, a random poll including about 1000 people will give you percentage estimates which are roughly applicable to the millions and millions of citizens living in a country. Notably, the small sample size isn’t the only issue in the best-friends example. Even if you choose 500 of your friends, you may not reach a representative sample. The reason is that they’re not randomly selected: the condition for being part of this specific poll is being friends with you. Your friends tend to have similar political leanings to you and are unlikely to express very different political opinions. Thus, your survey would severely underestimate the chances of the candidate of the opposing political party. In sampling, size matters and origin does too. As long as a sufficiently large number of randomly chosen people are involved, inferences from the group of people filling in the survey to the much larger group of people who didn’t (but are in principle similar to the first group) are possible. The larger the group of questioned citizens, the more precise the estimate will be, as long as the condition of random selection holds true.
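For readers of this transcript, here’s a quick simulation of that idea. All the numbers are made up for illustration (a hypothetical true support of 52% and a few poll sizes), but the pattern is general: random polls of all sizes are unbiased, yet small polls stray far from the truth while a 1000-person poll lands close to it.

```python
import random

random.seed(7)
TRUE_SUPPORT = 0.52  # hypothetical true share of voters backing the candidate

def poll(n):
    # Each randomly chosen respondent backs the candidate with probability TRUE_SUPPORT
    return sum(random.random() < TRUE_SUPPORT for _ in range(n)) / n

# Run many polls of each size and see how far the estimates stray from the truth
avg_error = {}
for n in (2, 20, 1000):
    estimates = [poll(n) for _ in range(2000)]
    avg_error[n] = sum(abs(e - TRUE_SUPPORT) for e in estimates) / len(estimates)
    print(f"poll size {n:>4}: average error {avg_error[n]:.3f}")
```

The two-person ‘poll’ can only ever report 0%, 50%, or 100%, which is why its average error is enormous; note that this sketch assumes random selection, so it doesn’t even capture the additional friends-of-yours bias.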
The trouble really begins when we start to interpret the biased poll among your friends, or a far too small poll among the general population. We might take a fluctuation, run with it, and postulate some plausible-sounding explanation. Because we’re really good at rationalizing. Consider an example by Howard Wainer and Harris L. Zwerling. Looking at counties in the US, they found that the rate of kidney cancer is lowest in mostly rural, sparsely populated counties located in traditionally Republican states in the Midwest and surrounding areas. So far, so good. Maybe you’ve already started to spin a story in your head, telling yourself that the good country air or the family values promoted by the Republican party keep the people there healthy. Remember, we humans are devilishly good at these things.
Now, let’s hold our horses for a moment and ask the right question: what do the counties with the highest kidney cancer rate look like? They are mostly rural, sparsely populated counties located in traditionally Republican states in the Midwest and surrounding areas. Yes, you heard that correctly. In other words, these two categories look exactly the same. Suddenly, the explanation from before doesn’t seem so clever anymore.
The solution lies in considering sample size. Smaller samples are less robust to outliers. A village might, for instance, be located next to a factory, increasing the concentration of carcinogens in the environment; alternatively, a village might be populated by a couple of families with exceptionally good genes, decreasing the rate of cancer. Small samples are bound to fluctuate considerably more than large samples. So you would expect the sparsely populated rural counties to produce more extreme rates (in both directions) than the more densely populated urban counties, making this result less surprising in hindsight. The disregard for this fundamental statistical concept, which runs through all areas of our world, is called ‘insensitivity to sample size’ and is one of the cognitive biases popularized by psychologist Daniel Kahneman.
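You can watch this happen in a toy simulation (again for the readers of the transcript; populations and rates are invented). Every county below has exactly the same underlying cancer rate, yet both the lowest and the highest observed rates come almost entirely from the small ‘rural’ counties:

```python
import random

random.seed(42)
TRUE_RATE = 0.01          # identical underlying disease rate everywhere (illustrative)
SMALL, LARGE = 50, 5_000  # population of a "rural" vs an "urban" county

def observed_rate(population):
    # Every resident independently gets the disease with the same probability
    cases = sum(random.random() < TRUE_RATE for _ in range(population))
    return cases / population

counties = [("rural", observed_rate(SMALL)) for _ in range(300)]
counties += [("urban", observed_rate(LARGE)) for _ in range(300)]
counties.sort(key=lambda county: county[1])

lowest = [kind for kind, _ in counties[:20]]   # 20 counties with the lowest rates
highest = [kind for kind, _ in counties[-20:]] # 20 counties with the highest rates
print("lowest-rate counties: ", lowest.count("rural"), "of 20 are rural")
print("highest-rate counties:", highest.count("rural"), "of 20 are rural")
```

With only 50 residents, a single case swings a rural county’s rate by two percentage points, so the extremes at both ends belong to the small counties, despite there being no real difference at all.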
Here comes the interesting bit: if you present people with scenarios relying on the concept of sample size, their intuition (affected by insensitivity to sample size) leads them down the wrong path. While most people may be excused (because how often do you really think about abstract statistical concepts during your normal day?), the same effect can be observed in professionals such as statistics professors, whose job it is to apply statistics every day. Even more interestingly, if you point out to these professionals that the scenario is related to statistics, they get the answer right! Don’t get me wrong, their understanding of statistics is obviously correct. It just so happens that (as long as their statistics knowledge isn’t triggered) they apply their normal you-and-me intuition to the problem at hand. Their hard-won statistical insights may be integrated into their professional intuition, but in their everyday intuition they’re still endowed with the evolutionary baggage we all carry. Which has the unfortunate consequence that you will run into this bias again and again, even though you abstractly know about it. Insensitivity to sample size is a stable counterintuitive which persists after explanation, similar to an optical illusion. You know the squares are the same colour, but your visual cortex just doesn’t seem to agree.
So, as we have seen in the kidney cancer example, small sample sizes can quickly lead to extreme results. Our inherent desire to read meaning into randomness (especially into large random fluctuations) then rewards us with a narrative of how this extreme result came into being. You may have guessed it already, but here’s the million-dollar question: what do ‘small samples’ and ‘small school classes’ have in common? Yeah, exactly.
Having substantially fewer students per class, these small classes also have a notably higher variance in their performance. As we’ve seen earlier, you would naturally expect a higher frequency of extreme events in these classes (extremely good as well as extremely bad, in terms of academic performance). This is exactly what you also see in practice. Imagine four geniuses in a class of 15 and how they would lift the average performance of the whole class. It might seem counterintuitive to you, but smaller classes don’t necessarily offer an advantage. This problem of only noticing the positive fluctuations (and not realizing that the connection between small class size and academic performance is caused by increased fluctuations) is compounded once we celebrate and reward small classes for their performance. Because by then, we’re effectively rewarding random variation in small samples. As this demonstrates, you can uncover a counterintuitive even in an intuitive, no-brainer statement like ‘small classes are good’. The mere act of questioning its validity would have been enough to expose the supposed advantages of small class sizes, yet sometimes we’re simply strangled by the chains of convention and groupthink. I think we’re also drawn toward the idyllic, bucolic nature of small groups. We link it to contemplation and connection, especially in contrast to the bland mass production of the industrial age, which we associate with large groups and have learnt to sneer at.
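The same toy-model logic applies directly to classes. In the sketch below (numbers invented: every student’s score comes from one and the same distribution), class averages of 15-student classes swing much more widely than those of 40-student classes, so the ‘best classes’ will disproportionately be small ones:

```python
import random
from statistics import pstdev

random.seed(3)

def class_average(size):
    # Every student's score is drawn from the SAME distribution, whatever the class size
    return sum(random.gauss(70, 15) for _ in range(size)) / size

small = [class_average(15) for _ in range(1000)]  # averages of 1000 small classes
large = [class_average(40) for _ in range(1000)]  # averages of 1000 large classes

for name, avgs in (("small", small), ("large", large)):
    print(f"{name} classes: best {max(avgs):.1f}, worst {min(avgs):.1f}, "
          f"std of class averages {pstdev(avgs):.2f}")
```

If we now hand out prizes to the top classes, we’re mostly rewarding the luck of the draw in small samples, exactly the mistake the episode describes.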
After establishing that there is no clear effect of small class sizes on academic performance, we can briefly speculate why this is the case. Why aren’t smaller classes better for students? One of the chief arguments in favour of small classes is that teachers can spend more time on each student. But the notion that teachers have more time per student only holds true in practice if teachers actually take advantage of it. A reduction in class size isn’t magically coupled to a behavioural change in teachers; they might simply behave the same as before. Another argument is that everyone, even shy students, has a better chance to contribute in class. Yet while all students may be able to participate, everyone also has fewer classmates who could explain something they don’t understand. So the case for smaller classes was never rock-solid from the start and, as we’ve seen, isn’t backed by the data.
In principle, the small-class myth is a transient counterintuitive: once you understand the sample size problem, the issue should become clear in the future. Yet in practice this misconception is a bit more tenacious, as statistical principles are hard for us to grasp and also, in this case, because there could still be a case to be made. Some might argue that, while it’s true that very small classes indeed suffer from large performance fluctuations, it could still be beneficial to shrink very large classes to medium-sized classes (in a way re-defining the term ‘small classes’). While this may be true and worthwhile for abnormally large classes (outliers in the class size distribution), meta-analyses have found no general correlation between decreasing class size and academic performance. What you can find is a strong correlation of decreasing class size with increasing variance in performance due to small sample sizes, along with a high cost of decreasing class size. We should also not forget that any issue involving children immediately activates emotional responses, which complicates our task of calling out this counterintuitive. The previously advertised merits of small classes aside, even the Bill & Melinda Gates Foundation has since withdrawn its support for this program. After spending over $2 billion, no notable jump in academic performance was visible.
If you thought the story ends here, you haven’t been paying attention. Think back. What did you miss? We started our journey with…a journey. A journey to a school, where you wanted to gather data. And you did. We just never picked up on that strand again. So let’s do that now. You sit down with your data and start to do preliminary analyses. If you want to investigate class size effects, it would be best to determine average class sizes first. You asked the school administration about the average class size and a sizable number of randomly selected students about the size of their class, so this should be really easy!
The school administration’s answer was 21 students per class on average (which is roughly the average for New York state primary schools). Fair enough. Yet when you calculate the mean of the answers you got from the students, you get a different number: 28 students per class. Now, an easy and obvious explanation for this discrepancy might be that the children simply misremembered the exact size of their class (or maybe they were even intentionally fooling you). Or maybe the school administration wanted to make their classes look smaller than they are, to get some of that glamour of small classes the media is going on about. Yet what if I told you that we can get exactly this result even if both sides, school administration and children alike, have been perfectly honest about their class sizes? This is what’s known as the inspection paradox.
The inspection paradox belongs to the complicated mathematical field of renewal theory (a subset of probability theory). A consequence of renewal theory is that we tend to oversample large intervals or groups when we choose randomly, simply because they’re more likely to be chosen. Remember the long wait for the bus on your way to the school? If we assume that you arrive at a random point in time between buses (which is crucial, as an arrival during rush-hour traffic would also lengthen your waiting time), which interval are you most likely to arrive in? Based on pure probability, the longest interval offers the most chances. Therefore, on average, you’re more than twice as likely to arrive in the long 15-minute interval than in the more typical seven-minute interval. Even worse, your chances of landing in an interval shorter than the average interval are lower still.
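A small simulation shows this length bias directly (for the readers of the transcript; the timetable of alternating 7- and 15-minute gaps is made up). A passenger arriving at a random instant lands in a gap with probability proportional to the gap’s length, so the gap they experience is longer, on average, than the timetable’s average gap:

```python
import random
from bisect import bisect_right
from itertools import accumulate

random.seed(1)
gaps = [7, 15] * 500           # made-up timetable: equal numbers of short and long gaps
ends = list(accumulate(gaps))  # cumulative end time of each gap
total = ends[-1]

experienced = []
for _ in range(100_000):
    t = random.random() * total                      # passenger shows up at a random instant
    experienced.append(gaps[bisect_right(ends, t)])  # the gap that instant falls into

frac_long = experienced.count(15) / len(experienced)
avg_gap = sum(gaps) / len(gaps)                 # timetable average: 11 minutes
avg_seen = sum(experienced) / len(experienced)  # average gap a random passenger lands in
print(f"chance of landing in a 15-minute gap: {frac_long:.2f}")
print(f"timetable average gap: {avg_gap:.0f} min, passenger-experienced: {avg_seen:.1f} min")
```

Even though short and long gaps are equally common, the long gaps cover 15/22 of the clock, so roughly 68% of random arrivals (more than twice the 32% for short gaps, matching the 15:7 ratio) fall into a 15-minute wait window.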
When you asked the school administration about their average class size, they simply added up the students of all classes and divided that number by the number of classes, the standard procedure for computing an average. However, if you randomly interview students, you’re more likely to get students from a large class, as there are by definition more of them per class. As a consequence, you’re oversampling children from large classes and get the impression that classes are far larger than the average class size indicates. Incidentally, while the average given by the school administration may be more representative of the class size distribution, the average gained from the student interviews more closely resembles what the average child experiences in daily life. As there are many children in large classes, many children will feel the overcrowded atmosphere.
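Both averages can be computed honestly from the very same data. In this sketch the class sizes are hypothetical, picked only so that the two averages come out near the episode’s 21 and 28:

```python
# Hypothetical class sizes, chosen so the two averages land near 21 and 28
class_sizes = [8, 10, 12, 15, 25, 37, 40]

# The administration's average: total students divided by the number of classes
admin_avg = sum(class_sizes) / len(class_sizes)

# The students' average: every student answers with their own class's size,
# so a class of size n contributes n answers of value n and big classes dominate
answers = [size for size in class_sizes for _ in range(size)]
student_avg = sum(answers) / len(answers)

print(f"administration average: {admin_avg:.1f}")     # 21.0
print(f"student-reported average: {student_avg:.1f}") # 28.1
```

Nobody is lying here: the administration averages over classes, while asking students averages over students, which weights each class by its own size.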
Oh, we’re not done here. Not yet. Every cause of annoyance detailed in your journey to the school showcases the inspection paradox. Next to the already mentioned bus example, we have a cascade of events we think of as just-our-luck: the excruciatingly long wait at airport security (long queues mean a lot of people standing in line and experiencing a long wait), the impression that airplanes are always overcrowded (because if a flight is nearly empty, there’s hardly anyone there to notice this fortuitous circumstance) and, the final straw, that there never seems to be a free taxi (if all taxis are booked, then by definition many people are looking for a taxi, and many of them will be frustrated in their fruitless search). The inspection paradox is ubiquitous in our society and is one of the explanations for why we see the world and our lives as dimmer than they actually are. Since we don’t account for this oversampling of large groups when we intuitively think about such situations, they strike us as counterintuitive. Despite its pervasiveness, the inspection paradox is often ignored, and even though an explanation reveals its somewhat mundane nature, this counterintuitive has a certain stability: it’s hard to rid our intuition of the inspection paradox. Sample size matters, both in extent and in how you account for the sizes of groups in your sample. Class size, on the other hand, unfortunately doesn’t seem to be the golden lever we had hoped for in education.
I hope you’ve enjoyed this instalment of Counterintuitive! If you did, join me next time where we’ll talk about clumsy mountain goats and lazy ants. You can find references and further reading for this episode in the show notes. If you like Counterintuitive, please recommend it to your friends and give it a 5-star rating on Apple Podcasts, Spotify, or wherever you get your podcasts. It really helps. A new episode will be uploaded every two weeks. My name is Daniel Bojar and you’ve listened to Counterintuitive, the critical thinking podcast about things which are not what they seem to be. You can follow me on Twitter at @daniel_bojar or on my website dbojar.com, where you will find articles about more counterintuitive phenomena. Until next time!