Not sure how long it will last, but it probably has something to do with me finally fixing the WordPress Visual editor (thanks to here), it being post-MPSA, and everything yesterday being cancelled, leaving time to catch up on things.
In spring 2012 I was lucky to serve as a teaching assistant for an undergraduate research methods course in the MIT political science department. Like many such courses, it gave a broad overview of linear regression, internal and external validity, and the difference between observational and experimental studies. Whenever you read an observational study, we taught students, you should always be thinking about two possible alternative explanations for a statistical relationship: an omitted third variable, or reverse causality.
I’m realizing that we probably left out an important third explanation: the relationship is an artifact, either a technical error or, perhaps, willful manipulation of the data. (Well, we did tell students to always look at their own data, but we probably could have emphasized this more in how they evaluate research by others.) Recent events make me think that technical errors are probably more common than I previously thought.
The particular event I have in mind is the Reinhart-Rogoff fiasco. The (perhaps overblown — the 2010s are still young after all) money quote from one observer of the situation:
one of the core empirical points providing the intellectual foundation for the global move to austerity in the early 2010s was based on someone accidentally not updating a row formula in Excel.
But see this follow-up on Andrew Gelman’s blog, which details all the seemingly common and easy-to-make mistakes that occur even when using more sophisticated statistical software. I have to say I think point #5 at that post is the best, and again it is actually just another version of the advice we gave to our undergraduates: always show your data!
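To see how easy this kind of slip is outside of Excel too, here is a minimal sketch in Python with pandas (made-up numbers, not the actual Reinhart-Rogoff data) of how a truncated row selection silently changes an average:

```python
import pandas as pd

# Hypothetical growth rates for 20 countries (made-up numbers);
# imagine these occupy rows 30-49 of a spreadsheet.
growth = pd.Series(
    [3.1, 2.4, -0.2, 1.8, 2.9, 0.5, 1.1, 2.2, 3.4, 0.9,
     1.7, 2.6, 0.3, 1.5, 2.0, 2.8, 3.0, 1.2, 0.7, 2.5],
    index=[f"country_{i:02d}" for i in range(20)],
)

full_mean = growth.mean()                  # intended: average all 20 rows
truncated_mean = growth.iloc[:15].mean()   # oops: the last 5 rows never
                                           # made it into the formula

print(f"intended mean:  {full_mean:.2f}")
print(f"truncated mean: {truncated_mean:.2f}")
```

The code runs without any warning or error; the only symptom is a wrong number, which is exactly why “always look at your own data” is such good advice.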
Anyway, see this post by another UMass Amherst economist for what seems like a terrific explanation of why reverse causality is actually the more serious problem for Reinhart-Rogoff.
Recently on the Freakonomics blog, economist Steven Levitt wrote:
I have spent the last 20+ years of my life doing academic research and popular writing on economics. I’ve been lucky, and my work has gotten a lot of exposure. I certainly have had a lot of fun along the way.
But, I think I can honestly say that no government has ever changed a law or a public policy as a result of my work. Sometimes politicians cite my research in pushing an agenda but having talked to these politicians, it is clear they had the agenda first, and then they went looking for research – any research – that would support their position. When I’ve taken unpopular stances (like saying children’s car seats don’t work well), there has never been even a sliver of political movement on the issue.
I found this depressing, because while it’s slightly tongue-in-cheek, I take Levitt to be serious. From what I know, Levitt is one of the most publicly engaged economists: co-author of the best-selling Freakonomics, author of many interesting and technically sophisticated papers on important public policy questions, and now a frequent blogger and contributor to NPR. So if anyone would be expected to have an influence on public policy, it would be Levitt. It’s also my impression that economics as a field is the most influential in policy circles. So taking Levitt as the “most-likely” case of an academic producing research that affects policy, we might then infer that his lack of influence is generalizable to all academics.
So why is non-influence depressing? For one, I think many social scientists, particularly political scientists, are driven by a (perhaps naive) desire that their research can make the world a better place. And even if you aren’t, basically every paper in political science, no matter how obscure the subject matter or how technical the methodology, ends with a series of implications for public policy. This is for good reason, because it’s hard to be interested in a paper if these implications are lacking. But if no one is ever going to take these implications seriously, then what is the point of them?
There is another interpretation of Levitt’s comment, but it doesn’t seem any less pessimistic to me. Note Levitt doesn’t say no one is listening to him, but that those who do “had the agenda first, and then they went looking for research — any research — that would support their position.” Which brings us to Reinhart and Rogoff (some background, and a terrific replication, here). These famous Harvard economists published a paper in the top economics journal in 2010 arguing that (1) there was a relationship between national debt and economic growth such that high debt led to slower growth, and (2) that there was a certain threshold of debt above which things got really bad for countries in terms of their growth. Then recently, three UMass Amherst economists tried to replicate the analysis and found they couldn’t, finally realizing that the original paper’s conclusions were driven by technical errors (more on that in another post).
Sociologist Peter Frase finds the ensuing discussion in the “economics blogosphere” to be missing the point: the problem is not the technical errors per se, he seems to be saying, but something similar to what Levitt pointed out, namely that research functions to satisfy pre-existing political arguments:
The reaction of the left-wing peanut gallery (at least to judge by my Twitter feed) has been to ridicule liberals for caring about this at all. Obsessing over the analytical missteps in this paper reeks of the preoccupation with having correct and empirically supported arguments, while ignoring the importance of power and ideology. For while this new critique of Reinhart-Rogoff just now became possible because they finally made their original data available, plenty of people pointed out earlier that the whole analysis rested on shaky conceptual foundations. It used a correlation to assert that high debt to GDP ratios lead to slower growth, ignoring the much more plausible theory that the causal order was the opposite, with slow growth leading to increasing debt loads. If the political elite in Washington failed to heed these criticisms, it wasn’t because they were unaware of them, but because the claim that debt leads to slow growth fit a deficit hysteria that was already entrenched. In other words, Reinhart-Rogoff was being used as rhetorical cover for a pre-existing position, not as an actual empirical aid to decision-making.
Frase’s post is focused on railing against “wonks” — policy-focused journalists / bloggers who specialize in translating (or in his view, sometimes simply parroting) policy-relevant social science research to readers. But his general point about “power and ideology” is interesting. I think what he’s saying is that the poor quality of the underlying research was discounted because it served an ideological purpose. Presumably, high-quality research that made the opposite argument would be ignored, under Frase’s implicit model of policy advice.
Of course, there could be other possible avenues of “making a difference.” One is communicating research results to organizations outside of government. Unfortunately, the type of confirmation bias Levitt and Frase discuss appears to operate there as well, as another Freakonomics post recently described.
Note: Many of the links in this article were found via this Monkey Cage post by John Sides.
I haven’t been reading the papers for a few months, so I am short on material.
I still read Rick Hasen’s Election Law Blog, however, which is always chock-full of interesting policy and causal questions. In the wake of the 2012 election, and the president’s apparently off-hand comment about long lines at polling places, there are some very tiny rumblings about nationalizing election administration in the U.S. For example, Hasen himself argued in the NYT’s “Room for Debate” forum that elections should be nationalized. In response, Doug Chapin gave some reasons why not, citing both the (apparently normative) virtues of federalism and the perceived gridlock and incompetence of the feds right now.
Slightly more interesting is why reforming election administration is or is not “a thing,” in Chapin’s language. That is, why is it so remote from the policy agenda? It’s easy to say that it’s a non-starter and that state and local governments really don’t want to centralize, but that seems like begging the question. Indeed, it’s even more mysterious when (to crib another item from Hasen’s blog) a poll finds 88% of Americans supporting a uniform system!
Perhaps it’s not hard to dig up other items where the public seems to speak with such a loud voice in contrast to what policy is or what elites think it should be. But the fact is that usually we laud correlations between mass opinion and public policy as a good thing for democracy, and decry low or negative correlations as a bad thing. So it would seem inconsistent to just write this off as an anomaly.
A new study compares mortality in states that expanded Medicaid with mortality in neighboring states that did not. From the study’s description of its design:

We compared three states that substantially expanded adult Medicaid eligibility since 2000 (New York, Maine, and Arizona) with neighboring states without expansions. The sample consisted of adults between the ages of 20 and 64 years who were observed 5 years before and after the expansions, from 1997 through 2007. The primary outcome was all-cause county-level mortality among 68,012 year- and county-specific observations in the Compressed Mortality File of the Centers for Disease Control and Prevention. Secondary outcomes were rates of insurance coverage, delayed care because of costs, and self-reported health among 169,124 persons in the Current Population Survey and 192,148 persons in the Behavioral Risk Factor Surveillance System.
The Times provides some context as to why it is seen as “controversial” whether Medicaid expansions positively impact health outcomes:
Medicaid expansions are controversial, not just because they cost states money, but also because some critics, primarily conservatives, contend the program does not improve the health of recipients and may even be associated with worse health. Attempts to research that issue have encountered the vexing problem of how to compare people who sign up for Medicaid with those who are eligible but remain uninsured. People who choose to enroll may be sicker, or they may be healthier and simply be more motivated to see doctors.
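The design described above is essentially a difference-in-differences comparison: expansion states versus neighboring states, before versus after. Here is a minimal sketch of that estimator in Python with statsmodels, using simulated data and hypothetical variable names (not the study’s actual code or data):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Hypothetical county-year panel: an indicator for expansion states
# and an indicator for post-expansion years.
n = 1000
df = pd.DataFrame({
    "expansion_state": rng.integers(0, 2, n),
    "post": rng.integers(0, 2, n),
})

# Simulate mortality per 100,000 with a true effect of -20 deaths
# in expansion states after expansion, plus level differences and noise.
df["mortality"] = (
    320
    - 20 * df["expansion_state"] * df["post"]
    + 5 * df["expansion_state"]
    - 3 * df["post"]
    + rng.normal(0, 30, n)
)

# The interaction coefficient is the difference-in-differences estimate:
# the change in expansion states minus the change in comparison states.
model = smf.ols("mortality ~ expansion_state * post", data=df).fit()
print(model.params["expansion_state:post"])  # should be close to -20
```

The point of comparing whole state populations, rather than enrollees versus non-enrollees, is precisely to sidestep the selection problem the Times describes.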
See also this earlier post.
Without the transparency offered by the Disclose Act of 2012, we fear long-term consequences that will hurt our democracy profoundly. We’re already seeing too many of our former colleagues leaving public office because the partisanship has become stifling and toxic. If campaigning for office continues to be so heavily affected by anonymous out-of-district influences running negative advertising, we fear even more incumbents will decline to run and many of our most capable potential leaders will shy away from elective office.
I suppose it makes sense to argue this if you are speaking to senators’ self-interest, but it might not look great to voters.
This New York Times article today explains how states have responded to the recession by either raising taxes or cutting spending, picking Maryland and Kansas as exemplars of each strategy. The article asserts that there is great confusion as to how well each strategy works. Kansas Governor Sam Brownback apparently did some of his own data analysis:
Gov. Sam Brownback of Kansas, who sought the Republican nomination for president four years ago, said he was persuaded that his state needed to cut its income taxes and taxes on small businesses significantly when he studied data from the Internal Revenue Service that showed that Kansas was losing residents to states with lower taxes.
Another interesting quote:
The effects of state taxes are hotly debated. This spring, when the George W. Bush Institute held a conference in New York on how to promote economic growth, panelist after panelist asserted that cutting state taxes would jolt the economy; Governor Brownback told the conference that his small-business tax cuts would be “like shooting adrenaline into the heart of growing the economy.”
But the Institute on Taxation and Economic Policy, a nonprofit research organization in Washington associated with Citizens for Tax Justice, which advocates a more progressive tax code, issued a report this year that found that the states with high income tax rates had outperformed those with no income tax over the past decade when it came to economic growth per capita and median family income.
The choices made by Kansas and Maryland could provide something of a real-time test of the prevailing political theories of taxing and spending — though it could be years before the results are in.
Interesting New York Times article. Great metaphor for science.
A couple weeks ago political scientist Jacqueline Stevens attacked her own discipline on the opinion pages of the New York Times. Her critique, as portrayed in the piece’s title, was that political scientists are “lousy forecasters.”
Many political scientists have responded to Stevens’ piece, from different angles and levels of rage, but one theme I’ve noticed is people disputing the assumption that political science’s goal is to make forecasts. For example, this recent letter to the editor of the Times by a professor emeritus of political science at the University of Iowa:
Forecasting is a very specialized field in political science, limited to predicting election outcomes, and the record in that field is impressive. But that is not what most political scientists do.
Political science, like most social sciences, seeks explanation rather than prediction. Its aim is to explain puzzling phenomena by relating them to phenomena that are well understood. Much of science consists of trying to resolve puzzles in that way.
I actually think prediction is what the majority of political scientists do. The distinction is that Stevens focuses on a couple of extraordinary historical cases, whereas most political science predictions concern specific data sets. For example, a theory might say that when we apply experimental treatment X, Y will increase or decrease. Or someone might claim that Republicans give more to charity than Democrats, which is a prediction about what a data set of party identification and giving behavior would reveal.
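To make that concrete, here is a minimal sketch (simulated data and hypothetical effect sizes, not any actual study) of the charity claim treated as a directional prediction about a data set:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated annual giving (dollars) for two hypothetical samples;
# the claim "Republicans give more" predicts mean(rep) > mean(dem).
rep_giving = rng.normal(1200, 400, 500)
dem_giving = rng.normal(1100, 400, 500)

# One-sided two-sample t-test of that directional prediction.
t_stat, p_value = stats.ttest_ind(rep_giving, dem_giving,
                                  alternative="greater")
print(f"t = {t_stat:.2f}, one-sided p = {p_value:.4f}")
```

The theory either survives the test or it doesn’t; either way, the exercise is prediction.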
It’s hard for me, in contrast, to think of a political science example of explanation that is divorced from prediction. I suppose the idea is to find a historical case–let’s take Stevens’ case of the end of the Cold War–and try to predict it retrospectively. But then why wouldn’t the prediction there apply to any future situations–why wouldn’t it also be a forecast? If the idea is that every historical case has to be considered on its own and that as a result its explanation tells us nothing about the future, I don’t know if that is science at all.
With the caveat that this is all filtered through an NPR show, I found this interview on the Higgs boson “quasi-discovery” relevant to current discussions of what’s supposedly wrong with social science.
I transcribed parts of the interview, which took place between WAMU host Celeste Headlee and Scientific American associate editor John Matson.
Headlee: Scientists needed to find out, whether or not the standard theory held water: why does matter hold mass? And they may have gotten their proof. I say may because we don’t know exactly what they’ve found yet, other than that it’s a new subatomic particle. Remember last year scientists at CERN claimed they had discovered particles that were faster than the speed of light. You remember that? Then they had to retract that claim. So the burden of proof when you discover something new…the standard is pretty high. The folks at today’s announcement needed to be certain if they had something…
Tell me about their certainty here. What standard did they use?
Matson: Sure. So the standard, uh sort of physics measure is sigma, or standard deviations. So they say if they have 3 sigma evidence that’s good for evidence of a new factor a new particle. 5 sigma is a discovery. And 5 sigma relates to a 1 in 3 and a 1/2 million chance that it’s just a statistical fluke, you know you’re just seeing some noise that looks like something real. And in this case they made it to that, they made it to 5 sigma, so this is certainly a very strong effect that they’re seeing.
Headlee: [...] So it’s like a 1 in 3.5 million chance it’s _not_ a new particle, right?
Matson: Well assuming that they’ve gotten everything correct, and that is where the faster than light particle finding comes in. That was a very high sigma effect, but wrong in other ways. So there’s always a chance there could be something funny going on, but in this case with this particle that looks like it could be the Higgs, that’s probably not going to happen, some mundane explanation, because there are two different experiments that are seeing what looks like the same thing, it’s similar to what’s been predicted for decades, so everything sort of rings true here, whereas with the faster than light neutrino thing last year, that sort of came out of no where, there was really no other experiment that supported that, and it went against decades and decades of scientific findings that said this shouldn’t be possible.
Here’s what caught my attention in particular.
- Arbitrary standards of significance. In social science we have p-values, and apparently in physics they have sigmas, but both contain the same information: how likely is it that what we’ve found is a result of random chance? Matson states that 5 sigma is the standard for a “discovery” in physics, and he says the researchers made it to 5. But in fact, a researcher is quoted in the piece as saying (apparently at a press conference), “We conclude by saying that we have observed a new boson with a mass of 125.3, plus or minus .6 GeV, at 4.9 standard deviations.” So the announcement actually came just shy of the stated threshold (see the sigma-to-probability sketch after this list).
- Significance is only meaningful when combined with assumptions. A precisely estimated point estimate is meaningless if it is biased. As Matson is quick to point out, the significance comes from a null hypothesis that is based on model assumptions: it’s a 1 in 3.5 million chance they are wrong, “assuming they’ve gotten everything correct.”
- Fragility of results. I like how the story keeps circling back to the “faster than light neutrino” finding and retraction from last year. It’s instructive how they use it as a baseline: we had a huge sigma there, but it was later retracted, so how do we avoid being fooled again? The two pieces of evidence they give are (1) replication–this finding comes from not one but two experiments, and (2) theory–the previous finding contradicted decades of other findings, but this one is consistent with decades of theory. While it makes sense to believe things that are consistent with other pieces of evidence–whether another experiment, other findings, or other theory–you can’t help but worry that this type of thinking causes us to reject useful information as well.
- Theory testing. The researchers are motivated by the desire to test (implications of) theories. Some say that social science shouldn’t bother with this, but instead focus on “thinking deeply about what prompts human beings to behave the way they do.” Substitute “particles” for “human beings”, and this sort of advice would have prevented such an announcement. I also see an affinity with the CERN director quoted in the story as saying “To know that our maths, our equations, all our Greek symbols tell us some deep truth about everything and what everything is made of, and to have that verified with the discovery of the Higgs – that is one of the great, great moments in science.”
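On the first point above, Matson’s figures are easy to check with a normal-tail calculation. A minimal sketch in Python with scipy (assuming, as I understand is conventional in particle physics, a one-sided tail probability):

```python
from scipy import stats

# One-sided tail probability for a z-score: P(Z > sigma).
for sigma in (3.0, 4.9, 5.0):
    p = stats.norm.sf(sigma)  # survival function of the standard normal
    print(f"{sigma} sigma -> p = {p:.3g} (about 1 in {1/p:,.0f})")
```

Five sigma gives p of roughly 2.9e-7, i.e., about 1 in 3.5 million, which is Matson’s figure; 4.9 sigma works out to about 1 in 2.1 million. Close, but not quite the stated threshold, which is exactly the slippage the bullet above is complaining about.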