A wise man once said that even causal inference involves description, since ultimately what one is doing is describing patterns in a data set. Indeed, without description it is hard to know what the important questions are. For example, suppose we want to know whether having a black president makes the media pay less attention to the black unemployed. Before we even think about the design, how would we go about measuring media attention?
This is exactly what Washington Post blogger Erik Wemple struggles with in this post.
Wemple describes two experts who ask this question and reach opposite conclusions. Here is all the information given about how either conclusion was reached, and it comes from only one side:
I had a tough time finding good data specifically on coverage of issues such as black unemployment — as opposed to, say, unemployment or the economy more broadly. So I spent a lot of time talking to journalists, and others who closely follow and speak to issues of concern in the African American community, about whether they’ve noticed a significant change in national media coverage since the election of the nation’s first black president.
Yikes. You mean even with Google Trends, the Project for Excellence in Journalism, and scores of newspaper databases online, the best we can do to quantify media attention is to talk to some activists?
In fact, I’ve run into this problem before. As an RA this summer, I was asked to see how much different policy options had been discussed in the press, relative to one another, over the past few decades. There was no apparent off-the-shelf solution, and the task wasn’t crucial enough to justify coding something up myself. Here’s what I ended up doing:
1. Do a LexisNexis search for the keyword of interest, restricting sources to the New York Times.
2. Export the search results to EndNote, an online citation service. Since it’s a citation service, all I can export is metadata: title, date, author, page number.
3. From EndNote, save the list of citations as a tab-separated file, which means we can throw it into Excel, Stata, R, or whatever stats package we want.
4. In R, count the number of articles per year. Simple way: table(year). The resulting table can be saved as a new dataset.
5. Repeat the above steps for each additional keyword of interest.
6. Merge all the counts together into one dataset, and line plot away.
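For what it's worth, steps 4 through 6 can be scripted so that only the search and export remain manual. Here is a minimal sketch in Python (rather than R, and using only the standard library) of the same idea as table(year) followed by a merge. It assumes each export is tab-separated text with a header row containing a column named year; that column name, the keyword labels, and the tiny sample exports are all made up for illustration, and a real EndNote export may well be laid out differently.

```python
import csv
import io
from collections import Counter


def count_articles_per_year(tsv_text, year_column="year"):
    """Count rows per year in a tab-separated export that has a header row.

    The column name "year" is an assumption about the export format.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(int(row[year_column]) for row in reader)


def merge_counts(counts_by_keyword):
    """Merge per-keyword counts into one table: a row per year, a column per keyword."""
    years = sorted({year for counts in counts_by_keyword.values() for year in counts})
    keywords = list(counts_by_keyword)
    # A Counter returns 0 for years in which a keyword never appears.
    rows = [[year] + [counts_by_keyword[k][year] for k in keywords] for year in years]
    return ["year"] + keywords, rows


# Hypothetical miniature exports standing in for the real citation files.
exports = {
    "keyword_a": "title\tyear\nA\t1990\nB\t1990\nC\t1992\n",
    "keyword_b": "title\tyear\nD\t1992\nE\t1993\n",
}

counts = {keyword: count_articles_per_year(text) for keyword, text in exports.items()}
header, rows = merge_counts(counts)

print("\t".join(header))
for row in rows:
    print("\t".join(str(value) for value in row))
```

The merged table has one row per year, so it can be passed straight to whatever plotting tool you like for the line plot. With real files, you would read each export from disk instead of using the inline strings.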
There must be an easier way! Anyone know of one?