What Can We Learn from 10.1 Million Facebook Users? “It’s Complicated.”

by Thomas Leeper

Earlier this week, Science published an article (ungated) by researchers at Facebook and the University of Michigan School of Information that was apparently newsworthy enough to merit immediate press attention in The New York Times. It was also apparently controversial enough to earn a simultaneously published commentary in Science by David Lazer, a detailed rebuttal from Zeynep Tufekci, a line-by-line breakdown by Christian Sandvig, and a brief but well-circulated critique by Eszter Hargittai.

The Science article that started it all notes in its abstract its major punchline: “Compared to algorithmic ranking, individuals’ choices about what to consume had a stronger effect limiting exposure to cross-cutting content.” In short, human behavior does more to create ideological echo chambers than the News Feed algorithm Facebook uses to display things it thinks you – as the user – might like to see.

What’s at stake in this debate, and why do we care?

The study, in brief, examined the behavior of 10.1 million Facebook users who publicly disclosed their ideological affiliation in their Facebook profiles. Specifically, it looked at links shared by these users during a six-month period between July 2014 and January 2015. Some filtering was applied to restrict the analysis to links shared by several users; the resulting links were classified as “hard” or “soft” news using fairly standard machine learning techniques and were scored ideologically based on the affiliations of the users who shared them (e.g., links shared disproportionately by liberals were scored as liberal). There are probably two main findings. First, conservatives and especially liberals are less likely to be exposed to cross-cutting news articles than would be expected in a random social network (i.e., liberals tend to have friends who don’t share conservative content). Second, while News Feed ranking reduces opportunities for cross-cutting exposure, the authors estimate that, conditional on News Feed position, liberals and conservatives both still engage in ideologically congruent selective exposure (and they argue this effect of individual choice is larger than the effect of the algorithm).
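To make the alignment-scoring step a bit more concrete, here is a minimal sketch of the general idea – not the authors’ actual pipeline, and with the function name, ideology coding, and example data invented for illustration:

```python
from collections import defaultdict

def alignment_scores(shares):
    """Share-weighted alignment score per link (illustrative only).

    `shares` is an iterable of (link_id, sharer_ideology) pairs, with
    ideology self-reported on a -2 (very liberal) to +2 (very conservative)
    scale. Links shared disproportionately by liberals end up with negative
    scores; links shared disproportionately by conservatives end up positive.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for link_id, ideology in shares:
        totals[link_id] += ideology
        counts[link_id] += 1
    return {link: totals[link] / counts[link] for link in totals}

# Made-up example: two links shared by three ideologically identified users
example = [("story-a", -2), ("story-a", -1), ("story-b", 2)]
print(alignment_scores(example))  # {'story-a': -1.5, 'story-b': 2.0}
```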

To put this in some broader perspective, the study speaks to an enormous literature on the notion of selective exposure and the concern that there are echo chambers, especially in American society, especially in the post-broadcast period, and especially online. Cass Sunstein has written a widely circulated book about it. Eli Pariser has a book about algorithmic personalization. Natalie Jomini Stroud has a book on selective exposure to television news. Diana Mutz has written about it in the broader context of political deliberation. There are some 3 million articles on the topic on Google Scholar. It’s a big debate that, in short, relates to the relatively widely held belief that cross-cutting political exposure is good for democracy or, at least, that living in a political echo chamber might be democratically problematic.

So, if this study contributes just one additional piece of information to such a large, well-established scientific literature, why is it so controversial? Arguably, The New York Times picked it up because it involves two things that are “hot”: Facebook studying its users, and “big data” – 10.1 million individuals observed over 3.8 billion potential opportunities for media exposure to 226,000 different news stories. The study is unrelated to a Facebook experiment from 2014 that I’ve written about previously, but it inherits that experiment’s public profile.

But beyond it being high profile, why is it controversial?

Tufekci’s critique, which is representative of the critiques I’ve seen, has two major points. One relates to the apparent effect of individual selectivity, and the other relates to the sample under examination. First, Tufekci focuses on results in the paper’s appendix: specifically, a finding that a news story positioned first in one’s News Feed is clicked about 20% of the time, and that the likelihood of clicking drops off roughly exponentially with position, and does so consistently for liberals and conservatives, and for congruent and cross-cutting exposure alike. Cross-cutting items are clicked less, regardless of position, but the rate of decline determined by News Feed position is consistent across news types. The paper does note that its reported findings are conditional – i.e., conditional on News Feed position, cross-cutting items are clicked less.
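To illustrate the conditional logic, here is a toy model of click-through by News Feed position – my own illustrative sketch, not the paper’s estimates; the 20% baseline comes from the figure, while the decay rate and the cross-cutting penalty are made-up numbers:

```python
import math

def click_prob(position, base=0.20, decay=0.25, cross_cutting=False, penalty=0.7):
    """Toy model of click-through by News Feed position.

    base:    click rate at position 1 (~20%, per Figure S5)
    decay:   assumed exponential decay rate per position (illustrative)
    penalty: assumed multiplicative reduction for cross-cutting items
    """
    p = base * math.exp(-decay * (position - 1))
    return p * penalty if cross_cutting else p

# The point of the conditional analysis: at *every* position, cross-cutting
# items are clicked less, even though the positional decay itself is the
# same for both content types.
for k in (1, 5, 10, 20):
    print(k, round(click_prob(k), 3), round(click_prob(k, cross_cutting=True), 3))
```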

My read of Tufekci’s critique is that she conflates users’ interaction with the News Feed stream itself with the algorithm that filters which posts are displayed on the feed. This is an important distinction. The News Feed is a stream (like Twitter, or many other social media services). She points to a figure in the paper’s Appendix (Figure S5) that shows the likelihood of clicking on a story depending on its position in the feed. Here’s the figure:

[Figure S5: click rate by News Feed position]

It shows a precipitous decline in click rates as News Feed position increases. Many critics interpret this as meaning that the News Feed algorithm has an enormous effect on selective exposure. That is an incorrect interpretation, however. The paper defines the effect of the algorithm as whether a story is displayed in the News Feed at all. If one were to simply sit with Facebook open, refreshing the page periodically, posts would move down the webpage. The figure therefore shows that the inevitable passage of time has an enormous effect on what users click, but that is not the same as the effect of filtering by the algorithm. Users avoided content that is “old,” but that is not an effect of the News Feed’s algorithm. The paper is concerned with whether the algorithm prevents posts from being shown in the News Feed at all, not with the effect of time-dependent streaming. And rightly so. The effect of time is rather boring and unsurprising – users don’t click on old stories “below the fold.” No surprise there. This point is important enough that I would have been inclined to include an additional figure (S7) from the Appendix in the body of the paper:

[Figure S7: share of cross-cutting articles shared by friends that were displayed in the News Feed versus clicked on]

This figure shows that, among individuals in the sample who had at least one cross-cutting article shared by a friend, over 95% of those articles were displayed in the News Feed but fewer than 60% were clicked on. This is really the key figure for their results: the filtering performed by the algorithm removes almost no aligned content and only a small percentage of cross-cutting content, while users’ choices among the displayed content determine most of what is actually clicked on. This is the paper’s core claim, and the graph above shows it incredibly clearly.

Related to all of this, Tufekci says that trying to compare the effect of the News Feed algorithm and the effect of individual choice is an apples-and-oranges comparison: “I cannot remember a worse apples to oranges comparison I’ve seen recently, especially since these two dynamics, algorithmic suppression and individual choice, have cumulative effects.” Given that the paper reports effects of users’ choices conditional on exposure in the News Feed, I do not see how pointing out the cumulative nature of the effects is a relevant criticism.
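To see why this figure carries the paper’s core claim, here is a back-of-the-envelope decomposition using round numbers read off the figure – my own illustrative arithmetic, not the paper’s estimates:

```python
# Illustrative decomposition of where cross-cutting exposure is lost,
# in the spirit of Figure S7 (rough numbers, not the paper's estimates).
shared_by_friends = 1.00   # cross-cutting articles shared by a user's friends
shown_in_feed = 0.95       # fraction the algorithm displays (>95% per the figure)
click_rate = 0.60          # fraction of displayed articles that get clicked (<60% per the figure)

removed_by_algorithm = shared_by_friends - shown_in_feed   # ~5 percentage points
skipped_by_user = shown_in_feed * (1 - click_rate)         # ~38 percentage points

print(f"lost to algorithmic filtering: {removed_by_algorithm:.0%}")
print(f"lost to users' own choices:    {skipped_by_user:.0%}")
```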

It’s not a representative sample!

Tufekci’s second major point (and one also made by others, especially Sandvig) criticizes the study’s sample: namely, the sample consists of self-identified ideologues who regularly use Facebook, constituting about 4% of the total Facebook population, rather than a randomly selected sample. This is the same issue taken up in Hargittai’s critique. She says: “Sampling is crucial to social science questions since biased samples can have serious implications for a study’s findings. In particular, it is extremely important that the sampling methodology be decoupled from the substantive questions of interest in the study. In this case, if you are examining engagement with political content, it is important that sampling not be based on anything related to users’ engagement with politics. However, that is precisely how sampling was done here.”

An aside: via Twitter, Deen Freelon describes this as sampling on the dependent variable, which is a cardinal sin in the social sciences. Sampling on the dependent variable means only studying cases where the outcome phenomenon of interest occurs. In this study, that would mean only studying those who looked at cross-cutting stories, or only those who didn’t. That’s not what happened, so that critique is incorrect. The study may have sampled people who are more or less likely to engage in selective exposure than some other group(s) of individuals, but no sin has been committed.

The actual concern expressed by Tufekci and Hargittai is that the results of the study hinge on the sample (i.e., a different sample would have produced different results). This is, of course, true: different samples often produce different results, and a sample that is not representative of the population should not be assumed (a priori and without an explicit model of generalization) to produce unbiased estimates of population quantities of interest. The issue with this line of criticism is that it doesn’t change anything. The authors of the study are explicit about who is in the study, and in all likelihood a random sample would not have been feasible because their measure of ideology (which is the crucial variable in the analysis) is unobserved for the vast majority of users. While this means that their sample may differ from the Facebook user population as a whole, that isn’t important for their research findings as they relate to this sample; it matters only for attempts to generalize their results to particular other groups. In this case, it is not obvious that the authors intend to generalize to other groups – maybe they do, maybe they don’t; they’re not really clear on the question. Thus the critics are holding the study to a different standard than the authors intend. (There is some problematic language in the conclusion that I discuss below.)

Hargittai offers a stronger series of rhetorical questions: “And why does Science publish papers that make such claims without the necessary empirical evidence to back up the claims? Can publications and researchers please stop being mesmerized by large numbers and go back to taking the fundamentals of social science seriously?” I think that she and I must read scientific literature in very different ways. She seems to be concerned that the rhetoric of the paper is important to the accumulation of scientific knowledge. I read the paper in terms of its data and analysis only. How the paper is written is largely irrelevant to our cumulative understanding of selective exposure. What matters is how large the effect of individual choice seems to be, and the paper suggests that effect is modestly large and larger than the automated selection of content performed by the News Feed algorithm. While I would hesitate to give authors broad license to claim whatever they want, as a scientist I also know not to interpret anything anyone writes anywhere as definitive truth (be it on Twitter, on a blog, or in a peer-reviewed journal article). We cannot accumulate knowledge about political and social processes just by talking about them; we need actual data. While this study has very clear limitations, it also offers a lot of data on a very specific question that will be helpful for future research.

But don’t we know all this already?

One of Tufekci’s smaller criticisms is that these “Researchers then replicate and confirm a well-known, uncontested and long-established finding which is that people have a tendency to avoid content that challenges their beliefs.” I would argue that we don’t know that very well. While there is an enormous literature on selective exposure, studies find a range of different effect sizes (and directions) related to selective exposure, in part because the concept of exposure is extremely difficult to measure. Most of this research is based on self-reported survey data that must be read with incredible caution; another body of literature is based on laboratory experiments that have to be read with caution, as well. This study involves direct observation of real-world behavior on a massive scale. That’s very informative, even if it doesn’t necessarily tell us about the typical Facebook user or behavior outside of Facebook.

So, this is a great study then?

No. It’s worth highlighting that I did not love this paper. It said a lot in the 4500 words allowed by Science, but Tufekci is right to point out that a lot of important material is in the Appendix. Here are some things the authors could have done differently:

– The conclusion could have reiterated the sample restrictions that they described in the paper’s second paragraph. In 4500 words, I’m not sure that’s necessary but it would have probably helped assuage concerns about intended generalizations.
– Figure 3A is confusing. It tries to show the size of different contributions to exposure, but it’s not immediately obvious what is going on so it fails as an effective visualization.
– The paper (or its Appendix) should have included some descriptive statistics on how the sample compared to the overall Facebook population and the U.S. adult population (or U.S. internet population) as a whole. Relevant measures here would have been frequency of site use, frequency of sharing, liking, and clicking on news stories, as well as demographics. These data would have likely assuaged some concerns about sample representativeness.
– I worry about two aspects of measurement validity. One relates to people misrepresenting their ideology. If I am very politically liberal (or very politically conservative), I may find it – if I have a particular sense of humor – amusing to characterize myself as something other than my true political identity. I would want to be cautious that this phenomenon isn’t too prevalent. The other relates to the measure of “content alignment” used to infer selectivity. The paper could be clearer (perhaps through text analysis of comments attached to shares) about whether users are sharing posts that they agree or disagree with, and about the relative prevalence of each type of behavior.
– The figure that Tufekci cites (Figure S5) from the paper’s appendix is nice and I would have tried to fit it in the body of the paper (but I also would have extended the y-axis to range from 0% to 100% to show how unlikely anyone is to click on anything they see on Facebook).
– The paper’s final paragraph says “we conclusively establish.” I hate the word “conclusively” because nothing in science is conclusive, so I would have left that out.
– The paper’s final sentence says “Regardless, our work suggests that the power to expose oneself to perspectives from the other side in social media lies first and foremost with individuals.” I’m assuming the authors mean to attribute this “power” to both the mechanism of selective exposure *and* the mechanism of constructing a homophilous social network. They probably should have been clearer about their meaning there.

Another final point – the research was funded and produced by Facebook, so it could be argued that it was written in service of Facebook’s PR strategy. That may be true. The authors say that it was not. I think we have to take them at their word unless evidence suggests otherwise. Is it problematic for organizations to research themselves? No. I would much rather have an organization as central to American (and global) life as Facebook at least publish some of its research through mainstream, peer-reviewed scientific channels. If your view is that Facebook will only publish research that casts itself in a particular light, then update your priors accordingly in light of new data. If your view is that Facebook is unbiased in allowing its researchers to conduct and publish research, then update your priors otherwise. Most importantly, do not assume that others share your priors or interpret evidence the same way. Such are the complications of the scientific process.

This was an interesting study. I liked reading about it. The critics raise some important points, but most of those relate to how the authors think the results should be interpreted and are not criticisms that run to the core of the research itself. I am glad this study was published. I therefore think it’s fundamentally problematic to insinuate – as Nathan Jurgenson has on Twitter – that the paper’s publication was not subject to a standard review process, without evidence to that effect, and to criticize Science for publishing the research and Facebook for allowing it to be disseminated. Simply because you do not like the conclusion or interpretation of a piece of research does not mean that it wasn’t subjected to peer review or that it wasn’t worth publishing. The concerns raised by critics are largely editorial and stylistic, not scientific.

But Jurgenson also has a good point: journalists ran with this story. That’s unfortunate to some extent. I’ve always wished that journalists would only report the results of systematic reviews, so as to avoid the perpetual coffee/wine/chocolate/running/soda/etc. kills-you/saves-you back-and-forth. Of course, we can’t prevent journalists from reporting what they want to report, but we – as educators – can try to do more to encourage them to interpret and report on scientific research critically. And that should be the takeaway here.

* In the interest of complete transparency about potential conflicts of interest, here are some disclosures. I know the study’s second author Solomon Messing, having met him several times at political science conferences. Eszter Hargittai is a professor at my PhD alma mater and we met several times due to her current and my past affiliation with Northwestern’s Institute for Policy Research. I don’t know the other authors or other scholars I reference. I have written previously about research by Facebook data scientists. I have never received funding from Facebook nor have I ever had any formal or informal relationship with the company, though I have an account on the site.

Thomas Leeper earned his PhD from Northwestern University. He is currently a postdoc in the Institut for Statskundskab (Department of Political Science) at Aarhus University in Aarhus, Denmark. In September 2015, he will join the Department of Government at the London School of Economics and Political Science as an Assistant Professor in Political Behaviour.


2 thoughts on “What Can We Learn from 10.1 Million Facebook Users? ‘It’s Complicated.’”

  1. Hi Thomas,

    Thanks for the feedback. Please note that here’s what I actually said about the link placement and click-through:

    “Notice how steep the curve is. The higher the link, more (a lot more) likely it will be clicked on. You live and die by placement, determined by the newsfeed algorithm. (The effect, as Sean J. Taylor correctly notes, is a combination of placement, and the fact that the algorithm is guessing what you would like).”

    This is actually correct. (And the minor clarification came from a person on the Facebook data science team. He was right, so I added that to prevent misunderstanding.) I think you misunderstand the Facebook stream. There is no “new” or “old” – your language – placement is determined by the algorithm. This chart shows that this placement greatly affects clickthrough. This is a separate finding from the algorithmic removal of hard news items finding. So, there is no “time-dependent streaming” as you claim, as the stream is algorithmically ordered. If you want to dispute that, you need to take it up with Facebook. It’s pretty straightforward.

    Second, “selecting on the dependent variable”–used by Deen Freelon not me–is not a cardinal sin in social sciences. You can do it, but you need to take into account what that means.

    Subsamples like this raise a question similar to that of “missing not at random”. (I recommend Allison (2001) for an overview.) It’s absolutely plausible that *the mechanism* linking exposure and selectivity is different in this politically active 4% subsample versus the rest of Facebook, which is less politically active. Again, straightforward. By the way, here’s a *great* study from Facebook, from one of the authors, that solves this problem via validated imputation: http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=9586211&fileId=S0003055414000525. I really like this paper as it shows what Facebook data can do – it confirms increased polarization among the population over time, a really hard research question.

    For an example of how operating from a nonrandom subsample can have dire consequences when mechanisms differ, I refer you to the hormone replacement therapy trials, where subsample differences were masking increased cancer caused by the therapy, and the larger, randomized trial had to be halted after it was discovered that the findings from the previous subsamples were wrong and that we were iatrogenically giving cancer to women. This may not be a similar matter of life and death, but methodologically speaking Eszter’s concern is not unwarranted. It doesn’t mean we don’t work on subsamples, but it means that we are upfront about it throughout the paper so as not to be misleading.

    Third, I find it somewhat dismissive of the authors to say who cares what their “rhetoric” is and whether it is misleading. Of course I can read the data and analyses, too, and that’s why I and others have written these pieces – to explain what the actual findings were. But since you also seem to agree that the rhetoric did not match the findings, this is of particular concern, since they are not traditional academic researchers with traditional academic freedom. I’d actually like them to have more independence from Facebook’s corporate PR interests, and the overselling in this paper coincides with Facebook’s corporate sensitivities. That is not a good sign for the future of scientific research coming from Facebook data. (For the skewed press coverage that resulted from the misleading rhetoric, see a sampling here: https://twitter.com/zeynep/status/596688688031133696)

    Finally, I and others have made the case for why the algorithm and individual tendencies are a coupled system, and the latter fact is pretty well established (though I do appreciate a quantification, as the researchers provide). My beef is that the final framing not only oversells and miscompares, but ends up reading like something from the public relations department – it’s your fault, not Facebook’s – which then dominates the press coverage. That is a concern especially since we know that corporate Facebook vets and has final gatekeeping authority over what its researchers are allowed to publish.

    As for going forward, I’d actually like the corporate parent to provide a clear and explicit statement on researcher independence. I know that journalists have asked Facebook and have not received an answer. There’s a reason academic tenure is crucial to academic freedom. The question here is not the integrity of the researchers, but the pressures they face.


  2. Pingback: Quotes & Links #65 | Seeing Beyond the Absurd
