We Cracked The Code On How The Facebook News Feed Algorithm Works
By Sean Davis

If you manage your company’s Facebook page and have ever wondered how the Facebook news feed algorithm decides how many of your fans will see your content, then wonder no more. We’ve cracked the code (or we’ve at least cracked the code as it pertains to The Federalist’s Facebook page). And yes, for those of you who don’t feel like reading through the entire post or grappling with the math and statistics below, the Facebook news feed algorithm absolutely rewards the purchase of Facebook ads.

According to our analysis, five simple variables explain the vast majority (nearly 75 percent) of the day-to-day variation in our page’s organic news feed reach: total likes, daily paid reach, site page views from Facebook, weekend vs. weekday, and posts per day. The full magnitude of each factor’s effect is discussed in detail below.

But let’s start with the basics, namely that Facebook actively restricts the number of your fans that see your posts in their news feeds. That was not always the case, but the halcyon days of publishers having free, unfettered, and permanent access to their fans’ news feeds are over.

The Basics

By all appearances, Facebook is rapidly implementing what economists call a two-part tariff: you pay once to get in the door, and then you pay again to talk to people who are already inside. Costco, credit cards that charge annual fees and interest on purchases, bars that charge both a cover and a per-drink fee, and carnivals or amusement parks that charge an entrance fee and a per-ride or per-game fee are all examples of businesses that use a two-part tariff pricing mechanism.

Two-part tariffs are generally instituted by businesses when most of the value of their products is being retained by the consumer, rather than the producer. The benefit to the seller of the two-part tariff system is that it takes a good chunk of any initial consumer surplus (the value of the product above and beyond what the consumer paid for it) and turns it into a producer surplus. And boy was there a lot of consumer surplus before Facebook decided to restrict news feed access.
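To make the surplus-shifting argument concrete, here is a minimal Python sketch of the textbook two-part tariff, assuming a single representative consumer with linear demand P = a - b·Q and a constant marginal cost. The function names and the numbers in the usage note are hypothetical illustrations, not a model of Facebook’s actual pricing.

```python
# Hypothetical textbook model of a two-part tariff; not Facebook's pricing.


def quantity(a: float, b: float, price: float) -> float:
    """Units demanded at a per-unit price under linear demand P = a - b*Q."""
    return max(0.0, (a - price) / b)


def consumer_surplus(a: float, b: float, price: float) -> float:
    """Triangle area between the demand curve and the price paid."""
    return 0.5 * (a - price) * quantity(a, b, price)


def two_part_tariff(a: float, b: float, cost: float) -> tuple[float, float]:
    """Price each unit at marginal cost, then set the entry fee equal to
    the consumer surplus left at that price."""
    per_unit = cost
    entry_fee = consumer_surplus(a, b, per_unit)
    return entry_fee, per_unit


def outcomes(a: float, b: float, cost: float) -> tuple[dict, dict]:
    """Compare producer and consumer surplus under a single price and a two-part tariff."""
    # Single price: the profit-maximizing price for linear demand is (a + cost) / 2.
    p = (a + cost) / 2
    q = quantity(a, b, p)
    single = {"producer": (p - cost) * q, "consumer": consumer_surplus(a, b, p)}

    # Two-part tariff: the entry fee converts consumer surplus into producer surplus.
    fee, per_unit = two_part_tariff(a, b, cost)
    q2 = quantity(a, b, per_unit)
    tariff = {
        "producer": fee + (per_unit - cost) * q2,
        "consumer": consumer_surplus(a, b, per_unit) - fee,
    }
    return single, tariff
```

With a = 10, b = 1 and a marginal cost of 2, `outcomes(10, 1, 2)` gives the seller 16 and the consumer 8 under the single profit-maximizing price of 6; under the two-part tariff the seller collects 32 (an entry fee equal to the entire consumer surplus) and the consumer keeps 0.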

How much has Facebook throttled a given page’s access to its fans’ news feeds? A ton. Take, for example, the news feed access granted to Facebook fans of The Federalist:

[Chart: News Feed Throttle]

Pretty measly, right? As the person tasked with setting The Federalist’s Facebook budget and analyzing its impact on our bottom line, I was not very happy with that chart. The amount of money that we were spending to attract new fans was not generating the increased exposure to our content that we expected. Thankfully, our Facebook like campaigns were not randomly designed or executed. Over the past five months, we designed a number of different advertising experiments that we hoped would let us retrospectively analyze the bottom-line impact of our ad expenditures. We ran news feed like campaigns, right column like campaigns, mobile ad campaigns, sponsored post campaigns, and so on, so that we could study the effects of each in isolation.

The Results

Why did we do all that? We wanted to answer one question: was the money we were spending on ads giving us the increased news feed exposure that we needed to justify the expenditures?

The results of our analysis were pretty shocking.

Using a statistical technique known as multiple linear regression (it’s much less daunting than it sounds), we were able to isolate five variables that explained roughly 75 percent of all variation in news feed penetration. The proxy we used to measure news feed penetration is called “daily organic reach” and is provided by Facebook.
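For readers curious about what a regression like this actually computes, the sketch below fits an ordinary least squares model by solving the normal equations in plain Python. It is a generic illustration (the function name `ols` and the toy data are hypothetical), not necessarily the tool behind the numbers in this post; in practice you would reach for a statistics package rather than hand-rolled elimination.

```python
def ols(X: list[list[float]], y: list[float]) -> list[float]:
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    X holds one row per observation (for us, one row per day) and one
    column per independent variable; an intercept column of ones is added
    here. Solves the normal equations (X'X) b = X'y by Gaussian elimination.
    """
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Build X'X and X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Augmented matrix [X'X | X'y], reduced with partial pivoting.
    m = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            factor = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution gives the coefficients, intercept first.
    beta = [0.0] * k
    for i in range(k - 1, -1, -1):
        beta[i] = (m[i][k] - sum(m[i][j] * beta[j] for j in range(i + 1, k))) / m[i][i]
    return beta
```

Feeding it toy data generated exactly from y = 5 + 2·x1 - 3·x2 recovers coefficients of 5, 2 and -3. In our setting, each row of X would hold one day’s total likes, paid reach, page views, weekend flag and post count, and y would be that day’s organic reach.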

According to Facebook Insights, the user-facing Facebook analytics platform, daily organic reach is “the number of people who visited your page, or saw your page or one of its posts in news feed or ticker. These can be people who have liked your page and people who haven’t.” Although only those deep within the bowels of Facebook’s analytics department can know for sure whether organic reach is truly quarantined from paid reach, this category is supposed to separate reach achieved organically from reach achieved through paid advertising. After all, that’s why Facebook also provides the “daily paid reach” category of data, which reports “the number of people who saw a sponsored story or ad pointing to your page.” As far as I can tell, “daily organic reach” is the only data category Facebook makes available that allows page managers to quantify how many people are organically exposed to their page’s daily content.

So which five variables explained the vast majority of variation in daily organic news feed penetration? Total number of likes, daily paid reach, daily site page views from Facebook, whether it was a weekend or weekday, and number of Facebook posts per day. And how good was our statistical model? This good:

[Chart: Federalist News Feed Algorithm]

For the statistics junkies out there who know that while pictures may lie, numbers don’t, here are the numerical results of the linear regression:

[Table: Summary Output]

An Explanation Of The Statistics

What exactly do those numbers mean and what do they say about the factors that influence a company’s news feed penetration? Let’s start with the top section, which includes “R Square” and “Adjusted R Square.” The R Square number tells us what percentage of variation in the dependent variable (or y-variable, which in our case was daily organic reach) is explained by the independent variables included in the regression (or x-variables, which in our case were total likes, daily paid reach, site page views from Facebook, weekend vs. weekday, and posts per day). Although my former statistics professors would have me imprisoned for the imprecision of this next statement, a high R Square is generally good, and a low R Square is generally bad.

The Adjusted R Square number corrects R Square for the number of independent variables relative to the number of observations: adding a variable never lowers R Square, but it raises Adjusted R Square only if the improvement in fit outweighs the penalty for the extra variable. The R Square numbers for this particular regression, which show that nearly 75 percent of the variation in daily organic reach can be explained by the five included variables, are pretty awesome.
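Both statistics are short formulas. The sketch below, with hypothetical helper names, computes R Square as one minus the ratio of unexplained to total variation, and Adjusted R Square using the standard correction for n observations and k independent variables.

```python
def r_squared(y: list[float], y_hat: list[float]) -> float:
    """Share of the variation in y (around its mean) captured by the fitted values."""
    mean_y = sum(y) / len(y)
    ss_res = sum((obs - fit) ** 2 for obs, fit in zip(y, y_hat))  # unexplained variation
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)                # total variation
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """Penalize R^2 for k independent variables (intercept not counted) and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)
```

For example, `r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])` is 0.98, and with n = 4 and k = 1 the adjusted value drops to 0.97.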

The only number in the next set of data that we care all that much about is the “F,” or F-statistic. It tests whether the independent variables, taken together, explain a statistically significant share of the variation in the dependent variable. And like R Square, a higher F-statistic is usually a good thing (an F-stat of 83 is remarkable and blew away the F-stats on the dozens of other regressions that we ran).
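The overall F-statistic can be computed directly from R Square, the number of observations n, and the number of independent variables k. A minimal sketch (the function name is hypothetical):

```python
def f_statistic(r2: float, n: int, k: int) -> float:
    """Overall F-test statistic for a regression with k independent variables
    (intercept not counted) fitted to n observations.

    Compares explained variation per independent variable with unexplained
    variation per residual degree of freedom.
    """
    return (r2 / k) / ((1 - r2) / (n - k - 1))
```

For fixed n and k, a higher R Square always means a higher F. The matching p-value comes from an F distribution with k and n - k - 1 degrees of freedom, which a statistics library such as SciPy (`scipy.stats.f.sf`) can supply.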

Now that we’ve established that the regression itself contains useful information, we can turn our attention to the more fun part: analyzing which variables appear to predict news feed penetration. Just because a regression as a whole is statistically significant doesn’t mean each of the independent variables will be, so we must examine the t-statistic of each. As a rule of thumb for large samples, a t-statistic above 1.96 (or below -1.96) is statistically significant at the 95 percent confidence level. As the table above shows, every variable clears that bar except site page views from Facebook, which is right on the bubble with a p-value just above 5 percent. And though the coefficient for the intercept (a constant) in this model is negative, its t-stat and p-value suggest that it is not statistically different from zero.
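The 1.96 rule of thumb comes from the standard normal distribution, which the t distribution approaches as the sample grows. Here is a sketch of that large-sample approximation, with hypothetical helper names; exact p-values use the t distribution with n - k - 1 degrees of freedom, which is slightly more conservative in small samples.

```python
import math


def t_statistic(coefficient: float, standard_error: float) -> float:
    """How many standard errors the estimated coefficient sits away from zero."""
    return coefficient / standard_error


def two_sided_p_value(t: float) -> float:
    """Large-sample approximation: treat t as standard normal.

    P(|Z| > |t|) = erfc(|t| / sqrt(2)).
    """
    return math.erfc(abs(t) / math.sqrt(2))


def is_significant(t: float, alpha: float = 0.05) -> bool:
    """Two-sided test at significance level alpha (0.05 = 95 percent confidence)."""
    return two_sided_p_value(t) < alpha
```

A t-statistic of 1.96 gives a two-sided p-value of about 0.05, so `is_significant(2.5)` is true while `is_significant(1.5)` is false.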

The Predictive News Feed Equation

The predictive equation produced by the linear regression model is as follows:

Daily Organic Reach = -22 + (Total Likes x 5.399%) + (Daily Paid Reach x 0.327%) + (Page Views x 0.416%) + (Weekend [1 if yes, 0 if no] x -194.4) + (Posts Per Day x 81.08)
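The equation translates directly into code. This sketch hard-codes the coefficients reported above (with the percentages written as decimals); the function and parameter names are hypothetical, and the model only describes The Federalist’s page over the period we studied.

```python
def predicted_daily_organic_reach(
    total_likes: float,
    daily_paid_reach: float,
    page_views: float,
    is_weekend: bool,
    posts_per_day: float,
) -> float:
    """The Federalist's fitted equation; percentages written as decimals."""
    return (
        -22
        + total_likes * 0.05399       # 5.399%
        + daily_paid_reach * 0.00327  # 0.327%
        + page_views * 0.00416        # 0.416%
        + (1 if is_weekend else 0) * -194.4
        + posts_per_day * 81.08
    )
```

For a hypothetical weekday with 50,000 total likes, 20,000 people reached through paid ads, 10,000 page views from Facebook and 10 posts, it predicts a daily organic reach of about 3,595.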

So what does that equation mean? First, it means that, holding all other variables equal (one of the nice aspects of regression), each additional fan adds only about 0.054 people to daily organic reach: you can expect a mere 5.4 percent of your Facebook fans to see your posts in their news feeds on any given day. Second, it means that ad purchases clearly increase your organic reach. For every 10,000 people you reach via paid Facebook ads, you can expect an additional 33 people to see your content in their news feeds. Third, it means that Facebook likely adjusts your news feed exposure based on how popular a particular post is: for us, each 10,000 page views generated from Facebook increased our organic news feed exposure by about 42 people. Fourth, it means that weekend posts, at least for The Federalist, just don’t get that much exposure (holding all other variables equal, the negative coefficient means that a weekend day by itself reduces our daily organic reach by about 194 people). And finally, it means that post frequency makes a big difference, to the tune of an additional 81 people reached per additional daily post.
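Because the model is linear, each of those effects is simply a coefficient multiplied by the change in its variable, with everything else held fixed. A short sketch reproducing the rounded figures (the dictionary and function names are hypothetical):

```python
# Coefficients from the fitted equation, with percentages written as decimals.
COEFFICIENTS = {
    "total_likes": 0.05399,
    "daily_paid_reach": 0.00327,
    "page_views": 0.00416,
    "weekend": -194.4,
    "posts_per_day": 81.08,
}


def change_in_reach(variable: str, delta: float) -> float:
    """Change in predicted daily organic reach when one variable moves by delta
    and all others are held fixed; in a linear model that is coefficient * delta."""
    return COEFFICIENTS[variable] * delta


for name, delta in [
    ("daily_paid_reach", 10_000),
    ("page_views", 10_000),
    ("weekend", 1),
    ("posts_per_day", 1),
]:
    print(f"{name} +{delta}: {change_in_reach(name, delta):+.0f}")
```

Running it prints +33, +42, -194 and +81, matching the rounded numbers in the paragraph above.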

Limitations

Is this model perfect? Of course not. No model is perfect, and every model has flaws. For example, while we have a lot of data about The Federalist’s Facebook activity, we are still only one site, so in the big scheme of things our model has only a single observation. Other sites, especially those with monster numbers like Upworthy or BuzzFeed or Huffington Post, might exhibit completely different behavior. (Although I can’t prove it, I suspect that “verified” sites that have clearly invested hundreds of thousands, if not millions, of dollars in Facebook ads may get preferential treatment from the social media giant.)

The linearity assumption behind linear regression also has severe limitations: in reality, most relationships are not linear, and at some point one would expect diminishing marginal returns to kick in. There are ways to address this issue, such as using logarithms to transform different variables, but those methods would not have been appropriate given the nature of the underlying data. And, as mentioned above, different factors may come into play once a site reaches different levels of likes.
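As an illustration of the logarithmic transformation mentioned above (which, again, was not appropriate for our data), the sketch below fits y = a + b·ln(x) with a one-variable least squares fit. Under that form every doubling of x adds the same amount to y, so the marginal effect b/x shrinks as x grows, which is what diminishing returns means. The function names and data are hypothetical, and every x must be positive.

```python
import math


def simple_ols(x: list[float], y: list[float]) -> tuple[float, float]:
    """One-variable least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_log_model(x: list[float], y: list[float]) -> tuple[float, float]:
    """Fit y = a + b*ln(x) by regressing y on ln(x); requires every x > 0."""
    return simple_ols([math.log(v) for v in x], y)


def marginal_effect(b: float, x: float) -> float:
    """Slope of a + b*ln(x) at x: the payoff from one more unit shrinks as x grows."""
    return b / x
```

On hypothetical data generated exactly from y = 10 + 3·ln(x), `fit_log_model` recovers a = 10 and b = 3, and the marginal effect at x = 100 is only 0.03.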

It’s also worth noting that roughly 25 percent of the variation in daily organic reach is not explained by the model, so a factor that was not included could end up having a large impact. Finally, Facebook could change its algorithm tomorrow and render this particular model obsolete.

If, in spite of all those limitations, this model continues to be an accurate predictor of news feed penetration for our page, then we will continue to use it (as should you!). And if it doesn’t, we’ll start from scratch and try to build another one with more explanatory and predictive power. (As an aside, it would be really nice if more professional modelers, like those who work in fields like climate science or economics, were more honest and open about the severe and rather obvious limitations of their models.)

Implications

So with all those limitations in mind, what’s the point of this exercise? For starters, from a budgeteer’s perspective, it’s remarkably valuable. Given that the algorithm appears to favor total ad reach over total money spent (yes, we tested that in a different regression), advertisers are better off holding cost constant and maximizing the number of ad impressions served. For us, that means abandoning news feed like ads in favor of right column like ads. We’ve found that while the cost per like is nearly identical for the two, the cost per thousand ad impressions (CPM) is far lower for right column ads than for news feed ads. Given the coefficients of the regression variables above and our own internal data on historical Facebook ad performance, a $100 ad spend on right column ads instead of news feed ads will increase our daily organic reach relative to total likes by 1.8 percent.
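The budgeting logic boils down to CPM arithmetic: impressions = budget / CPM × 1,000. The sketch below pushes a budget through that formula and then through the daily paid reach coefficient from the regression. The CPMs and the average frequency (impressions per person reached) are hypothetical inputs, and converting impressions to paid reach via a constant frequency is an assumption made purely for illustration; it does not reproduce the 1.8 percent figure, which relied on our internal ad data.

```python
# Coefficient on daily paid reach from the regression (0.327%).
PAID_REACH_COEFFICIENT = 0.00327


def impressions_for_budget(budget: float, cpm: float) -> float:
    """CPM is the cost per thousand impressions, so impressions = budget / CPM * 1,000."""
    return budget / cpm * 1000


def organic_reach_lift(budget: float, cpm: float, frequency: float) -> float:
    """Predicted extra daily organic reach from one budget at one CPM.

    Hypothetical assumption: paid reach = impressions / frequency, where
    frequency is the average number of impressions per person reached.
    """
    paid_reach = impressions_for_budget(budget, cpm) / frequency
    return PAID_REACH_COEFFICIENT * paid_reach
```

With a $100 budget and an assumed frequency of 2, a hypothetical $0.50 CPM yields 100,000 people of paid reach and a predicted organic lift of about 327, versus about 82 at a hypothetical $2.00 CPM.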

This analysis also told us that sponsored posts (yep, tested that as well in a different regression) are a complete waste of money. They don’t generate even close to enough page views to offset the cost, and they don’t increase daily organic reach nearly as much as ads meant to increase page likes. That might not be the case for a company that sells pricey products and uses Facebook to find and acquire new customers, but for a business whose main unit of revenue is the page view, the math on sponsored posts just doesn’t work.
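For a page-view business, the sponsored-post question is a break-even calculation: does the revenue from the page views a post generates cover its cost? A minimal sketch, where the revenue per thousand page views (RPM) and all figures are hypothetical placeholders rather than our numbers:

```python
def sponsored_post_net(cost: float, page_views: float, rpm: float) -> float:
    """Revenue minus cost, where rpm is revenue per thousand page views (hypothetical input)."""
    revenue = page_views / 1000 * rpm
    return revenue - cost


def break_even_page_views(cost: float, rpm: float) -> float:
    """Page views a sponsored post must generate before it pays for itself."""
    return cost / rpm * 1000
```

At a hypothetical $5 RPM, a $100 sponsored post must generate 20,000 page views to break even; one that drives 10,000 views loses $50.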

Conclusion

Facebook can deny the charge all it wants, but according to extensive data for our Facebook page, the Facebook news feed algorithm clearly rewards the purchase of ads. There’s nothing inherently wrong with that — Facebook has every right to charge whatever it wants for the services it provides. The company’s advertisers and publishers, however, need to understand the extent to which Facebook uses ad purchases to increase a page’s news feed exposure. That’s why we conducted the analysis we did — money doesn’t grow on trees, and we need to have a very clear understanding of how money invested in advertising affects our overall bottom line. But it would be nice if Facebook were more transparent and specific about how its news feed algorithm works.

The company’s continued opacity is what led us to do our own digging, and according to Facebook’s own numbers about how our fans interact with our page, it turns out that a dollar spent on Facebook might not be worth as much as a dollar spent somewhere else.

If you are a social media account manager and would like to learn more about how to apply these analytic methods to your ad campaigns, please feel free to e-mail me. You can find my contact information here.
