return to list of articles

Building performant newsfeeds

Some light theory on building newsfeeds, with Ruby example code

Newsfeeds and social networks

If you’ve ever considered building a webapp that displays any sort of timeline or newsfeed that is personalizable, you’ll have come across the problems associated with newsfeeds. If you’ve never gone down that road, this post may help prevent future heartache.

Imagine that you have an application with a User model and that your users can follow or subscribe to other users. By following a person, their content will show up on your news feed, in addition to your own news items. A simple implementation could be something like the following where fetching all of a user’s news feed items would require the use of user.news_feed. The code could be:

def guest_news_feed
  NewsItem.last(15)
end

def news_feed
  return guest_news_feed if self.guest?

  # add user's own news items
  feed = [] << news_items

  # friends' news_items
  feed << friend_news_items

  feed.flatten_and_sort_by_recency
end

def friend_news_items
  if people_i_follow.any?
    # find all news entries for each person this user follows
    people_i_follow.map { |friend| friend.news_items }
  end
end

# giving this method this name to improve the narrative of the code example
def people_i_follow
  # find all of the user's friends in DB using an array of user IDs
  User.where('uid IN (?)', list_of_friends)
end

This is a crude implementation, and it works, but it will start struggling as the number of users, friends, and news feed items increases. Every time that you fetch a news feed, you’re making several SQL calls. In fact, with the code above you’ll be facing N+1 problems all over the place. You will improve performance by using eager loading - in fact I’d recommend it, but it’s still not an ideal solution.

Newsfeed design

In the world of newsfeeds, customized streams and timelines, there are two main models that you have at your disposal. They are the Fan Out On Write, and the Fan Out On Read models. Once you reach a certain level of scale with your application, you will almost always need to implement a separate store that exists specifically for your newsfeeds or timelines. Your main database or data store becomes a lesser concern around the newsfeed functionality. It is at this stage that your team must decide on where the fanning out of data will occur.

Fan Out On Write

This is the model that you’d use if your newsfeed is anything like the ones Pinterest or Twitter use. Each user has an aggregation of items and when an additional news item is created, it will be stored in your primary data store. An additional copy will be created for each newsfeed that it will be available in. One of the weaknesses here is that if you have a user that has 10 million followers, each new item that they post will have to be written 10 million times.

Fan Out On Read

This model is optimal if the newsfeed in your application is similar to the newsfeed that you see on platforms like Facebook. How this model works is that your application will only write a single record upon creation, and when a user loads their timeline, the application will fetch that data from that record or table. If you have a more complex newsfeed that fetches from multiple sources, it will fetch from each of those sources at the time of the request.

This is the approach that the code above uses. The reason I said that the implementation above was going to be a big weakness was that the FOOR model doesn’t perform well if you write to disk. Even with faster hardware, like SSDs, your performance will suffer. With this model, you’ll want all of your newsfeeds to be in memory, all of the time. However this approach becomes expensive and isn’t viable if you can’t fork out a lot of money on RAM.

The solution

There is no single right or wrong solution, and there are a lot of different solutions out there. This is one that I’d go with…

What you could do is use the in-memory store as an index. Use a store like Redis to store the IDs for all of the newsfeed items, and make a single SQL request using that array of IDs to fetch the actual newsfeed item collection. This is a variant of the FOOW model discussed above.

The implementation

You could implement this architecture like so… When a new user is created, a new Redis set is also created for that user. You could name it after the user’s primary key or something. A name like newsfeed-user-1 could be used. When the user subscribes to another user, adds a new friend, or whatever the nomenclature on your application is, the user they’ve just followed will have their ID added to this new user’s Redis set.

When a newsfeed item is added, your application can add the ID of the new item to the set of every friend. So for example, say that you have a user with an ID of 1, and that user adds a new entry. Three other users “follow” user 1. Their IDs are 2, 3, and 4. You would want to push the ID of the newsfeed item to the following Redis sets:

newsfeed-user-1 # the user's own feed
newsfeed-user-2
newsfeed-user-3
newsfeed-user-4

You could then optimize this by adding a maximum amount of entries and implementing a First-In, First-Out model, or whatever your preference would be.

When a user would load a page with the newsfeed in the browser, your application could call a #try_redis method that will fetch the newsfeed-user-1 (assuming the user is user 1) set from Redis. This provides your application with an array of newsfeed IDs to look up, at which point you can query your RDBMS once. The #try_redis can have a fallback so that if no Redis set is found, it should create one - or whatever your preference is.

Agree? Disagree? Write something up and tweet me your view.


Get notified when Pawel releases new posts, guides, or projects