A few days ago Facebook announced its new search capabilities. These are Google-like capabilities for searching your history, the feature that was the crown jewel of Google+ – Google’s attempt to fight off Facebook. Want to find that funny thing you posted when you took the ice bucket challenge a few months ago? It’s now easier than ever. And it’s now also supported on your phone.
You may think this is a simple (yet highly useful) feature. But when you come to think of it, this is quite a challenge, considering the 1.3 billion active users generating millions of events per second. The likes of Facebook, Google and Twitter cannot settle for traditional processing capabilities, and need to develop innovative ways of processing streams at high volume.
A challenge just as big is encountered with queries: Facebook’s big data stores process tens of petabytes and hundreds of thousands of queries per day. Serving such volumes while keeping most response times under 1 second is hardly the kind of challenge traditional databases were built for.
These challenges called for an innovative approach. For example, Facebook’s Data Infrastructure Team was the one to develop and open-source Hive, the popular Hadoop-based software framework for Big Data queries. Facebook also took an innovative approach in building its data centers, both in the design of the servers and in its next-gen networking, designed to meet the high and constantly increasing traffic volumes within its data centers.
Facebook is taking its data challenge very seriously, investing in internal research as well as in collaboration with academia and the open-source community. In a data faculty summit hosted by Facebook a few months ago, Facebook shared its top open data problems. It raised many interesting challenges in managing Small Data, Big Data and related hardware. With the announced release of Facebook Search for mobile, I remembered in particular the challenges raised at the summit on how to adapt their systems to the mobile realm, where the network is flaky, where much of the content is pre-fetched instead of pulled on demand, and where privacy checks need to be done much earlier in the process. The recent release may indicate new innovative solutions to these challenges. I’m looking forward to hearing some insights from the technical team.
Facebook, Twitter and the like face the Big Data challenges early on. As I said before:
These volumes challenge the traditional paradigms and trigger innovative approaches. I would keep a close eye on Facebook as a case study for the challenges we’ll all face very soon.
Follow Dotan on Twitter!