Data Storage Is Facebook’s Next Initiative
Facebook will follow up on its Open Compute initiative with a new effort to improve its data storage.
According to Fast Company, the emphasis on storage is driven by the sheer volume of data that Facebook’s users continue to produce. Those hundreds of millions of users take a lot of pictures, and they use Facebook to post their videos.
Facebook engineer Kannan Muthukkaruppan wrote last November that the company’s messaging infrastructure handles more than 350 million users sending 15 billion person-to-person messages per month. Its chat service supports more than 300 million users who send over 120 billion messages per month.
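To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers quoted above, which are lower bounds ("more than"), and assumes a 30-day month. The function names are made up for this illustration. The results are averages, not peak loads.

```python
# Back-of-the-envelope averages from the figures quoted in the article.
# Assumption: a 30-day month; the quoted figures are lower bounds.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def per_user_per_month(messages: float, users: float) -> float:
    """Average number of messages each user sends in a month."""
    return messages / users


def per_second(messages: float) -> float:
    """Average number of messages per second across the month."""
    return messages / SECONDS_PER_MONTH


services = {
    "messaging": {"users": 350_000_000, "messages": 15_000_000_000},
    "chat": {"users": 300_000_000, "messages": 120_000_000_000},
}

for name, stats in services.items():
    avg_user = per_user_per_month(stats["messages"], stats["users"])
    avg_rate = per_second(stats["messages"])
    print(f"{name}: {avg_user:.0f} msgs/user/month, {avg_rate:,.0f} msgs/sec on average")
```

Under these assumptions, messaging averages about 43 messages per user per month (roughly 5,800 per second), while chat averages 400 per user per month (roughly 46,000 per second).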
Facebook’s Frank Frankovsky tells Fast Company that the company will focus on better efficiencies in the data center to handle this volume of data. That means better energy management and a resulting decrease in costs. Facebook is taking the same open approach it took with Open Compute. The goal is to spur innovation around its core business strengths.
Last fall, Facebook chose HBase, the Hadoop database, over Cassandra and MySQL for its messaging infrastructure. According to Muthukkaruppan, the Facebook engineers found that MySQL could not handle the long tail of data very well, and that Cassandra’s consistency model proved difficult to reconcile with the new infrastructure.
Muthukkaruppan writes that HBase scales better and performs better. The engineers also like its feature set:
HDFS, the underlying filesystem used by HBase, provides several nice features such as replication, end-to-end checksums, and automatic rebalancing. Additionally, our technical teams already had a lot of development and operational expertise in HDFS from data processing with Hadoop. Since we started working on HBase, we’ve been focused on committing our changes back to HBase itself and working closely with the community. The open source release of HBase is what we’re running today.
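For readers unfamiliar with the idea behind end-to-end checksums, the sketch below shows the general technique in Python: compute a checksum for each fixed-size chunk when data is written, then recompute and compare when it is read back. This is an illustration only, not HDFS's actual implementation. The chunk size, function names, and the use of `zlib.crc32` are assumptions made for the example.

```python
import zlib
from itertools import zip_longest

CHUNK_SIZE = 512  # hypothetical chunk size chosen for this sketch


def checksum_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[int]:
    """Return one CRC32 checksum per fixed-size chunk of the data."""
    return [
        zlib.crc32(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def find_corrupt_chunks(data: bytes, expected: list[int],
                        chunk_size: int = CHUNK_SIZE) -> list[int]:
    """Return indexes of chunks whose checksum no longer matches.

    A missing or extra chunk also counts as a mismatch.
    """
    actual = checksum_chunks(data, chunk_size)
    return [
        index
        for index, (got, want) in enumerate(zip_longest(actual, expected))
        if got != want
    ]


# Writer side: store the checksums alongside the data.
payload = b"x" * 2000
stored_checksums = checksum_chunks(payload)

# Reader side: simulate one corrupted byte, then verify.
damaged = bytearray(payload)
damaged[700] ^= 0xFF  # byte 700 falls in chunk index 1 (700 // 512)
bad = find_corrupt_chunks(bytes(damaged), stored_checksums)
print("corrupt chunk indexes:", bad)
```

In a replicated store, a reader that detects a bad chunk this way can fetch the same data from another replica, which is why checksums and replication are usually mentioned together.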
Facebook’s messaging infrastructure offers some insight into what it takes to keep scaling Facebook’s data storage. It also shows what we should continue to see from Facebook as it strengthens its infrastructure to differentiate on the application layer.