Friday, February 13, 2009

Scalability: What are YOU doing about it?

I've been working on HBase lately. The system is cool, integrates nicely with Hadoop and HDFS, and has an awesome community. But there is one elephant in the room: the raw seek and read speed of HBase.

Due to the speed issue, HBase has been written off in some blog posts as interesting but too slow. Since serving a website directly out of HBase is on the to-do list for many users (including me), I took some time out to do something about it.

At the recent HBase LA Hackathon, a number of core system implementers and committers met and hung out with current and new users. On the agenda was planning the next 0.20 release.

It became apparent that there were two massively needed features:

  • ZooKeeper integration for reliability

  • Faster HBase for website serving

While both of these are very interesting to me, a faster HBase was the more critical need for me. I sat down with Stack, one of the lead developers, to do some performance testing and profiling on a new file format.

When retrieving data from disk, the file format is a performance linchpin. It provides the mechanisms for turning blobs of data on disk back into the structured data the clients expect.
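As a toy illustration of that job, here is a minimal sketch (not HBase code; the format and names are invented for this example) that encodes a map of key/value records into a single length-prefixed blob and decodes the blob back into structured records:

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a file format's core job is turning structured
// records into bytes on disk and back again. Layout here is simply
// [keyLen][keyBytes][valLen][valBytes] repeated.
public class FlatFormat {
    // Encode records as a length-prefixed byte blob.
    public static byte[] encode(Map<String, String> records) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        for (Map.Entry<String, String> e : records.entrySet()) {
            byte[] k = e.getKey().getBytes(StandardCharsets.UTF_8);
            byte[] v = e.getValue().getBytes(StandardCharsets.UTF_8);
            out.writeInt(k.length); out.write(k);
            out.writeInt(v.length); out.write(v);
        }
        return bos.toByteArray();
    }

    // Decode the blob back into the structured records a client expects.
    public static Map<String, String> decode(byte[] blob) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob));
        Map<String, String> records = new LinkedHashMap<>();
        while (in.available() > 0) {
            byte[] k = new byte[in.readInt()]; in.readFully(k);
            byte[] v = new byte[in.readInt()]; in.readFully(v);
            records.put(new String(k, StandardCharsets.UTF_8),
                        new String(v, StandardCharsets.UTF_8));
        }
        return records;
    }
}
```

Real formats add block boundaries, indexes, and compression on top, but the encode/decode round trip is the part every read path ultimately pays for.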

We were evaluating a proposed new file format (known as TFile), and while the raw numbers seemed good, profiling exposed some worrisome trends. The details are too involved to cover here, but my intuitive guess is that the layered stream micro-architecture of the new format was not doing explicit, efficient block reads, and was instead relying on the layers of streams to read in blocks on their own. The problem is that this becomes hard to control, and that loss of control reduces performance in this case.
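To make the control problem concrete, here is a hypothetical Java sketch (nothing to do with TFile's actual internals) that counts how many reads reach the underlying source when data is pulled through a small buffered stream layer versus fetched in one explicit bulk read:

```java
import java.io.*;

// Sketch of the control problem with layered streams: each buffered layer
// decides its own read sizes, so the reads that hit the underlying source
// are whatever the stack chooses, not what the caller intended.
public class LayeredReads {
    // Wraps a source and counts how many bulk read() calls actually reach it.
    static class CountingStream extends FilterInputStream {
        int calls = 0;
        CountingStream(InputStream in) { super(in); }
        @Override public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    // Consume the data byte-by-byte through a small buffered layer;
    // the buffer refills many times, each refill hitting the source.
    public static int layeredReadCalls(byte[] data, int bufSize) throws IOException {
        CountingStream counting = new CountingStream(new ByteArrayInputStream(data));
        DataInputStream in =
            new DataInputStream(new BufferedInputStream(counting, bufSize));
        for (int i = 0; i < data.length; i++) in.readByte();
        return counting.calls;
    }

    // Read the same bytes as one explicit block: the source is hit once.
    public static int explicitBlockCalls(byte[] data) throws IOException {
        CountingStream counting = new CountingStream(new ByteArrayInputStream(data));
        new DataInputStream(counting).readFully(new byte[data.length]);
        return counting.calls;
    }
}
```

On local disk the extra calls are cheap; once each underlying read is a network round trip, the difference stops being academic.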

Furthermore, our tests were being done on local disk, which has the benefit of OS block caching. When you move to HDFS, things change: data reads become expensive network RPCs, so you want to cache as much as possible and retrieve as much as possible per call.
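A back-of-the-envelope cost model shows why: with a fixed per-RPC overhead, one large block read pays that overhead once, while many small reads pay it every time. The numbers below are illustrative assumptions, not HDFS measurements:

```java
// Toy cost model: totalCost = calls * perCallOverhead + bytes / throughput.
// The per-call overhead and throughput figures are made-up assumptions
// standing in for network RPC latency and transfer rate.
public class RpcCost {
    public static double totalCostMs(int calls, long bytes,
                                     double perCallOverheadMs,
                                     double throughputBytesPerMs) {
        return calls * perCallOverheadMs + bytes / throughputBytesPerMs;
    }
}
```

Fetching 64 KB as one call versus 1024 64-byte calls moves the same bytes, but the bulk read pays the fixed overhead once instead of a thousand times.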

So we proceeded in parallel: Stack ripped the stream layers out of TFile, while I decided to start anew and write a new block-oriented file format. Once both were far enough along, Stack ran performance tests, and the results were conclusive: the new format, dubbed HFile, is vastly superior to the existing format (MapFile).

My new file format, HFile (previously RFile), is block oriented (you read a block-sized chunk at a time) and has an explicit block cache behind an interface. On the read path, all data is stored in and referred to via ByteBuffers, reducing unnecessary duplication. The streaming features of other formats were removed, since with HBase the key/value must be held in memory at least once during the write anyway (in the so-called "memcache"). Removing that complexity while explicitly managing blocks has resulted in a file format that is faster, reduces memory overhead (thanks to ByteBuffer), and improves in-memory block caching all in one go.
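As a rough sketch of what an explicit block cache interface might look like (illustrative only, not the actual HBase code), blocks keyed by their file offset can be handed out as ByteBuffer views that share the cached bytes instead of copying them:

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical block cache sketch, assuming blocks are keyed by file offset.
// Names and shape are invented for illustration.
public class BlockCache {
    private final Map<Long, ByteBuffer> cache = new HashMap<>();

    // Cache a block under its offset in the file.
    public void put(long offset, ByteBuffer block) {
        cache.put(offset, block);
    }

    // Return a read-only view of the cached block, or null on a miss.
    // asReadOnlyBuffer() shares the backing bytes -- no copy is made,
    // which is the ByteBuffer win described above.
    public ByteBuffer get(long offset) {
        ByteBuffer b = cache.get(offset);
        return b == null ? null : b.asReadOnlyBuffer();
    }
}
```

A real cache would add eviction and hit/miss accounting, but the key design point is that every reader works off views into the same cached block rather than its own copy.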

The current status is that HFile is on track to be the new official file format for HBase 0.20. Development is active, and while there is nothing for people to use yet, you can see the glorious details at github:

Join us on IRC as well - #hbase on freenode.