The engineering team at Pinterest recently documented the deprecation of its HBase clusters, citing high maintenance and infrastructure costs, a shortage of HBase experts, and limited functionality. Following Pinterest’s move to TiDB and other database technologies, the community is asking whether this is another sign of the decline of the non-relational database that runs on top of Hadoop and HDFS.
Pinterest used to host one of the largest production deployments of HBase in the world, which peaked at around 50 clusters, 9,000 AWS EC2 instances, and over 6 PB of data. Alberto Ordonez Pereira, senior staff software engineer at Pinterest, and Lianghong Xu, senior engineering manager at Pinterest, explain the company’s transition from managing multiple online storage services supported by HBase to a brand new serving architecture with a new datastore and a unified storage service:
HBase had proven to be durable, scalable, and generally performant since its introduction at Pinterest. Nevertheless, after a thorough evaluation with extensive feedback gathering from relevant stakeholders, at the end of 2021 we decided to deprecate this technology.
Modeled after Google’s Bigtable and implemented in Java, HBase is a distributed, column-oriented key-value store that runs on top of HDFS and is part of the Apache Hadoop ecosystem. According to Pereira and Xu, HBase was Pinterest’s first NoSQL datastore and one of the most widely used storage backends at the image-sharing and social media company. They write:
The maintenance cost of HBase had become prohibitively high, mainly because of years of tech debt and its reliability risks. Due to historical reasons, our HBase version was five years behind the upstream, missing critical bug fixes and improvements. Yet the HBase version upgrade is a slow and painful process due to a legacy build/deploy/provisioning pipeline and compatibility issues.
The ecosystem at Pinterest built around HBase. Source: Pinterest blog.
Highlighting HBase’s missing functionality, the authors mention that the lack of distributed transactions in HBase led to several bugs and incidents in the in-house graph service. Furthermore, they found that HBase failed to match the performance of other data stores for OLAP workloads.
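To illustrate why missing multi-row transactions can hurt a graph service, the following minimal Python sketch simulates a store that is atomic only per row. It is not Pinterest's code and does not use the HBase API: the `RowStore` class, the `out:`/`in:` key layout, and the `add_edge` and `dangling_edges` helpers are all hypothetical names chosen for this example. It assumes each graph edge is written as two rows, one per direction, and shows how a failure between the two writes leaves a half-written edge.

```python
class RowStore:
    """Toy store with single-row atomicity only (hypothetical, not the HBase API)."""

    def __init__(self):
        self.rows = {}
        self.puts = 0
        self.fail_after = None  # simulate a crash once this many puts have succeeded

    def put(self, key, value):
        if self.fail_after is not None and self.puts >= self.fail_after:
            raise RuntimeError("simulated crash")
        self.rows[key] = value
        self.puts += 1


def add_edge(store, src, dst):
    # Two independent row writes: nothing rolls back the first if the second fails.
    store.put(f"out:{src}:{dst}", 1)
    store.put(f"in:{dst}:{src}", 1)


def dangling_edges(store):
    # Forward edges whose reverse row is missing.
    result = []
    for key in store.rows:
        kind, a, b = key.split(":")
        if kind == "out" and f"in:{b}:{a}" not in store.rows:
            result.append((a, b))
    return result


if __name__ == "__main__":
    store = RowStore()
    add_edge(store, "a", "b")   # both rows written
    store.fail_after = 3        # the next edge fails after its first row
    try:
        add_edge(store, "b", "c")
    except RuntimeError:
        pass
    print(dangling_edges(store))
```

With a store that supports distributed transactions, both writes would commit or abort together, so application code would not need to detect and repair such half-written edges.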
In the article “Why is Pinterest deprecating HBase? Is HBase dying?”, Shivang Sarawagi highlights a steady decline in the number of Google searches for HBase over the last five years and writes:
While HBase continues to be used in the industry, over the years, with the emergence of cloud-native services, we have several alternatives and solutions available to serve specific system use cases.
In a popular thread on Hacker News, user dehrmann comments:
I worked for a place that used HBase heavily. They migrated from AWS to GCP just for BigTable (…) The workload of managing HBase and HDFS was high, and it was unreliable enough that they always had a failover cluster set up. Interestingly, the migration surfaced degenerate cells/tables that might have been partially to blame for reliability issues.
Pinterest has previously shared how they migrated some workloads from HBase to TiDB with zero downtime. Sarawagi adds:
With the emergence of modern databases, the industry’s focus has gradually moved from HBase. However, this does not mean the technology is obsolete.
The engineering team at Pinterest has promised two follow-up articles documenting the comprehensive evaluation behind its final storage selection.