17 Aug, 2009

A Compendium of solutions for scaling a Data Store

Posted by Bhavin Turakhia

Achieving infinite scalability on your data store is the holy grail of scaling an application. App servers are typically stateless and therefore a cinch to scale. This document serves as a comprehensive compendium on my thoughts and research on scaling a data store.

Requirements

  • Inifinite Scalability
  • High Availability (0% downtime)
  • Data Redundancy
  • High Performance
  • Storage Flexibility – ability to store any type of data
  • Query Flexibility – ability to perform simple gets, range based gets, range based updates, and possibly complex joins

Features that a solution must have to deliver the above Requirements

  • Replication – Each unit of data should be copied to multiple nodes so that if an underlying node crashes there is no data loss
  • Partitioning – Data should be divided across multiple nodes based on specific keys so that the data layer is infinitely scalable
  • Online node addition – Solution should support adding new nodes online, with automated data distribution upon new node addition
  • Load Balancing – Queries should be load balanced between nodes
  • Persistence – Should have a data persistence layer so that data is not volatile
  • Caching – Should support flexible in-memory caching for increased retrieval speeds
  • Tree based Indexing – To support range based queries ideally one may need tree type indexes for keys on which range queries maybe made
Proposed Solutions
The below solutions are a result of researching a ton of options (refer Research seciton below). They also represent solutions that are practical to deploy and are being used in production. There are exotic variants that I came up with during my research but I have left them out.

Solution 1: Google App Engine
  • Provides BigTable – Googles implementation of a scalable database
  • BigTable is not an RDBMS, but has a fairly flexible API that supports creative data fetching methods
  • BigTable is distributed and self-balancing – scaling is no longer the application developers problem
  • You will need to host your application on Google’s App Engine
  • Your application needs to use the BigTable API for data storage
Solution 2: MySQL NDB Cluster
  • MySQL NDB Cluster is a master-master, self-partitioned, replicated storage engine that technically seems to provides all the features listed above
  • It also offers access via SQL or a high-performant native NDB API
  • It seems like the holy grail of database scaling
  • It however stores indexes entirely in memory – and has lesser flexibility w.r.t persistence
  • I could not find much material on the performance of an NDB Cluster
Solution 3: HyperTable
  • HyperTable is an opensource BigTable clone and provides essentially the same features as described in the BigTable whitepaper
  • It is supported by Baidu, Zvents and Rediff
Solution 4: Project Voldemort
  • Voldemort is a big, distributed, persistent, fault-tolerant hash
  • Developed and used by Linkedin
  • Java based API
  • Reasonably performant (10-20k ops on commodity hardware)
  • Maintains replicated copies of data over multiple nodes and automatically handles server failures
  • Does not support range based querying
Solution 5: Tokyo Tyrant
  • Tokyo Tyrant is a layer on top of Tokyo Cabinet – a highly performant, persistent data strore (site claims over 2 million qps)
  • Tokyo Tyrant itself claims to deliver upwards of 58,000 qps
  • It supports multiple language bindings (Java, Perl, PHP etc)
  • Supports various data structures – hash, tree, B+tree, array, table etc
  • Supports caching
  • Tokyo Tyrant does not support active-active master-master replication, thus failing out on redundancy. It also does not support data partitioning out of the box
Solution 6: Postgres + Pl/Proxy + Replication (Slony / Continuent) + PGBouncer (connection pooler)
  • Postgres is an extremely mature RDBMS
  • Using PL/Proxy one can abstracte horizontal partitioning concerns out of the database layer and into an abstracted underlying layer
  • Using Slony or Continuent one can ensure that multiple copies of any set of rows exist at any given point in time (Synchronous or Async replication)
  • PGBouncer provides a light-weight connection pooler for PL/Proxy
  • Together this will satisfy all our requirements above
  • You may club Voldemort or memcached or redis with this to provide a caching layer
  • Postgres also has hooks for connecting to memcached that make cache population and invalidation easier
Research
A lot of reading went behind discovering the above solutions. The below links are a good start -
BigTable
Google App Engine
MySQL Cluster
Amazon SimpleDB
Cassandra
Hypertable
MongoDB
neo4j
Project Voldemort
Redis
Tokyo Cabinet
Lightcloud
Gigaspaces
Coherence
Postgres
Tags: , , , , , , , , , , ,

Comments
Rushabh
August 20, 2009

Very nice article! Something interesting i found on these lines is that google is planning to launch a new file system. More info here http://www.dailytech.com/Google+Works+New+File+System+into+Caffeine+Plans/article16004.htm

Salesh Chacko
August 24, 2009

Another scalable data storage worthy of mention is CouchDB.
http://couchdb.apache.org/

nazanin
September 15, 2009

Hi,
I have some questions.
How do you compare BigTable with HyperTable? If you could choose an alternative to big table, would you choose HyperTable or Voldemort? and why?
confused student,
Best.

Prakash
October 1, 2009

HBase will be another worthy entry to this list.based on Hadoop.
http://hadoop.apache.org/hbase/

yangyang
March 25, 2010

Rosetta Stone Spanish (Latin America) Level 1 with Audio Companion. I have used this popular Spanish rosetta stone software to learn Spanish with mix results.Our “TopTenREVIEWS Bronze Award” went to Rosetta stone french, a recognized leader in the language learning industry. Note for Intel Macintosh Users: rosetta stone V3 application for Macintosh is a universal application that will run natively on Intel Macs.Learn French in your own time and have fun.

Leave a comment

(required)

(required)

Spam protection by WP Captcha-Free