17 Aug, 2009

A Compendium of solutions for scaling a Data Store

Posted by Bhavin Turakhia | (5) Comments

Achieving infinite scalability on your data store is the holy grail of scaling an application. App servers are typically stateless and therefore a cinch to scale. This document serves as a comprehensive compendium on my thoughts and research on scaling a data store.

Requirements

  • Inifinite Scalability
  • High Availability (0% downtime)
  • Data Redundancy
  • High Performance
  • Storage Flexibility – ability to store any type of data
  • Query Flexibility – ability to perform simple gets, range based gets, range based updates, and possibly complex joins

Features that a solution must have to deliver the above Requirements

  • Replication – Each unit of data should be copied to multiple nodes so that if an underlying node crashes there is no data loss
  • Partitioning – Data should be divided across multiple nodes based on specific keys so that the data layer is infinitely scalable
  • Online node addition – Solution should support adding new nodes online, with automated data distribution upon new node addition
  • Load Balancing – Queries should be load balanced between nodes
  • Persistence – Should have a data persistence layer so that data is not volatile
  • Caching – Should support flexible in-memory caching for increased retrieval speeds
  • Tree based Indexing – To support range based queries ideally one may need tree type indexes for keys on which range queries maybe made
Proposed Solutions
The below solutions are a result of researching a ton of options (refer Research seciton below). They also represent solutions that are practical to deploy and are being used in production. There are exotic variants that I came up with during my research but I have left them out.

Solution 1: Google App Engine
  • Provides BigTable – Googles implementation of a scalable database
  • BigTable is not an RDBMS, but has a fairly flexible API that supports creative data fetching methods
  • BigTable is distributed and self-balancing – scaling is no longer the application developers problem
  • You will need to host your application on Google’s App Engine
  • Your application needs to use the BigTable API for data storage
Solution 2: MySQL NDB Cluster
  • MySQL NDB Cluster is a master-master, self-partitioned, replicated storage engine that technically seems to provides all the features listed above
  • It also offers access via SQL or a high-performant native NDB API
  • It seems like the holy grail of database scaling
  • It however stores indexes entirely in memory – and has lesser flexibility w.r.t persistence
  • I could not find much material on the performance of an NDB Cluster
Solution 3: HyperTable
  • HyperTable is an opensource BigTable clone and provides essentially the same features as described in the BigTable whitepaper
  • It is supported by Baidu, Zvents and Rediff
Solution 4: Project Voldemort
  • Voldemort is a big, distributed, persistent, fault-tolerant hash
  • Developed and used by Linkedin
  • Java based API
  • Reasonably performant (10-20k ops on commodity hardware)
  • Maintains replicated copies of data over multiple nodes and automatically handles server failures
  • Does not support range based querying
Solution 5: Tokyo Tyrant
  • Tokyo Tyrant is a layer on top of Tokyo Cabinet – a highly performant, persistent data strore (site claims over 2 million qps)
  • Tokyo Tyrant itself claims to deliver upwards of 58,000 qps
  • It supports multiple language bindings (Java, Perl, PHP etc)
  • Supports various data structures – hash, tree, B+tree, array, table etc
  • Supports caching
  • Tokyo Tyrant does not support active-active master-master replication, thus failing out on redundancy. It also does not support data partitioning out of the box
Solution 6: Postgres + Pl/Proxy + Replication (Slony / Continuent) + PGBouncer (connection pooler)
  • Postgres is an extremely mature RDBMS
  • Using PL/Proxy one can abstracte horizontal partitioning concerns out of the database layer and into an abstracted underlying layer
  • Using Slony or Continuent one can ensure that multiple copies of any set of rows exist at any given point in time (Synchronous or Async replication)
  • PGBouncer provides a light-weight connection pooler for PL/Proxy
  • Together this will satisfy all our requirements above
  • You may club Voldemort or memcached or redis with this to provide a caching layer
  • Postgres also has hooks for connecting to memcached that make cache population and invalidation easier
Research
A lot of reading went behind discovering the above solutions. The below links are a good start -
BigTable
Google App Engine
MySQL Cluster
Amazon SimpleDB
Cassandra
Hypertable
MongoDB
neo4j
Project Voldemort
Redis
Tokyo Cabinet
Lightcloud
Gigaspaces
Coherence
Postgres
Category : 0-cosmos | TechTalk