17 Aug, 2009
A Compendium of solutions for scaling a Data Store
Posted by Bhavin Turakhia | (5) Comments
Achieving infinite scalability on your data store is the holy grail of scaling an application. App servers are typically stateless and therefore a cinch to scale. This document serves as a comprehensive compendium on my thoughts and research on scaling a data store.
Requirements
- Inifinite Scalability
- High Availability (0% downtime)
- Data Redundancy
- High Performance
- Storage Flexibility – ability to store any type of data
- Query Flexibility – ability to perform simple gets, range based gets, range based updates, and possibly complex joins
Features that a solution must have to deliver the above Requirements
- Replication – Each unit of data should be copied to multiple nodes so that if an underlying node crashes there is no data loss
- Partitioning – Data should be divided across multiple nodes based on specific keys so that the data layer is infinitely scalable
- Online node addition – Solution should support adding new nodes online, with automated data distribution upon new node addition
- Load Balancing – Queries should be load balanced between nodes
- Persistence – Should have a data persistence layer so that data is not volatile
- Caching – Should support flexible in-memory caching for increased retrieval speeds
- Tree based Indexing – To support range based queries ideally one may need tree type indexes for keys on which range queries maybe made
Proposed Solutions
The below solutions are a result of researching a ton of options (refer Research seciton below). They also represent solutions that are practical to deploy and are being used in production. There are exotic variants that I came up with during my research but I have left them out.
Solution 1: Google App Engine
- Provides BigTable – Googles implementation of a scalable database
- BigTable is not an RDBMS, but has a fairly flexible API that supports creative data fetching methods
- BigTable is distributed and self-balancing – scaling is no longer the application developers problem
- You will need to host your application on Google’s App Engine
- Your application needs to use the BigTable API for data storage
Solution 2: MySQL NDB Cluster
- MySQL NDB Cluster is a master-master, self-partitioned, replicated storage engine that technically seems to provides all the features listed above
- It also offers access via SQL or a high-performant native NDB API
- It seems like the holy grail of database scaling
- It however stores indexes entirely in memory – and has lesser flexibility w.r.t persistence
- I could not find much material on the performance of an NDB Cluster
Solution 3: HyperTable
- HyperTable is an opensource BigTable clone and provides essentially the same features as described in the BigTable whitepaper
- It is supported by Baidu, Zvents and Rediff
Solution 4: Project Voldemort
- Voldemort is a big, distributed, persistent, fault-tolerant hash
- Developed and used by Linkedin
- Java based API
- Reasonably performant (10-20k ops on commodity hardware)
- Maintains replicated copies of data over multiple nodes and automatically handles server failures
- Does not support range based querying
Solution 5: Tokyo Tyrant
- Tokyo Tyrant is a layer on top of Tokyo Cabinet – a highly performant, persistent data strore (site claims over 2 million qps)
- Tokyo Tyrant itself claims to deliver upwards of 58,000 qps
- It supports multiple language bindings (Java, Perl, PHP etc)
- Supports various data structures – hash, tree, B+tree, array, table etc
- Supports caching
- Tokyo Tyrant does not support active-active master-master replication, thus failing out on redundancy. It also does not support data partitioning out of the box
Solution 6: Postgres + Pl/Proxy + Replication (Slony / Continuent) + PGBouncer (connection pooler)
- Postgres is an extremely mature RDBMS
- Using PL/Proxy one can abstracte horizontal partitioning concerns out of the database layer and into an abstracted underlying layer
- Using Slony or Continuent one can ensure that multiple copies of any set of rows exist at any given point in time (Synchronous or Async replication)
- PGBouncer provides a light-weight connection pooler for PL/Proxy
- Together this will satisfy all our requirements above
- You may club Voldemort or memcached or redis with this to provide a caching layer
- Postgres also has hooks for connecting to memcached that make cache population and invalidation easier
Research
A lot of reading went behind discovering the above solutions. The below links are a good start -
BigTable
- The BigTable paper – http://labs.google.com/papers/bigtable.html
- A powerpoint that touches upon bigtable, gfs etc - http://cbcg.net/talks/googleinternals/
- Condor – a specialized workload management and job queue system – http://www.cs.wisc.edu/condor/description.html
- Notes on Jeff Dean’s talk at Univ of Washington on BigTable – http://andrewhitchcock.org/?post=214
- Video on bigtable – http://video.google.com/videoplay?docid=7278544055668715642&q=bigtable
- A blog post walkthrough of the bigtable paper – http://hnr.dnsalias.net/wordpress/2008/10/bigtable-googles-distributed-data-store/
- A description of Paxos – http://en.wikipedia.org/wiki/Paxos_algorithm
- Project Boxwood – Microsoft Research project for a scalable data layer – http://research.microsoft.com/en-us/projects/boxwood/default.aspx
Google App Engine
- http://code.google.com/appengine/
- Campfire video introducing app engine – http://www.youtube.com/watch?v=3Ztr-HhWX1c
- Getting started guide – http://code.google.com/appengine/docs/python/gettingstarted/
- Using the datastore – http://code.google.com/appengine/docs/python/gettingstarted/usingdatastore.html
MySQL Cluster
Amazon SimpleDB
Cassandra
Hypertable
MongoDB
neo4j
- http://neo4j.org/
- http://wiki.neo4j.org/content/Main_Page
- http://wiki.neo4j.org/content/Neo_Performance_Guide
- http://wiki.neo4j.org/content/FAQ
- http://dist.neo4j.org/neo-technology-introduction.pdf
- http://wiki.neo4j.org/content/Getting_Started_Guide
- http://wiki.neo4j.org/content/Getting_Started_In_One_Minute_Guide
Project Voldemort
Redis
Tokyo Cabinet
Lightcloud
Gigaspaces
Coherence
Postgres
- http://www.slony.info/
- http://www.slony.info/documentation/
- https://developer.skype.com/SkypeGarage/DbProjects/PlProxy
- https://developer.skype.com/SkypeGarage/DbProjects/PgBouncer
- http://www.continuent.com/solutions/overview
- http://www.continuent.com/images/stories/pdfs/tungsten%20overview%20white%20paper.pdf









