17 Aug, 2009
A Compendium of solutions for scaling a Data Store
Posted by Bhavin Turakhia | (5) Comments
Achieving infinite scalability on your data store is the holy grail of scaling an application. App servers are typically stateless and therefore a cinch to scale. This document serves as a comprehensive compendium on my thoughts and research on scaling a data store.
Requirements
- Inifinite Scalability
- High Availability (0% downtime)
- Data Redundancy
- High Performance
- Storage Flexibility – ability to store any type of data
- Query Flexibility – ability to perform simple gets, range based gets, range based updates, and possibly complex joins
Features that a solution must have to deliver the above Requirements
- Replication – Each unit of data should be copied to multiple nodes so that if an underlying node crashes there is no data loss
- Partitioning – Data should be divided across multiple nodes based on specific keys so that the data layer is infinitely scalable
- Online node addition – Solution should support adding new nodes online, with automated data distribution upon new node addition
- Load Balancing – Queries should be load balanced between nodes
- Persistence – Should have a data persistence layer so that data is not volatile
- Caching – Should support flexible in-memory caching for increased retrieval speeds
- Tree based Indexing – To support range based queries ideally one may need tree type indexes for keys on which range queries maybe made
- Provides BigTable – Googles implementation of a scalable database
- BigTable is not an RDBMS, but has a fairly flexible API that supports creative data fetching methods
- BigTable is distributed and self-balancing – scaling is no longer the application developers problem
- You will need to host your application on Google’s App Engine
- Your application needs to use the BigTable API for data storage
- MySQL NDB Cluster is a master-master, self-partitioned, replicated storage engine that technically seems to provides all the features listed above
- It also offers access via SQL or a high-performant native NDB API
- It seems like the holy grail of database scaling
- It however stores indexes entirely in memory – and has lesser flexibility w.r.t persistence
- I could not find much material on the performance of an NDB Cluster
- HyperTable is an opensource BigTable clone and provides essentially the same features as described in the BigTable whitepaper
- It is supported by Baidu, Zvents and Rediff
- Voldemort is a big, distributed, persistent, fault-tolerant hash
- Developed and used by Linkedin
- Java based API
- Reasonably performant (10-20k ops on commodity hardware)
- Maintains replicated copies of data over multiple nodes and automatically handles server failures
- Does not support range based querying
- Tokyo Tyrant is a layer on top of Tokyo Cabinet – a highly performant, persistent data strore (site claims over 2 million qps)
- Tokyo Tyrant itself claims to deliver upwards of 58,000 qps
- It supports multiple language bindings (Java, Perl, PHP etc)
- Supports various data structures – hash, tree, B+tree, array, table etc
- Supports caching
- Tokyo Tyrant does not support active-active master-master replication, thus failing out on redundancy. It also does not support data partitioning out of the box
- Postgres is an extremely mature RDBMS
- Using PL/Proxy one can abstracte horizontal partitioning concerns out of the database layer and into an abstracted underlying layer
- Using Slony or Continuent one can ensure that multiple copies of any set of rows exist at any given point in time (Synchronous or Async replication)
- PGBouncer provides a light-weight connection pooler for PL/Proxy
- Together this will satisfy all our requirements above
- You may club Voldemort or memcached or redis with this to provide a caching layer
- Postgres also has hooks for connecting to memcached that make cache population and invalidation easier
- The BigTable paper – http://labs.google.com/papers/bigtable.html
- A powerpoint that touches upon bigtable, gfs etc - http://cbcg.net/talks/googleinternals/
- Condor – a specialized workload management and job queue system – http://www.cs.wisc.edu/condor/description.html
- Notes on Jeff Dean’s talk at Univ of Washington on BigTable – http://andrewhitchcock.org/?post=214
- Video on bigtable – http://video.google.com/videoplay?docid=7278544055668715642&q=bigtable
- A blog post walkthrough of the bigtable paper – http://hnr.dnsalias.net/wordpress/2008/10/bigtable-googles-distributed-data-store/
- A description of Paxos – http://en.wikipedia.org/wiki/Paxos_algorithm
- Project Boxwood – Microsoft Research project for a scalable data layer – http://research.microsoft.com/en-us/projects/boxwood/default.aspx
- http://code.google.com/appengine/
- Campfire video introducing app engine – http://www.youtube.com/watch?v=3Ztr-HhWX1c
- Getting started guide – http://code.google.com/appengine/docs/python/gettingstarted/
- Using the datastore – http://code.google.com/appengine/docs/python/gettingstarted/usingdatastore.html
- http://neo4j.org/
- http://wiki.neo4j.org/content/Main_Page
- http://wiki.neo4j.org/content/Neo_Performance_Guide
- http://wiki.neo4j.org/content/FAQ
- http://dist.neo4j.org/neo-technology-introduction.pdf
- http://wiki.neo4j.org/content/Getting_Started_Guide
- http://wiki.neo4j.org/content/Getting_Started_In_One_Minute_Guide
- http://www.slony.info/
- http://www.slony.info/documentation/
- https://developer.skype.com/SkypeGarage/DbProjects/PlProxy
- https://developer.skype.com/SkypeGarage/DbProjects/PgBouncer
- http://www.continuent.com/solutions/overview
- http://www.continuent.com/images/stories/pdfs/tungsten%20overview%20white%20paper.pdf
16 Aug, 2009
Infinitely Scalable Infrastructure and RDBMSes
Posted by Bhavin Turakhia | (1) Comments
Since the last several months I have been spending part of my time on conceptualizing an abstracted infrastructure layer that is highly scalable and can be leveraged by any application without having to worry too much about it. I have researched and continue to research conventional and unconventional techniques – partitioning, clustering, replication, shared-nothing architectures, grid computing and so on. This article represents a sliver of my thoughts concerning scalability and RDBMSes -
The holy grail of scalability is to be able to scale your data store. And as data stores go, RDBMSes seem to be the predominant choice (though that is changing – refer http://bit.ly/2lnRet). RDBMSes by their very nature, due to the features they provide (ACID compliance, Transaction safety etc) tend to be difficult to easily scale. This has resulted in the recent mushrooming of data storage options that are feature-poor but scalable out of the box (eg Voldemort, HyperTable etc)
I wanted to chronicle the list of features that a standard RDBMS provides, that we take for granted, so that I have a reference of the features that one may have to compromise on w.r.t application development in favor of easier scalability -
- Range based selects and updates – Being able to fire queries on a table specifying a range of values (eg where age >35). Typically RDBMSes use B+ Tree based indexes which support range based row selection. This in turn allows one to fire range based queries.
- Transactions – In an RDBMS one can perform multiple operations within a transaction and ensure that all of them or none of them go through. This ensures data integrity
- B+ tree indexes
- Foreign key relationships and referential integrity
- Joins and nested selects
- Aggregations (sum, avg etc)
- Advanced scripting using non-native languages (java etc)
- Stored procedures (allow encapsulation of business logic in the database layer)
- Triggers
22 Jul, 2009
Column Oriented DBMS
Posted by Bhavin Turakhia | (1) Comments
Conventionally we take DBMS for granted as a structured data store that stores data in the form of rows. Infact most application developers can begin visualizing their data as rows in an RDBMS quite naturally.
While RDBMS serve the purposes of OLTP applications well, OLAP / data anlytics type applications have conventionally not been able to obtain the type of performance needed from RDBMses. This is where column oriented DBMS can help.
In the simplest form the difference between a conventional RDBMS and a column oriented database is that the latter stores data in a column form rather than a row form when persisted to disk. Another way to look at this is that the storage in a column oriented DBMS transposes the rows and columns of the storage in a conventional RDBMS.
For eg
| ID | Name | Age |
| 1 | Bhavin | 29 |
| 2 | Roger | 30 |
This would be persisted in a conventional RDBMS as follows -
1,Bhavin,29|2,Roger,30
In a column oriented DBMS this would be persisted as -
1,2|Bhavin,Roger|29,30
It is common knowledge that the slowest piece of a DB query is its disk seek time. While the RDBMS favors queries which require fetching all data of a given row, the latter model favors queries which require aggregates. For instance – count of all users with age >20, or sum of ages of all users, and so on. These type of queries will run much faster on a column oriented DBMS due to lesser seek time required to obtain the data.
OLAP and BI applications mostly consist of data aggregation and would therefore run faster on column oriented databases.
For a list of column-oriented DBMSes refer to http://en.wikipedia.org/wiki/Column-oriented_DBMS
16 Jul, 2008
MySQL vs Postgres
Posted by Bhavin Turakhia | (10) Comments
In my perpetual comparison between MySQL and Postgres I am beginning to lean towards MySQL offlate. There are many reasons, but a short list that is currently relevant to us is here -
- MySQL supports multiple backend storage engines providing more flexibility of choice. For instance one can choose MyISAM for tables where transactions and ACID compliance does not matter, and gain a performance advantage. Or one can use a Memory storage engine for temporary in-memory tables
- InnoDB supports optional MVCC, thus providing best of both worlds
- MySQL supports native replication and shared nothing clusters
- MySQL has better integration with memcached
- MySQL uses multi-threading as opposed to process-forking, making it less heavy
- More people are using MySQL than Postgres – eg Facebook, Youtube etc
- MySQL is now owned by Sun, and despite their recent lay-offs they are a company I respect
There are many other reasons, but currently these are the ones that are relevant to the products we are working on.









