<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Bhavin's Blog &#187; rdbms</title>
	<atom:link href="http://bhavin.directi.com/tag/rdbms/feed/" rel="self" type="application/rss+xml" />
	<link>http://bhavin.directi.com</link>
	<description></description>
	<lastBuildDate>Fri, 05 Aug 2011 12:27:37 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>A Compendium of solutions for scaling a Data Store</title>
		<link>http://bhavin.directi.com/a-compendium-on-scaling-your-da/</link>
		<comments>http://bhavin.directi.com/a-compendium-on-scaling-your-da/#comments</comments>
		<pubDate>Mon, 17 Aug 2009 22:12:15 +0000</pubDate>
		<dc:creator>Bhavin Turakhia</dc:creator>
				<category><![CDATA[0-cosmos]]></category>
		<category><![CDATA[TechTalk]]></category>
		<category><![CDATA[bigtable]]></category>
		<category><![CDATA[cassandra]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[hypertable]]></category>
		<category><![CDATA[memcached]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[ndb]]></category>
		<category><![CDATA[postgres]]></category>
		<category><![CDATA[rdbms]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[simpledb]]></category>
		<category><![CDATA[voldemort]]></category>

		<guid isPermaLink="false">http://bhavin.directi.com/?p=198</guid>
		<description><![CDATA[Achieving infinite scalability on your data store is the holy grail of scaling an application. App servers are typically stateless and therefore a cinch to scale. This document serves as a comprehensive compendium on my thoughts and research on scaling a data store.
Requirements

Infinite Scalability
High Availability (0% downtime)
Data Redundancy
High Performance
Storage Flexibility &#8211; ability to store any [...]]]></description>
			<content:encoded><![CDATA[<p><span style="font-size: small;">Achieving infinite scalability on your data store is the holy grail of scaling an application. App servers are typically stateless and therefore a cinch to scale. This document serves as a comprehensive compendium on my thoughts and research on scaling a data store.</span></p>
<p><strong><span style="font-size: small;">Requirements</span></strong></p>
<ul>
<li><span style="font-size: small;">Infinite Scalability</span></li>
<li><span style="font-size: small;">High Availability (0% downtime)</span></li>
<li><span style="font-size: small;">Data Redundancy</span></li>
<li><span style="font-size: small;">High Performance</span></li>
<li><span style="font-size: small;">Storage Flexibility &#8211; ability to store any type of data</span></li>
<li><span style="font-size: small;">Query Flexibility &#8211; ability to perform simple gets, range based gets, range based updates, and possibly complex joins</span></li>
</ul>
<p><strong><span style="font-size: small;">Features that a solution must have to deliver the above Requirements</span></strong></p>
<ul>
<li><span style="font-size: small;">Replication &#8211; Each unit of data should be copied to multiple nodes so that if an underlying node crashes there is no data loss</span></li>
<li><span style="font-size: small;">Partitioning &#8211; Data should be divided across multiple nodes based on specific keys so that the data layer is infinitely scalable</span></li>
<li><span style="font-size: small;">Online node addition &#8211; Solution should support adding new nodes online, with automated data distribution upon new node addition</span></li>
<li><span style="font-size: small;">Load Balancing &#8211; Queries should be load balanced between nodes</span></li>
<li><span style="font-size: small;">Persistence &#8211; Should have a data persistence layer so that data is not volatile</span></li>
<li><span style="font-size: small;">Caching &#8211; Should support flexible in-memory caching for increased retrieval speeds</span></li>
<li><span style="font-size: small;">Tree based Indexing &#8211; To support range based queries, one ideally needs tree type indexes on the keys on which range queries may be made</span></li>
</ul>
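<p><span style="font-size: small;">To make the Replication and Partitioning features above concrete, here is a minimal sketch in Python. It is not taken from any of the products discussed below; the names <code>nodes_for_key</code> and <code>TinyCluster</code> are hypothetical, and plain dictionaries stand in for real storage nodes. Each key is hashed to a primary node and copied to the next node(s) in the list, so a read can still succeed when one node is down.</span></p>

```python
import hashlib


def nodes_for_key(key, nodes, replicas=2):
    """Pick the nodes that hold `key`: a primary plus the following replicas."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]


class TinyCluster:
    """Hypothetical in-memory cluster; each node is just a dict."""

    def __init__(self, node_names, replicas=2):
        self.node_names = list(node_names)
        self.replicas = replicas
        self.storage = {name: {} for name in self.node_names}

    def put(self, key, value):
        # Replication: write the value to every node responsible for the key.
        for name in nodes_for_key(key, self.node_names, self.replicas):
            self.storage[name][key] = value

    def get(self, key, down=()):
        # Try the primary first, then fall back to replicas if it is down.
        for name in nodes_for_key(key, self.node_names, self.replicas):
            if name not in down:
                return self.storage[name].get(key)
        raise RuntimeError("all replicas for this key are down")


cluster = TinyCluster(["node-a", "node-b", "node-c"], replicas=2)
cluster.put("user:42", "Bhavin")
primary = nodes_for_key("user:42", cluster.node_names)[0]
print(cluster.get("user:42", down=(primary,)))  # still served by the replica
```

<p><span style="font-size: small;">Note that simple modulo placement like this would move most keys whenever a node is added, which is why online node addition usually calls for something like consistent hashing instead.</span></p>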
<div><strong><span style="font-size: small;">Proposed Solutions</span></strong></div>
<div><span style="font-size: small;">The solutions below are the result of researching a ton of options (refer to the Research section below). They also represent solutions that are practical to deploy and are being used in production. There are exotic variants that I came up with during my research but I have left them out.</span></div>
<div><span style="font-size: small;"><br />
</span></div>
<div><span style="text-decoration: underline;"><span style="font-size: small;">Solution 1: Google App Engine</span></span></div>
<div>
<ul>
<li><span style="font-size: small;">Provides BigTable &#8211; Google&#8217;s implementation of a scalable database</span></li>
<li><span style="font-size: small;">BigTable is not an RDBMS, but has a fairly flexible API that supports creative data fetching methods</span></li>
<li><span style="font-size: small;">BigTable is distributed and self-balancing &#8211; scaling is no longer the application developer&#8217;s problem</span></li>
<li><span style="font-size: small;">You will need to host your application on Google&#8217;s App Engine</span></li>
<li><span style="font-size: small;">Your application needs to use the BigTable API for data storage</span></li>
</ul>
<div><span style="text-decoration: underline;"><span style="font-size: small;">Solution 2: MySQL NDB Cluster</span></span></div>
<div>
<ul>
<li><span style="font-size: small;">MySQL NDB Cluster is a master-master, self-partitioned, replicated storage engine that technically seems to provide all the features listed above</span></li>
<li><span style="font-size: small;">It also offers access via SQL or a high-performance native NDB API</span></li>
<li><span style="font-size: small;">It seems like the holy grail of database scaling</span></li>
<li><span style="font-size: small;">However, it stores indexes entirely in memory &#8211; and has less flexibility with respect to persistence</span></li>
<li><span style="font-size: small;">I could not find much material on the performance of an NDB Cluster</span></li>
</ul>
</div>
<div><span style="text-decoration: underline;"><span style="font-size: small;">Solution 3: HyperTable</span></span></div>
<div>
<ul>
<li><span style="font-size: small;">HyperTable is an open-source BigTable clone and provides essentially the same features as described in the BigTable whitepaper</span></li>
<li><span style="font-size: small;">It is supported by Baidu, Zvents and Rediff</span></li>
</ul>
<div><span style="text-decoration: underline;"><span style="font-size: small;">Solution 4: Project Voldemort</span></span></div>
<div>
<ul>
<li><span style="font-size: small;">Voldemort is a big, distributed, persistent, fault-tolerant hash</span></li>
<li><span style="font-size: small;">Developed and used by LinkedIn</span></li>
<li><span style="font-size: small;">Java based API</span></li>
<li><span style="font-size: small;">Reasonably performant (10-20k operations per second on commodity hardware)</span></li>
<li><span style="font-size: small;">Maintains replicated copies of data over multiple nodes and automatically handles server failures</span></li>
<li><span style="font-size: small;">Does not support range based querying</span></li>
</ul>
<div><span style="text-decoration: underline;"><span style="font-size: small;">Solution 5: Tokyo Tyrant</span></span></div>
<div>
<ul>
<li><span style="font-size: small;">Tokyo Tyrant is a layer on top of Tokyo Cabinet &#8211; a highly performant, persistent data store (site claims over 2 million qps)</span></li>
<li><span style="font-size: small;">Tokyo Tyrant itself claims to deliver upwards of 58,000 qps</span></li>
<li><span style="font-size: small;">It supports multiple language bindings (Java, Perl, PHP etc)</span></li>
<li><span style="font-size: small;">Supports various data structures &#8211; hash, tree, B+tree, array, table etc</span></li>
<li><span style="font-size: small;">Supports caching</span></li>
<li><span style="font-size: small;">Tokyo Tyrant does not support active-active master-master replication, thus falling short on redundancy. It also does not support data partitioning out of the box</span></li>
</ul>
</div>
</div>
</div>
<div><span style="text-decoration: underline;"><span style="font-size: small;">Solution 6: Postgres + PL/Proxy + Replication (Slony / Continuent) + PgBouncer (connection pooler)</span></span></div>
<div>
<ul>
<li><span style="font-size: small;">Postgres is an extremely mature RDBMS</span></li>
<li><span style="font-size: small;">Using PL/Proxy one can abstract horizontal partitioning concerns out of the database layer and into an underlying layer</span></li>
<li><span style="font-size: small;">Using Slony or Continuent one can ensure that multiple copies of any set of rows exist at any given point in time (Synchronous or Async replication)</span></li>
<li><span style="font-size: small;">PgBouncer provides a light-weight connection pooler for PL/Proxy</span></li>
<li><span style="font-size: small;">Together this will satisfy all our requirements above</span></li>
<li><span style="font-size: small;">You may club Voldemort or memcached or redis with this to provide a caching layer</span></li>
<li><span style="font-size: small;">Postgres also has hooks for connecting to memcached that make cache population and invalidation easier</span></li>
</ul>
</div>
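<p><span style="font-size: small;">The caching layer mentioned above can be sketched roughly as follows. This is only an illustration of the read-through / invalidate-on-write pattern, not the actual memcached hooks for Postgres; the class <code>CachedStore</code> is hypothetical, and two plain dictionaries stand in for the database and for memcached or redis.</span></p>

```python
class CachedStore:
    """Hypothetical read-through cache in front of a slower persistent store."""

    def __init__(self, database):
        self.database = database  # stand-in for Postgres
        self.cache = {}           # stand-in for memcached or redis
        self.db_reads = 0         # counts how often the database is hit

    def get(self, key):
        # Serve from the cache when possible.
        if key in self.cache:
            return self.cache[key]
        # Cache miss: read from the database and populate the cache.
        self.db_reads += 1
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key, value):
        # Write to the database, then invalidate the stale cache entry.
        self.database[key] = value
        self.cache.pop(key, None)


store = CachedStore({"user:1": "Bhavin"})
store.get("user:1")    # first read goes to the database
store.get("user:1")    # second read is served from the cache
print(store.db_reads)  # 1
```

<p><span style="font-size: small;">Invalidating on write rather than updating the cache in place keeps the sketch simple; the next read repopulates the entry from the database.</span></p>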
<div><strong><span style="font-size: small;">Research</span></strong></div>
<div><span style="font-size: small;">A lot of reading went into discovering the above solutions. The links below are a good start -</span></div>
<div>
<div><span style="text-decoration: underline;"><span style="font-size: small;">BigTable</span></span></div>
<div>
<ul>
<li><span style="font-size: small;">The BigTable paper &#8211; <a href="http://labs.google.com/papers/bigtable.html">http://labs.google.com/papers/bigtable.html</a></span></li>
<li><span style="font-size: small;">A powerpoint that touches upon bigtable, gfs etc &#8211; <a href="http://cbcg.net/talks/googleinternals/">http://cbcg.net/talks/googleinternals/</a></span></li>
<li><span style="font-size: small;">Condor &#8211; a specialized workload management and job queue system &#8211; <a href="http://www.cs.wisc.edu/condor/description.html">http://www.cs.wisc.edu/condor/description.html</a></span></li>
<li><span style="font-size: small;">Notes on Jeff Dean&#8217;s talk at Univ of Washington on BigTable &#8211; <a href="http://andrewhitchcock.org/?post=214">http://andrewhitchcock.org/?post=214</a></span></li>
<li><span style="font-size: small;">Video on bigtable &#8211; <a href="http://video.google.com/videoplay?docid=7278544055668715642&amp;q=bigtable">http://video.google.com/videoplay?docid=7278544055668715642&amp;q=bigtable</a></span></li>
<li><span style="font-size: small;">A blog post walkthrough of the bigtable paper &#8211; <a href="http://hnr.dnsalias.net/wordpress/2008/10/bigtable-googles-distributed-data-store/">http://hnr.dnsalias.net/wordpress/2008/10/bigtable-googles-distributed-data-store/</a></span></li>
<li><span style="font-size: small;">A description of Paxos &#8211; <a href="http://en.wikipedia.org/wiki/Paxos_algorithm">http://en.wikipedia.org/wiki/Paxos_algorithm</a></span></li>
<li><span style="font-size: small;">Project Boxwood &#8211; Microsoft Research project for a scalable data layer &#8211; <a href="http://research.microsoft.com/en-us/projects/boxwood/default.aspx">http://research.microsoft.com/en-us/projects/boxwood/default.aspx</a></span></li>
</ul>
</div>
<div><span style="text-decoration: underline;"><span style="font-size: small;">Google App Engine</span></span></div>
<div>
<ul>
<li><a href="http://code.google.com/appengine/"><span style="font-size: small;">http://code.google.com/appengine/</span></a></li>
<li><span style="font-size: small;">Campfire video introducing app engine &#8211; <a href="http://www.youtube.com/watch?v=3Ztr-HhWX1c">http://www.youtube.com/watch?v=3Ztr-HhWX1c</a></span></li>
<li><span style="font-size: small;">Getting started guide &#8211; <a href="http://code.google.com/appengine/docs/python/gettingstarted/">http://code.google.com/appengine/docs/python/gettingstarted/</a></span></li>
<li><span style="font-size: small;">Using the datastore &#8211; <a href="http://code.google.com/appengine/docs/python/gettingstarted/usingdatastore.html">http://code.google.com/appengine/docs/python/gettingstarted/usingdatastore.html</a></span></li>
</ul>
</div>
<div><span style="text-decoration: underline;"><span style="font-size: small;">MySQL Cluster</span></span></div>
<div>
<ul>
<li><span style="font-size: small;"><a href="http://mysql.com/products/database/cluster/">http://mysql.com/products/database/cluster/</a> and all whitepapers and webinars linked therefrom (4 of them)</span></li>
<li><a href="http://en.wikipedia.org/wiki/MySQL_Cluster"><span style="font-size: small;">http://en.wikipedia.org/wiki/MySQL_Cluster</span></a></li>
<li><span style="font-size: small;">Complete Documentation &#8211; <a href="http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster.html">http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster.html</a></span></li>
</ul>
</div>
<div><span style="text-decoration: underline;"><span style="font-size: small;">Amazon SimpleDB</span></span></div>
<div>
<ul>
<li><a href="http://aws.amazon.com/simpledb/"><span style="font-size: small;">http://aws.amazon.com/simpledb/</span></a></li>
<li><a href="http://www.sriramkrishnan.com/blog/2007/12/amazon-simpledb-technical-overview.html"><span style="font-size: small;">http://www.sriramkrishnan.com/blog/2007/12/amazon-simpledb-technical-overview.html</span></a></li>
</ul>
</div>
<div><span style="text-decoration: underline;"><span style="font-size: small;">Cassandra</span></span></div>
<div>
<ul>
<li><a href="http://incubator.apache.org/cassandra/"><span style="font-size: small;">http://incubator.apache.org/cassandra/</span></a></li>
<li><a href="http://wiki.apache.org/cassandra/DataModel"><span style="font-size: small;">http://wiki.apache.org/cassandra/DataModel</span></a></li>
</ul>
</div>
<div><span style="text-decoration: underline;"><span style="font-size: small;">Hypertable</span></span></div>
<div>
<ul>
<li><a href="http://hypertable.org/"><span style="font-size: small;">http://hypertable.org/</span></a></li>
<li><a href="http://hypertable.org/documentation.html"><span style="font-size: small;">http://hypertable.org/documentation.html</span></a></li>
</ul>
</div>
<div><span style="text-decoration: underline;"><span style="font-size: small;">MongoDB</span></span></div>
<div>
<ul>
<li><a href="http://www.mongodb.org/"><span style="font-size: small;">http://www.mongodb.org/</span></a></li>
</ul>
</div>
<div><span style="text-decoration: underline;"><span style="font-size: small;">neo4j</span></span></div>
<div>
<ul>
<li><a href="http://neo4j.org/"><span style="font-size: small;">http://neo4j.org/</span></a></li>
<li><a href="http://wiki.neo4j.org/content/Main_Page"><span style="font-size: small;">http://wiki.neo4j.org/content/Main_Page</span></a></li>
<li><a href="http://wiki.neo4j.org/content/Neo_Performance_Guide"><span style="font-size: small;">http://wiki.neo4j.org/content/Neo_Performance_Guide</span></a></li>
<li><a href="http://wiki.neo4j.org/content/FAQ"><span style="font-size: small;">http://wiki.neo4j.org/content/FAQ</span></a></li>
<li><a href="http://dist.neo4j.org/neo-technology-introduction.pdf"><span style="font-size: small;">http://dist.neo4j.org/neo-technology-introduction.pdf</span></a></li>
<li><a href="http://wiki.neo4j.org/content/Getting_Started_Guide"><span style="font-size: small;">http://wiki.neo4j.org/content/Getting_Started_Guide</span></a></li>
<li><a href="http://wiki.neo4j.org/content/Getting_Started_In_One_Minute_Guide"><span style="font-size: small;">http://wiki.neo4j.org/content/Getting_Started_In_One_Minute_Guide</span></a></li>
</ul>
</div>
<div><span style="text-decoration: underline;"><span style="font-size: small;">Project Voldemort</span></span></div>
<div>
<ul>
<li><a href="http://project-voldemort.com/"><span style="font-size: small;">http://project-voldemort.com/</span></a></li>
</ul>
</div>
<div><span style="text-decoration: underline;"><span style="font-size: small;">Redis</span></span></div>
<div>
<ul>
<li><a href="http://code.google.com/p/redis/"><span style="font-size: small;">http://code.google.com/p/redis/</span></a></li>
<li><a href="http://code.google.com/p/redis/wiki/CommandReference"><span style="font-size: small;">http://code.google.com/p/redis/wiki/CommandReference</span></a></li>
</ul>
</div>
<div><span style="text-decoration: underline;"><span style="font-size: small;">Tokyo Cabinet</span></span></div>
<div>
<ul>
<li><a href="http://tokyocabinet.sourceforge.net/"><span style="font-size: small;">http://tokyocabinet.sourceforge.net/</span></a></li>
</ul>
</div>
<div><span style="text-decoration: underline;"><span style="font-size: small;">Lightcloud</span></span></div>
<div>
<ul>
<li><a href="http://opensource.plurk.com/LightCloud/"><span style="font-size: small;">http://opensource.plurk.com/LightCloud/</span></a></li>
<li><a href="http://highscalability.com/are-cloud-based-memory-architectures-next-big-thing"><span style="font-size: small;">http://highscalability.com/are-cloud-based-memory-architectures-next-big-thing</span></a></li>
<li><a href="http://opensource.plurk.com/LightCloud/Design_spec/"><span style="font-size: small;">http://opensource.plurk.com/LightCloud/Design_spec/</span></a></li>
</ul>
</div>
<div><span style="text-decoration: underline;"><span style="font-size: small;">Gigaspaces</span></span></div>
<div>
<ul>
<li><a href="http://www.gigaspaces.com/xap"><span style="font-size: small;">http://www.gigaspaces.com/xap</span></a></li>
<li><a href="http://www.gigaspaces.com/files/InsideXAP.pdf"><span style="font-size: small;">http://www.gigaspaces.com/files/InsideXAP.pdf</span></a></li>
</ul>
</div>
<div><span style="text-decoration: underline;"><span style="font-size: small;">Coherence</span></span></div>
<div>
<ul>
<li><a href="http://www.oracle.com/technology/products/coherence/index.html"><span style="font-size: small;">http://www.oracle.com/technology/products/coherence/index.html</span></a></li>
<li><a href="http://www.oracle.com/products/middleware/coherence/docs/oracle-coherence-data-grid-datasheet.pdf"><span style="font-size: small;">http://www.oracle.com/products/middleware/coherence/docs/oracle-coherence-data-grid-datasheet.pdf</span></a></li>
<li><a href="http://www.oracle.com/technology/products/coherence/coherencedatagrid/coherence_solutions.html"><span style="font-size: small;">http://www.oracle.com/technology/products/coherence/coherencedatagrid/coherence_solutions.html</span></a></li>
</ul>
</div>
<div><span style="text-decoration: underline;"><span style="font-size: small;">Postgres</span></span></div>
<div>
<ul>
<li><a href="http://www.slony.info/"><span style="font-size: small;">http://www.slony.info/</span></a></li>
<li><a href="http://www.slony.info/documentation/"><span style="font-size: small;">http://www.slony.info/documentation/</span></a></li>
<li><a href="https://developer.skype.com/SkypeGarage/DbProjects/PlProxy"><span style="font-size: small;">https://developer.skype.com/SkypeGarage/DbProjects/PlProxy</span></a></li>
<li><a href="https://developer.skype.com/SkypeGarage/DbProjects/PgBouncer"><span style="font-size: small;">https://developer.skype.com/SkypeGarage/DbProjects/PgBouncer</span></a></li>
<li><a href="http://www.continuent.com/solutions/overview"><span style="font-size: small;">http://www.continuent.com/solutions/overview</span></a></li>
<li><a href="http://www.continuent.com/images/stories/pdfs/tungsten%20overview%20white%20paper.pdf"><span style="font-size: small;">http://www.continuent.com/images/stories/pdfs/tungsten%20overview%20white%20paper.pdf</span></a></li>
</ul>
</div>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://bhavin.directi.com/a-compendium-on-scaling-your-da/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Infinitely Scalable Infrastructure and RDBMSes</title>
		<link>http://bhavin.directi.com/infinitely-scalable-infrastructure-and-rdbmses/</link>
		<comments>http://bhavin.directi.com/infinitely-scalable-infrastructure-and-rdbmses/#comments</comments>
		<pubDate>Mon, 17 Aug 2009 01:04:31 +0000</pubDate>
		<dc:creator>Bhavin Turakhia</dc:creator>
				<category><![CDATA[0-cosmos]]></category>
		<category><![CDATA[TechTalk]]></category>
		<category><![CDATA[rdbms]]></category>
		<category><![CDATA[scalability]]></category>

		<guid isPermaLink="false">http://bhavin.directi.com/?p=194</guid>
		<description><![CDATA[For the last several months I have been spending part of my time conceptualizing an abstracted infrastructure layer that is highly scalable and can be leveraged by any application without its developers having to worry too much about it. I have researched and continue to research conventional and unconventional techniques &#8211; partitioning, clustering, replication, shared-nothing architectures, [...]]]></description>
			<content:encoded><![CDATA[<p><span style="font-family: Verdana; ">For the last several months I have been spending part of my time conceptualizing an abstracted infrastructure layer that is highly scalable and can be leveraged by any application without its developers having to worry too much about it. I have researched and continue to research conventional and unconventional techniques &#8211; partitioning, clustering, replication, shared-nothing architectures, grid computing and so on. This article represents a sliver of my thoughts concerning scalability and RDBMSes -</span></p>
<p><span style="font-family: Verdana; ">The holy grail of scalability is to be able to scale your data store. And as data stores go, RDBMSes seem to be the predominant choice (though that is changing &#8211; refer </span><a href="http://bit.ly/2lnRet"><span style="font-family: Verdana; ">http://bit.ly/2lnRet</span></a><span style="font-family: Verdana; ">). RDBMSes by their very nature, due to the features they provide (ACID compliance, transaction safety etc.), tend to be difficult to scale. This has resulted in the recent mushrooming of data storage options that are feature-poor but scalable out of the box (e.g. Voldemort, HyperTable etc.)</span></p>
<p><span style="font-family: Verdana; ">I wanted to chronicle the list of features that a standard RDBMS provides, that we take for granted, so that I have a reference of the features that one may have to compromise on w.r.t application development in favor of easier scalability -</span></p>
<ul>
<li><span style="font-family: Verdana; "><em>Range based selects and updates</em><strong><span style="font-weight: normal; "> &#8211; Being able to fire queries on a table specifying a range of values (eg where age &gt;35). Typically RDBMSes use B+ Tree based indexes which support range based row selection. This in turn allows one to fire range based queries.</span></strong></span></li>
<li><span style="font-family: Verdana; "><em>Transactions</em> &#8211; In an RDBMS one can perform multiple operations within a transaction and ensure that all of them or none of them go through. This ensures data integrity</span></li>
<li><span style="font-family: Verdana; ">B+ tree indexes</span></li>
<li><span style="font-family: Verdana; ">Foreign key relationships and referential integrity</span></li>
<li><span style="font-family: Verdana; ">Joins and nested selects</span></li>
<li><span style="font-family: Verdana; ">Aggregations (sum, avg etc)</span></li>
<li><span style="font-family: Verdana; ">Advanced scripting using non-native languages (java etc)</span></li>
<li><span style="font-family: Verdana; ">Stored procedures (allow encapsulation of business logic in the database layer)</span></li>
<li><span style="font-family: Verdana; ">Triggers</span></li>
</ul>
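<p><span style="font-family: Verdana; ">To illustrate why ordered indexes matter for the range based selects listed above, here is a small Python sketch. It is an assumption-laden stand-in, not how any particular RDBMS implements its indexes: a sorted list searched with the standard <code>bisect</code> module plays the role of a B+ tree's ordered leaf entries, and the class name <code>SortedIndex</code> is hypothetical. Row ids are assumed to be integers. A plain hash, as offered by many key-value stores, would have to scan every key to answer the same query.</span></p>

```python
import bisect


class SortedIndex:
    """Keeps (value, row_id) pairs in sorted order, like B+ tree leaf entries."""

    def __init__(self):
        self.entries = []

    def insert(self, value, row_id):
        # insort keeps the list sorted on every insert.
        bisect.insort(self.entries, (value, row_id))

    def range_query(self, low, high):
        """Return row ids whose value lies between low and high, inclusive."""
        # (low,) sorts before every (low, row_id) pair, so this finds the
        # first entry whose value is at least `low`.
        start = bisect.bisect_left(self.entries, (low,))
        # (high, inf) sorts after every (high, row_id) pair with an int id.
        end = bisect.bisect_right(self.entries, (high, float("inf")))
        return [row_id for _, row_id in self.entries[start:end]]


age_index = SortedIndex()
for row_id, age in [(1, 29), (2, 30), (3, 41), (4, 36)]:
    age_index.insert(age, row_id)

# Equivalent of "where age > 35" for integer ages up to 200.
print(age_index.range_query(36, 200))  # [4, 3]
```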
]]></content:encoded>
			<wfw:commentRss>http://bhavin.directi.com/infinitely-scalable-infrastructure-and-rdbmses/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Column Oriented DBMS</title>
		<link>http://bhavin.directi.com/column-oriented-dbms/</link>
		<comments>http://bhavin.directi.com/column-oriented-dbms/#comments</comments>
		<pubDate>Wed, 22 Jul 2009 08:41:19 +0000</pubDate>
		<dc:creator>Bhavin Turakhia</dc:creator>
				<category><![CDATA[0-cosmos]]></category>
		<category><![CDATA[TechTalk]]></category>
		<category><![CDATA[dbms]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[rdbms]]></category>
		<category><![CDATA[scalability]]></category>

		<guid isPermaLink="false">http://bhavin.directi.com/?p=174</guid>
		<description><![CDATA[Conventionally we take DBMS for granted as a structured data store that stores data in the form of rows. In fact most application developers can begin visualizing their data as rows in an RDBMS quite naturally.
While RDBMS serve the purposes of OLTP applications well, OLAP / data analytics type applications have conventionally not been able to [...]]]></description>
			<content:encoded><![CDATA[<p>Conventionally we take DBMS for granted as a structured data store that stores data in the form of rows. In fact most application developers can begin visualizing their data as rows in an RDBMS quite naturally.</p>
<p>While RDBMSes serve the purposes of OLTP applications well, OLAP / data analytics type applications have conventionally not been able to obtain the type of performance needed from RDBMSes. This is where <a href="http://en.wikipedia.org/wiki/Column-oriented_DBMS">column oriented DBMS</a> can help.</p>
<p>In the simplest form the difference between a conventional RDBMS and a column oriented database is that the latter stores data in a column form rather than a row form when persisted to disk. Another way to look at this is that the storage in a column oriented DBMS transposes the rows and columns of the storage in a conventional RDBMS.</p>
<p>For example, consider this table -</p>
<table border="0">
<tbody>
<tr>
<td><strong>ID</strong></td>
<td><strong>Name</strong></td>
<td><strong>Age</strong></td>
</tr>
<tr>
<td>1</td>
<td>Bhavin</td>
<td>29</td>
</tr>
<tr>
<td>2</td>
<td>Roger</td>
<td>30</td>
</tr>
</tbody>
</table>
<p>This would be persisted in a conventional RDBMS as follows -</p>
<p>1,Bhavin,29|2,Roger,30</p>
<p>In a column oriented DBMS this would be persisted as -</p>
<p>1,2|Bhavin,Roger|29,30</p>
<p>The slowest part of a DB query is typically disk access, in particular seek time. While the row-oriented RDBMS favors queries which require fetching all data of a given row, the column-oriented model favors queries which require aggregates. For instance &#8211; count of all users with age &gt;20, or sum of ages of all users, and so on. These types of queries will run much faster on a column oriented DBMS because less seek time is required to obtain the data.</p>
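<p>The two layouts can be sketched in a few lines of Python. This is only a toy model of the idea, using the same comma and pipe notation as above; the function names are made up for illustration and real engines use binary on-disk formats.</p>

```python
rows = [(1, "Bhavin", 29), (2, "Roger", 30)]


def to_row_layout(rows):
    """Serialize record by record, as a conventional RDBMS would."""
    return "|".join(",".join(str(v) for v in row) for row in rows)


def to_column_layout(rows):
    """Serialize column by column; zip(*rows) transposes rows and columns."""
    return "|".join(",".join(str(v) for v in col) for col in zip(*rows))


def sum_ages_row_layout(blob):
    # Every record has to be split in full just to reach its age field.
    return sum(int(record.split(",")[2]) for record in blob.split("|"))


def sum_ages_column_layout(blob):
    # Only the values of the third segment are converted; a real engine
    # would seek straight to that column and skip ids and names entirely.
    return sum(int(v) for v in blob.split("|")[2].split(","))


print(to_row_layout(rows))     # 1,Bhavin,29|2,Roger,30
print(to_column_layout(rows))  # 1,2|Bhavin,Roger|29,30
print(sum_ages_column_layout(to_column_layout(rows)))  # 59
```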
<p>OLAP and BI applications mostly consist of data aggregation and would therefore run faster on column oriented databases.</p>
<p>For a list of column-oriented DBMSes refer to <a href="http://en.wikipedia.org/wiki/Column-oriented_DBMS">http://en.wikipedia.org/wiki/Column-oriented_DBMS</a></p>
]]></content:encoded>
			<wfw:commentRss>http://bhavin.directi.com/column-oriented-dbms/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>MySQL vs Postgres</title>
		<link>http://bhavin.directi.com/mysql-vs-postgres/</link>
		<comments>http://bhavin.directi.com/mysql-vs-postgres/#comments</comments>
		<pubDate>Wed, 16 Jul 2008 09:35:52 +0000</pubDate>
		<dc:creator>Bhavin Turakhia</dc:creator>
				<category><![CDATA[TechTalk]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[postgres]]></category>
		<category><![CDATA[rdbms]]></category>

		<guid isPermaLink="false">http://bhavin.directi.com/2008/07/16/mysql-vs-postgres/</guid>
		<description><![CDATA[In my perpetual comparison between MySQL and Postgres I am beginning to lean towards MySQL of late. There are many reasons, but a short list that is currently relevant to us is here -

MySQL supports multiple backend storage engines providing more flexibility of choice. For instance one can choose MyISAM for tables where transactions and ACID [...]]]></description>
			<content:encoded><![CDATA[<p>In my perpetual comparison between MySQL and Postgres I am beginning to lean towards MySQL of late. There are many reasons, but a short list that is currently relevant to us is here -</p>
<ul>
<li>MySQL supports <a href="http://dev.mysql.com/doc/refman/6.0/en/storage-engines.html">multiple backend storage engines</a> providing more flexibility of choice. For instance one can choose MyISAM for tables where transactions and ACID compliance do not matter, and gain a performance advantage. Or one can use the <a href="http://dev.mysql.com/doc/refman/6.0/en/memory-storage-engine.html">Memory</a> storage engine for temporary in-memory tables</li>
<li>InnoDB supports optional MVCC, thus providing best of both worlds</li>
<li>MySQL supports native replication and shared nothing clusters</li>
<li>MySQL has better integration with <a href="http://www.mysql.com/products/enterprise/memcached.html">memcached</a></li>
<li>MySQL uses multi-threading as opposed to process-forking, making it less heavy</li>
<li>More people are using MySQL than Postgres &#8211; e.g. Facebook, YouTube etc.</li>
<li>MySQL is now owned by Sun, and despite their recent lay-offs they are a company I respect</li>
</ul>
<p>There are many other reasons, but currently these are the ones that are relevant to the products we are working on.</p>
]]></content:encoded>
			<wfw:commentRss>http://bhavin.directi.com/mysql-vs-postgres/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
	</channel>
</rss>

