0-cosmos

11 Jan, 2010

The best programmers in the country battle it out at Directiplex

Posted by Bhavin Turakhia | (4) Comments

On January 10th 2010, we witnessed the awesomest programming teams from campuses all over India battle it out in person at Directi’s Mumbai headquarters. 639 teams of three had registered for the CodeChef Campus SnackDown, each hoping to be crowned the best coders in the country. The first round of the SnackDown took place on 21st November 2009. We then flew the top 7 teams from various corners of India to Mumbai to prove their mettle.

I was most excited by the results – the winning team consisted of the youngest programmers of the lot – three juniors in their 2nd and 3rd year at IIIT Hyderabad, who overtook their mentors and seniors and took first place :) . You will find details about the winners and a recap of the events in the CodeChef blog post.

Category : 0-cosmos | Directi

31 Dec, 2009

Guide on GUIDs

Posted by Bhavin Turakhia | (4) Comments

One of the seemingly trivial yet daunting challenges in scaling a datastore is the prevalence of auto-increment IDs to represent unique records in a database. Since any scaling involves horizontal partitioning of data, and thus distributing inserts, how does one ensure that IDs generated on these independent machines are unique? One replacement is GUIDs (or UUIDs), which are nothing more than randomly generated 128-bit numbers (there are various methods of generating one).

The uniqueness guarantee comes from the extremely low probability of two randomly generated 128-bit numbers ever colliding. Just to give you a sense of the size of the space: if a computer were to generate a new GUID every millisecond, it would take 10790283070806014188970529154.99 years to generate all GUIDs. That is roughly 780 million billion times the estimated age of the universe (about 13.8 billion years).
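
The arithmetic behind that figure can be reproduced with a few lines of Python. This is only a rough sketch: it assumes a 365-day year and an age of the universe of about 13.8 billion years, and the function name is made up for illustration.

```python
# Rough check of how long it takes to enumerate the whole 128-bit space.
MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365   # assumption: 365-day year
AGE_OF_UNIVERSE_YEARS = 13.8e9            # assumption: approximate estimate


def years_to_enumerate(bits=128, ids_per_ms=1):
    """Years needed to generate every value in a 2**bits space."""
    return 2 ** bits / (ids_per_ms * MS_PER_YEAR)


years = years_to_enumerate()
print(f"{years:.4e} years")
print(f"{years / AGE_OF_UNIVERSE_YEARS:.2e} times the age of the universe")
```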

Is a GUID truly unique?

Mathematically speaking, no. Since the GUID space consists of 2^128 possibilities, one cannot generate 2^128+1 unique GUIDs. Because the time taken to generate all combinations is so high, the probability of a collision within a single application is quite low. However, this probability increases as the number of generated GUIDs increases, due to the birthday paradox. As per the birthday paradox, in a sample of n values drawn from a total space of s, the probability that there is at least one collision is given by the formula – P = 1 – s! / (s^n * (s-n)!). The fraction on its own is the probability that all n values are distinct.

Applying this formula to a space of 2^128 values, the probability of at least one collision becomes non-trivial when the number of values reaches about 10^17 to 10^18 (roughly 0.0015% to 0.15%).
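
The factorials are impractical to evaluate directly for a space this large, so a quick check has to rely on the standard birthday approximation P ≈ 1 – exp(–n(n–1) / 2s). The sketch below does that; the function name is hypothetical, and it assumes all 128 bits are uniformly random, which is optimistic for real GUID generators (a version 4 UUID, for instance, has only 122 random bits).

```python
import math


def collision_probability(n, space=2 ** 128):
    """Approximate probability of at least one collision among n random draws.

    Uses the birthday approximation 1 - exp(-n(n-1) / (2s)); expm1 keeps
    precision when the probability is tiny.
    """
    return -math.expm1(-n * (n - 1) / (2 * space))


for n in (10 ** 12, 10 ** 17, 10 ** 18):
    print(f"n = {n:.0e}: P = {collision_probability(n):.3e}")
```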

Advantages of using GUIDs

  • Globally unique without central generation. Allows easier partitioning of data without having to rely on a central auto-incrementer
  • GUIDs are obfuscated and cannot be guessed. Auto-increment IDs have the disadvantage that one can guess subsequent IDs given a starting point. This allows attacks such as data-scraping and potentially even DoS attacks by simply querying a service for incrementing IDs from a starting point
  • Can be generated by the middle tier as opposed to the data layer
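
As a minimal sketch of the last point, here is how an application tier might assign IDs before a record ever reaches the datastore, using Python's standard uuid module. The `new_record` helper and the record layout are made up for illustration.

```python
import uuid


def new_record(payload):
    """Attach a randomly generated (version 4) UUID to a record.

    No round trip to a central auto-incrementer is needed, so any
    application server or shard can call this independently.
    """
    return {"id": str(uuid.uuid4()), **payload}


first = new_record({"name": "alice"})
second = new_record({"name": "bob"})
print(first["id"], second["id"])
```

Each ID occupies 16 bytes when stored in binary form, which is where the space cost listed under the disadvantages comes from.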

Disadvantages of using GUIDs

  • Take additional processing power to generate
  • Do not index as easily as smaller int values, thus increasing time taken for standard CRUD operations
  • Take up additional space (4 times as much – 16 bytes versus 4)
  • Can result in data and index fragmentation if a proper indexing mechanism is not chosen
  • Can be unintuitive

Category : 0-cosmos | TechTalk

29 Dec, 2009

Investigating Message Queueing systems

Posted by Bhavin Turakhia | (6) Comments

In my constant quest to create scalable systems and architecture, a robust message queuing system was the missing link. I have begun reviewing some of the available options. This is a rough list of some of the interesting links I came across during this process -

Watch this space for more info …
  • http://wiki.secondlife.com/wiki/Message_Queue_Evaluation_Notes
    • The most detailed comparison of queuing systems, with their advantages and disadvantages. Covers RabbitMQ and Apache Qpid in detail and then several others. Also the most comprehensive list of message queuing systems
  • http://stackoverflow.com/questions/731233/activemq-or-rabbitmq-or-zeromq-or
    • A list of links comparing various queuing systems
  • http://www.unlimitednovelty.com/2009/04/twitter-blaming-ruby-for-their-mistakes.html
    • A detailed critique of how Twitter selected its message queue implementations, with comments from the developers at Twitter on why they wrote their own queue. Do read the comments. They have a ton of meat
  • http://gojko.net/2009/03/16/qcon-london-2009-upgrading-twitter-without-service-disruptions/
    • Talks a bit about how Twitter uses a messaging framework at the backend
  • http://blog.evanweaver.com/articles/2009/03/13/qcon-presentation/
    • Evan Weaver’s talk on Twitter’s scalability issues and how they were solved
  • http://www.zeromq.org/whitepapers:brokerless
    • An interesting paper on how brokerless message queues differ from broker-based ones, and the advantages thereof

Category : 0-cosmos | TechTalk

25 Dec, 2009

The evolution of push email

Posted by Bhavin Turakhia | (11) Comments

I have been meaning to compile my research on push technologies and the evolution of push email for a while. A conversation with Ramki on how Gmail offers push email using Microsoft Direct Push triggered my research, and here is a summarized compilation of my findings.

Blackberry

Blackberry works by downloading your email to its NOC. Whenever it finds a new mail it pushes it out to your cellphone. This is why Blackberry requires you to subscribe to a specific Blackberry plan through your Telco. If you have a Blackberry plan, your Telco forwards your registration to Blackberry, and whenever you are online, Blackberry knows how to find you. Whenever the Blackberry NOC downloads a new mail from your mail servers, it pushes the mail out to your cellphone through your Telco. This last bit is proprietary and presumably requires support from the Telco’s end. Since a data-push is made to your cellphone only when there is data to be pushed, your battery life is conserved.

Microsoft AUTD

This was the earlier version of Microsoft’s push technology. Basically, Microsoft Exchange (or any mail server) is configured to send out a special SMS to your cellphone number each time new data needs to be pushed to your cellphone and your cellphone is not currently connected. This SMS does not get displayed in your inbox, but rather triggers the mail application (or any other applicable app) in the background to sync with the server. This mechanism requires the use of an SMS gateway and/or possible support from your Telco to be able to send out this type of SMS.

Microsoft Direct Push

This is the latest version of Microsoft’s push technology. It sounded esoteric, but on closer investigation it turned out to be a fancy Microsoft name for a technology that has existed for nearly a decade, viz. Comet. The way this works is as follows -

  • The application on your mobile (let’s say your email client) makes a Comet-style long-poll HTTP request to an HTTP server. Along with this request it sends the server a timeout value – say T=15 minutes
  • If the server receives a new mail within the next 15 minutes, it immediately sends an HTTP response with a status that makes the client sync with the server and then issue a fresh long-poll HTTP request
  • If the server receives no new mail within those 15 minutes, it must send an empty status response at the end of the 15 minutes, upon receiving which the client once again initiates a new HTTP request
  • Since this method requires a poll to the server every ‘x’ minutes, it typically consumes more battery than Blackberry, but the difference should not be too high
  • Since the Telco en route may enforce its own timeout values on HTTP connections, it may disconnect a long-running HTTP connection
  • Therefore the actual timeout must be negotiated between the client handset and the server during bootstrapping, by attempting higher and higher timeout values until the highest value that the Telco permits is discovered
  • The HTTP connection may also get dropped due to network failures and switch-overs. Clients can detect this at the TCP/IP level and re-attempt connections. I am not entirely sure about this part, but the HTTP client presumably also uses TCP keep-alive internally to ensure the TCP connection remains alive
  • One may want to look up the actual port numbers and protocol implementation of Microsoft Direct Push. It is likely that most Telcos support Microsoft Direct Push, so if one were to emulate their push service on the same server port, Telcos would probably let the long-running HTTP connection persist
  • The advantage of this method is that it does not require any special Blackberry plan
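
The request/response cycle and the timeout negotiation described above can be sketched in a few lines of Python. This is not Microsoft's protocol or API: an in-process queue stands in for the HTTP server's mailbox, the timeout is in seconds rather than minutes, and every name here (`long_poll`, `client_loop`, `negotiate_timeout`, the "SYNC"/"EMPTY" statuses) is hypothetical.

```python
import queue


def long_poll(mailbox, timeout_s):
    """Server side: hold the request open until mail arrives or time runs out."""
    try:
        mailbox.get(timeout=timeout_s)
        return "SYNC"    # new mail arrived: tell the client to sync
    except queue.Empty:
        return "EMPTY"   # timeout expired with nothing to deliver


def client_loop(mailbox, timeout_s, max_requests):
    """Client side: after every response, immediately issue the next request."""
    statuses = []
    for _ in range(max_requests):
        statuses.append(long_poll(mailbox, timeout_s))
        # On "SYNC" a real client would fetch the new mail here.
    return statuses


def negotiate_timeout(candidates_s, carrier_limit_s):
    """Try increasing timeouts and keep the largest one that survives.

    Comparing against carrier_limit_s is a stand-in for actually observing
    whether the Telco dropped a connection held open that long.
    """
    best = None
    for timeout_s in sorted(candidates_s):
        if timeout_s > carrier_limit_s:
            break
        best = timeout_s
    return best


mailbox = queue.Queue()
mailbox.put("new mail")
print(client_loop(mailbox, timeout_s=0.05, max_requests=2))
print(negotiate_timeout([480, 900, 1800], carrier_limit_s=1000))
```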

Category : 0-cosmos | TechTalk

10 Dec, 2009

A detailed primer on building cross platform mobile applications

Posted by Bhavin Turakhia | (9) Comments

I finally had a few hours this morning to wrap up my comparison of mobile application platforms that make it easy to develop cross-device applications using familiar technologies. Here is a quick braindump of all the links and resources I went through -

Rhomobile

PhoneGap

  • http://phonegap.com/
  • Check the video on their site
  • Notes
    • Fully open source and free
    • Code written in HTML + JavaScript
    • Supports iPhone, BlackBerry and Android

Pyxis Mobile

  • http://pyxismobile.com/platform/technical-overview/
  • Build one configuration and deploy to BlackBerry, iPhone, and Windows Mobile all at the same time
  • Skinning, scripting, localized languages, complex workflow management, push, hotkeys, mapping & LBS, camera support, signature capture, GUI calendar, disambiguation, and much more

Titanium Mobile

Quick Connect

Category : 0-cosmos | TechTalk

14 Nov, 2009

RAM Speed

Posted by Bhavin Turakhia | (6) Comments

To test the speed of RAM, I got Ramki to run a small program that writes a set of bytes into memory a billion times. Below are the results of running four instances of the program simultaneously on a dual-processor, quad-core machine.

Result

output.1:       User time (seconds): 545.99
output.1:       System time (seconds): 1.33
output.1:       Elapsed (wall clock) time (h:mm:ss or m:ss): 9:07.38
output.1:       Involuntary context switches: 820

output.2:       User time (seconds): 250.90
output.2:       System time (seconds): 1.18
output.2:       Elapsed (wall clock) time (h:mm:ss or m:ss): 4:12.12
output.2:       Involuntary context switches: 378

output.3:       User time (seconds): 250.30
output.3:       System time (seconds): 1.15
output.3:       Elapsed (wall clock) time (h:mm:ss or m:ss): 4:11.49
output.3:       Involuntary context switches: 373

output.4:       User time (seconds): 563.62
output.4:       System time (seconds): 1.31
output.4:       Elapsed (wall clock) time (h:mm:ss or m:ss): 9:25.00
output.4:       Involuntary context switches: 845

Observations

  • The write speed ranged from 0.25 to 0.55 seconds per million writes
  • Output.2 and .3 took less than half the time of .1 and .4
  • I don’t have a specific theory on why two of the instances did better than the other two
  • No processor affinity was set, and the processes were being scheduled on random processors after every context switch
  • Seemingly the processes were accessing RAM simultaneously. In my limited knowledge that could mean a few things – a multi-channel (dual) FSB, and additionally that while one process was computing, the other processes could access the FSB. The program used lrand48 to generate random numbers so that data was written to random locations, ensuring that we did not rely too much on the L1/L2 cache
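
For readers who want to try something similar, here is a rough Python sketch of the same idea: writing a byte to random offsets in a large buffer and timing it. It is not the program Ramki ran (that one used lrand48, which suggests C), the function name and sizes are assumptions, and in Python the interpreter overhead dominates, so the numbers say little about raw RAM speed.

```python
import random
import time


def random_write_benchmark(buffer_size, writes, seed=0):
    """Write one byte to `writes` random offsets in a buffer and time it.

    Random offsets keep the access pattern from fitting neatly in the
    CPU caches, the same trick the original test used.
    """
    rng = random.Random(seed)
    buf = bytearray(buffer_size)
    start = time.perf_counter()
    for _ in range(writes):
        buf[rng.randrange(buffer_size)] = 0xFF
    elapsed = time.perf_counter() - start
    return elapsed, buf


elapsed, buf = random_write_benchmark(buffer_size=1 << 24, writes=1_000_000)
print(f"{elapsed:.2f} seconds per million writes")
```

Running several copies at once under `/usr/bin/time -v` would give per-process figures in the same format as the results above.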

Category : 0-cosmos | TechTalk

4 Nov, 2009

Obese Footers :)

Posted by Bhavin Turakhia | (7) Comments

Fat footers are no longer an emerging trend and have rapidly become a standard navigation paradigm. This short post contains a bunch of useful links I gathered while researching “fat footers” (try and say that 5 times in rapid succession :) ) -

Feed your footers away!!

Category : 0-cosmos | TechTalk

31 Oct, 2009

My Ideal layered distributed clustered redundant self healing Filesystem

Posted by Bhavin Turakhia | (4) Comments

This is a collection of notes on the features that my dream filesystem would support -

  • Multi-layered storage – ability to support layers of slower to faster disks and move data between them atomically at block and/or file level, keeping the most frequently read data on the fastest disks. Block-level support for this would be a boon for databases, where frequently accessed pages could be kept on faster SSDs and less frequently accessed pages on slower SATA drives
  • Distributed – ability to have multiple clients participate and access a single virtual storage device
  • Replicated – each file (or block) is replicated ‘n’ times using master-master sync/async replication
  • No FSCK
  • Self-healing
  • Compression – native support for compression at file or block level (ideally the former)
    • Ability to access the file in its compressed form (useful where we can send out a compressed byte stream to the client and the decompression is handled at the client)
  • Snapshot capability
  • Parity-based RAID without write penalties (like RAID-Z)

ZFS supports many of the above features.

Category : 0-cosmos | TechTalk

10 Oct, 2009

Judging Humility in an Interview

Posted by Bhavin Turakhia | (16) Comments

At Directi, one of the most important qualities we value in potential candidates is humility. In fact, in the constantly changing landscape that is our industry, the only way to keep up is to know that you don’t know [it all]. I also include humility as an important attribute in my document on skills and attributes that a good developer must possess.

I never really had a handle on how to judge the humility of an individual, until it struck me recently that there is a technique that has worked effectively in the past, though I had never paid attention to it. Humble individuals are always respectful, and do not have an air about them. One way I have been able to spot individuals who are not humble is that they feel specific interview questions are beneath them. We have all seen this category. Often I will fire an extremely easy, fundamental or theoretical question at a candidate in my interview – and they will respond with a short answer – accompanied by negative body language, verbal cues or in some cases a direct rebuke that essentially states – “Are you kidding me? Why are you asking me such a question at my level? I am above this type of questioning.”

There are only two reasons (not mutually exclusive) for this type of response – (1) Ignorance – the candidate does not know the answer to the question and, instead of acknowledging it, prefers to go down the path of “this question is beneath me”, (2) Lack of humility

At Directi -

  • no question is ever beneath someone
  • all of us know that we have a lot to learn
  • none of us feel uncomfortable in acknowledging something we don’t know
  • all of us are respectful

So if you want to judge the humility of an individual during an interview – ask a couple of really easy questions – and see how they respond :)

Do you feel you would fit into our work culture? Apply at http://careers.directi.com

Category : 0-cosmos | Directi | Random Musings

29 Sep, 2009

Writing a Wordpress Plugin

Posted by Bhavin Turakhia | (7) Comments

I was just reading up on building WordPress plugins, and the simplicity and architecture impressed me enough to quickly pen down a short post. It does not make sense to write a detailed HowTo, since the documentation on the WordPress site is adequate and self-explanatory. However, here are some quick notes -

  • Start off by reading – Writing a Plugin – it lays down the framework for creating a plugin and defines what your plugin should be called, file names, structure and even a header for your plugin
  • Next review – Plugin API – which describes the simple yet powerful Hooks, Filters and Actions mechanism provided by WordPress to plugin developers
  • Hooks are provided by WordPress to allow your plugin to ‘hook into’ the rest of WordPress; that is, to call functions in your plugin at specific times, and thereby set your plugin in motion.
  • There are two kinds of hooks – Actions and Filters
  • Actions are the hooks that the WordPress core launches at specific points during execution, or when specific events occur. Your plugin can specify that one or more of its PHP functions are executed at these points, using the Action API.
  • Filters are the hooks that WordPress launches to modify text of various types before adding it to the database or sending it to the browser screen. Your plugin can specify that one or more of its PHP functions is executed to modify specific types of text at these times, using the Filter API.

I like the architecture – it allows any plugin developer to modify pretty much any functionality provided within WordPress. Many application platforms can be modeled around this same event-based plugin architecture, providing powerful extensibility to plugin developers.
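
To illustrate how another platform might borrow the same pattern, here is a minimal Python sketch of an action/filter registry. It is not WordPress code: the `Hooks` class is hypothetical, its method names merely mirror the WordPress function names for readability, and the hook names used below are just example labels.

```python
from collections import defaultdict


class Hooks:
    """A tiny event-based plugin registry with actions and filters."""

    def __init__(self):
        self._actions = defaultdict(list)
        self._filters = defaultdict(list)

    def add_action(self, name, callback):
        """Register a callback to run when the named event fires."""
        self._actions[name].append(callback)

    def do_action(self, name, *args):
        """Fire an event: call every registered callback, ignore results."""
        for callback in self._actions[name]:
            callback(*args)

    def add_filter(self, name, callback):
        """Register a callback that transforms a value."""
        self._filters[name].append(callback)

    def apply_filters(self, name, value):
        """Pass a value through each registered filter in order."""
        for callback in self._filters[name]:
            value = callback(value)
        return value


hooks = Hooks()
published = []
hooks.add_action("publish_post", published.append)     # plugin reacts to an event
hooks.add_filter("the_title", str.title)               # plugin rewrites text
hooks.add_filter("the_title", lambda title: title + "!")

hooks.do_action("publish_post", 42)
print(published, hooks.apply_filters("the_title", "hello world"))
```

The core application only needs to call `do_action` and `apply_filters` at well-chosen points; everything else can be left to plugins.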

Category : 0-cosmos | TechTalk