Archive for December, 2009
31 Dec, 2009
Guide on GUIDs
Posted by Bhavin Turakhia | (4) Comments
One of the seemingly trivial yet daunting challenges in scaling a datastore is the prevalence of auto-increment IDs to represent unique records in a database. Since any scaling involves horizontal partitioning of data, and thus distributing inserts, how does one ensure that the IDs generated on these independent machines are unique? One replacement is to use GUIDs (or UUIDs), which are nothing more than 128-bit numbers, typically randomly generated (there are various methods of generating one).
The uniqueness guarantee comes from the extremely low probability of two randomly generated 128-bit numbers ever colliding. Just to give you a sense of the size of the space: if a computer were to generate a new GUID every millisecond, it would take 10790283070806014188970529154.99 years (about 1.08 x 10^28, counting 365-day years) to generate all GUIDs. That is roughly 780 million billion times the estimated age of the universe (about 13.8 billion years).
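The arithmetic behind this estimate can be checked in a few lines of Python. This is only a sketch: the 365-day year and the figure of 13.8 billion years for the age of the universe are assumptions made for illustration.

```python
# Size of the GUID space and the time needed to enumerate it
# at a rate of one GUID per millisecond.
GUID_SPACE = 2 ** 128
MS_PER_YEAR = 365 * 24 * 60 * 60 * 1000  # assumption: 365-day year

years_to_enumerate = GUID_SPACE / MS_PER_YEAR

AGE_OF_UNIVERSE_YEARS = 13.8e9  # assumption: commonly quoted estimate
universe_ages = years_to_enumerate / AGE_OF_UNIVERSE_YEARS

print(f"{years_to_enumerate:.4e} years")
print(f"{universe_ages:.2e} times the age of the universe")
```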
Is a GUID truly unique?
Mathematically speaking, no. Since the GUID space consists of 2^128 possibilities, one cannot generate 2^128+1 unique GUIDs. Because the time taken to generate all combinations is so high, however, the probability of a collision within a single application's space is quite low. This probability does increase as the number of generated GUIDs increases, due to the birthday paradox. As per the birthday paradox, in a sample of n values drawn from a total space of s, the probability that there is at least one collision is given by the formula P = 1 - s! / (s^n * (s-n)!). The fraction s! / (s^n * (s-n)!) on its own is the probability that all n values are distinct.
Applying this formula to a space of 2^128 values, the probability of at least one collision becomes non-trivial when the number of values reaches about 10^17 to 10^18 (roughly 0.0015% to 0.15%).
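The factorials in the birthday formula are impractical to evaluate for a space of 2^128, so a quick check is easier with the standard approximation P ≈ 1 - exp(-n(n-1) / 2s). The sketch below uses that approximation rather than the exact formula; the function name is made up for illustration.

```python
import math

def collision_probability(n: int, space: int = 2 ** 128) -> float:
    """Approximate probability of at least one collision among n random
    values drawn uniformly from `space` possibilities.

    Uses 1 - exp(-n(n-1) / (2 * space)), the usual birthday-problem
    approximation, not the exact factorial form.
    """
    # expm1 keeps precision when the exponent is very close to zero.
    return -math.expm1(-n * (n - 1) / (2 * space))

for n in (10 ** 17, 10 ** 18):
    print(f"n = {n:.0e}: {collision_probability(n) * 100:.4f}%")
```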
Advantages of using GUIDs
- Globally unique without central generation. Allows easier partitioning of data without having to rely on a central auto-incrementer
- GUIDs are obfuscated and cannot easily be guessed. Auto-increment IDs have a disadvantage in that one can guess subsequent IDs given a starting point. This allows attacks such as data-scraping and potentially even DoS attacks by simply querying a service for incrementing IDs from a starting point
- Can be generated by the middle tier as opposed to the data layer
Disadvantages of using GUIDs
- Take additional processing power to generate
- Do not index as easily as smaller int values, thus increasing time taken for standard CRUD operations
- Take up additional space (4 times as much: 16 bytes versus 4 for a 32-bit integer)
- Can result in data and index fragmentation if a proper indexing mechanism is not chosen
- Can be unintuitive
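To make the trade-off concrete, here is a minimal sketch of generating an ID in the middle tier with Python's standard uuid module and comparing its size to a 32-bit integer. The helper name new_record_id is hypothetical. Note that a version 4 UUID reserves 6 of its 128 bits for version and variant markers, so 122 bits are actually random.

```python
import struct
import uuid

def new_record_id() -> uuid.UUID:
    """Generate a random (version 4) UUID in application code,
    without asking the database for the next auto-increment value."""
    return uuid.uuid4()

record_id = new_record_id()
uuid_size = len(record_id.bytes)   # raw storage size of a GUID in bytes
int_size = struct.calcsize("<i")   # storage size of a 32-bit integer

print(record_id)                   # canonical hyphenated text form
print(uuid_size, "bytes versus", int_size, "bytes")
```

Any application server can call such a helper independently, which is what removes the need for a central auto-incrementer.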
References
- http://en.wikipedia.org/wiki/Globally_Unique_Identifier
- http://www.yafla.com/dennisforbes/To-GUID-or-not-to-GUID-In-Your-Databases/To-GUID-or-not-to-GUID-In-Your-Databases.html
- http://en.wikipedia.org/wiki/Birthday_problem
- http://stackoverflow.com/questions/1705008/simple-proof-that-guid-is-not-unique
- http://stackoverflow.com/questions/1097870/handling-close-to-impossible-collisions-on-should-be-unique-values
29 Dec, 2009
Investigating Message Queueing systems
Posted by Bhavin Turakhia | (6) Comments
In my constant quest for creating scalable systems and architecture, a robust message queuing system was the missing link. I have begun reviewing some of the available options. This is a rough list of some of the interesting links I came across during this process:
- http://wiki.secondlife.com/wiki/Message_Queue_Evaluation_Notes - A detailed comparison of queuing systems and their advantages and disadvantages. Covers RabbitMQ, Apache Qpid and several others. The most comprehensive list of message queueing systems I found
- http://stackoverflow.com/questions/731233/activemq-or-rabbitmq-or-zeromq-or - A list of links comparing various queueing systems
- http://www.unlimitednovelty.com/2009/04/twitter-blaming-ruby-for-their-mistakes.html - A detailed critique of how Twitter selected its message queue implementations, with comments from the developers at Twitter on why they wrote their own queue. Do read the comments. They have a ton of meat
- http://www.zeromq.org/whitepapers:brokerless - An interesting paper on brokerless message queues and the advantages thereof
- http://gojko.net/2009/03/16/qcon-london-2009-upgrading-twitter-without-service-disruptions/ - Talks a bit about how Twitter uses a messaging framework at the backend
- http://blog.evanweaver.com/articles/2009/03/13/qcon-presentation/ - Evan Weaver’s talk on scaling Twitter
25 Dec, 2009
The evolution of push email
Posted by Bhavin Turakhia | (11) Comments
I have been meaning to compile research on push technologies and the evolution of push email for a while. A conversation with ramki on how Gmail offers push email using Microsoft Direct Push triggered my research, and here is a summarized compilation of my findings.
Blackberry
Blackberry works by downloading your email to its NOC. Whenever it finds a new mail it pushes it out to your cellphone. This is why Blackberry requires you to subscribe to a specific Blackberry plan through your Telco. If you have a Blackberry plan, your Telco forwards your registration to Blackberry, and whenever you are online, Blackberry knows how to find you. Whenever the Blackberry NOC downloads a new mail from your mail servers it pushes the mail out to your cellphone through your Telco. This last bit is proprietary and presumably requires support from the Telco's end. Since a data-push is made to your cellphone only when there is data to be pushed, your battery life is conserved.
Microsoft AUTD
This was the earlier version of Microsoft’s push technology. Basically, Microsoft Exchange (or any mail server) is configured to send out a special SMS to your cellphone number each time new data needs to be pushed to your cellphone and your cellphone is not currently connected. This SMS does not get displayed in your inbox, but rather triggers the mail application (or any other applicable app) in the background to sync with the server. This mechanism requires the use of an SMS gateway and/or possible support from your Telco to be able to send out this type of SMS.
Microsoft Direct Push
This is the latest version of Microsoft’s push technology. It sounded esoteric, but upon close investigation it turned out to be a fancy Microsoft name for a technology that has existed for nearly a decade, viz. Comet. The way this works is as follows:
- The application on your mobile (let's say your email client) makes a Comet HTTP request to an HTTP server. Along with this request it sends the server a timeout value, say T=15 minutes
- If in the next 15 minutes the server receives a new mail, the server immediately sends an HTTP response with a status that results in the client syncing with the server and then once again issuing a long-poll HTTP request
- If in the next 15 minutes the server receives no new mail, the server must at the end of 15 minutes send an empty status response to the client, upon receiving which the client once again initiates a new HTTP request
- Since this method requires a poll to the server every ‘x’ minutes, it typically consumes more battery than Blackberry, but the difference should not be too high
- Since the Telco en route may have its own enforced timeout values on any HTTP connections, it may disconnect a long-running HTTP connection
- Therefore the actual timeout must be negotiated between the client handset and the server during bootstrapping, by attempting higher and higher timeout values until the highest value that the Telco permits is discovered
- The HTTP connection may also get dropped due to network failures and switch-overs. Clients can detect this at the TCP/IP level and re-attempt connections. I am not entirely sure about this part, but the HTTP client presumably also uses TCP keepalive internally to ensure the TCP connection remains alive
- One may want to look up the actual port numbers and protocol implementation of Microsoft Direct Push, since it is likely that most Telcos support Microsoft Direct Push; if one were to emulate their push service on the same server port, it is likely that Telcos would let the long-running HTTP connection persist
- The advantage of this method is that it does not require any special Blackberry plan
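The long-poll cycle and the timeout negotiation described above can be sketched as a small in-process simulation. This is not the actual Direct Push protocol: the MailServer class, the client_loop and negotiate_timeout functions, and the 480-second carrier cutoff are all hypothetical, and a blocking queue stands in for the held-open HTTP request.

```python
import queue
import threading
from typing import Callable, Iterable, List, Optional

class MailServer:
    """Hypothetical stand-in for the HTTP server that holds a request open."""

    def __init__(self) -> None:
        self._inbox: "queue.Queue[str]" = queue.Queue()

    def deliver(self, message: str) -> None:
        """Simulate new mail arriving at the server."""
        self._inbox.put(message)

    def long_poll(self, timeout_s: float) -> Optional[str]:
        """Block for up to timeout_s seconds. Return the new mail as soon
        as it arrives, or None (the 'empty status' response) on timeout."""
        try:
            return self._inbox.get(timeout=timeout_s)
        except queue.Empty:
            return None

def client_loop(server: MailServer, timeout_s: float, max_polls: int) -> List[str]:
    """Issue long polls back to back, collecting any mail that is pushed."""
    received = []
    for _ in range(max_polls):
        response = server.long_poll(timeout_s)
        if response is not None:
            received.append(response)  # here a real client would sync
        # In both cases the client immediately issues the next request.
    return received

def negotiate_timeout(connection_survives: Callable[[int], bool],
                      candidates: Iterable[int]) -> Optional[int]:
    """Try increasing timeouts and keep the largest one that survives."""
    best = None
    for timeout_s in sorted(candidates):
        if not connection_survives(timeout_s):
            break
        best = timeout_s
    return best

server = MailServer()
# Mail arrives 50 ms into the first poll; the second poll times out empty.
threading.Timer(0.05, server.deliver, args=("new mail",)).start()
received = client_loop(server, timeout_s=0.2, max_polls=2)
print(received)

# Assumption: the carrier drops idle connections after 480 seconds.
best_timeout = negotiate_timeout(lambda t: t <= 480, [60, 300, 480, 900])
print(best_timeout)
```

The client spends most of its time blocked in long_poll, which is why the radio only needs to wake up when mail arrives or when the timeout expires.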
References
- How Direct Push really works - http://www.techatplay.com/?p=11
- Comparison between Direct Push and Blackberry - http://www.techatplay.com/?p=13
- Which uses less traffic, Microsoft Direct Push or Blackberry - http://www.rysavy.com/Articles/Rysavy_Wireless_EMail.pdf
- http://en.wikipedia.org/wiki/Comet_(programming)
- http://www.emansio.com/ - A push email product for Windows Mobile
- http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/overview.html - Overview of TCP keepalive
Misc References
- A detailed guide on Direct Push - http://www.techatplay.com/?p=8
- Definition of Push email - http://www.techatplay.com/?p=32
- A detailed list of all the push technologies out there – http://en.wikipedia.org/wiki/Push_e-mail
- Another interesting push email technology that uses IMAP NOTIFY and IDLE - http://en.wikipedia.org/wiki/Lemonade_Profile
10 Dec, 2009
A detailed primer on building cross platform mobile applications
Posted by Bhavin Turakhia | (9) Comments
I finally had a few hours this morning to wrap up my study comparing mobile application platforms that allow developing cross-device applications easily using familiar technologies. Here is a quick braindump of all the links and resources I went through:
Rhomobile
- Google TechTalk on Rhodes - http://www.youtube.com/watch?v=T2pztOky_L0
- http://rhomobile.com/products/rhodes/
- http://www.ultrasaurus.com/sarahblog/2009/07/cross-platform-mobile-apps-with-rhomobile/
- http://www.rhohub.com/
- Notes
- Dual licensed, though the license is cheap: $500
- Code is written in HTML and Ruby (though a Python interpreter would have gotten more smileys from me)
- Interesting approach: it uses the native browser component of the cellphone itself to render the HTML and a web server to host the app, so JavaScript support will vary with the phone's browser
- SQLite support
- Supports iPhone, Windows Mobile, Blackberry, Android, Symbian etc
- Basically Rhodes runs a mini Ruby web server and an HTML rendering engine, all in 2.3MB
- Supports native capabilities like camera, GPS, PIM data, SMS etc
PhoneGap
- http://phonegap.com/
- Check the video on their site
- Notes
- Fully open source and free
- Code written in HTML + JavaScript
- Supports iPhone, Blackberry and Android
Pyxis Mobile
- http://pyxismobile.com/platform/technical-overview/
- Build one configuration and deploy to BlackBerry, iPhone, and Windows Mobile all at the same time
- Skinning, scripting, localized languages, complex workflow management, push, hotkeys, mapping & LBS, camera support, signature capture, GUI calendar, disambiguation, and much more
Titanium Mobile
- http://www.appcelerator.com/products/titanium-mobile/
- This is an upcoming mobile platform by Appcelerator
- I am quite familiar with the company since we already use their Titanium Desktop and have two full-time contributors to it
Quick Connect
Comparison sites and articles
- http://blog.twinapex.fi/2009/09/30/cross-platform-mobile-application-development-and-payment/
- http://en.wikipedia.org/wiki/Mobile_development
- http://www.infoworld.com/d/open-source/building-native-mobile-applications-open-source-mobile-platforms-735
- http://www.infoworld.com/d/mobilize/iphone-development-tools-work-way-you-do-309
- http://news.cnet.com/8301-1035_3-10202598-94.html
- http://www.slideshare.net/inouemak/rhodes-and-phone-gap
- http://techboise.com/multi-platform-mobile-development-and-quickconnect