Archive for August, 2009

24 Aug, 2009

Google makes 190x the revenue of Facebook per pageview

Posted by Bhavin Turakhia | (12) Comments

This article provides a rough idea of how much money some of the highest-ranking web destinations make from their users:

Google

  • April-June 2009 Revenue: $5.5 billion
  • 97% of above revenues are from advertising
  • April-June Revenue from Google Properties: $3.6 billion
  • US April-June Revenue from Google Properties: $2.6 billion
  • Number of searches performed by Americans on Google Apr-June: 27.5 billion (approx) (source: comScore)
  • Revenue per search: 9.5 cents
  • Revenue per 1000 searches: $95
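
The per-search figure comes from dividing US revenue from Google properties by the comScore search count. A minimal sketch of that arithmetic, using only the numbers listed above (the function name is illustrative, not from any library):

```python
def revenue_per_search_cents(revenue_usd: float, searches: float) -> float:
    """Revenue per search, expressed in cents."""
    return revenue_usd / searches * 100


us_revenue = 2.6e9     # US April-June revenue from Google properties
us_searches = 27.5e9   # approximate US searches, April-June (comScore)

cents = revenue_per_search_cents(us_revenue, us_searches)
per_thousand_usd = cents * 1000 / 100  # cents per search -> dollars per 1000 searches

print(f"{cents:.2f} cents per search, ${per_thousand_usd:.0f} per 1000 searches")
```

The unrounded result is a little under 9.5 cents per search, which rounds to the $95 per 1000 searches quoted above.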

Sources

Ask

  • IAC total April-June Revenue: $340 million
  • Revenue from Media and Ads (Ask.com, Citysearch, Dictionary.com etc): $168 million
  • 84% of this is from US: $141 million
  • 72% of this is proprietary properties => $101 million
  • Bulk of this can be assumed to come from Ask.com (let's say $90 million)
  • Number of searches performed by Americans on Ask Apr-June: 1.5 billion (approx) (source: comScore)
  • Revenue per search: 6 cents
  • Revenue per 1000 searches: $60

Sources

Facebook

  • Dec 2008: 80 billion pageviews
  • Registered users: 222 million
  • Page views per user: 360 pageviews per user per month (or 12 pageviews per day avg)
  • June 2009: 340 million unique visitors (77 million from US)
  • May 2009: 87 billion page views (20 billion from US)
  • Expected to generate over $500 million in revenue in 2009
  • Rough total pageviews in 2009 => 1000 billion
  • Rough Revenue per 1000 pageviews: 50 cents
  • Breakdown of their $550 million revenue (in $ million) – 125 – brand ads, 150 – deal with Microsoft, 75 – virtual goods, 200 – self service ads
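
The 190x in the title can be reproduced from the rough numbers in this post. The sketch below assumes the $500 million revenue estimate and the rough 1000 billion pageviews for 2009, and compares the result against Google's $95 per 1000 searches; note that it treats a search and a pageview as comparable units, which is a simplification.

```python
def revenue_per_thousand(revenue_usd: float, units: float) -> float:
    """Revenue in dollars per 1000 units (pageviews or searches)."""
    return revenue_usd / units * 1000


facebook_revenue = 500e6      # expected 2009 revenue (rough)
facebook_pageviews = 1000e9   # rough total pageviews in 2009

facebook_rpm = revenue_per_thousand(facebook_revenue, facebook_pageviews)
google_per_1000 = 95.0        # Google revenue per 1000 searches, from the Google section

ratio = google_per_1000 / facebook_rpm
print(f"Facebook: ${facebook_rpm:.2f} per 1000 pageviews")
print(f"Google earns roughly {ratio:.0f}x as much per 1000")
```

Using the $550 million figure from the revenue breakdown instead would give about 55 cents per 1000 pageviews and a somewhat smaller ratio.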

Sources

Baidu

  • April-June 2009 revenue: $160 million
  • Revenue per advertising customer: $791
  • July 2008 Searches: 7.4 billion
  • July 2008 quarter extrapolated: 22.2 billion searches
  • Revenue in quarter of July 2008: $135.4 million
  • Revenue per search: 0.6 cents
  • Revenue per 1000 searches: $6

Sources

LinkedIn

  • Projected revenues in 2008: $100 million
  • Revenue from Advertising: 25%
  • Funding so far: $103 million
  • Unique users as of 2009: 45 million
  • March 2008 monthly visitors: 11 million
  • March 2008 monthly pageviews: 115 million
  • March 2008 avg minutes per visitor: 7.8 min

Sources

eBay, Skype and PayPal

  • April-June 2009 Revenue: $2 billion
  • Marketplaces revenue (ebay.com, shopping.com etc): $1 billion (transaction) + $200 million (advertising)
  • Marketplace Gross volume: $13.4 billion (eBay made around 10% of this in its revenues which is impressive)
  • Payments revenue (paypal.com, bill-me-later): $630 million (transaction) + $39 million (advertising)
  • International component of Payments revenue: $286.2 million (45%)
  • Payments Total volume: $16.7 billion (eBay made around 3.9% here – which is surprising considering their PayPal rates are much lower)
  • Communications revenue (Skype): $155 million (transaction) + $14 million (advertising)
  • International component of Communications revenue: $128.5 million (83%)
  • Skype registered users – 480 million
  • SkypeOut minutes – 2.9 billion
  • Per user revenue – 32 cents per registered user per quarter
  • Per user minutes – 6 minutes per user per quarter
  • US revenue: $959 million
  • International revenue: $1.1 billion
  • Skype Q3 2007: 10 million concurrent online users at peak. 4 million at trough.
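
The take rates and per-user figures above can be rechecked from the listed inputs. This is a rough sketch: which revenue lines go into each ratio is my assumption (transaction plus advertising for Marketplaces and Payments, transaction only for Skype), and since the inputs are rounded the results land near the quoted figures rather than exactly on them.

```python
def share_percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return part / whole * 100


# Marketplaces: transaction + advertising revenue against gross volume
marketplace_take = share_percent(1.0e9 + 200e6, 13.4e9)

# Payments: transaction + advertising revenue against total payment volume
payments_take = share_percent(630e6 + 39e6, 16.7e9)

# Skype: transaction revenue and SkypeOut minutes per registered user
skype_cents_per_user = 155e6 / 480e6 * 100
skype_minutes_per_user = 2.9e9 / 480e6

print(f"Marketplaces take rate: {marketplace_take:.1f}%")
print(f"Payments take rate: {payments_take:.1f}%")
print(f"Skype: {skype_cents_per_user:.0f} cents and "
      f"{skype_minutes_per_user:.1f} minutes per registered user per quarter")
```

With these inputs the Marketplaces rate comes out at about 9% and the Payments rate at about 4%, close to the "around 10%" and "around 3.9%" quoted above.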

Sources

Apple

  • Apr-June 2009 sales: $8.3 billion
  • Geo distribution ($ billion) – America – 3.8, Europe – 2, others – remaining
  • Product distribution ($ billion) – Mac – 3.3, iPod – 1.5, other music products – 1, iPhone and related services – 1.6
  • Units of product sold – Desktops – 0.8m, Portables – 1.7m, iPod – 10.2m, iPhone – 5.2m

Sources

Category : Uncategorized

21 Aug, 2009

Directi Campus Recruitment 2009-10 off to a great start

Posted by Bhavin Turakhia | (222) Comments

We just began our campus recruitment exercise, and thus far the first week has been a resounding success. This year we plan on visiting upwards of 60 institutions, engineering and management, across 4 countries, in our search for the best, kick-ass talent available. Check out our Campus drive page for further details on the campuses we will be visiting. We will continue to add entries in there as we fix the dates.

This week we visited 5 colleges. The process – an online test at CodeChef.com, a personal interview, a telephonic interview, and then a visit to our uber cool HQ in Mumbai for the final rounds. We had several hundred students compete across multiple cities and institutions for the top spot in a fast-paced, fun hackfest at 5 different venues. So far 2 offers have been made. Our talent scouts – Ankush, Nishant, Anup and Anirban – had fun, met a ton of interesting people and generally spread the word. Here are some pictures and trivia -


Institution        Number of students
NIT Trichy         85
NIT Warangal       120
RVCE Bangalore     350
PESIT Bangalore    500+
NIT Surathkal      200+
Category : 0-cosmos | Directi

20 Aug, 2009

Bought my first plane and flew to Lake Tahoe

Posted by Bhavin Turakhia | (26) Comments

Just got back from a week-long super-active trip to the US – visited our friends at Yahoo, participated in HostingCon, checked out office spaces in San Francisco … and … bought our very first Cessna StationAir :)


Div and I went to Lake Tahoe for an evening: we took off from San Carlos airport, had dinner at Lake Tahoe, won some money playing poker and flew back. Can't wait till we get this baby shipped back to Mumbai – weekend trips to Goa whenever I want :)

Category : 0-cosmos | Random Musings

17 Aug, 2009

A Compendium of solutions for scaling a Data Store

Posted by Bhavin Turakhia | (5) Comments

Achieving infinite scalability on your data store is the holy grail of scaling an application. App servers are typically stateless and therefore a cinch to scale. This document serves as a comprehensive compendium on my thoughts and research on scaling a data store.

Requirements

  • Infinite Scalability
  • High Availability (0% downtime)
  • Data Redundancy
  • High Performance
  • Storage Flexibility – ability to store any type of data
  • Query Flexibility – ability to perform simple gets, range based gets, range based updates, and possibly complex joins

Features that a solution must have to deliver the above Requirements

  • Replication – Each unit of data should be copied to multiple nodes so that if an underlying node crashes there is no data loss
  • Partitioning – Data should be divided across multiple nodes based on specific keys so that the data layer is infinitely scalable
  • Online node addition – Solution should support adding new nodes online, with automated data distribution upon new node addition
  • Load Balancing – Queries should be load balanced between nodes
  • Persistence – Should have a data persistence layer so that data is not volatile
  • Caching – Should support flexible in-memory caching for increased retrieval speeds
  • Tree based Indexing – To support range based queries ideally one may need tree type indexes for keys on which range queries may be made
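
Partitioning, replication and online node addition can be illustrated together with a consistent-hash ring. This is a minimal sketch under my own assumptions, not the implementation of any product discussed here: the `HashRing` class, the node names, the 64 virtual points per node and the replica count of 2 are all hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring: partitions keys and picks replica nodes."""

    def __init__(self, nodes=(), vnodes=64, replicas=2):
        self.vnodes = vnodes        # virtual points per node, smooths the distribution
        self.replicas = replicas    # number of distinct nodes that hold each key
        self._ring = []             # sorted list of (hash, node) points
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.sha256(value.encode()).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        """Add a node online; only keys adjacent to its points change owner."""
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def nodes_for(self, key: str) -> list:
        """Return the primary node followed by the replica nodes for a key."""
        if not self._ring:
            return []
        distinct = {node for _, node in self._ring}
        wanted = min(self.replicas, len(distinct))
        i = bisect.bisect(self._ring, (self._hash(key), ""))
        owners = []
        while len(owners) < wanted:
            node = self._ring[i % len(self._ring)][1]  # walk clockwise, wrapping around
            if node not in owners:
                owners.append(node)
            i += 1
        return owners


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.nodes_for(k) for k in keys}

ring.add_node("node-d")  # online node addition
after = {k: ring.nodes_for(k) for k in keys}

moved = sum(1 for k in keys if before[k][0] != after[k][0])
print(f"{moved} of {len(keys)} keys changed primary node")
```

Only the keys that now fall to the new node move, which is what makes automated data redistribution on node addition tractable. A hash ring does nothing for range queries, though, which is why tree based indexing is listed as a separate feature.
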
Proposed Solutions
The solutions below are a result of researching a ton of options (refer to the Research section below). They also represent solutions that are practical to deploy and are being used in production. There are exotic variants that I came up with during my research but I have left them out.

Solution 1: Google App Engine
  • Provides BigTable – Google's implementation of a scalable database
  • BigTable is not an RDBMS, but has a fairly flexible API that supports creative data fetching methods
  • BigTable is distributed and self-balancing – scaling is no longer the application developer's problem
  • You will need to host your application on Google’s App Engine
  • Your application needs to use the BigTable API for data storage
Solution 2: MySQL NDB Cluster
  • MySQL NDB Cluster is a master-master, self-partitioned, replicated storage engine that technically seems to provide all the features listed above
  • It also offers access via SQL or a high-performance native NDB API
  • It seems like the holy grail of database scaling
  • It however stores indexes entirely in memory – and has less flexibility w.r.t persistence
  • I could not find much material on the performance of an NDB Cluster
Solution 3: HyperTable
  • HyperTable is an opensource BigTable clone and provides essentially the same features as described in the BigTable whitepaper
  • It is supported by Baidu, Zvents and Rediff
Solution 4: Project Voldemort
  • Voldemort is a big, distributed, persistent, fault-tolerant hash
  • Developed and used by LinkedIn
  • Java based API
  • Reasonably performant (10-20k ops per second on commodity hardware)
  • Maintains replicated copies of data over multiple nodes and automatically handles server failures
  • Does not support range based querying
Solution 5: Tokyo Tyrant
  • Tokyo Tyrant is a layer on top of Tokyo Cabinet – a highly performant, persistent data store (site claims over 2 million qps)
  • Tokyo Tyrant itself claims to deliver upwards of 58,000 qps
  • It supports multiple language bindings (Java, Perl, PHP etc)
  • Supports various data structures – hash, tree, B+tree, array, table etc
  • Supports caching
  • Tokyo Tyrant does not support active-active master-master replication, thus falling short on redundancy. It also does not support data partitioning out of the box
Solution 6: Postgres + PL/Proxy + Replication (Slony / Continuent) + PGBouncer (connection pooler)
  • Postgres is an extremely mature RDBMS
  • Using PL/Proxy one can abstract horizontal partitioning concerns out of the database layer and into an abstracted underlying layer
  • Using Slony or Continuent one can ensure that multiple copies of any set of rows exist at any given point in time (Synchronous or Async replication)
  • PGBouncer provides a light-weight connection pooler for PL/Proxy
  • Together this will satisfy all our requirements above
  • You may club Voldemort or memcached or redis with this to provide a caching layer
  • Postgres also has hooks for connecting to memcached that make cache population and invalidation easier
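
The caching layer mentioned above is usually wired in with a cache-aside pattern: read from the cache, fall back to the database on a miss, and invalidate on write. Below is a rough sketch of that pattern only. Plain dicts stand in for both the cache (memcached / redis) and the partitioned Postgres layer, and the `CachedStore` class is hypothetical rather than any library's API.

```python
class CachedStore:
    """Cache-aside wrapper: populate the cache on read misses, invalidate on writes."""

    def __init__(self, backing: dict):
        self.backing = backing  # stand-in for the database layer
        self.cache = {}         # stand-in for memcached / redis
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.backing.get(key)
        if value is not None:
            self.cache[key] = value  # cache population on miss
        return value

    def put(self, key, value):
        self.backing[key] = value
        self.cache.pop(key, None)    # invalidation, so the next read is fresh


store = CachedStore({"user:1": "alice"})
first = store.get("user:1")    # miss, loaded from the backing store
second = store.get("user:1")   # hit, served from the cache
store.put("user:1", "alicia")  # write invalidates the cached copy
third = store.get("user:1")    # miss again, returns the new value
print(first, second, third, store.hits, store.misses)
```

Invalidating on write rather than updating the cache in place keeps the sketch simple; the Postgres hooks for memcached mentioned above would move this bookkeeping closer to the database.
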
Research
A lot of reading went into discovering the above solutions. The links below are a good start -
BigTable
Google App Engine
MySQL Cluster
Amazon SimpleDB
Cassandra
Hypertable
MongoDB
neo4j
Project Voldemort
Redis
Tokyo Cabinet
Lightcloud
Gigaspaces
Coherence
Postgres
Category : 0-cosmos | TechTalk

16 Aug, 2009

Infinitely Scalable Infrastructure and RDBMSes

Posted by Bhavin Turakhia | (1) Comments

For the last several months I have been spending part of my time on conceptualizing an abstracted infrastructure layer that is highly scalable and can be leveraged by any application without having to worry too much about it. I have researched and continue to research conventional and unconventional techniques – partitioning, clustering, replication, shared-nothing architectures, grid computing and so on. This article represents a sliver of my thoughts concerning scalability and RDBMSes -

The holy grail of scalability is to be able to scale your data store. And as data stores go, RDBMSes seem to be the predominant choice (though that is changing – refer http://bit.ly/2lnRet). RDBMSes by their very nature, due to the features they provide (ACID compliance, Transaction safety etc), tend to be difficult to scale. This has resulted in the recent mushrooming of data storage options that are feature-poor but scalable out of the box (eg Voldemort, HyperTable etc).

I wanted to chronicle the list of features that a standard RDBMS provides, that we take for granted, so that I have a reference of the features that one may have to compromise on w.r.t application development in favor of easier scalability -

  • Range based selects and updates – Being able to fire queries on a table specifying a range of values (eg where age > 35). Typically RDBMSes use B+ Tree based indexes, which keep keys in sorted order and therefore support range based row selection. This in turn allows one to fire range based queries.
  • Transactions – In an RDBMS one can perform multiple operations within a transaction and ensure that all of them or none of them go through. This ensures data integrity
  • B+ tree indexes
  • Foreign key relationships and referential integrity
  • Joins and nested selects
  • Aggregations (sum, avg etc)
  • Advanced scripting using non-native languages (java etc)
  • Stored procedures (allow encapsulation of business logic in the database layer)
  • Triggers
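
To make the first item concrete, here is a small sketch of why an ordered index supports range selects while a plain hash store does not. A sorted list searched with `bisect` stands in for a B+ tree; the `SortedIndex` class and the sample rows are hypothetical.

```python
import bisect


class SortedIndex:
    """Ordered index over one column; a sorted list stands in for a B+ tree."""

    def __init__(self):
        self._values = []    # indexed column values, kept sorted
        self._row_ids = []   # row ids, parallel to _values

    def insert(self, value, row_id):
        pos = bisect.bisect_right(self._values, value)
        self._values.insert(pos, value)
        self._row_ids.insert(pos, row_id)

    def greater_than(self, bound):
        """Row ids with value > bound: one binary search, then a contiguous slice."""
        start = bisect.bisect_right(self._values, bound)
        return self._row_ids[start:]


ages = {"alice": 29, "bob": 41, "carol": 35, "dave": 52}

index = SortedIndex()
for name, age in ages.items():
    index.insert(age, name)

indexed_result = index.greater_than(35)  # "where age > 35" via the ordered index

# A pure key-value store has no ordering, so the same query is a full scan.
scan_result = sorted(name for name, age in ages.items() if age > 35)

print(indexed_result, scan_result)
```

Hash-based stores such as Voldemort give up exactly this ordering, which is why range queries are the first feature on the list of things one may have to compromise on.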

Category : 0-cosmos | TechTalk

5 Aug, 2009

A Compendium on Swine Flu – Facts and Info

Posted by Bhavin Turakhia | (10) Comments

India has reported its first swine flu death, and tabulated over 500 positive cases so far (amongst those reported). This is slightly worrisome. I have compiled this compendium of resources as a quick reference for myself, my company and anyone else who wants to get their facts and figures on the swine flu.

Note: Nothing in this document constitutes actual medical advice. I have not verified the factual accuracy of any source or excerpt. This is merely meant to be a compilation of excerpts from a multitude of sources. Please consult your doctor for actual medical advice.

Background and facts

  • Names: Swine Flu, SIV, H1N1, Influenza A, Novel H1N1
  • The current outbreak is due to a new strain of H1N1 not previously reported
  • Mortality: “in the US it appears that for every 1000 people who get infected, about 40 people need admission to hospital and about one person dies”
  • Current totals – in early July WHO officials gave up trying to count the number of cases. CDC’s Schuchat said it was “more than a million.”
  • The United States continues to report the largest number of novel H1N1 cases of any country worldwide; however, most people who have become ill have recovered without requiring medical treatment.

How does it spread

  • Novel influenza A (H1N1) spreads in the same way that regular seasonal influenza viruses spread
  • Coughs and sneezes of people who are sick
  • Touching infected objects and then touching your nose or mouth
  • Transmission is human to human – cooked pork products are safe to eat as the virus cannot be transmitted by eating foods

Symptoms

Novel H1N1 infection has been reported to cause a wide range of flu-like symptoms, including fever, cough, sore throat, body aches, headache, chills and fatigue. The following table from the CDC facts and figures page provides a breakdown of the number and percentage of infected people displaying a particular symptom -

Symptom                Number (%)
Fever*                 249 (93%)
Cough                  223 (83%)
Shortness of breath    145 (54%)
Fatigue/Weakness       108 (40%)
Chills                 99 (37%)
Myalgias               96 (36%)
Rhinorrhea             96 (36%)
Sore Throat            84 (31%)
Headache               83 (31%)
Vomiting               78 (29%)
Wheezing               64 (24%)
Diarrhea               64 (24%)

Prevention

  • A vaccine is in development and is likely to become available soon – http://www.cdc.gov/h1n1flu/vaccination/acip.htm
  • Anyone exhibiting symptoms must stay quarantined and not come into contact with others. They should also get tested immediately.

Testing

  • In early June, WHO and the U.S. Food and Drug Administration (FDA) acknowledged that a new laboratory test used to identify the virus was about 90 per cent accurate, while rapid diagnostic tests have an accuracy of 50-70%

Cure

  • Over-the-counter drugs relieve symptoms; they do not kill the virus
  • Antiviral drugs can be used to treat swine flu or to prevent infection with swine flu viruses. These medications must be prescribed by a health care professional.
  • Influenza antiviral drugs also can be used to prevent influenza when they are given to a person who is not ill, but who has been or may be near a person with swine influenza. When used to prevent the flu, antiviral drugs are about 70% to 90% effective. When used for prevention, the number of days that they should be used will vary depending on a person’s particular situation
  • Most patients were expected to recover without medical attention, although those with pre-existing or underlying medical conditions were more prone to complications

Other References

Category : 0-cosmos | Random Musings