Archive for January, 2009

20 Jan, 2009

HTTP vs REST vs SOAP

Posted by Bhavin Turakhia | (17) Comments

I have been an active proponent of SOAP since its inception. SOAP revolutionized RPC and loose coupling to a great extent. However, of late I have been giving APIs and interfaces considerable thought and am leaning a lot more towards simple HTTP based APIs with an XML or JSON response format as opposed to SOAP. In this post I pen down some random thoughts on the merits and demerits of each.

Introduction
Let me first clarify the terminology -

  • SOAP refers to Simple Object Access Protocol
  • HTTP based APIs refer to APIs that are exposed as one or more HTTP URIs and typical responses are in XML / JSON. Response schemas are custom per object
  • REST, on the other hand, adds the use of standardized URIs and gives importance to the HTTP verb used (i.e. GET / POST / PUT etc.)

Typing
SOAP provides relatively stronger typing since it has a fixed set of supported data types. It therefore guarantees that a return value will be available directly in the corresponding native type on a particular platform. In the case of HTTP based APIs, the return value needs to be deserialized from XML and then cast to the right type. This may not represent much effort, especially for dynamic languages. In fact, even in the case of complex objects, traversing an object is very similar to traversing an XML tree, so there is no definitive advantage in terms of ease of client-side coding.
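To give a feel for how little work the deserialize-and-cast step usually is, here is a minimal sketch in Python. The response body and its element names (customer, id, balance and so on) are made up for illustration and do not come from any real API.

```python
import xml.etree.ElementTree as ET

# Hypothetical response body; the element names are invented for this sketch.
RESPONSE_XML = """
<customer>
  <id>42</id>
  <name>Jane Doe</name>
  <balance>19.95</balance>
  <active>true</active>
</customer>
""".strip()


def parse_customer(xml_text):
    """Deserialize the XML and cast each field to a native type by hand."""
    root = ET.fromstring(xml_text)
    return {
        "id": int(root.findtext("id")),              # text -> int
        "name": root.findtext("name"),               # stays a string
        "balance": float(root.findtext("balance")),  # text -> float
        "active": root.findtext("active") == "true", # text -> bool
    }


customer = parse_customer(RESPONSE_XML)
print(customer)
```

The casts are the part a SOAP stub would have done for you; here they are four short lines that the client has to own.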

Client-side effort
Making calls to an HTTP API is significantly easier than making calls to a SOAP API. The latter requires a client library, a stub and a learning curve. The former is native to all programming languages and simply involves constructing an HTTP request with appropriate parameters appended to it. Even psychologically the former seems like much less effort.
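As a rough illustration of "constructing an HTTP request with appropriate parameters appended to it", the sketch below builds such a URL using only the Python standard library. The host api.example.com, the path and the parameter names are placeholders I have assumed, not a real service.

```python
from urllib.parse import urlencode

# Placeholder endpoint, assumed for illustration only.
BASE_URL = "https://api.example.com/domains/search"


def build_request_url(base_url, params):
    """Append URL-encoded query parameters to the base URI."""
    return base_url + "?" + urlencode(params)


url = build_request_url(BASE_URL, {"name": "example", "tld": "com", "format": "json"})
print(url)

# Sending the call would be one more line against a real endpoint, e.g.
# urllib.request.urlopen(url).read()
```

No client library, stub or generated code is involved; the whole "client" is string building plus a GET.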

Testing and Troubleshooting
It is also easy to test and troubleshoot an HTTP API since one can construct a call with nothing more than a browser and check the response inside the browser window itself. No troubleshooting tools are required to generate a request / response cycle. In this lies the primary power of HTTP based APIs.

Server-side effort
Most programming languages make it extremely easy to expose a method using SOAP. The serialization and deserialization are handled by the SOAP server library. Exposing an object’s methods as an HTTP API can be relatively more challenging since it may require serializing the output to XML. Making the API RESTful involves additional work to map URI paths to specific handlers and to take the meaning of the HTTP verb into account. Of course, many frameworks exist to make this task easier. Nevertheless, as of today, it is still easier to expose a set of methods using SOAP than it is to expose them using regular HTTP.
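The extra server-side work for a RESTful API looks roughly like the sketch below: a routing table that maps a verb plus a URI pattern to a handler, and a serialization step for the output. This is a toy dispatcher under my own assumptions (the /orders/<id> paths, the in-memory ORDERS data and JSON output are all invented), not how any particular framework does it.

```python
import json
import re

# Made-up in-memory data standing in for a real backend.
ORDERS = {"7": {"id": 7, "status": "shipped"}}


def get_order(order_id):
    order = ORDERS.get(order_id)
    return (200, order) if order else (404, {"error": "not found"})


def delete_order(order_id):
    if ORDERS.pop(order_id, None) is None:
        return 404, {"error": "not found"}
    return 200, {"deleted": order_id}


# Each route pairs an HTTP verb and a URI pattern with a handler.
ROUTES = [
    ("GET", re.compile(r"^/orders/(\d+)$"), get_order),
    ("DELETE", re.compile(r"^/orders/(\d+)$"), delete_order),
]


def dispatch(method, path):
    """Find the handler for (verb, path) and serialize its result to JSON."""
    path_matched = False
    for verb, pattern, handler in ROUTES:
        match = pattern.match(path)
        if not match:
            continue
        path_matched = True
        if verb == method:
            status, payload = handler(*match.groups())
            return status, json.dumps(payload)
    # Known path with an unsupported verb is 405; unknown path is 404.
    if path_matched:
        return 405, json.dumps({"error": "method not allowed"})
    return 404, json.dumps({"error": "not found"})


print(dispatch("GET", "/orders/7"))
print(dispatch("PUT", "/orders/7"))
```

With a SOAP server library none of this routing or serialization code is yours to write, which is the point being made above.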

Caching
Since HTTP based / RESTful APIs can be consumed using simple GET requests, intermediate proxy servers / reverse-proxies can cache their responses very easily. On the other hand, SOAP requests use POST and require a complex XML request to be created, which makes response caching difficult.
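The difference can be sketched with a toy intermediary that keys its cache on the request URL. This is a deliberately simplified model of my own: real proxies also look at response headers such as Cache-Control before storing anything, and the class name and URLs here are hypothetical.

```python
class TinyProxyCache:
    """Toy model of an intermediary cache keyed on the request URL."""

    def __init__(self, origin):
        self.origin = origin   # callable: (method, url, body) -> response
        self.store = {}        # url -> cached GET response
        self.origin_hits = 0   # how often the origin server was contacted

    def request(self, method, url, body=None):
        # Only GET responses are served from the cache; the URL alone
        # identifies what was asked for.
        if method == "GET" and url in self.store:
            return self.store[url]
        self.origin_hits += 1
        response = self.origin(method, url, body)
        if method == "GET":
            self.store[url] = response
        # A POST is always forwarded: what was asked for lives in the body,
        # which this cache (like most intermediaries) does not inspect.
        return response


def origin(method, url, body):
    return "response for %s %s" % (method, url)


cache = TinyProxyCache(origin)
cache.request("GET", "/orders/7")
cache.request("GET", "/orders/7")          # served from the cache

soap_body = "<Envelope>...</Envelope>"     # stand-in for a SOAP request
cache.request("POST", "/soap", soap_body)
cache.request("POST", "/soap", soap_body)  # forwarded to the origin again

print(cache.origin_hits)
```

Two identical GETs cost one trip to the origin, while two identical SOAP POSTs cost two.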

Conclusions
In the end I believe SOAP requires greater implementation effort and understanding on the client side while HTTP based or REST based APIs require greater implementation effort on the server side. API adoption can increase considerably if an HTTP based interface is provided. In fact, an HTTP-based API with XML/JSON responses represents the best of both breeds and is easy to implement on the server as well as easy to consume from a client.

Category : 0-cosmos | TechTalk

1 Jan, 2009

Solid State Drives vs Hard disk drives

Posted by Bhavin Turakhia | (10) Comments

Intro

  • A solid state drive stores its data in solid-state memory (Flash / SRAM / DRAM)
  • Flash does not require constant power and is non-volatile while SRAM and DRAM are volatile

Speeds

  • Flash may be slower than even traditional HDDs on large file access
  • Flash is considerably slower than conventional disks for small writes. This is partly due to their large erase block size of 0.5-1 MB
  • SSDs are faster than HDDs for small random reads due to negligible seek time (no moving parts)
  • Check the comparison table at http://www.storagesearch.com/ssd-ram-v-flash.html. When Flash based SSDs are used for equal reads and writes they are actually slower than HDDs. However, if small random reads far outweigh writes, the performance gains can be up to 100x
  • Download the paper – Comparison of Drive Technologies for High-Transaction Databases. Findings below -
    • HDDs: Small reads – 175 IOPS, Small writes – 280 IOPS
    • Flash SSDs: Small reads – 1075 IOPS (6x), Small writes – 21 IOPS (0.1x)
    • DRAM SSDs: Small reads – 4091 IOPS (23x), Small writes – 4184 IOPS (14x)
  • Another whitepaper on Flash vs HDDs is Understanding Flash SSD Performance. Findings below -
    • Read performance: Flash outperforms HDDs by a large margin for small block sizes
    • It is with write performance that Flash SSDs become problematic. The issue here is the internal structure used within the Flash storage array. This structure includes a collection of bytes called an “erase block”. When you write to a Flash SSD, the drive itself cannot just update the sectors you are changing, but must merge your changes with existing data to update a complete erase block. As Flash SSDs have gotten faster and larger, erase blocks have grown as well. Flash erase blocks used to be 16K in length. Now they are 1 Megabyte for small SSDs extending up to as large as 4 Megabytes for some models.
    • If you are doing pure reads, a Flash SSD will typically be 20x faster than a hard disk for small random reads. If you are doing pure random writes, the same drive might be 15x slower than a hard disk
    • Of pertinence is the table which shows how a small % of writes can destroy Flash SSD Performance. It is for this reason alone that Flash SSDs, by themselves, are not very effective with random update applications like on-line databases, mail queues, and other environments that involve a lot of small updates
  • One can improve write performance of a Flash SSD using the following methods -
    • OS Write caching – OS buffers writes which eventually get written to disk making the writes appear faster
    • File systems optimized to minimize random writes – YAFFS, JFFS2.
    • Managed Flash Technology – a patent-pending technology by EasyCo which enables Flash drives to write clusters of random data in linear streams
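Two of the points above lend themselves to back-of-the-envelope arithmetic: how much data gets rewritten when a small update forces a whole erase block to be rewritten, and how quickly a small share of slow writes drags down overall throughput. The sketch below is my own simplified model and is not taken from either paper. It assumes a 4 KB update (the 1 MB erase block is from the text above), assumes operations run strictly one after another with no caching, and reuses the small read / small write figures quoted for HDDs and Flash SSDs.

```python
def write_amplification(update_bytes, erase_block_bytes):
    """Bytes physically rewritten per byte logically updated, assuming the
    whole erase block is rewritten for every small update."""
    return erase_block_bytes / update_bytes


def blended_iops(read_iops, write_iops, write_fraction):
    """Effective IOPS for a read/write mix when operations run serially:
    average the time per operation, then invert."""
    seconds_per_op = (1 - write_fraction) / read_iops + write_fraction / write_iops
    return 1 / seconds_per_op


# (small read IOPS, small write IOPS) as quoted in the findings above.
HDD = (175, 280)
FLASH = (1075, 21)

# A 4 KB update (assumed size) against a 1 MB erase block.
print(write_amplification(4 * 1024, 1024 * 1024))  # 256.0

for pct in (0, 5, 10, 25, 50):
    fraction = pct / 100
    print(pct, round(blended_iops(*HDD, fraction)), round(blended_iops(*FLASH, fraction)))
```

In this model the Flash SSD loses most of its lead by 5% writes and falls behind the HDD at roughly one write in ten, which is in line with the observation that a small % of writes can destroy Flash SSD performance. Real drives with write caches and smarter controllers will not follow this curve exactly.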

Costs

  • As of mid-2008, SSD prices are still considerably higher per gigabyte than are comparable conventional hard drives: consumer grade drives are typically US$2.00 to US$3.45 per GB for flash drives and over US$80.00 per GB for RAM-based compared to about US$0.12 per gigabyte for hard drives
  • DRAM based SSDs require more power than hard disks when operating, and need continuous power when not in use if the data needs to be persistent
  • Check the article Flash vs DRAM Price Projections – for SSD Buyers
    • In the first half of 2007 the difference in user price between a RAM versus Flash SSD was about 45 to 1. A year later in the first half of 2008 that ratio had changed to 25 to 1
    • However NAND has been on a steeper price decline than DRAM for its entire existence. The price of a gigabyte of DRAM declines (on average) 32% per year. There are indications that this decline may slow. Meanwhile, NAND’s price per gigabyte declines faster, at an average of 50% per year
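The average decline rates quoted above compound year over year, which is easy to sketch. The function names below are mine, the US$2.00 starting price is the low end of the flash range quoted earlier, and the projection assumes the average rates simply continue, which the article itself suggests may not hold.

```python
def projected_price(price_now, annual_decline, years):
    """Price per GB after a number of years of constant percentage decline."""
    return price_now * (1 - annual_decline) ** years


def projected_ratio(ratio_now, dram_decline, nand_decline, years):
    """DRAM-to-Flash price ratio if both keep declining at their average rates."""
    return ratio_now * ((1 - dram_decline) / (1 - nand_decline)) ** years


# Flash at US$2.00 per GB, falling 50% per year, two years out.
print(projected_price(2.00, 0.50, 2))  # 0.5

# The 25 to 1 ratio from the first half of 2008, one year on, with DRAM
# falling 32% per year and NAND falling 50% per year.
print(projected_ratio(25, 0.32, 0.50, 1))
```

At these average rates the ratio grows by a factor of 1.36 each year, so the narrowing from 45 to 1 down to 25 to 1 between 2007 and 2008 runs against the long-term trend rather than with it.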

My Conclusions

  • DRAM based SSDs are crazy expensive. They serve best for volatile caches (e.g. memcached pools etc.). If you have servers dedicated to serving in-memory cache data, it may reduce your cost to add DRAM SSDs to these clusters since they are unlikely to bottleneck on CPU anyway
  • Flash based SSDs would work in an environment where the % of writes is low. As can be seen in some of the above benchmarks, a flash based SSD starts degrading in performance in comparison to HDDs in environments with just 5% writes. If one wants to use Flash based SSDs in environments with substantial writes, one should use special filesystems (YAFFS / JFFS2) and/or use Managed Flash Technology
  • Flash based SSDs work like a charm in a read-only or mostly read environment
Category : 0-cosmos | TechTalk