Thursday, August 18, 2011


I have started learning NoSQL. In this post and in future articles, I will write up what I learn along the way.
As far as the web tier is concerned, a modern web application is expected to support millions of concurrent users by balancing the load across a cluster of application servers behind a load balancer. Upgrades can be rolled out incrementally, without application downtime, by gradually replacing the software on individual servers.

For example, Facebook slowly dials up new functionality by rolling out new software to a subset of its application server tier in a stepwise manner. If any issue comes up, servers can be quickly reverted to the previous known-good build. All this can be done without ever taking the application “offline.”
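To make the stepwise rollout idea concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the function name `rolling_upgrade`, the `healthy` callback and the server names are all made up, and a real deployment system would also drain traffic from each server before touching it.

```python
from typing import Callable, Dict


def rolling_upgrade(
    servers: Dict[str, str],
    new_build: str,
    healthy: Callable[[str, str], bool],
    batch_size: int = 2,
) -> bool:
    """Upgrade servers batch by batch; revert everything on the first failure.

    `servers` maps a (hypothetical) server name to the build it is running.
    `healthy(name, build)` is a stand-in for a real health check.
    """
    previous = dict(servers)   # remember the known-good build per server
    names = list(servers)
    upgraded = []

    for start in range(0, len(names), batch_size):
        batch = names[start:start + batch_size]
        for name in batch:
            servers[name] = new_build
            upgraded.append(name)

        # Only this batch is new; the other servers keep serving traffic.
        if not all(healthy(name, new_build) for name in batch):
            for name in upgraded:
                servers[name] = previous[name]   # back to the known-good build
            return False

    return True


if __name__ == "__main__":
    fleet = {"app1": "v1", "app2": "v1", "app3": "v1", "app4": "v1"}
    ok = rolling_upgrade(fleet, "v2", lambda name, build: True)
    print(ok, fleet)
```

Because only one batch changes at a time, the rest of the cluster keeps answering requests, which is what lets the application stay online during the upgrade.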

Let's compare the user base, applications and infrastructure of the past with those of today.
In the past, an application supported roughly 2,000 users at most, and the number of users was static. CPU, memory and disk were very expensive, networks were slow, computing was typically centralized (mainframe), and data was highly structured.

Now the user base can grow to as many as two billion (perhaps double that if mobile users are included). The application must run 24/7 under unpredictable load, on low-cost hardware, with distributed computing, high-speed networks, and a mix of structured, semi-structured and unstructured data. Almost everything has changed except database technology.

When demand increases, the application tier can handle it easily by adding more application servers. Database technology, however, does not scale out so readily, even though a few techniques such as sharding exist.

As the user base grows, response time gets worse and scaling becomes a problem as far as the database is concerned, because the typical scaling technique is vertical scaling (moving to a bigger server).
To address the shortcomings of RDBMS technology when used behind modern software systems, developers have adopted a number of “band-aid” tactics such as sharding, denormalizing and distributed caching.
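As a rough illustration of why hand-rolled sharding is a band-aid, here is a small Python sketch of the simplest scheme I can think of: hash the key and take it modulo the number of database servers. The names `shard_for` and `moved_keys` and the `user-N` keys are my own inventions, not taken from any product.

```python
import hashlib
from typing import Iterable


def shard_for(user_id: str, shard_count: int) -> int:
    """Pick a shard with a stable hash (Python's built-in hash() is salted per run)."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def moved_keys(user_ids: Iterable[str], old_count: int, new_count: int) -> int:
    """Count the keys that land on a different shard after resizing."""
    return sum(
        1 for uid in user_ids
        if shard_for(uid, old_count) != shard_for(uid, new_count)
    )


if __name__ == "__main__":
    user_ids = [f"user-{i}" for i in range(1000)]
    moved = moved_keys(user_ids, 4, 5)
    print(f"{moved} of {len(user_ids)} keys change shard when going from 4 to 5 shards")
```

With this scheme the application has to know about the shards, and adding a single database server forces most of the keys to move, so growing the data tier becomes a migration project rather than a routine operation.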

In response to this problem, Google (Bigtable), Amazon (Dynamo) and other internet-scale companies were forced to invent new approaches to data management.

The resulting NoSQL databases typically share the following characteristics:
1) No schema is required. A non-relational database (for example, document-oriented databases, key-value stores, Bigtable clones and graph databases) is simply any type of database that doesn't follow the relational model. Data can be inserted in a NoSQL database without first defining a rigid database schema. The format of the data being inserted can be changed at any time, without application disruption. This provides immense application flexibility, which ultimately delivers substantial business flexibility.
2) Auto-sharding (elasticity). A NoSQL database automatically spreads data across servers, without requiring applications to participate. Servers can be added or removed from the data layer without application downtime, with data (and I/O) automatically spread across the servers. Most NoSQL databases also support data replication, storing multiple copies of data across the cluster, and even across data centers, to ensure high availability and support disaster recovery. A properly managed NoSQL database system should never need to be taken offline, for any reason, supporting 24x7x365 continuous operation of applications.
3) Integrated caching. To reduce latency and increase sustained data throughput, advanced NoSQL database technologies transparently cache data in system memory. This behavior is transparent to the application developer and the operations team, in contrast to RDBMS technology, where caching is usually a separate infrastructure tier that must be developed against, deployed on separate servers, and explicitly managed by the ops team.
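To get a feel for points 1 and 2, I put together a toy in-memory "cluster" in Python. It is purely a sketch under my own assumptions: `ToyCluster` is a made-up class, not the client API of any real NoSQL product, and it uses consistent hashing as one possible way to spread keys across servers (real products may do this differently). Documents are plain dictionaries of any shape, and the caller never says which node a document lives on.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List, Optional, Tuple


def _hash(value: str) -> int:
    """Stable hash so that key placement is the same on every run."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class ToyCluster:
    """Toy document store that spreads keys over nodes with a hash ring."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 64) -> None:
        self.vnodes = vnodes                      # ring points per node
        self.ring: List[Tuple[int, str]] = []     # sorted (hash, node name)
        self.data: Dict[str, Dict[str, dict]] = {}
        for node in nodes:
            self.add_node(node)

    def _owner(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self.ring, (_hash(key), "")) % len(self.ring)
        return self.ring[idx][1]

    def add_node(self, node: str) -> None:
        self.data[node] = {}
        for i in range(self.vnodes):
            bisect.insort(self.ring, (_hash(f"{node}#{i}"), node))
        # Rebalance: only keys now owned by the new node have to move.
        for other, docs in self.data.items():
            if other == node:
                continue
            for key in [k for k in docs if self._owner(k) == node]:
                self.data[node][key] = docs.pop(key)

    def put(self, key: str, document: dict) -> None:
        self.data[self._owner(key)][key] = document

    def get(self, key: str) -> Optional[dict]:
        return self.data[self._owner(key)].get(key)


if __name__ == "__main__":
    cluster = ToyCluster(["node-a", "node-b"])
    # No schema: the two documents have different fields.
    cluster.put("user:1", {"name": "Asha", "email": "asha@example.com"})
    cluster.put("user:2", {"name": "Ravi", "phones": ["555-0100"], "age": 31})
    cluster.add_node("node-c")   # grow the data layer; callers are unaffected
    print(cluster.get("user:1"), cluster.get("user:2"))
```

The application only ever calls `put` and `get`. Adding `node-c` moves just the keys that the new node takes over, which is the kind of behaviour the auto-sharding point describes. The sketch leaves out replication and caching entirely.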

A number of commercial and open source database technologies such as Couchbase (a database combining the leading NoSQL data management technologies CouchDB, Membase and Memcached), MongoDB, Cassandra, Riak and others are now available and increasingly represent the “go to” data management technology behind new Web applications.
