Introduction to Cassandra DB ~ TwinrainbO

Originally developed by Facebook, Cassandra is now an Apache project and at present has incubator status.

From my point of view, Cassandra stands out from the crowd of non-relational databases for three reasons:

1. Reference Sites: Apart from Facebook, Cassandra recently replaced MySQL for parts of Digg. Rackspace are also doing something secret with it; although I don’t know what it is, my guess would be something along the lines of Amazon’s SimpleDB.

2. True peer clustering: Cassandra does not require a central master. A key feature of Cassandra is that you can write to any node in the cluster, at any time. Writes are never blocked. The trade-off is that consistency is eventual: different nodes may briefly return different values for the same key until they catch up with each other. So transactions aren’t strictly ACIDic, but depending on what you are doing, that might not matter at all.

3. Column Querying: BigTable popularised (and proved) the concept of the column database for large scale applications. Cassandra is similar to BigTable from this perspective, but introduces the SuperColumn.
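
To make point 3 a little more concrete, here is a rough picture of the column model using nothing but Python dictionaries. This is only a sketch of the shape of the data, not Cassandra's actual API; the row keys, column names and helper functions are all invented for illustration. The idea is that a column family maps a row key to a set of named columns, and a SuperColumn adds one more level of nesting on top of that.

```python
# Plain-Python picture of the data model; this is NOT Cassandra's Thrift API.
# All keys, values and helper names below are made up for illustration.

# Column family: row key -> {column name -> value}
users = {
    "alice": {"email": "alice@example.com", "city": "London"},
    "bob": {"email": "bob@example.com"},  # rows need not share the same columns
}

# Super column family: row key -> {super column name -> {column name -> value}}
address_book = {
    "alice": {
        "home": {"street": "1 High St", "city": "London"},
        "work": {"street": "2 Office Rd", "city": "Leeds"},
    },
}


def get_column(family, row_key, column):
    """Look up one column in a row of a standard column family."""
    return family[row_key][column]


def get_sub_column(family, row_key, super_column, column):
    """Look up one column nested inside a super column."""
    return family[row_key][super_column][column]
```

With this picture, a SuperColumn is simply a named group of columns inside a row, so `get_sub_column(address_book, "alice", "work", "city")` goes one level deeper than an ordinary column lookup.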

In addition to those things, a couple of other nice features of Cassandra are:

1. It’s JVM based, which makes it nicely portable. It really was a one-minute job to get it up and running on Snow Leopard.

2. Cross-platform API, via a remote Thrift interface.

My intention is to use Cassandra in our current project, where I need a horizontally scalable data store with geographically separate cluster nodes. Fortunately, eventual consistency suits this project very well indeed.
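
For anyone wondering what eventual consistency looks like between geographically separate nodes, here is a toy sketch in Python. It assumes each write carries a timestamp and that the newest timestamp wins when nodes compare notes; the `Replica` class, the `sync` function and the node names are hypothetical and only model the idea, not how Cassandra is implemented.

```python
class Replica:
    """Toy replica holding key -> (timestamp, value). Hypothetical, not Cassandra code."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value, timestamp):
        # Accept the write locally, keeping it only if it is newer than what we hold.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def sync(a, b):
    """Exchange state so both replicas end up with the newest value per key."""
    for key, (ts, value) in list(a.data.items()):
        b.write(key, value, ts)
    for key, (ts, value) in list(b.data.items()):
        a.write(key, value, ts)


london = Replica("london")
sydney = Replica("sydney")

# Each node accepts a write on its own, with no master to coordinate them.
london.write("status", "draft", timestamp=1)
sydney.write("status", "published", timestamp=2)

before = (london.read("status"), sydney.read("status"))  # nodes disagree
sync(london, sydney)
after = (london.read("status"), sydney.read("status"))   # nodes agree
```

Before the sync the two nodes return different answers for the same key; afterwards both hold the write with the later timestamp. That window of disagreement is the price paid for never blocking a write.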

Addendum: Here is the best getting started guide I have found so far.

Addendum 2: Eric Flo also sums up Cassandra nicely, although in a slightly different context.