Cassandra vs MongoDB: The basics

Cassandra and MongoDB are two of the more popular NoSQL databases. I've been using Cassandra extensively over the past 6 months and I've recently started using MongoDB. Here is a brief description of the two. I'll follow up this post with a deeper comparison of the more advanced features.

Cassandra

Cassandra was originally created by Facebook and is written in Java; it is now an Apache project. Traditionally Cassandra can be thought of as a column-orientated database or a row-orientated database, depending on how you use columns. Each row is uniquely identified by a row key, like a primary key in a relational database. Unlike in a relational database, each row can have a different set of columns, and it is common to use both the column name and the column value to store data. Rows are contained by a column family, which can be thought of as a table.

Clients use the Thrift protocol, and queries (shown here in cassandra-cli syntax) look like:

set Person['chbatey']['fname'] = 'Chris Batey';

Here Person is the column family, chbatey is the row key, fname is the column name and 'Chris Batey' is the column value. Column names are dynamic, so a client can store any key/value pairs. In this sense Cassandra is quite schemaless.
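
One way to picture this model is as a map of maps. The sketch below is plain Python, not Cassandra client code; the ColumnFamily class, its set and get methods, and the second row are all made up for illustration and simply mirror the set command above.

```python
class ColumnFamily:
    """Toy model of a column family: row key -> {column name -> column value}."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

    def set(self, row_key, column_name, value):
        # Rows are created on first write and may hold any set of columns.
        self.rows.setdefault(row_key, {})[column_name] = value

    def get(self, row_key):
        # Return the row's columns ordered by column name.
        return dict(sorted(self.rows.get(row_key, {}).items()))


person = ColumnFamily("Person")
person.set("chbatey", "fname", "Chris Batey")

# Hypothetical second row with different columns, where the column name itself carries data.
person.set("jsmith", "email:work", "jsmith@example.com")
person.set("jsmith", "email:home", "john@example.com")
```

The two rows share no columns at all, which is the sense in which the model is schemaless.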

Then came the Cassandra 1.x releases and CQL3. Cassandra Query Language (CQL) is a SQL-like language for Cassandra. Suddenly Cassandra, from a client's perspective, became much more like a relational database. Queries now look like this (assuming a Person table whose primary key column is named id):

insert into Person (id, fname) values ('chbatey', 'Chris Batey');

With CQL3 there are no more dynamic column names, and you create tables rather than column families (although the map type gives you much the same functionality). It is all still column families under the covers; CQL3 is just a very nice abstraction over them.
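
As a rough picture of what that abstraction hides, the sketch below flattens one CQL3-style row, including a map column, into the name/value pairs of the older model. This is a simplification in plain Python, not Cassandra's actual storage code; the to_columns function, the emails column and the ':' separator are assumptions made for illustration.

```python
def to_columns(row):
    """Flatten a CQL3-style row into old-style (column name -> column value) pairs."""
    columns = {}
    for name, value in row.items():
        if isinstance(value, dict):
            # Each map entry becomes its own column, with the map key folded into the column name.
            for key, item in value.items():
                columns[f"{name}:{key}"] = item
        else:
            columns[name] = value
    return columns


# Hypothetical row: one plain column and one map column.
row = {
    "fname": "Chris Batey",
    "emails": {"home": "chris@example.com", "work": "cbatey@example.com"},
}
columns = to_columns(row)
```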

Cassandra also appears to be moving away from Thrift towards its own binary protocol, referred to as the native protocol.

Overall, Cassandra is quite a "rough around the edges" database to use from a client perspective (less so with CQL3). Its real power comes from its horizontal scalability and tuneable eventual consistency. More on this in a future post.

MongoDB

MongoDB is a document database written in C++. Document databases are very intuitive, as you simply store and retrieve documents! There is no crazy data model to learn: with MongoDB you store and retrieve JSON objects, which it holds internally as BSON.

Storing looks like this:

db.people.save({_id: 'chbatey', fname:'Chris Batey'})

Retrieving looks like this:

db.people.find({_id: 'chbatey'})

Simple!
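
To make the semantics concrete, here is a toy version of those two calls in plain Python. It is not the MongoDB driver: the Collection class is hypothetical, it only supports equality matching, and unlike MongoDB it expects every document to arrive with an _id already set.

```python
import copy


class Collection:
    """Toy in-memory collection keyed by _id (illustrative only)."""

    def __init__(self):
        self.docs = {}

    def save(self, doc):
        # Insert the document, or replace an existing one with the same _id.
        self.docs[doc["_id"]] = copy.deepcopy(doc)

    def find(self, query):
        # Return every document whose fields equal all the fields in the query.
        return [
            copy.deepcopy(doc)
            for doc in self.docs.values()
            if all(doc.get(field) == value for field, value in query.items())
        ]


people = Collection()
people.save({"_id": "chbatey", "fname": "Chris Batey"})
found = people.find({"_id": "chbatey"})
```

The query is itself a document, which is why the same find call works for any field and not just _id.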

MongoDB has a very rich JSON-based query language and a fantastic aggregation framework. From a client's perspective MongoDB is a far more fully featured database, with support for ad-hoc querying (with Cassandra you must index everything you want to search by).
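
The difference can be sketched in a few lines of plain Python: an ad-hoc query may filter on any field, while a lookup that relies on an index only works for fields indexed ahead of time. The function names and sample data are hypothetical, and neither function reflects how either database is actually implemented.

```python
from collections import defaultdict

# Made-up sample data.
people = [
    {"_id": "chbatey", "fname": "Chris Batey", "city": "London"},
    {"_id": "jsmith", "fname": "John Smith", "city": "Leeds"},
]


def adhoc_find(docs, field, value):
    # Ad-hoc: any field can be queried; in this sketch that is a scan over every document.
    return [d["_id"] for d in docs if d.get(field) == value]


def build_index(docs, field):
    # Index: built ahead of time for one chosen field.
    index = defaultdict(list)
    for d in docs:
        index[d[field]].append(d["_id"])
    return index


def indexed_find(indexes, field, value):
    # Only fields that already have an index can be searched.
    if field not in indexes:
        raise KeyError(f"no index on {field!r}")
    return indexes[field].get(value, [])


indexes = {"city": build_index(people, "city")}
```

Here indexed_find(indexes, "city", "London") succeeds, but searching by fname fails until someone builds an index for it, whereas adhoc_find accepts either field.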

Conclusion

This post was a very brief description of Cassandra and MongoDB. In future posts I will compare:
  • Fault tolerance - replication
  • Read and write consistency
  • Clients
  • Hadoop support
For Cassandra in particular, the layout of your data centres and cluster largely determines which read and write consistency levels you need to get the desired behaviour.
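
As a small taste of why this matters, a commonly used rule of thumb is that a read is guaranteed to see the latest acknowledged write when the number of replicas read plus the number of replicas written exceeds the replication factor. The sketch below is plain Python arithmetic for a single data centre; the function names are made up for illustration.

```python
def quorum(replication_factor):
    # A quorum is a majority of the replicas.
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas, write_replicas, replication_factor):
    # The replicas read and the replicas written must overlap in at least one node.
    return read_replicas + write_replicas > replication_factor


rf = 3
quorum_both = reads_see_latest_write(quorum(rf), quorum(rf), rf)  # QUORUM reads and writes
one_both = reads_see_latest_write(1, 1, rf)                       # ONE reads and writes
```

With a replication factor of 3, quorum reads and writes overlap, while reading and writing at ONE does not.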