Sunday, August 25, 2019

Cassandra vs MongoDB

This article differentiates between two of the most prominent NoSQL database products on the market today: Cassandra and MongoDB. The two may look similar at first glance, but they differ in many respects, and we will walk through those differences below.



Understanding Cassandra and MongoDB

Cassandra was created by a couple of developers at Facebook and released as an open source project in 2008. It is now backed by the Apache Software Foundation, which maintains the project and drives further enhancements. Support for the database comes from third-party companies such as Impetus, DataStax, and URImagination. Cassandra is used by organizations such as Facebook, Instagram, IBM, Reddit, and Netflix.
MongoDB was introduced in 2009 by a company named 10gen, which was later renamed MongoDB Inc. The company now looks after development of the software and also sells the enterprise version of the database. MongoDB Inc. handles support itself, with enterprise-grade support available around the clock. Its support is offered for the lifetime of the product, which means customers can run any version of MongoDB and still be supported whenever they choose to upgrade. This also lets them stay in sync with the security fixes the company releases. MongoDB is used by large organizations such as Google, Adobe, Forbes, eBay, Cisco, and many more.

Similarities between Cassandra and MongoDB

With that background on the two NoSQL databases, let us look at some of their similarities:
· Both are NoSQL databases
· Neither is a replacement for traditional RDBMS databases
· Neither was designed as a fully ACID-compliant database in the way a traditional RDBMS is
· Neither enforces the strict consistency and normalization that RDBMS databases are built around

Cassandra Vs. MongoDB

In this section, we will take a look at the differences between Cassandra and MongoDB.
For each feature below, Cassandra is described first and MongoDB second.
Data Model
Cassandra: Cassandra has a more traditional data model with rows and columns. Data is structured, and each column has a specific type that is assigned when the table is created.
MongoDB: MongoDB provides a richer data model than Cassandra. It is document-oriented (object-oriented), so a record can be represented using whatever data structures suit the user's domain, and data can be nested to multiple levels if needed.
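
To make the contrast between typed columns and nested documents concrete, here is a minimal sketch in plain Python. It does not talk to either database: the `users` table, its columns, and the helper `row_matches_schema` are hypothetical names chosen for illustration, and the type map is only a rough stand-in for how Cassandra fixes column types at table creation.

```python
# Cassandra-style: a flat row whose column types are fixed when the table is created.
# The table and column names here are hypothetical.
CREATE_USERS_CQL = """
CREATE TABLE users (
    user_id uuid PRIMARY KEY,
    name    text,
    age     int
);
"""

# Rough Python stand-in for the column types declared in the CQL above.
USERS_COLUMN_TYPES = {"user_id": str, "name": str, "age": int}


def row_matches_schema(row, column_types):
    """Return True if every column in the row is declared and has the declared type."""
    return all(
        col in column_types and isinstance(val, column_types[col])
        for col, val in row.items()
    )


# MongoDB-style: a single document can nest sub-documents and arrays to any depth.
user_document = {
    "name": "Ada",
    "age": 36,
    "address": {"city": "London", "geo": {"lat": 51.5, "lon": -0.12}},
    "tags": ["admin", "beta"],
}

flat_row = {"user_id": "u-1", "name": "Ada", "age": 36}
print(row_matches_schema(flat_row, USERS_COLUMN_TYPES))       # True
print(row_matches_schema({"age": "36"}, USERS_COLUMN_TYPES))  # False: age must be an int
print(user_document["address"]["geo"]["lat"])                 # 51.5
```

The flat row only fits if each value has the declared type, while the document carries its own structure and can be nested as deeply as the domain requires.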
Master Node
Cassandra: Cassandra has multiple master nodes in a cluster (in practice, every node is a peer that can accept requests), so if one node goes down, another takes its place. As a result, the cluster is unaffected and remains available, which gives Cassandra higher availability than MongoDB.
MongoDB: MongoDB has only one master (primary) node in a cluster, which controls a set of slave (secondary) nodes. If the master goes down, a slave is elected as the new master, which can take roughly 20-30 seconds. During this window, the cluster cannot accept incoming writes.
Secondary Indices
Cassandra: Cassandra's support for secondary indexes is limited: an index covers only a single column and supports only equality comparisons.
MongoDB: Any property stored in a MongoDB document can be indexed easily. MongoDB is the better choice if your application requires secondary indices along with flexibility in the data model.
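
The "single column, equality only" restriction is easier to picture with a toy index. The sketch below is plain Python and is not how Cassandra implements its indexes; `build_equality_index` and the sample rows are hypothetical. It only shows why such an index can answer "city equals X" quickly but offers no help for range queries or for other columns.

```python
from collections import defaultdict


def build_equality_index(rows, column):
    """Map each value of one column to the ids of the rows holding that value."""
    index = defaultdict(list)
    for row_id, row in rows.items():
        index[row[column]].append(row_id)
    return index


rows = {
    1: {"name": "Ada", "city": "London"},
    2: {"name": "Bob", "city": "Paris"},
    3: {"name": "Cy", "city": "London"},
}

city_index = build_equality_index(rows, "city")

# Equality lookup on the indexed column is a direct dictionary access.
print(city_index["London"])          # [1, 3]
# A value that was never indexed simply has no entry.
print(city_index.get("Berlin", []))  # []
```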
Scalability
Cassandra: Because a Cassandra cluster can have multiple master nodes, all of which can accept requests, it is more scalable than MongoDB.
MongoDB: MongoDB has only one master node in the cluster at any given time, and it is the only node that serves incoming writes. That makes MongoDB less ideal where scalability is the main concern.
Query Language
Cassandra: Cassandra has its own query language, CQL (Cassandra Query Language), which is very similar to SQL. Its queries are user-friendly and easy to pick up for developers who already know SQL.
MongoDB: MongoDB has no SQL-like query language. Instead, queries are structured as JSON fragments.
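
The following sketch places the two query styles side by side. The CQL text is only a string here, and whether Cassandra would accept it depends on the table's keys and indexes. The filter uses MongoDB's real `$gt`, `$lt`, and `$eq` operators, but the `matches` function is a hypothetical toy evaluator written for this example, not MongoDB's query engine. With a real server, a filter like this would typically be passed to a driver call such as pymongo's `find`.

```python
# CQL reads much like SQL (table and column names are hypothetical).
cql_query = "SELECT name FROM users WHERE city = 'London' AND age > 30;"

# A MongoDB-style filter states a similar condition as a JSON-like document.
mongo_filter = {"city": "London", "age": {"$gt": 30}}

# Only a small subset of operators is covered in this toy evaluator.
OPERATORS = {
    "$gt": lambda value, bound: value > bound,
    "$lt": lambda value, bound: value < bound,
    "$eq": lambda value, bound: value == bound,
}


def matches(document, query_filter):
    """Return True if the document satisfies every field condition in the filter."""
    for field, condition in query_filter.items():
        if field not in document:
            return False
        value = document[field]
        if isinstance(condition, dict):
            # Operator form, e.g. {"$gt": 30}: every operator must hold.
            if not all(OPERATORS[op](value, bound) for op, bound in condition.items()):
                return False
        elif value != condition:
            # Plain form, e.g. {"city": "London"}: equality.
            return False
    return True


people = [
    {"name": "Ada", "city": "London", "age": 36},
    {"name": "Bob", "city": "London", "age": 25},
    {"name": "Cy", "city": "Paris", "age": 41},
]

selected = [p["name"] for p in people if matches(p, mongo_filter)]
print(selected)  # ['Ada']
```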
Aggregation
Cassandra: Cassandra has no built-in aggregation framework and relies heavily on tools like Hadoop or Apache Spark.
MongoDB: MongoDB has a built-in aggregation framework, which can be used to run an ETL pipeline that transforms data into the required form. The framework handles small and medium data volumes well, but as pipelines grow more complex they become harder to debug. With this built-in support, MongoDB fares better than Cassandra here.
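
As a rough illustration of what an aggregation pipeline looks like, the sketch below defines a two-stage pipeline in the shape MongoDB expects (`$match` followed by `$group` with `$sum`) and runs it through a tiny interpreter written in plain Python. The `run_pipeline` function, the `orders` data, and the field names are all hypothetical. With a real server, the same pipeline list would typically be handed to a driver call such as pymongo's `aggregate`.

```python
# A pipeline is an ordered list of stages; each stage transforms the documents
# produced by the previous one.
pipeline = [
    {"$match": {"status": "shipped"}},
    {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}},
]

orders = [
    {"customer": "acme", "status": "shipped", "amount": 120},
    {"customer": "acme", "status": "pending", "amount": 80},
    {"customer": "globex", "status": "shipped", "amount": 50},
    {"customer": "acme", "status": "shipped", "amount": 30},
]


def run_pipeline(docs, stages):
    """Toy interpreter: supports $match (equality only) and $group with $sum."""
    for stage in stages:
        if "$match" in stage:
            cond = stage["$match"]
            docs = [d for d in docs if all(d.get(k) == v for k, v in cond.items())]
        elif "$group" in stage:
            spec = stage["$group"]
            key_field = spec["_id"][1:]           # "$customer" -> "customer"
            groups = {}
            for d in docs:
                out = groups.setdefault(d[key_field], {"_id": d[key_field]})
                for name, accumulator in spec.items():
                    if name == "_id":
                        continue
                    source = accumulator["$sum"][1:]  # "$amount" -> "amount"
                    out[name] = out.get(name, 0) + d[source]
            docs = list(groups.values())
        else:
            raise ValueError(f"unsupported stage: {stage}")
    return docs


print(run_pipeline(orders, pipeline))
# [{'_id': 'acme', 'total': 150}, {'_id': 'globex', 'total': 50}]
```

Each added stage is another transformation layered on the previous output, which is also why long pipelines become harder to debug.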
Schema
Cassandra: Cassandra uses a fixed, statically typed schema: columns and their types are declared up front, and changing them requires an explicit schema change.
MongoDB: MongoDB lets users alter the schema freely.
Performance
Cassandra: Cassandra performs better in applications with heavy data load, as it can spread that load across multiple master nodes in a cluster.
MongoDB: MongoDB is less suited to applications with heavy data load, as its performance does not scale as well.
 

With those details covered, let us summarize which database fares better on each aspect. The following summary marks only the database that comes out ahead for each feature.
Feature           | Cassandra | MongoDB
------------------|-----------|--------
Data Model        |           | Better
Master Nodes      | Better    |
Secondary Indices |           | Better
Scalability       | Better    |
Query Language    | Better    |
Aggregation       |           | Better
Schema            |           | Better
Performance       | Better    |

Conclusion

In this article, we looked at two NoSQL databases available in the current market, covered the background of each, and went through the similarities between them. We also took a detailed look at the differences between the two products and at where each one is best put to use.