Cassandra Database

By Pushpam Matah May 11, 2018

Apache Cassandra is a strong choice if you need scalability and high availability without compromising performance for your mission-critical applications. Cassandra, in layman’s terms, is a NoSQL database written in Java. One of its many benefits is that it is open source, with deep developer support. It is also fully distributed: there is no master node (unlike a typical Oracle or MySQL deployment), so the database has no single point of failure. It also touts linear scalability, meaning that if 2 nodes give you a throughput of 100,000 transactions per second, adding 2 more nodes should give you roughly 200,000 transactions per second, and so forth.
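
The linear-scaling claim is simple proportional arithmetic, and a minimal Python sketch makes it concrete. The function name `projected_throughput` and its parameters are hypothetical, and the baseline figures are just the ones quoted above. This is the idealised model only; it is not a benchmark of a real cluster.

```python
def projected_throughput(nodes: int,
                         baseline_nodes: int = 2,
                         baseline_tps: int = 100_000) -> int:
    """Idealised linear scaling: throughput grows in proportion to node count.

    The defaults mirror the example in the text (2 nodes, 100,000 TPS).
    Real clusters may fall short of this ideal.
    """
    return baseline_tps * nodes // baseline_nodes


if __name__ == "__main__":
    for n in (2, 4, 8):
        # Each line shows a node count and the idealised throughput for it.
        print(n, projected_throughput(n))
```

Under this model, doubling the node count doubles the projected throughput, which is exactly what "linearly scalable" promises.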

Masterless Architecture

In master-slave databases such as MongoDB or HBase, there is downtime when the master goes down, because we have to wait for the next master to come up. That is not the case in Cassandra. It has no special nodes, i.e. the cluster has no masters, slaves or elected leaders. This makes Cassandra highly available, with no single point of failure, which is why it provides the ‘A’ (availability) in the CAP theorem. It also answers the question of why we need Cassandra in our application: applications that demand zero downtime need a masterless architecture, and that is where Cassandra delivers value. In simple words, any node in the cluster can accept reads and writes at any point in time.
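
To illustrate why no master is needed, here is a toy Python sketch of a hash ring in which every node can work out, on its own, which nodes hold a given key. The class `ToyRing`, the node names, and the use of MD5 are assumptions made for this example; Cassandra's actual partitioner, virtual nodes and replication strategies are more sophisticated.

```python
import bisect
import hashlib


class ToyRing:
    """A toy masterless ring: any node can compute the replicas for a key."""

    def __init__(self, nodes, replication_factor=2):
        self.replication_factor = min(replication_factor, len(nodes))
        # Place every node on the ring at a position derived from its name.
        self.ring = sorted((self._token(n), n) for n in nodes)
        self.tokens = [token for token, _ in self.ring]
        self.up = set(nodes)  # nodes currently considered alive

    @staticmethod
    def _token(value):
        # MD5 is used only as a convenient, deterministic hash for the demo.
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def replicas(self, key):
        # Walk clockwise from the key's position and take the next N nodes.
        start = bisect.bisect_right(self.tokens, self._token(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1]
                for i in range(self.replication_factor)]

    def live_replicas(self, key):
        # A request can still be served as long as one replica is up.
        return [node for node in self.replicas(key) if node in self.up]


if __name__ == "__main__":
    ring = ToyRing(["node-a", "node-b", "node-c"], replication_factor=2)
    owners = ring.replicas("user:42")
    ring.up.discard(owners[0])            # simulate one replica going down
    print(ring.live_replicas("user:42"))  # the other replica still answers
```

Because the placement is a pure function of the key and the node list, there is nothing for a leader to decide, and losing one node leaves the remaining replica able to serve the key.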

Apache Cassandra Query Language

The Cassandra Query Language (CQL) lets you query Cassandra with statements similar to SQL. It was first introduced in Cassandra 0.8 and is the preferred way to communicate with the database. You can use CQL through the CQL shell, cqlsh, to create keyspaces and tables, insert rows, and use the many other features CQL offers. CQL3 also supports JSON, user-defined functions (UDFs), user-defined aggregates (UDAs) and role-based access control (RBAC).

So if your application has a large amount of data and you are planning to scale it, Cassandra is well worth considering. The main difference between a relational database and Cassandra is that the former breaks data into many tables, whereas Cassandra tends to keep as much of it as possible together in the same row to avoid having to join that data at retrieval time.
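
As a rough illustration of both points, the Python sketch below holds a few CQL statements that create a keyspace, create a denormalized table, insert a row and read rows back. The keyspace `demo`, the table `orders_by_customer` and the helper functions are all hypothetical names chosen for this example. The helpers only assume a session object with an `execute(query, parameters)` method, which is the shape the DataStax Python driver's session exposes.

```python
# CQL statements for a denormalized table: the customer's name is stored
# alongside each order, so reading a customer's orders needs no join.
SCHEMA_STATEMENTS = [
    """
    CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
    """,
    """
    CREATE TABLE IF NOT EXISTS demo.orders_by_customer (
        customer_id uuid,
        order_id timeuuid,
        customer_name text,
        item text,
        amount decimal,
        PRIMARY KEY (customer_id, order_id)
    )
    """,
]

INSERT_ORDER = (
    "INSERT INTO demo.orders_by_customer "
    "(customer_id, order_id, customer_name, item, amount) "
    "VALUES (%s, now(), %s, %s, %s)"
)

SELECT_ORDERS = (
    "SELECT order_id, customer_name, item, amount "
    "FROM demo.orders_by_customer WHERE customer_id = %s"
)


def create_schema(session):
    """Run the keyspace and table definitions in order."""
    for statement in SCHEMA_STATEMENTS:
        session.execute(statement)


def add_order(session, customer_id, customer_name, item, amount):
    """Insert one order row; now() generates the timeuuid on the server."""
    session.execute(INSERT_ORDER, (customer_id, customer_name, item, amount))


def orders_for(session, customer_id):
    """Fetch every order for one customer from a single partition."""
    return session.execute(SELECT_ORDERS, (customer_id,))
```

With the DataStax driver installed and a node running locally, a session could be obtained with `Cluster(["127.0.0.1"]).connect()` from `cassandra.cluster` and passed to these helpers. The same statements, with literal values in place of the `%s` placeholders, can also be typed directly into cqlsh.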

Why Use Cassandra?

1) Fast writes: it easily deals with data velocity, data variety, and data complexity.

2) Can handle massive data sets.

3) Homogeneous environment: every node in the cluster plays the same role.

4) Highly fault-tolerant.

5) Already proven in enterprise applications and many other use cases.

6) Ease of administration.

7) Amazing community.