Introduction to Cassandra — Summary of the documentation

Writing this post to kickstart my CS4224 Project

Architecture

  • Apache Cassandra is an open source, distributed, NoSQL database. It is a partitioned wide column storage model with eventually consistent semantics.
  • Systems like Cassandra are designed for these challenges and seek the following design objectives:

Full multi-master database replication
Global availability at low latency
Scaling out on commodity hardware
Linear throughput increase with each additional processor
Online load balancing and cluster growth
Partitioned key-oriented queries
Flexible schema

  • CQL is an SQL-like language to create and update database schema and access data.
  • Cassandra does not support operations that require cross partition coordination such as cross partition transactions, distributed joins and foreign keys or referential integrity.
  • Cassandra has 2 clustering components: (1) Request coordination over a partitioned dataset (2) Ring membership and failure detection, while using a storage engine based on a Log Structured Merge Tree.
  • Cassandra uses consistent hashing to partition its data stored in the system.
  • Multi-master Replication: Cassandra replicates every partition of data to many nodes across the cluster to maintain high availability and durability.
  • Gossip: Cassandra protocol to propagate basic cluster bootstrapping information such as endpoint membership and internode network protocol versions. In Cassandra’s gossip system, nodes exchange state information not only about themselves but also about other nodes they know about
  • As replicas in Cassandra can accept mutations independently, it is possible for some replicas to have newer data than others.
  • Write operations are always sent to all replicas, regardless of consistency level. The consistency level simply controls how many responses the coordinator waits for before responding to the client.
  • For read operations, the coordinator generally only issues read commands to enough replicas to satisfy the consistency level. The one exception to this is when speculative retry may issue a redundant read request to an extra replica if the original replicas have not responded within a specified time window.

The Cassandra Query Language

Definitions

  • Conventions
  • Square brackets [item] for optional items
  • * and + for repeated items
  • Identifiers (case insensitive)
  • SELECT
  • WITH
  • Quoted Identifier (Case sensitive)
  • Allows to declare columns with arbitrary names
  • Don’t name your column applied
  • Constants
  • A string constant is an arbitrary sequence of characters enclosed by single quote.
  • Or $$ XX(any character)XX $$
  • Comments
  • A line beginning by double dashes (- -) or double slash (//)

Data Types

  • Native Data types such as BOOLEAN, DOUBLE, DECIMAL
  • CQL supports 3 kind of collections: Maps, Sets and Lists. Collections are meant for storing/denormalizing relatively small amount of data. They work well for things like “the phone numbers of a given user”, “labels applied to an email”, etc. But when items are expected to grow unbounded (“all messages sent by a user”, “events registered by a sensor”…), then collections are not appropriate and a specific table (with clustering columns) should be used.
  • User Defined Types: Such a type can be created, modified and removed using the create_type_statement, alter_type_statement and drop_type_statement described below. But once created, a UDT is simply referred to by its name:
CREATE TYPE phone (
country_code int,
number text,
)

CREATE TYPE address (
street text,
city text,
zip text,
phones map<text, phone>
)

CREATE TABLE user (
name text PRIMARY KEY,
addresses map<text, frozen<address>>
)
  • Attempting to create an already existing type will result in an error unless the IF NOT EXISTS option is used.

Data Definition

  • CQL stores data in tables, whose schema defines the layout of said data in the table, and those tables are grouped in keyspaces. A keyspace defines a number of options that applies to all the tables it contains, most prominently of which is the replication strategy used by the keyspace. It is generally encouraged to use one keyspace by application, and thus many cluster may define only one keyspace.
  • Data -> Table -> Keyspaces
  • By default, keyspace and table names are case insensitive (myTable is equivalent to mytable) but case sensitivity can be forced by using double-quotes ("myTable" is different from mytable).
CREATE KEYSPACE excelsior
WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 3};

CREATE KEYSPACE excalibur
WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1' : 1, 'DC2' : 3}
AND durable_writes = false;
  • Create Table
CREATE TABLE monkeySpecies (
species text PRIMARY KEY,
common_name text,
population varint,
average_size int
) WITH comment='Important biological records';

CREATE TABLE timeline (
userid uuid,
posted_month int,
posted_time uuid,
body text,
posted_by text,
PRIMARY KEY (userid, posted_month, posted_time)
) WITH compaction = { 'class' : 'LeveledCompactionStrategy' };

CREATE TABLE loads (
machine inet,
cpu int,
mtime timeuuid,
load float,
PRIMARY KEY ((machine, cpu), mtime)
) WITH CLUSTERING ORDER BY (mtime DESC);
  • Some columns can be declared as STATIC in a table definition. A column that is static will be “shared” by all the rows belonging to the same partition (having the same partition key).
  • All table must define a PRIMARY KEY (and only one).PRIMARY KEY definition is composed of one or more of the columns defined in the table. Syntactically, the primary key is defined the keywords PRIMARY KEY followed by comma-separated list of the column names composing it within parenthesis, but if the primary key has only one column, one can alternatively follow that column definition by the PRIMARY KEY keywords. The order of the columns in the primary key definition matter.
  • A CQL primary key is composed of 2 parts:
  • partition key part. It is the first component of the primary key definition. It can be a single column or, using additional parenthesis, can be multiple columns. A table always have at least a partition key, the smallest possible table definition is: CREATE TABLE t (k text PRIMARY KEY);
  • the clustering columns. Those are the columns after the first component of the primary key definition, and the order of those columns define the clustering order. Clustering key are used for sorting within a partition.
  • Some example of primary key definition are:

PRIMARY KEY (a): a is the partition key and there is no clustering columns.

PRIMARY KEY (a, b, c) : a is the partition key and b and c are the clustering columns.

PRIMARY KEY ((a, b), c) : a and b compose the partition key (this is often called a composite partition key) and c is the clustering column.

  • Within a table, CQL defines the notion of a partition. A partition is simply the set of rows that share the same value for their partition key.
  • Note that a table always has a partition key, and that if the table has no clustering columns, then every partition of that table is only comprised of a single row (since the primary key uniquely identifies rows and the primary key is equal to the partition key if there is no clustering columns).
  • The most important property of partition is that all the rows belonging to the same partition are guarantee to be stored on the same set of replica nodes. In other words, the partition key of a table defines which of the rows will be localized together in the Cluster, and it is thus important to choose your partition key wisely so that rows that needs to be fetch together are in the same partition (so that querying those rows together require contacting a minimum of nodes).

Data manipulation

SELECT name, occupation FROM users WHERE userid IN (199, 200, 207);
SELECT JSON name, occupation FROM users WHERE userid = 199;
SELECT name AS user_name, occupation AS user_occupation FROM users;

SELECT time, value
FROM events
WHERE event_type = 'myEvent'
AND time > '2011-02-03'
AND time <= '2012-01-01'

SELECT COUNT (*) AS user_count FROM users;
  • Not all relations are allowed in a query. For instance, non-equal relations (where IN is considered as an equal relation) on a partition key are not supported. Moreover, for a given partition key, the clustering columns induce an ordering of rows and relations on them is restricted to the relations that allow to select a contiguous (for the ordering) set of rows. For instance, given:
CREATE TABLE posts (
userid text,
blog_title text,
posted_at timestamp,
entry_title text,
content text,
category int,
PRIMARY KEY (userid, blog_title, posted_at)
)
  • The following query is allowed:
SELECT entry_title, content FROM posts
WHERE userid = 'john doe'
AND blog_title='John''s Blog'
AND posted_at >= '2012-01-01' AND posted_at < '2012-01-31'
  • But the following one is not, as it does not select a contiguous set of rows (and we suppose no secondary indexes are set):
// Needs a blog_title to be set to select ranges of posted_at
SELECT entry_title, content FROM posts
WHERE userid = 'john doe'
AND posted_at >= '2012-01-01' AND posted_at < '2012-01-31'
  • Groupby, Limit…..
  • Insert
  • Update
  • Delete
  • Batch
  • The BATCH statement group multiple modification statements (insertions/updates and deletions) into a single statement. It serves several purposes:
  • It saves network round-trips between the client and the server (and sometimes between the server coordinator and the replicas) when batching multiple updates.
  • All updates in a BATCH belonging to a given partition key are performed in isolation.
  • By default, all operations in the batch are performed as logged, to ensure all mutations eventually complete (or none will). See the notes on UNLOGGED batches for more details.

Secondary Indexes

CREATE INDEX userIndex ON NerdMovies (user);
CREATE INDEX ON Mutants (abilityId);
CREATE INDEX ON users (keys(favs));
CREATE CUSTOM INDEX ON users (email) USING 'path.to.the.IndexClass';
CREATE CUSTOM INDEX ON users (email) USING 'path.to.the.IndexClass' WITH OPTIONS = {'storage': '/mnt/ssd/indexes/'};
  • The CREATE INDEX statement is used to create a new (automatic) secondary index for a given (existing) column in a given table. A name for the index itself can be specified before the ON keyword, if desired. If data already exists for the column, it will be indexed asynchronously. After the index is created, new data for the column is indexed automatically at insertion time.

Materialised Views

Security

Functions

  • User-defined functions allow execution of user-provided code in Cassandra. By default, Cassandra supports defining functions in Java and JavaScript.

JSON Support

Triggers

Data Modeling

  • Data modeling is the process of identifying entities and their relationships. In relational databases, data is placed in normalized tables with foreign keys used to reference related data in other tables. Queries that the application will make are driven by the structure of the tables and related data are queried as table joins.
  • In Cassandra, data modeling is query-driven. The data access patterns and application queries determine the structure and organization of data which then used to design the database tables.
  • Unlike a relational database model in which queries make use of table joins to get data from multiple tables, joins are not supported in Cassandra so all required fields (columns) must be grouped together in a single table.
  • Apache Cassandra is a distributed database that stores data across a cluster of nodes. A partition key is used to partition data among the nodes. Cassandra partitions data over the storage nodes using a variant of consistent hashing for data distribution.
  • After the conceptual data model has been created, a schema may be designed for a query.
CREATE TABLE magazine_name (id int PRIMARY KEY, name text, publicationFrequency text)
  • For Q2 the schema definition would include a clustering key for sorting.
CREATE TABLE magazine_publisher (publisher text,id int,name text, publicationFrequency text,
PRIMARY KEY (publisher, id)) WITH CLUSTERING ORDER BY (id DESC)

Design Differences Between RDBMS and Cassandra

Let’s take a minute to highlight some of the key differences in doing data modeling for Cassandra versus a relational database.

No joins

You cannot perform joins in Cassandra. If you have designed a data model and find that you need something like a join, you’ll have to either do the work on the client side, or create a denormalized second table that represents the join results for you. This latter option is preferred in Cassandra data modeling. Performing joins on the client should be a very rare case; you really want to duplicate (denormalize) the data instead.

No referential integrity

Although Cassandra supports features such as lightweight transactions and batches, Cassandra itself has no concept of referential integrity across tables. In a relational database, you could specify foreign keys in a table to reference the primary key of a record in another table. But Cassandra does not enforce this. It is still a common design requirement to store IDs related to other entities in your tables, but operations such as cascading deletes are not available.

Denormalization

In relational database design, you are often taught the importance of normalization. This is not an advantage when working with Cassandra because it performs best when the data model is denormalized. It is often the case that companies end up denormalizing data in relational databases as well. There are two common reasons for this. One is performance. Companies simply can’t get the performance they need when they have to do so many joins on years’ worth of data, so they denormalize along the lines of known queries. This ends up working, but goes against the grain of how relational databases are intended to be designed, and ultimately makes one question whether using a relational database is the best approach in these circumstances.

A second reason that relational databases get denormalized on purpose is a business document structure that requires retention. That is, you have an enclosing table that refers to a lot of external tables whose data could change over time, but you need to preserve the enclosing document as a snapshot in history. The common example here is with invoices. You already have customer and product tables, and you’d think that you could just make an invoice that refers to those tables. But this should never be done in practice. Customer or price information could change, and then you would lose the integrity of the invoice document as it was on the invoice date, which could violate audits, reports, or laws, and cause other problems.

In the relational world, denormalization violates Codd’s normal forms, and you try to avoid it. But in Cassandra, denormalization is, well, perfectly normal. It’s not required if your data model is simple. But don’t be afraid of it.

Historically, denormalization in Cassandra has required designing and managing multiple tables using techniques described in this documentation. Beginning with the 3.0 release, Cassandra provides a feature known as materialized views which allows you to create multiple denormalized views of data based on a base table design. Cassandra manages materialized views on the server, including the work of keeping the views in sync with the table.

Read for more on normalisation

Query-first design

Relational modeling, in simple terms, means that you start from the conceptual domain and then represent the nouns in the domain in tables. You then assign primary keys and foreign keys to model relationships. When you have a many-to-many relationship, you create the join tables that represent just those keys. The join tables don’t exist in the real world, and are a necessary side effect of the way relational models work. After you have all your tables laid out, you can start writing queries that pull together disparate data using the relationships defined by the keys. The queries in the relational world are very much secondary. It is assumed that you can always get the data you want as long as you have your tables modeled properly. Even if you have to use several complex subqueries or join statements, this is usually true.

By contrast, in Cassandra you don’t start with the data model; you start with the query model. Instead of modeling the data first and then writing queries, with Cassandra you model the queries and let the data be organized around them. Think of the most common query paths your application will use, and then create the tables that you need to support them.

Detractors have suggested that designing the queries first is overly constraining on application design, not to mention database modeling. But it is perfectly reasonable to expect that you should think hard about the queries in your application, just as you would, presumably, think hard about your relational domain. You may get it wrong, and then you’ll have problems in either world. Or your query needs might change over time, and then you’ll have to work to update your data set. But this is no different from defining the wrong tables, or needing additional tables, in an RDBMS.

Designing for optimal storage

In a relational database, it is frequently transparent to the user how tables are stored on disk, and it is rare to hear of recommendations about data modeling based on how the RDBMS might store tables on disk. However, that is an important consideration in Cassandra. Because Cassandra tables are each stored in separate files on disk, it’s important to keep related columns defined together in the same table.

A key goal that you will see as you begin creating data models in Cassandra is to minimize the number of partitions that must be searched in order to satisfy a given query. Because the partition is a unit of storage that does not get divided across nodes, a query that searches a single partition will typically yield the best performance.

Sorting is a design decision

In an RDBMS, you can easily change the order in which records are returned to you by using ORDER BY in your query. The default sort order is not configurable; by default, records are returned in the order in which they are written. If you want to change the order, you just modify your query, and you can sort by any list of columns.

In Cassandra, however, sorting is treated differently; it is a design decision. The sort order available on queries is fixed, and is determined entirely by the selection of clustering columns you supply in the CREATE TABLE command. The CQL SELECT statement does support ORDER BY semantics, but only in the order specified by the clustering columns.

How are Cassandra Transactions different from RDBMS Transactions in terms of ACID properties?

  • Cassandra does not use RDBMS ACID Transactions with rollback or locking mechanism. Not Consistent. Only offers atomic, isolated and durable transactions.
  • The fact that it does not support joins or foreign keys means it does not offer consistency.
  • Consistent: refers to how up-to-date and synchronized a row of Cassandra data is on all of its replicas. Using repair operations, Cassandra data will eventually be consistent in all replicas. Repairs work to decrease the variability in replica data, but at a given time, stale data can be present. Cassandra is a AP system according to the CAP theorem, providing high availability and partition tolerance. Cassandra does have flexibility in its configuration, though, and can perform more like a CP (consistent and partition tolerant) system according to the CAP theorem, depending on the application requirements. Two consistency features are tunable consistency and linearizable consistency.
  • Tunable consistency allows individual read or write operations to be as strongly consistent as required by the client application. The consistency level determines only the number of replicas that need to acknowledge the read or write operation success to the client application. For read operations, the read consistency level specifies how many replicas must respond to a read request before returning data to the client application. Read operations will use read repair to update stale data in the background if discovered during a read operation. For write operations, the write consistency level specified how many replicas must respond to a write request before the write is considered successful. Even at low consistency levels, Cassandra writes to all replicas of the partition key, including replicas in other datacenters. The write consistency level just specifies when the coordinator can report to the client application that the write operation is considered completed. Write operations will use hinted handoffs to ensure the writes are completed when replicas are down or otherwise not responsive to the write request. Typically, a client specifies a consistency level that is less than the replication factor specified by the keyspace. Another common practice is to write at a consistency level of QUORUM and read at a consistency level of QUORUM. The choices made depend on the client application’s needs, and Cassandra provides maximum flexibility for application design.
  • Atomic: A write is at the partition level. Inserting or updating in a row is created as one write operation. if using a write consistency level of QUORUM with a replication factor of 3, Cassandra will replicate the write to all nodes in the cluster and wait for acknowledgement from two nodes. If the write fails on one of the nodes but succeeds on the other, Cassandra reports a failure to replicate the write on that node. However, the replicated write that succeeds on the other node is not automatically rolled back. Cassandra uses client-side timestamps to determine the most recent update to a column. The latest timestamp always wins when requesting data, so if multiple client sessions update the same columns in a row concurrently, the most recent update is the one seen by readers.
  • Isolation: Full row-level isolation is in pace, which means that writes to a row are isolated to the client performing the write and are not visible to any other use until they are complete. Delete operations are performed in isolation. All updates in a batch operation belonging to a given partition key are performed in isolation.
  • Durability: All writes to a replica node are recorded both in memory and in a commit log on disk before they are acknowledged as a success. If a crash or server failure occurs before the memtables are flushed to disk, the commit log is replayed on restart to recover any lost writes. In addition to the local durability (data immediately written to disk), the replication of data on other nodes strengthens durability.

In short, Cassandra is good for heavy writes applications.

The End :)

This is a repository of my thoughts on my personal life, my random interests & notes taken down as I navigate my way through the tech world!