Installing Cassandra and Spark with Ansible

I am currently doing a proof of concept with Spark and Cassandra, so I need to be able to create and start Cassandra and Spark clusters quickly. Ansible to the rescue!

I split my Ansible playbook into three roles:

Cassandra
Ops center
Spark

My main playbook is very simple.

I have some hosts defined in a separate hosts file called m25-cassandra. I've decided to install htop from the main playbook, although I could have put this in a general server role.

I also define a few variables. These could of course be defined elsewhere, per role:

cluster_name - this will replace the cluster name in each host's cassandra.yaml
seeds - likewise, this will replace the seeds list in each host's cassandra.yaml
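
A minimal sketch of a main playbook along these lines is shown here. The group name, the role directory names, and the variable values are assumptions for illustration, not the original file, and applying all three roles to every host is a simplification.

```yaml
---
# site.yml (hypothetical file name): sketch of a main playbook.
- hosts: m25_cassandra            # assumed group name from the hosts file
  become: true                    # package installs need root
  vars:
    cluster_name: "PoC Cluster"   # hypothetical value
    seeds: "10.0.0.1,10.0.0.2"    # hypothetical seed node addresses
  roles:                          # assumed role directory names
    - cassandra
    - opscenter
    - spark
  tasks:
    # Roles run before play-level tasks, so htop is installed last.
    - name: Install htop
      apt:
        name: htop
        state: present
        update_cache: true
```

It would be run with something like `ansible-playbook -i m25-cassandra site.yml`, where `site.yml` is again only a placeholder name.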

So let's take a look at each role.

Cassandra

The tasks for this role do the following:

Installing a JRE
Adding the Apache Cassandra Debian repository
Adding the keys for the Debian repository
Installing the latest version of Cassandra
Replacing the cassandra.yaml (details later)
Ensuring Cassandra is started
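
The steps above can be sketched as an Ansible tasks file. This is a sketch under assumptions rather than the original file: `cassandra_repo` and `cassandra_repo_key_url` are hypothetical variables standing in for the repository line and key URL, the JRE package name is a guess for a Debian-like system, and the template is assumed to live in the role's templates directory.

```yaml
---
# roles/cassandra/tasks/main.yml (sketch)
- name: Install a JRE
  apt:
    name: default-jre-headless     # assumed package name
    state: present
    update_cache: true

- name: Add the Apache Cassandra Debian repository
  apt_repository:
    repo: "{{ cassandra_repo }}"   # hypothetical variable
    state: present
    update_cache: false            # wait until the key is in place

- name: Add the keys for the Debian repository
  apt_key:
    url: "{{ cassandra_repo_key_url }}"   # hypothetical variable
    state: present

- name: Install the latest version of Cassandra
  apt:
    name: cassandra
    state: latest
    update_cache: true

- name: Replace cassandra.yaml with the templated version
  template:
    src: cassandra.yaml
    dest: /etc/cassandra/cassandra.yaml

- name: Ensure Cassandra is started
  service:
    name: cassandra
    state: started
```

A handler that restarts Cassandra when the template changes would be a natural addition, but it is left out to keep the sketch close to the listed steps.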

The template cassandra.yaml uses the following variables:

cluster_name: '{{ cluster_name }}' - So we can rename the cluster
- seeds: "{{ seeds }}" - So when we add a new node it connects to the cluster
listen_address: {{ inventory_hostname }} - Listen on the node's external IP so other nodes can communicate with it (this assumes hosts are listed by IP address in the hosts file)
rpc_address: {{ inventory_hostname }} - So we can connect Ops Center and cqlsh to the nodes
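
To show where these lines sit, here is a sketch of the relevant fragment of the template. It assumes the stock cassandra.yaml layout, in which the seeds entry is nested under seed_provider, and it leaves every other setting as shipped in the packaged default file.

```yaml
# Fragment of the cassandra.yaml template (sketch, not the full file).
# The unquoted {{ ... }} expressions are fine here because Ansible renders
# the file as a Jinja2 template before Cassandra ever parses it as YAML.
cluster_name: '{{ cluster_name }}'

seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "{{ seeds }}"

listen_address: {{ inventory_hostname }}
rpc_address: {{ inventory_hostname }}
```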

Magic! Now adding new hosts to my hosts file with the tag m25_cassandra will get Cassandra installed, connected to the cluster and started.

Ops Center

The tasks file for Ops Center does the following:

Adding the DataStax community Debian repository
Adding the key for the repo
Installing Ops Center
Starting Ops Center
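
A sketch of a tasks file matching these four steps follows. `datastax_repo` and `datastax_repo_key_url` are hypothetical variables, and the package and service names (`opscenter`, `opscenterd`) are assumptions that should be checked against the DataStax documentation.

```yaml
---
# roles/opscenter/tasks/main.yml (sketch)
- name: Add the DataStax community Debian repository
  apt_repository:
    repo: "{{ datastax_repo }}"          # hypothetical variable
    state: present
    update_cache: false                  # wait until the key is in place

- name: Add the key for the repo
  apt_key:
    url: "{{ datastax_repo_key_url }}"   # hypothetical variable
    state: present

- name: Install Ops Center
  apt:
    name: opscenter                      # assumed package name
    state: present
    update_cache: true

- name: Start Ops Center
  service:
    name: opscenterd                     # assumed service name
    state: started
```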

No templates here as all the default configuration is fine.

Spark

The Spark Maven build can produce a Debian package, but I didn't find a public Debian repo that hosts it, so this role just downloads and unzips the Spark package.
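
A minimal sketch of such download-and-unpack tasks is shown here. `spark_download_url` and `spark_install_dir` are hypothetical variables, and the temporary file path is an arbitrary choice.

```yaml
---
# roles/spark/tasks/main.yml (sketch)
- name: Download the Spark package
  get_url:
    url: "{{ spark_download_url }}"   # hypothetical variable
    dest: /tmp/spark.tgz              # arbitrary temporary location

- name: Create the install directory
  file:
    path: "{{ spark_install_dir }}"   # hypothetical variable
    state: directory

- name: Unpack Spark
  unarchive:
    src: /tmp/spark.tgz
    dest: "{{ spark_install_dir }}"
    remote_src: true                  # the archive is already on the node
```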

I start the workers using the start-slaves.sh script from my local master, so I don't need to start anything on the nodes that have Cassandra on them.

Conclusion

Ansible makes it very easy to install distributed systems like Cassandra. The thought of doing it manually fills me with pain! This is just a PoC: I don't suggest downloading Spark from the public internet or always installing the latest version of Cassandra for your production systems. The full source including templates and directory structure is here.
