I am currently doing a proof of concept with Spark and Cassandra. I quickly need to be able to create and start Cassandra and Spark clusters. Ansible to the rescue!
I split my Ansible playbook into three roles:
- Cassandra
- Ops Center
- Spark
My main playbook is very simple:
I have some hosts defined in a separate hosts file called m25-cassandra. I've decided to install htop here; I could have put this in a general server role.
I also define a few variables. These could of course be defined elsewhere, per role:
- cluster_name - this will replace the cluster name in each host's cassandra.yaml
- seeds - likewise, this will replace the seeds list in each host's cassandra.yaml
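As a rough illustration, a playbook of this shape might look like the sketch below. This is not the original listing: the group name `m25_cassandra`, the role directory names, and the variable values are all assumptions, and it uses current Ansible module names.

```yaml
# site.yml (hypothetical file name) - a minimal sketch, not the original playbook
- hosts: m25_cassandra          # assumed inventory group name
  become: true
  vars:
    cluster_name: "PoC Cluster" # assumed value; substituted into cassandra.yaml
    seeds: "10.0.0.1"           # assumed seed address; comma-separate several seeds
  tasks:
    # Note: Ansible runs a play's roles before its tasks, whatever the order here.
    - name: Install htop
      ansible.builtin.apt:
        name: htop
        state: present
        update_cache: true
  roles:
    - cassandra                 # assumed role directory names
    - opscenter
    - spark
```

It would be run with something like `ansible-playbook -i <hosts file> site.yml`, where the hosts file is the separate inventory mentioned above.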
So let's take a look at each role.
Cassandra
Here are the tasks:
This is doing the following:
- Installing a JRE
- Adding the Apache Cassandra Debian repository
- Adding the keys for the Debian repository
- Installing the latest version of Cassandra
- Replacing the cassandra.yaml (details later)
- Ensuring Cassandra is started
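The steps above could be sketched as the following tasks file. It is only an approximation of the original: the JRE package name, the template file name, and the variables `cassandra_apt_repo` and `cassandra_apt_key_url` are hypothetical placeholders that you would need to fill in for your distribution and Cassandra version.

```yaml
# roles/cassandra/tasks/main.yml - a sketch under the assumptions stated above
- name: Install a JRE
  ansible.builtin.apt:
    name: default-jre-headless          # assumed package name
    state: present
    update_cache: true

- name: Add the Apache Cassandra Debian repository
  ansible.builtin.apt_repository:
    repo: "{{ cassandra_apt_repo }}"    # hypothetical variable holding the "deb ..." line
    state: present
    update_cache: false                 # defer the refresh until the key is trusted

- name: Add the keys for the Debian repository
  ansible.builtin.apt_key:
    url: "{{ cassandra_apt_key_url }}"  # hypothetical variable holding the key URL
    state: present

- name: Install the latest version of Cassandra
  ansible.builtin.apt:
    name: cassandra
    state: latest
    update_cache: true

- name: Replace cassandra.yaml from the template
  ansible.builtin.template:
    src: cassandra.yaml.j2              # assumed template file name
    dest: /etc/cassandra/cassandra.yaml

- name: Ensure Cassandra is started
  ansible.builtin.service:
    name: cassandra
    state: started
```

Because the cache refresh is deferred until the install task, the repository key is already in place by the time apt reads the new repository.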
The template cassandra.yaml uses the following variables:
- cluster_name: '{{ cluster_name }}' - So we can rename the cluster
- - seeds: "{{ seeds }}" - So when we add a new node it connects to the cluster
- listen_address: {{ inventory_hostname }} - Listen on the node's external IP so other nodes can communicate with it
- rpc_address: {{ inventory_hostname }} - So we can connect Ops Center and cqlsh to the nodes
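For orientation, the relevant lines of such a template might look like the excerpt below. The file name is an assumption, and every other setting from the stock cassandra.yaml is omitted; only the four templated values are shown.

```yaml
# roles/cassandra/templates/cassandra.yaml.j2 (assumed file name) - excerpt only
cluster_name: '{{ cluster_name }}'

seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      # Comma-separated list of seed addresses, taken from the playbook variable
      - seeds: "{{ seeds }}"

# inventory_hostname is the host's name exactly as written in the inventory,
# so this only yields an IP if the hosts are listed there by IP address.
listen_address: {{ inventory_hostname }}
rpc_address: {{ inventory_hostname }}
```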
Magic! Now adding new hosts to my hosts file with the tag m25_cassandra will get Cassandra installed, connected to
the cluster and started.
Ops Center
The tasks file for Ops Center:
This is doing the following:
- Adding the DataStax community Debian repository
- Adding the key for the repo
- Installing Ops Center
- Starting Ops Center
No templates here as all the default configuration is fine.
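A tasks file along these lines might look like the following sketch. The variables `opscenter_apt_repo` and `opscenter_apt_key_url` are hypothetical placeholders, and the package name `opscenter` and service name `opscenterd` are assumptions that should be checked against the repository you actually use.

```yaml
# roles/opscenter/tasks/main.yml - a sketch under the assumptions stated above
- name: Add the DataStax community Debian repository
  ansible.builtin.apt_repository:
    repo: "{{ opscenter_apt_repo }}"    # hypothetical variable holding the "deb ..." line
    state: present
    update_cache: false                 # refresh once the key has been added

- name: Add the key for the repo
  ansible.builtin.apt_key:
    url: "{{ opscenter_apt_key_url }}"  # hypothetical variable holding the key URL
    state: present

- name: Install Ops Center
  ansible.builtin.apt:
    name: opscenter                     # assumed package name
    state: present
    update_cache: true

- name: Start Ops Center
  ansible.builtin.service:
    name: opscenterd                    # assumed service name
    state: started
```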
Spark
The Spark Maven build can produce a Debian package, but I didn't find a public Debian repo that hosts it, so the following just downloads and unzips the Spark package:
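One way to express that step is a single `unarchive` task, sketched below. The variables `spark_download_url` and `spark_dir_name` and the `/opt` destination are hypothetical; the original role may have fetched and extracted the archive differently.

```yaml
# roles/spark/tasks/main.yml - a sketch under the assumptions stated above
- name: Download and unpack Spark
  ansible.builtin.unarchive:
    src: "{{ spark_download_url }}"       # hypothetical variable: URL of the Spark archive
    dest: /opt                            # assumed install location
    remote_src: true                      # the target host downloads the archive itself
    creates: "/opt/{{ spark_dir_name }}"  # hypothetical variable: skip if already unpacked
```

The `creates` guard keeps the task idempotent, so re-running the playbook does not download the archive again.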
I start the workers using the start-slaves.sh script from my local master, so I don't need to start anything on the nodes that have Cassandra on them.
Conclusion
Ansible makes it very easy to install distributed systems like Cassandra. The thought of doing it manually fills me with pain! This is just a PoC; I don't suggest downloading Spark from the public internet or always installing the latest version of Cassandra for your production systems. The full source, including templates and directory structure, is here.