ElastiCluster aims to provide a user-friendly command line tool to create, manage and set up computing clusters hosted on cloud infrastructures like Amazon's Elastic Compute Cloud (EC2), Google Compute Engine, or a private OpenStack cloud. Its main goal is to get your compute cluster up and running with just a few commands.
The architecture of ElastiCluster is quite simple: a configuration file defines a set of cluster configurations and information on how to access a specific cloud webservice.
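As a rough illustration of that layout, the sketch below reads a trimmed-down configuration with Python's standard `configparser` and follows the `cloud`, `login` and `setup` references of a `[cluster/...]` section to the sections they name. This is not ElastiCluster's own parser; the function name `resolve_cluster` is hypothetical, and the `setup=` key is used as in the Hadoop and JupyterHub examples further down.

```python
import configparser

# Trimmed-down configuration in the style of the examples in this README.
SAMPLE = """
[cloud/openstack]
provider=openstack

[login/ubuntu]
image_user=ubuntu

[setup/slurm]
provider=ansible

[cluster/slurm]
cloud=openstack
login=ubuntu
setup=slurm
frontend_nodes=1
compute_nodes=4
"""


def resolve_cluster(config_text, name):
    """Return the cluster section together with the sections it refers to.

    Hypothetical helper for illustration only.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    cluster = dict(parser["cluster/" + name])
    resolved = {"cluster": cluster}
    for kind in ("cloud", "login", "setup"):
        # e.g. cloud=openstack points at the section [cloud/openstack]
        section = "{}/{}".format(kind, cluster[kind])
        if section not in parser:
            raise KeyError(
                "[cluster/{}] refers to missing section [{}]".format(name, section)
            )
        resolved[kind] = dict(parser[section])
    return resolved


info = resolve_cluster(SAMPLE, "slurm")
print(info["cloud"]["provider"], info["login"]["image_user"])
```

Keeping cloud credentials, login details and setup recipes in separate sections means several cluster definitions can share the same `[cloud/...]` or `[login/...]` section.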
Using the command line (or, very soon, a simple API), you can start a cluster. ElastiCluster will connect to the desired cloud, start the virtual machines and wait until they are accessible via SSH.
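The "wait until they are accessible via SSH" step can be pictured as a polling loop. The sketch below is a minimal illustration using only the Python standard library; it checks that a TCP connection to the SSH port can be opened, which is weaker than a full SSH login, and it is not taken from ElastiCluster's code. The function name and the default timeout values are assumptions.

```python
import socket
import time


def wait_for_ssh(host, port=22, timeout=300.0, interval=5.0):
    """Poll until a TCP connection to host:port succeeds.

    Returns True once the port accepts connections, False if `timeout`
    seconds pass first. Hypothetical helper, for illustration only.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused, timed out, or host unreachable: retry later.
            if time.monotonic() >= deadline:
                return False
            time.sleep(min(interval, max(0.0, deadline - time.monotonic())))


# Example call (the address is a placeholder):
#   wait_for_ssh("192.0.2.10", timeout=600.0)
```

A freshly started virtual machine typically refuses connections for a while before its SSH daemon is up, which is why the loop treats any `OSError` as "not ready yet" rather than as a failure.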
After all the virtual machines are up and running, ElastiCluster will use Ansible to configure them.
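ElastiCluster provides automated setup of several kinds of clusters, including the SLURM batch-queuing, Hadoop/Spark, and JupyterHub setups shown in the examples below.
ElastiCluster is in active development; the features currently offered through the command line include starting, listing, resizing, logging in to (via SSH or SFTP), and stopping clusters.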
The following sample shows how a cluster is configured with ElastiCluster and the basic commands to interact with ElastiCluster through the command line interface. For a full description of the configuration and command line interface see the documentation. It's also possible to bundle all the examples below in a single configuration file.
(You can see more configuration examples in the examples/ directory of the source tree.)
This example shows how to set up a SLURM batch-queuing cluster with 4 compute nodes and a single front-end node (also acting as NFS server for shared home directories). More details on the SLURM configuration can be found in the ElastiCluster documentation.
    [cluster/slurm]
    cloud=openstack
    login=ubuntu
    setup=slurm
    security_group=default
    # Ubuntu image
    image_id=16618a82-92fd-4615-86e6-d354f9f66af5
    flavor=4cpu-16ram-hpc
    frontend_nodes=1
    compute_nodes=4
    ssh_to=frontend

    [cloud/openstack]
    provider=openstack
    auth_url=http://openstack.example.org:5000/v2.0
    username=****REPLACE WITH YOUR USERNAME****
    password=****REPLACE WITH YOUR PASSWORD****
    project_name=****REPLACE WITH YOUR PROJECT/TENANT NAME****

    [login/ubuntu]
    image_user=ubuntu
    image_user_sudo=root
    image_sudo=True
    user_key_name=elasticluster
    user_key_private=~/.ssh/id_rsa
    user_key_public=~/.ssh/id_rsa.pub

    [setup/slurm]
    provider=ansible
    frontend_groups=slurm_master
    compute_groups=slurm_worker
$ elasticluster start slurm -n mycluster
$ elasticluster list-nodes mycluster
$ elasticluster resize mycluster -a 10:compute
$ elasticluster ssh mycluster
$ elasticluster sftp mycluster
$ elasticluster stop mycluster
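In the cluster section above, node kinds are introduced by keys of the form `<kind>_nodes` (`frontend_nodes`, `compute_nodes`), and the same kind names reappear in `ssh_to=frontend` and in `resize ... -a 10:compute`. The sketch below derives the kinds from a cluster section and checks that `ssh_to` names one of them. It assumes that `ssh_to` is meant to refer to a node kind; the helper names are hypothetical and this is not ElastiCluster's own validation code.

```python
def node_counts(cluster_options):
    """Map node kind -> node count, from keys of the form '<kind>_nodes'."""
    suffix = "_nodes"
    return {
        key[: -len(suffix)]: int(value)
        for key, value in cluster_options.items()
        if key.endswith(suffix)
    }


def check_ssh_to(cluster_options):
    """Raise ValueError if 'ssh_to' does not name a defined node kind."""
    kinds = node_counts(cluster_options)
    target = cluster_options.get("ssh_to")
    if target is not None and target not in kinds:
        raise ValueError(
            "ssh_to={!r} is not one of the node kinds {}".format(
                target, sorted(kinds)
            )
        )
    return kinds


# Options copied from the [cluster/slurm] section above.
slurm = {"frontend_nodes": "1", "compute_nodes": "4", "ssh_to": "frontend"}
print(check_ssh_to(slurm))  # {'frontend': 1, 'compute': 4}
```

With these counts, the `resize` command shown above would bring the number of `compute` nodes from 4 to 14.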
This example shows how to set up a Hadoop 2.x cluster, complete with HDFS. Each of the 8 "worker" nodes is both an HDFS data node and a YARN execution node. The single "master" node acts as YARN resource manager and HDFS name node.
Spark (together with Python and R support) is installed on top of Hadoop YARN; as soon as the cluster is installed, it is possible to log in to the "master" node and start submitting Spark or Map/Reduce jobs.
More details on the Hadoop/Spark configuration can be found in the ElastiCluster documentation.
    [cluster/hadoop]
    cloud=amazon-us-east-1
    login=ubuntu
    setup=hadoop
    security_group=all_tcp_ports
    image_id=ami-00000048
    flavor=m1.small
    master_nodes=1
    worker_nodes=8
    ssh_to=master

    [cloud/amazon-us-east-1]
    provider=ec2_boto
    ec2_url=https://ec2.us-east-1.amazonaws.com
    ec2_access_key=****REPLACE WITH YOUR ACCESS ID****
    ec2_secret_key=****REPLACE WITH YOUR SECRET KEY****
    ec2_region=us-east-1

    [login/ubuntu]
    image_user=ubuntu

    [setup/hadoop]
    provider=ansible
    master_groups=hadoop_master
    worker_groups=hadoop_worker
$ elasticluster start hadoop -n cluster1
$ elasticluster start hadoop -n cluster2
$ elasticluster list
$ elasticluster list-nodes cluster1
$ elasticluster resize cluster1 -a 10:worker
$ elasticluster ssh cluster1
$ elasticluster sftp cluster1
$ elasticluster stop cluster1 cluster2
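The `[setup/hadoop]` section above ties node kinds to Ansible groups through keys of the form `<kind>_groups`. As a rough picture of what that mapping means, the sketch below turns such a section plus a list of hosts per kind into an Ansible-style INI inventory. The host names are made up, the comma-separated handling of group lists is an assumption, and the inventory that ElastiCluster actually generates may well look different.

```python
def build_inventory(setup_options, hosts_by_kind):
    """Render an INI-style inventory from '<kind>_groups' options.

    Hypothetical helper for illustration only.
    """
    suffix = "_groups"
    groups = {}
    for key, value in setup_options.items():
        if not key.endswith(suffix):
            continue  # skip unrelated options such as provider=ansible
        kind = key[: -len(suffix)]
        # Assumption: several groups may be given as a comma-separated list.
        for group in (name.strip() for name in value.split(",")):
            if group:
                groups.setdefault(group, []).extend(hosts_by_kind.get(kind, []))
    lines = []
    for group in sorted(groups):
        lines.append("[{}]".format(group))
        lines.extend(groups[group])
        lines.append("")  # blank line between groups
    return "\n".join(lines).rstrip() + "\n"


# Options copied from the [setup/hadoop] section above; host names are invented.
setup = {
    "provider": "ansible",
    "master_groups": "hadoop_master",
    "worker_groups": "hadoop_worker",
}
hosts = {"master": ["master001"], "worker": ["worker001", "worker002"]}
print(build_inventory(setup, hosts))
```

Every node of a given kind ends up in the groups listed for that kind, so the Ansible playbooks only need to target group names such as `hadoop_master`.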
This example shows how to set up a single-node JupyterHub server: ElastiCluster will provision a virtual machine, install Jupyter/IPython and configure the JupyterHub server to run as a service on the default HTTPS port (443). (By default, a self-signed TLS/SSL certificate is used for HTTPS.)
    [cluster/jupyterhub]
    cloud=google
    login=google
    setup=jupyterhub
    security_group=tcp_port_443
    image_id=debian-8-jessie-v20170124
    flavor=n1-standard-1
    server_nodes=1
    ssh_to=server
    image_userdata=

    [cloud/google]
    provider=google
    gce_project_id=****REPLACE WITH YOUR PROJECT ID****
    gce_client_id=****REPLACE WITH YOUR CLIENT ID****
    gce_client_secret=****REPLACE WITH YOUR SECRET KEY****

    [login/google]
    image_user=****REPLACE WITH YOUR GOOGLE ID****
    image_user_sudo=root
    image_sudo=True
    user_key_name=elasticluster
    user_key_private=~/.ssh/id_rsa
    user_key_public=~/.ssh/id_rsa.pub

    [setup/jupyterhub]
    provider=ansible
    server_groups=jupyterhub
$ elasticluster start jupyterhub
$ elasticluster ssh jupyterhub
$ elasticluster stop jupyterhub
General discussion of ElastiCluster's usage, features, and bugs takes place on the firstname.lastname@example.org mailing list (only subscribers can post).