Recently I wanted to try an Elasticsearch cluster on a docker swarm. Since I’m using VMware for virtualization I had to overcome several obstacles: install docker-swarm on it, configure multi-host networking (an experimental feature in its own right at this time) and then deploy the Elasticsearch cluster.

Prerequisites:

On Linux:

  • docker
  • docker-machine
  • docker-swarm

On OSX/Windows:

  • Docker Toolbox

Everywhere:

  • govc

First things first…

Docker-machine on VMware

To interact with VMware ESXi or vCenter, docker-machine depends on govc. Download it and put it somewhere in your $PATH. On Windows, put the govc binary into %PROGRAMFILES%\Docker Toolbox\.
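
On Linux that can be as simple as the following (the version in the URL is an assumption — pick the current release from the govmomi releases page):

```shell
# Assumed release version; check the govmomi releases page for the latest one
curl -L https://github.com/vmware/govmomi/releases/download/v0.2.0/govc_linux_amd64.gz \
    | gunzip > /usr/local/bin/govc
chmod +x /usr/local/bin/govc
govc version   # sanity check that the binary runs
```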

docker-machine uses boot2docker images for deploying virtual machines on VMware. At the time of writing, the multi-host networking feature of docker is available only in the experimental version of docker (this may change once docker 1.9 is out). To enable this feature I had to use an experimental boot2docker image that is conveniently available here.

At first I tried to interact directly with the ESXi host (later I’ll show how to interact with vCenter and a VMware cluster).

docker-machine create --driver vmwarevsphere --vmwarevsphere-boot2docker-url $(curl https://api.github.com/repos/ahbeng/boot2docker-experimental/releases/latest | grep -o https://.*/boot2docker.iso) --vmwarevsphere-vcenter $ESXi_IP --vmwarevsphere-network "VM Network" --vmwarevsphere-datastore datastore1 --vmwarevsphere-datacenter ha-datacenter  --vmwarevsphere-username root --vmwarevsphere-password $PASSWORD docker1

The key arguments are kind of obvious. The caveat is

--vmwarevsphere-datacenter

that has to be set to “ha-datacenter” when you interact directly with ESXi host.
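
If you’re unsure about the inventory names, govc itself (the same binary docker-machine uses) can list them. The variables are the same as in the create command above:

```shell
# Point govc directly at the ESXi host (self-signed cert, hence GOVC_INSECURE)
export GOVC_URL="root:$PASSWORD@$ESXi_IP"
export GOVC_INSECURE=1
govc ls                      # lists the inventory root, e.g. /ha-datacenter/vm, /ha-datacenter/host, ...
govc ls /ha-datacenter/host  # shows the host itself
```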

Another issue I had was that the boot2docker.iso I had previously downloaded was cached and for some reason was not replaced with the new one on the ESXi. A simple docker-machine rm does not remove the iso from the datastore; I had to browse the datastore and remove the iso file manually. [Also see the P.S.]
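
A sketch of that manual cleanup with govc — the exact datastore path is an assumption, so browse with datastore.ls first:

```shell
# Locate the stale iso on the datastore, then remove it
govc datastore.ls -ds datastore1 docker1
govc datastore.rm -ds datastore1 docker1/boot2docker.iso
# Also clear the local docker-machine cache if you want a fresh download
rm ~/.docker/machine/cache/boot2docker.iso
```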

Once the machine is successfully deployed you should be able to verify the docker version by connecting to it:

# docker-machine ssh docker1
                        ##         .
                  ## ## ##        ==
               ## ## ## ## ##    ===
           /"""""""""""""""""\___/ ===
      ~~~ {~~ ~~~~ ~~~ ~~~~ ~~~ ~ /  ===- ~~~
           \______ o           __/
             \    \         __/
              \____\_______/
 _                 _   ____     _            _
| |__   ___   ___ | |_|___ \ __| | ___   ___| | _____ _ __
| '_ \ / _ \ / _ \| __| __) / _` |/ _ \ / __| |/ / _ \ '__|
| |_) | (_) | (_) | |_ / __/ (_| | (_) | (__|   <  __/ |
|_.__/ \___/ \___/ \__|_____\__,_|\___/ \___|_|\_\___|_|
Boot2Docker version 1.8.1, build master : 5986633 - Thu Aug 13 04:17:54 UTC 2015
Docker version 1.9.0-dev, build 5dadfa8, experimental

The network configuration of such a machine deployed on ESXi is rather peculiar. The machine has two network interfaces, both connected to the very same network, yet each has a different purpose. This derives from the default VirtualBox configuration, where one interface is host-only and the other is used to access the internet via NAT on the host.

Once I was able to deploy the docker machine directly to the ESXi host I wanted to deploy it via vCenter to a VMware cluster. The trick is to use proper values for:

--vmwarevsphere-datacenter
--vmwarevsphere-pool

Whereas the datacenter value is kind of obvious, finding the proper value for pool was a bit tricky as it’s not well documented. After digging through various documents and source files I found that the proper value should look like this:

--vmwarevsphere-pool "$CLUSTER/Resources/"

Where $CLUSTER is the actual cluster name as seen in the vCenter console and Resources (verbatim) indicates that the machine should be deployed directly to that cluster (rather than to any resource pool beneath it). For example, in my case it was:

--vmwarevsphere-pool "Cluster 3/Resources/"
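
You can verify both names with govc against vCenter; the paths below follow the standard vSphere inventory layout and use the same variables as the create command:

```shell
export GOVC_URL="$VCENTER_USER:$PASSWORD@$vCENTER_IP"
export GOVC_INSECURE=1
govc ls "/$DATACENTER/host"                            # lists clusters in the datacenter
govc pool.info "/$DATACENTER/host/$CLUSTER/Resources"  # the cluster's root resource pool
```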

The complete command looked like:

docker-machine create --driver vmwarevsphere --vmwarevsphere-boot2docker-url $(curl https://api.github.com/repos/ahbeng/boot2docker-experimental/releases/latest | grep -o https://.*/boot2docker.iso) --vmwarevsphere-vcenter $vCENTER_IP --vmwarevsphere-network "Cluster Network" --vmwarevsphere-datastore $CLUSTER_DS --vmwarevsphere-datacenter $DATACENTER --vmwarevsphere-pool "$CLUSTER/Resources/" --vmwarevsphere-username $VCENTER_USER --vmwarevsphere-password $PASSWORD docker1

Having that in place I was able to start creating the swarm with multi-host networking.

Deploying the swarm

I already had the docker1 machine, so the next step was to deploy the swarm. Since the swarm does not (yet) support multi-host networking out of the box, it requires a bit of fiddling. To be honest, I took most of the instructions for this step from this blog post.

Create the Consul node and start Consul

docker $(docker-machine config docker1) run -d -p 8500:8500 progrium/consul -server -bootstrap-expect 1

Create a swarm token

export SWARM_TOKEN=$(docker $(docker-machine config docker1) run swarm create)

Create swarm master

Note that I had to use eth0 for communication.

docker-machine create --driver vmwarevsphere --vmwarevsphere-boot2docker-url $(curl https://api.github.com/repos/ahbeng/boot2docker-experimental/releases/latest | grep -o https://.*/boot2docker.iso) --vmwarevsphere-vcenter $vCENTER_IP --vmwarevsphere-network "Cluster Network" --vmwarevsphere-datastore $CLUSTER_DS --vmwarevsphere-datacenter $DATACENTER --vmwarevsphere-pool "$CLUSTER/Resources/" --vmwarevsphere-username $VCENTER_USER --vmwarevsphere-password $PASSWORD --engine-opt="default-network=overlay:multihost" --engine-opt="kv-store=consul:$(docker-machine ip docker1):8500" --engine-label="com.docker.network.driver.overlay.bind_interface=eth0" swarm-0

Start swarm manually

Pay attention to the volume mounting option, as boot2docker uses /mnt/sda1 for persistent storage.

docker $(docker-machine config swarm-0) run -d --restart="always" --net="bridge" swarm:latest join --addr "$(docker-machine ip swarm-0):2376" "token://$SWARM_TOKEN"
docker $(docker-machine config swarm-0) run -d --restart="always" --net="bridge" -p "3376:3376" -v "/mnt/sda1/var/lib/boot2docker/:/etc/docker" swarm:latest manage --tlsverify --tlscacert="/etc/docker/ca.pem" --tlscert="/etc/docker/server.pem" --tlskey="/etc/docker/server-key.pem" -H "tcp://0.0.0.0:3376" --strategy spread "token://$SWARM_TOKEN"

Create a Swarm node

Any number of nodes can be created this way (just use distinct names :):

docker-machine create --driver vmwarevsphere --vmwarevsphere-boot2docker-url $(curl https://api.github.com/repos/ahbeng/boot2docker-experimental/releases/latest | grep -o https://.*/boot2docker.iso) --vmwarevsphere-vcenter $vCENTER_IP --vmwarevsphere-network "Cluster Network" --vmwarevsphere-datastore $CLUSTER_DS --vmwarevsphere-datacenter $DATACENTER --vmwarevsphere-pool "$CLUSTER/Resources/" --vmwarevsphere-username $VCENTER_USER --vmwarevsphere-password $PASSWORD --engine-opt="default-network=overlay:multihost" --engine-opt="kv-store=consul:$(docker-machine ip docker1):8500" --engine-label="com.docker.network.driver.overlay.bind_interface=eth0" --engine-label="com.docker.network.driver.overlay.neighbor_ip=$(docker-machine ip swarm-0)" swarm-1
docker $(docker-machine config swarm-1) run -d --restart="always" --net="bridge" swarm:latest join --addr "$(docker-machine ip swarm-1):2376" "token://$SWARM_TOKEN"

Point Docker at the Swarm

export DOCKER_HOST=tcp://"$(docker-machine ip swarm-0):3376"
export DOCKER_TLS_VERIFY=1
export DOCKER_CERT_PATH="$HOME/.docker/machine/machines/swarm-0"

Once you have pointed your docker at the swarm you should be able to do something like this:

~# docker info
Containers: 3
Images: 6
Role: primary
Strategy: spread
Filters: affinity, health, constraint, port, dependency
Nodes: 2
 swarm-0: 10.168.113.22:2376
 └ Containers: 2
 └ Reserved CPUs: 0 / 2
 └ Reserved Memory: 0 B / 2.053 GiB
 └ Labels: com.docker.network.driver.overlay.bind_interface=eth0, executiondriver=native-0.2, kernelversion=4.0.9-boot2docker, operatingsystem=Boot2Docker 1.8.1 (TCL 6.3); master : 5986633 - Thu Aug 13 04:17:54 UTC 2015, provider=vmwarevsphere, storagedriver=aufs
 swarm-1: 10.168.113.36:2376
 └ Containers: 1
 └ Reserved CPUs: 0 / 2
 └ Reserved Memory: 0 B / 2.053 GiB
 └ Labels: com.docker.network.driver.overlay.bind_interface=eth0, com.docker.network.driver.overlay.neighbor_ip=10.168.113.22, executiondriver=native-0.2, kernelversion=4.0.9-boot2docker, operatingsystem=Boot2Docker 1.8.1 (TCL 6.3); master : 5986633 - Thu Aug 13 04:17:54 UTC 2015, provider=vmwarevsphere, storagedriver=aufs
CPUs: 4
Total Memory: 4.107 GiB
Name: 0731404c89e3
~# docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                                     NAMES
~# docker ps -a
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                                     NAMES
307756de33f3        swarm:latest        "/swarm join --addr 1"   3 days ago          Up 3 days           2375/tcp                                  swarm-1/swarm-1-join
0731404c89e3        swarm:latest        "/swarm manage --tlsv"   3 days ago          Up 3 days           2375/tcp, 10.168.113.22:3376->3376/tcp    swarm-0/swarm-manage
da353fc9251b        swarm:latest        "/swarm join --addr 1"   3 days ago          Up 3 days           2375/tcp                                  swarm-0/swarm-0-join

Deploying Elasticsearch on swarm

Having the swarm on the VMware cluster in place I was able to move forward and deploy Elasticsearch, which should have been quite straightforward… Unfortunately I stumbled upon a small bug in the experimental docker networking that prevented me from attaching a container to a multi-host network while simultaneously publishing a port on the host in an elegant way, so I had to invent a workaround.

Deploying clustered Elasticsearch

To deploy a cluster of Elasticsearch I created a script that provides the proper configuration to the containers. I had to do it this way as multicast does not work properly across hosts at the moment (plus multicast discovery is kind of insecure and not recommended for production deployments anyway). This approach was inspired by this post.

#!/bin/bash

CLUSTER_NAME="es"

# List all swarm nodes (host:port), one per line
REMOTE_DOCKER_HOSTS=$(docker run --rm swarm list token://$SWARM_TOKEN | sort | uniq)
# Quorum: more than half of the nodes (quoting keeps the line count intact)
MIN_MASTER_NODES=$(( $(echo "$REMOTE_DOCKER_HOSTS" | wc -l) / 2 + 1 ))

# Turn the "IP:2376" lines into a comma-separated "es1:9300,es2:9300,..." unicast host list
UNICAST_HOSTS=$(echo $REMOTE_DOCKER_HOSTS | sed -e 's/ /\n/g' -e 's/:2376/:9300/g' | sed '/./=' | sed '/./N;s/\n/ /;s/\([0-9]\) [0-9\.]*/es\1/;' | paste -s -d",")

i=1
for host in $REMOTE_DOCKER_HOSTS; do
    name=${CLUSTER_NAME}${i}
    echo "Starting container $name"
    docker -H tcp://$host \
        run -d \
        --name $name --hostname $name \
        elasticsearch \
        elasticsearch \
        -Des.cluster.name=${CLUSTER_NAME}-swarm \
        -Des.discovery.zen.ping.multicast.enabled=false \
        -Des.transport.publish_host=$name \
        -Des.discovery.zen.ping.unicast.hosts=$UNICAST_HOSTS \
        -Des.discovery.zen.minimum_master_nodes=$MIN_MASTER_NODES
    sleep 3
    ((i++))
done
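
The numbering/renaming pipeline in the script can be exercised on hypothetical input to see what it produces; note that I join the result with commas here, since Elasticsearch expects the unicast host list to be comma-separated:

```shell
# Hypothetical `swarm list` output: one host:port per line (made-up IPs)
REMOTE_DOCKER_HOSTS="10.168.113.22:2376
10.168.113.36:2376"

# Quorum: half the node count plus one
MIN_MASTER_NODES=$(( $(echo "$REMOTE_DOCKER_HOSTS" | wc -l) / 2 + 1 ))
echo "$MIN_MASTER_NODES"   # → 2

# Number each host, rename it to esN, keep the transport port 9300
UNICAST_HOSTS=$(echo $REMOTE_DOCKER_HOSTS | sed -e 's/ /\n/g' -e 's/:2376/:9300/g' \
    | sed '/./=' | sed '/./N;s/\n/ /;s/\([0-9]\) [0-9\.]*/es\1/;' | paste -s -d",")
echo "$UNICAST_HOSTS"      # → es1:9300,es2:9300
```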

Make sure to use the same session, with the $SWARM_TOKEN environment variable set by the previous steps. You can set CLUSTER_NAME to a different value in case you’d like to deploy, say, separate production and development clusters.

Once the script is run you should see something like this:

~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
cdbeb9f007bf        elasticsearc        "/docker-entrypoint.s"   3 minutes ago      Up 2 minutes                           swarm-1/es2
dc3c6967a59c        elasticsearc        "/docker-entrypoint.s"   3 minutes ago      Up 2 minutes                           swarm-0/es1


~# docker logs es1
[2015-08-31 11:38:29,966][INFO ][node ] [Captain Ultra] version[1.7.1], pid[1], build[b88f43f/2015-07-29T09:54:16Z]
[2015-08-31 11:38:29,967][INFO ][node ] [Captain Ultra] initializing ...
[2015-08-31 11:38:30,035][INFO ][plugins ] [Captain Ultra] loaded [], sites []
[2015-08-31 11:38:30,071][INFO ][env ] [Captain Ultra] using [1] data paths, mounts [[/usr/share/elasticsearch/data (/dev/sda1)]], net usable_space [16.5gb], net total_space [18.1gb], types [ext4]
[2015-08-31 11:38:32,279][INFO ][node ] [Captain Ultra] initialized
[2015-08-31 11:38:32,279][INFO ][node ] [Captain Ultra] starting ...
[2015-08-31 11:38:32,340][INFO ][transport ] [Captain Ultra] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[es1/172.21.0.34:9300]}
[2015-08-31 11:38:32,376][INFO ][discovery ] [Captain Ultra] es-swarm/zvhqUtShRQeEKXMaQ4Wd8Q
[2015-08-31 11:38:36,159][INFO ][cluster.service ] [Captain Ultra] new_master [Captain Ultra][zvhqUtShRQeEKXMaQ4Wd8Q][es1][inet[es1/172.21.0.34:9300]], reason: zen-disco-join (elected_as_master)
[2015-08-31 11:38:36,195][INFO ][http ] [Captain Ultra] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/172.17.0.23:9200]}
[2015-08-31 11:38:36,195][INFO ][node ] [Captain Ultra] started
[2015-08-31 11:38:36,202][INFO ][gateway ] [Captain Ultra] recovered [0] indices into cluster_state
[2015-08-31 11:38:39,897][INFO ][cluster.service ] [Captain Ultra] added {[Nox][aiPPPqtsTrSIA1coRDG7hA][es2][inet[/172.21.0.35:9300]],}, reason: zen-disco-receive(join from node[[Nox][aiPPPqtsTrSIA1coRDG7hA][es2][inet[/172.21.0.35:9300]]])

~# docker logs es2
[2015-08-31 11:38:33,542][INFO ][node ] [Nox] version[1.7.1], pid[1], build[b88f43f/2015-07-29T09:54:16Z]
[2015-08-31 11:38:33,543][INFO ][node ] [Nox] initializing ...
[2015-08-31 11:38:33,613][INFO ][plugins ] [Nox] loaded [], sites []
[2015-08-31 11:38:33,648][INFO ][env ] [Nox] using [1] data paths, mounts [[/usr/share/elasticsearch/data (/dev/sda1)]], net usable_space [16.6gb], net total_space [18.1gb], types [ext4]
[2015-08-31 11:38:35,780][INFO ][node ] [Nox] initialized
[2015-08-31 11:38:35,780][INFO ][node ] [Nox] starting ...
[2015-08-31 11:38:35,858][INFO ][transport ] [Nox] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[es2/172.21.0.35:9300]}
[2015-08-31 11:38:35,891][INFO ][discovery ] [Nox] es-swarm/aiPPPqtsTrSIA1coRDG7hA
[2015-08-31 11:38:39,712][INFO ][cluster.service ] [Nox] detected_master [Captain Ultra][zvhqUtShRQeEKXMaQ4Wd8Q][es1][inet[/172.21.0.34:9300]], added {[Captain Ultra][zvhqUtShRQeEKXMaQ4Wd8Q][es1][inet[/172.21.0.34:9300]],}, reason: zen-disco-receive(from master [[Captain Ultra][zvhqUtShRQeEKXMaQ4Wd8Q][es1][inet[/172.21.0.34:9300]]])
[2015-08-31 11:38:39,757][INFO ][http ] [Nox] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/172.17.0.38:9200]}
[2015-08-31 11:38:39,757][INFO ][node ] [Nox] started

As you can see, Elasticsearch used the 172.21 network (the multi-host network) and successfully set up a cluster even though the es containers were created on different nodes (swarm-0 and swarm-1). Neither of the containers published any ports on the hosts, which makes the setup super secure but pretty useless, as there is no way to communicate with the cluster. :)
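
You can still talk to the cluster from inside the overlay, though: run a throwaway container against the swarm and address es1 by name (assuming name resolution on the multi-host network works for it the same way it did for the es containers):

```shell
# With DOCKER_HOST still pointing at the swarm manager, this container
# lands on the default overlay network too and can reach es1 directly
docker run --rm busybox wget -qO- http://es1:9200/
```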

As I mentioned before, there is a small bug in the experimental version of docker that prevents me from reliably publishing a port on the host while using multi-host networking. In theory I could set up another docker machine, say on my laptop using VirtualBox, connect it to the swarm and then use it as an ambassador publishing the port on localhost. Instead I created an ambassador container on one of the swarm hosts using my socat container:

docker run -d --name socat -p 9200:9200 -e affinity:container==es1 emsi/socat socat TCP4-LISTEN:9200,fork,reuseaddr TCP4:es1:9200

To optimize network traffic I used -e affinity to spawn the socat container on the same host as the es1 container that socat connects to. Unfortunately, due to a small bug, the published port is not shown in the PORTS column when I invoke:

~# docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
d4bcb7db0881        emsi/socat          "socat TCP4-LISTEN:92"   2 minutes ago       Up About a minute                       swarm-0/socat
cdbeb9f007bf        elasticsearc        "/docker-entrypoint.s"   28 minutes ago      Up 28 minutes                           swarm-1/es2
dc3c6967a59c        elasticsearc        "/docker-entrypoint.s"   28 minutes ago      Up 28 minutes                           swarm-0/es1

However, the port IS published. You can test it like this:

~# curl -XGET http://$(docker-machine ip swarm-0):9200/
{
 "status" : 200,
 "name" : "Captain Ultra",
 "cluster_name" : "es-swarm",
 "version" : {
 "number" : "1.7.1",
 "build_hash" : "b88f43fc40b0bcd7f173a1f9ee2e97816de80b19",
 "build_timestamp" : "2015-07-29T09:54:16Z",
 "build_snapshot" : false,
 "lucene_version" : "4.10.4"
 },
 "tagline" : "You Know, for Search"
}

And make sure the cluster is healthy:

~# curl -XGET http://$(docker-machine ip swarm-0):9200/_cluster/health?pretty
{
 "cluster_name" : "es-swarm",
 "status" : "green",
 "timed_out" : false,
 "number_of_nodes" : 2,
 "number_of_data_nodes" : 2,
 "active_primary_shards" : 0,
 "active_shards" : 0,
 "relocating_shards" : 0,
 "initializing_shards" : 0,
 "unassigned_shards" : 0,
 "delayed_unassigned_shards" : 0,
 "number_of_pending_tasks" : 0,
 "number_of_in_flight_fetch" : 0
}

That would conclude the procedure :)

One last hint is to install the marvel plugin:

~# docker exec es1 /usr/share/elasticsearch/bin/plugin -i elasticsearch/marvel/latest

P.S. If you ever experience anything like this:

Downloading boot2docker.iso from https://github.com/ahbeng/boot2docker-experimental/releases/download/754c104/boot2docker.iso...
Generating SSH Keypair...
Uploading Boot2docker ISO ...
Unable to find boot2docker ISO at /root/.docker/machine/cache/boot2docker.iso
Error creating machine: Incomplete vSphere information: missing /root/.docker/machine/cache/boot2docker.iso
You will want to check the provider to make sure the machine and associated resources were properly removed.

then try downloading the iso file manually:

wget $(curl https://api.github.com/repos/ahbeng/boot2docker-experimental/releases/latest | grep -o https://.*/boot2docker.iso)  -O /root/.docker/machine/cache/boot2docker.iso