Fix a broken Elasticsearch cluster

Thomas Decaux
Jan 4, 2021 · 3 min read


When you start an instance of Elasticsearch, you are starting a node. An Elasticsearch cluster is a group of nodes that have the same cluster.name attribute. As nodes join or leave a cluster, the cluster automatically reorganizes itself to evenly distribute the data across the available nodes.

But sometimes clustering breaks (a node can't join the cluster for whatever reason), and then the only solution you seem to have is to remove the data folder!


I dug a bit to find a more elegant solution than removing everything; after all, the cluster info is stored as files, and I came across https://www.elastic.co/guide/en/elasticsearch/reference/current/node-tool.html

The elasticsearch-node command enables you to perform certain unsafe operations on a node that are only possible while it is shut down. This command allows you to adjust the role of a node, unsafely edit cluster settings and may be able to recover some data after a disaster or start a node even if it is incompatible with the data on disk.

<!> Warning: the solution I will give you can cause arbitrary data loss <!>

Since everything in this universe is just files, I suggest doing a simple copy/paste backup of the data folders first.

Create a cluster

For this demo, we are using Kubernetes and the official Elasticsearch Helm chart, from https://github.com/elastic/helm-charts/tree/master/elasticsearch.

Helm values.yaml
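A minimal sketch of the values (clusterName, nodeGroup and replicas are documented chart values; fullnameOverride is an assumption here, to match the es-test-master resource names used in the commands below):

```
# values.yaml -- a minimal sketch, not the exact file from this demo
clusterName: "test"
nodeGroup: "master"
replicas: 3
# assumption: override resource names to match "es-test-master" below
fullnameOverride: "es-test-master"
```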

Run
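Something like this, assuming the elastic Helm repo is not added yet and a release named es-test:

```
helm repo add elastic https://helm.elastic.co
helm install es-test elastic/elasticsearch -f values.yaml
```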

This creates a cluster named “test” with 3 master-eligible nodes.

Break it

Edit values.yaml and change the `clusterName` value, then upgrade the Helm release.
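For example, a sketch that renames the cluster from “test” to “test2” (to match the error below) and upgrades the release:

```
# rename the cluster in values.yaml: test -> test2
sed -i 's/clusterName: "test"/clusterName: "test2"/' values.yaml
helm upgrade es-test elastic/elasticsearch -f values.yaml
```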

Kubernetes performs a rolling upgrade of the StatefulSet, and you will notice that one pod doesn't get ready and complains:

handshake with [{10.233.80.53:9300}{A3vTBniwSfSfO8WvWeQiiA}{es-test-master-headless}{10.233.80.53:9300}] failed: remote cluster name [test] does not match local cluster name [test2]

which is expected: it's impossible to change the cluster name like this!

Since Kubernetes waits for the pod to be ready before upgrading the others, nothing more happens and the cluster stays with the old name.

Fix it

The elasticsearch-node utility binary must be run while the Elasticsearch daemon is stopped, so we can't use it while the pod is running. The (clever) solution is to stop the Elasticsearch pods, then run a Kubernetes Job that uses their PVCs as volumes.

We are going to remove all running pods while keeping the data folders (by keeping the Kubernetes PVC objects).

kubectl scale sts es-test-master --replicas=0
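The pods disappear, but the PVCs (and therefore the data folders) remain, which you can verify:

```
kubectl get pods   # no more es-test-master-* pods
kubectl get pvc    # the 3 claims are still there, with their data
```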

Now we can use the `elasticsearch-node` utility to fix the cluster, via Kubernetes Jobs that can interact with the existing data files (a Job sketch follows the two commands below):

`detach-cluster` removes the clustering information but leaves the node in the MUST_JOIN_ELECTED_MASTER state.

`unsafe-bootstrap` lets the node bootstrap itself as the elected master of a new cluster (https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/cluster/coordination/UnsafeBootstrapMasterCommand.java#L83).
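A sketch of such a Job, for one node (this is not the exact manifest from this demo; the image tag and the PVC name, which follows the usual StatefulSet pattern claim-template-name + pod name, are assumptions):

```
apiVersion: batch/v1
kind: Job
metadata:
  name: es-node-fix-0
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: elasticsearch-node
          # assumption: same image version as the chart deployed
          image: docker.elastic.co/elasticsearch/elasticsearch:7.10.1
          # answer the tool's confirmation prompt with "y";
          # use "unsafe-bootstrap" instead of "detach-cluster" for the future master
          command: ["bash", "-c", "echo y | elasticsearch-node detach-cluster"]
          volumeMounts:
            - name: data
              mountPath: /usr/share/elasticsearch/data
      volumes:
        - name: data
          # assumption: PVC name built from the chart's claim template + pod name
          persistentVolumeClaim:
            claimName: es-test-master-es-test-master-0
```

Run one Job per data volume: `unsafe-bootstrap` against one node (the future master), `detach-cluster` against the two others.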

After that, we have 2 nodes ready to join a new cluster with an elected master, and 1 node ready to be the master!

Re-create the cluster

Simply re-scale the StatefulSet:

kubectl scale sts es-test-master --replicas=3
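Once the pods are ready, you can check that the cluster formed correctly under its new name (the Elasticsearch image ships with curl):

```
kubectl exec es-test-master-0 -- curl -s localhost:9200/_cluster/health?pretty
```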

What about shard data?

Once the nodes are up, if you don't see all your shards/indices, don't panic! The cluster state will not be refreshed automatically, so nodes can hold “dangling” indices.

In v7.9, the Elastic team shipped another awesome utility tool, available via API:

So it's really easy to recover the missing data thanks to the import dangling index API:
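A sketch of the two calls (run from one of the pods; the index UUID placeholder comes from the list response):

```
# list indices found on disk but missing from the cluster state
curl -s localhost:9200/_dangling?pretty

# import one of them, acknowledging possible data loss
curl -s -XPOST "localhost:9200/_dangling/<index-uuid>?accept_data_loss=true"
```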

Et voilà, a broken cluster has been recovered without removing any data folder, and no data (should…) be lost!
