Split-brain in a multi-site cluster

Hi,

We are testing a multi-site Ceph cluster running the 0.94.5 (Hammer) release.
There are two sites with two Ceph nodes in each site.
Each node runs a monitor and a number of OSDs.
The CRUSH rules are configured to require a copy of the data in each site.
The sites are connected by a private high-speed link.
In addition, there is a fifth monitor placed in AWS, and both sites have connectivity to AWS.
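
For reference, the rule is along these lines (just a sketch, assuming a replicated pool of size 4 with two copies per site and a custom "site" bucket type; the names are illustrative):

    rule replicated_multisite {
            ruleset 1
            type replicated
            min_size 2
            max_size 4
            step take default
            # pick two site buckets, then two hosts (and one OSD each) per site
            step choose firstn 2 type site
            step chooseleaf firstn 2 type host
            step emit
    }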

We are doing split-brain testing in which we use iptables to simulate a cut of the link between the two sites.
Connectivity to AWS from both sites is not affected in this test.
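
The cut is simulated with rules roughly like the following on each node (the addresses are illustrative; the route to AWS is left untouched):

    # drop all traffic to and from the other site's Ceph network
    iptables -A INPUT  -s 192.168.2.0/24 -j DROP
    iptables -A OUTPUT -d 192.168.2.0/24 -j DROP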

The expected behavior is that one site goes down while the other continues to function.

The observed behavior is as follows:

The monitors behave as expected:
   The 2 monitors in one site are declared dead, and the other 2 monitors plus the AWS monitor form a new quorum (see the commands below).
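   We verify this from a node on the site that keeps quorum with something like:

       ceph quorum_status   # shows the 3 remaining monitors in quorum
       ceph mon stat        # one-line summary of monitor state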

The OSDs do not behave well:
From the logs, each OSD fails its heartbeats to every OSD in the other site. This is expected.

However, the OSDs on the “dead” site are not declared “down”.
Some of them go down and then come back up, but mostly they stay up.

As a result, all PGs are stuck in the “peering” state and the cluster is unusable: no clients can read or write in either site.
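
We are checking this with commands along these lines:

    ceph health detail           # health warnings, including PGs stuck peering
    ceph pg dump_stuck inactive  # the stuck PGs and their acting OSD sets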

Is this expected?
Are there any parameters that can be changed to improve the behavior?
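
For reference, the kinds of settings that seem relevant (as we understand the docs; values shown are what we believe the defaults to be, nothing tuned yet):

    [global]
    # number of OSDs that must report a peer down before the monitor marks it down
    mon osd min down reporters = 1
    # mark an OSD down if it has not reported to the monitors for this long (seconds)
    mon osd report timeout = 900
    # grace period (seconds) before a missed heartbeat turns into a failure report
    osd heartbeat grace = 20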

Thanks

Ilia Sokolinski


