Hi,
As part of our testing over a period of time, we used a lot of parameters in Ceph.conf. With that configuration, we observed issues when we pulled down 2 sites as mentioned earlier.
In the last couple of days, we cleaned up most of those parameters and kept only a couple of mandatory ones, and we are no longer seeing any issues when we bring down 2 sites. FYI.
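For reference, the trimmed-down ceph.conf is essentially just the standard global section, roughly along the lines of the sketch below (the fsid, monitor names/addresses and subnets here are placeholders, not our exact values):

--------------
[global]
fsid = <cluster uuid>
mon_initial_members = mon-site-a, mon-site-b, mon-site-c
mon_host = 10.10.1.10,10.10.2.10,10.10.3.10
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public_network = 10.10.0.0/16
cluster_network = 192.168.0.0/16
--------------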
Thanks & Regards,
Manoj
On Sat, Aug 6, 2016 at 8:23 PM, Venkata Manojawa Paritala <manojawapv@xxxxxxxxxx> wrote:
Hi,

We have configured a single Ceph cluster in a lab with the below specification.

1. Divided the cluster into 3 logical sites (SiteA, SiteB & SiteC). This is to simulate that the nodes are part of different data centers and have network connectivity between them for DR.
2. Each site operates in a different subnet and each subnet is part of one VLAN. We have configured routing so that OSD nodes in one site can communicate with OSD nodes in the other 2 sites.
3. Each site has one monitor node, 2 OSD nodes (to which we have disks attached) and IO-generating clients.
4. We have configured 2 networks.
4.1. Public network - to which all the clients, monitors and OSD nodes are connected.
4.2. Cluster network - to which only the OSD nodes are connected, for replication/recovery/heartbeat traffic.
5. We have 2 issues here.
5.1. We are unable to sustain IO for clients from individual sites when we isolate the OSD nodes by bringing down ONLY the cluster network between sites. Logically this puts the individual sites in isolation with respect to the cluster network. Please note that the public network is still connected between the sites.
5.2. In a fully functional cluster, when we bring down 2 sites (shut down the OSD services of 2 sites - say Site A OSDs and Site B OSDs), the OSDs in the third site (Site C) also go down (OSD flapping).

We need workarounds/solutions to fix the above 2 issues.

Below are some of the parameters we have already set in ceph.conf to sustain the cluster for a longer time when we cut off the links between sites, but they were not successful.

--------------
[global]
public_network = 10.10.0.0/16
cluster_network = 192.168.100.0/16,192.168.150.0/16,192.168.200.0/16
osd heartbeat address = 172.16.0.0/16

[monitor]
mon osd report timeout = 1800

[OSD]
osd heartbeat interval = 12
osd heartbeat grace = 60
osd mon heartbeat interval = 60
osd mon report interval max = 300
osd mon report interval min = 10
osd mon ack timeout = 60
..
----------------

We also configured the parameter "osd_heartbeat_addr" and tried two values: 1) the Ceph public network (assuming that when we bring down the cluster network, heartbeats should go via the public network); 2) a different network range altogether, for which we had physical connections in place. Neither option worked.

We have a total of 49 OSDs (14 in SiteA, 14 in SiteB, 21 in SiteC) in the cluster, and one monitor in each site.

We want to try the below two options.

A) Increase the "mon osd min down reporters" value (a rough sketch is at the end of this mail). The question is by how much. Say, if I set this value to 49, will client IO be sustained when we cut off the cluster network links between sites? One issue in this case would be that if an OSD is really down, we wouldn't know.

B) Add 2 monitors to each site (also sketched at the end of this mail). This would give each site 3 monitors and the overall cluster 9 monitors. The reason we want to try this is that we think the OSDs are going down because the quorum is unable to find the minimum number of nodes (maybe monitors) to sustain itself.

Thanks & Regards,
Manoj
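Regarding option A above, the change would be roughly the following (a sketch only - the value 25 is just an illustration, not a number we have settled on):

--------------
[global]
mon osd min down reporters = 25
--------------

or injected at runtime on the monitors, without a restart:

--------------
ceph tell mon.* injectargs '--mon-osd-min-down-reporters 25'
--------------

The thinking is to require more reporters than any single site has OSDs, so that one isolated site cannot on its own get the other sites' OSDs marked down; the trade-off is the one already noted above - an OSD that is genuinely dead would then only be caught by the mon osd report timeout.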
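And regarding option B, if the extra monitors are added with ceph-deploy (an assumption for this sketch; the hostnames below are placeholders), it would be roughly:

--------------
# one new monitor per command, run from the admin node:
ceph-deploy mon add mon-site-a-2
ceph-deploy mon add mon-site-a-3
# ...repeat for the new SiteB and SiteC monitor hosts, then verify quorum:
ceph quorum_status --format json-pretty
--------------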
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com