Hi everybody,

We have 3 monitors in our Ceph cluster: 2 on one local site (2 data centers a few km away from each other), and the 3rd on a remote site, with a maximum round-trip time (RTT) of 30 ms between the local site and the remote site. All OSDs run on the local site. The reason for the remote monitor is to keep the cluster running if either DC fails.

Is that a valid configuration? What is the maximum RTT that is acceptable in such a Ceph cluster?

Here are some details about our running cluster:

Current monmap:
-----------
epoch 4
fsid ...
last_changed 2015-05-12 08:39:35.600843
created 0.000000
0: <IP addr local0>:6789/0 mon.<local0>
1: <IP addr local1>:6789/0 mon.<local1>
2: <IP addr remote>:6789/0 mon.<remote>
-----------

In our running cluster, the mon logs show that the "leader" monitor is on the local site, while the other two are "peons".

Being curious, I increased the runtime debug log levels for a few subsystems (ms, mon, paxos...) to see if there was some kind of heartbeat between the monitors. I noticed messages such as these:

------
2015-07-01 07:01:05.840845 7fd569bbe700 1 -- <IP local1>:6789/0 --> mon.0 <IP local0>:6789/0 -- mon_health( service 1 op tell e 0 r 0 ) v1 -- ?+0 0x3b9b200
2015-07-01 07:01:05.840871 7fd569bbe700 20 -- <IP local1>:6789/0 submit_message mon_health( service 1 op tell e 0 r 0 ) v1 remote, <IP local0>:6789/0, have pipe.
2015-07-01 07:01:05.840885 7fd569bbe700 1 -- <IP local1>:6789/0 --> mon.2 <IP remote>:6789/0 -- mon_health( service 1 op tell e 0 r 0 ) v1 -- ?+0 0x3b98a00
2015-07-01 07:01:05.840894 7fd569bbe700 20 -- <IP local1>:6789/0 submit_message mon_health( service 1 op tell e 0 r 0 ) v1 remote, <IP remote>:6789/0, have pipe.
------

... but none that tells me what I want: the idea was to see whether anything would complain about a high RTT, and to monitor that value. Any idea how to do this?

Thank you.
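Until a Ceph-side metric turns up, one way to track the inter-site RTT from the outside is to time TCP connects to each monitor's port (6789 in the monmap above), since a TCP connect returns after one SYN / SYN-ACK exchange, i.e. roughly one round trip. The sketch below assumes only that the monitor ports are reachable from wherever it runs; the function name `tcp_connect_rtt` and its parameters are hypothetical. It measures raw network RTT, not the Paxos-level latency the monitors themselves experience, and the monitors may log these short-lived connections.

```python
import socket
import statistics
import time


def tcp_connect_rtt(host: str, port: int = 6789, samples: int = 5,
                    timeout: float = 2.0) -> float:
    """Return the median TCP connect time to host:port, in milliseconds.

    Hypothetical helper: a TCP connect completes once the SYN / SYN-ACK
    exchange is done, so its duration approximates one network round trip
    (plus a little kernel overhead). Raises OSError if the endpoint is
    unreachable, refuses the connection, or times out.
    """
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        # create_connection returns once the handshake has completed.
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.perf_counter() - start
        timings.append(elapsed * 1000.0)
    # The median is less sensitive than the mean to a single slow sample.
    return statistics.median(timings)
```

Calling it periodically (for example from cron) against each monitor address in the monmap, from a host on the local site, and graphing the results would give the RTT trend the question asks about.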
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com