ceph-devel may be a better place for this...

This looks related to the recent change that allows the mon to bind to a different address from the advertised address. Notice that the config below has different addresses for "public addr" and "cluster addr". Could this be causing paxos to take some time to settle?

Another data point: with only one mon (instead of three), the quorum takes only about 10 seconds to establish instead of 60s.

____________________________________________________
From: Travis Nielsen <travis.nielsen@xxxxxxxxxxx>
Date: Tuesday, August 8, 2017 at 10:49 AM
To: "ceph-users@xxxxxxxxxxxxxx" <ceph-users@xxxxxxxxxxxxxx>
Subject: Mon time to form quorum

At cluster creation I'm seeing that the mons are taking a while to form quorum. It seems like I'm hitting a timeout of 60s somewhere. Am I missing a config setting that would help paxos establish quorum sooner? When initializing with the monmap I would have expected the mons to initialize very quickly.

The scenario is:
* Luminous RC 2
* The mons are initialized with a monmap
* Running in Kubernetes (Rook)

The symptoms are:
* When all three mons start in parallel, they appear to determine their rank immediately. I assume this means they establish communication. A log message such as this is seen in each of the mon logs:
  - 2017-08-08 17:03:16.383599 7f8da7c85f40 0 mon.rook-ceph-mon1@-1(probing) e0 my rank is now 0 (was 1)
* Now paxos enters a loop that times out every two seconds and lasts about 60s, trying to probe the other monitors. During this wait, I am able to curl the mon endpoints successfully.
  - 2017-08-08 17:03:17.345877 7f02b779af40 10 mon.rook-ceph-mon0@1(probing) e0 probing other monitors
  - 2017-08-08 17:03:19.346032 7f02ae568700 4 mon.rook-ceph-mon0@1(probing) e0 probe_timeout 0x55c93678bb00
* After about 60 seconds the probe succeeds and the mons start responding:
  - 2017-08-08 17:04:17.356928 7f02ae568700 10 mon.rook-ceph-mon0@1(probing) e0 probing other monitors
  - 2017-08-08 17:04:17.366587 7f02a855c700 10 mon.rook-ceph-mon0@1(probing) e0 ms_verify_authorizer 10.0.0.254:6790/0 mon protocol 2

The relevant settings in the config are:

    mon initial members = rook-ceph-mon0 rook-ceph-mon1 rook-ceph-mon2
    mon host = 10.0.0.24:6790,10.0.0.163:6790,10.0.0.139:6790
    public addr = 10.0.0.24
    cluster addr = 172.17.0.5

The full log for this mon at debug log level 20 can be found here:
https://gist.github.com/travisn/2c2641a6b80a7479b3b22accb41a5193

Any ideas?

Thanks,
Travis
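
In case it helps anyone measure the same delay from their own mon logs, here is a rough Python sketch that computes the time between the first "probing other monitors" line and the first "ms_verify_authorizer" line. The sample lines are copied from the excerpts above; the function names (parse_timestamp, probe_delay_seconds) are made up for illustration, and the sketch assumes every matched line starts with a "YYYY-MM-DD HH:MM:SS.ffffff" timestamp. It is not a Ceph tool, just a quick way to put a number on the wait.

```python
from datetime import datetime

# Sample lines copied from the log excerpts quoted in this thread.
SAMPLE_LOG = """\
2017-08-08 17:03:17.345877 7f02b779af40 10 mon.rook-ceph-mon0@1(probing) e0 probing other monitors
2017-08-08 17:03:19.346032 7f02ae568700 4 mon.rook-ceph-mon0@1(probing) e0 probe_timeout 0x55c93678bb00
2017-08-08 17:04:17.356928 7f02ae568700 10 mon.rook-ceph-mon0@1(probing) e0 probing other monitors
2017-08-08 17:04:17.366587 7f02a855c700 10 mon.rook-ceph-mon0@1(probing) e0 ms_verify_authorizer 10.0.0.254:6790/0 mon protocol 2
"""


def parse_timestamp(line):
    """Parse the leading date and time fields of a mon log line (hypothetical helper)."""
    date_part, time_part = line.split()[:2]
    return datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S.%f")


def probe_delay_seconds(log_text):
    """Seconds from the first probe attempt to the first peer connection, or None.

    Assumption: 'probing other monitors' marks the start of probing and
    'ms_verify_authorizer' marks the first incoming connection from a peer.
    """
    start = end = None
    for line in log_text.splitlines():
        if start is None and "probing other monitors" in line:
            start = parse_timestamp(line)
        elif start is not None and "ms_verify_authorizer" in line:
            end = parse_timestamp(line)
            break
    if start is None or end is None:
        return None
    return (end - start).total_seconds()


if __name__ == "__main__":
    # For the sample above this prints a value of roughly 60 seconds.
    print(probe_delay_seconds(SAMPLE_LOG))
```

Running it against the full debug-20 log from each mon should show whether all three wait the same amount of time before the first peer connection arrives.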