Hi,

We are consistently seeing this issue after upgrading from Jewel to Luminous. It looks like the monitor cannot handle mon_subscribe requests from clients fast enough. When this happens, the monitor cluster shows high CPU usage (1600%+ with the simple messenger). PG states also change slowly, because OSDs cannot get the latest map fast enough.

Some events make the impact even bigger, such as rebooting an OSD node (24 OSDs per node). OSDs then cannot even update their auth in time, and after a while we see a large number of OSDs marked down due to heartbeat failures. For example:

2018-04-11 21:19:24.772558 7f6bbb7f5700 0 cephx server osd.234: unexpected key: req.key=690bba2ca98774a2 expected_key=f63feaae2014a837
2018-04-11 21:19:26.539295 7f6bbb7f5700 0 cephx server osd.365: unexpected key: req.key=a0eb995e1bef1bf4 expected_key=bafe2e4d55a63478

The ticket http://tracker.ceph.com/issues/23713 has more details about the attempts we have made.

Any suggestion is much appreciated.

Thanks.
Xiaoxi