Looks like crush_hash32_3 is from prime_pg_temp and the buffer::ptr
calls are from reencoding the compat OSDMap. Given that the client
versions are not easy to upgrade, what would be the best workaround for
this issue? Should I add more monitors to the quorum so that the client
load is distributed more evenly?

2018-04-14 13:09 GMT+08:00 Xiaoxi Chen <superdebuger@xxxxxxxxx>:
> Thanks Sage, that makes sense. Most of our clients are pre-luminous:
> jewel accounts for 83% and the rest are mostly hammer.
> But we don't enable the balancer, nor do we run
> `reweight-by-utilization` at all.
>
> I just ran `ceph osd down` on osd.0 through osd.9, and osd.9 took a
> very long time to _boot. During that period, `crush_hash32_3` looks
> like the biggest consumer, reaching 30%+, and an unnamed function at
> `0x00000038e9a` also takes ~20%.
>
> Sometimes buffer::ptr also takes 20%+:
>
>   21.69%  ceph-mon  [.] ceph::buffer::ptr::ptr(ceph::buffer::ptr const&, unsigned int, unsigned int)
>   20.60%  ceph-mon  [.] ceph::buffer::ptr::copy_out(unsigned int, unsigned int, char*) const
>   15.77%  ceph-mon  [.] ceph::buffer::ptr::release()
>
> From `top -H -p <pid of mon>`, pipe_writer threads fill the list.
>
> xiaoxi
>
> 2018-04-14 6:05 GMT+08:00 Sage Weil <sweil@xxxxxxxxxx>:
>> On Sat, 14 Apr 2018, Xiaoxi Chen wrote:
>>> Hi,
>>>
>>> We are consistently seeing this issue after upgrading from jewel to
>>> luminous. The behavior looks like the monitor cannot handle
>>> mon_subscribe requests from clients fast enough, and then we see:
>>>
>>>  - high CPU (1600%+ with simple messenger) on the monitor
>>>  - cluster PG state changing slowly, as OSDs cannot get the latest
>>>    map fast enough
>>>
>>> In some cases, such as rebooting an OSD node (24 OSDs per node), the
>>> impact is even bigger: OSDs cannot even update their auth in time,
>>> and after a while we saw massive numbers of OSDs marked down due to
>>> heartbeat failures, like:
>>>
>>> 2018-04-11 21:19:24.772558 7f6bbb7f5700  0 cephx server
>>> osd.234: unexpected key: req.key=690bba2ca98774a2
>>> expected_key=f63feaae2014a837
>>> 2018-04-11 21:19:26.539295 7f6bbb7f5700  0 cephx server osd.365:
>>> unexpected key: req.key=a0eb995e1bef1bf4 expected_key=bafe2e4d55a63478
>>>
>>> There are a few more details about the attempts we have made in the
>>> ticket http://tracker.ceph.com/issues/23713.
>>>
>>> Any suggestion is much appreciated. Thanks.
>>
>> My guess is that this is the compat reencoding of the OSDMap for the
>> pre-luminous clients.
>>
>> Are you by chance making use of the crush-compat balancer? That would
>> additionally require a reencoded crush map.
>>
>> Can you do a 'perf top -p `pidof ceph-mon`' while this is happening
>> to see where the time is being spent?
>>
>> Thanks!
>> sage
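
For reference, a sketch of the diagnostics discussed in this thread,
roughly as they could be run on a luminous cluster. The pid lookup, the
osd id list 0-9, and the use of `ceph features` are illustrative
additions, not commands quoted verbatim from the thread.

    # Report which feature releases the connected clients advertise;
    # pre-luminous (jewel/hammer) clients are what force the mon to
    # reencode a compat OSDMap for them.
    ceph features

    # Sample where ceph-mon is spending CPU while the load spike is
    # happening.
    MON_PID=$(pidof ceph-mon)
    perf top -p "$MON_PID"

    # Per-thread CPU view, to check whether simple-messenger pipe_writer
    # threads dominate as reported above.
    top -H -p "$MON_PID"

    # Reproduce the map churn by marking a handful of OSDs down
    # (ids 0-9 here, matching the test described in the thread).
    ceph osd down 0 1 2 3 4 5 6 7 8 9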